Using Large Diabetes Databases for Research.
Wild, Sarah; Fischbacher, Colin; McKnight, John
2016-09-01
There are an increasing number of clinical, administrative and trial databases that can be used for research. These are particularly valuable if there are opportunities for linkage to other databases. This paper describes examples of the use of large diabetes databases for research. It reviews the advantages and disadvantages of using large diabetes databases for research and suggests solutions for some challenges. Large, high-quality databases offer potential sources of information for research at relatively low cost. Fundamental issues for using databases for research are the completeness of capture of cases within the population and time period of interest and accuracy of the diagnosis of diabetes and outcomes of interest. The extent to which people included in the database are representative should be considered if the database is not population based and there is the intention to extrapolate findings to the wider diabetes population. Information on key variables such as date of diagnosis or duration of diabetes may not be available at all, may be inaccurate or may contain a large amount of missing data. Information on key confounding factors is rarely available for the nondiabetic or general population limiting comparisons with the population of people with diabetes. However comparisons that allow for differences in distribution of important demographic factors may be feasible using data for the whole population or a matched cohort study design. In summary, diabetes databases can be used to address important research questions. Understanding the strengths and limitations of this approach is crucial to interpret the findings appropriately. © 2016 Diabetes Technology Society.
Including the Group Quarters Population in the US Synthesized Population Database
Chasteen, Bernadette M.; Wheaton, William D.; Cooley, Philip C.; Ganapathi, Laxminarayana; Wagener, Diane K.
2011-01-01
In 2005, RTI International researchers developed methods to generate synthesized population data on US households for the US Synthesized Population Database. These data are used in agent-based modeling, which simulates large-scale social networks to test how changes in the behaviors of individuals affect the overall network. Group quarters are residences where individuals live in close proximity and interact frequently. Although the Synthesized Population Database represents the population living in households, data for the nation’s group quarters residents are not easily quantified because of US Census Bureau reporting methods designed to protect individuals’ privacy. Including group quarters population data can be an important factor in agent-based modeling because the number of residents and the frequency of their interactions are variables that directly affect modeling results. Particularly with infectious disease modeling, the increased frequency of agent interaction may increase the probability of infectious disease transmission between individuals and the probability of disease outbreaks. This report reviews our methods to synthesize data on group quarters residents to match US Census Bureau data. Our goal in developing the Group Quarters Population Database was to enable its use with RTI’s US Synthesized Population Database in the Modeling of Infectious Diseases Agent Study. PMID:21841972
SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access.
Amigo, Jorge; Salas, Antonio; Phillips, Christopher; Carracedo, Angel
2008-10-10
In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 x 10(9) genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
Design and deployment of a large brain-image database for clinical and nonclinical research
NASA Astrophysics Data System (ADS)
Yang, Guo Liang; Lim, Choie Cheio Tchoyoson; Banukumar, Narayanaswami; Aziz, Aamer; Hui, Francis; Nowinski, Wieslaw L.
2004-04-01
An efficient database is an essential component of organizing diverse information on image metadata and patient information for research in medical imaging. This paper describes the design, development and deployment of a large database system serving as a brain image repository that can be used across different platforms in various medical researches. It forms the infrastructure that links hospitals and institutions together and shares data among them. The database contains patient-, pathology-, image-, research- and management-specific data. The functionalities of the database system include image uploading, storage, indexing, downloading and sharing as well as database querying and management with security and data anonymization concerns well taken care of. The structure of database is multi-tier client-server architecture with Relational Database Management System, Security Layer, Application Layer and User Interface. Image source adapter has been developed to handle most of the popular image formats. The database has a user interface based on web browsers and is easy to handle. We have used Java programming language for its platform independency and vast function libraries. The brain image database can sort data according to clinically relevant information. This can be effectively used in research from the clinicians" points of view. The database is suitable for validation of algorithms on large population of cases. Medical images for processing could be identified and organized based on information in image metadata. Clinical research in various pathologies can thus be performed with greater efficiency and large image repositories can be managed more effectively. The prototype of the system has been installed in a few hospitals and is working to the satisfaction of the clinicians.
The Odense University Pharmacoepidemiological Database (OPED)
The Odense University Pharmacoepidemiological Database is one of two large prescription registries in Denmark and covers a stable population that is representative of the Danish population as a whole.
Very large database of lipids: rationale and design.
Martin, Seth S; Blaha, Michael J; Toth, Peter P; Joshi, Parag H; McEvoy, John W; Ahmed, Haitham M; Elshazly, Mohamed B; Swiger, Kristopher J; Michos, Erin D; Kwiterovich, Peter O; Kulkarni, Krishnaji R; Chimera, Joseph; Cannon, Christopher P; Blumenthal, Roger S; Jones, Steven R
2013-11-01
Blood lipids have major cardiovascular and public health implications. Lipid-lowering drugs are prescribed based in part on categorization of patients into normal or abnormal lipid metabolism, yet relatively little emphasis has been placed on: (1) the accuracy of current lipid measures used in clinical practice, (2) the reliability of current categorizations of dyslipidemia states, and (3) the relationship of advanced lipid characterization to other cardiovascular disease biomarkers. To these ends, we developed the Very Large Database of Lipids (NCT01698489), an ongoing database protocol that harnesses deidentified data from the daily operations of a commercial lipid laboratory. The database includes individuals who were referred for clinical purposes for a Vertical Auto Profile (Atherotech Inc., Birmingham, AL), which directly measures cholesterol concentrations of low-density lipoprotein, very low-density lipoprotein, intermediate-density lipoprotein, high-density lipoprotein, their subclasses, and lipoprotein(a). Individual Very Large Database of Lipids studies, ranging from studies of measurement accuracy, to dyslipidemia categorization, to biomarker associations, to characterization of rare lipid disorders, are investigator-initiated and utilize peer-reviewed statistical analysis plans to address a priori hypotheses/aims. In the first database harvest (Very Large Database of Lipids 1.0) from 2009 to 2011, there were 1 340 614 adult and 10 294 pediatric patients; the adult sample had a median age of 59 years (interquartile range, 49-70 years) with even representation by sex. Lipid distributions closely matched those from the population-representative National Health and Nutrition Examination Survey. The second harvest of the database (Very Large Database of Lipids 2.0) is underway. Overall, the Very Large Database of Lipids database provides an opportunity for collaboration and new knowledge generation through careful examination of granular lipid data on a large scale. © 2013 Wiley Periodicals, Inc.
Zhivotovsky, Lev A; Malyarchuk, Boris A; Derenko, Miroslava V; Wozniak, Marcin; Grzybowski, Tomasz
2009-09-01
Developing a forensic DNA database on a population that consists of local ethnic groups separated by physical and cultural barriers is questionable as it can be genetically subdivided. On the other side, small sizes of ethnic groups, especially in alpine regions where they are sub-structured further into small villages, prevent collecting a large sample from each ethnic group. For such situations, we suggest to obtain both a total population database on allele frequencies across ethnic groups and a list of theta-values between the groups and the total data. We have genotyped 558 individuals from the native population of South Siberia, consisting of nine ethnic groups, at 17 autosomal STR loci of the kit packages AmpFlSTR SGM Plus i, Cyrillic AmpFlSTR Profiler Plus. The groups differentiate from each other with average theta-values of around 1.1%, and some reach up to three to four percent at certain loci. There exists between-village differentiation as well. Therefore, a database for the population of South Siberia is composed of data on allele frequencies in the pool of ethnic groups and data on theta-values that indicate variation in allele frequencies across the groups. Comparison to additional data on northeastern Asia (the Chukchi and Koryak) shows that differentiation in allele frequencies among small groups that are separated by large geographic distance can be even greater. In contrast, populations of Russians that live in large cities of the European part of Russia are homogeneous in allele frequencies, despite large geographic distance between them, and thus can be described by a database on allele frequencies alone, without any specific information on theta-values.
Le Huec, Jean Charles; Hasegawa, Kazuhiro
2016-11-01
Sagittal balance analysis has gained importance and the measure of the radiographic spinopelvic parameters is now a routine part of many interventions of spine surgery. Indeed, surgical correction of lumbar lordosis must be proportional to the pelvic incidence (PI). The compensatory mechanisms [pelvic retroversion with increased pelvic tilt (PT) and decreased thoracic kyphosis] spontaneously reverse after successful surgery. This study is the first to provide 3D standing spinopelvic reference values from a large database of Caucasian (n = 137) and Japanese (n = 131) asymptomatic subjects. The key spinopelvic parameters [e.g., PI, PT, sacral slope (SS)] were comparable in Japanese and Caucasian populations. Three equations, namely lumbar lordosis based on PI, PT based on PI and SS based on PI, were calculated after linear regression modeling and were comparable in both populations: lumbar lordosis (L1-S1) = 0.54*PI + 27.6, PT = 0.44*PI - 11.4 and SS = 0.54*PI + 11.90. We showed that the key spinopelvic parameters obtained from a large database of healthy subjects were comparable for Causasian and Japanese populations. The normative values provided in this study and the equations obtained after linear regression modeling could help to estimate pre-operatively the lumbar lordosis restoration and could be also used as guidelines for spinopelvic sagittal balance.
WheatGenome.info: an integrated database and portal for wheat genome information.
Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David
2012-02-01
Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.
Daniell, Nathan; Fraysse, François; Paul, Gunther
2012-01-01
Anthropometry has long been used for a range of ergonomic applications & product design. Although products are often designed for specific cohorts, anthropometric data are typically sourced from large scale surveys representative of the general population. Additionally, few data are available for emerging markets like China and India. This study measured 80 Chinese males that were representative of a specific cohort targeted for the design of a new product. Thirteen anthropometric measurements were recorded and compared to two large databases that represented a general population, a Chinese database and a Western database. Substantial differences were identified between the Chinese males measured in this study and both databases. The subjects were substantially taller, heavier and broader than subjects in the older Chinese database. However, they were still substantially smaller, lighter and thinner than Western males. Data from current Western anthropometric surveys are unlikely to accurately represent the target population for product designers and manufacturers in emerging markets like China.
Williams, Christopher R; Brooke, Benjamin S
2017-10-01
Patient outcomes after open abdominal aortic aneurysm and endovascular aortic aneurysm repair have been widely reported from several large, randomized, controlled trials. It is not clear whether these trial outcomes are representative of abdominal aortic aneurysm repair procedures performed in real-world hospital settings across the United States. This study was designed to evaluate population-based outcomes after endovascular aortic aneurysm repair versus open abdominal aortic aneurysm repair using statewide inpatient databases and examine how they have helped improve our understanding of abdominal aortic aneurysm repair. A systematic search of MEDLINE, EMBASE, and CINAHL databases was performed to identify articles comparing endovascular aortic aneurysm repair and open abdominal aortic aneurysm repair using data from statewide inpatient databases. This search was limited to studies published in the English language after 1990, and abstracts were screened and abstracted by 2 authors. Our search yielded 17 studies published between 2004 and 2016 that used data from 29 different statewide inpatient databases to compare endovascular aortic aneurysm repair versus open abdominal aortic aneurysm repair. These studies support the randomized, controlled trial results, including a lower mortality associated with endovascular aortic aneurysm repair extended from the perioperative period up to 3 years after operation, as well as a higher complication rate after endovascular aortic aneurysm repair. The evidence from statewide inpatient database analyses has also elucidated trends in procedure volume, patient case mix, volume-outcome relationships, and health care disparities associated with endovascular aortic aneurysm repair versus open abdominal aortic aneurysm repair. Population analyses of endovascular aortic aneurysm repair and open abdominal aortic aneurysm repair using statewide inpatient databases have confirmed short- and long-term mortality outcomes obtained from large, randomized, controlled trials. Moreover, these analyses have allowed us to assess the effect of endovascular aortic aneurysm repair adoption on population outcomes and patient case mix over time. Published by Elsevier Inc.
Kennedy, Amy E.; Khoury, Muin J.; Ioannidis, John P.A.; Brotzman, Michelle; Miller, Amy; Lane, Crystal; Lai, Gabriel Y.; Rogers, Scott D.; Harvey, Chinonye; Elena, Joanne W.; Seminara, Daniela
2017-01-01
Background We report on the establishment of a web-based Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD’s goals are to enhance awareness of resources, facilitate interdisciplinary research collaborations, and support existing cohorts for the study of cancer-related outcomes. Methods Comprehensive descriptive data were collected from large cohorts established to study cancer as primary outcome using a newly developed questionnaire. These included an inventory of baseline and follow-up data, biospecimens, genomics, policies, and protocols. Additional descriptive data extracted from publicly available sources were also collected. This information was entered in a searchable and publicly accessible database. We summarized the descriptive data across cohorts and reported the characteristics of this resource. Results As of December 2015, the CEDCD includes data from 46 cohorts representing more than 6.5 million individuals (29% ethnic/racial minorities). Overall, 78% of the cohorts have collected blood at least once, 57% at multiple time points, and 46% collected tissue samples. Genotyping has been performed by 67% of the cohorts, while 46% have performed whole-genome or exome sequencing in subsets of enrolled individuals. Information on medical conditions other than cancer has been collected in more than 50% of the cohorts. More than 600,000 incident cancer cases and more than 40,000 prevalent cases are reported, with 24 cancer sites represented. Conclusions The CEDCD assembles detailed descriptive information on a large number of cancer cohorts in a searchable database. Impact Information from the CEDCD may assist the interdisciplinary research community by facilitating identification of well-established population resources and large-scale collaborative and integrative research. PMID:27439404
Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes
Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel
2009-01-01
Background Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. Results To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. Conclusion The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest. PMID:19344481
Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.
Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Angel
2009-03-19
Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jochem, Warren C; Sims, Kelly M; Bright, Eddie A
In recent years, uses of high-resolution population distribution databases are increasing steadily for environmental, socioeconomic, public health, and disaster-related research and operations. With the development of daytime population distribution, temporal resolution of such databases has been improved. However, the lack of incorporation of transitional population, namely business and leisure travelers, leaves a significant population unaccounted for within the critical infrastructure networks, such as at transportation hubs. This paper presents two general methodologies for estimating passenger populations in airport and cruise port terminals at a high temporal resolution which can be incorporated into existing population distribution models. The methodologies are geographicallymore » scalable and are based on, and demonstrate how, two different transportation hubs with disparate temporal population dynamics can be modeled utilizing publicly available databases including novel data sources of flight activity from the Internet which are updated in near-real time. The airport population estimation model shows great potential for rapid implementation for a large collection of airports on a national scale, and the results suggest reasonable accuracy in the estimated passenger traffic. By incorporating population dynamics at high temporal resolutions into population distribution models, we hope to improve the estimates of populations exposed to or at risk to disasters, thereby improving emergency planning and response, and leading to more informed policy decisions.« less
Pemberton, T J; Jakobsson, M; Conrad, D F; Coop, G; Wall, J D; Pritchard, J K; Patel, P I; Rosenberg, N A
2008-07-01
When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis - such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.
Kennedy, Amy E; Khoury, Muin J; Ioannidis, John P A; Brotzman, Michelle; Miller, Amy; Lane, Crystal; Lai, Gabriel Y; Rogers, Scott D; Harvey, Chinonye; Elena, Joanne W; Seminara, Daniela
2016-10-01
We report on the establishment of a web-based Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD's goals are to enhance awareness of resources, facilitate interdisciplinary research collaborations, and support existing cohorts for the study of cancer-related outcomes. Comprehensive descriptive data were collected from large cohorts established to study cancer as primary outcome using a newly developed questionnaire. These included an inventory of baseline and follow-up data, biospecimens, genomics, policies, and protocols. Additional descriptive data extracted from publicly available sources were also collected. This information was entered in a searchable and publicly accessible database. We summarized the descriptive data across cohorts and reported the characteristics of this resource. As of December 2015, the CEDCD includes data from 46 cohorts representing more than 6.5 million individuals (29% ethnic/racial minorities). Overall, 78% of the cohorts have collected blood at least once, 57% at multiple time points, and 46% collected tissue samples. Genotyping has been performed by 67% of the cohorts, while 46% have performed whole-genome or exome sequencing in subsets of enrolled individuals. Information on medical conditions other than cancer has been collected in more than 50% of the cohorts. More than 600,000 incident cancer cases and more than 40,000 prevalent cases are reported, with 24 cancer sites represented. The CEDCD assembles detailed descriptive information on a large number of cancer cohorts in a searchable database. Information from the CEDCD may assist the interdisciplinary research community by facilitating identification of well-established population resources and large-scale collaborative and integrative research. Cancer Epidemiol Biomarkers Prev; 25(10); 1392-401. ©2016 AACR. ©2016 American Association for Cancer Research.
Existing data sources for clinical epidemiology: The North Denmark Bacteremia Research Database
Schønheyder, Henrik C; Søgaard, Mette
2010-01-01
Bacteremia is associated with high morbidity and mortality. Improving prevention and treatment requires better knowledge of the disease and its prognosis. However, in order to study the entire spectrum of bacteremia patients, we need valid sources of information, prospective data collection, and complete follow-up. In North Denmark Region, all patients diagnosed with bacteremia have been registered in a population-based database since 1981. The information has been recorded prospectively since 1992 and the main variables are: the patient’s unique civil registration number, date of sampling the first positive blood culture, date of admission, clinical department, date of notification of growth, place of acquisition, focus of infection, microbiological species, antibiogram, and empirical antimicrobial treatment. During the time from 1981 to 2008, information on 22,556 cases of bacteremia has been recorded. The civil registration number makes it possible to link the database to other medical databases and thereby build large cohorts with detailed longitudinal data that include hospital histories since 1977, comorbidity data, and complete follow-up of survival. The database is suited for epidemiological research and, presently, approximately 60 studies have been published. Other Danish departments of clinical microbiology have recently started to record the same information and a population base of 2.3 million will be available for future studies. PMID:20865114
NASA Astrophysics Data System (ADS)
Paprotny, Dominik; Morales-Nápoles, Oswaldo; Jonkman, Sebastiaan N.
2018-03-01
The influence of social and economic change on the consequences of natural hazards has been a matter of much interest recently. However, there is a lack of comprehensive, high-resolution data on historical changes in land use, population, or assets available to study this topic. Here, we present the Historical Analysis of Natural Hazards in Europe (HANZE) database, which contains two parts: (1) HANZE-Exposure with maps for 37 countries and territories from 1870 to 2020 in 100 m resolution and (2) HANZE-Events, a compilation of past disasters with information on dates, locations, and losses, currently limited to floods only. The database was constructed using high-resolution maps of present land use and population, a large compilation of historical statistics, and relatively simple disaggregation techniques and rule-based land use reallocation schemes. Data encompassed in HANZE allow one to "normalize" information on losses due to natural hazards by taking into account inflation as well as changes in population, production, and wealth. This database of past events currently contains 1564 records (1870-2016) of flash, river, coastal, and compound floods. The HANZE database is freely available at https://data.4tu.nl/repository/collection:HANZE.
Geographic differences in allele frequencies of susceptibility SNPs for cardiovascular disease
2011-01-01
Background We hypothesized that the frequencies of risk alleles of SNPs mediating susceptibility to cardiovascular diseases differ among populations of varying geographic origin and that population-specific selection has operated on some of these variants. Methods From the database of genome-wide association studies (GWAS), we selected 36 cardiovascular phenotypes including coronary heart disease, hypertension, and stroke, as well as related quantitative traits (eg, body mass index and plasma lipid levels). We identified 292 SNPs in 270 genes associated with a disease or trait at P < 5 × 10-8. As part of the Human Genome-Diversity Project (HGDP), 158 (54.1%) of these SNPs have been genotyped in 938 individuals belonging to 52 populations from seven geographic areas. A measure of population differentiation, FST, was calculated to quantify differences in risk allele frequencies (RAFs) among populations and geographic areas. Results Large differences in RAFs were noted in populations of Africa, East Asia, America and Oceania, when compared with other geographic regions. The mean global FST (0.1042) for 158 SNPs among the populations was not significantly higher than the mean global FST of 158 autosomal SNPs randomly sampled from the HGDP database. Significantly higher global FST (P < 0.05) was noted in eight SNPs, based on an empirical distribution of global FST of 2036 putatively neutral SNPs. For four of these SNPs, additional evidence of selection was noted based on the integrated Haplotype Score. Conclusion Large differences in RAFs for a set of common SNPs that influence risk of cardiovascular disease were noted between the major world populations. Pairwise comparisons revealed RAF differences for at least eight SNPs that might be due to population-specific selection or demographic factors. These findings are relevant to a better understanding of geographic variation in the prevalence of cardiovascular disease. PMID:21507254
VCGDB: a dynamic genome database of the Chinese population
2014-01-01
Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222
Big data and ophthalmic research.
Clark, Antony; Ng, Jonathon Q; Morlet, Nigel; Semmens, James B
2016-01-01
Large population-based health administrative databases, clinical registries, and data linkage systems are a rapidly expanding resource for health research. Ophthalmic research has benefited from the use of these databases in expanding the breadth of knowledge in areas such as disease surveillance, disease etiology, health services utilization, and health outcomes. Furthermore, the quantity of data available for research has increased exponentially in recent times, particularly as e-health initiatives come online in health systems across the globe. We review some big data concepts, the databases and data linkage systems used in eye research-including their advantages and limitations, the types of studies previously undertaken, and the future direction for big data in eye research. Copyright © 2016 Elsevier Inc. All rights reserved.
GIS model for identifying urban areas vulnerable to noise pollution: case study
NASA Astrophysics Data System (ADS)
Bilaşco, Ştefan; Govor, Corina; Roşca, Sanda; Vescan, Iuliu; Filip, Sorin; Fodorean, Ioan
2017-04-01
The unprecedented expansion of the national car ownership over the last few years has been determined by economic growth and the need for the population and economic agents to reduce travel time in progressively expanding large urban centres. This has led to an increase in the level of road noise and a stronger impact on the quality of the environment. Noise pollution generated by means of transport represents one of the most important types of pollution with negative effects on a population's health in large urban areas. As a consequence, tolerable limits of sound intensity for the comfort of inhabitants have been determined worldwide and the generation of sound maps has been made compulsory in order to identify the vulnerable zones and to make recommendations how to decrease the negative impact on humans. In this context, the present study aims at presenting a GIS spatial analysis model-based methodology for identifying and mapping zones vulnerable to noise pollution. The developed GIS model is based on the analysis of all the components influencing sound propagation, represented as vector databases (points of sound intensity measurements, buildings, lands use, transport infrastructure), raster databases (DEM), and numerical databases (wind direction and speed, sound intensity). Secondly, the hourly changes (for representative hours) were analysed to identify the hotspots characterised by major traffic flows specific to rush hours. The validated results of the model are represented by GIS databases and useful maps for the local public administration to use as a source of information and in the process of making decisions.
Park, Moon Soo; Ju, Young-Su; Moon, Seong-Hwan; Kim, Tae-Hwan; Oh, Jae Keun; Makhni, Melvin C; Riew, K Daniel
2016-10-15
National population-based cohort study. To compare the reoperation rates between cervical spondylotic radiculopathy and myelopathy in a national population of patients. There is an inherently low incidence of reoperation after surgery for cervical degenerative disease. Therefore, it is difficult to sufficiently power studies to detect differences between reoperation rates of different cervical diagnoses. National population-based databases provide large, longitudinally followed cohorts that may help overcome this challenge. We used the Korean Health Insurance Review and Assessment Service national database to select our study population. We included patients with the diagnosis of cervical spondylotic radiculopathy or myelopathy who underwent anterior cervical discectomy and fusion from January 2009 to June 2014. We separated patients into two groups based on diagnosis codes: cervical spondylotic radiculopathy or cervical spondylotic myelopathy. Age, sex, presence of diabetes, osteoporosis, associated comorbidities, number of operated cervical disc levels, and hospital types were considered potential confounding factors. The overall reoperation rate was 2.45%. The reoperation rate was significantly higher in patients with cervical spondylotic myelopathy than in patients with cervical radiculopathy (myelopathy: P = 0.0293, hazard ratio = 1.433, 95% confidence interval 1.037-1.981). Male sex, presence of diabetes or associated comorbidities, and hospital type were noted to be risk factors for reoperation. The reoperation rate after anterior cervical discectomy and fusion was higher for cervical spondylotic myelopathy than for cervical spondylotic radiculopathy in a national population of patients. 3.
Yang, Xiaohuan; Huang, Yaohuan; Dong, Pinliang; Jiang, Dong; Liu, Honghui
2009-01-01
The spatial distribution of population is closely related to land use and land cover (LULC) patterns on both regional and global scales. Population can be redistributed onto geo-referenced square grids according to this relation. In the past decades, various approaches to monitoring LULC using remote sensing and Geographic Information Systems (GIS) have been developed, which makes it possible for efficient updating of geo-referenced population data. A Spatial Population Updating System (SPUS) is developed for updating the gridded population database of China based on remote sensing, GIS and spatial database technologies, with a spatial resolution of 1 km by 1 km. The SPUS can process standard Moderate Resolution Imaging Spectroradiometer (MODIS L1B) data integrated with a Pattern Decomposition Method (PDM) and an LULC-Conversion Model to obtain patterns of land use and land cover, and provide input parameters for a Population Spatialization Model (PSM). The PSM embedded in SPUS is used for generating 1 km by 1 km gridded population data in each population distribution region based on natural and socio-economic variables. Validation results from finer township-level census data of Yishui County suggest that the gridded population database produced by the SPUS is reliable.
Kab, Sofiane; Moisan, Frédéric; Preux, Pierre-Marie; Marin, Benoît; Elbaz, Alexis
2017-08-01
There are no estimates of the nationwide incidence of motor neuron disease (MND) in France. We used the French health insurance information system to identify incident MND cases (2012-2014), and compared incidence figures to those from three external sources. We identified incident MND cases (2012-2014) based on three data sources (riluzole claims, hospitalisation records, long-term chronic disease benefits), and computed MND incidence by age, gender, and geographic region. We used French mortality statistics, Limousin ALS registry data, and previous European studies based on administrative databases to perform external comparisons. We identified 6553 MND incident cases. After standardisation to the United States 2010 population, the age/gender-standardised incidence was 2.72/100,000 person-years (males, 3.37; females, 2.17; male:female ratio = 1.53, 95% CI1.46-1.61). There was no major spatial difference in MND distribution. Our data were in agreement with the French death database (standardised mortality ratio = 1.01, 95% CI = 0.96-1.06) and Limousin ALS registry (standardised incidence ratio = 0.92, 95% CI = 0.72-1.15). Incidence estimates were in the same range as those from previous studies. We report French nationwide incidence estimates of MND. Administrative databases including hospital discharge data and riluzole claims offer an interesting approach to identify large population-based samples of patients with MND for epidemiologic studies and surveillance.
Database extraction strategies for low-template evidence.
Bleka, Øyvind; Dørum, Guro; Haned, Hinda; Gill, Peter
2014-03-01
Often in forensic cases, the profile of at least one of the contributors to a DNA evidence sample is unknown and a database search is needed to discover possible perpetrators. In this article we consider two types of search strategies to extract suspects from a database using methods based on probability arguments. The performance of the proposed match scores is demonstrated by carrying out a study of each match score relative to the level of allele drop-out in the crime sample, simulating low-template DNA. The efficiency was measured by random man simulation and we compared the performance using the SGM Plus kit and the ESX 17 kit for the Norwegian population, demonstrating that the latter has greatly enhanced power to discover perpetrators of crime in large national DNA databases. The code for the database extraction strategies will be prepared for release in the R-package forensim. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859
Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick
2016-01-01
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
Clusters of genetic diseases in Brazil.
Cardoso, Gabriela Costa; de Oliveira, Marcelo Zagonel; Paixão-Côrtes, Vanessa Rodrigues; Castilla, Eduardo Enrique; Schuler-Faccini, Lavínia
2018-06-02
The aim of this paper is to present a database of isolated communities (CENISO) with high prevalence of genetic disorders or congenital anomalies in Brazil. We used two strategies to identify such communities: (1) a systematic literature review and (2) a "rumor strategy" based on anecdotal accounts. All rumors and reports were validated in a stepwise process. The bibliographical search identified 34 rumors and 245 rumors through the rumor strategy, and 144 were confirmed. A database like this one presented here represents an important tool for the planning of health priorities for rare diseases in low- and middle-income countries with large populations.
Nishio, Shin-Ya; Usami, Shin-Ichi
2017-03-01
Recent advances in next-generation sequencing (NGS) have given rise to new challenges due to the difficulties in variant pathogenicity interpretation and large dataset management, including many kinds of public population databases as well as public or commercial disease-specific databases. Here, we report a new database development tool, named the "Clinical NGS Database," for improving clinical NGS workflow through the unified management of variant information and clinical information. This database software offers a two-feature approach to variant pathogenicity classification. The first of these approaches is a phenotype similarity-based approach. This database allows the easy comparison of the detailed phenotype of each patient with the average phenotype of the same gene mutation at the variant or gene level. It is also possible to browse patients with the same gene mutation quickly. The other approach is a statistical approach to variant pathogenicity classification based on the use of the odds ratio for comparisons between the case and the control for each inheritance mode (families with apparently autosomal dominant inheritance vs. control, and families with apparently autosomal recessive inheritance vs. control). A number of case studies are also presented to illustrate the utility of this database. © 2016 The Authors. **Human Mutation published by Wiley Periodicals, Inc.
Risson, Valery; Ghodge, Bhaskar; Bonzani, Ian C; Korn, Jonathan R; Medin, Jennie; Saraykar, Tanmay; Sengupta, Souvik; Saini, Deepanshu; Olson, Melvin
2016-09-22
An enormous amount of information relevant to public health is being generated directly by online communities. To explore the feasibility of creating a dataset that links patient-reported outcomes data, from a Web-based survey of US patients with multiple sclerosis (MS) recruited on open Internet platforms, to health care utilization information from health care claims databases. The dataset was generated by linkage analysis to a broader MS population in the United States using both pharmacy and medical claims data sources. US Facebook users with an interest in MS were alerted to a patient-reported survey by targeted advertisements. Eligibility criteria were diagnosis of MS by a specialist (primary progressive, relapsing-remitting, or secondary progressive), ≥12-month history of disease, age 18-65 years, and commercial health insurance. Participants completed a questionnaire including data on demographic and disease characteristics, current and earlier therapies, relapses, disability, health-related quality of life, and employment status and productivity. A unique anonymous profile was generated for each survey respondent. Each anonymous profile was linked to a number of medical and pharmacy claims datasets in the United States. Linkage rates were assessed and survey respondents' representativeness was evaluated based on differences in the distribution of characteristics between the linked survey population and the general MS population in the claims databases. The advertisement was placed on 1,063,973 Facebook users' pages generating 68,674 clicks, 3719 survey attempts, and 651 successfully completed surveys, of which 440 could be linked to any of the claims databases for 2014 or 2015 (67.6% linkage rate). Overall, no significant differences were found between patients who were linked and not linked for educational status, ethnicity, current or prior disease-modifying therapy (DMT) treatment, or presence of a relapse in the last 12 months. The frequencies of the most common MS symptoms did not differ significantly between linked patients and the general MS population in the databases. Linked patients were slightly younger and less likely to be men than those who were not linkable. Linking patient-reported outcomes data, from a Web-based survey of US patients with MS recruited on open Internet platforms, to health care utilization information from claims databases may enable rapid generation of a large population of representative patients with MS suitable for outcomes analysis.
Differences in Antipsychotic-Related Adverse Events in Adult, Pediatric, and Geriatric Populations.
Sagreiya, Hersh; Chen, Yi-Ren; Kumarasamy, Narmadan A; Ponnusamy, Karthik; Chen, Doris; Das, Amar K
2017-02-26
In recent years, antipsychotic medications have increasingly been used in pediatric and geriatric populations, despite the fact that many of these drugs were approved based on clinical trials in adult patients only. Preliminary studies have shown that the "off-label" use of these drugs in pediatric and geriatric populations may result in adverse events not found in adults. In this study, we utilized the large-scale U.S. Food and Drug Administration (FDA) Adverse Events Reporting System (AERS) database to look at differences in adverse events from antipsychotics among adult, pediatric, and geriatric populations. We performed a systematic analysis of the FDA AERS database using MySQL by standardizing the database using structured terminologies and ontologies. We compared adverse event profiles of atypical versus typical antipsychotic medications among adult (18-65), pediatric (age < 18), and geriatric (> 65) populations. We found statistically significant differences between the number of adverse events in the pediatric versus adult populations with aripiprazole, clozapine, fluphenazine, haloperidol, olanzapine, quetiapine, risperidone, and thiothixene, and between the geriatric versus adult populations with aripiprazole, chlorpromazine, clozapine, fluphenazine, haloperidol, paliperidone, promazine, risperidone, thiothixene, and ziprasidone (p < 0.05, with adjustment for multiple comparisons). Furthermore, the particular types of adverse events reported also varied significantly between each population for aripiprazole, clozapine, haloperidol, olanzapine, quetiapine, risperidone, and ziprasidone (Chi-square, p < 10 -6 ). Diabetes was the most commonly reported side effect in the adult population, compared to behavioral problems in the pediatric population and neurologic symptoms in the geriatric population. We also found discrepancies between the frequencies of reports in AERS and in the literature. Our analysis of the FDA AERS database shows that there are significant differences in both the numbers and types of adverse events among these age groups and between atypical and typical antipsychotics. It is important for clinicians to be mindful of these differences when prescribing antipsychotics, especially when prescribing medications off-label.
GPU-Based Point Cloud Superpositioning for Structural Comparisons of Protein Binding Sites.
Leinweber, Matthias; Fober, Thomas; Freisleben, Bernd
2018-01-01
In this paper, we present a novel approach to solve the labeled point cloud superpositioning problem for performing structural comparisons of protein binding sites. The solution is based on a parallel evolution strategy that operates on large populations and runs on GPU hardware. The proposed evolution strategy reduces the likelihood of getting stuck in a local optimum of the multimodal real-valued optimization problem represented by labeled point cloud superpositioning. The performance of the GPU-based parallel evolution strategy is compared to a previously proposed CPU-based sequential approach for labeled point cloud superpositioning, indicating that the GPU-based parallel evolution strategy leads to qualitatively better results and significantly shorter runtimes, with speed improvements of up to a factor of 1,500 for large populations. Binary classification tests based on the ATP, NADH, and FAD protein subsets of CavBase, a database containing putative binding sites, show average classification rate improvements from about 92 percent (CPU) to 96 percent (GPU). Further experiments indicate that the proposed GPU-based labeled point cloud superpositioning approach can be superior to traditional protein comparison approaches based on sequence alignments.
Bustillo, Jorge L; Diaz, Jose D; Pacheco, Idarmes C; Gritz, David C
2015-03-01
Serological studies indicate that rates of ocular toxoplasmosis (OT) vary geographically, with higher rates in tropical regions. Little is known about population-based rates of active OT. We aimed to describe the epidemiology of OT in Central Cuba. This large-population, cross-sectional cohort study used a prospective database at a large regional referral centre in Central Cuba. The patient database was searched for all patients who presented with OT during the 12-month study period from 1 April 2011 to 31 March 2012. Inclusion criteria were the clinical diagnosis of OT, characterised by focal retinochoroidal inflammation and a response to therapy as expected. Gender-stratified and age-stratified study population data from the 2012 Cuban Census were used to calculate incidence rates and prevalence ratios. Among 279 identified patients with OT, 158 presented with active OT. Of these, 122 new-onset and 36 prior-onset cases were confirmed. Based on the total population in the Sancti Spiritus province (466,106 persons), the overall incidence of active OT was 26.2 per 100,000 person-years (95% CI 21.7 to 31.3) with an annual prevalence ratio of 33.9 per 100,000 persons (95% CI 28.8 to 39.6). The incidence of active OT was lowest in the oldest age group and highest in patients aged 25-44 years (4.5 and 42.1 per 100,000 person-years, respectively). This first report describing population-based rates of OT in the Cuban population highlights the importance of patient age as a likely risk factor for OT. Disease rates were found to be highest in females and young to middle-aged adults. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Surgical treatment of malrotation after infancy: a population-based study.
Malek, Marcus M; Burd, Randall S
2005-01-01
Because malrotation most commonly presents in infants, treatment recommendations for older children (>1 year) have been based on data obtained from small case series. The purpose of this study was to use a large national database to determine the clinical significance of older children presenting with malrotation to develop treatment recommendations for this group. Records of children undergoing a Ladd's procedure were identified in the Kids' Inpatient Database, an administrative database that contains all pediatric discharges from 27 states during 2000. Patient characteristics, associated diagnoses, operations performed, and mortality were evaluated. Discharge weighting was used to obtain a national estimate of the number of children older than 1 year treated for malrotation. Two hundred nineteen older children (>1 and <18 years) undergoing a Ladd's procedure were identified in the database. One hundred sixty-four (75%) of these patients were admitted for treatment of malrotation, whereas most of the remaining 55 patients (25%) were admitted for another diagnosis and underwent a Ladd's procedure incidental to another abdominal operation. Seventy-five patients underwent a Ladd's procedure during an emergency admission. Thirty-one patients had volvulus or intestinal ischemia, 7 underwent intestinal resection, and 1 patient died. Based on case weightings, it was estimated that 362 older children underwent a Ladd's procedure for symptoms related to malrotation in 2000 in the United States (5.3 cases per million population). These findings provide support for performing a Ladd's procedure in older children with incidentally found malrotation to prevent the rare but potentially devastating complications of this anomaly.
Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J
2004-10-01
Large forensic mtDNA databases which adhere to strict guidelines for generation and maintenance, are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African, or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
Osteoporosis therapies: evidence from health-care databases and observational population studies.
Silverman, Stuart L
2010-11-01
Osteoporosis is a well-recognized disease with severe consequences if left untreated. Randomized controlled trials are the most rigorous method for determining the efficacy and safety of therapies. Nevertheless, randomized controlled trials underrepresent the real-world patient population and are costly in both time and money. Modern technology has enabled researchers to use information gathered from large health-care or medical-claims databases to assess the practical utilization of available therapies in appropriate patients. Observational database studies lack randomization but, if carefully designed and successfully completed, can provide valuable information that complements results obtained from randomized controlled trials and extends our knowledge to real-world clinical patients. Randomized controlled trials comparing fracture outcomes among osteoporosis therapies are difficult to perform. In this regard, large observational database studies could be useful in identifying clinically important differences among therapeutic options. Database studies can also provide important information with regard to osteoporosis prevalence, health economics, and compliance and persistence with treatment. This article describes the strengths and limitations of both randomized controlled trials and observational database studies, discusses considerations for observational study design, and reviews a wealth of information generated by database studies in the field of osteoporosis.
Comorbidity of gout and rheumatoid arthritis in a large population database.
Merdler-Rabinowicz, Rona; Tiosano, Shmuel; Comaneshter, Doron; Cohen, Arnon D; Amital, Howard
2017-03-01
Coexistence of rheumatoid arthritis and gout is considered to be unusual. The current study was designed as a population-based cross-sectional study, utilizing the medical database of Clalit Health Services, the largest healthcare provider organization in Israel. Data of adult patients who were previously diagnosed with rheumatoid arthritis was retrieved. For each patient, five age- and sex-matched control patients were randomly selected. Different parameters including BMI, socioeconomic status, and existence of gout as well as smoking and hypertension were examined for both groups. The study included 11,540 patients with rheumatoid arthritis and 56,763 controls. The proportion of gout in the study group was high compared to controls (1.61 vs. 0.92%, P < 0.001). In a multivariate analysis, rheumatoid arthritis was associated with gout (OR = 1.72, 95% CI 1.45-2.05, P = 0.00). The proportion of gout in rheumatoid arthritis patients is not lower than in the general population.
Song, Sun Ok; Jung, Chang Hee; Song, Young Duk; Park, Cheol-Young; Kwon, Hyuk-Sang; Cha, Bong Soo; Park, Joong-Yeol; Lee, Ki-Up
2014-01-01
Background The National Health Insurance Service (NHIS) recently signed an agreement to provide limited open access to the databases within the Korean Diabetes Association for the benefit of Korean subjects with diabetes. Here, we present the history, structure, contents, and way to use data procurement in the Korean National Health Insurance (NHI) system for the benefit of Korean researchers. Methods The NHIS in Korea is a single-payer program and is mandatory for all residents in Korea. The three main healthcare programs of the NHI, Medical Aid, and long-term care insurance (LTCI) provide 100% coverage for the Korean population. The NHIS in Korea has adopted a fee-for-service system to pay health providers. Researchers can obtain health information from the four databases of the insured that contain data on health insurance claims, health check-ups and LTCI. Results Metabolic disease as chronic disease is increasing with aging society. NHIS data is based on mandatory, serial population data, so, this might show the time course of disease and predict some disease progress, and also be used in primary and secondary prevention of disease after data mining. Conclusion The NHIS database represents the entire Korean population and can be used as a population-based database. The integrated information technology of the NHIS database makes it a world-leading population-based epidemiology and disease research platform. PMID:25349827
Song, Sun Ok; Jung, Chang Hee; Song, Young Duk; Park, Cheol-Young; Kwon, Hyuk-Sang; Cha, Bong Soo; Park, Joong-Yeol; Lee, Ki-Up; Ko, Kyung Soo; Lee, Byung-Wan
2014-10-01
The National Health Insurance Service (NHIS) recently signed an agreement to provide limited open access to the databases within the Korean Diabetes Association for the benefit of Korean subjects with diabetes. Here, we present the history, structure, contents, and way to use data procurement in the Korean National Health Insurance (NHI) system for the benefit of Korean researchers. The NHIS in Korea is a single-payer program and is mandatory for all residents in Korea. The three main healthcare programs of the NHI, Medical Aid, and long-term care insurance (LTCI) provide 100% coverage for the Korean population. The NHIS in Korea has adopted a fee-for-service system to pay health providers. Researchers can obtain health information from the four databases of the insured that contain data on health insurance claims, health check-ups and LTCI. Metabolic disease as chronic disease is increasing with aging society. NHIS data is based on mandatory, serial population data, so, this might show the time course of disease and predict some disease progress, and also be used in primary and secondary prevention of disease after data mining. The NHIS database represents the entire Korean population and can be used as a population-based database. The integrated information technology of the NHIS database makes it a world-leading population-based epidemiology and disease research platform.
Automatic initialization and quality control of large-scale cardiac MRI segmentations.
Albà, Xènia; Lekadir, Karim; Pereañez, Marco; Medrano-Gracia, Pau; Young, Alistair A; Frangi, Alejandro F
2018-01-01
Continuous advances in imaging technologies enable ever more comprehensive phenotyping of human anatomy and physiology. Concomitant reduction of imaging costs has resulted in widespread use of imaging in large clinical trials and population imaging studies. Magnetic Resonance Imaging (MRI), in particular, offers one-stop-shop multidimensional biomarkers of cardiovascular physiology and pathology. A wide range of analysis methods offer sophisticated cardiac image assessment and quantification for clinical and research studies. However, most methods have only been evaluated on relatively small databases often not accessible for open and fair benchmarking. Consequently, published performance indices are not directly comparable across studies and their translation and scalability to large clinical trials or population imaging cohorts is uncertain. Most existing techniques still rely on considerable manual intervention for the initialization and quality control of the segmentation process, becoming prohibitive when dealing with thousands of images. The contributions of this paper are three-fold. First, we propose a fully automatic method for initializing cardiac MRI segmentation, by using image features and random forests regression to predict an initial position of the heart and key anatomical landmarks in an MRI volume. In processing a full imaging database, the technique predicts the optimal corrective displacements and positions in relation to the initial rough intersections of the long and short axis images. Second, we introduce for the first time a quality control measure capable of identifying incorrect cardiac segmentations with no visual assessment. The method uses statistical, pattern and fractal descriptors in a random forest classifier to detect failures to be corrected or removed from subsequent statistical analysis. Finally, we validate these new techniques within a full pipeline for cardiac segmentation applicable to large-scale cardiac MRI databases. The results obtained based on over 1200 cases from the Cardiac Atlas Project show the promise of fully automatic initialization and quality control for population studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee
2017-07-01
Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep-learning in healthcare have just gained attention recently. Deep learning, such as deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve a better performance comparing to other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The result shows that DNN and gradient boosting decision tree (GBDT) can result in similarly high prediction accuracies that are better compared to logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results by using lesser amounts of patient data when comparing to GBDT method.
A knowledge base for tracking the impact of genomics on population health.
Yu, Wei; Gwinn, Marta; Dotson, W David; Green, Ridgely Fisk; Clyne, Mindy; Wulf, Anja; Bowen, Scott; Kolor, Katherine; Khoury, Muin J
2016-12-01
We created an online knowledge base (the Public Health Genomics Knowledge Base (PHGKB)) to provide systematically curated and updated information that bridges population-based research on genomics with clinical and public health applications. Weekly horizon scanning of a wide variety of online resources is used to retrieve relevant scientific publications, guidelines, and commentaries. After curation by domain experts, links are deposited into Web-based databases. PHGKB currently consists of nine component databases. Users can search the entire knowledge base or search one or more component databases directly and choose options for customizing the display of their search results. PHGKB offers researchers, policy makers, practitioners, and the general public a way to find information they need to understand the complicated landscape of genomics and population health.Genet Med 18 12, 1312-1314.
Global Distribution of Outbreaks of Water-Associated Infectious Diseases
Yang, Kun; LeJeune, Jeffrey; Alsdorf, Doug; Lu, Bo; Shum, C. K.; Liang, Song
2012-01-01
Background Water plays an important role in the transmission of many infectious diseases, which pose a great burden on global public health. However, the global distribution of these water-associated infectious diseases and underlying factors remain largely unexplored. Methods and Findings Based on the Global Infectious Disease and Epidemiology Network (GIDEON), a global database including water-associated pathogens and diseases was developed. In this study, reported outbreak events associated with corresponding water-associated infectious diseases from 1991 to 2008 were extracted from the database. The location of each reported outbreak event was identified and geocoded into a GIS database. Also collected in the GIS database included geo-referenced socio-environmental information including population density (2000), annual accumulated temperature, surface water area, and average annual precipitation. Poisson models with Bayesian inference were developed to explore the association between these socio-environmental factors and distribution of the reported outbreak events. Based on model predictions a global relative risk map was generated. A total of 1,428 reported outbreak events were retrieved from the database. The analysis suggested that outbreaks of water-associated diseases are significantly correlated with socio-environmental factors. Population density is a significant risk factor for all categories of reported outbreaks of water-associated diseases; water-related diseases (e.g., vector-borne diseases) are associated with accumulated temperature; water-washed diseases (e.g., conjunctivitis) are inversely related to surface water area; both water-borne and water-related diseases are inversely related to average annual rainfall. Based on the model predictions, “hotspots” of risks for all categories of water-associated diseases were explored. Conclusions At the global scale, water-associated infectious diseases are significantly correlated with socio-environmental factors, impacting all regions which are affected disproportionately by different categories of water-associated infectious diseases. PMID:22348158
Design and implementation of a distributed large-scale spatial database system based on J2EE
NASA Astrophysics Data System (ADS)
Gong, Jianya; Chen, Nengcheng; Zhu, Xinyan; Zhang, Xia
2003-03-01
With the increasing maturity of distributed object technology, CORBA, .NET and EJB are universally used in traditional IT field. However, theories and practices of distributed spatial database need farther improvement in virtue of contradictions between large scale spatial data and limited network bandwidth or between transitory session and long transaction processing. Differences and trends among of CORBA, .NET and EJB are discussed in details, afterwards the concept, architecture and characteristic of distributed large-scale seamless spatial database system based on J2EE is provided, which contains GIS client application, web server, GIS application server and spatial data server. Moreover the design and implementation of components of GIS client application based on JavaBeans, the GIS engine based on servlet, the GIS Application server based on GIS enterprise JavaBeans(contains session bean and entity bean) are explained.Besides, the experiments of relation of spatial data and response time under different conditions are conducted, which proves that distributed spatial database system based on J2EE can be used to manage, distribute and share large scale spatial data on Internet. Lastly, a distributed large-scale seamless image database based on Internet is presented.
Data harmonization and federated analysis of population-based studies: the BioSHaRE project
2013-01-01
Abstracts Background Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. Methods Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on server in each research centres across Europe were interconnected through a federated database system to perform statistical analysis. Results Retrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. Conclusion New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein. PMID:24257327
Washington, Donna L; Sun, Su; Canning, Mark
2010-01-01
Most veteran research is conducted in Department of Veterans Affairs (VA) healthcare settings, although most veterans obtain healthcare outside the VA. Our objective was to determine the adequacy and relative contributions of Veterans Health Administration (VHA), Veterans Benefits Administration (VBA), and Department of Defense (DOD) administrative databases for representing the U.S. veteran population, using as an example the creation of a sampling frame for the National Survey of Women Veterans. In 2008, we merged the VHA, VBA, and DOD databases. We identified the number of unique records both overall and from each database. The combined databases yielded 925,946 unique records, representing 51% of the 1,802,000 U.S. women veteran population. The DOD database included 30% of the population (with 8% overlap with other databases). The VHA enrollment database contributed an additional 20% unique women veterans (with 6% overlap with VBA databases). VBA databases contributed an additional 2% unique women veterans (beyond 10% overlap with other databases). Use of VBA and DOD databases substantially expands access to the population of veterans beyond those in VHA databases, regardless of VA use. Adoption of these additional databases would enhance the value and generalizability of a wide range of studies of both male and female veterans.
Administrative Databases in Orthopaedic Research: Pearls and Pitfalls of Big Data.
Patel, Alpesh A; Singh, Kern; Nunley, Ryan M; Minhas, Shobhit V
2016-03-01
The drive for evidence-based decision-making has highlighted the shortcomings of traditional orthopaedic literature. Although high-quality, prospective, randomized studies in surgery are the benchmark in orthopaedic literature, they are often limited by size, scope, cost, time, and ethical concerns and may not be generalizable to larger populations. Given these restrictions, there is a growing trend toward the use of large administrative databases to investigate orthopaedic outcomes. These datasets afford the opportunity to identify a large numbers of patients across a broad spectrum of comorbidities, providing information regarding disparities in care and outcomes, preoperative risk stratification parameters for perioperative morbidity and mortality, and national epidemiologic rates and trends. Although there is power in these databases in terms of their impact, potential problems include administrative data that are at risk of clerical inaccuracies, recording bias secondary to financial incentives, temporal changes in billing codes, a lack of numerous clinically relevant variables and orthopaedic-specific outcomes, and the absolute requirement of an experienced epidemiologist and/or statistician when evaluating results and controlling for confounders. Despite these drawbacks, administrative database studies are fundamental and powerful tools in assessing outcomes on a national scale and will likely be of substantial assistance in the future of orthopaedic research.
Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha
2011-01-01
Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was the D2S1338 with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10-17. This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications. PMID:21597912
Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha
2011-07-01
Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was the D2S1338 with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10(-17). This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications.
Burton, Tanya; Le Nestour, Elisabeth; Neary, Maureen; Ludlam, William H
2016-04-01
This study aimed to develop an algorithm to identify patients with CD, and quantify the clinical and economic burden that patients with CD face compared to CD-free controls. A retrospective cohort study of CD patients was conducted in a large US commercial health plan database between 1/1/2007 and 12/31/2011. A control group with no evidence of CD during the same time was matched 1:3 based on demographics. Comorbidity rates were compared using Poisson and health care costs were compared using robust variance estimation. A case-finding algorithm identified 877 CD patients, who were matched to 2631 CD-free controls. The age and sex distribution of the selected population matched the known epidemiology of CD. CD patients were found to have comorbidity rates that were two to five times higher and health care costs that were four to seven times higher than CD-free controls. An algorithm based on eight pituitary conditions and procedures appeared to identify CD patients in a claims database without a unique diagnosis code. Young CD patients had high rates of comorbidities that are more commonly observed in an older population (e.g., diabetes, hypertension, and cardiovascular disease). Observed health care costs were also high for CD patients compared to CD-free controls, but may have been even higher if the sample had included healthier controls with no health care use as well. Earlier diagnosis, improved surgery success rates, and better treatments may all help to reduce the chronic comorbidity and high health care costs associated with CD.
Construction and validation of a population-based bone densitometry database.
Leslie, William D; Caetano, Patricia A; Macwilliam, Leonard R; Finlayson, Gregory S
2005-01-01
Utilization of dual-energy X-ray absorptiometry (DXA) for the initial diagnostic assessment of osteoporosis and in monitoring treatment has risen dramatically in recent years. Population-based studies of the impact of DXA and osteoporosis remain challenging because of incomplete and fragmented test data that exist in most regions. Our aim was to create and assess completeness of a database of all clinical DXA services and test results for the province of Manitoba, Canada and to present descriptive data resulting from testing. A regionally based bone density program for the province of Manitoba, Canada was established in 1997. Subsequent DXA services were prospectively captured in a program database. This database was retrospectively populated with earlier DXA results dating back to 1990 (the year that the first DXA scanner was installed) by integrating multiple data sources. A random chart audit was performed to assess completeness and accuracy of this dataset. For comparison, testing rates determined from the DXA database were compared with physician administrative claims data. There was a high level of completeness of this database (>99%) and accurate personal identifier information sufficient for linkage with other health care administrative data (>99%). This contrasted with physician billing data that were found to be markedly incomplete. Descriptive data provide a profile of individuals receiving DXA and their test results. In conclusion, the Manitoba bone density database has great potential as a resource for clinical and health policy research because it is population based with a high level of completeness and accuracy.
Bodner, Martin; Bastisch, Ingo; Butler, John M; Fimmers, Rolf; Gill, Peter; Gusmão, Leonor; Morling, Niels; Phillips, Christopher; Prinz, Mechthild; Schneider, Peter M; Parson, Walther
2016-09-01
The statistical evaluation of autosomal Short Tandem Repeat (STR) genotypes is based on allele frequencies. These are empirically determined from sets of randomly selected human samples, compiled into STR databases that have been established in the course of population genetic studies. There is currently no agreed procedure of performing quality control of STR allele frequency databases, and the reliability and accuracy of the data are largely based on the responsibility of the individual contributing research groups. It has been demonstrated with databases of haploid markers (EMPOP for mitochondrial mtDNA, and YHRD for Y-chromosomal loci) that centralized quality control and data curation is essential to minimize error. The concepts employed for quality control involve software-aided likelihood-of-genotype, phylogenetic, and population genetic checks that allow the researchers to compare novel data to established datasets and, thus, maintain the high quality required in forensic genetics. Here, we present STRidER (http://strider.online), a publicly available, centrally curated online allele frequency database and quality control platform for autosomal STRs. STRidER expands on the previously established ENFSI DNA WG STRbASE and applies standard concepts established for haploid and autosomal markers as well as novel tools to reduce error and increase the quality of autosomal STR data. The platform constitutes a significant improvement and innovation for the scientific community, offering autosomal STR data quality control and reliable STR genotype estimates. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Mirsky, Matthew M; Marrie, Ruth Ann; Rae-Grant, Alexander
2016-01-01
Background: The Explorys Enterprise Performance Management (EPM) database contains de-identified clinical data for 50 million patients. Multiple sclerosis (MS) disease-modifying therapies (DMTs), specifically interferon beta (IFNβ) treatments, may potentiate depression. Conflicting data have emerged, and a large-scale claims-based study by Patten et al. did not support such an association. This study compares the results of Patten et al. with those using the EPM database. Methods: "Power searches" were built to test the relationship between antidepressant drug use and DMT in the MS population. Searches were built to produce a cohort of individuals diagnosed as having MS in the past 3 years taking a specific DMT who were then given any antidepressant drug. The antidepressant drug therapy prevalence was tested in the MS population on the following DMTs: IFNβ-1a, IFNβ-1b, combined IFNβ, glatiramer acetate, natalizumab, fingolimod, and dimethyl fumarate. Results: In patients with MS, the rate of antidepressant drug use in those receiving DMTs was 40.60% to 44.57%. The rate of antidepressant drug use for combined IFNβ DMTs was 41.61% (males: 31.25%-39.62%; females: 43.10%-47.33%). Antidepressant drug use peaked in the group aged 45 to 54 years for five of six DMTs. Conclusions: We found no association between IFNβ treatment and antidepressant drug use in the MS population compared with other DMTs. The EPM database has been validated against the Patten et al. data for future use in the MS population.
A general temporal data model and the structured population event history register
Clark, Samuel J.
2010-01-01
At this time there are 37 demographic surveillance system sites active in sub-Saharan Africa, Asia and Central America, and this number is growing continuously. These sites and other longitudinal population and health research projects generate large quantities of complex temporal data in order to describe, explain and investigate the event histories of individuals and the populations they constitute. This article presents possible solutions to some of the key data management challenges associated with those data. The fundamental components of a temporal system are identified and both they and their relationships to each other are given simple, standardized definitions. Further, a metadata framework is proposed to endow this abstract generalization with specific meaning and to bind the definitions of the data to the data themselves. The result is a temporal data model that is generalized, conceptually tractable, and inherently contains a full description of the primary data it organizes. Individual databases utilizing this temporal data model can be customized to suit the needs of their operators without modifying the underlying design of the database or sacrificing the potential to transparently share compatible subsets of their data with other similar databases. A practical working relational database design based on this general temporal data model is presented and demonstrated. This work has arisen out of experience with demographic surveillance in the developing world, and although the challenges and their solutions are more general, the discussion is organized around applications in demographic surveillance. An appendix contains detailed examples and working prototype databases that implement the examples discussed in the text. PMID:20396614
Speisky, Hernan; López-Alarcón, Camilo; Gómez, Maritza; Fuentes, Jocelyn; Sandoval-Acuña, Cristian
2012-09-12
This paper reports the first database on antioxidants contained in fruits produced and consumed within the south Andes region of South America. The database ( www.portalantioxidantes.com ) contains over 500 total phenolics (TP) and ORAC values for more than 120 species/varieties of fruits. All analyses were conducted by a single ISO/IEC 17025-certified laboratory. The characterization comprised native berries such as maqui ( Aristotelia chilensis ), murtilla ( Ugni molinae ), and calafate ( Barberis microphylla ), which largely outscored all other studied fruits. Major differences in TP and ORAC were observed as a function of the fruit variety in berries, avocado, cherries, and apples. In fruits such as pears, apples, apricots, and peaches, a significant part of the TP and ORAC was accounted for by the antioxidants present in the peel. These data should be useful to estimate the fruit-based intake of TP and, through the ORAC data, their antioxidant-related contribution to the diet of south Andes populations.
Ahmetovic, Dragan; Manduchi, Roberto; Coughlan, James M.; Mascetti, Sergio
2016-01-01
In this paper we propose a computer vision-based technique that mines existing spatial image databases for discovery of zebra crosswalks in urban settings. Knowing the location of crosswalks is critical for a blind person planning a trip that includes street crossing. By augmenting existing spatial databases (such as Google Maps or OpenStreetMap) with this information, a blind traveler may make more informed routing decisions, resulting in greater safety during independent travel. Our algorithm first searches for zebra crosswalks in satellite images; all candidates thus found are validated against spatially registered Google Street View images. This cascaded approach enables fast and reliable discovery and localization of zebra crosswalks in large image datasets. While fully automatic, our algorithm could also be complemented by a final crowdsourcing validation stage for increased accuracy. PMID:26824080
Ng, Kevin Kit Siong; Lee, Soon Leong; Tnah, Lee Hong; Nurul-Farhanah, Zakaria; Ng, Chin Hong; Lee, Chai Ting; Tani, Naoki; Diway, Bibian; Lai, Pei Sing; Khoo, Eyen
2016-07-01
Illegal logging and smuggling of Gonystylus bancanus (Thymelaeaceae) poses a serious threat to this fragile valuable peat swamp timber species. Using G. bancanus as a case study, DNA markers were used to develop identification databases at the species, population and individual level. The species level database for Gonystylus comprised of an rDNA (ITS2) and two cpDNA (trnH-psbA and trnL) markers based on a 20 Gonystylus species database. When concatenated, taxonomic species recognition was achieved with a resolution of 90% (18 out of the 20 species). In addition, based on 17 natural populations of G. bancanus throughout West (Peninsular Malaysia) and East (Sabah and Sarawak) Malaysia, population and individual identification databases were developed using cpDNA and STR markers respectively. A haplotype distribution map for Malaysia was generated using six cpDNA markers, resulting in 12 unique multilocus haplotypes, from 24 informative intraspecific variable sites. These unique haplotypes suggest a clear genetic structuring of West and East regions. A simulation procedure based on the composition of the samples was used to test whether a suspected sample conformed to a given regional origin. Overall, the observed type I and II errors of the databases showed good concordance with the predicted 5% threshold which indicates that the databases were useful in revealing provenance and establishing conformity of samples from West and East Malaysia. Sixteen STRs were used to develop the DNA profiling databases for individual identification. Bayesian clustering analyses divided the 17 populations into two main genetic clusters, corresponding to the regions of West and East Malaysia. Population substructuring (K=2) was observed within each region. After removal of bias resulting from sampling effects and population subdivision, conservativeness tests showed that the West and East Malaysia databases were conservative. This suggests that both databases can be used independently for random match probability estimation within respective regions. The reliability of the databases was further determined by independent self-assignment tests based on the likelihood of each individual's multilocus genotype occurring in each identified population, genetic cluster and region with an average percentage of correctly assigned individuals of 54.80%, 99.60% and 100% respectively. Thus, after appropriate validation, the genetic identification databases developed for G. bancanus in this study could support forensic applications and help safeguard this valuable species into the future. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Scandinavian epidemiological research in gastroenterology and hepatology.
Björnsson, Einar S; Ekbom, Anders
2015-06-01
In the last decades, a large number of epidemiological studies in gastroenterology and hepatology have originated from the Scandinavian countries. With the help of large health databases, with good validity and other registries related to patient outcomes, researchers from the Scandinavian countries have been able to make some very important contributions to the field. These countries, Sweden, Norway, Finland, Denmark and Iceland, have all universal access to health care and have shown to be ideal for epidemiological research. Population-based studies have been frequent and follow-up studies have been able to describe the temporal trends and changes in phenotypes. Our ability in Scandinavia to follow up defined groups of patients over time has been crucial to learn the natural history of many gastrointestinal and liver diseases and often in a population-based setting. Patient-related outcomes measures will probably gain increasing importance in the future, but Scandinavian gastroenterologists and surgeons are likely to have a better infrastructure for such endeavors compared to most other populations. Thus, there is a bright future for international competitive research within the field of gastrointestinal and liver diseases in Scandinavia.
QBIC project: querying images by content, using color, texture, and shape
NASA Astrophysics Data System (ADS)
Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel
1993-04-01
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical (`Give me other images that contain a tumor with a texture like this one'), photo-journalism (`Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user drawn image, the user interfaces, query refinement and navigation, high dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for color texture, shape and sketch query that we use, show example query results, and discuss future directions.
Su, Xiaoquan; Xu, Jian; Ning, Kang
2012-10-01
It has long been intriguing scientists to effectively compare different microbial communities (also referred as 'metagenomic samples' here) in a large scale: given a set of unknown samples, find similar metagenomic samples from a large repository and examine how similar these samples are. With the current metagenomic samples accumulated, it is possible to build a database of metagenomic samples of interests. Any metagenomic samples could then be searched against this database to find the most similar metagenomic sample(s). However, on one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; and on the other hand, methods to measure the similarity of metagenomic data work well only for small set of samples by pairwise comparison. It is not yet clear, how to efficiently search for metagenomic samples against a large metagenomic database. In this study, we have proposed a novel method, Meta-Storms, that could systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database by a fast scoring function based on quantitative phylogeny and (iv) managing database by index export, index import, data insertion, data deletion and database merging. We have collected more than 1300 metagenomic data from the public domain and in-house facilities, and tested the Meta-Storms method on these datasets. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and it could achieve similar accuracies compared with the current popular significance testing-based methods. Meta-Storms method would serve as a suitable database management and search system to quickly identify similar metagenomic samples from a large pool of samples. ningkang@qibebt.ac.cn Supplementary data are available at Bioinformatics online.
Said, Joseph I; Knapka, Joseph A; Song, Mingzhou; Zhang, Jinfa
2015-08-01
A specialized database currently containing more than 2200 QTL is established, which allows graphic presentation, visualization and submission of QTL. In cotton quantitative trait loci (QTL), studies are focused on intraspecific Gossypium hirsutum and interspecific G. hirsutum × G. barbadense populations. These two populations are commercially important for the textile industry and are evaluated for fiber quality, yield, seed quality, resistance, physiological, and morphological trait QTL. With meta-analysis data based on the vast amount of QTL studies in cotton it will be beneficial to organize the data into a functional database for the cotton community. Here we provide a tool for cotton researchers to visualize previously identified QTL and submit their own QTL to the Cotton QTLdb database. The database provides the user with the option of selecting various QTL trait types from either the G. hirsutum or G. hirsutum × G. barbadense populations. Based on the user's QTL trait selection, graphical representations of chromosomes of the population selected are displayed in publication ready images. The database also provides users with trait information on QTL, LOD scores, and explained phenotypic variances for all QTL selected. The CottonQTLdb database provides cotton geneticist and breeders with statistical data on cotton QTL previously identified and provides a visualization tool to view QTL positions on chromosomes. Currently the database (Release 1) contains 2274 QTLs, and succeeding QTL studies will be updated regularly by the curators and members of the cotton community that contribute their data to keep the database current. The database is accessible from http://www.cottonqtldb.org.
Elliott, Irmina A; Epelboym, Irene; Winner, Megan; Allendorf, John D; Haigh, Philip I
2017-01-01
Endocrine and exocrine insufficiency after partial pancreatectomy affect quality of life, cardiovascular health, and nutritional status. However, their incidence and predictors are unknown. To identify the incidence and predictors of new-onset diabetes and exocrine insufficiency after partial pancreatectomy. We retrospectively reviewed 1165 cases of partial pancreatectomy, performed from 1998 to 2010, from a large population-based database. Incidence of new onset diabetes and exocrine insufficiency RESULTS: Of 1165 patients undergoing partial pancreatectomy, 41.8% had preexisting diabetes. In the remaining 678 patients, at a median 3.6 months, diabetes developed in 274 (40.4%) and pancreatic insufficiency developed in 235 (34.7%) patients. Independent predictors of new-onset diabetes were higher Charlson Comorbidity Index (CCI; hazard ratio [HR] = 1.62 for CCI of 1, p = 0.02; HR = 1.95 for CCI ≥ 2, p < 0.01) and pancreatitis (HR = 1.51, p = 0.03). There was no difference in diabetes after Whipple procedure vs distal pancreatic resections, or malignant vs benign pathologic findings. Independent predictors of exocrine insufficiency were female sex (HR = 1.32, p = 0.002) and higher CCI (HR = 1.85 for CCI of 1, p < 0.01; HR = 2.05 for CCI ≥ 2, p < 0.01). Distal resection and Asian race predicted decreased exocrine insufficiency (HR = 0.35, p < 0.01; HR = 0.54, p < 0.01, respectively). In a large population-based database, the rates of postpancreatectomy endocrine and exocrine insufficiency were 40% and 35%, respectively. These data are critical for informing patients' and physicians' expectations.
The value of trauma registries.
Moore, Lynne; Clark, David E
2008-06-01
Trauma registries are databases that document acute care delivered to patients hospitalised with injuries. They are designed to provide information that can be used to improve the efficiency and quality of trauma care. Indeed, the combination of trauma registry data at regional or national levels can produce very large databases that allow unprecedented opportunities for the evaluation of patient outcomes and inter-hospital comparisons. However, the creation and upkeep of trauma registries requires a substantial investment of money, time and effort, data quality is an important challenge and aggregated trauma data sets rarely represent a population-based sample of trauma. In addition, trauma hospitalisations are already routinely documented in administrative hospital discharge databases. The present review aims to provide evidence that trauma registry data can be used to improve the care dispensed to victims of injury in ways that could not be achieved with information from administrative databases alone. In addition, we will define the structure and purpose of contemporary trauma registries, acknowledge their limitations, and discuss possible ways to make them more useful.
Diway, Bibian; Khoo, Eyen
2017-01-01
The development of timber tracking methods based on genetic markers can provide scientific evidence to verify the origin of timber products and fulfill the growing requirement for sustainable forestry practices. In this study, the origin of an important Dark Red Meranti wood, Shorea platyclados, was studied by using the combination of seven chloroplast DNA and 15 short tandem repeats (STRs) markers. A total of 27 natural populations of S. platyclados were sampled throughout Malaysia to establish population level and individual level identification databases. A haplotype map was generated from chloroplast DNA sequencing for population identification, resulting in 29 multilocus haplotypes, based on 39 informative intraspecific variable sites. Subsequently, a DNA profiling database was developed from 15 STRs allowing for individual identification in Malaysia. Cluster analysis divided the 27 populations into two genetic clusters, corresponding to the region of Eastern and Western Malaysia. The conservativeness tests showed that the Malaysia database is conservative after removal of bias from population subdivision and sampling effects. Independent self-assignment tests correctly assigned individuals to the database in an overall 60.60−94.95% of cases for identified populations, and in 98.99−99.23% of cases for identified regions. Both the chloroplast DNA database and the STRs appear to be useful for tracking timber originating in Malaysia. Hence, this DNA-based method could serve as an effective addition tool to the existing forensic timber identification system for ensuring the sustainably management of this species into the future. PMID:28430826
A Database Approach for Predicting and Monitoring Baked Anode Properties
NASA Astrophysics Data System (ADS)
Lauzon-Gauthier, Julien; Duchesne, Carl; Tessier, Jayson
2012-11-01
The baked anode quality control strategy currently used by most carbon plants based on testing anode core samples in the laboratory is inadequate for facing increased raw material variability. The low core sampling rate limited by lab capacity and the common practice of reporting averaged properties based on some anode population mask a significant amount of individual anode variability. In addition, lab results are typically available a few weeks after production and the anodes are often already set in the reduction cells preventing early remedial actions when necessary. A database approach is proposed in this work to develop a soft-sensor for predicting individual baked anode properties at the end of baking cycle. A large historical database including raw material properties, process operating parameters and anode core data was collected from a modern Alcoa plant. A multivariate latent variable PLS regression method was used for analyzing the large database and building the soft-sensor model. It is shown that the general low frequency trends in most anode physical and mechanical properties driven by raw material changes are very well captured by the model. Improvements in the data infrastructure (instrumentation, sampling frequency and location) will be necessary for predicting higher frequency variations in individual baked anode properties. This paper also demonstrates how multivariate latent variable models can be interpreted against process knowledge and used for real-time process monitoring of carbon plants, and detection of faults and abnormal operation.
Erectile Dysfunction in Patients with Sleep Apnea--A Nationwide Population-Based Study.
Chen, Chia-Min; Tsai, Ming-Ju; Wei, Po-Ju; Su, Yu-Chung; Yang, Chih-Jen; Wu, Meng-Ni; Hsu, Chung-Yao; Hwang, Shang-Jyh; Chong, Inn-Wen; Huang, Ming-Shyan
2015-01-01
Increased incidence of erectile dysfunction (ED) has been reported among patients with sleep apnea (SA). However, this association has not been confirmed in a large-scale study. We therefore performed a population-based cohort study using Taiwan National Health Insurance (NHI) database to investigate the association of SA and ED. From the database of one million representative subjects randomly sampled from individuals enrolled in the NHI system in 2010, we identified adult patients having SA and excluded those having a diagnosis of ED prior to SA. From these suspected SA patients, those having SA diagnosis after polysomnography were defined as probable SA patients. The dates of their first SA diagnosis were defined as their index dates. Each SA patient was matched to 30 randomly-selected, age-matched control subjects without any SA diagnosis. The control subjects were assigned index dates as their corresponding SA patients, and were ensured having no ED diagnosis prior to their index dates. Totally, 4,835 male patients with suspected SA (including 1,946 probable SA patients) were matched to 145,050 control subjects (including 58,380 subjects matched to probable SA patients). The incidence rate of ED was significantly higher in probable SA patients as compared with the corresponding control subjects (5.7 vs. 2.3 per 1000 patient-year; adjusted incidence rate ratio = 2.0 [95% CI: 1.8-2.2], p<0.0001). The cumulative incidence was also significantly higher in the probable SA patients (p<0.0001). In multivariable Cox regression analysis, probable SA remained a significant risk factor for the development of ED after adjusting for age, residency, income level and comorbidities (hazard ratio = 2.0 [95%CI: 1.5-2.7], p<0.0001). In line with previous studies, this population-based large-scale study confirmed an increased ED incidence in SA patients in Chinese population. Physicians need to pay attention to the possible underlying SA while treating ED patients.
The influence of solid rocket motor retro-burns on the space debris environment
NASA Astrophysics Data System (ADS)
Stabroth, Sebastian; Homeister, Maren; Oswald, Michael; Wiedemann, Carsten; Klinkrad, Heiner; Vörsmann, Peter
The ESA space debris population model MASTER (Meteoroid and Space Debris Terrestrial Environment Reference) considers firings of solid rocket motors (SRM) as a debris source with the associated generation of slag and dust particles. The resulting slag and dust population is a major contribution to the sub-millimetre size debris environment in Earth orbit. The current model version, MASTER-2005, is based on the simulation of 1076 orbital SRM firings which contributed to the long-term debris environment. A comparison of the modelled flux with impact data from returned surfaces shows that the shape and quantity of the modelled SRM dust distribution matches that of recent Hubble Space Telescope (HST) solar array measurements very well. However, the absolute flux level for dust is under-predicted for some of the analysed Long Duration Exposure Facility (LDEF) surfaces. This points into the direction of some past SRM firings not included in the current event database. The most suitable candidates for these firings are the large number of SRM retro-burns of return capsules. Objects released by those firings have highly eccentric orbits with perigees in the lower regions of the atmosphere. Thus, they produce no long-term effect on the debris environment. However, a large number of those firings during the on-orbit time frame of LDEF might lead to an increase of the dust population for some of the LDEF surfaces. In this paper, the influence of SRM retro-burns on the short- and long-term debris environment is analysed. The existing firing database is updated with gathered information of some 800 Russian retro-firings. Each firing is simulated with the MASTER population generation module. The resulting population is compared against the existing background population of SRM slag and dust particles in terms of spatial density and flux predictions.
Brief Report: The Negev Hospital-University-Based (HUB) Autism Database
ERIC Educational Resources Information Center
Meiri, Gal; Dinstein, Ilan; Michaelowski, Analya; Flusser, Hagit; Ilan, Michal; Faroy, Michal; Bar-Sinai, Asif; Manelis, Liora; Stolowicz, Dana; Yosef, Lili Lea; Davidovitch, Nadav; Golan, Hava; Arbelle, Shosh; Menashe, Idan
2017-01-01
Elucidating the heterogeneous etiologies of autism will require investment in comprehensive longitudinal data acquisition from large community based cohorts. With this in mind, we have established a hospital-university-based (HUB) database of autism which incorporates prospective and retrospective data from a large and ethnically diverse…
A facial expression image database and norm for Asian population: a preliminary report
NASA Astrophysics Data System (ADS)
Chen, Chien-Chung; Cho, Shu-ling; Horszowska, Katarzyna; Chen, Mei-Yen; Wu, Chia-Ching; Chen, Hsueh-Chih; Yeh, Yi-Yu; Cheng, Chao-Min
2009-01-01
We collected 6604 images of 30 models in eight types of facial expression: happiness, anger, sadness, disgust, fear, surprise, contempt and neutral. Among them, 406 most representative images from 12 models were rated by more than 200 human raters for perceived emotion category and intensity. Such large number of emotion categories, models and raters is sufficient for most serious expression recognition research both in psychology and in computer science. All the models and raters are of Asian background. Hence, this database can also be used when the culture background is a concern. In addition, 43 landmarks each of the 291 rated frontal view images were identified and recorded. This information should facilitate feature based research of facial expression. Overall, the diversity in images and richness in information should make our database and norm useful for a wide range of research.
Chesapeake Bay Program Water Quality Database
The Chesapeake Information Management System (CIMS), designed in 1996, is an integrated, accessible information management system for the Chesapeake Bay Region. CIMS is an organized, distributed library of information and software tools designed to increase basin-wide public access to Chesapeake Bay information. The information delivered by CIMS includes technical and public information, educational material, environmental indicators, policy documents, and scientific data. Through the use of relational databases, web-based programming, and web-based GIS a large number of Internet resources have been established. These resources include multiple distributed on-line databases, on-demand graphing and mapping of environmental data, and geographic searching tools for environmental information. Baseline monitoring data, summarized data and environmental indicators that document ecosystem status and trends, confirm linkages between water quality, habitat quality and abundance, and the distribution and integrity of biological populations are also available. One of the major features of the CIMS network is the Chesapeake Bay Program's Data Hub, providing users access to a suite of long- term water quality and living resources databases. Chesapeake Bay mainstem and tidal tributary water quality, benthic macroinvertebrates, toxics, plankton, and fluorescence data can be obtained for a network of over 800 monitoring stations.
Assessing the quality of life history information in publicly available databases.
Thorson, James T; Cope, Jason M; Patrick, Wesley S
2014-01-01
Single-species life history parameters are central to ecological research and management, including the fields of macro-ecology, fisheries science, and ecosystem modeling. However, there has been little independent evaluation of the precision and accuracy of the life history values in global and publicly available databases. We therefore develop a novel method based on a Bayesian errors-in-variables model that compares database entries with estimates from local experts, and we illustrate this process by assessing the accuracy and precision of entries in FishBase, one of the largest and oldest life history databases. This model distinguishes biases among seven life history parameters, two types of information available in FishBase (i.e., published values and those estimated from other parameters), and two taxa (i.e., bony and cartilaginous fishes) relative to values from regional experts in the United States, while accounting for additional variance caused by sex- and region-specific life history traits. For published values in FishBase, the model identifies a small positive bias in natural mortality and negative bias in maximum age, perhaps caused by unacknowledged mortality caused by fishing. For life history values calculated by FishBase, the model identified large and inconsistent biases. The model also demonstrates greatest precision for body size parameters, decreased precision for values derived from geographically distant populations, and greatest between-sex differences in age at maturity. We recommend that our bias and precision estimates be used in future errors-in-variables models as a prior on measurement errors. This approach is broadly applicable to global databases of life history traits and, if used, will encourage further development and improvements in these databases.
Prevalence rates for depression by industry: a claims database analysis.
Wulsin, Lawson; Alterman, Toni; Timothy Bushnell, P; Li, Jia; Shen, Rui
2014-11-01
To estimate and interpret differences in depression prevalence rates among industries, using a large, group medical claims database. Depression cases were identified by ICD-9 diagnosis code in a population of 214,413 individuals employed during 2002-2005 by employers based in western Pennsylvania. Data were provided by Highmark, Inc. (Pittsburgh and Camp Hill, PA). Rates were adjusted for age, gender, and employee share of health care costs. National industry measures of psychological distress, work stress, and physical activity at work were also compiled from other data sources. Rates for clinical depression in 55 industries ranged from 6.9 to 16.2 %, (population rate = 10.45 %). Industries with the highest rates tended to be those which, on the national level, require frequent or difficult interactions with the public or clients, and have high levels of stress and low levels of physical activity. Additional research is needed to help identify industries with relatively high rates of depression in other regions and on the national level, and to determine whether these differences are due in part to specific work stress exposures and physical inactivity at work. Claims database analyses may provide a cost-effective way to identify priorities for depression treatment and prevention in the workplace.
Hypothyroidism in Patients with Psoriasis or Rosacea: A Large Population Study.
James, Sara M; Hill, Dane E; Feldman, Steven R
2016-10-15
Hypothyroidism is a common disease, and there may be a link between hypothyroidism and inflammatory skin disease. The purpose of this study is to assess whether hypothyroidism is more prevalent in psoriasis or rosacea patients. We utilized a large claims-based database to analyze rates of hypothyroidism in patients with psoriasis and rosacea compared to other patients with skin diseases. Participants were patients between 20-64 years of age with ICD-9 diagnosis codes for psoriasis, rosacea, and hypothyroidism. We found that rates of hypothyroidism in rosacea and psoriasis patients were similar to rates of hypothyroidism in those without rosacea or psoriasis.
Hypothyroidism in Patients with Psoriasis or Rosacea: A Large Population Study.
James, Sara M; Hill, Dane E; Feldman, Steven R
2016-09-15
Hypothyroidism is a common disease, and there may be a link between hypothyroidism and inflammatory skin disease. The purpose of this study is to assess whether hypothyroidism is more prevalent in psoriasis or rosacea patients. We utilized a large claims-based database to analyze rates of hypothyroidism in patients with psoriasis and rosacea compared to other patients with skin diseases. Participants were patients between 20-64 years of age with ICD-9 diagnosis codes for psoriasis, rosacea, and hypothyroidism. We found that rates of hypothyroidism in rosacea and psoriasis patients were similar to rates of hypothyroidism in those without rosacea or psoriasis.
Digital hand atlas for web-based bone age assessment: system design and implementation
NASA Astrophysics Data System (ADS)
Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente
2000-04-01
A frequently used assessment method of skeletal age is atlas matching by a radiological examination of a hand image against a small set of Greulich-Pyle patterns of normal standards. The method however can lead to significant deviation in age assessment, due to a variety of observers with different levels of training. The Greulich-Pyle atlas based on middle upper class white populations in the 1950s, is also not fully applicable for children of today, especially regarding the standard development in other racial groups. In this paper, we present our system design and initial implementation of a digital hand atlas and computer-aided diagnostic (CAD) system for Web-based bone age assessment. The digital atlas will remove the disadvantages of the currently out-of-date one and allow the bone age assessment to be computerized and done conveniently via Web. The system consists of a hand atlas database, a CAD module and a Java-based Web user interface. The atlas database is based on a large set of clinically normal hand images of diverse ethnic groups. The Java-based Web user interface allows users to interact with the hand image database form browsers. Users can use a Web browser to push a clinical hand image to the CAD server for a bone age assessment. Quantitative features on the examined image, which reflect the skeletal maturity, is then extracted and compared with patterns from the atlas database to assess the bone age.
Assessment of COPD-related outcomes via a national electronic medical record database.
Asche, Carl; Said, Quayyim; Joish, Vijay; Hall, Charles Oaxaca; Brixner, Diana
2008-01-01
The technology and sophistication of healthcare utilization databases have expanded over the last decade to include results of lab tests, vital signs, and other clinical information. This review provides an assessment of the methodological and analytical challenges of conducting chronic obstructive pulmonary disease (COPD) outcomes research in a national electronic medical records (EMR) dataset and its potential application towards the assessment of national health policy issues, as well as a description of the challenges or limitations. An EMR database and its application to measuring outcomes for COPD are described. The ability to measure adherence to the COPD evidence-based practice guidelines, generated by the NIH and HEDIS quality indicators, in this database was examined. Case studies, before and after their publication, were used to assess the adherence to guidelines and gauge the conformity to quality indicators. EMR was the only source of information for pulmonary function tests, but low frequency in ordering by primary care was an issue. The EMR data can be used to explore impact of variation in healthcare provision on clinical outcomes. The EMR database permits access to specific lab data and biometric information. The richness and depth of information on "real world" use of health services for large population-based analytical studies at relatively low cost render such databases an attractive resource for outcomes research. Various sources of information exist to perform outcomes research. It is important to understand the desired endpoints of such research and choose the appropriate database source.
The case for improving road safety in Pacific Islands: a population-based study from Fiji (TRIP 6).
Herman, Josephine; Ameratunga, Shanthi; Wainiqolo, Iris; Kafoa, Berlin; McCaig, Eddie; Jackson, Rod
2012-10-01
To estimate the incidence and demographic characteristics associated with road traffic injuries (RTIs) resulting in deaths or hospital admission for 12 hours or more in Viti Levu, Fiji. Analysis of the prospective population-based Fiji Injury Surveillance in Hospitals database (October 2005 - September 2006). Of the 374 RTI cases identified (17% of all injuries), 72% were males and one third were aged 15-29 years. RTI fatalities (10.3 per 100,000 per year) were higher among Indians compared to Fijians. Two-thirds of deaths (largely ascribed to head, chest and abdominal trauma) occurred before hospital admission. While the RTI fatality rate was comparable to the global average for high-income countries, the level of motorisation in Fiji is considerably lower. To avert rising RTI rates with increasing motorisation, Fiji requires a robust road safety strategy alongside effective trauma-care services and a reliable population-based RTI surveillance system. © 2012 The Authors. ANZJPH © 2012 Public Health Association of Australia.
Enhanced Living by Assessing Voice Pathology Using a Co-Occurrence Matrix
Muhammad, Ghulam; Alhamid, Mohammed F.; Hossain, M. Shamim; Almogren, Ahmad S.; Vasilakos, Athanasios V.
2017-01-01
A large number of the population around the world suffers from various disabilities. Disabilities affect not only children but also adults of different professions. Smart technology can assist the disabled population and lead to a comfortable life in an enhanced living environment (ELE). In this paper, we propose an effective voice pathology assessment system that works in a smart home framework. The proposed system takes input from various sensors, and processes the acquired voice signals and electroglottography (EGG) signals. Co-occurrence matrices in different directions and neighborhoods from the spectrograms of these signals were obtained. Several features such as energy, entropy, contrast, and homogeneity from these matrices were calculated and fed into a Gaussian mixture model-based classifier. Experiments were performed with a publicly available database, namely, the Saarbrucken voice database. The results demonstrate the feasibility of the proposed system in light of its high accuracy and speed. The proposed system can be extended to assess other disabilities in an ELE. PMID:28146069
Enhanced Living by Assessing Voice Pathology Using a Co-Occurrence Matrix.
Muhammad, Ghulam; Alhamid, Mohammed F; Hossain, M Shamim; Almogren, Ahmad S; Vasilakos, Athanasios V
2017-01-29
A large number of the population around the world suffers from various disabilities. Disabilities affect not only children but also adults of different professions. Smart technology can assist the disabled population and lead to a comfortable life in an enhanced living environment (ELE). In this paper, we propose an effective voice pathology assessment system that works in a smart home framework. The proposed system takes input from various sensors, and processes the acquired voice signals and electroglottography (EGG) signals. Co-occurrence matrices in different directions and neighborhoods from the spectrograms of these signals were obtained. Several features such as energy, entropy, contrast, and homogeneity from these matrices were calculated and fed into a Gaussian mixture model-based classifier. Experiments were performed with a publicly available database, namely, the Saarbrucken voice database. The results demonstrate the feasibility of the proposed system in light of its high accuracy and speed. The proposed system can be extended to assess other disabilities in an ELE.
Pharmacoepidemiology resources in Ireland-an introduction to pharmacy claims data.
Sinnott, Sarah-Jo; Bennett, Kathleen; Cahir, Caitriona
2017-11-01
Administrative health data, such as pharmacy claims data, present a valuable resource for conducting pharmacoepidemiological and health services research. Often, data are available for whole populations allowing population level analyses. Moreover, their routine collection ensures that the data reflect health care utilisation in the real-world setting compared to data collected in clinical trials. The Irish Health Service Executive-Primary Care Reimbursement Service (HSE-PCRS) community pharmacy claims database is described. The availability of demographic variables and drug-related information is discussed. The strengths and limitations associated using this database for conducting research are presented, in particular, internal and external validity. Examples of recently conducted research using the HSE-PCRS pharmacy claims database are used to illustrate the breadth of its use. The HSE-PCRS national pharmacy claims database is a large, high-quality, valid and accurate data source for measuring drug exposure in specific populations in Ireland. The main limitation is the lack of generalisability for those aged <70 years and the lack of information on indication or outcome.
“NaKnowBase”: A Nanomaterials Relational Database
NaKnowBase is an internal relational database populated with data from peer-reviewed ORD nanomaterials research publications. The database focuses on papers describing the actions of nanomaterials in environmental or biological media including their interactions, transformations...
NASA Technical Reports Server (NTRS)
Handley, Thomas H., Jr.; Collins, Donald J.; Doyle, Richard J.; Jacobson, Allan S.
1991-01-01
Viewgraphs on DataHub knowledge based assistance for science visualization and analysis using large distributed databases. Topics covered include: DataHub functional architecture; data representation; logical access methods; preliminary software architecture; LinkWinds; data knowledge issues; expert systems; and data management.
van Baal, Sjozef; Kaimakis, Polynikis; Phommarinh, Manyphong; Koumbi, Daphne; Cuppens, Harry; Riccardino, Francesca; Macek, Milan; Scriver, Charles R; Patrinos, George P
2007-01-01
Frequency of INherited Disorders database (FINDbase) (http://www.findbase.org) is a relational database, derived from the ETHNOS software, recording frequencies of causative mutations leading to inherited disorders worldwide. Database records include the population and ethnic group, the disorder name and the related gene, accompanied by links to any corresponding locus-specific mutation database, to the respective Online Mendelian Inheritance in Man entries and the mutation together with its frequency in that population. The initial information is derived from the published literature, locus-specific databases and genetic disease consortia. FINDbase offers a user-friendly query interface, providing instant access to the list and frequencies of the different mutations. Query outputs can be either in a table or graphical format, accompanied by reference(s) on the data source. Registered users from three different groups, namely administrator, national coordinator and curator, are responsible for database curation and/or data entry/correction online via a password-protected interface. Databaseaccess is free of charge and there are no registration requirements for data querying. FINDbase provides a simple, web-based system for population-based mutation data collection and retrieval and can serve not only as a valuable online tool for molecular genetic testing of inherited disorders but also as a non-profit model for sustainable database funding, in the form of a 'database-journal'.
Gagnon, Alain; Smith, Ken R; Tremblay, Marc; Vézina, Hélène; Paré, Paul-Philippe; Desjardins, Bertrand
2009-01-01
Frontier populations provide exceptional opportunities to test the hypothesis of a trade-off between fertility and longevity. In such populations, mechanisms favoring reproduction usually find fertile ground, and if these mechanisms reduce longevity, demographers should observe higher postreproductive mortality among highly fertile women. We test this hypothesis using complete female reproductive histories from three large demographic databases: the Registre de la population du Québec ancien (Université de Montréal), which covers the first centuries of settlement in Quebec; the BALSAC database (Université du Québec à Chicoutimi), including comprehensive records for the Saguenay-Lac-St-Jean (SLSJ) in Quebec in the nineteenth and twentieth centuries; and the Utah Population Database (University of Utah), including all individuals who experienced a vital event on the Mormon Trail and their descendants. Together, the three samples allow for comparisons over time and space, and represent one of the largest set of natural fertility cohorts used to simultaneously assess reproduction and longevity. Using survival analyses, we found a negative influence of parity and a positive influence of age at last child on postreproductive survival in the three populations, as well as a significant interaction between these two variables. The effect sizes of all these parameters were remarkably similar in the three samples. However, we found little evidence that early fertility affects postreproductive survival. The use of Heckman's procedure assessing the impact of mortality selection during reproductive ages did not appreciably alter these results. We conclude our empirical investigation by discussing the advantages of comparative approaches. 2009 Wiley-Liss, Inc.
“NaKnowBase”: A Nanomaterials Relational Database
NaKnowBase is a relational database populated with data from peer-reviewed ORD nanomaterials research publications. The database focuses on papers describing the actions of nanomaterials in environmental or biological media including their interactions, transformations and poten...
EvoSNP-DB: A database of genetic diversity in East Asian populations.
Kim, Young Uk; Kim, Young Jin; Lee, Jong-Young; Park, Kiejung
2013-08-01
Genome-wide association studies (GWAS) have become popular as an approach for the identification of large numbers of phenotype-associated variants. However, differences in genetic architecture and environmental factors mean that the effect of variants can vary across populations. Understanding population genetic diversity is valuable for the investigation of possible population specific and independent effects of variants. EvoSNP-DB aims to provide information regarding genetic diversity among East Asian populations, including Chinese, Japanese, and Korean. Non-redundant SNPs (1.6 million) were genotyped in 54 Korean trios (162 samples) and were compared with 4 million SNPs from HapMap phase II populations. EvoSNP-DB provides two user interfaces for data query and visualization, and integrates scores of genetic diversity (Fst and VarLD) at the level of SNPs, genes, and chromosome regions. EvoSNP-DB is a web-based application that allows users to navigate and visualize measurements of population genetic differences in an interactive manner, and is available online at [http://biomi.cdc.go.kr/EvoSNP/].
Multicenter neonatal databases: Trends in research uses.
Creel, Liza M; Gregory, Sean; McNeal, Catherine J; Beeram, Madhava R; Krauss, David R
2017-01-13
In the US, approximately 12.7% of all live births are preterm, 8.2% of live births were low birth weight (LBW), and 1.5% are very low birth weight (VLBW). Although technological advances have improved mortality rates among preterm and LBW infants, improving overall rates of prematurity and LBW remains a national priority. Monitoring short- and long-term outcomes is critical for advancing medical treatment and minimizing morbidities associated with prematurity or LBW; however, studying these infants can be challenging. Several large, multi-center neonatal databases have been developed to improve research and quality improvement of treatments for and outcomes of premature and LBW infants. The purpose of this systematic review was to describe three multi-center neonatal databases. We conducted a literature search using PubMed and Google Scholar over the period 1990 to August 2014. Studies were included in our review if one of the databases was used as a primary source of data or comparison. Included studies were categorized by year of publication; study design employed, and research focus. A total of 343 studies published between 1991 and 2014 were included. Studies of premature and LBW infants using these databases have increased over time, and provide evidence for both neonatology and community-based pediatric practice. Research into treatment and outcomes of premature and LBW infants is expanding, partially due to the availability of large, multicenter databases. The consistency of clinical conditions and neonatal outcomes studied since 1990 demonstrates that there are dedicated research agendas and resources that allow for long-term, and potentially replicable, studies within this population.
Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol
2016-01-01
The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.; ...
2016-11-10
The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging duemore » to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). Finally, when analyzing empirical data (wholegenome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.
The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging duemore » to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). Finally, when analyzing empirical data (wholegenome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.« less
Elhassan, Nuha; Gebremeskel, Eyoab Iyasu; Elnour, Mohamed Ali; Isabirye, Dan; Okello, John; Hussien, Ayman; Kwiatksowski, Dominic; Hirbo, Jibril; Tishkoff, Sara; Ibrahim, Muntaser E
2014-01-01
Human genetic variation particularly in Africa is still poorly understood. This is despite a consensus on the large African effective population size compared to populations from other continents. Based on sequencing of the mitochondrial Cytochrome C Oxidase subunit II (MT-CO2), and genome wide microsatellite data we observe evidence suggesting the effective size (Ne) of humans to be larger than the current estimates, with a foci of increased genetic diversity in east Africa, and a population size of east Africans being at least 2-6 fold larger than other populations. Both phylogenetic and network analysis indicate that east Africans possess more ancestral lineages in comparison to various continental populations placing them at the root of the human evolutionary tree. Our results also affirm east Africa as the likely spot from which migration towards Asia has taken place. The study reflects the spectacular level of sequence variation within east Africans in comparison to the global sample, and appeals for further studies that may contribute towards filling the existing gaps in the database. The implication of these data to current genomic research, as well as the need to carry out defined studies of human genetic variation that includes more African populations; particularly east Africans is paramount.
National Databases for Neurosurgical Outcomes Research: Options, Strengths, and Limitations.
Karhade, Aditya V; Larsen, Alexandra M G; Cote, David J; Dubois, Heloise M; Smith, Timothy R
2017-08-05
Quality improvement, value-based care delivery, and personalized patient care depend on robust clinical, financial, and demographic data streams of neurosurgical outcomes. The neurosurgical literature lacks a comprehensive review of large national databases. To assess the strengths and limitations of various resources for outcomes research in neurosurgery. A review of the literature was conducted to identify surgical outcomes studies using national data sets. The databases were assessed for the availability of patient demographics and clinical variables, longitudinal follow-up of patients, strengths, and limitations. The number of unique patients contained within each data set ranged from thousands (Quality Outcomes Database [QOD]) to hundreds of millions (MarketScan). Databases with both clinical and financial data included PearlDiver, Premier Healthcare Database, Vizient Clinical Data Base and Resource Manager, and the National Inpatient Sample. Outcomes collected by databases included patient-reported outcomes (QOD); 30-day morbidity, readmissions, and reoperations (National Surgical Quality Improvement Program); and disease incidence and disease-specific survival (Surveillance, Epidemiology, and End Results-Medicare). The strengths of large databases included large numbers of rare pathologies and multi-institutional nationally representative sampling; the limitations of these databases included variable data veracity, variable data completeness, and missing disease-specific variables. The improvement of existing large national databases and the establishment of new registries will be crucial to the future of neurosurgical outcomes research. Copyright © 2017 by the Congress of Neurological Surgeons
Application of kernel functions for accurate similarity search in large chemical databases.
Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
2010-04-29
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions can not be applied to large chemical compound database due to the high computational complexity and the difficulties in indexing similarity search for large databases. To bridge graph kernel function and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure similarity of graph represented chemicals. In our method, we utilize a hash table to support new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure is scalable to large chemical databases with smaller indexing size, and faster query processing time as compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing method for large chemical databases is challenging since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
Reflections on CD-ROM: Bridging the Gap between Technology and Purpose.
ERIC Educational Resources Information Center
Saviers, Shannon Smith
1987-01-01
Provides a technological overview of CD-ROM (Compact Disc-Read Only Memory), an optically-based medium for data storage offering large storage capacity, computer-based delivery system, read-only medium, and economic mass production. CD-ROM database attributes appropriate for information delivery are also reviewed, including large database size,…
Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.
Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery
2009-01-01
We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
Brudey, Karine; Driscoll, Jeffrey R; Rigouts, Leen; Prodinger, Wolfgang M; Gori, Andrea; Al-Hajoj, Sahal A; Allix, Caroline; Aristimuño, Liselotte; Arora, Jyoti; Baumanis, Viesturs; Binder, Lothar; Cafrune, Patricia; Cataldi, Angel; Cheong, Soonfatt; Diel, Roland; Ellermeier, Christopher; Evans, Jason T; Fauville-Dufaux, Maryse; Ferdinand, Séverine; de Viedma, Dario Garcia; Garzelli, Carlo; Gazzola, Lidia; Gomes, Harrison M; Guttierez, M Cristina; Hawkey, Peter M; van Helden, Paul D; Kadival, Gurujaj V; Kreiswirth, Barry N; Kremer, Kristin; Kubin, Milan; Kulkarni, Savita P; Liens, Benjamin; Lillebaek, Troels; Ly, Ho Minh; Martin, Carlos; Martin, Christian; Mokrousov, Igor; Narvskaïa, Olga; Ngeow, Yun Fong; Naumann, Ludmilla; Niemann, Stefan; Parwati, Ida; Rahim, Zeaur; Rasolofo-Razanamparany, Voahangy; Rasolonavalona, Tiana; Rossetti, M Lucia; Rüsch-Gerdes, Sabine; Sajduda, Anna; Samper, Sofia; Shemyakin, Igor G; Singh, Urvashi B; Somoskovi, Akos; Skuce, Robin A; van Soolingen, Dick; Streicher, Elisabeth M; Suffys, Philip N; Tortoli, Enrico; Tracevska, Tatjana; Vincent, Véronique; Victor, Tommie C; Warren, Robin M; Yap, Sook Fan; Zaman, Khadiza; Portaels, Françoise; Rastogi, Nalin; Sola, Christophe
2006-01-01
Background The Direct Repeat locus of the Mycobacterium tuberculosis complex (MTC) is a member of the CRISPR (Clustered regularly interspaced short palindromic repeats) sequences family. Spoligotyping is the widely used PCR-based reverse-hybridization blotting technique that assays the genetic diversity of this locus and is useful both for clinical laboratory, molecular epidemiology, evolutionary and population genetics. It is easy, robust, cheap, and produces highly diverse portable numerical results, as the result of the combination of (1) Unique Events Polymorphism (UEP) (2) Insertion-Sequence-mediated genetic recombination. Genetic convergence, although rare, was also previously demonstrated. Three previous international spoligotype databases had partly revealed the global and local geographical structures of MTC bacilli populations, however, there was a need for the release of a new, more representative and extended, international spoligotyping database. Results The fourth international spoligotyping database, SpolDB4, describes 1939 shared-types (STs) representative of a total of 39,295 strains from 122 countries, which are tentatively classified into 62 clades/lineages using a mixed expert-based and bioinformatical approach. The SpolDB4 update adds 26 new potentially phylogeographically-specific MTC genotype families. It provides a clearer picture of the current MTC genomes diversity as well as on the relationships between the genetic attributes investigated (spoligotypes) and the infra-species classification and evolutionary history of the species. Indeed, an independent Naïve-Bayes mixture-model analysis has validated main of the previous supervised SpolDB3 classification results, confirming the usefulness of both supervised and unsupervised models as an approach to understand MTC population structure. Updated results on the epidemiological status of spoligotypes, as well as genetic prevalence maps on six main lineages are also shown. Our results suggests the existence of fine geographical genetic clines within MTC populations, that could mirror the passed and present Homo sapiens sapiens demographical and mycobacterial co-evolutionary history whose structure could be further reconstructed and modelled, thereby providing a large-scale conceptual framework of the global TB Epidemiologic Network. Conclusion Our results broaden the knowledge of the global phylogeography of the MTC complex. SpolDB4 should be a very useful tool to better define the identity of a given MTC clinical isolate, and to better analyze the links between its current spreading and previous evolutionary history. The building and mining of extended MTC polymorphic genetic databases is in progress. PMID:16519816
The clinical value of large neuroimaging data sets in Alzheimer's disease.
Toga, Arthur W
2012-02-01
Rapid advances in neuroimaging and cyberinfrastructure technologies have brought explosive growth in the Web-based warehousing, availability, and accessibility of imaging data on a variety of neurodegenerative and neuropsychiatric disorders and conditions. There has been a prolific development and emergence of complex computational infrastructures that serve as repositories of databases and provide critical functionalities such as sophisticated image analysis algorithm pipelines and powerful three-dimensional visualization and statistical tools. The statistical and operational advantages of collaborative, distributed team science in the form of multisite consortia push this approach in a diverse range of population-based investigations. Copyright © 2012 Elsevier Inc. All rights reserved.
Feline mitochondrial DNA sampling for forensic analysis: when enough is enough!
Grahn, Robert A; Alhaddad, Hasan; Alves, Paulo C; Randi, Ettore; Waly, Nashwa E; Lyons, Leslie A
2015-05-01
Pet hair has a demonstrated value in resolving legal issues. Cat hair is chronically shed and it is difficult to leave a home with cats without some level of secondary transfer. The power of cat hair as an evidentiary resource may be underused because representative genetic databases are not available for exclusionary purposes. Mitochondrial control region databases are highly valuable for hair analyses and have been developed for the cat. In a representative worldwide data set, 83% of domestic cat mitotypes belong to one of twelve major types. Of the remaining 17%, 7.5% are unique within the published 1394 sample database. The current research evaluates the sample size necessary to establish a representative population for forensic comparison of the mitochondrial control region for the domestic cat. For most worldwide populations, randomly sampling 50 unrelated local individuals will achieve saturation at 95%. The 99% saturation is achieved by randomly sampling 60-170 cats, depending on the numbers of mitotypes available in the population at large. Likely due to the recent domestication of the cat and minimal localized population substructure, fewer cats are needed to meet mitochondria DNA control region database practical saturation than for humans or dogs. Coupled with the available worldwide feline control region database of nearly 1400 cats, minimal local sampling will be required to establish an appropriate comparative representative database and achieve significant exclusionary power. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
de Hoyos-Alonso, M C; Bonis, J; Tapias-Merino, E; Castell, M V; Otero, A
2016-01-01
The progressive rise in dementia prevalence increases the need for rapid methods that complement population-based prevalence studies. To estimate the prevalence of dementia in the population aged 65 and older based on use of cholinesterase inhibitors and memantine. Descriptive study of use and prescription of cholinesterase inhibitors and/or memantine in 2011 according to 2 databases: Farm@drid (pharmacy billing records for the Region of Madrid) and BIFAP (database for pharmacoepidemiology research in primary care, with diagnosis and prescription records). We tested the comparability of drug use results from each database using the chi-square test and prevalence ratios. The prevalence of dementia in Madrid was estimated based on the dose per 100 inhabitants/day, adjusting the result for data obtained from BIFAP on combination treatment in the general population (0.37%) and the percentage of dementia patients undergoing treatment (41.13%). Cholinesterase inhibitors and memantine were taken by 2.08% and 0.72% of Madrid residents aged 65 and older was respectively. Both databases displayed similar results for use of these drugs. The estimated prevalence of dementia in individuals aged 65 and older is 5.91% (95% CI%, 5.85-5.95) (52 287 people), and it is higher in women (7.16%) than in men (4.00%). The estimated prevalence of dementia is similar to that found in population-based studies. Analysing consumption of specific dementia drugs can be a reliable and inexpensive means of updating prevalence data periodically and helping rationalise healthcare resources. Copyright © 2014 Sociedad Española de Neurología. Published by Elsevier España, S.L.U. All rights reserved.
Wang, Shirley V; Schneeweiss, Sebastian; Berger, Marc L; Brown, Jeffrey; de Vries, Frank; Douglas, Ian; Gagne, Joshua J; Gini, Rosa; Klungel, Olaf; Mullins, C Daniel; Nguyen, Michael D; Rassen, Jeremy A; Smeeth, Liam; Sturkenboom, Miriam
2017-09-01
Defining a study population and creating an analytic dataset from longitudinal healthcare databases involves many decisions. Our objective was to catalogue scientific decisions underpinning study execution that should be reported to facilitate replication and enable assessment of validity of studies conducted in large healthcare databases. We reviewed key investigator decisions required to operate a sample of macros and software tools designed to create and analyze analytic cohorts from longitudinal streams of healthcare data. A panel of academic, regulatory, and industry experts in healthcare database analytics discussed and added to this list. Evidence generated from large healthcare encounter and reimbursement databases is increasingly being sought by decision-makers. Varied terminology is used around the world for the same concepts. Agreeing on terminology and which parameters from a large catalogue are the most essential to report for replicable research would improve transparency and facilitate assessment of validity. At a minimum, reporting for a database study should provide clarity regarding operational definitions for key temporal anchors and their relation to each other when creating the analytic dataset, accompanied by an attrition table and a design diagram. A substantial improvement in reproducibility, rigor and confidence in real world evidence generated from healthcare databases could be achieved with greater transparency about operational study parameters used to create analytic datasets from longitudinal healthcare databases. © 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG
2018-01-01
Mobile application allow many users to access data from the application without being limited to space, space and time. Over time the data population of this application will increase. Data access time will cause problems if the data record has reached tens of thousands to millions of records.The objective of this research is to maintain the performance of data execution for large data records. One effort to maintain data access time performance is to apply query optimization method. The optimization used in this research is query heuristic optimization method. The built application is a mobile-based financial application using MySQL database with stored procedure therein. This application is used by more than one business entity in one database, thus enabling rapid data growth. In this stored procedure there is an optimized query using heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clausa. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time calculation is also performed on the increase of population data in the database. The evaluation results shown the time of data execution with query heuristic optimization relatively faster than data execution time without using query optimization.
Prevalence rates for depression by industry: a claims database analysis
Alterman, Toni; Bushnell, P. Timothy; Li, Jia; Shen, Rui
2015-01-01
Purpose To estimate and interpret differences in depression prevalence rates among industries, using a large, group medical claims database. Methods Depression cases were identified by ICD-9 diagnosis code in a population of 214,413 individuals employed during 2002–2005 by employers based in western Pennsylvania. Data were provided by Highmark, Inc. (Pittsburgh and Camp Hill, PA). Rates were adjusted for age, gender, and employee share of health care costs. National industry measures of psychological distress, work stress, and physical activity at work were also compiled from other data sources. Results Rates for clinical depression in 55 industries ranged from 6.9 to 16.2 %, (population rate = 10.45 %). Industries with the highest rates tended to be those which, on the national level, require frequent or difficult interactions with the public or clients, and have high levels of stress and low levels of physical activity. Conclusions Additional research is needed to help identify industries with relatively high rates of depression in other regions and on the national level, and to determine whether these differences are due in part to specific work stress exposures and physical inactivity at work. Clinical significance Claims database analyses may provide a cost-effective way to identify priorities for depression treatment and prevention in the workplace. PMID:24907896
Yoo, Danny; Xu, Iris; Berardini, Tanya Z; Rhee, Seung Yon; Narayanasamy, Vijay; Twigger, Simon
2006-03-01
For most systems in biology, a large body of literature exists that describes the complexity of the system based on experimental results. Manual review of this literature to extract targeted information into biological databases is difficult and time consuming. To address this problem, we developed PubSearch and PubFetch, which store literature, keyword, and gene information in a relational database, index the literature with keywords and gene names, and provide a Web user interface for annotating the genes from experimental data found in the associated literature. A set of protocols is provided in this unit for installing, populating, running, and using PubSearch and PubFetch. In addition, we provide support protocols for performing controlled vocabulary annotations. Intended users of PubSearch and PubFetch are database curators and biology researchers interested in tracking the literature and capturing information about genes of interest in a more effective way than with conventional spreadsheets and lab notebooks.
Jekova, Irena; Krasteva, Vessela; Schmid, Ramun
2018-01-27
Human identification (ID) is a biometric task, comparing single input sample to many stored templates to identify an individual in a reference database. This paper aims to present the perspectives of personalized heartbeat pattern for reliable ECG-based identification. The investigations are using a database with 460 pairs of 12-lead resting electrocardiograms (ECG) with 10-s durations recorded at time-instants T1 and T2 > T1 + 1 year. Intra-subject long-term ECG stability and inter-subject variability of personalized PQRST (500 ms) and QRS (100 ms) patterns is quantified via cross-correlation, amplitude ratio and pattern matching between T1 and T2 using 7 features × 12-leads. Single and multi-lead ID models are trained on the first 230 ECG pairs. Their validation on 10, 20, ... 230 reference subjects (RS) from the remaining 230 ECG pairs shows: (i) two best single-lead ID models using lead II for a small population RS = (10-140) with identification accuracy AccID = (89.4-67.2)% and aVF for a large population RS = (140-230) with AccID = (67.2-63.9)%; (ii) better performance of the 6-lead limb vs. the 6-lead chest ID model-(91.4-76.1)% vs. (90.9-70)% for RS = (10-230); (iii) best performance of the 12-lead ID model-(98.4-87.4)% for RS = (10-230). The tolerable reference database size, keeping AccID > 80%, is RS = 30 in the single-lead ID scenario (II); RS = 50 (6 chest leads); RS = 100 (6 limb leads), RS > 230-maximal population in this study (12-lead ECG).
López-Lacort, Mónica; Orrico-Sánchez, Alejandro; Díez-Domingo, Javier
2018-01-01
ABSTRACT The objective of the study was to evaluate the role of age and sex and their combined effect in the development of post-herpetic neuralgia (PHN) in a large population-based study, in order to confirm the results published previously by Amicizia et al. Data were extracted from population and healthcare databases from the Valencia Region (2009–2014). Logistic regressions were implemented to estimate the effect of increasing age on the probability of developing PHN stratified by sex. From a cohort of 2,289,485 subjects ≥ 50 years, 87,086 cases of HZ were registered and 13,658 (15.7%) of them developed PHN. In our population, PHN cases were more common in women and rose with increasing age independently of the sex. PMID:29244612
Pollard, Richard J; Hopkins, Thomas; Smith, C Tyler; May, Bryan V; Doyle, James; Chambers, C Labron; Clark, Reese; Buhrman, William
2018-05-21
Perianesthetic mortality (death occurring within 48 hours of an anesthetic) continues to vary widely depending on the study population examined. The authors study in a private practice physician group that covers multiple anesthetizing locations in the Southeastern United States. This group has in place a robust quality assurance (QA) database to follow all patients undergoing anesthesia. With this study, we estimate the incidence of anesthesia-related and perianesthetic mortality in this QA database. Following institutional review board approval, data from 2011 to 2016 were obtained from the QA database of a large, community-based anesthesiology group practice. The physician practice covers 233 anesthetizing locations across 20 facilities in 2 US states. All detected cases of perianesthetic death were extracted from the database and compared to the patients' electronic medical record. These cases were further examined by a committee of 3 anesthesiologists to determine whether the death was anesthesia related (a perioperative death solely attributable to either the anesthesia provider or anesthetic technique), anesthetic contributory (a perioperative death in which anesthesia role could not be entirely excluded), or not due to anesthesia. A total of 785,467 anesthesia procedures were examined from the study period. A total of 592 cases of perianesthetic deaths were detected, giving an overall death rate of 75.37 in 100,000 cases (95% CI, 69.5-81.7). Mortality judged to be anesthesia related was found in 4 cases, giving a mortality rate of 0.509 in 100,000 (95% CI, 0.198-1.31). Mortality judged to be anesthesia contributory were found in 18 cases, giving a mortality of 2.29 in 100,000 patients (95% CI, 1.45-3.7). A total of 570 cases were judged to be nonanesthesia related, giving an incidence of 72.6 per 100,000 anesthetics (95% CI, 69.3-75.7). In a large, comprehensive database representing the full range of anesthesia practices and locations in the Southeastern United States, the rate of perianesthestic death was 0.509 in 100,000 (95% CI, 0.198-1.31). Future in-depth analysis of the epidemiology of perianesthetic deaths will be reported in later studies.
PedNavigator: a pedigree drawing servlet for large and inbred populations.
Mancosu, Gianmaria; Ledda, Giuseppe; Melis, Paola M
2003-03-22
PedNavigator is a pedigree drawing application for large and complex pedigrees. It has been developed especially for genetic and epidemiological studies of isolated populations characterized by high inbreeding and multiple matrimonies. PedNavigator is written in Java and is intended as a server-side web application, allowing researchers to 'walk' through family ties by point-and-clicking on person's symbols. The application is able to enrich the pedigree drawings with genotypic and phenotypic information taken from the underlying relational database.
Newgard, Craig; Malveau, Susan; Staudenmayer, Kristan; Wang, N. Ewen; Hsia, Renee Y.; Mann, N. Clay; Holmes, James F.; Kuppermann, Nathan; Haukoos, Jason S.; Bulger, Eileen M.; Dai, Mengtao; Cook, Lawrence J.
2012-01-01
Objectives The objective was to evaluate the process of using existing data sources, probabilistic linkage, and multiple imputation to create large population-based injury databases matched to outcomes. Methods This was a retrospective cohort study of injured children and adults transported by 94 emergency medical systems (EMS) agencies to 122 hospitals in seven regions of the western United States over a 36-month period (2006 to 2008). All injured patients evaluated by EMS personnel within specific geographic catchment areas were included, regardless of field disposition or outcome. The authors performed probabilistic linkage of EMS records to four hospital and postdischarge data sources (emergency department [ED] data, patient discharge data, trauma registries, and vital statistics files) and then handled missing values using multiple imputation. The authors compare and evaluate matched records, match rates (proportion of matches among eligible patients), and injury outcomes within and across sites. Results There were 381,719 injured patients evaluated by EMS personnel in the seven regions. Among transported patients, match rates ranged from 14.9% to 87.5% and were directly affected by the availability of hospital data sources and proportion of missing values for key linkage variables. For vital statistics records (1-year mortality), estimated match rates ranged from 88.0% to 98.7%. Use of multiple imputation (compared to complete case analysis) reduced bias for injury outcomes, although sample size, percentage missing, type of variable, and combined-site versus single-site imputation models all affected the resulting estimates and variance. Conclusions This project demonstrates the feasibility and describes the process of constructing population-based injury databases across multiple phases of care using existing data sources and commonly available analytic methods. Attention to key linkage variables and decisions for handling missing values can be used to increase match rates between data sources, minimize bias, and preserve sampling design. PMID:22506952
Population Dynamics of Early Human Migration in Britain
Vahia, Mayank N.; Ladiwala, Uma; Mahathe, Pavan; Mathur, Deepak
2016-01-01
Background Early human migration is largely determined by geography and human needs. These are both deterministic parameters when small populations move into unoccupied areas where conflicts and large group dynamics are not important. The early period of human migration into the British Isles provides such a laboratory which, because of its relative geographical isolation, may allow some insights into the complex dynamics of early human migration and interaction. Method and Results We developed a simulation code based on human affinity to habitable land, as defined by availability of water sources, altitude, and flatness of land, in choosing the path of migration. Movement of people on the British island over the prehistoric period from their initial entry points was simulated on the basis of data from the megalithic period. Topographical and hydro-shed data from satellite databases was used to define habitability, based on distance from water bodies, flatness of the terrain, and altitude above sea level. We simulated population movement based on assumptions of affinity for more habitable places, with the rate of movement tempered by existing populations. We compared results of our computer simulations with genetic data and show that our simulation can predict fairly accurately the points of contacts between different migratory paths. Such comparison also provides more detailed information about the path of peoples’ movement over ~2000 years before the present era. Conclusions We demonstrate an accurate method to simulate prehistoric movements of people based upon current topographical satellite data. Our findings are validated by recently-available genetic data. Our method may prove useful in determining early human population dynamics even when no genetic information is available. PMID:27148959
Child mental health differences amongst ethnic groups in Britain: a systematic review
Goodman, Anna; Patel, Vikram; Leon, David A
2008-01-01
Background Inter-ethnic differences have been reported for many mental health outcomes in the UK, but no systematic review on child mental health has been published. The aim of this review is to compare the population-based prevalence of child mental disorders between ethnic groups in Britain, and relate these findings to ethnic differences in mental health service use. Methods A systematic search of bibliographic databases for population-based and clinic-based studies of children aged 0–19, including all ethnic groups and the main child mental disorders. We synthesised findings by comparing each minority group to the White British study sample. Results 31 population-based and 18 clinic-based studies met the inclusion criteria. Children in the main minority groups have similar or better mental health than White British children for common disorders, but may have higher rates for some less common conditions. The causes of these differences are unclear. There may be unmet need for services among Pakistani and Bangladeshi children. Conclusion Inter-ethnic differences exist but are largely unexplained. Future studies should address the challenges of cross-cultural psychiatry and investigate reasons for inter-ethnic differences. PMID:18655701
[New population curves in spanish extremely preterm neonates].
García-Muñoz Rodrigo, F; García-Alix Pérez, A; Figueras Aloy, J; Saavedra Santana, P
2014-08-01
Most anthropometric reference data for extremely preterm infants used in Spain are outdated and based on non-Spanish populations, or are derived from small hospital-based samples that failed to include neonates of borderline viability. To develop gender-specific, population-based curves for birth weight, length, and head circumference in extremely preterm Caucasian infants, using a large contemporary sample size of Spanish singletons. Anthropometric data from neonates ≤ 28 weeks of gestational age were collected between January 2002 and December 2010 using the Spanish database SEN1500. Gestational age was estimated according to obstetric data (early pregnancy ultrasound). The data were analyzed with the SPSS.20 package, and centile tables were created for males and females using the Cole and Green LMS method. This study presents the first population-based growth curves for extremely preterm infants, including those of borderline viability, in Spain. A sexual dimorphism is evident for all of the studied parameters, starting at early gestation. These new gender-specific and population-based data could be useful for the improvement of growth assessments of extremely preterm infants in our country, for the development of epidemiological studies, for the evaluation of temporal trends, and for clinical or public health interventions seeking to optimize fetal growth. Copyright © 2013 Asociación Española de Pediatría. Published by Elsevier Espana. All rights reserved.
Indian genetic disease database
Pradhan, Sanchari; Sengupta, Mainak; Dutta, Anirban; Bhattacharyya, Kausik; Bag, Sumit K.; Dutta, Chitra; Ray, Kunal
2011-01-01
Indians, representing about one-sixth of the world population, consist of several thousands of endogamous groups with strong potential for excess of recessive diseases. However, no database is available on Indian population with comprehensive information on the diseases common in the country. To address this issue, we present Indian Genetic Disease Database (IGDD) release 1.0 (http://www.igdd.iicb.res.in)—an integrated and curated repository of growing number of mutation data on common genetic diseases afflicting the Indian populations. Currently the database covers 52 diseases with information on 5760 individuals carrying the mutant alleles of causal genes. Information on locus heterogeneity, type of mutation, clinical and biochemical data, geographical location and common mutations are furnished based on published literature. The database is currently designed to work best with Internet Explorer 8 (optimal resolution 1440 × 900) and it can be searched based on disease of interest, causal gene, type of mutation and geographical location of the patients or carriers. Provisions have been made for deposition of new data and logistics for regular updation of the database. The IGDD web portal, planned to be made freely available, contains user-friendly interfaces and is expected to be highly useful to the geneticists, clinicians, biologists and patient support groups of various genetic diseases. PMID:21037256
Familial aggregation of age-related macular degeneration in the Utah population.
Luo, Ling; Harmon, Jennifer; Yang, Xian; Chen, Haoyu; Patel, Shrena; Mineau, Geraldine; Yang, Zhenglin; Constantine, Ryan; Buehler, Jeanette; Kaminoh, Yuuki; Ma, Xiang; Wong, Tien Y; Zhang, Maonian; Zhang, Kang
2008-02-01
We examined familial aggregation and risk of age-related macular degeneration in the Utah population using a population-based case-control study. Over one million unique patient records were searched within the University of Utah Health Sciences Center and the Utah Population Database (UPDB), identifying 4764 patients with AMD. Specialized kinship analysis software was used to test for familial aggregation of disease, estimate the magnitude of familial risks, and identify families at high risk for disease. The population-attributable risk (PAR) for AMD was calculated to be 0.34. Recurrence risks in relatives indicate increased relative risks in siblings (2.95), first cousins (1.29), second cousins (1.13), and parents (5.66) of affected cases. There were 16 extended large families with AMD identified for potential use in genetic studies. Each family had five or more living affected members. The familial aggregation of AMD shown in this study exemplifies the merit of the UPDB and supports recent research demonstrating significant genetic contribution to disease development and progression.
Record linkage for pharmacoepidemiological studies in cancer patients.
Herk-Sukel, Myrthe P P van; Lemmens, Valery E P P; Poll-Franse, Lonneke V van de; Herings, Ron M C; Coebergh, Jan Willem W
2012-01-01
An increasing need has developed for the post-approval surveillance of (new) anti-cancer drugs by means of pharmacoepidemiology and outcomes research in the area of oncology. To create an overview that makes researchers aware of the available database linkages in Northern America and Europe which facilitate pharmacoepidemiology and outcomes research in cancer patients. In addition to our own database, i.e. the Eindhoven Cancer Registry (ECR) linked to the PHARMO Record Linkage System, we considered database linkages between a population-based cancer registry and an administrative healthcare database that at least contains information on drug use and offers a longitudinal perspective on healthcare utilization. Eligible database linkages were limited to those that had been used in multiple published articles in English language included in Pubmed. The HMO Cancer Research Network (CRN) in the US was excluded from this review, as an overview of the linked databases participating in the CRN is already provided elsewhere. Researchers who had worked with the data resources included in our review were contacted for additional information and verification of the data presented in the overview. The following database linkages were included: the Surveillance, Epidemiology, and End-Results-Medicare; cancer registry data linked to Medicaid; Canadian cancer registries linked to population-based drug databases; the Scottish cancer registry linked to the Tayside drug dispensing data; linked databases in the Nordic Countries of Europe: Norway, Sweden, Finland and Denmark; and the ECR-PHARMO linkage in the Netherlands. Descriptives of the included database linkages comprise population size, generalizability of the population, year of first data availability, contents of the cancer registry, contents of the administrative healthcare database, the possibility to select a cancer-free control cohort, and linkage to other healthcare databases. The linked databases offer a longitudinal perspective, allowing for observations of health care utilization before, during, and after cancer diagnosis. They create new powerful data resources for the monitoring of post-approval drug utilization, as well as a framework to explore the (cost-)effectiveness of new, often expensive, anti-cancer drugs as used in everyday practice. Copyright © 2011 John Wiley & Sons, Ltd.
DaVIE: Database for the Visualization and Integration of Epigenetic data
Fejes, Anthony P.; Jones, Meaghan J.; Kobor, Michael S.
2014-01-01
One of the challenges in the analysis of large data sets, particularly in a population-based setting, is the ability to perform comparisons across projects. This has to be done in such a way that the integrity of each individual project is maintained, while ensuring that the data are comparable across projects. These issues are beginning to be observed in human DNA methylation studies, as the Illumina 450k platform and next generation sequencing-based assays grow in popularity and decrease in price. This increase in productivity is enabling new insights into epigenetics, but also requires the development of pipelines and software capable of handling the large volumes of data. The specific problems inherent in creating a platform for the storage, comparison, integration, and visualization of DNA methylation data include data storage, algorithm efficiency and ability to interpret the results to derive biological meaning from them. Databases provide a ready-made solution to these issues, but as yet no tools exist that that leverage these advantages while providing an intuitive user interface for interpreting results in a genomic context. We have addressed this void by integrating a database to store DNA methylation data with a web interface to query and visualize the database and a set of libraries for more complex analysis. The resulting platform is called DaVIE: Database for the Visualization and Integration of Epigenetics data. DaVIE can use data culled from a variety of sources, and the web interface includes the ability to group samples by sub-type, compare multiple projects and visualize genomic features in relation to sites of interest. We have used DaVIE to identify patterns of DNA methylation in specific projects and across different projects, identify outlier samples, and cross-check differentially methylated CpG sites identified in specific projects across large numbers of samples. A demonstration server has been setup using GEO data at http://echelon.cmmt.ubc.ca/dbaccess/, with login “guest” and password “guest.” Groups may download and install their own version of the server following the instructions on the project's wiki. PMID:25278960
The 2010-2015 Prevalence of Eosinophilic Esophagitis in the USA: A Population-Based Study.
Mansoor, Emad; Cooper, Gregory S
2016-10-01
Eosinophilic esophagitis (EoE) is a chronic inflammatory disorder with increasing prevalence. However, epidemiologic data have mostly been acquired from small studies. We sought to describe the epidemiology of EoE in the USA, utilizing a large database. We queried a commercial database (Explorys Inc, Cleveland, OH, USA), an aggregate of electronic health record data from 26 major integrated US healthcare systems from 1999 to July 2015. We identified an aggregated patient cohort of eligible patients with EoE and a history of proton-pump inhibitor use between July 2010 and July 2015, based on Systematized Nomenclature of Medicine-Clinical Terms. We calculated the prevalence of EoE among different patient groups. Of the 30,301,440 individuals in the database, we identified 7840 patients with EoE with an overall prevalence of 25.9/100,000 persons. Prevalence was higher in males than females [odds ratio (OR) 2.00; 95 % CI 1.92-2.10, p < 0.0001], Caucasians versus African-Americans and Asians (OR 2.00; 95 % CI 1.86-2.14, p < 0.0001), and adults (18-65 years) versus elderly (>65 years) and children (<18 years) (OR 1.63; 95 % CI 1.54-1.71, p < 0.0001). Compared with controls (individuals in database without EoE), individuals with EoE were more likely to have other gastrointestinal diagnoses such as dysphagia and at least one allergic condition. In this large study, we found that the estimated prevalence of EoE in the USA is 25.9/100,000, which is at the lower end of prevalence rates reported in the USA and other industrial countries. We confirmed that EoE has a strong association with allergic and gastrointestinal diagnoses.
Development of forensic-quality full mtGenome haplotypes: success rates with low template specimens.
Just, Rebecca S; Scheible, Melissa K; Fast, Spence A; Sturk-Andreaggi, Kimberly; Higginbotham, Jennifer L; Lyons, Elizabeth A; Bush, Jocelyn M; Peck, Michelle A; Ring, Joseph D; Diegoli, Toni M; Röck, Alexander W; Huber, Gabriela E; Nagl, Simone; Strobl, Christina; Zimmermann, Bettina; Parson, Walther; Irwin, Jodi A
2014-05-01
Forensic mitochondrial DNA (mtDNA) testing requires appropriate, high quality reference population data for estimating the rarity of questioned haplotypes and, in turn, the strength of the mtDNA evidence. Available reference databases (SWGDAM, EMPOP) currently include information from the mtDNA control region; however, novel methods that quickly and easily recover mtDNA coding region data are becoming increasingly available. Though these assays promise to both facilitate the acquisition of mitochondrial genome (mtGenome) data and maximize the general utility of mtDNA testing in forensics, the appropriate reference data and database tools required for their routine application in forensic casework are lacking. To address this deficiency, we have undertaken an effort to: (1) increase the large-scale availability of high-quality entire mtGenome reference population data, and (2) improve the information technology infrastructure required to access/search mtGenome data and employ them in forensic casework. Here, we describe the application of a data generation and analysis workflow to the development of more than 400 complete, forensic-quality mtGenomes from low DNA quantity blood serum specimens as part of a U.S. National Institute of Justice funded reference population databasing initiative. We discuss the minor modifications made to a published mtGenome Sanger sequencing protocol to maintain a high rate of throughput while minimizing manual reprocessing with these low template samples. The successful use of this semi-automated strategy on forensic-like samples provides practical insight into the feasibility of producing complete mtGenome data in a routine casework environment, and demonstrates that large (>2kb) mtDNA fragments can regularly be recovered from high quality but very low DNA quantity specimens. Further, the detailed empirical data we provide on the amplification success rates across a range of DNA input quantities will be useful moving forward as PCR-based strategies for mtDNA enrichment are considered for targeted next-generation sequencing workflows. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Use of medical administrative data for the surveillance of psychotic disorders in France.
Chan Chee, Christine; Chin, Francis; Ha, Catherine; Beltzer, Nathalie; Bonaldi, Christophe
2017-12-04
Psychotic disorders are among the most severe psychiatric disorders that have great effects on the individuals and the society. For surveillance of chronic low prevalence conditions such as psychotic disorders, medical administrative databases can be useful due to their large coverage of the population, their continuous availability and low costs with possibility of linkage between different databases. The aims of this study are to identify the population with psychotic disorders by different algorithms based on the French medical administrative data and examine the prevalence and characteristics of this population in 2014. The health insurance system covers the entire population living in France and all reimbursements of ambulatory care in private practice are included in a national health insurance claim database, which can be linked with the national hospital discharge databases. Three algorithms were used to select most appropriately persons with psychotic disorders through data from hospital discharge databases, reimbursements for psychotropic medication and full insurance coverage for chronic and costly conditions. In France in 2014, estimates of the number of individuals with psychotic disorders were 469,587 (54.6% males) including 237,808 with schizophrenia (63.6% males). Of those, 77.0% with psychotic disorders and 70.8% with schizophrenia received exclusively ambulatory care. Prevalence rates of psychotic disorders were 7.4 per 1000 inhabitants (8.3 in males and 6.4 in females) and 3.8 per 1000 inhabitants (4.9 in males and 2.6 in females) for schizophrenia. Prevalence of psychotic disorders reached a maximum of 14 per 1000 in males between 35 and 49 years old then decreased with age while in females, the highest rate of 10 per 1000 was reached at age 50 without decrease with advancing age. No such plateau was observed in schizophrenia. This study is the first in France using an exhaustive sample of medical administrative data to derive prevalence rates for psychotic disorders. Although only individuals in contact with healthcare services were included, the rates were congruent with reported estimates from systematic reviews. The feasibility of this study will allow the implementation of a national surveillance of psychotic disorders essential for healthcare management and policy planning.
Del Fiol, Guilherme; Butler, Jorie; Livnat, Yarden; Mayer, Jeanmarie; Samore, Matthew; Jones, Makoto; Weir, Charlene
2016-01-01
Summary Objective Big data or population-based information has the potential to reduce uncertainty in medicine by informing clinicians about individual patient care. The objectives of this study were: 1) to explore the feasibility of extracting and displaying population-based information from an actual clinical population’s database records, 2) to explore specific design features for improving population display, 3) to explore perceptions of population information displays, and 4) to explore the impact of population information display on cognitive outcomes. Methods We used the Veteran’s Affairs (VA) database to identify similar complex patients based on a similar complex patient case. Study outcomes measures were 1) preferences for population information display 2) time looking at the population display, 3) time to read the chart, and 4) appropriateness of plans with pre- and post-presentation of population data. Finally, we redesigned the population information display based on our findings from this study. Results The qualitative data analysis for preferences of population information display resulted in four themes: 1) trusting the big/population data can be an issue, 2) embedded analytics is necessary to explore patient similarities, 3) need for tools to control the view (overview, zoom and filter), and 4) different presentations of the population display can be beneficial to improve the display. We found that appropriateness of plans was at 60% for both groups (t9=-1.9; p=0.08), and overall time looking at the population information display was 2.3 minutes versus 3.6 minutes with experts processing information faster than non-experts (t8= -2.3, p=0.04). Conclusion A population database has great potential for reducing complexity and uncertainty in medicine to improve clinical care. The preferences identified for the population information display will guide future health information technology system designers for better and more intuitive display. PMID:27437065
Salemi, Jason L; Salinas-Miranda, Abraham A; Wilson, Roneé E; Salihu, Hamisu M
2015-01-01
Objective To describe the use of a clinically enhanced maternal and child health (MCH) database to strengthen community-engaged research activities, and to support the sustainability of data infrastructure initiatives. Data Sources/Study Setting Population-based, longitudinal database covering over 2.3 million mother–infant dyads during a 12-year period (1998–2009) in Florida. Setting: A community-based participatory research (CBPR) project in a socioeconomically disadvantaged community in central Tampa, Florida. Study Design Case study of the use of an enhanced state database for supporting CBPR activities. Principal Findings A federal data infrastructure award resulted in the creation of an MCH database in which over 92 percent of all birth certificate records for infants born between 1998 and 2009 were linked to maternal and infant hospital encounter-level data. The population-based, longitudinal database was used to supplement data collected from focus groups and community surveys with epidemiological and health care cost data on important MCH disparity issues in the target community. Data were used to facilitate a community-driven, decision-making process in which the most important priorities for intervention were identified. Conclusions Integrating statewide all-payer, hospital-based databases into CBPR can empower underserved communities with a reliable source of health data, and it can promote the sustainability of newly developed data systems. PMID:25879276
Pearson, Sallie-Anne; Schaffer, Andrea
2014-01-01
Introduction After medicines have been subsidised in Australia we know little about their use in routine clinical practice, impact on resource utilisation, effectiveness or safety. Routinely collected administrative health data are available to address these issues in large population-based pharmacoepidemiological studies. By bringing together cross-jurisdictional data collections that link drug exposure to real-world outcomes, this research programme aims to evaluate the use and impact of cancer medicines in a subset of elderly Australians in the real-world clinical setting. Methods and analysis This ongoing research programme involves a series of retrospective cohort studies of Australian Government Department of Veterans’ Affairs (DVA) clients. The study population includes 104 635 veterans who reside in New South Wales, Australia, and were aged 65 years and over as of 1 July 2004. We will investigate trends in cancer medicines use according to cancer type and other sociodemographic characteristics as well as predictors of the initiation of cancer medicines and other treatment modalities, survival and adverse outcomes among patients with cancer. The programme is underpinned by the linkage of eight health administrative databases under the custodianship of the DVA and the New South Wales Ministry of Health, including cancer notifications, medicines dispensing data, hospitalisation data and health services data. The cancer notifications database is available from 1994 with all other databases available from 2005 onwards. Ethics and dissemination Ethics approval has been granted by the DVA and New South Wales Population and Health Service Research Ethics Committees. Results Results will be reported in peer-reviewed publications, conference presentations and policy forums. The programme has high translational potential, providing invaluable evidence about cancer medicines in an elderly population who are under-represented in clinical trials. PMID:24793244
Application of Large-Scale Database-Based Online Modeling to Plant State Long-Term Estimation
NASA Astrophysics Data System (ADS)
Ogawa, Masatoshi; Ogai, Harutoshi
Recently, attention has been drawn to the local modeling techniques of a new idea called “Just-In-Time (JIT) modeling”. To apply “JIT modeling” to a large amount of database online, “Large-scale database-based Online Modeling (LOM)” has been proposed. LOM is a technique that makes the retrieval of neighboring data more efficient by using both “stepwise selection” and quantization. In order to predict the long-term state of the plant without using future data of manipulated variables, an Extended Sequential Prediction method of LOM (ESP-LOM) has been proposed. In this paper, the LOM and the ESP-LOM are introduced.
The Influence of Solid Rocket Motor Retro-Burns on the Space Debris Environment
NASA Astrophysics Data System (ADS)
Stabroth, S.; Homeister, M.; Oswald, M.; Wiedemann, C.; Klinkrad, H.; Vörsmann, P.
The ESA space debris population model MASTER Meteoroid and Space Debris Terrestrial Environment Reference considers firings of solid rocket motors SRM as a debris source with the associated generation of slag and dust particles The resulting slag and dust population is a major contribution to the sub-millimetre size debris environment in Earth orbit The current model version MASTER-2005 is based on the simulation of 1 076 orbital SRM firings which contributed to the long-term debris environment A comparison of the modelled flux with impact data from returned surfaces shows that the shape and quantity of the modelled SRM dust distribution matches that of recent Hubble Space Telescope HST solar array measurements very well However the absolute flux level for dust is under-predicted for some of the analysed Long Duration Exposure Facility LDEF surfaces This points into the direction of some past SRM firings not included in the current event database The most suitable candidates for these firings are the large number of SRM retro-burns of return capsules Objects released by those firings have highly eccentric orbits with perigees in the lower regions of the atmosphere Thus they produce no long-term effect on the debris environment However a large number of those firings during the on-orbit time frame of LDEF might lead to an increase of the dust population for some of the LDEF surfaces In this paper the influence of SRM retro-burns on the short- and long-term debris environment is analysed The existing firing database is updated with gathered
A Review of Stellar Abundance Databases and the Hypatia Catalog Database
NASA Astrophysics Data System (ADS)
Hinkel, Natalie Rose
2018-01-01
The astronomical community is interested in elements from lithium to thorium, from solar twins to peculiarities of stellar evolution, because they give insight into different regimes of star formation and evolution. However, while some trends between elements and other stellar or planetary properties are well known, many other trends are not as obvious and are a point of conflict. For example, stars that host giant planets are found to be consistently enriched in iron, but the same cannot be definitively said for any other element. Therefore, it is time to take advantage of large stellar abundance databases in order to better understand not only the large-scale patterns, but also the more subtle, small-scale trends within the data.In this overview to the special session, I will present a review of large stellar abundance databases that are both currently available (i.e. RAVE, APOGEE) and those that will soon be online (i.e. Gaia-ESO, GALAH). Additionally, I will discuss the Hypatia Catalog Database (www.hypatiacatalog.com) -- which includes abundances from individual literature sources that observed stars within 150pc. The Hypatia Catalog currently contains 72 elements as measured within ~6000 stars, with a total of ~240,000 unique abundance determinations. The online database offers a variety of solar normalizations, stellar properties, and planetary properties (where applicable) that can all be viewed through multiple interactive plotting interfaces as well as in a tabular format. By analyzing stellar abundances for large populations of stars and from a variety of different perspectives, a wealth of information can be revealed on both large and small scales.
Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H
2017-05-01
A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatic developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database ranges from 72% to 95%, and that for the inter-database is from 47% to 82%. The results conclude that conventional speech features are not correlated with voice, and hence are not reliable in pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Learning about Severe Combined Immunodeficiency (SCID)
... Genomics Regulation of Genetic Tests Statute and Legislation Database Newsroom Calendar of Events Current News Releases Image ... Release: February 22, 2005 X-linked SCID mutation database (IL2RGbase) On Other Sites: Development of population-based ...
Holland, Katherine D; Bouley, Thomas M; Horn, Paul S
2017-07-01
Variants in neuronal voltage-gated sodium channel α-subunits genes SCN1A, SCN2A, and SCN8A are common in early onset epileptic encephalopathies and other autosomal dominant childhood epilepsy syndromes. However, in clinical practice, missense variants are often classified as variants of uncertain significance when missense variants are identified but heritability cannot be determined. Genetic testing reports often include results of computational tests to estimate pathogenicity and the frequency of that variant in population-based databases. The objective of this work was to enhance clinicians' understanding of results by (1) determining how effectively computational algorithms predict epileptogenicity of sodium channel (SCN) missense variants; (2) optimizing their predictive capabilities; and (3) determining if epilepsy-associated SCN variants are present in population-based databases. This will help clinicians better understand the results of indeterminate SCN test results in people with epilepsy. Pathogenic, likely pathogenic, and benign variants in SCNs were identified using databases of sodium channel variants. Benign variants were also identified from population-based databases. Eight algorithms commonly used to predict pathogenicity were compared. In addition, logistic regression was used to determine if a combination of algorithms could better predict pathogenicity. Based on American College of Medical Genetic Criteria, 440 variants were classified as pathogenic or likely pathogenic and 84 were classified as benign or likely benign. Twenty-eight variants previously associated with epilepsy were present in population-based gene databases. The output provided by most computational algorithms had a high sensitivity but low specificity with an accuracy of 0.52-0.77. Accuracy could be improved by adjusting the threshold for pathogenicity. Using this adjustment, the Mendelian Clinically Applicable Pathogenicity (M-CAP) algorithm had an accuracy of 0.90 and a combination of algorithms increased the accuracy to 0.92. Potentially pathogenic variants are present in population-based sources. Most computational algorithms overestimate pathogenicity; however, a weighted combination of several algorithms increased classification accuracy to >0.90. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
Olsen, Margaret A; Young-Xu, Yinong; Stwalley, Dustin; Kelly, Ciarán P; Gerding, Dale N; Saeed, Mohammed J; Mahé, Cedric; Dubberke, Erik R
2016-04-22
Many administrative data sources are available to study the epidemiology of infectious diseases, including Clostridium difficile infection (CDI), but few publications have compared CDI event rates across databases using similar methodology. We used comparable methods with multiple administrative databases to compare the incidence of CDI in older and younger persons in the United States. We performed a retrospective study using three longitudinal data sources (Medicare, OptumInsight LabRx, and Healthcare Cost and Utilization Project State Inpatient Database (SID)), and two hospital encounter-level data sources (Nationwide Inpatient Sample (NIS) and Premier Perspective database) to identify CDI in adults aged 18 and older with calculation of CDI incidence rates/100,000 person-years of observation (pyo) and CDI categorization (onset and association). The incidence of CDI ranged from 66/100,000 in persons under 65 years (LabRx), 383/100,000 in elderly persons (SID), and 677/100,000 in elderly persons (Medicare). Ninety percent of CDI episodes in the LabRx population were characterized as community-onset compared to 41 % in the Medicare population. The majority of CDI episodes in the Medicare and LabRx databases were identified based on only a CDI diagnosis, whereas almost ¾ of encounters coded for CDI in the Premier hospital data were confirmed with a positive test result plus treatment with metronidazole or oral vancomycin. Using only the Medicare inpatient data to calculate encounter-level CDI events resulted in 553 CDI events/100,000 persons, virtually the same as the encounter proportion calculated using the NIS (544/100,000 persons). We found that the incidence of CDI was 35 % higher in the Medicare data and fewer episodes were attributed to hospital acquisition when all medical claims were used to identify CDI, compared to only inpatient data lacking information on diagnosis and treatment in the outpatient setting. The incidence of CDI was 10-fold lower and the proportion of community-onset CDI was much higher in the privately insured younger LabRx population compared to the elderly Medicare population. The methods we developed to identify incident CDI can be used by other investigators to study the incidence of other infectious diseases and adverse events using large generalizable administrative datasets.
NASA Astrophysics Data System (ADS)
Kutzleb, C. D.
1997-02-01
The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.
Population-based programs for increasing colorectal cancer screening in the United States.
Verma, Manisha; Sarfaty, Mona; Brooks, Durado; Wender, Richard C
2015-01-01
Answer questions and earn CME/CNE Screening to detect polyps or cancer at an early stage has been shown to produce better outcomes in colorectal cancer (CRC). Programs with a population-based approach can reach a large majority of the eligible population and can offer cost-effective interventions with the potential benefit of maximizing early cancer detection and prevention using a complete follow-up plan. The purpose of this review was to summarize the key features of population-based programs to increase CRC screening in the United States. A search was conducted in the SCOPUS, OvidSP, and PubMed databases. The authors selected published reports of population-based programs that met at least 5 of the 6 International Agency for Research on Cancer (IARC) criteria for cancer prevention and were known to the National Colorectal Cancer Roundtable. Interventions at the level of individual practices were not included in this review. IARC cancer prevention criteria served as a framework to assess the effective processes and elements of a population-based program. Eight programs were included in this review. Half of the programs met all IARC criteria, and all programs led to improvements in screening rates. The rate of colonoscopy after a positive stool test was heterogeneous among programs. Different population-based strategies were used to promote these screening programs, including system-based, provider-based, patient-based, and media-based strategies. Treatment of identified cancer cases was not included explicitly in 4 programs but was offered through routine medical care. Evidence-based methods for promoting CRC screening at a population level can guide the development of future approaches in health care prevention. The key elements of a successful population-based approach include adherence to the 6 IARC criteria and 4 additional elements (an identified external funding source, a structured policy for positive fecal occult blood test results and confirmed cancer cases, outreach activities for recruitment and patient education, and an established rescreening process). © 2015 American Cancer Society.
CCDB: a curated database of genes involved in cervix cancer.
Agarwal, Subhash M; Raghav, Dhwani; Singh, Harinder; Raghava, G P S
2011-01-01
The Cervical Cancer gene DataBase (CCDB, http://crdd.osdd.net/raghava/ccdb) is a manually curated catalog of experimentally validated genes that are thought, or are known to be involved in the different stages of cervical carcinogenesis. In spite of the large women population that is presently affected from this malignancy still at present, no database exists that catalogs information on genes associated with cervical cancer. Therefore, we have compiled 537 genes in CCDB that are linked with cervical cancer causation processes such as methylation, gene amplification, mutation, polymorphism and change in expression level, as evident from published literature. Each record contains details related to gene like architecture (exon-intron structure), location, function, sequences (mRNA/CDS/protein), ontology, interacting partners, homology to other eukaryotic genomes, structure and links to other public databases, thus augmenting CCDB with external data. Also, manually curated literature references have been provided to support the inclusion of the gene in the database and establish its association with cervix cancer. In addition, CCDB provides information on microRNA altered in cervical cancer as well as search facility for querying, several browse options and an online tool for sequence similarity search, thereby providing researchers with easy access to the latest information on genes involved in cervix cancer.
Monitoring Earth's reservoir and lake dynamics from space
NASA Astrophysics Data System (ADS)
Donchyts, G.; Eilander, D.; Schellekens, J.; Winsemius, H.; Gorelick, N.; Erickson, T.; Van De Giesen, N.
2016-12-01
Reservoirs and lakes constitute about 90% of the Earth's fresh surface water. They play a major role in the water cycle and are critical for the ever increasing demands of the world's growing population. Water from reservoirs is used for agricultural, industrial, domestic, and other purposes. Current digital databases of lakes and reservoirs are scarce, mainly providing only descriptive and static properties of the reservoirs. The Global Reservoir and Dam (GRanD) database contains almost 7000 entries while OpenStreetMap counts more than 500 000 entries tagged as a reservoir. In the last decade several research efforts already focused on accurate estimates of surface water dynamics, mainly using satellite altimetry, However, currently they are limited only to less than 1000 (mostly large) water bodies. Our approach is based on three main components. Firstly, a novel method, allowing automated and accurate estimation of surface area from (partially) cloud-free optical multispectral or radar satellite imagery. The algorithm uses satellite imagery measured by Landsat, Sentinel and MODIS missions. Secondly, a database to store reservoir static and dynamic parameters. Thirdly, a web-based tool, built on top of Google Earth Engine infrastructure. The tool allows estimation of surface area for lakes and reservoirs at planetary-scale at high spatial and temporal resolution. A prototype version of the method, database, and tool will be presented as well as validation using in-situ measurements.
Relationships between human population density and burned area at continental and global scales.
Bistinas, Ioannis; Oom, Duarte; Sá, Ana C L; Harrison, Sandy P; Prentice, I Colin; Pereira, José M C
2013-01-01
We explore the large spatial variation in the relationship between population density and burned area, using continental-scale Geographically Weighted Regression (GWR) based on 13 years of satellite-derived burned area maps from the global fire emissions database (GFED) and the human population density from the gridded population of the world (GPW 2005). Significant relationships are observed over 51.5% of the global land area, and the area affected varies from continent to continent: population density has a significant impact on fire over most of Asia and Africa but is important in explaining fire over < 22% of Europe and Australia. Increasing population density is associated with both increased and decreased in fire. The nature of the relationship depends on land-use: increasing population density is associated with increased burned are in rangelands but with decreased burned area in croplands. Overall, the relationship between population density and burned area is non-monotonic: burned area initially increases with population density and then decreases when population density exceeds a threshold. These thresholds vary regionally. Our study contributes to improved understanding of how human activities relate to burned area, and should contribute to a better estimate of atmospheric emissions from biomass burning.
Relationships between Human Population Density and Burned Area at Continental and Global Scales
Bistinas, Ioannis; Oom, Duarte; Sá, Ana C. L.; Harrison, Sandy P.; Prentice, I. Colin; Pereira, José M. C.
2013-01-01
We explore the large spatial variation in the relationship between population density and burned area, using continental-scale Geographically Weighted Regression (GWR) based on 13 years of satellite-derived burned area maps from the global fire emissions database (GFED) and the human population density from the gridded population of the world (GPW 2005). Significant relationships are observed over 51.5% of the global land area, and the area affected varies from continent to continent: population density has a significant impact on fire over most of Asia and Africa but is important in explaining fire over < 22% of Europe and Australia. Increasing population density is associated with both increased and decreased in fire. The nature of the relationship depends on land-use: increasing population density is associated with increased burned are in rangelands but with decreased burned area in croplands. Overall, the relationship between population density and burned area is non-monotonic: burned area initially increases with population density and then decreases when population density exceeds a threshold. These thresholds vary regionally. Our study contributes to improved understanding of how human activities relate to burned area, and should contribute to a better estimate of atmospheric emissions from biomass burning. PMID:24358108
Montinaro, Francesco; Boschi, Ilaria; Trombetta, Federica; Merigioli, Sara; Anagnostou, Paolo; Battaggia, Cinzia; Capocasa, Marco; Crivellaro, Federica; Destro Bisol, Giovanni; Coia, Valentina
2012-12-01
The study of geographically and/or linguistically isolated populations could represent a potential area of interaction between population and forensic genetics. These investigations may be useful to evaluate the suitability of loci which have been selected using forensic criteria for bio-anthropological studies. At the same time, they give us an opportunity to evaluate the efficiency of forensic tools for parentage testing in groups with peculiar allele frequency profiles. Within the frame of a long-term project concerning Italian linguistic isolates, we studied 15 microsatellite loci (Identifiler kit) comprising the CODIS panel in 11 populations from the north-eastern Italian Alps (Veneto, Trentino and Friuli Venezia Giulia regions). All our analyses of inter-population differentiation highlight the genetic distinctiveness of most Alpine populations comparing them either to each other or with large and non-isolated Italian populations. Interestingly, we brought to light some aspects of population genetic structure which cannot be detected using unilinear polymorphisms. In fact, the analysis of genotypic disequilibrium between loci detected signals of population substructure when all the individuals of Alpine populations are pooled in a single group. Furthermore, despite the relatively low number of loci analyzed, genetic differentiation among Alpine populations was detected at individual level using a Bayesian method to cluster multilocus genotypes. Among the various populations studied, the four linguistic minorities (Fassa Valley, Luserna, Sappada and Sauris) showed the most pronounced diversity and signatures of a peculiar genetic ancestry. Finally, we show that database replacement may affect estimates of probability of paternity even when the local database is replaced by another based on populations which share a common genetic background but which differ in their demographic history. These findings point to the importance of considering the demographic and cultural profile of populations in forensic applications, even in a context of substantial genetic homogeneity such as that of European populations. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Iris indexing based on local intensity order pattern
NASA Astrophysics Data System (ADS)
Emerich, Simina; Malutan, Raul; Crisan, Septimiu; Lefkovits, Laszlo
2017-03-01
In recent years, iris biometric systems have increased in popularity and have been proven that are capable of handling large-scale databases. The main advantage of these systems is accuracy and reliability. A proper iris patterns classification is expected to reduce the matching time in huge databases. This paper presents an iris indexing technique based on Local Intensity Order Pattern. The performance of the present approach is evaluated on UPOL database and is compared with other recent systems designed for iris indexing. The results illustrate the potential of the proposed method for large scale iris identification.
Utilization of Large Data Sets in Maternal Health in Finland: A Case for Global Health Research.
Lamminpää, Reeta; Gissler, Mika; Vehviläinen-Julkunen, Katri
In recent years, the use of large data sets, such as electronic health records, has increased. These large data sets are often referred to as "Big Data," which have various definitions. The purpose of this article was to summarize and review the utilization, strengths, and challenges of register data, which means a written record containing regular entries of items or details, and Big Data, especially in maternal nursing, using 4 examples of studies from the Finnish Medical Birth Register data and relate these to other international databases and data sets. Using large health register data is crucial when studying and understanding outcomes of maternity care. This type of data enables comparisons on a population level and can be utilized in research related to maternal health, with important issues and implications for future research and clinical practice. Although there are challenges connected with register data and Big Data, these large data sets offer the opportunity for timely insight into population-based information on relevant research topics in maternal health. Nurse researchers need to understand the possibilities and limitations of using existing register data in maternity research. Maternal child nurse researchers can be leaders of the movement to utilize Big Data to improve global maternal health.
Knick, Steven T.; Schueck, Linda
2002-01-01
The Snake River Field Station of the Forest and Rangeland Ecosystem Science Center has developed and now maintains a database of the spatial information needed to address management of sage grouse and sagebrush steppe habitats in the western United States. The SAGEMAP project identifies and collects infor-mation for the region encompassing the historical extent of sage grouse distribution. State and federal agencies, the primary entities responsible for managing sage grouse and their habitats, need the information to develop an objective assessment of the current status of sage grouse populations and their habitats, or to provide responses and recommendations for recovery if sage grouse are listed as a Threatened or Endangered Species. The spatial data on the SAGEMAP website (http://SAGEMAP.wr.usgs.gov) are an important component in documenting current habitat and other environmental conditions. In addition, the data can be used to identify areas that have undergone significant changes in land cover and to determine underlying causes. As such, the database permits an analysis for large-scale and range-wide factors that may be causing declines of sage grouse populations. The spatial data contained on this site also will be a critical component guiding the decision processes for restoration of habitats in the Great Basin. Therefore, development of this database and the capability to disseminate the information carries multiple benefits for land and wildlife management.
Large Scale Landslide Database System Established for the Reservoirs in Southern Taiwan
NASA Astrophysics Data System (ADS)
Tsai, Tsai-Tsung; Tsai, Kuang-Jung; Shieh, Chjeng-Lun
2017-04-01
Typhoon Morakot seriously attack southern Taiwan awaken the public awareness of large scale landslide disasters. Large scale landslide disasters produce large quantity of sediment due to negative effects on the operating functions of reservoirs. In order to reduce the risk of these disasters within the study area, the establishment of a database for hazard mitigation / disaster prevention is necessary. Real time data and numerous archives of engineering data, environment information, photo, and video, will not only help people make appropriate decisions, but also bring the biggest concern for people to process and value added. The study tried to define some basic data formats / standards from collected various types of data about these reservoirs and then provide a management platform based on these formats / standards. Meanwhile, in order to satisfy the practicality and convenience, the large scale landslide disasters database system is built both provide and receive information abilities, which user can use this large scale landslide disasters database system on different type of devices. IT technology progressed extreme quick, the most modern system might be out of date anytime. In order to provide long term service, the system reserved the possibility of user define data format /standard and user define system structure. The system established by this study was based on HTML5 standard language, and use the responsive web design technology. This will make user can easily handle and develop this large scale landslide disasters database system.
Identifying work-related motor vehicle crashes in multiple databases.
Thomas, Andrea M; Thygerson, Steven M; Merrill, Ray M; Cook, Lawrence J
2012-01-01
To compare and estimate the magnitude of work-related motor vehicle crashes in Utah using 2 probabilistically linked statewide databases. Data from 2006 and 2007 motor vehicle crash and hospital databases were joined through probabilistic linkage. Summary statistics and capture-recapture were used to describe occupants injured in work-related motor vehicle crashes and estimate the size of this population. There were 1597 occupants in the motor vehicle crash database and 1673 patients in the hospital database identified as being in a work-related motor vehicle crash. We identified 1443 occupants with at least one record from either the motor vehicle crash or hospital database indicating work-relatedness that linked to any record in the opposing database. We found that 38.7 percent of occupants injured in work-related motor vehicle crashes identified in the motor vehicle crash database did not have a primary payer code of workers' compensation in the hospital database and 40.0 percent of patients injured in work-related motor vehicle crashes identified in the hospital database did not meet our definition of a work-related motor vehicle crash in the motor vehicle crash database. Depending on how occupants injured in work-related motor crashes are identified, we estimate the population to be between 1852 and 8492 in Utah for the years 2006 and 2007. Research on single databases may lead to biased interpretations of work-related motor vehicle crashes. Combining 2 population based databases may still result in an underestimate of the magnitude of work-related motor vehicle crashes. Improved coding of work-related incidents is needed in current databases.
MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.
Grimes, Susan M; Ji, Hanlee P
2014-08-27
Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system is referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.
US geographic distribution of rt-PA utilization by hospital for acute ischemic stroke.
Kleindorfer, Dawn; Xu, Yingying; Moomaw, Charles J; Khatri, Pooja; Adeoye, Opeolu; Hornung, Richard
2009-11-01
Previously, we have estimated US national rates of recombinant tissue plasminogen activator (rt-PA) use to be 1.8% to 3.0% of all ischemic stroke patients. However, we hypothesized that the rate of rt-PA use may vary widely depending on regional variation, and that a large percentage of the US population likely does not have access to hospitals using rt-PA regularly. We describe the US geographic distribution of hospitals using rt-PA for acute ischemic stroke. This analysis used the MEDPAR database, which is a claims-based dataset that contains every fee-for-service Medicare-eligible hospital discharge in the US. Cases potentially eligible for rt-PA treatment based on diagnosis were defined as those with a hospital DRG code of 14, 15, or 559, and that also had an ICD-9 code of 433, 434, or 436. Thrombolysis use was defined as an ICD-9 code of 99.1. Study interval was July 1, 2005 to June 30, 2007. Hospital locations were mapped using ArcView software; population densities and regions of the US are based on US Census 2000. There were 4750 hospitals in the MEDPAR database, which included 495 186 ischemic stroke admissions during the study period. Of these hospitals, 64% had no reported treatments with rt-PA for ischemic stroke, and 0.9% reported >10% treatment rates within the MEDPAR dataset. Bed size, rural or underserved designation, and population density were significantly associated with reported rt-PA treatment rates, and remained significant in the multivariable regression. Approximately 162 million US citizens reside in counties containing a hospital reporting a >or=2.4% treatment rate within the MEDPAR dataset. We report the first description of US hospital rt-PA treatment rates by hospital. Unfortunately, we found that 64% of US hospitals did not report giving rt-PA at all within the MEDPAR database within a 2-year period. These tended to be hospitals that were smaller (average bed size of 95), located in less densely populated areas, or located in the South or Midwest. In addition, 40% of the US population resides in counties without a hospital that administered rt-PA to at least 2.4% of ischemic stroke patients, although distinguishing transferred patients is problematic within administrative datasets. Such national-based resource-utilization data is important for planning at the local and national level, especially for such initiatives as telemedicine, to reach underserved areas.
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of "big data" has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on "adherence to prescription and medical plans" identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most of already available databases.
Sagace: A web-based search engine for biomedical databases in Japan
2012-01-01
Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816
The database design of LAMOST based on MYSQL/LINUX
NASA Astrophysics Data System (ADS)
Li, Hui-Xian, Sang, Jian; Wang, Sha; Luo, A.-Li
2006-03-01
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) will be set up in the coming years. A fully automated software system for reducing and analyzing the spectra has to be developed with the telescope. This database system is an important part of the software system. The requirements for the database of the LAMOST, the design of the LAMOST database system based on MYSQL/LINUX and performance tests of this system are described in this paper.
2012-01-01
Background In the scientific biodiversity community, it is increasingly perceived the need to build a bridge between molecular and traditional biodiversity studies. We believe that the information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data we can find in bioinformatics public databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes in an ontological way and locally stores the sequence and annotation data contained in the GenBank primary database. Methods The GIDL architecture consists of a relational database and of an intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and it is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired to an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is an Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed by three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL Semantic Web technologies have been adopted due to their advantages in data representation, integration and processing. Results and conclusions Entries coming from Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel.180 have been loaded in the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows a more powerful query expressiveness compared with the most commonly used sequence retrieval systems like Entrez or SRS. PMID:22536971
Jiang, Z; Dou, Z; Yan, Z H; Song, W L; Chen, Y; Ren, X L; Chen, J; Cao, W; Xu, J; Wu, Z Y
2017-09-10
Objective: To analyze the effect of missing data in population based viral load (PVL) survey in HIV infected men who have sex with men (MSM) sampled in 16 cities in China. Methods: The database of 3 virus load sampling survey conducted consecutively in HIV infected MSM population in 16 large cities (Beijing, Shanghai, Nanjing, Hangzhou, Wuhan, Chongqing, Kunming, Xi'an, Guangzhou, Shenzhen, Nanning, Urumuqi, Harbin, Changchun, Chengdu and Tianjin) during 2013-2015 was used. SPSS 17.0 software was used to describe distribution of the missing data and analyze associated factors. Results: A total of 12 150 HIV infected MSM were randomly selected for the surveys, in whom, 9 141 (75.2 % ) received virus load tests, while 3 009 (24.8 % ) received no virus load tests, whose virus load data missed. The virus load data missing rates in MSM with or without access to antiretroviral therapy (ART) were 11.5 % (765/6 675) and 39.4 % (2 060/5 223) respectively, and the virus load data missing rates were 21.9 % (1 866/8 523) and 28.4 % (959/3 374), respectively, in local residents and non-local residents (migrants). Conclusions: The analysis indicated that the data missing occurred in the virus load survey in HIV infected MSM population. ART status and census registering status were the main influencing factors. Data missing could influence the accurate evaluation of community viral load (CVL) and population viral load(PVL) levels in HIV infected MSM in China.
Korean Variant Archive (KOVA): a reference database of genetic variations in the Korean population.
Lee, Sangmoon; Seo, Jihae; Park, Jinman; Nam, Jae-Yong; Choi, Ahyoung; Ignatius, Jason S; Bjornson, Robert D; Chae, Jong-Hee; Jang, In-Jin; Lee, Sanghyuk; Park, Woong-Yang; Baek, Daehyun; Choi, Murim
2017-06-27
Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descendants has resulted in unintended reduction of power in studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred 22.8% increased variant filtering power in addition to Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variant supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.
Kim, Hyun Soo
2018-01-01
Aged population is increasing worldwide due to the aging process that is inevitable. Accordingly, longevity and healthy aging have been spotlighted to promote social contribution of aged population. Many studies in the past few decades have reported the process of aging and longevity, emphasizing the importance of maintaining genomic stability in exceptionally long-lived population. Underlying reason of longevity remains unclear due to its complexity involving multiple factors. With advances in sequencing technology and human genome-associated approaches, studies based on population-based genomic studies are increasing. In this review, we summarize recent longevity and healthy aging studies of human population focusing on DNA repair as a major factor in maintaining genome integrity. To keep pace with recent growth in genomic research, aging- and longevity-associated genomic databases are also briefly introduced. To suggest novel approaches to investigate longevity-associated genetic variants related to DNA repair using genomic databases, gene set analysis was conducted, focusing on DNA repair- and longevity-associated genes. Their biological networks were additionally analyzed to grasp major factors containing genetic variants of human longevity and healthy aging in DNA repair mechanisms. In summary, this review emphasizes DNA repair activity in human longevity and suggests approach to conduct DNA repair-associated genomic study on human healthy aging.
The functional spectrum of low-frequency coding variation.
Marth, Gabor T; Yu, Fuli; Indap, Amit R; Garimella, Kiran; Gravel, Simon; Leong, Wen Fung; Tyler-Smith, Chris; Bainbridge, Matthew; Blackwell, Tom; Zheng-Bradley, Xiangqun; Chen, Yuan; Challis, Danny; Clarke, Laura; Ball, Edward V; Cibulskis, Kristian; Cooper, David N; Fulton, Bob; Hartl, Chris; Koboldt, Dan; Muzny, Donna; Smith, Richard; Sougnez, Carrie; Stewart, Chip; Ward, Alistair; Yu, Jin; Xue, Yali; Altshuler, David; Bustamante, Carlos D; Clark, Andrew G; Daly, Mark; DePristo, Mark; Flicek, Paul; Gabriel, Stacey; Mardis, Elaine; Palotie, Aarno; Gibbs, Richard
2011-09-14
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
Cigarette Smoking and the Risk of Barrett's Esophagus
Kubo, Ai; Levin, T.R.; Block, Gladys; Rumore, Gregory; Quesenberry, Charles P.; Buffler, Patricia; Corley, Douglas A.
2008-01-01
Introduction We examined the association between smoking and the risk of Barrett's esophagus (BE), a metaplastic precursor to esophageal adenocarcinoma. Methods We conducted a case-control study within the Kaiser Permanente Northern California population. Patients with a new diagnosis of BE (n=320) were matched to persons with gastroesophageal reflux disease (GERD) (n=316) and to population controls (n=317). Information was collected using validated questionnaires from direct in-person interviews and electronic databases. Analyses used multivariate unconditional logistic regression that controlled for age, gender, race and education. Results Ever smoking status, smoking intensity (pack-years), and smoking cessation were not associated with the risk of BE. Stratified analyses suggested that ever smoking may be associated with an increased risk of BE among some groups (compared to population controls): persons with long-segment Barrett's esophagus (odds ratio [OR]=1.72, 95% confidence interval [CI] 1.12-2.63); subjects without GERD symptoms (OR=3.98, 95% CI 1.58-10.0); obese subjects (OR=3.38, 95%CI 1.46-7.82); and persons with a large abdominal circumference (OR=3.02, 95%CI (1.18-2.75)). Conclusion Smoking was not a strong or consistent risk factor for BE in a large community-based study, although associations may be present in some population subgroups. PMID:18853262
Cyclic subway networks are less risky in metropolises
NASA Astrophysics Data System (ADS)
Xiao, Ying; Zhang, Hai-Tao; Xu, Bowen; Zhu, Tao; Chen, Guanrong; Chen, Duxin
2018-02-01
Subways are crucial in modern transportation systems of metropolises. To quantitatively evaluate the potential risks of subway networks suffered from natural disasters or deliberate attacks, real data from seven Chinese subway systems are collected and their population distributions and anti-risk capabilities are analyzed. Counterintuitively, it is found that transfer stations with large numbers of connections are not the most crucial, but the stations and lines with large betweenness centrality are essential, if subway networks are being attacked. It is also found that cycles reduce such correlations due to the existence of alternative paths. To simulate the data-based observations, a network model is proposed to characterize the dynamics of subway systems under various intensities of attacks on stations and lines. This study sheds some light onto risk assessment of subway networks in metropolitan cities.
[Benefits of large healthcare databases for drug risk research].
Garbe, Edeltraut; Pigeot, Iris
2015-08-01
Large electronic healthcare databases have become an important worldwide data resource for drug safety research after approval. Signal generation methods and drug safety studies based on these data facilitate the prospective monitoring of drug safety after approval, as has been recently required by EU law and the German Medicines Act. Despite its large size, a single healthcare database may include insufficient patients for the study of a very small number of drug-exposed patients or the investigation of very rare drug risks. For that reason, in the United States, efforts have been made to work on models that provide the linkage of data from different electronic healthcare databases for monitoring the safety of medicines after authorization in (i) the Sentinel Initiative and (ii) the Observational Medical Outcomes Partnership (OMOP). In July 2014, the pilot project Mini-Sentinel included a total of 178 million people from 18 different US databases. The merging of the data is based on a distributed data network with a common data model. In the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCEPP) there has been no comparable merging of data from different databases; however, first experiences have been gained in various EU drug safety projects. In Germany, the data of the statutory health insurance providers constitute the most important resource for establishing a large healthcare database. Their use for this purpose has so far been severely restricted by the Code of Social Law (Section 75, Book 10). Therefore, a reform of this section is absolutely necessary.
Miller, P Elliott; Martin, Seth S; Toth, Peter P; Santos, Raul D; Blaha, Michael J; Nasir, Khurram; Virani, Salim S; Post, Wendy S; Blumenthal, Roger S; Jones, Steven R
2015-01-01
Familial hypercholesterolemia (FH) is an autosomal dominant dyslipidemia characterized by defective low-density lipoprotein (LDL) clearance. The aim of this study was to compare Friedewald-estimated LDL cholesterol (LDL-C) to biologic LDL-C in individuals screening positive for FH and then further characterize FH phenotypes. We studied 1,320,581 individuals from the Very Large Database of Lipids, referred from 2009 to 2011 for Vertical Auto Profile ultracentrifugation testing. Friedewald LDL-C was defined as the cholesterol content of LDL-C, intermediate-density lipoprotein cholesterol, and lipoprotein(a) cholesterol (Lp(a)-C), with LDL-C representing biologic LDL-C. Using Friedewald LDL-C, we phenotypically categorized patients by the National Lipid Association guideline age-based screening thresholds for FH. In those meeting criteria, we categorized patients using population percentile-equivalent biologic LDL-C cutpoints and explored Lp(a)-C and remnant lipoprotein cholesterol (RLP-C) levels. Overall, 3829 patients met phenotypic criteria for FH by Friedewald LDL-C screening (FH+). Of those screening FH+, 78.8% were above and 21.2% were below the population percentile-equivalent biologic LDL-C. The mean difference in Friedewald biologic LDL-C percentiles was -0.01 (standard deviation, 0.17) for those above, and 1.92 (standard deviation, 9.16) for those below, respectively. Over 1 of 3 were found to have an elevated Lp(a)-C and over 50% had RLP-C greater than 95th percentile of the entire VLDL population. Of those who screened FH+, Friedewald and biologic LDL-C levels were closely correlated. Large proportions of the FH+ group had excess levels of Lp(a)-C and RLP-C. Future studies are warranted to study these mixed phenotypic groups and determine the role for further risk stratification and treatment algorithms. Copyright © 2015 National Lipid Association. Published by Elsevier Inc. All rights reserved.
Robinson, William P
2017-12-01
Ruptured abdominal aortic aneurysm is one of the most difficult clinical problems in surgical practice, with extraordinarily high morbidity and mortality. During the past 23 years, the literature has become replete with reports regarding ruptured endovascular aneurysm repair. A variety of study designs and databases have been utilized to compare ruptured endovascular aneurysm repair and open surgical repair for ruptured abdominal aortic aneurysm and studies of various designs from different databases have yielded vastly different conclusions. It therefore remains controversial whether ruptured endovascular aneurysm repair improves outcomes after ruptured abdominal aortic aneurysm in comparison to open surgical repair. The purpose of this article is to review the best available evidence comparing ruptured endovascular aneurysm repair and open surgical repair of ruptured abdominal aortic aneurysm, including single institution and multi-institutional retrospective observational studies, large national population-based studies, large national registries of prospectively collected data, and randomized controlled clinical trials. This article will analyze the study designs and databases utilized with their attendant strengths and weaknesses to understand the sometimes vastly different conclusions the studies have reached. This article will attempt to integrate the data to distill some of the lessons that have been learned regarding ruptured endovascular aneurysm repair and identify ongoing needs in this field. Copyright © 2017 Elsevier Inc. All rights reserved.
Zedler, Barbara K; Saunders, William B; Joyce, Andrew R; Vick, Catherine C; Murrelle, E Lenn
2018-01-01
Abstract Objective To validate a risk index that estimates the likelihood of overdose or serious opioid-induced respiratory depression (OIRD) among medical users of prescription opioids. Subjects and Methods A case-control analysis of 18,365,497 patients with an opioid prescription from 2009 to 2013 in the IMS PharMetrics Plus commercially insured health plan claims database (CIP). An OIRD event occurred in 7,234 cases. Four controls were selected per case. Validity of the Risk Index for Overdose or Serious Opioid-induced Respiratory Depression (RIOSORD), developed previously using Veterans Health Administration (VHA) patient data, was assessed. Multivariable logistic regression was used within the CIP study population to develop a slightly refined RIOSORD. The composition and performance of the CIP-based RIOSORD was evaluated and compared with VHA-based RIOSORD. Results VHA-RIOSORD performed well in discriminating OIRD events in CIP (C-statistic = 0.85). Additionally, re-estimation of logistic model coefficients in CIP yielded a 0.90 C-statistic. The resulting comorbidity and pharmacotherapy variables most highly associated with OIRD and retained in the CIP-RIOSORD were largely concordant with VHA-RIOSORD. These variables included neuropsychiatric and cardiopulmonary disorders, impaired drug excretion, opioid characteristics, and concurrent psychoactive medications. The average predicted probability of OIRD ranged from 2% to 83%, with excellent agreement between predicted and observed incidence across risk classes. Conclusions RIOSORD had excellent predictive accuracy in a large population of US medical users of prescription opioids, similar to its performance in VHA. This practical risk index is designed to support clinical decision-making for safer opioid prescribing, and its clinical utility should be evaluated prospectively. PMID:28340046
Hakenberg, Jörg; Cheng, Wei-Yi; Thomas, Philippe; Wang, Ying-Chih; Uzilov, Andrew V; Chen, Rong
2016-01-08
Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease-associations, genetic burden, somatic variants, and disease mechanisms. We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples. RVS facilitates cross-study analysis to discover novel genetic risk factors, gene-disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization. A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.
Young, Katherine
2014-09-30
database.) In fiscal year 2015, NREL is working with universities to populate additional case studies on OpenEI. The goal is to provide a large enough dataset to start conducting analyses of exploration programs to identify correlations between successful exploration plans for areas with similar geologic occurrence models.
Patterns of cannabis use in patients with Inflammatory Bowel Disease: A population based analysis.
Weiss, Alexandra; Friedenberg, Frank
2015-11-01
Tobacco use patterns and effects in patients with Inflammatory Bowel Disease have been extensively studied, however the role and patterns of cannabis use remains poorly defined. Our aim was to evaluate patterns of marijuana use in a large population based survey. Cases were identified from the NHANES database from the National Center for Health Statistics for the time period from January, 2009 through December, 2010 as having ulcerative colitis or Crohn's disease, and exact matched with controls using the Propensity Score Module of SPSS, based on age, gender, and sample weighted using the nearest neighbor method. After weighting, 2084,895 subjects with IBD and 2013,901 control subjects were identified with no significant differences in demographic characteristics. Subjects with IBD had a higher incidence of ever having used marijuana/hashish (M/H) (67.3% vs. 60.0%) and an earlier age of onset of M/H use (15.7 years vs. 19.6 years). Patients with IBD were less likely to have used M/H every month for a year, but more likely to use a heavier amount per day (64.9% subjects with IBD used three or more joints per day vs. 80.5% of subjects without IBD used two or fewer joints per day). In multivariable logistic regression, presence of IBD, male gender, and age over 40 years predicted M/H use. Our study is the first to evaluate marijuana patterns in a large-scale population based survey. Older, male IBD patients have the highest odds of marijuana use. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Conservation status of polar bears (Ursus maritimus) in relation to projected sea-ice declines
NASA Astrophysics Data System (ADS)
Laidre, K. L.; Regehr, E. V.; Akcakaya, H. R.; Amstrup, S. C.; Atwood, T.; Lunn, N.; Obbard, M.; Stern, H. L., III; Thiemann, G.; Wiig, O.
2016-12-01
Loss of Arctic sea ice due to climate change is the most serious threat to polar bears (Ursus maritimus) throughout their circumpolar range. We performed a data-based sensitivity analysis with respect to this threat by evaluating the potential response of the global polar bear population to projected sea-ice conditions. We conducted 1) an assessment of generation length for polar bears, 2) developed of a standardized sea-ice metric representing important habitat characteristics for the species; and 3) performed population projections over three generations, using computer simulation and statistical models representing alternative relationships between sea ice and polar bear abundance. Using three separate approaches, the median percent change in mean global population size for polar bears between 2015 and 2050 ranged from -4% (95% CI = -62%, 50%) to -43% (95% CI = -76%, -20%). Results highlight the potential for large reductions in the global population if sea-ice loss continues. They also highlight the large amount of uncertainty in statistical projections of polar bear abundance and the sensitivity of projections to plausible alternative assumptions. The median probability of a reduction in the mean global population size of polar bears greater than 30% over three generations was approximately 0.71 (range 0.20-0.95. The median probability of a reduction greater than 50% was approximately 0.07 (range 0-0.35), and the probability of a reduction greater than 80% was negligible.
ERIC Educational Resources Information Center
Gurfinkel, Debbie M.; Wolever, Thomas M. S.
2017-01-01
The completion of a 3-d food record, using commonly available nutrient analysis software, is a typical assignment for students in nutrition and food science programs. While these assignments help students evaluate their personal diets, it is insufficient to teach students about surveys of large population cohorts. This paper shows how the Test,…
[Pharmacovigilance in Germany : It is about time].
Douros, A; Schaefer, C; Kreutz, R; Garbe, E
2016-06-01
Pharmacovigilance is defined as the activities relating to the detection, assessment, and prevention of adverse drug reactions (ADRs). Although its beginnings in Germany date back more than 50 years, a stagnation in this field has been observed lately. Different tools of pharmacovigilance will be illustrated and the reasons for its stagnation in Germany will be elucidated. Spontaneous reporting systems are an important tool in pharmacovigilance and are based on reports of ADRs from treating physicians, other healthcare professionals, or patients. Due to several weaknesses of spontaneous reporting systems such as underreporting, media bias, confounding by comorbidity or comedication, and due to the limited quality of the reports, the development of electronic healthcare databases was publicly funded in recent years so that they can be used for pharmacovigilance research. In the US different electronic healthcare databases were merged in a project sponsored by public means resulting in more than 193 million individuals. In Germany the establishment of large longitudinal databases was never conceived as a public duty and has not been implemented so far. Further attempts to use administrative healthcare data for pharmacovigilance purposes are severely restricted by the Code of Social Law (Section 75, Book 10). This situation has led to a stagnation in pharmacovigilance research in Germany. Without publicly funded large longitudinal healthcare databases and an amendment of Section 75, Book 10, of the Code of Social Law, the use of healthcare data in pharmacovigilance research in Germany will remain a rarity. This could have negative effects on the medical care of the general population.
Use of large healthcare databases for rheumatology clinical research.
Desai, Rishi J; Solomon, Daniel H
2017-03-01
Large healthcare databases, which contain data collected during routinely delivered healthcare to patients, can serve as a valuable resource for generating actionable evidence to assist medical and healthcare policy decision-making. In this review, we summarize use of large healthcare databases in rheumatology clinical research. Large healthcare data are critical to evaluate medication safety and effectiveness in patients with rheumatologic conditions. Three major sources of large healthcare data are: first, electronic medical records, second, health insurance claims, and third, patient registries. Each of these sources offers unique advantages, but also has some inherent limitations. To address some of these limitations and maximize the utility of these data sources for evidence generation, recent efforts have focused on linking different data sources. Innovations such as randomized registry trials, which aim to facilitate design of low-cost randomized controlled trials built on existing infrastructure provided by large healthcare databases, are likely to make clinical research more efficient in coming years. Harnessing the power of information contained in large healthcare databases, while paying close attention to their inherent limitations, is critical to generate a rigorous evidence-base for medical decision-making and ultimately enhancing patient care.
Population Education Accessions List, May-August 1999.
ERIC Educational Resources Information Center
United Nations Educational, Scientific and Cultural Organization, Bangkok (Thailand). Principal Regional Office for Asia and the Pacific.
This document is comprised of output from the Regional Clearinghouse on Population Education and Communication (RCPEC) computerized bibliographic database on reproductive and sexual health and geography. Entries are categorized into four parts: (1) "Population Education"; (2) "Knowledge-base Information"; (3) "Audio-Visual and IEC Materials; and…
High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).
Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling
2018-01-15
Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low-efficiency and high-cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two sets of digitals (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with the third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for the precision and transplantation medicine.
Big Data and Total Hip Arthroplasty: How Do Large Databases Compare?
Bedard, Nicholas A; Pugely, Andrew J; McHugh, Michael A; Lux, Nathan R; Bozic, Kevin J; Callaghan, John J
2018-01-01
Use of large databases for orthopedic research has become extremely popular in recent years. Each database varies in the methods used to capture data and the population it represents. The purpose of this study was to evaluate how these databases differed in reported demographics, comorbidities, and postoperative complications for primary total hip arthroplasty (THA) patients. Primary THA patients were identified within National Surgical Quality Improvement Programs (NSQIP), Nationwide Inpatient Sample (NIS), Medicare Standard Analytic Files (MED), and Humana administrative claims database (HAC). NSQIP definitions for comorbidities and complications were matched to corresponding International Classification of Diseases, 9th Revision/Current Procedural Terminology codes to query the other databases. Demographics, comorbidities, and postoperative complications were compared. The number of patients from each database was 22,644 in HAC, 371,715 in MED, 188,779 in NIS, and 27,818 in NSQIP. Age and gender distribution were clinically similar. Overall, there was variation in prevalence of comorbidities and rates of postoperative complications between databases. As an example, NSQIP had more than twice the obesity than NIS. HAC and MED had more than 2 times the diabetics than NSQIP. Rates of deep infection and stroke 30 days after THA had more than 2-fold difference between all databases. Among databases commonly used in orthopedic research, there is considerable variation in complication rates following THA depending upon the database used for analysis. It is important to consider these differences when critically evaluating database research. Additionally, with the advent of bundled payments, these differences must be considered in risk adjustment models. Copyright © 2017 Elsevier Inc. All rights reserved.
Godown, Justin; Thurm, Cary; Dodd, Debra A; Soslow, Jonathan H; Feingold, Brian; Smith, Andrew H; Mettler, Bret A; Thompson, Bryn; Hall, Matt
2017-12-01
Large clinical, research, and administrative databases are increasingly utilized to facilitate pediatric heart transplant (HTx) research. Linking databases has proven to be a robust strategy across multiple disciplines to expand the possible analyses that can be performed while leveraging the strengths of each dataset. We describe a unique linkage of the Scientific Registry of Transplant Recipients (SRTR) database and the Pediatric Health Information System (PHIS) administrative database to provide a platform to assess resource utilization in pediatric HTx. All pediatric patients (1999-2016) who underwent HTx at a hospital enrolled in the PHIS database were identified. A linkage was performed between the SRTR and PHIS databases in a stepwise approach using indirect identifiers. To determine the feasibility of using these linked data to assess resource utilization, total and post-HTx hospital costs were assessed. A total of 3188 unique transplants were identified as being present in both databases and amenable to linkage. Linkage of SRTR and PHIS data was successful in 3057 (95.9%) patients, of whom 2896 (90.8%) had complete cost data. Median total and post-HTx hospital costs were $518,906 (IQR $324,199-$889,738), and $334,490 (IQR $235,506-$498,803) respectively with significant differences based on patient demographics and clinical characteristics at HTx. Linkage of the SRTR and PHIS databases is feasible and provides an invaluable tool to assess resource utilization. Our analysis provides contemporary cost data for pediatric HTx from the largest US sample reported to date. It also provides a platform for expanded analyses in the pediatric HTx population. Copyright © 2017 Elsevier Inc. All rights reserved.
Daniel J. Isaak; Jay M. Ver Hoef; Erin E. Peterson; Dona L. Horan; David E. Nagel
2017-01-01
Population size estimates for stream fishes are important for conservation and management, but sampling costs limit the extent of most estimates to small portions of river networks that encompass 100sâ10 000s of linear kilometres. However, the advent of large fish density data sets, spatial-stream-network (SSN) models that benefit from nonindependence among samples,...
Risk and Safety in Post-Soviet Russia
2008-09-01
radiation exposure databases from Chernobyl , radioactive contamination from long-term operation of large radiochemical atomic plants, and the impact of...64 9.4 Single Irradiation of the Population 65 9.5 Chronic Irradiation of the Population and Personnel 66 9.6. Conclusions 67 10.0 Chernobyl ...Related Radiation Risk for the Public 76 10.1 Introduction 76 10.2 Radioactive Contamination of Russian Territories as a Result of the Chernobyl
Choice of population database for forensic DNA profile analysis.
Steele, Christopher D; Balding, David J
2014-12-01
When evaluating the weight of evidence (WoE) for an individual to be a contributor to a DNA sample, an allele frequency database is required. The allele frequencies are needed to inform about genotype probabilities for unknown contributors of DNA to the sample. Typically databases are available from several populations, and a common practice is to evaluate the WoE using each available database for each unknown contributor. Often the most conservative WoE (most favourable to the defence) is the one reported to the court. However the number of human populations that could be considered is essentially unlimited and the number of contributors to a sample can be large, making it impractical to perform every possible WoE calculation, particularly for complex crime scene profiles. We propose instead the use of only the database that best matches the ancestry of the queried contributor, together with a substantial FST adjustment. To investigate the degree of conservativeness of this approach, we performed extensive simulations of one- and two-contributor crime scene profiles, in the latter case with, and without, the profile of the second contributor available for the analysis. The genotypes were simulated using five population databases, which were also available for the analysis, and evaluations of WoE using our heuristic rule were compared with several alternative calculations using different databases. Using FST=0.03, we found that our heuristic gave WoE more favourable to the defence than alternative calculations in well over 99% of the comparisons we considered; on average the difference in WoE was just under 0.2 bans (orders of magnitude) per locus. The degree of conservativeness of the heuristic rule can be adjusted through the FST value. We propose the use of this heuristic for DNA profile WoE calculations, due to its ease of implementation, and efficient use of the evidence while allowing a flexible degree of conservativeness. Copyright © 2014. Published by Elsevier Ireland Ltd.
Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs
2015-01-01
Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478
Kamitsuji, Shigeo; Matsuda, Takashi; Nishimura, Koichi; Endo, Seiko; Wada, Chisa; Watanabe, Kenji; Hasegawa, Koichi; Hishigaki, Haretsugu; Masuda, Masatoshi; Kuwahara, Yusuke; Tsuritani, Katsuki; Sugiura, Kenkichi; Kubota, Tomoko; Miyoshi, Shinji; Okada, Kinya; Nakazono, Kazuyuki; Sugaya, Yuki; Yang, Woosung; Sawamoto, Taiji; Uchida, Wataru; Shinagawa, Akira; Fujiwara, Tsutomu; Yamada, Hisaharu; Suematsu, Koji; Tsutsui, Naohisa; Kamatani, Naoyuki; Liou, Shyh-Yuh
2015-06-01
Japan Pharmacogenomics Data Science Consortium (JPDSC) has assembled a database for conducting pharmacogenomics (PGx) studies in Japanese subjects. The database contains the genotypes of 2.5 million single-nucleotide polymorphisms (SNPs) and 5 human leukocyte antigen loci from 2994 Japanese healthy volunteers, as well as 121 kinds of clinical information, including self-reports, physiological data, hematological data and biochemical data. In this article, the reliability of our data was evaluated by principal component analysis (PCA) and association analysis for hematological and biochemical traits by using genome-wide SNP data. PCA of the SNPs showed that all the samples were collected from the Japanese population and that the samples were separated into two major clusters by birthplace, Okinawa and other than Okinawa, as had been previously reported. Among 87 SNPs that have been reported to be associated with 18 hematological and biochemical traits in genome-wide association studies (GWAS), the associations of 56 SNPs were replicated using our data base. Statistical power simulations showed that the sample size of the JPDSC control database is large enough to detect genetic markers having a relatively strong association even when the case sample size is small. The JPDSC database will be useful as control data for conducting PGx studies to explore genetic markers to improve the safety and efficacy of drugs either during clinical development or in post-marketing.
Deng, Chen-Hui; Zhang, Guan-Min; Bi, Shan-Shan; Zhou, Tian-Yan; Lu, Wei
2011-07-01
This study is to develop a therapeutic drug monitoring (TDM) network server of tacrolimus for Chinese renal transplant patients, which can facilitate doctor to manage patients' information and provide three levels of predictions. Database management system MySQL was employed to build and manage the database of patients and doctors' information, and hypertext mark-up language (HTML) and Java server pages (JSP) technology were employed to construct network server for database management. Based on the population pharmacokinetic model of tacrolimus for Chinese renal transplant patients, above program languages were used to construct the population prediction and subpopulation prediction modules. Based on Bayesian principle and maximization of the posterior probability function, an objective function was established, and minimized by an optimization algorithm to estimate patient's individual pharmacokinetic parameters. It is proved that the network server has the basic functions for database management and three levels of prediction to aid doctor to optimize the regimen of tacrolimus for Chinese renal transplant patients.
Craters of the Pluto-Charon system
NASA Astrophysics Data System (ADS)
Robbins, Stuart J.; Singer, Kelsi N.; Bray, Veronica J.; Schenk, Paul; Lauer, Tod R.; Weaver, Harold A.; Runyon, Kirby; McKinnon, William B.; Beyer, Ross A.; Porter, Simon; White, Oliver L.; Hofgartner, Jason D.; Zangari, Amanda M.; Moore, Jeffrey M.; Young, Leslie A.; Spencer, John R.; Binzel, Richard P.; Buie, Marc W.; Buratti, Bonnie J.; Cheng, Andrew F.; Grundy, William M.; Linscott, Ivan R.; Reitsema, Harold J.; Reuter, Dennis C.; Showalter, Mark R.; Tyler, G. Len; Olkin, Catherine B.; Ennico, Kimberly S.; Stern, S. Alan; New Horizons Lorri, Mvic Instrument Teams
2017-05-01
NASA's New Horizons flyby mission of the Pluto-Charon binary system and its four moons provided humanity with its first spacecraft-based look at a large Kuiper Belt Object beyond Triton. Excluding this system, multiple Kuiper Belt Objects (KBOs) have been observed for only 20 years from Earth, and the KBO size distribution is unconstrained except among the largest objects. Because small KBOs will remain beyond the capabilities of ground-based observatories for the foreseeable future, one of the best ways to constrain the small KBO population is to examine the craters they have made on the Pluto-Charon system. The first step to understanding the crater population is to map it. In this work, we describe the steps undertaken to produce a robust crater database of impact features on Pluto, Charon, and their two largest moons, Nix and Hydra. These include an examination of different types of images and image processing, and we present an analysis of variability among the crater mapping team, where crater diameters were found to average ± 10% uncertainty across all sizes measured (∼0.5-300 km). We also present a few basic analyses of the crater databases, finding that Pluto's craters' differential size-frequency distribution across the encounter hemisphere has a power-law slope of approximately -3.1 ± 0.1 over diameters D ≈ 15-200 km, and Charon's has a slope of -3.0 ± 0.2 over diameters D ≈ 10-120 km; it is significantly shallower on both bodies at smaller diameters. We also better quantify evidence of resurfacing evidenced by Pluto's craters in contrast with Charon's. With this work, we are also releasing our database of potential and probable impact craters: 5287 on Pluto, 2287 on Charon, 35 on Nix, and 6 on Hydra.
Craters of the Pluto-Charon System
NASA Technical Reports Server (NTRS)
Robbins, Stuart J.; Singer, Kelsi N.; Bray, Veronica J.; Schenk, Paul; Lauer, Todd R.; Weaver, Harold A.; Runyon, Kirby; Mckinnon, William B.; Beyer, Ross A.; Porter, Simon;
2016-01-01
NASA's New Horizons flyby mission of the Pluto-Charon binary system and its four moons provided humanity with its first spacecraft-based look at a large Kuiper Belt Object beyond Triton. Excluding this system, multiple Kuiper Belt Objects (KBOs) have been observed for only 20 years from Earth, and the KBO size distribution is unconstrained except among the largest objects. Because small KBOs will remain beyond the capabilities of ground-based observatories for the foreseeable future, one of the best ways to constrain the small KBO population is to examine the craters they have made on the Pluto-Charon system. The first step to understanding the crater population is to map it. In this work, we describe the steps undertaken to produce a robust crater database of impact features on Pluto, Charon, and their two largest moons, Nix and Hydra. These include an examination of different types of images and image processing, and we present an analysis of variability among the crater mapping team, where crater diameters were found to average +/-10% uncertainty across all sizes measured (approx.0.5-300 km). We also present a few basic analyses of the crater databases, finding that Pluto's craters' differential size-frequency distribution across the encounter hemisphere has a power-law slope of approximately -3.1 +/- 0.1 over diameters D approx. = 15-200 km, and Charon's has a slope of -3.0 +/- 0.2 over diameters D approx. = 10-120 km; it is significantly shallower on both bodies at smaller diameters. We also better quantify evidence of resurfacing evidenced by Pluto's craters in contrast with Charon's. With this work, we are also releasing our database of potential and probable impact craters: 5287 on Pluto, 2287 on Charon, 35 on Nix, and 6 on Hydra.
Mineau, Mineau P; Gilda, Garibotti; Kerber, Richard
2014-01-01
We examine how key early family circumstances affect mortality risks decades later. Early life conditions are measured by parental mortality, parental fertility (e.g., offspring sibship size, parental age at offspring birth), religious upbringing, and parental socioeconomic status. Prior to these early life conditions are familial and genetic factors that affect life-span. Accordingly, we consider the role of parental and familial longevity on adult mortality risks. We analyze the large Utah Population Database which contains a vast amount of genealogical and other vital/health data that contain full life histories of individuals and hundreds of their relatives. To control for unobserved heterogeneity, we analyze sib-pair data for 12,000 sib-pairs using frailty models. We found modest effects of key childhood conditions (birth order, sibship size, parental religiosity, parental SES, and parental death in childhood). Our measures of familial aggregation of longevity were large and suggest an alternative view of early life conditions. PMID:19278766
Comparison of flavonoid intake assessment methods.
Ivey, Kerry L; Croft, Kevin; Prince, Richard L; Hodgson, Jonathan M
2016-09-14
Flavonoids are a diverse group of polyphenolic compounds found in high concentrations in many plant foods and beverages. High flavonoid intake has been associated with reduced risk of chronic disease. To date, population based studies have used the United States Department of Agriculture (USDA) food content database to determine habitual flavonoid intake. More recently, a new flavonoid food content database, Phenol-Explorer (PE), has been developed. However, the level of agreement between the two databases is yet to be explored. To compare the methods used to create each database, and to explore the level of agreement between the flavonoid intake estimates derived from USDA and PE data. The study population included 1063 randomly selected women aged over 75 years. Two separate intake estimates were determined using food composition data from the USDA and the PE databases. There were many similarities in methods used to create each database; however, there are several methodological differences that manifest themselves in differences in flavonoid intake estimates between the 2 databases. Despite differences in net estimates, there was a strong level of agreement between total-flavonoid, flavanol, flavanone and anthocyanidin intake estimates derived from each database. Intake estimates for flavanol monomers showed greater agreement than flavanol polymers. The level of agreement between the two databases was the weakest for the flavonol and flavone intake estimates. In this population, the application of USDA and PE source data yielded highly correlated intake estimates for total-flavonoids, flavanols, flavanones and anthocyanidins. For these sub-classes, the USDA and PE databases may be used interchangeably in epidemiological investigations. There was poorer correlation between intake estimates for flavonols and flavones due to differences in USDA and PE methodologies. Individual flavonoid compound groups that comprise flavonoid sub-classes had varying levels of agreement. As such, when determining the appropriate database to calculate flavonoid intake variables, it is important to consider methodologies underpinning database creation and which foods are important contributors to dietary intake in the population of interest.
The role of smartphones in encouraging physical activity in adults
Stuckey, Melanie I; Carter, Shawn W; Knight, Emily
2017-01-01
Lack of physical activity is a global public health issue. Behavioral change interventions utilizing smartphone applications (apps) are considered a potential solution. The purpose of this literature review was to: 1) determine whether smartphone-based interventions encourage the initiation of, and participation in, physical activity; 2) explore the success of interventions in different populations; and 3) examine the key factors of the interventions that successfully encouraged physical activity. Eight databases (Medline, Scopus, EBM Reviews–Cochrane Central Register of Controlled Trials, EBM Reviews–Cochrane Database of Systematic Reviews, PsycInfo, SportDISCUS, CINAHL, and EMBASE) were searched and studies reporting physical activity outcomes following interventions using smartphone apps in adults were included in the narrative review. Results were mixed with eight studies reporting increased physical activity and ten reporting no change. Interventions did not appear to be successful in specific populations defined by age, sex, country, or clinical diagnosis. There was no conclusive evidence that a specific behavioral theory or behavioral change technique was superior in eliciting behavioral change. The literature remains limited primarily to short-term studies, many of which are underpowered feasibility or pilot studies; therefore, many knowledge gaps regarding the effectiveness of smartphone apps in encouraging physical activity remain. Robust studies that can accommodate the fast pace of the technology industry are needed to examine outcomes in large populations. PMID:28979157
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bright, Edward A.; Rose, Amy N.; Urban, Marie L.
The LandScan data set is a worldwide population database compiled on a 30" x 30" latitude/longitube grid. Census counts (at sub-national level) were apportioned to each grid cell based on likelihood coefficients, which are based on land cover, slope, road proximity, high-resolution imagery, and other data sets. The LandScan data set was developed as part of Oak Ridge National Laboratory (ORNL) Global Population Project for estimating ambient populations at risk.
Lee, Howard; Chapiro, Julius; Schernthaner, Rüdiger; Duran, Rafael; Wang, Zhijun; Gorodetski, Boris; Geschwind, Jean-François; Lin, MingDe
2015-04-01
The objective of this study was to demonstrate that an intra-arterial liver therapy clinical research database system is a more workflow efficient and robust tool for clinical research than a spreadsheet storage system. The database system could be used to generate clinical research study populations easily with custom search and retrieval criteria. A questionnaire was designed and distributed to 21 board-certified radiologists to assess current data storage problems and clinician reception to a database management system. Based on the questionnaire findings, a customized database and user interface system were created to perform automatic calculations of clinical scores including staging systems such as the Child-Pugh and Barcelona Clinic Liver Cancer, and facilitates data input and output. Questionnaire participants were favorable to a database system. The interface retrieved study-relevant data accurately and effectively. The database effectively produced easy-to-read study-specific patient populations with custom-defined inclusion/exclusion criteria. The database management system is workflow efficient and robust in retrieving, storing, and analyzing data. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
Enhancement and Validation of an Arab Surname Database
Schwartz, Kendra; Beebani, Ganj; Sedki, Mai; Tahhan, Mamon; Ruterbusch, Julie J.
2015-01-01
Objectives Arab Americans constitute a large, heterogeneous, and quickly growing subpopulation in the United States. Health statistics for this group are difficult to find because US governmental offices do not recognize Arab as separate from white. The development and validation of an Arab- and Chaldean-American name database will enhance research efforts in this population subgroup. Methods A previously validated name database was supplemented with newly identified names gathered primarily from vital statistic records and then evaluated using a multistep process. This process included 1) review by 4 Arabic- and Chaldean-speaking reviewers, 2) ethnicity assessment by social media searches, and 3) self-report of ancestry obtained from a telephone survey. Results Our Arab- and Chaldean-American name algorithm has a positive predictive value of 91% and a negative predictive value of 100%. Conclusions This enhanced name database and algorithm can be used to identify Arab Americans in health statistics data, such as cancer and hospital registries, where they are often coded as white, to determine the extent of health disparities in this population. PMID:24625771
Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun
2015-01-01
The Tohoku Medical Megabank project is a national project to revitalization of the disaster area in the Tohoku region by the Great East Japan Earthquake, and have conducted large-scale prospective genome-cohort study. Along with prospective genome-cohort study, we have developed integrated database and knowledge base which will be key database for realizing personalized prevention and medicine.
Radiocarbon Dating the Anthropocene
NASA Astrophysics Data System (ADS)
Chaput, M. A.; Gajewski, K. J.
2015-12-01
The Anthropocene has no agreed start date since current suggestions for its beginning range from Pre-Industrial times to the Industrial Revolution, and from the mid-twentieth century to the future. To set the boundary of the Anthropocene in geological time, we must first understand when, how and to what extent humans began altering the Earth system. One aspect of this involves reconstructing the effects of prehistoric human activity on the physical landscape. However, for global reconstructions of land use and land cover change to be more accurately interpreted in the context of human interaction with the landscape, large-scale spatio-temporal demographic changes in prehistoric populations must be known. Estimates of the relative number of prehistoric humans in different regions of the world and at different moments in time are needed. To this end, we analyze a dataset of radiocarbon dates from the Canadian Archaeological Radiocarbon Database (CARD), the Palaeolithic Database of Europe and the AustArch Database of Australia, as well as published dates from South America. This is the first time such a large quantity of dates (approximately 60,000) has been mapped and studied at a global scale. Initial results from the analysis of temporal frequency distributions of calibrated radiocarbon dates, assumed to be proportional to population density, will be discussed. The utility of radiocarbon dates in studies of the Anthropocene will be evaluated and potential links between population density and changes in atmospheric greenhouse gas concentrations, climate, migration patterning and fire frequency coincidence will be considered.
NASA Astrophysics Data System (ADS)
Graettinger, A. H.
2018-05-01
A maar crater is the top of a much larger subsurface diatreme structure produced by phreatomagmatic explosions and the size and shape of the crater reflects the growth history of that structure during an eruption. Recent experimental and geophysical research has shown that crater complexity can reflect subsurface complexity. Morphometry provides a means of characterizing a global population of maar craters in order to establish the typical size and shape of features. A global database of Quaternary maar crater planform morphometry indicates that maar craters are typically not circular and frequently have compound shapes resembling overlapping circles. Maar craters occur in volcanic fields that contain both small volume and complex volcanoes. The global perspective provided by the database shows that maars are common in many volcanic and tectonic settings producing a similar diversity of size and shape within and between volcanic fields. A few exceptional populations of maars were revealed by the database, highlighting directions of future research to improve our understanding on the geometry and spacing of subsurface explosions that produce maars. These outlying populations, such as anomalously large craters (>3000 m), chains of maars, and volcanic fields composed of mostly maar craters each represent a small portion of the database, but provide opportunities to reinvestigate fundamental questions on maar formation. Maar crater morphometry can be integrated with structural, hydrological studies to investigate lateral migration of phreatomagmatic explosion location in the subsurface. A comprehensive database of intact maar morphometry is also beneficial for the hunt for maar-diatremes on other planets.
NCBI2RDF: enabling full RDF-based access to NCBI databases.
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
Selig, L; Guedes, R; Kritski, A; Spector, N; Lapa E Silva, J R; Braga, J U; Trajman, A
2009-08-01
In 2006, 848 persons died from tuberculosis (TB) in Rio de Janeiro, Brazil, corresponding to a mortality rate of 5.4 per 100 000 population. No specific TB death surveillance actions are currently in place in Brazil. Two public general hospitals with large open emergency rooms in Rio de Janeiro City. To evaluate the contribution of TB death surveillance in detecting gaps in TB control. We conducted a survey of TB deaths from September 2005 to August 2006. Records of TB-related deaths and deaths due to undefined causes were investigated. Complementary data were gathered from the mortality and TB notification databases. Seventy-three TB-related deaths were investigated. Transmission hazards were identified among firefighters, health care workers and in-patients. Management errors included failure to isolate suspected cases, to confirm TB, to correct drug doses in underweight patients and to trace contacts. Following the survey, 36 cases that had not previously been notified were included in the national TB notification database and the outcome of 29 notified cases was corrected. TB mortality surveillance can contribute to TB monitoring and evaluation by detecting correctable and specific programme- and hospital-based care errors, and by improving the accuracy of TB database reporting. Specific local and programmatic interventions can be proposed as a result.
TabSQL: a MySQL tool to facilitate mapping user data to public databases.
Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng
2010-06-23
With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
TabSQL: a MySQL tool to facilitate mapping user data to public databases
2010-01-01
Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, N.; Sellis, Timos
1993-01-01
One of the biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental data base access method, VIEWCACHE, provides such an interface for accessing distributed datasets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image datasets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate database search.
Global rates of habitat loss and implications for amphibian conservation
Gallant, Alisa L.; Klaver, R.W.; Casper, G.S.; Lannoo, M.J.
2007-01-01
A large number of factors are known to affect amphibian population viability, but most authors agree that the principal causes of amphibian declines are habitat loss, alteration, and fragmentation. We provide a global assessment of land use dynamics in the context of amphibian distributions. We accomplished this by compiling global maps of amphibian species richness and recent rates of change in land cover, land use, and human population growth. The amphibian map was developed using a combination of published literature and digital databases. We used an ecoregion framework to help interpret species distributions across environmental, rather than political, boundaries. We mapped rates of land cover and use change with statistics from the World Resources Institute, refined with a global digital dataset on land cover derived from satellite data. Temporal maps of human population were developed from the World Resources Institute database and other published sources. Our resultant map of amphibian species richness illustrates that amphibians are distributed in an uneven pattern around the globe, preferring terrestrial and freshwater habitats in ecoregions that are warm and moist. Spatiotemporal patterns of human population show that, prior to the 20th century, population growth and spread was slower, most extensive in the temperate ecoregions, and largely exclusive of major regions of high amphibian richness. Since the beginning of the 20th century, human population growth has been exponential and has occurred largely in the subtropical and tropical ecoregions favored by amphibians. Population growth has been accompanied by broad-scale changes in land cover and land use, typically in support of agriculture. We merged information on land cover, land use, and human population growth to generate a composite map showing the rates at which humans have been changing the world. When compared with the map of amphibian species richness, we found that many of the regions of the earth supporting the richest assemblages of amphibians are currently undergoing the highest rates of landscape modification.
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
Huel, René L. M.; Bašić, Lara; Madacki-Todorović, Kamelija; Smajlović, Lejla; Eminović, Izet; Berbić, Irfan; Miloš, Ana; Parsons, Thomas J.
2007-01-01
Aim To present a compendium of off-ladder alleles and other genotyping irregularities relating to rare/unexpected population genetic variation, observed in a large short tandem repeat (STR) database from Bosnia and Serbia. Methods DNA was extracted from blood stain cards relating to reference samples from a population of 32 800 individuals from Bosnia and Serbia, and typed using Promega’s PowerPlex®16 STR kit. Results There were 31 distinct off-ladder alleles were observed in 10 of the 15 STR loci amplified from the PowerPlex®16 STR kit. Of these 31 alleles, 3 have not been previously reported. Furthermore, 16 instances of triallelic patterns were observed in 9 of the 15 loci. Primer binding site mismatches that affected amplification were observed in two loci, D5S818 and D8S1179. Conclusion Instances of deviations from manufacturer’s allelic ladders should be expected and caution taken to properly designate the correct alleles in large DNA databases. Particular care should be taken in kinship matching or paternity cases as incorrect designation of any of these deviations from allelic ladders could lead to false exclusions. PMID:17696304
Wang, Penghao; Wilson, Susan R
2013-01-01
Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
Teaching Database Design with Constraint-Based Tutors
ERIC Educational Resources Information Center
Mitrovic, Antonija; Suraweera, Pramuditha
2016-01-01
Design tasks are difficult to teach, due to large, unstructured solution spaces, underspecified problems, non-existent problem solving algorithms and stopping criteria. In this paper, we comment on our approach to develop KERMIT, a constraint-based tutor that taught database design. In later work, we re-implemented KERMIT as EER-Tutor, and…
LandScan 2016 High-Resolution Global Population Data Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bright, Edward A; Rose, Amy N; Urban, Marie L
The LandScan data set is a worldwide population database compiled on a 30" x 30" latitude/longitude grid. Census counts (at sub-national level) were apportioned to each grid cell based on likelihood coefficients, which are based on land cover, slope, road proximity, high-resolution imagery, and other data sets. The LandScan data set was developed as part of Oak Ridge National Laboratory (ORNL) Global Population Project for estimating ambient populations at risk.
Development and implementation of a psychotherapy tracking database in primary care.
Craner, Julia R; Sawchuk, Craig N; Mack, John D; LeRoy, Michelle A
2017-06-01
Although there is a rapid increase in the integration of behavioral health services in primary care, few studies have evaluated the effectiveness of these services in real-world clinical settings, in part due to the difficulty of translating traditional mental health research designs to this setting. Accordingly, innovative approaches are needed to fit the unique challenges of conducting research in primary care. The development and implementation of one such approach is described in this article. A continuously populating database for psychotherapy services was implemented across 5 primary care clinics in a large health system to assess several levels of patient care, including service utilization, symptomatic outcomes, and session-by-session use of psychotherapy principles by providers. Each phase of implementation revealed challenges, including clinician time, dissemination to clinics with different resources, and fidelity of data collection strategy across providers, as well as benefits, including the generation of useful data to inform clinical care, program development, and empirical research. The feasible and sustainable implementation of data collection for routine clinical practice in primary care has the potential to fuel the evidence base around integrated care. The current project describes the development of an innovative approach that, with further empirical study and refinement, could enable health care professionals and systems to understand their population and clinical process in a way that addresses essential gaps in the integrated care literature. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases
NASA Astrophysics Data System (ADS)
Dykstra, Dave
2012-12-01
One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dykstra, Dave
One of the main attractions of non-relational NoSQL databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It alsomore » compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.« less
Population Education Accessions List. January-April, 1999.
ERIC Educational Resources Information Center
United Nations Educational, Scientific, and Cultural Organization, Bangkok (Thailand). Regional Office for Education in Asia and the Pacific.
This document features output from a computerized bibliographic database. The list categorizes entries into three parts. Part I, Population Education, consists of titles that address various aspects of population education arranged by country in the first section and general materials in the second. Part II, Knowledge Base Information, consists of…
A Database as a Service for the Healthcare System to Store Physiological Signal Data.
Chang, Hsien-Tsung; Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records- 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users-we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance.
A Database as a Service for the Healthcare System to Store Physiological Signal Data
Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records– 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users—we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance. PMID:28033415
Dietary choline and betaine intakes vary in an adult multiethnic population.
Yonemori, Kim M; Lim, Unhee; Koga, Karin R; Wilkens, Lynne R; Au, Donna; Boushey, Carol J; Le Marchand, Loïc; Kolonel, Laurence N; Murphy, Suzanne P
2013-06-01
Choline and betaine are important nutrients for human health, but reference food composition databases for these nutrients became available only recently. We tested the feasibility of using these databases to estimate dietary choline and betaine intakes among ethnically diverse adults who participated in the Multiethnic Cohort (MEC) Study. Of the food items (n = 965) used to quantify intakes for the MEC FFQ, 189 items were exactly matched with items in the USDA Database for the Choline Content of Common Foods for total choline, choline-containing compounds, and betaine, and 547 items were matched to the USDA National Nutrient Database for Standard Reference for total choline (n = 547) and 148 for betaine. When a match was not found, choline and betaine values were imputed based on the same food with a different form (124 food items for choline, 300 for choline compounds, 236 for betaine), a similar food (n = 98, 284, and 227, respectively) or the closest item in the same food category (n = 6, 191, and 157, respectively), or the values were assumed to be zero (n = 1, 1, and 8, respectively). The resulting mean intake estimates for choline and betaine among 188,147 MEC participants (aged 45-75) varied by sex (372 and 154 mg/d in men, 304 and 128 mg/d in women, respectively; P-heterogeneity < 0.0001) and by race/ethnicity among Caucasians, African Americans, Japanese Americans, Latinos, and Native Hawaiians (P-heterogeneity < 0.0001), largely due to the variation in energy intake. Our findings demonstrate the feasibility of assessing choline and betaine intake and characterize the variation in intake that exists in a multiethnic population.
Dietary Choline and Betaine Intakes Vary in an Adult Multiethnic Population123
Yonemori, Kim M.; Lim, Unhee; Koga, Karin R.; Wilkens, Lynne R.; Au, Donna; Boushey, Carol J.; Le Marchand, Loïc; Kolonel, Laurence N.; Murphy, Suzanne P.
2013-01-01
Choline and betaine are important nutrients for human health, but reference food composition databases for these nutrients became available only recently. We tested the feasibility of using these databases to estimate dietary choline and betaine intakes among ethnically diverse adults who participated in the Multiethnic Cohort (MEC) Study. Of the food items (n = 965) used to quantify intakes for the MEC FFQ, 189 items were exactly matched with items in the USDA Database for the Choline Content of Common Foods for total choline, choline-containing compounds, and betaine, and 547 items were matched to the USDA National Nutrient Database for Standard Reference for total choline (n = 547) and 148 for betaine. When a match was not found, choline and betaine values were imputed based on the same food with a different form (124 food items for choline, 300 for choline compounds, 236 for betaine), a similar food (n = 98, 284, and 227, respectively) or the closest item in the same food category (n = 6, 191, and 157, respectively), or the values were assumed to be zero (n = 1, 1, and 8, respectively). The resulting mean intake estimates for choline and betaine among 188,147 MEC participants (aged 45–75) varied by sex (372 and 154 mg/d in men, 304 and 128 mg/d in women, respectively; P-heterogeneity < 0.0001) and by race/ethnicity among Caucasians, African Americans, Japanese Americans, Latinos, and Native Hawaiians (P-heterogeneity < 0.0001), largely due to the variation in energy intake. Our findings demonstrate the feasibility of assessing choline and betaine intake and characterize the variation in intake that exists in a multiethnic population. PMID:23616508
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of “big data” has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on “adherence to prescription and medical plans” identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most of already available databases. PMID:27358570
Smeets, Hugo M; de Wit, Niek J; Hoes, Arno W
2011-04-01
Observational studies performed within routine health care databases have the advantage of their large size and, when the aim is to assess the effect of interventions, can offer a completion to randomized controlled trials with usually small samples from experimental situations. Institutional Health Insurance Databases (HIDs) are attractive for research because of their large size, their longitudinal perspective, and their practice-based information. As they are based on financial reimbursement, the information is generally reliable. The database of one of the major insurance companies in the Netherlands, the Agis Health Database (AHD), is described in detail. Whether the AHD data sets meet the specific requirements to conduct several types of clinical studies is discussed according to the classification of the four different types of clinical research; that is, diagnostic, etiologic, prognostic, and intervention research. The potential of the AHD for these various types of research is illustrated using examples of studies recently conducted in the AHD. HIDs such as the AHD offer large potential for several types of clinical research, in particular etiologic and intervention studies, but at present the lack of detailed clinical information is an important limitation. Copyright © 2011 Elsevier Inc. All rights reserved.
Special Section: The USMARC Community Information Format.
ERIC Educational Resources Information Center
Lutz, Marilyn; And Others
1992-01-01
Five papers discuss topics related to the USMARC Community Information Format (CIF), including using CIF to create a public service resource network; development of a CIF-based database of materials relating to multicultural and differently-abled populations; background on CIF; development of an information and referral database; and CIF and…
Salters, K A; Cescon, A; Zhang, W; Ogilvie, G; Murray, M C M; Coldman, A; Hamm, J; Chiu, C G; Montaner, J S G; Wiseman, S M; Money, D; Pick, N; Hogg, R S
2016-03-01
We used population-based data to identify incident cancer cases and correlates of cancer among women living with HIV/AIDS in British Columbia (BC), Canada between 1994 and 2008. Data were obtained from a retrospective population-based cohort created from linkage of two province-wide databases: (1) the database of the BC Cancer Agency, a province-wide population-based cancer registry, and (2) a database managed by the BC Centre for Excellence in HIV/AIDS, which contains data on all persons treated with antiretroviral therapy in BC. This analysis included women (≥ 19 years old) living with HIV in BC, Canada. Incident cancer diagnoses that occurred after highly active antiretroviral therapy (HAART) initiation were included. We obtained a general population comparison of cancer incidence among women from the BC Cancer Agency. Bivariate analysis (Pearson χ(2) , Fisher's exact or Wilcoxon rank-sum test) compared women with and without incident cancer across relevant clinical and sociodemographic variables. Standardized incidence ratios (SIRs) were calculated for selected cancers compared with the general population sample. We identified 2211 women with 12 529 person-years (PY) of follow-up who were at risk of developing cancer after HAART initiation. A total of 77 incident cancers (615/100 000 PY) were identified between 1994 and 2008. HIV-positive women with cancer, in comparison to the general population sample, were more likely to be diagnosed with invasive cervical cancer, Hodgkin's lymphoma, non-Hodgkin's lymphoma and Kaposi's sarcoma and less likely to be diagnosed with cancers of the digestive system. This study observed elevated rates of cancer among HIV-positive women compared to a general population sample. HIV-positive women may have an increased risk for cancers of viral-related pathogenesis. © 2015 British HIV Association.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mahmood, Usama, E-mail: usama.mahmood@gmail.com; Morris, Christopher; Neuner, Geoffrey
2012-08-01
Purpose: To evaluate survival outcomes of young women with early-stage breast cancer treated with breast conservation therapy (BCT) or mastectomy, using a large, population-based database. Methods and Materials: Using the Surveillance, Epidemiology, and End Results (SEER) database, information was obtained for all female patients, ages 20 to 39 years old, diagnosed with T1-2 N0-1 M0 breast cancer between 1990 and 2007, who underwent either BCT (lumpectomy and radiation treatment) or mastectomy. Multivariable and matched pair analyses were performed to compare overall survival (OS) and cause-specific survival (CSS) of patients undergoing BCT and mastectomy. Results: A total of 14,764 women weremore » identified, of whom 45% received BCT and 55% received mastectomy. Median follow-up was 5.7 years (range, 0.5-17.9 years). After we accounted for all patient and tumor characteristics, multivariable analysis found that BCT resulted in OS (hazard ratio [HR], 0.93; 95% confidence interval [CI], 0.83-1.04; p = 0.16) and CSS (HR, 0.93; CI, 0.83-1.05; p = 0.26) similar to that of mastectomy. Matched pair analysis, including 4,644 BCT and mastectomy patients, confirmed no difference in OS or CSS: the 5-, 10-, and15-year OS rates for BCT and mastectomy were 92.5%, 83.5%, and 77.0% and 91.9%, 83.6%, and 79.1%, respectively (p = 0.99), and the 5-, 10-, and 15-year CSS rates for BCT and mastectomy were 93.3%, 85.5%, and 79.9% and 92.5%, 85.5%, and 81.9%, respectively (p = 0.88). Conclusions: Our analysis of this population-based database suggests that young women with early-stage breast cancer have similar survival rates whether treated with BCT or mastectomy. These patients should be counseled appropriately regarding their treatment options and should not choose a mastectomy based on the assumption of improved survival.« less
Method for the reduction of image content redundancy in large image databases
Tobin, Kenneth William; Karnowski, Thomas P.
2010-03-02
A method of increasing information content for content-based image retrieval (CBIR) systems includes the steps of providing a CBIR database, the database having an index for a plurality of stored digital images using a plurality of feature vectors, the feature vectors corresponding to distinct descriptive characteristics of the images. A visual similarity parameter value is calculated based on a degree of visual similarity between features vectors of an incoming image being considered for entry into the database and feature vectors associated with a most similar of the stored images. Based on said visual similarity parameter value it is determined whether to store or how long to store the feature vectors associated with the incoming image in the database.
Thailand mutation and variation database (ThaiMUT).
Ruangrit, Uttapong; Srikummool, Metawee; Assawamakin, Anunchai; Ngamphiw, Chumpol; Chuechote, Suparat; Thaiprasarnsup, Vilasinee; Agavatpanitch, Gallissara; Pasomsab, Ekawat; Yenchitsomanus, Pa-Thai; Mahasirimongkol, Surakameth; Chantratita, Wasun; Palittapongarnpim, Prasit; Uyyanonvara, Bunyarit; Limwongse, Chanin; Tongsima, Sissades
2008-08-01
With the completion of the human genome project, novel sequencing and genotyping technologies had been utilized to detect mutations. Such mutations have continually been produced at exponential rate by researchers in various communities. Based on the population's mutation spectra, occurrences of Mendelian diseases are different across ethnic groups. A proportion of Mendelian diseases can be observed in some countries at higher rates than others. Recognizing the importance of mutation effects in Thailand, we established a National and Ethnic Mutation Database (NEMDB) for Thai people. This database, named Thailand Mutation and Variation database (ThaiMUT), offers a web-based access to genetic mutation and variation information in Thai population. This NEMDB initiative is an important informatics tool for both research and clinical purposes to retrieve and deposit human variation data. The mutation data cataloged in ThaiMUT database were derived from journal articles available in PubMed and local publications. In addition to collected mutation data, ThaiMUT also records genetic polymorphisms located in drug related genes. ThaiMUT could then provide useful information for clinical mutation screening services for Mendelian diseases and pharmacogenomic researches. ThaiMUT can be publicly accessed from http://gi.biotec.or.th/thaimut.
Extreme Gleason Upgrading From Biopsy to Radical Prostatectomy: A Population-based Analysis.
Winters, Brian R; Wright, Jonathan L; Holt, Sarah K; Lin, Daniel W; Ellis, William J; Dalkin, Bruce L; Schade, George R
2016-10-01
To examine the risk factors associated with the odds of extreme Gleason upgrading at radical prostatectomy (RP) (defined as a Gleason prognostic group score increase of ≥2), we utilized a large, population-based cancer registry. The Surveillance, Epidemiologic, and End Results database was queried (2010-2011) for all patients diagnosed with Gleason 3 + 3 or 3 + 4 on prostate needle biopsy. Available clinicopathologic factors and the odds of upgrading and extreme upgrading at RP were evaluated using multivariate logistic regression. A total of 12,459 patients were identified, with a median age of 61 (interquartile range: 56-65) and a diagnostic prostate-specific antigen (PSA) of 5.5 ng/mL (interquartile range: 4.3-7.5). Upgrading was observed in 34% of men, including 44% of 7402 patients with Gleason 3 + 3 and 19% of 5057 patients with Gleason 3 + 4 disease. Age, clinical stage, diagnostic PSA, and % prostate needle biopsy cores positive were independently associated with odds of any upgrading at RP. In baseline Gleason 3 + 3 disease, extreme upgrading was observed in 6%, with increasing age, diagnostic PSA, and >50% core positivity associated with increased odds. In baseline Gleason 3 + 4 disease, extreme upgrading was observed in 4%, with diagnostic PSA and palpable disease remaining predictive. Positive surgical margins were significantly higher in patients with extreme upgrading at RP (P < .001). Gleason upgrading at RP is common in this large population-based cohort, including extreme upgrading in a clinically significant portion. Copyright © 2016 Elsevier Inc. All rights reserved.
Polednak, Anthony P
2013-01-01
Inaccuracies in primary liver cancer (ie, excluding intrahepatic bile duct [IHBD]) or IHBD cancer as the underlying cause of death on the death certificate vs the cancer site in a cancer registry should be considered in surveillance of mortality rates in the population. Concordance between cancer site on the death record (1999-2010) and diagnosis (1973-2010) in the database for 9 cancer registries of the Surveillance, Epidemiology, and End Results (SEER) Program was examined for decedents with only 1 cancer recorded. Overreporting of deaths coded to liver cancer (ie, lack of confirmation in SEER) was largely balanced by underreporting (ie, a cancer site other than liver cancer in SEER). For IHBD cancer, overreporting was much more frequent than underreporting. Using modified rates, based on the most accurate numerators available, had little impact on trends for liver cancer in the SEER population, which were similar to trends for the entire US population based on routine statistics. An increase in the death rate for IHBD cancer, however, was no longer evident after modification. The findings support the use of routine data on underlying cause of death for surveillance of trends in death rates for liver cancer but not for IHBD cancer. Additional population-based cancer registries could potentially be used for surveillance of recent and future trends in mortality rates from these cancers.
Poon, Art F. Y.; Joy, Jeffrey B.; Woods, Conan K.; Shurgold, Susan; Colley, Guillaume; Brumme, Chanson J.; Hogg, Robert S.; Montaner, Julio S. G.; Harrigan, P. Richard
2015-01-01
Background. The diversification of human immunodeficiency virus (HIV) is shaped by its transmission history. We therefore used a population based province wide HIV drug resistance database in British Columbia (BC), Canada, to evaluate the impact of clinical, demographic, and behavioral factors on rates of HIV transmission. Methods. We reconstructed molecular phylogenies from 27 296 anonymized bulk HIV pol sequences representing 7747 individuals in BC—about half the estimated HIV prevalence in BC. Infections were grouped into clusters based on phylogenetic distances, as a proxy for variation in transmission rates. Rates of cluster expansion were reconstructed from estimated dates of HIV seroconversion. Results. Our criteria grouped 4431 individuals into 744 clusters largely separated with respect to risk factors, including large established clusters predominated by injection drug users and more-recently emerging clusters comprising men who have sex with men. The mean log10 viral load of an individual's phylogenetic neighborhood (composed of 5 other individuals with shortest phylogenetic distances) increased their odds of appearing in a cluster by >2-fold per log10 viruses per milliliter. Conclusions. Hotspots of ongoing HIV transmission can be characterized in near real time by the secondary analysis of HIV resistance genotypes, providing an important potential resource for targeting public health initiatives for HIV prevention. PMID:25312037
Park, Hae-Min; Park, Ju-Hyeong; Kim, Yoon-Woo; Kim, Kyoung-Jin; Jeong, Hee-Jin; Jang, Kyoung-Soon; Kim, Byung-Gee; Kim, Yun-Gon
2013-11-15
In recent years, the improvement of mass spectrometry-based glycomics techniques (i.e. highly sensitive, quantitative and high-throughput analytical tools) has enabled us to obtain a large dataset of glycans. Here we present a database named Xeno-glycomics database (XDB) that contains cell- or tissue-specific pig glycomes analyzed with mass spectrometry-based techniques, including a comprehensive pig glycan information on chemical structures, mass values, types and relative quantities. It was designed as a user-friendly web-based interface that allows users to query the database according to pig tissue/cell types or glycan masses. This database will contribute in providing qualitative and quantitative information on glycomes characterized from various pig cells/organs in xenotransplantation and might eventually provide new targets in the α1,3-galactosyltransferase gene-knock out pigs era. The database can be accessed on the web at http://bioinformatics.snu.ac.kr/xdb.
Phenotypic characterization and genealogical tracing in an Afrikaner schizophrenia database.
Karayiorgou, Maria; Torrington, Marie; Abecasis, Gonçalo R; Pretorius, Herman; Robertson, Brian; Kaliski, Sean; Lay, Stephen; Sobin, Christina; Möller, Natalie; Lundy, S Laura; Blundell, Maude L; Gogos, Joseph A; Roos, J Louw
2004-01-01
Founder populations hold tremendous promise for mapping genes for complex traits, as they offer less genetic and environmental heterogeneity and greater potential for genealogical research. Not all founder populations are equally valuable, however. The Afrikaner population meets several criteria that make it an ideal population for mapping complex traits, including founding by a small number of initial founders that likely allowed for a relatively restricted set of mutations and a large current population size that allows identification of a sufficient number of cases. Here, we examine the potential to conduct genealogical research in this population and present initial results indicating that accurate genealogical tracing for up to 17 generations is feasible. We also examine the clinical similarities of schizophrenia cases diagnosed in South Africa and those diagnosed in other, heterogeneous populations, specifically the US. We find that, with regard to basic sample descriptors and cardinal symptoms of disease, the two populations are equivalent. It is, therefore, likely that results from our genetic study of schizophrenia will be applicable to other populations. Based on the results presented here, the history and current size of the population, as well as our previous analysis addressing the extent of background linkage disequilibrium (LD) in the Afrikaners, we conclude that the Afrikaner population is likely an appropriate founder population to map genes for schizophrenia using both linkage and LD approaches. Copyright 2003 Wiley-Liss, Inc.
Understanding How Principals Use Data Dashboards to Inform Systemic School Improvement
ERIC Educational Resources Information Center
Marker, Kathryn Christner
2016-01-01
Because data access may be perceived by principals as overwhelming or irrelevant rather than helpful (Wayman, Spikes, & Volonnino, 2013), data access does not guarantee effective data use. The data-based decision making literature has largely focused on teacher use of data, considering less often data-based organizational improvements for the…
Building a structured monitoring and evaluating system of postmarketing drug use in Shanghai.
Du, Wenmin; Levine, Mitchell; Wang, Longxing; Zhang, Yaohua; Yi, Chengdong; Wang, Hongmin; Wang, Xiaoyu; Xie, Hongjuan; Xu, Jianglong; Jin, Huilin; Wang, Tongchun; Huang, Gan; Wu, Ye
2007-01-01
In order to understand a drug's full profile in the post-marketing environment, information is needed regarding utilization patterns, beneficial effects, ADRs and economic value. China, the most populated country in the world, has the largest number of people who are taking medications. To begin to appreciate the impact of these medications, a multifunctional evaluation and surveillance system was developed, the Shanghai Drug Monitoring and Evaluative System (SDMES). Set up by the Shanghai Center for Adverse Drug Reaction Monitoring in 2001, the SDMES contains three databases: a population health data base of middle aged and elderly persons; hospital patient medical records; and a spontaneous ADR reporting database. Each person has a unique identification and Medicare number, which permits record-linkage within and between these three databases. After more than three years in development, the population health database has comprehensive data for more than 320,000 residents. The hospital database has two years of inpatient medical records from five major hospitals, and will be increasing to 10 hospitals in 2007. The spontaneous reporting ADR database has collected 20,205 cases since 2001 from approximately 295 sources, including hospitals, pharmaceutical companies, drug wholesalers and pharmacies. The SDMES has the potential to become an important national and international pharmacoepidemiology resource for drug evaluation.
HormoneBase, a population-level database of steroid hormone levels across vertebrates
Vitousek, Maren N.; Johnson, Michele A.; Donald, Jeremy W.; Francis, Clinton D.; Fuxjager, Matthew J.; Goymann, Wolfgang; Hau, Michaela; Husak, Jerry F.; Kircher, Bonnie K.; Knapp, Rosemary; Martin, Lynn B.; Miller, Eliot T.; Schoenle, Laura A.; Uehling, Jennifer J.; Williams, Tony D.
2018-01-01
Hormones are central regulators of organismal function and flexibility that mediate a diversity of phenotypic traits from early development through senescence. Yet despite these important roles, basic questions about how and why hormone systems vary within and across species remain unanswered. Here we describe HormoneBase, a database of circulating steroid hormone levels and their variation across vertebrates. This database aims to provide all available data on the mean, variation, and range of plasma glucocorticoids (both baseline and stress-induced) and androgens in free-living and un-manipulated adult vertebrates. HormoneBase (www.HormoneBase.org) currently includes >6,580 entries from 476 species, reported in 648 publications from 1967 to 2015, and unpublished datasets. Entries are associated with data on the species and population, sex, year and month of study, geographic coordinates, life history stage, method and latency of hormone sampling, and analysis technique. This novel resource could be used for analyses of the function and evolution of hormone systems, and the relationships between hormonal variation and a variety of processes including phenotypic variation, fitness, and species distributions. PMID:29786693
A national database of incidence and treatment outcomes of status epilepticus in Thailand.
Tiamkao, Somsak; Pranbul, Sineenard; Sawanyawisuth, Kittisak; Thepsuthammarat, Kaewjai
2014-06-01
Status epilepticus (SE) is a serious neurological condition. The national database of SE in Thailand and other developing countries is limited in terms of incidence and treatment outcomes. This study was conducted on the prevalence of status epilepticus (SE). The study group comprised of adult inpatients (over 18 years old) with SE throughout Thailand. SE patients were diagnosed and searched based on ICD 10 (G41) from the national database. The database used was from reimbursement documents submitted by the hospitals under the three health insurance systems, namely, the universal health coverage insurance, social security, and government health welfare system during the fiscal year 2010. We found 2190 SE patients receiving treatment at hospitals (5.10/100 000 population). The average age was 50.5 years and 1413 patients were males (64.5%). Mortality rate was 0.6 death/100 000 population or 11.96% of total patients. Significant factors associated with death or a nonimproved status at discharge were type of insurance, hospital level, chronic kidney disease, having pneumonia, having shock, on mechanical ventilator, and having cardiopulmonary resuscitation. In conclusion, the incidence of SE in Thailand was 5.10/100 000 population with mortality rate of 0.6/100 000 population.
Predicting Language Outcome and Recovery After Stroke (PLORAS)
Price, CJ; Seghier, ML; Leff, AP
2013-01-01
The ability of comprehend and produce speech after stroke depends on whether the areas of the brain that support language have been damaged. Here we review two different ways to predict language outcome after stroke. The first depends on understanding the neural circuits that support language. This model-based approach is a challenging endeavor because language is a complex cognitive function that involves the interaction of many different brain areas. The second approach does not require an understanding of why a lesion impairs language, instead, predictions are made on the basis of how previous patients with the same lesion recovered. This requires a database storing the speech and language abilities of a large population of patients who have, between them, incurred a comprehensive range of focal brain damage. In addition it requires a system that converts an MRI scan from a new patient into a 3D description of the lesion and then compares this lesion to all others on the database. The outputs of this system are the longitudinal language outcomes of corresponding patients in the database. This will provide a new patient, their carers and the clinician team managing them the range of likely recovery patterns over a variety of language measures. PMID:20212513
Schneider, Jeffrey C; Tan, Wei-Han; Goldstein, Richard; Mix, Jacqueline M; Niewczyk, Paulette; Divita, Margaret A; Ryan, Colleen M; Gerrard, Paul B; Kowalske, Karen; Zafonte, Ross
2013-01-01
A preliminary investigation of the burn rehabilitation population found a large variability of zero onset day frequency between facilities. Onset days is defined as the time from injury to inpatient rehabilitation admission; this variable has not been investigated in burn patients previously. This study explored if this finding was a facility-based phenomena or characteristic of burn inpatient rehabilitation patients. This study was a secondary analysis of Uniform Data System for Medical Rehabilitation (UDSmr) data from 2002 to 2007 examining inpatient rehabilitation characteristics among patients with burn injuries. Exclusion criteria were age less than 18 years and discharge against medical advice. Comparisons of demographic, medical and functional data were made between facilities with a high frequency of zero onset days versus facilities with a low frequency of zero onset days. A total of 4738 patients from 455 inpatient rehabilitation facilities were included. Twenty-three percent of the population exhibited zero onset days (n = 1103). Sixteen facilities contained zero onset patients; two facilities accounted for 97% of the zero onset subgroup. Facilities with a high frequency of zero onset day patients demonstrated significant differences in demographic, medical, and functional variables compared to the remainder of the study population. There were significantly more zero onset day admissions among burn patients (23%) than other diagnostic groups (0.5- 3.6%) in the Uniform Data System for Medical Rehabilitation database, but the majority (97%) came from two inpatient rehabilitation facilities. It is unexpected for patients with significant burn injury to be admitted to a rehabilitation facility on the day of injury. Future studies investigating burn rehabilitation outcomes using the Uniform Data System for Medical Rehabilitation database should exclude facilities with a high percentage of zero onset days, which are not representative of the burn inpatient rehabilitation population.
A web-based relational database for monitoring and analyzing mosquito population dynamics.
Sucaet, Yves; Van Hemert, John; Tucker, Brad; Bartholomay, Lyric
2008-07-01
Mosquito population dynamics have been monitored on an annual basis in the state of Iowa since 1969. The primary goal of this project was to integrate light trap data from these efforts into a centralized back-end database and interactive website that is available through the internet at http://iowa-mosquito.ent.iastate.edu. For comparative purposes, all data were categorized according to the week of the year and normalized according to the number of traps running. Users can readily view current, weekly mosquito abundance compared with data from previous years. Additional interactive capabilities facilitate analyses of the data based on mosquito species, distribution, or a time frame of interest. All data can be viewed in graphical and tabular format and can be downloaded to a comma separated value (CSV) file for import into a spreadsheet or more specialized statistical software package. Having this long-term dataset in a centralized database/website is useful for informing mosquito and mosquito-borne disease control and for exploring the ecology of the species represented therein. In addition to mosquito population dynamics, this database is available as a standardized platform that could be modified and applied to a multitude of projects that involve repeated collection of observational data. The development and implementation of this tool provides capacity for the user to mine data from standard spreadsheets into a relational database and then view and query the data in an interactive website.
Huang, Shih-Wei; Lin, Jia-Wei; Wang, Wei-Te; Wu, Chin-Wen; Liou, Tsan-Hon; Lin, Hui-Wen
2014-01-01
The purpose of this study was to investigate the prevalence and risk of adhesive capsulitis among hyperthyroidism patients. The data were obtained from the Longitudinal Health Insurance Database 2005 (LHID 2005) in Taiwan, using 1 million participants and a prospective population-based 7-year cohort study of survival analysis. The ambulatory-care claim records of patients diagnosed according to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes relating to hyperthyroidism between January 1, 2004 and December 31, 2007, were obtained. The prevalence and the adjusted hazard ratio (HR) of adhesive capsulitis among hyperthyroid patients and the control group were estimated. Of 4472 hyperthyroid patients, 162 (671/100 000 person-years) experienced adhesive capsulitis during the 24 122 person-year follow-up period. The crude HR of stroke was 1.26 (95% confidence interval [CI], 1.06 to 1.49), which was larger than that of the control group. The adjusted HR of developing adhesive capsulitis was 1.22 (95% CI, 1.03 to 1.45) for hyperthyroid patients during the 7-year follow-up period, which achieved statistical significance. The results of our large-scale longitudinal population-based study indicated that hyperthyroidism is an independent risk factor of developing adhesive capsulitis. PMID:24567049
Huang, Shih-Wei; Lin, Jia-Wei; Wang, Wei-Te; Wu, Chin-Wen; Liou, Tsan-Hon; Lin, Hui-Wen
2014-02-25
The purpose of this study was to investigate the prevalence and risk of adhesive capsulitis among hyperthyroidism patients. The data were obtained from the Longitudinal Health Insurance Database 2005 (LHID 2005) in Taiwan, using 1 million participants and a prospective population-based 7-year cohort study of survival analysis. The ambulatory-care claim records of patients diagnosed according to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes relating to hyperthyroidism between January 1, 2004 and December 31, 2007, were obtained. The prevalence and the adjusted hazard ratio (HR) of adhesive capsulitis among hyperthyroid patients and the control group were estimated. Of 4472 hyperthyroid patients, 162 (671/100,000 person-years) experienced adhesive capsulitis during the 24,122 person-year follow-up period. The crude HR of stroke was 1.26 (95% confidence interval [CI], 1.06 to 1.49), which was larger than that of the control group. The adjusted HR of developing adhesive capsulitis was 1.22 (95% CI, 1.03 to 1.45) for hyperthyroid patients during the 7-year follow-up period, which achieved statistical significance. The results of our large-scale longitudinal population-based study indicated that hyperthyroidism is an independent risk factor of developing adhesive capsulitis.
Helewa, Ramzi M; Turner, Donna; Wirtzfeld, Debrah; Park, Jason; Hochman, David; Czaykowski, Piotr; Singh, Harminder; Shu, Emma; Xue, Lin; McKay, Andrew
2013-06-17
The Canadian province of Manitoba covers a large geographical area but only has one major urban center, Winnipeg. We sought to determine if regional differences existed in the quality of colorectal cancer care in a publicly funded health care system. This was a population-based historical cohort analysis of the treatment and outcomes of Manitobans diagnosed with colorectal cancer between 2004 and 2006. Administrative databases were utilized to assess quality of care using published quality indicators. A total of 2,086 patients were diagnosed with stage I to IV colorectal cancer and 42.2% lived outside of Winnipeg. Patients from North Manitoba had a lower odds of undergoing major surgery after controlling for other confounders (odds ratio (OR): 0.48, 95% confidence interval (CI): 0.26 to 0.90). No geographic differences existed in the quality measures of 30-day operative mortality, consultations with oncologists, surveillance colonoscopy, and 5-year survival. However, there was a trend towards lower survival in North Manitoba. We found minimal differences by geography. However, overall compliance with quality measures is low and there are concerning trends in North Manitoba. This study is one of the few to evaluate population-based benchmarks for colorectal cancer therapy in Canada.
Techniques for Efficiently Managing Large Geosciences Data Sets
NASA Astrophysics Data System (ADS)
Kruger, A.; Krajewski, W. F.; Bradley, A. A.; Smith, J. A.; Baeck, M. L.; Steiner, M.; Lawrence, R. E.; Ramamurthy, M. K.; Weber, J.; Delgreco, S. A.; Domaszczynski, P.; Seo, B.; Gunyon, C. A.
2007-12-01
We have developed techniques and software tools for efficiently managing large geosciences data sets. While the techniques were developed as part of an NSF-Funded ITR project that focuses on making NEXRAD weather data and rainfall products available to hydrologists and other scientists, they are relevant to other geosciences disciplines that deal with large data sets. Metadata, relational databases, data compression, and networking are central to our methodology. Data and derived products are stored on file servers in a compressed format. URLs to, and metadata about the data and derived products are managed in a PostgreSQL database. Virtually all access to the data and products is through this database. Geosciences data normally require a number of processing steps to transform the raw data into useful products: data quality assurance, coordinate transformations and georeferencing, applying calibration information, and many more. We have developed the concept of crawlers that manage this scientific workflow. Crawlers are unattended processes that run indefinitely, and at set intervals query the database for their next assignment. A database table functions as a roster for the crawlers. Crawlers perform well-defined tasks that are, except for perhaps sequencing, largely independent from other crawlers. Once a crawler is done with its current assignment, it updates the database roster table, and gets its next assignment by querying the database. We have developed a library that enables one to quickly add crawlers. The library provides hooks to external (i.e., C-language) compiled codes, so that developers can work and contribute independently. Processes called ingesters inject data into the system. The bulk of the data are from a real-time feed using UCAR/Unidata's IDD/LDM software. An exciting recent development is the establishment of a Unidata HYDRO feed that feeds value-added metadata over the IDD/LDM. Ingesters grab the metadata and populate the PostgreSQL tables. These and other concepts we have developed have enabled us to efficiently manage a 70 Tb (and growing) data weather radar data set.
Huang, Wei-Yi; Chen, Yu-Fen; Carter, Stacey; Chang, Hong-Chiang; Lan, Chung-Fu; Huang, Kuo-How
2013-06-01
We investigated the epidemiology of upper urinary tract stone disease in Taiwan using a nationwide, population based database. This study was based on the National Health Insurance Research Database of Taiwan, which contains data on all medical beneficiary claims from 22.72 million enrollees, accounting for almost 99% of the Taiwanese population. The Longitudinal Health Insurance Database 2005, a subset of the National Health Insurance Research Database, contains data on all medical benefit claims from 1997 through 2010 for a subset of 1 million beneficiaries randomly sampled from the 2005 enrollment file. For epidemiological analysis we selected subjects whose claims records included the diagnosis of upper urinary tract urolithiasis. The age adjusted rate of medical care visits for upper urinary tract urolithiasis decreased by 6.5% from 1,367/100,000 subjects in 1998 to 1,278/100,000 in 2010. There was a significantly decreasing trend during the 13-year period in visits from female and all subjects (r(2) = 0.86, p = 0.001 and r(2) = 0.52, p = 0.005, respectively). In contrast, an increasing trend was noted for male subjects (r(2) = 0.45, p = 0.012). The age adjusted prevalence in 2010 was 9.01%, 5.79% and 7.38% in male, female and all subjects, respectively. The overall recurrence rate at 1 and 5 years was 6.12% and 34.71%, respectively. Male subjects had a higher recurrence rate than female subjects. Our study provides important information on the epidemiology of upper urinary tract stone disease in Taiwan, helping to quantify the burden of urolithiasis and establish strategies to decrease the risk of urolithiasis. Copyright © 2013 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
Chess databases as a research vehicle in psychology: Modeling large data.
Vaci, Nemanja; Bilalić, Merim
2017-08-01
The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess-namely, the large archive databases associated with the game. The German national chess database presented in this study represents a fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences.
Nelson, Matthew R.; Bryc, Katarzyna; King, Karen S.; Indap, Amit; Boyko, Adam R.; Novembre, John; Briley, Linda P.; Maruyama, Yuka; Waterworth, Dawn M.; Waeber, Gérard; Vollenweider, Peter; Oksenberg, Jorge R.; Hauser, Stephen L.; Stirnadel, Heide A.; Kooner, Jaspal S.; Chambers, John C.; Jones, Brendan; Mooser, Vincent; Bustamante, Carlos D.; Roses, Allen D.; Burns, Daniel K.; Ehm, Margaret G.; Lai, Eric H.
2008-01-01
Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. This project includes nearly 6,000 subjects of African-American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal-component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA do similarly well to control the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP). PMID:18760391
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
MGIS: managing banana (Musa spp.) genetic resources information and high-throughput genotyping data
Guignon, V.; Sempere, G.; Sardos, J.; Hueber, Y.; Duvergey, H.; Andrieu, A.; Chase, R.; Jenny, C.; Hazekamp, T.; Irish, B.; Jelali, K.; Adeka, J.; Ayala-Silva, T.; Chao, C.P.; Daniells, J.; Dowiya, B.; Effa effa, B.; Gueco, L.; Herradura, L.; Ibobondji, L.; Kempenaers, E.; Kilangi, J.; Muhangi, S.; Ngo Xuan, P.; Paofa, J.; Pavis, C.; Thiemele, D.; Tossou, C.; Sandoval, J.; Sutanto, A.; Vangu Paka, G.; Yi, G.; Van den houwe, I.; Roux, N.
2017-01-01
Abstract Unraveling the genetic diversity held in genebanks on a large scale is underway, due to advances in Next-generation sequence (NGS) based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm from the global genepool based on a combination of passport, genotypic and phenotypic data. To facilitate this, a new generation of information systems is being designed to efficiently handle data and link it with other external resources such as genome or breeding databases. The Musa Germplasm Information System (MGIS), the database for global ex situ-held banana genetic resources, has been developed to address those needs in a user-friendly way. In developing MGIS, we selected a generic database schema (Chado), the robust content management system Drupal for the user interface, and Tripal, a set of Drupal modules which links the Chado schema to Drupal. MGIS allows germplasm collection examination, accession browsing, advanced search functions, and germplasm orders. Additionally, we developed unique graphical interfaces to compare accessions and to explore them based on their taxonomic information. Accession-based data has been enriched with publications, genotyping studies and associated genotyping datasets reporting on germplasm use. Finally, an interoperability layer has been implemented to facilitate the link with complementary databases like the Banana Genome Hub and the MusaBase breeding database. Database URL: https://www.crop-diversity.org/mgis/ PMID:29220435
Patel, Mehul D; Wu, David; Chase, Monica Reed; Mavros, Panagiotis; Heithoff, Kim; Hanson, Mary E; Simpson, Ross J
2017-06-01
Estimates of residual cardiovascular risks among patients who have experienced a recent acute myocardial infarction (MI) are predominantly derived from secondary prevention trial populations, patient registries, and population-based cohorts. To generate real-world evidence of antiplatelet treatment and recurrent events following MI in patients on antiplatelet treatment among commercial, employer-based insured patients in a large administrative database. This was a retrospective cohort claims database study using the Truven Health MarketScan Commercial Claims and Encounters and Medicare Supplemental databases between 2007-2011. Patients with an acute MI hospitalization with a discharge date between 2008 and 2010 were included. Excluded were those patients with documentation of stroke, transient ischemic attack (TIA), or severe bleeding at or before index hospitalization and with concomitant use of anticoagulant therapy following index hospitalization. Patients treated with clopidogrel following the index MI hospitalization were followed up to 1 year for repeat MI, stroke, and coronary revascularization. Among 33,943 post-MI continuous clopidogrel users without history of stroke, TIA, or bleeding, 22% had diabetes, whereas angina and renal impairment were less prevalent (5% and 7%, respectively). Over the 1-year follow-up, 2.4% experienced a repeat MI or stroke, and 8.2% underwent coronary revascularization. Angina, diabetes, and renal impairment were associated with elevated 1-year risk of repeat MI or stroke. This study suggests that there is residual cardiovascular risk, although relatively low, in an insured, secondary prevention population on antiplatelet treatment following an MI. In patients with MI, identifying angina, diabetes, and renal impairment may aid risk stratification and guide the effective management of these higher-risk patients. Funding for this research was provided by Merck & Co. Although Merck & Co. formally reviewed a penultimate draft, the opinions expressed are those of the authorship and may not necessarily reflect those of the company. Reed Chase, Wu, Mavros, Heithoff, and Hanson are employees of Merck Sharp & Dohme, a subsidiary of Merck & Co., and may own stock and/or hold stock options in the company. Patel was an employee of Merck & Co. during the conduct of this study and preparation of the manuscript. Simpson is a paid consultant for Merck, Pfizer, and Amgen and has received speaker's fees from Merck and Pfizer. Study concept and design were contributed by all authors except Hanson. Heifhoff and Patel collected the data, and data interpretation was performed by Simpson, Mavros, Patel, Wu, and Hanson. The manuscript was written by Hanson, Mavros, and Patel and revised by Heithoff, Wu, Simpson, and Reed Chase.
InterRett, a Model for International Data Collection in a Rare Genetic Disorder
ERIC Educational Resources Information Center
Louise, Sandra; Fyfe, Sue; Bebbington, Ami; Bahi-Buisson, Nadia; Anderson, Alison; Pineda, Merce; Percy, Alan; Zeev, Bruria Ben; Wu, Xi Ru; Bao, Xinhua; MacLeod, Patrick; Armstrong, Judith; Leonard, Helen
2009-01-01
Rett syndrome (RTT) is a rare genetic disorder within the autistic spectrum. This study compared socio-demographic, clinical and genetic characteristics of the international database, InterRett, and the population-based Australian Rett syndrome database (ARSD). It also explored the strengths and limitations of InterRett in comparison with other…
Migration and health in Canada: health in the global village
Gushulak, Brian D.; Pottie, Kevin; Roberts, Janet Hatcher; Torres, Sara; DesMeules, Marie
2011-01-01
Background: Immigration has been and remains an important force shaping Canadian demography and identity. Health characteristics associated with the movement of large numbers of people have current and future implications for migrants, health practitioners and health systems. We aimed to identify demographics and health status data for migrant populations in Canada. Methods: We systematically searched Ovid MEDLINE (1996–2009) and other relevant web-based databases to examine immigrant selection processes, demographic statistics, health status from population studies and health service implications associated with migration to Canada. Studies and data were selected based on relevance, use of recent data and quality. Results: Currently, immigration represents two-thirds of Canada’s population growth, and immigrants make up more than 20% of the nation’s population. Both of these metrics are expected to increase. In general, newly arriving immigrants are healthier than the Canadian population, but over time there is a decline in this healthy immigrant effect. Immigrants and children born to new immigrants represent growing cohorts; in some metropolitan regions of Canada, they represent the majority of the patient population. Access to health services and health conditions of some migrant populations differ from patterns among Canadian-born patients, and these disparities have implications for preventive care and provision of health services. Interpretation: Because the health characteristics of some migrant populations vary according to their origin and experience, improved understanding of the scope and nature of the immigration process will help practitioners who will be increasingly involved in the care of immigrant populations, including prevention, early detection of disease and treatment. PMID:20584934
The population, environment, and health nexus: an Arab world perspective.
Kulczycki, A; Saxena, P C
1998-01-01
This report describes models of the links between population growth, environmental degradation, and health in Arab countries and in the world; management of the commons; urbanization and water as critical issues; and challenges in Lebanon. It is concluded that the complexity of interrelationships is difficult to untangle. Researchers frequently neglect health issues in modeling the relationships. The lack of attention to the health, development, and environment nexus has serious implications in the Middle East and North Africa. In Lebanon, national strategies do not include a national waste management strategy based on reduction, reuse, and recycling. Most Arab countries face the major issue of the lack of adequate planning in many economic sectors, which results in imbalances in supply and demand. Most Arab countries do not have adequate statistical databases upon which to base development, planning, and policy-making. The last census in Lebanon was in 1932. Information is missing on health. Health economics are ignored. It is not possible to estimate the health costs due to deficiencies in sanitation, hygiene, water, and air quality. Capacity building for environmental management and intersectoral collaboration is hampered. Arab countries with large oil reserves have ignored the population and environment links. Poorer countries will suffer the most from limited renewable water resources and their decline due to population growth. The political agenda in Arab countries should give priority to health, environment, development, and population issues.
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, N.; Sellis, Timos
1992-01-01
One of biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.
Database systems for knowledge-based discovery.
Jagarlapudi, Sarma A R P; Kishan, K V Radha
2009-01-01
Several database systems have been developed to provide valuable information from the bench chemist to biologist, medical practitioner to pharmaceutical scientist in a structured format. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although, data are of variable types the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.
Automatic detection of anomalies in screening mammograms
2013-01-01
Background Diagnostic performance in breast screening programs may be influenced by the prior probability of disease. Since breast cancer incidence is roughly half a percent in the general population there is a large probability that the screening exam will be normal. That factor may contribute to false negatives. Screening programs typically exhibit about 83% sensitivity and 91% specificity. This investigation was undertaken to determine if a system could be developed to pre-sort screening-images into normal and suspicious bins based on their likelihood to contain disease. Wavelets were investigated as a method to parse the image data, potentially removing confounding information. The development of a classification system based on features extracted from wavelet transformed mammograms is reported. Methods In the multi-step procedure images were processed using 2D discrete wavelet transforms to create a set of maps at different size scales. Next, statistical features were computed from each map, and a subset of these features was the input for a concerted-effort set of naïve Bayesian classifiers. The classifier network was constructed to calculate the probability that the parent mammography image contained an abnormality. The abnormalities were not identified, nor were they regionalized. The algorithm was tested on two publicly available databases: the Digital Database for Screening Mammography (DDSM) and the Mammographic Images Analysis Society’s database (MIAS). These databases contain radiologist-verified images and feature common abnormalities including: spiculations, masses, geometric deformations and fibroid tissues. Results The classifier-network designs tested achieved sensitivities and specificities sufficient to be potentially useful in a clinical setting. This first series of tests identified networks with 100% sensitivity and up to 79% specificity for abnormalities. This performance significantly exceeds the mean sensitivity reported in literature for the unaided human expert. Conclusions Classifiers based on wavelet-derived features proved to be highly sensitive to a range of pathologies, as a result Type II errors were nearly eliminated. Pre-sorting the images changed the prior probability in the sorted database from 37% to 74%. PMID:24330643
Pollack, Keshia M; Agnew, Jacqueline; Slade, Martin D; Cantley, Linda; Taiwo, Oyebode; Vegso, Sally; Sircar, Kanta; Cullen, Mark R
2007-09-01
Employer administrative files are an underutilized source of data in epidemiologic studies of occupational injuries. Personnel files, occupational health surveillance data, industrial hygiene data, and a real-time incident and injury management system from a large multi-site aluminum manufacturer were linked deterministically. An ecological-level measure of physical job demand was also linked. This method successfully created a database containing over 100 variables for 9,101 hourly employees from eight geographically dispersed U.S. plants. Between 2002 and 2004, there were 3,563 traumatic injuries to 2,495 employees. The most common injuries were sprain/strains (32%), contusions (24%), and lacerations (14%). A multivariable logistic regression model revealed that physical job demand was the strongest predictor of injury risk, in a dose dependent fashion. Other strong predictors of injury included female gender, young age, short company tenure and short time on current job. Employer administrative files are a useful source of data, as they permit the exploration of risk factors and potential confounders that are not included in many population-based surveys. The ability to link employer administrative files with injury surveillance data is a valuable analysis strategy for comprehensively studying workplace injuries, identifying salient risk factors, and targeting workforce populations disproportionately affected. (c) 2007 Wiley-Liss, Inc.
Chen, Huan-Sheng; Cheng, Chun-Ting; Hou, Chun-Cheng; Liou, Hung-Hsiang; Chang, Cheng-Tsung; Lin, Chun-Ju; Wu, Tsai-Kun; Chen, Chang-Hsu; Lim, Paik-Seong
2017-07-01
Rapid screening and monitoring of nutritional status is mandatory in hemodialysis population because of the increasingly encountered nutritional problems. Considering the limitations of previous composite nutrition scores applied in this population, we tried to develop a standardized composite nutrition score (SCNS) using low lean tissue index as a marker of protein wasting to facilitate clinical screening and monitoring and to predict outcome. This retrospective cohort used 2 databases of dialysis populations from Taiwan between 2011 and 2014. First database consisting of data from 629 maintenance hemodialysis patients was used to develop the SCNS and the second database containing data from 297 maintenance hemodialysis patients was used to validate this developed score. SCNS containing albumin, creatinine, potassium, and body mass index was developed from the first database using low lean tissue index as a marker of protein wasting. When applying this score in the original database, significantly higher risk of developing protein wasting was found for patients with lower SCNS (odds ratio 1.38 [middle tertile vs highest tertile, P < .0001] and 2.40 [lowest tertile vs middle tertile, P < .0001]). The risk of death was also shown to be higher for patients with lower SCNS (hazard ratio 4.45 [below median level vs above median level, P < .0001]). These results were validated in the second database. We developed an SCNS consisting of 4 easily available biochemical parameters. This kind of scoring system can be easily applied in different dialysis facilities for screening and monitoring of protein wasting. The wide application of body composition monitor in dialysis population will also facilitate the development of specific nutrition scoring model for individual facility. Copyright © 2017 National Kidney Foundation, Inc. Published by Elsevier Inc. All rights reserved.
Amoo-Achampong, Kelms; Rosas, Samuel; Schmoke, Nicholas; Accilien, Yves-Dany; Nwachukwu, Benedict U; McCormick, Frank
2017-09-01
To describe recent epidemiological trends in concussion diagnosis within the United States (US) population. We conducted a retrospective review of PearlDiver, a private-payor insurance database. Our search included International Classification of Disease, Ninth Revision codes for sports-related concussions spanning 2010 through 2014. Overall study population included patients aged 5 to 39 with subgroup analysis performed on Cohort A (Youth), children and adolescents aged 5 to 19, and Cohort B (Adults), adults aged 20 to 39. Incidence was defined as the number of individuals diagnosed normalized to the number of patients in the database for each demographic. Our search returned 1,599 patients diagnosed during the study period. The average (±SD) annual rate was 4.14 ± 1.42 per 100,000 patients for the overall population. Youth patients were diagnosed at a mean annual rate of 3.78 ± 1.30 versus 0.36 ± 0.16 per 100,000 in Adults. Concussion normalized incidence significantly increased from 2.47 to 3.87 per 100,000 patients (57%) in the Youth cohort (p = 0.048). In Adults, rate grew from 0.34 to 0.44 per 100,000 patients (29%) but was not statistically significant (p = 0.077). Four-year compound annual growth rates for Youth and Adults were 26.3% and 20.4%, respectively. Youth patients comprised 1,422/1,599 (90.18%) of all concussion diagnoses and were predominantly male (75%). Adults also constituted 138/1,599 (8.63%) of the sample and were also largely male (80%). Midwestern states had highest diagnostic rates (Cohort A:19 per 100,000 and Cohort B:1.8 per 100,000). Both cohorts had the most total diagnoses made in the fourth quarter followed by the second quarter. Sports-related concussion diagnostic rates have grown significantly in the youth population. Quarterly, regional and gender distributions appear consistent with participation in concussion-prone sports. Utilization of individualized and multifaceted approaches are recommended to advance diagnosis, assessment and management of concussions in the U.S.
NASA Astrophysics Data System (ADS)
Prata, F.; Stebel, K.
2013-12-01
Over the last few years there has been a recognition of the utility of satellite measurements to identify and track volcanic emissions that present a natural hazard to human populations. Mitigation of the volcanic hazard to life and the environment requires understanding of the properties of volcanic emissions, identifying the hazard in near real-time and being able to provide timely and accurate forecasts to affected areas. Amongst the many ways to measure volcanic emissions, satellite remote sensing is capable of providing global quantitative retrievals of important microphysical parameters such as ash mass loading, ash particle effective radius, infrared optical depth, SO2 partial and total column abundance, plume altitude, aerosol optical depth and aerosol absorbing index. The eruption of Eyjafjallajokull in April-May, 2010 led to increased research and measurement programs to better characterize properties of volcanic ash and the need to establish a data-base in which to store and access these data was confirmed. The European Space Agency (ESA) has recognized the importance of having a quality controlled data-base of satellite retrievals and has funded an activity (VAST) to develop novel remote sensing retrieval schemes and a data-base, initially focused on several recent hazardous volcanic eruptions. As a first step, satellite retrievals for the eruptions of Eyjafjallajokull, Grimsvotn, Puyhue-Cordon Caulle, Nabro, Merapi, Okmok, Kasatochi and Sarychev Peak are being considered. Here we describe the data, retrievals and methods being developed for the data-base. Three important applications of the data-base are illustrated related to the ash/aviation problem, to the impact of the Merapi volcanic eruption on the local population, and to estimate SO2 fluxes from active volcanoes-as a means to diagnose future unrest. Dispersion model simulations are also being included in the data-base. In time, data from conventional in situ sampling instruments, airborne and ground-based remote sensing platforms and other meta-data (bulk ash and gas properties, volcanic setting, volcanic eruption chronologies, hazards and impacts etc.) will be added. The data-base has the potential to provide the natural hazards community with the first dynamic atmospheric volcanic hazards map and will be a valuable tool particularly for global transport.
DNA-based methods of geochemical prospecting
Ashby, Matthew [Mill Valley, CA
2011-12-06
The present invention relates to methods for performing surveys of the genetic diversity of a population. The invention also relates to methods for performing genetic analyses of a population. The invention further relates to methods for the creation of databases comprising the survey information and the databases created by these methods. The invention also relates to methods for analyzing the information to correlate the presence of nucleic acid markers with desired parameters in a sample. These methods have application in the fields of geochemical exploration, agriculture, bioremediation, environmental analysis, clinical microbiology, forensic science and medicine.
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patientss sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not been sufficiently far explored so far in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of distributed back-ends offer a good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu on the other hand, is the only solution that guarantees a sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
DBMap: a TreeMap-based framework for data navigation and visualization of brain research registry
NASA Astrophysics Data System (ADS)
Zhang, Ming; Zhang, Hong; Tjandra, Donny; Wong, Stephen T. C.
2003-05-01
The purpose of this study is to investigate and apply a new, intuitive and space-conscious visualization framework to facilitate efficient data presentation and exploration of large-scale data warehouses. We have implemented the DBMap framework for the UCSF Brain Research Registry. Such a novel utility would facilitate medical specialists and clinical researchers in better exploring and evaluating a number of attributes organized in the brain research registry. The current UCSF Brain Research Registry consists of a federation of disease-oriented database modules, including Epilepsy, Brain Tumor, Intracerebral Hemorrphage, and CJD (Creuzfeld-Jacob disease). These database modules organize large volumes of imaging and non-imaging data to support Web-based clinical research. While the data warehouse supports general information retrieval and analysis, there lacks an effective way to visualize and present the voluminous and complex data stored. This study investigates whether the TreeMap algorithm can be adapted to display and navigate categorical biomedical data warehouse or registry. TreeMap is a space constrained graphical representation of large hierarchical data sets, mapped to a matrix of rectangles, whose size and color represent interested database fields. It allows the display of a large amount of numerical and categorical information in limited real estate of computer screen with an intuitive user interface. The paper will describe, DBMap, the proposed new data visualization framework for large biomedical databases. Built upon XML, Java and JDBC technologies, the prototype system includes a set of software modules that reside in the application server tier and provide interface to backend database tier and front-end Web tier of the brain registry.
DHLAS: A web-based information system for statistical genetic analysis of HLA population data.
Thriskos, P; Zintzaras, E; Germenis, A
2007-03-01
DHLAS (database HLA system) is a user-friendly, web-based information system for the analysis of human leukocyte antigens (HLA) data from population studies. DHLAS has been developed using JAVA and the R system, it runs on a Java Virtual Machine and its user-interface is web-based powered by the servlet engine TOMCAT. It utilizes STRUTS, a Model-View-Controller framework and uses several GNU packages to perform several of its tasks. The database engine it relies upon for fast access is MySQL, but others can be used a well. The system estimates metrics, performs statistical testing and produces graphs required for HLA population studies: (i) Hardy-Weinberg equilibrium (calculated using both asymptotic and exact tests), (ii) genetics distances (Euclidian or Nei), (iii) phylogenetic trees using the unweighted pair group method with averages and neigbor-joining method, (iv) linkage disequilibrium (pairwise and overall, including variance estimations), (v) haplotype frequencies (estimate using the expectation-maximization algorithm) and (vi) discriminant analysis. The main merit of DHLAS is the incorporation of a database, thus, the data can be stored and manipulated along with integrated genetic data analysis procedures. In addition, it has an open architecture allowing the inclusion of other functions and procedures.
Whitehead, Elizabeth; Dodds, Linda; Joseph, K S; Gordon, Kevin E; Wood, Ellen; Allen, Alexander C; Camfield, Peter; Dooley, Joseph M
2006-04-01
We examined the effect of pregnancy and neonatal factors on the subsequent development of childhood epilepsy in a population-based cohort study. Children born between January 1986 and December 2000 in Nova Scotia, Canada were followed up to December 2001. Data on pregnancy and neonatal events and on diagnoses of childhood epilepsy were obtained through record linkage of 2 population-based databases: the Nova Scotia Atlee Perinatal Database and the Canadian Epilepsy Database and Registry. Factors analyzed included events during the prenatal, labor and delivery, and neonatal time periods. Cox proportional hazards regression models were used to estimate relative risks and 95% confidence intervals. There were 648 new cases of epilepsy diagnosed among 124,207 live births, for an overall rate of 63 per 100,000 person-years. Incidence rates were highest among children <1 year of age. In adjusted analyses, factors significantly associated with an increased risk of epilepsy included eclampsia, neonatal seizures, central nervous system (CNS) anomalies, placental abruption, major non-CNS anomalies, neonatal metabolic disorders, neonatal CNS diseases, previous low birth weight infant, infection in pregnancy, small for gestational age, unmarried, and not breastfeeding infant at the time of discharge from hospital. Our study supports the concept that prenatal factors contribute to the occurrence of subsequent childhood epilepsy.
NASA Astrophysics Data System (ADS)
Gong, L.
2013-12-01
Large-scale hydrological models and land surface models are by far the only tools for accessing future water resources in climate change impact studies. Those models estimate discharge with large uncertainties, due to the complex interaction between climate and hydrology, the limited quality and availability of data, as well as model uncertainties. A new purely data-based scale-extrapolation method is proposed, to estimate water resources for a large basin solely from selected small sub-basins, which are typically two-orders-of-magnitude smaller than the large basin. Those small sub-basins contain sufficient information, not only on climate and land surface, but also on hydrological characteristics for the large basin In the Baltic Sea drainage basin, best discharge estimation for the gauged area was achieved with sub-basins that cover 2-4% of the gauged area. There exist multiple sets of sub-basins that resemble the climate and hydrology of the basin equally well. Those multiple sets estimate annual discharge for gauged area consistently well with 5% average error. The scale-extrapolation method is completely data-based; therefore it does not force any modelling error into the prediction. The multiple predictions are expected to bracket the inherent variations and uncertainties of the climate and hydrology of the basin. The method can be applied in both un-gauged basins and un-gauged periods with uncertainty estimation.
Charoute, Hicham; Bakhchane, Amina; Benrahma, Houda; Romdhane, Lilia; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Abdelhak, Sonia; Lenaers, Guy; Barakat, Abdelhamid
2015-11-01
The Mediterranean basin has been the theater of migration crossroads followed by settlement of several societies and cultures in prehistoric and historical times, with important consequences on genetic and genomic determinisms. Here, we present the Mediterranean Founder Mutation Database (MFMD), established to offer web-based access to founder mutation information in the Mediterranean population. Mutation data were collected from the literature and other online resources and systematically reviewed and assembled into this database. The information provided for each founder mutation includes DNA change, amino-acid change, mutation type and mutation effect, as well as mutation frequency and coalescence time when available. Currently, the database contains 383 founder mutations found in 210 genes related to 219 diseases. We believe that MFMD will help scientists and physicians to design more rapid and less expensive genetic diagnostic tests. Moreover, the coalescence time of founder mutations gives an overview about the migration history of the Mediterranean population. MFMD can be publicly accessed from http://mfmd.pasteur.ma. © 2015 WILEY PERIODICALS, INC.
Background: Electronic health records (EHRs) are now a ubiquitous component of the US healthcare system and are attractive for secondary data analysis as they contain detailed and longitudinal clinical records on potentially millions of individuals. However, due to their relative...
Educational System Efficiency Improvement Using Knowledge Discovery in Databases
ERIC Educational Resources Information Center
Lukaš, Mirko; Leškovic, Darko
2007-01-01
This study describes one of possible way of usage ICT in education system. We basically treated educational system like Business Company and develop appropriate model for clustering of student population. Modern educational systems are forced to extract the most necessary and purposeful information from a large amount of available data. Clustering…
The use of a computerized database to monitor vaccine safety in Viet Nam.
Ali, Mohammad; Canh, Gia Do; Clemens, John D.; Park, Jin-Kyung; von Seidlein, Lorenz; Minh, Tan Truong; Thiem, Dinh Vu; Tho, Huu Le; Trach, Duc Dang
2005-01-01
Health information systems to monitor vaccine safety are used in industrialized countries to detect adverse medical events related to vaccinations or to prove the safety of vaccines. There are no such information systems in the developing world, but they are urgently needed. A large linked database for the monitoring of vaccine-related adverse events has been established in Khanh Hoa province, Viet Nam. Data collected during the first 2 years of surveillance, a period which included a mass measles vaccination campaign, were used to evaluate the system. For this purpose the discharge diagnoses of individuals admitted to polyclinics and hospitals were coded according to the International Classification of Diseases (ICD)-10 guidelines and linked in a dynamic population database with vaccination histories. A case-series analysis was applied to the cohort of children vaccinated during the mass measles vaccination campaign. The study recorded 107,022 immunizations in a catchment area with a population of 357,458 and confirmed vaccine coverage of 87% or higher for completed routine childhood vaccinations. The measles vaccination campaign immunized at least 86% of the targeted children aged 9 months to 10 years. No medical event was detected significantly more frequently during the 14 days after measles vaccination than before it. The experience in Viet Nam confirmed the safety of a measles vaccination campaign and shows that it is feasible to establish health information systems such as a large linked database which can provide reliable data in a developing country for a modest increase in use of resources. PMID:16193545
A reservoir morphology database for the conterminous United States
Rodgers, Kirk D.
2017-09-13
The U.S. Geological Survey, in cooperation with the Reservoir Fisheries Habitat Partnership, combined multiple national databases to create one comprehensive national reservoir database and to calculate new morphological metrics for 3,828 reservoirs. These new metrics include, but are not limited to, shoreline development index, index of basin permanence, development of volume, and other descriptive metrics based on established morphometric formulas. The new database also contains modeled chemical and physical metrics. Because of the nature of the existing databases used to compile the Reservoir Morphology Database and the inherent missing data, some metrics were not populated. One comprehensive database will assist water-resource managers in their understanding of local reservoir morphology and water chemistry characteristics throughout the continental United States.
[Current status and trends in the health of the Moscow population].
Tishuk, E A; Plavunov, N F; Soboleva, N P
1997-01-01
Based on vast comprehensive medical statistical database, the authors analyze the health status of the population and the efficacy of public health service in Moscow. The pre-crisis tendencies and the modern status of public health under modern socioeconomic conditions are noted.
The HARPS-N archive through a Cassandra, NoSQL database suite?
NASA Astrophysics Data System (ADS)
Molinari, Emilio; Guerra, Jose; Harutyunyan, Avet; Lodi, Marcello; Martin, Adrian
2016-07-01
The TNG-INAF is developing the science archive for the WEAVE instrument. The underlying architecture of the archive is based on a non relational database, more precisely, on Apache Cassandra cluster, which uses a NoSQL technology. In order to test and validate the use of this architecture, we created a local archive which we populated with all the HARPSN spectra collected at the TNG since the instrument's start of operations in mid-2012, as well as developed tools for the analysis of this data set. The HARPS-N data set is two orders of magnitude smaller than WEAVE, but we want to demonstrate the ability to walk through a complete data set and produce scientific output, as valuable as that produced by an ordinary pipeline, though without accessing directly the FITS files. The analytics is done by Apache Solr and Spark and on a relational PostgreSQL database. As an example, we produce observables like metallicity indexes for the targets in the archive and compare the results with the ones coming from the HARPS-N regular data reduction software. The aim of this experiment is to explore the viability of a high availability cluster and distributed NoSQL database as a platform for complex scientific analytics on a large data set, which will then be ported to the WEAVE Archive System (WAS) which we are developing for the WEAVE multi object, fiber spectrograph.
Frost, Rachael; Levati, Sara; McClurg, Doreen; Brady, Marian; Williams, Brian
2017-06-01
To systematically review methods for measuring adherence used in home-based rehabilitation trials and to evaluate their validity, reliability, and acceptability. In phase 1 we searched the CENTRAL database, NHS Economic Evaluation Database, and Health Technology Assessment Database (January 2000 to April 2013) to identify adherence measures used in randomized controlled trials of allied health professional home-based rehabilitation interventions. In phase 2 we searched the databases of MEDLINE, Embase, CINAHL, Allied and Complementary Medicine Database, PsycINFO, CENTRAL, ProQuest Nursing and Allied Health, and Web of Science (inception to April 2015) for measurement property assessments for each measure. Studies assessing the validity, reliability, or acceptability of adherence measures. Two reviewers independently extracted data on participant and measure characteristics, measurement properties evaluated, evaluation methods, and outcome statistics and assessed study quality using the COnsensus-based Standards for the selection of health Measurement INstruments checklist. In phase 1 we included 8 adherence measures (56 trials). In phase 2, from the 222 measurement property assessments identified in 109 studies, 22 high-quality measurement property assessments were narratively synthesized. Low-quality studies were used as supporting data. StepWatch Activity Monitor validly and acceptably measured short-term step count adherence. The Problematic Experiences of Therapy Scale validly and reliably assessed adherence to vestibular rehabilitation exercises. Adherence diaries had moderately high validity and acceptability across limited populations. The Borg 6 to 20 scale, Bassett and Prapavessis scale, and Yamax CW series had insufficient validity. Low-quality evidence supported use of the Joint Protection Behaviour Assessment. Polar A1 series heart monitors were considered acceptable by 1 study. Current rehabilitation adherence measures are limited. Some possess promising validity and acceptability for certain parameters of adherence, situations, and populations and should be used in these situations. Rigorous evaluation of adherence measures in a broader range of populations is needed. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Courtney, Ryan J; Naicker, Sundresan; Shakeshaft, Anthony; Clare, Philip; Martire, Kristy A; Mattick, Richard P
2015-06-08
Smoking cessation research output should move beyond descriptive research of the health problem to testing interventions that can provide causal data and effective evidence-based solutions. This review examined the number and type of published smoking cessation studies conducted in low-socioeconomic status (low-SES) and disadvantaged population groups. A systematic database search was conducted for two time periods: 2000-2004 (TP1) and 2008-2012 (TP2). Publications that examined smoking cessation in a low-SES or disadvantaged population were coded by: population of interest; study type (reviews, non-data based publications, data-based publications (descriptive, measurement and intervention research)); and country. Intervention studies were coded in accordance with the Cochrane Effective Practice and Organisation of Care data collection checklist and use of biochemical verification of self-reported abstinence was assessed. 278 citations were included. Research output (i.e., all study types) had increased from TP1 27% to TP2 73% (χ²=73.13, p<0.001), however, the proportion of data-based research had not significantly increased from TP1 and TP2: descriptive (TP1=23% vs. TP2=33%) or intervention (TP1=77% vs. TP2=67%). The proportion of intervention studies adopting biochemical verification of self-reported abstinence had significantly decreased from TP1 to TP2 with an increased reliance on self-reported abstinence (TP1=12% vs. TP2=36%). The current research output is not ideal or optimal to decrease smoking rates. Research institutions, scholars and funding organisations should take heed to review findings when developing future research and policy.
NASA Astrophysics Data System (ADS)
Onodera, Natsuo; Mizukami, Masayuki
This paper estimates several quantitative indice on production and distribution of scientific and technical databases based on various recent publications and attempts to compare the indice internationally. Raw data used for the estimation are brought mainly from the Database Directory (published by MITI) for database production and from some domestic and foreign study reports for database revenues. The ratio of the indice among Japan, US and Europe for usage of database is similar to those for general scientific and technical activities such as population and R&D expenditures. But Japanese contributions to production, revenue and over-countory distribution of databases are still lower than US and European countries. International comparison of relative database activities between public and private sectors is also discussed.
Jesensek Papez, B; Palfy, M; Mertik, M; Turk, Z
2009-01-01
This study further evaluated a computer-based infrared thermography (IRT) system, which employs artificial neural networks for the diagnosis of carpal tunnel syndrome (CTS) using a large database of 502 thermal images of the dorsal and palmar side of 132 healthy and 119 pathological hands. It confirmed the hypothesis that the dorsal side of the hand is of greater importance than the palmar side when diagnosing CTS thermographically. Using this method it was possible correctly to classify 72.2% of all hands (healthy and pathological) based on dorsal images and > 80% of hands when only severely affected and healthy hands were considered. Compared with the gold standard electromyographic diagnosis of CTS, IRT cannot be recommended as an adequate diagnostic tool when exact severity level diagnosis is required, however we conclude that IRT could be used as a screening tool for severe cases in populations with high ergonomic risk factors of CTS.
Scenarios of large mammal loss in Europe for the 21st century.
Rondinini, Carlo; Visconti, Piero
2015-08-01
Distributions and populations of large mammals are declining globally, leading to an increase in their extinction risk. We forecasted the distribution of extant European large mammals (17 carnivores and 10 ungulates) based on 2 Rio+20 scenarios of socioeconomic development: business as usual and reduced impact through changes in human consumption of natural resources. These scenarios are linked to scenarios of land-use change and climate change through the spatial allocation of land conversion up to 2050. We used a hierarchical framework to forecast the extent and distribution of mammal habitat based on species' habitat preferences (as described in the International Union for Conservation of Nature Red List database) within a suitable climatic space fitted to the species' current geographic range. We analyzed the geographic and taxonomic variation of habitat loss for large mammals and the potential effect of the reduced impact policy on loss mitigation. Averaging across scenarios, European large mammals were predicted to lose 10% of their habitat by 2050 (25% in the worst-case scenario). Predicted loss was much higher for species in northwestern Europe, where habitat is expected to be lost due to climate and land-use change. Change in human consumption patterns was predicted to substantially improve the conservation of habitat for European large mammals, but not enough to reduce extinction risk if species cannot adapt locally to climate change or disperse. © 2015 Society for Conservation Biology.
Prevalence and cost of hospital medical errors in the general and elderly United States populations.
Mallow, Peter J; Pandya, Bhavik; Horblyuk, Ruslan; Kaplan, Harold S
2013-12-01
The primary objective of this study was to quantify the differences in the prevalence rate and costs of hospital medical errors between the general population and an elderly population aged ≥65 years. Methods from an actuarial study of medical errors were modified to identify medical errors in the Premier Hospital Database using data from 2009. Visits with more than four medical errors were removed from the population to avoid over-estimation of cost. Prevalence rates were calculated based on the total number of inpatient visits. There were 3,466,596 total inpatient visits in 2009. Of these, 1,230,836 (36%) occurred in people aged ≥ 65. The prevalence rate was 49 medical errors per 1000 inpatient visits in the general cohort and 79 medical errors per 1000 inpatient visits for the elderly cohort. The top 10 medical errors accounted for more than 80% of the total in the general cohort and the 65+ cohort. The most costly medical error for the general population was postoperative infection ($569,287,000). Pressure ulcers were most costly ($347,166,257) in the elderly population. This study was conducted with a hospital administrative database, and assumptions were necessary to identify medical errors in the database. Further, there was no method to identify errors of omission or misdiagnoses within the database. This study indicates that prevalence of hospital medical errors for the elderly is greater than the general population and the associated cost of medical errors in the elderly population is quite substantial. Hospitals which further focus their attention on medical errors in the elderly population may see a significant reduction in costs due to medical errors as a disproportionate percentage of medical errors occur in this age group.
[Privacy and public benefit in using large scale health databases].
Yamamoto, Ryuichi
2014-01-01
In Japan, large scale heath databases were constructed in a few years, such as National Claim insurance and health checkup database (NDB) and Japanese Sentinel project. But there are some legal issues for making adequate balance between privacy and public benefit by using such databases. NDB is carried based on the act for elderly person's health care but in this act, nothing is mentioned for using this database for general public benefit. Therefore researchers who use this database are forced to pay much concern about anonymization and information security that may disturb the research work itself. Japanese Sentinel project is a national project to detecting drug adverse reaction using large scale distributed clinical databases of large hospitals. Although patients give the future consent for general such purpose for public good, it is still under discussion using insufficiently anonymized data. Generally speaking, researchers of study for public benefit will not infringe patient's privacy, but vague and complex requirements of legislation about personal data protection may disturb the researches. Medical science does not progress without using clinical information, therefore the adequate legislation that is simple and clear for both researchers and patients is strongly required. In Japan, the specific act for balancing privacy and public benefit is now under discussion. The author recommended the researchers including the field of pharmacology should pay attention to, participate in the discussion of, and make suggestion to such act or regulations.
Data Mining Applied to Analysis of Contraceptive Methods Among College Students.
Simões, Priscyla Waleska; Cesconetto, Samuel; Dalló, Eduardo Daminelli; de Souza Pires, Maria Marlene; Comunello, Eros; Borges Tomaz, Felipe; Xavier, Eduardo Pícolo; da Rosa Brunel Alves, Pedro Antonio; Ceretta, Luciane Bisognin; Manenti, Sandra Aparecida
2017-01-01
The aim of this study was to use the Data Mining to analyze the profile of the use of contraceptive methods in a university population. We used a database about sexuality performed on a university population in southern Brazil. The results obtained by the generated rules are largely in line with the literature and epidemiology worldwide, showing significant points of vulnerability in the university population. Validation measures of the study, as such, accuracy, sensitivity, specificity, and area under the ROC curve were higher or at least similar as compared to recent studies using the same methodology.
A comparison of database systems for XML-type data.
Risse, Judith E; Leunissen, Jack A M
2010-01-01
In the field of bioinformatics interchangeable data formats based on XML are widely used. XML-type data is also at the core of most web services. With the increasing amount of data stored in XML comes the need for storing and accessing the data. In this paper we analyse the suitability of different database systems for storing and querying large datasets in general and Medline in particular. All reviewed database systems perform well when tested with small to medium sized datasets, however when the full Medline dataset is queried a large variation in query times is observed. There is not one system that is vastly superior to the others in this comparison and, depending on the database size and the query requirements, different systems are most suitable. The best all-round solution is the Oracle 11~g database system using the new binary storage option. Alias-i's Lingpipe is a more lightweight, customizable and sufficiently fast solution. It does however require more initial configuration steps. For data with a changing XML structure Sedna and BaseX as native XML database systems or MySQL with an XML-type column are suitable.
NVST Data Archiving System Based On FastBit NoSQL Database
NASA Astrophysics Data System (ADS)
Liu, Ying-bo; Wang, Feng; Ji, Kai-fan; Deng, Hui; Dai, Wei; Liang, Bo
2014-06-01
The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high resolution imaging and spectral observations, including the measurements of the solar magnetic field. The NVST has been collecting more than 20 million FITS files since it began routine observations in 2012 and produces a maximum observational records of 120 thousand files in a day. Given the large amount of files, the effective archiving and retrieval of files becomes a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the Fastbit Not Only Structured Query Language (NoSQL) database. Comparing to the relational database (i.e., MySQL; My Structured Query Language), the Fastbit database manifests distinctive advantages on indexing and querying performance. In a large scale database of 40 million records, the multi-field combined query response time of Fastbit database is about 15 times faster and fully meets the requirements of the NVST. Our study brings a new idea for massive astronomical data archiving and would contribute to the design of data management systems for other astronomical telescopes.
Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K
2018-04-10
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
Parson, W; Gusmão, L; Hares, D R; Irwin, J A; Mayr, W R; Morling, N; Pokorak, E; Prinz, M; Salas, A; Schneider, P M; Parsons, T J
2014-11-01
The DNA Commission of the International Society of Forensic Genetics (ISFG) regularly publishes guidelines and recommendations concerning the application of DNA polymorphisms to the question of human identification. Previous recommendations published in 2000 addressed the analysis and interpretation of mitochondrial DNA (mtDNA) in forensic casework. While the foundations set forth in the earlier recommendations still apply, new approaches to the quality control, alignment and nomenclature of mitochondrial sequences, as well as the establishment of mtDNA reference population databases, have been developed. Here, we describe these developments and discuss their application to both mtDNA casework and mtDNA reference population databasing applications. While the generation of mtDNA for forensic casework has always been guided by specific standards, it is now well-established that data of the same quality are required for the mtDNA reference population data used to assess the statistical weight of the evidence. As a result, we introduce guidelines regarding sequence generation, as well as quality control measures based on the known worldwide mtDNA phylogeny, that can be applied to ensure the highest quality population data possible. For both casework and reference population databasing applications, the alignment and nomenclature of haplotypes is revised here and the phylogenetic alignment proffered as acceptable standard. In addition, the interpretation of heteroplasmy in the forensic context is updated, and the utility of alignment-free database searches for unbiased probability estimates is highlighted. Finally, we discuss statistical issues and define minimal standards for mtDNA database searches. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Amadoz, Alicia; González-Candelas, Fernando
2007-04-20
Most research scientists working in the fields of molecular epidemiology, population and evolutionary genetics are confronted with the management of large volumes of data. Moreover, the data used in studies of infectious diseases are complex and usually derive from different institutions such as hospitals or laboratories. Since no public database scheme incorporating clinical and epidemiological information about patients and molecular information about pathogens is currently available, we have developed an information system, composed by a main database and a web-based interface, which integrates both types of data and satisfies requirements of good organization, simple accessibility, data security and multi-user support. From the moment a patient arrives to a hospital or health centre until the processing and analysis of molecular sequences obtained from infectious pathogens in the laboratory, lots of information is collected from different sources. We have divided the most relevant data into 12 conceptual modules around which we have organized the database schema. Our schema is very complete and it covers many aspects of sample sources, samples, laboratory processes, molecular sequences, phylogenetics results, clinical tests and results, clinical information, treatments, pathogens, transmissions, outbreaks and bibliographic information. Communication between end-users and the selected Relational Database Management System (RDMS) is carried out by default through a command-line window or through a user-friendly, web-based interface which provides access and management tools for the data. epiPATH is an information system for managing clinical and molecular information from infectious diseases. It facilitates daily work related to infectious pathogens and sequences obtained from them. This software is intended for local installation in order to safeguard private data and provides advanced SQL-users the flexibility to adapt it to their needs. The database schema, tool scripts and web-based interface are free software but data stored in our database server are not publicly available. epiPATH is distributed under the terms of GNU General Public License. More details about epiPATH can be found at http://genevo.uv.es/epipath.
NLTE4 Plasma Population Kinetics Database
National Institute of Standards and Technology Data Gateway
SRD 159 NLTE4 Plasma Population Kinetics Database (Web database for purchase) This database contains benchmark results for simulation of plasma population kinetics and emission spectra. The data were contributed by the participants of the 4th Non-LTE Code Comparison Workshop who have unrestricted access to the database. The only limitation for other users is in hidden labeling of the output results. Guest users can proceed to the database entry page without entering userid and password.
Pedersen, Mona K; Nielsen, Gunnar L; Uhrenfeldt, Lisbeth; Rasmussen, Ole S; Lundbye-Christensen, Søren
2017-08-01
To describe the construction of the Older Person at Risk Assessment (OPRA) database, the ability to link this database with existing data sources obtained from Danish nationwide population-based registries and to discuss its research potential for the analyses of risk factors associated with 30-day hospital readmission. We reviewed Danish nationwide registries to obtain information on demographic and social determinants as well as information on health and health care use in a population of hospitalised older people. The sample included all people aged 65+ years discharged from Danish public hospitals in the period from 1 January 2007 to 30 September 2010. We used personal identifiers to link and integrate the data from all events of interest with the outcome measures in the OPRA database. The database contained records of the patients, admissions and variables of interest. The cohort included 1,267,752 admissions for 479,854 unique people. The rate of 30-day all-cause acute readmission was 18.9% ( n=239,077) and the overall 30-day mortality was 5.0% ( n=63,116). The OPRA database provides the possibility of linking data on health and life events in a population of people moving into retirement and ageing. Construction of the database makes it possible to outline individual life and health trajectories over time, transcending organisational boundaries within health care systems. The OPRA database is multi-component and multi-disciplinary in orientation and has been prepared to be used in a wide range of subgroup analyses, including different outcome measures and statistical methods.
Keane, Pearse A; Grossi, Carlota M; Foster, Paul J; Yang, Qi; Reisman, Charles A; Chan, Kinpui; Peto, Tunde; Thomas, Dhanes; Patel, Praveen J
2016-01-01
To describe an approach to the use of optical coherence tomography (OCT) imaging in large, population-based studies, including methods for OCT image acquisition, storage, and the remote, rapid, automated analysis of retinal thickness. In UK Biobank, OCT images were acquired between 2009 and 2010 using a commercially available "spectral domain" OCT device (3D OCT-1000, Topcon). Images were obtained using a raster scan protocol, 6 mm x 6 mm in area, and consisting of 128 B-scans. OCT image sets were stored on UK Biobank servers in a central repository, adjacent to high performance computers. Rapid, automated analysis of retinal thickness was performed using custom image segmentation software developed by the Topcon Advanced Biomedical Imaging Laboratory (TABIL). This software employs dual-scale gradient information to allow for automated segmentation of nine intraretinal boundaries in a rapid fashion. 67,321 participants (134,642 eyes) in UK Biobank underwent OCT imaging of both eyes as part of the ocular module. 134,611 images were successfully processed with 31 images failing segmentation analysis due to corrupted OCT files or withdrawal of subject consent for UKBB study participation. Average time taken to call up an image from the database and complete segmentation analysis was approximately 120 seconds per data set per login, and analysis of the entire dataset was completed in approximately 28 days. We report an approach to the rapid, automated measurement of retinal thickness from nearly 140,000 OCT image sets from the UK Biobank. In the near future, these measurements will be publically available for utilization by researchers around the world, and thus for correlation with the wealth of other data collected in UK Biobank. The automated analysis approaches we describe may be of utility for future large population-based epidemiological studies, clinical trials, and screening programs that employ OCT imaging.
Grossi, Carlota M.; Foster, Paul J.; Yang, Qi; Reisman, Charles A.; Chan, Kinpui; Peto, Tunde; Thomas, Dhanes; Patel, Praveen J.
2016-01-01
Purpose To describe an approach to the use of optical coherence tomography (OCT) imaging in large, population-based studies, including methods for OCT image acquisition, storage, and the remote, rapid, automated analysis of retinal thickness. Methods In UK Biobank, OCT images were acquired between 2009 and 2010 using a commercially available “spectral domain” OCT device (3D OCT-1000, Topcon). Images were obtained using a raster scan protocol, 6 mm x 6 mm in area, and consisting of 128 B-scans. OCT image sets were stored on UK Biobank servers in a central repository, adjacent to high performance computers. Rapid, automated analysis of retinal thickness was performed using custom image segmentation software developed by the Topcon Advanced Biomedical Imaging Laboratory (TABIL). This software employs dual-scale gradient information to allow for automated segmentation of nine intraretinal boundaries in a rapid fashion. Results 67,321 participants (134,642 eyes) in UK Biobank underwent OCT imaging of both eyes as part of the ocular module. 134,611 images were successfully processed with 31 images failing segmentation analysis due to corrupted OCT files or withdrawal of subject consent for UKBB study participation. Average time taken to call up an image from the database and complete segmentation analysis was approximately 120 seconds per data set per login, and analysis of the entire dataset was completed in approximately 28 days. Conclusions We report an approach to the rapid, automated measurement of retinal thickness from nearly 140,000 OCT image sets from the UK Biobank. In the near future, these measurements will be publically available for utilization by researchers around the world, and thus for correlation with the wealth of other data collected in UK Biobank. The automated analysis approaches we describe may be of utility for future large population-based epidemiological studies, clinical trials, and screening programs that employ OCT imaging. PMID:27716837
Iraq War mortality estimates: a systematic review.
Tapp, Christine; Burkle, Frederick M; Wilson, Kumanan; Takaro, Tim; Guyatt, Gordon H; Amad, Hani; Mills, Edward J
2008-03-07
In March 2003, the United States invaded Iraq. The subsequent number, rates, and causes of mortality in Iraq resulting from the war remain unclear, despite intense international attention. Understanding mortality estimates from modern warfare, where the majority of casualties are civilian, is of critical importance for public health and protection afforded under international humanitarian law. We aimed to review the studies, reports and counts on Iraqi deaths since the start of the war and assessed their methodological quality and results. We performed a systematic search of 15 electronic databases from inception to January 2008. In addition, we conducted a non-structured search of 3 other databases, reviewed study reference lists and contacted subject matter experts. We included studies that provided estimates of Iraqi deaths based on primary research over a reported period of time since the invasion. We excluded studies that summarized mortality estimates and combined non-fatal injuries and also studies of specific sub-populations, e.g. under-5 mortality. We calculated crude and cause-specific mortality rates attributable to violence and average deaths per day for each study, where not already provided. Thirteen studies met the eligibility criteria. The studies used a wide range of methodologies, varying from sentinel-data collection to population-based surveys. Studies assessed as the highest quality, those using population-based methods, yielded the highest estimates. Average deaths per day ranged from 48 to 759. The cause-specific mortality rates attributable to violence ranged from 0.64 to 10.25 per 1,000 per year. Our review indicates that, despite varying estimates, the mortality burden of the war and its sequelae on Iraq is large. The use of established epidemiological methods is rare. This review illustrates the pressing need to promote sound epidemiologic approaches to determining mortality estimates and to establish guidelines for policy-makers, the media and the public on how to interpret these estimates.
The first Malay database toward the ethnic-specific target molecular variation.
Halim-Fikri, Hashim; Etemad, Ali; Abdul Latif, Ahmad Zubaidi; Merican, Amir Feisal; Baig, Atif Amin; Annuar, Azlina Ahmad; Ismail, Endom; Salahshourifar, Iman; Liza-Sharmini, Ahmad Tajudin; Ramli, Marini; Shah, Mohamed Irwan; Johan, Muhammad Farid; Hassan, Nik Norliza Nik; Abdul-Aziz, Noraishah Mydin; Mohd Noor, Noor Haslina; Nur-Shafawati, Ab Rajab; Hassan, Rosline; Bahar, Rosnah; Zain, Rosnah Binti; Yusoff, Shafini Mohamed; Yusoff, Surini; Tan, Soon Guan; Thong, Meow-Keong; Wan-Isa, Hatin; Abdullah, Wan Zaidah; Mohamed, Zahurin; Abdul Latiff, Zarina; Zilfalil, Bin Alwi
2015-04-30
The Malaysian Node of the Human Variome Project (MyHVP) is one of the eighteen official Human Variome Project (HVP) country-specific nodes. Since its inception in 9(th) October 2010, MyHVP has attracted the significant number of Malaysian clinicians and researchers to participate and contribute their data to this project. MyHVP also act as the center of coordination for genotypic and phenotypic variation studies of the Malaysian population. A specialized database was developed to store and manage the data based on genetic variations which also associated with health and disease of Malaysian ethnic groups. This ethnic-specific database is called the Malaysian Node of the Human Variome Project database (MyHVPDb). Currently, MyHVPDb provides only information about the genetic variations and mutations found in the Malays. In the near future, it will expand for the other Malaysian ethnics as well. The data sets are specified based on diseases or genetic mutation types which have three main subcategories: Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV) followed by the mutations which code for the common diseases among Malaysians. MyHVPDb has been open to the local researchers, academicians and students through the registration at the portal of MyHVP ( http://hvpmalaysia.kk.usm.my/mhgvc/index.php?id=register ). This database would be useful for clinicians and researchers who are interested in doing a study on genomics population and genetic diseases in order to obtain up-to-date and accurate information regarding the population-specific variations and also useful for those in countries with similar ethnic background.
Automated knowledge base development from CAD/CAE databases
NASA Technical Reports Server (NTRS)
Wright, R. Glenn; Blanchard, Mary
1988-01-01
Knowledge base development requires a substantial investment in time, money, and resources in order to capture the knowledge and information necessary for anything other than trivial applications. This paper addresses a means to integrate the design and knowledge base development process through automated knowledge base development from CAD/CAE databases and files. Benefits of this approach include the development of a more efficient means of knowledge engineering, resulting in the timely creation of large knowledge based systems that are inherently free of error.
CTGA: the database for genetic disorders in Arab populations.
Tadmouri, Ghazi O; Al Ali, Mahmoud Taleb; Al-Haj Ali, Sarah; Al Khaja, Najib
2006-01-01
The Arabs comprise a genetically heterogeneous group that resulted from the admixture of different populations throughout history. They share many common characteristics responsible for a considerable proportion of perinatal and neonatal mortalities. To this end, the Centre for Arab Genomic Studies (CAGS) launched a pilot project to construct the 'Catalogue of Transmission Genetics in Arabs' (CTGA) database for genetic disorders in Arabs. Information in CTGA is drawn from published research and mined hospital records. The database offers web-based basic and advanced search approaches. In either case, the final search result is a detailed HTML record that includes text-, URL- and graphic-based fields. At present, CTGA hosts entries for 692 phenotypes and 235 related genes described in Arab individuals. Of these, 213 phenotypic descriptions and 22 related genes were observed in the Arab population of the United Arab Emirates (UAE). These results emphasize the role of CTGA as an essential tool to promote scientific research on genetic disorders in the region. The priority of CTGA is to provide timely information on the occurrence of genetic disorders in Arab individuals. It is anticipated that data from Arab countries other than the UAE will be exhaustively searched and incorporated in CTGA (http://www.cags.org.ae).
CTGA: the database for genetic disorders in Arab populations
Tadmouri, Ghazi O.; Ali, Mahmoud Taleb Al; Ali, Sarah Al-Haj; Khaja, Najib Al
2006-01-01
The Arabs comprise a genetically heterogeneous group that resulted from the admixture of different populations throughout history. They share many common characteristics responsible for a considerable proportion of perinatal and neonatal mortalities. To this end, the Centre for Arab Genomic Studies (CAGS) launched a pilot project to construct the ‘Catalogue of Transmission Genetics in Arabs’ (CTGA) database for genetic disorders in Arabs. Information in CTGA is drawn from published research and mined hospital records. The database offers web-based basic and advanced search approaches. In either case, the final search result is a detailed HTML record that includes text-, URL- and graphic-based fields. At present, CTGA hosts entries for 692 phenotypes and 235 related genes described in Arab individuals. Of these, 213 phenotypic descriptions and 22 related genes were observed in the Arab population of the United Arab Emirates (UAE). These results emphasize the role of CTGA as an essential tool to promote scientific research on genetic disorders in the region. The priority of CTGA is to provide timely information on the occurrence of genetic disorders in Arab individuals. It is anticipated that data from Arab countries other than the UAE will be exhaustively searched and incorporated in CTGA (). PMID:16381941
Building a Database for a Quantitative Model
NASA Technical Reports Server (NTRS)
Kahn, C. Joseph; Kleinhammer, Roger
2014-01-01
A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Not so obviously entering data this way does not aid linking the Basic Events to the data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for them. This database can also contain the manipulations appropriate for how the data is used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "Super component" Basic Event, and Bayesian Updating based on flight and testing experience. A simple, unique metadata field in both the model and database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility for the entire model often rests on the credibility and traceability of the data.
Fleet, Jamie L; Dixon, Stephanie N; Shariff, Salimah Z; Quinn, Robert R; Nash, Danielle M; Harel, Ziv; Garg, Amit X
2013-04-05
Large, population-based administrative healthcare databases can be used to identify patients with chronic kidney disease (CKD) when serum creatinine laboratory results are unavailable. We examined the validity of algorithms that used combined hospital encounter and physician claims database codes for the detection of CKD in Ontario, Canada. We accrued 123,499 patients over the age of 65 from 2007 to 2010. All patients had a baseline serum creatinine value to estimate glomerular filtration rate (eGFR). We developed an algorithm of physician claims and hospital encounter codes to search administrative databases for the presence of CKD. We determined the sensitivity, specificity, positive and negative predictive values of this algorithm to detect our primary threshold of CKD, an eGFR <45 mL/min per 1.73 m² (15.4% of patients). We also assessed serum creatinine and eGFR values in patients with and without CKD codes (algorithm positive and negative, respectively). Our algorithm required evidence of at least one of eleven CKD codes and 7.7% of patients were algorithm positive. The sensitivity was 32.7% [95% confidence interval: (95% CI): 32.0 to 33.3%]. Sensitivity was lower in women compared to men (25.7 vs. 43.7%; p <0.001) and in the oldest age category (over 80 vs. 66 to 80; 28.4 vs. 37.6 %; p < 0.001). All specificities were over 94%. The positive and negative predictive values were 65.4% (95% CI: 64.4 to 66.3%) and 88.8% (95% CI: 88.6 to 89.0%), respectively. In algorithm positive patients, the median [interquartile range (IQR)] baseline serum creatinine value was 135 μmol/L (106 to 179 μmol/L) compared to 82 μmol/L (69 to 98 μmol/L) for algorithm negative patients. Corresponding eGFR values were 38 mL/min per 1.73 m² (26 to 51 mL/min per 1.73 m²) vs. 69 mL/min per 1.73 m² (56 to 82 mL/min per 1.73 m²), respectively. Patients with CKD as identified by our database algorithm had distinctly higher baseline serum creatinine values and lower eGFR values than those without such codes. However, because of limited sensitivity, the prevalence of CKD was underestimated.
2013-01-01
Background Large, population-based administrative healthcare databases can be used to identify patients with chronic kidney disease (CKD) when serum creatinine laboratory results are unavailable. We examined the validity of algorithms that used combined hospital encounter and physician claims database codes for the detection of CKD in Ontario, Canada. Methods We accrued 123,499 patients over the age of 65 from 2007 to 2010. All patients had a baseline serum creatinine value to estimate glomerular filtration rate (eGFR). We developed an algorithm of physician claims and hospital encounter codes to search administrative databases for the presence of CKD. We determined the sensitivity, specificity, positive and negative predictive values of this algorithm to detect our primary threshold of CKD, an eGFR <45 mL/min per 1.73 m2 (15.4% of patients). We also assessed serum creatinine and eGFR values in patients with and without CKD codes (algorithm positive and negative, respectively). Results Our algorithm required evidence of at least one of eleven CKD codes and 7.7% of patients were algorithm positive. The sensitivity was 32.7% [95% confidence interval: (95% CI): 32.0 to 33.3%]. Sensitivity was lower in women compared to men (25.7 vs. 43.7%; p <0.001) and in the oldest age category (over 80 vs. 66 to 80; 28.4 vs. 37.6 %; p < 0.001). All specificities were over 94%. The positive and negative predictive values were 65.4% (95% CI: 64.4 to 66.3%) and 88.8% (95% CI: 88.6 to 89.0%), respectively. In algorithm positive patients, the median [interquartile range (IQR)] baseline serum creatinine value was 135 μmol/L (106 to 179 μmol/L) compared to 82 μmol/L (69 to 98 μmol/L) for algorithm negative patients. Corresponding eGFR values were 38 mL/min per 1.73 m2 (26 to 51 mL/min per 1.73 m2) vs. 69 mL/min per 1.73 m2 (56 to 82 mL/min per 1.73 m2), respectively. Conclusions Patients with CKD as identified by our database algorithm had distinctly higher baseline serum creatinine values and lower eGFR values than those without such codes. However, because of limited sensitivity, the prevalence of CKD was underestimated. PMID:23560464
Sadygov, Rovshan G; Cociorva, Daniel; Yates, John R
2004-12-01
Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale of the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification and current challenges and limitations of this approach for protein identification.
Fast 3D shape screening of large chemical databases through alignment-recycling
Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H
2007-01-01
Background Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. Results Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures covering entirely the 3D shape space of the conformers of the dataset. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to drastically (100-fold) reduce the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average. Conclusion Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape-space; overlay of the database conformers to the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation, required by the first two steps, is a significant cost of the method; however, once performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example, to handle more flexible and larger small-molecules are discussed. PMID:17880744
Trajectory Browser: An Online Tool for Interplanetary Trajectory Analysis and Visualization
NASA Technical Reports Server (NTRS)
Foster, Cyrus James
2013-01-01
The trajectory browser is a web-based tool developed at the NASA Ames Research Center for finding preliminary trajectories to planetary bodies and for providing relevant launch date, time-of-flight and (Delta)V requirements. The site hosts a database of transfer trajectories from Earth to planets and small-bodies for various types of missions such as rendezvous, sample return or flybys. A search engine allows the user to find trajectories meeting desired constraints on the launch window, mission duration and (Delta)V capability, while a trajectory viewer tool allows the visualization of the heliocentric trajectory and the detailed mission itinerary. The anticipated user base of this tool consists primarily of scientists and engineers designing interplanetary missions in the context of pre-phase A studies, particularly for performing accessibility surveys to large populations of small-bodies.
Intelligent Interfaces for Mining Large-Scale RNAi-HCS Image Databases
Lin, Chen; Mak, Wayne; Hong, Pengyu; Sepp, Katharine; Perrimon, Norbert
2010-01-01
Recently, High-content screening (HCS) has been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through RNAi-induced cellular phenotype analyses. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image analysis tools. Until now, it still requires highly trained scientists to browse a prohibitively large RNAi-HCS image database and produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason we have developed intelligent interfaces to facilitate the application of the HCS technology in biomedical research. Our new interfaces empower biologists with computational power not only to effectively and efficiently explore large-scale RNAi-HCS image databases, but also to apply their knowledge and experience to interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques. PMID:21278820
Maetens, Arno; De Schreye, Robrecht; Faes, Kristof; Houttekier, Dirk; Deliens, Luc; Gielen, Birgit; De Gendt, Cindy; Lusyne, Patrick; Annemans, Lieven; Cohen, Joachim
2016-10-18
The use of full-population databases is under-explored to study the use, quality and costs of end-of-life care. Using the case of Belgium, we explored: (1) which full-population databases provide valid information about end-of-life care, (2) what procedures are there to use these databases, and (3) what is needed to integrate separate databases. Technical and privacy-related aspects of linking and accessing Belgian administrative databases and disease registries were assessed in cooperation with the database administrators and privacy commission bodies. For all relevant databases, we followed procedures in cooperation with database administrators to link the databases and to access the data. We identified several databases as fitting for end-of-life care research in Belgium: the InterMutualistic Agency's national registry of health care claims data, the Belgian Cancer Registry including data on incidence of cancer, and databases administrated by Statistics Belgium including data from the death certificate database, the socio-economic survey and fiscal data. To obtain access to the data, approval was required from all database administrators, supervisory bodies and two separate national privacy bodies. Two Trusted Third Parties linked the databases via a deterministic matching procedure using multiple encrypted social security numbers. In this article we describe how various routinely collected population-level databases and disease registries can be accessed and linked to study patterns in the use, quality and costs of end-of-life care in the full population and in specific diagnostic groups.
C-A1-03: Considerations in the Design and Use of an Oracle-based Virtual Data Warehouse
Bredfeldt, Christine; McFarland, Lela
2011-01-01
Background/Aims The amount of clinical data available for research is growing exponentially. As it grows, increasing the efficiency of both data storage and data access becomes critical. Relational database management systems (rDBMS) such as Oracle are ideal solutions for managing longitudinal clinical data because they support large-scale data storage and highly efficient data retrieval. In addition, they can greatly simplify the management of large data warehouses, including security management and regular data refreshes. However, the HMORN Virtual Data Warehouse (VDW) was originally designed based on SAS datasets, and this design choice has a number of implications for both the design and use of an Oracle-based VDW. From a design standpoint, VDW tables are designed as flat SAS datasets, which do not take full advantage of Oracle indexing capabilities. From a data retrieval standpoint, standard VDW SAS scripts do not take advantage of SAS pass-through SQL capabilities to enable Oracle to perform the processing required to narrow datasets to the population of interest. Methods Beginning in 2009, the research department at Kaiser Permanente in the Mid-Atlantic States (KPMA) has developed an Oracle-based VDW according to the HMORN v3 specifications. In order to take advantage of the strengths of relational databases, KPMA introduced an interface layer to the VDW data, using views to provide access to standardized VDW variables. In addition, KPMA has developed SAS programs that provide access to SQL pass-through processing for first-pass data extraction into SAS VDW datasets for processing by standard VDW scripts. Results We discuss both the design and performance considerations specific to the KPMA Oracle-based VDW. We benchmarked performance of the Oracle-based VDW using both standard VDW scripts and an initial pre-processing layer to evaluate speed and accuracy of data return. Conclusions Adapting the VDW for deployment in an Oracle environment required minor changes to the underlying structure of the data. Further modifications of the underlying data structure would lead to performance enhancements. Maximally efficient data access for standard VDW scripts requires an extra step that involves restricting the data to the population of interest at the data server level prior to standard processing.
Faster sequence homology searches by clustering subsequences.
Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka
2015-04-15
Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2-2.8 times faster than RAPSearch and is ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Concepts and controversies in estimating vitamin K status in population based studies
USDA-ARS?s Scientific Manuscript database
A better understanding of vitamin K's role in health and disease requires the assessment of vitamin K nutritional status in population and clinical studies. This is primarily accomplished using dietary questionnaires and/or biomarkers. Because food composition databases in the U.S. are most complete...
Poon, Art F Y; Joy, Jeffrey B; Woods, Conan K; Shurgold, Susan; Colley, Guillaume; Brumme, Chanson J; Hogg, Robert S; Montaner, Julio S G; Harrigan, P Richard
2015-03-15
The diversification of human immunodeficiency virus (HIV) is shaped by its transmission history. We therefore used a population based province wide HIV drug resistance database in British Columbia (BC), Canada, to evaluate the impact of clinical, demographic, and behavioral factors on rates of HIV transmission. We reconstructed molecular phylogenies from 27,296 anonymized bulk HIV pol sequences representing 7747 individuals in BC-about half the estimated HIV prevalence in BC. Infections were grouped into clusters based on phylogenetic distances, as a proxy for variation in transmission rates. Rates of cluster expansion were reconstructed from estimated dates of HIV seroconversion. Our criteria grouped 4431 individuals into 744 clusters largely separated with respect to risk factors, including large established clusters predominated by injection drug users and more-recently emerging clusters comprising men who have sex with men. The mean log10 viral load of an individual's phylogenetic neighborhood (composed of 5 other individuals with shortest phylogenetic distances) increased their odds of appearing in a cluster by >2-fold per log10 viruses per milliliter. Hotspots of ongoing HIV transmission can be characterized in near real time by the secondary analysis of HIV resistance genotypes, providing an important potential resource for targeting public health initiatives for HIV prevention. © The Author 2014. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Abraham, Manoj T; Rousso, Joseph J; Hu, Shirley; Brown, Ryan F; Moscatello, Augustine L; Finn, J Charles; Patel, Neha A; Kadakia, Sameep P; Wood-Smith, Donald
2017-07-01
The American Academy of Facial Plastic and Reconstructive Surgery FACE TO FACE database was created to gather and organize patient data primarily from international humanitarian surgical mission trips, as well as local humanitarian initiatives. Similar to cloud-based Electronic Medical Records, this web-based user-generated database allows for more accurate tracking of provider and patient information and outcomes, regardless of site, and is useful when coordinating follow-up care for patients. The database is particularly useful on international mission trips as there are often different surgeons who may provide care to patients on subsequent missions, and patients who may visit more than 1 mission site. Ultimately, by pooling data across multiples sites and over time, the database has the potential to be a useful resource for population-based studies and outcome data analysis. The objective of this paper is to delineate the process involved in creating the AAFPRS FACE TO FACE database, to assess its functional utility, to draw comparisons to electronic medical records systems that are now widely implemented, and to explain the specific benefits and disadvantages of the use of the database as it was implemented on recent international surgical mission trips.
Siskind, Eric; Maloney, Caroline; Akerman, Meredith; Alex, Asha; Ashburn, Sarah; Barlow, Meade; Siskind, Tamar; Bhaskaran, Madhu; Ali, Nicole; Basu, Amit; Molmenti, Ernesto; Ortiz, Jorge
2014-09-01
Previously, increasing age has been a part of the exclusion criteria used when determining eligibility for a pancreas transplant. However, the analysis of pancreas transplantation outcomes based on age groupings has largely been based on single-center reports. A UNOS database review of all adult pancreas and kidney-pancreas transplants between 1996 and 2012 was performed. Patients were divided into groups based on age categories: 18-29 (n = 1823), 30-39 (n = 7624), 40-49 (n = 7967), 50-59 (n = 3160), and ≥60 (n = 280). We compared survival outcomes and demographic variables between each age grouping. Of the 20 854 pancreas transplants, 3440 of the recipients were 50 yr of age or above. Graft survival was consistently the greatest in adults 40-49 yr of age. Graft survival was least in adults age 18-29 at one-, three-, and five-yr intervals. At 10- and 15-yr intervals, graft survival was the poorest in adults >60 yr old. Patient survival and age were found to be inversely proportional; as the patient population's age increased, survival decreased. Pancreas transplants performed in patients of increasing age demonstrate decreased patient and graft survival when compared to pancreas transplants in patients <50 yr of age. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Venkatramani, Rajkumar; Spector, Logan G.; Georgieff, Michael; Tomlinson, Gail; Krailo, Mark; Malogolowkin, Marcio; Kohlmann, Wendy; Curtin, Karen; Fonstad, Rachel K.; Schiffman, Joshua D.
2014-01-01
Beckwith-Wiedemann Syndrome (BWS) and Familial Adenomatous Polyposis (FAP) are known to predispose to hepatoblastoma (HB). A case control study was conducted through the Children’s Oncology Group (COG) to study the association of HB with isolated congenital abnormalities. Cases (N = 383) were diagnosed between 2000 and 2008. Controls (N = 387) were recruited from state birth registries, frequency matched for sex, region, year of birth, and birth weight. Data on congenital abnormalities among subjects and covariates were obtained by maternal telephone interview. Odds ratios (OR) and 95% confidence intervals (CI) describing the association between congenital abnormalities with HB, adjusted for sex, birth weight, maternal age and maternal education, were calculated using unconditional logistic regression. There was a significant association of HB with kidney, bladder, or sex organ abnormalities (OR = 4.75; 95% CI: 1.74–13) which appeared to be specific to kidney/bladder defects (OR = 4.3; 95% CI: 1.2–15.3) but not those of sex organs (OR = 1.24; 95% CI: 0.37–4.1). Elevated but non-significant ORs were found for spina bifida or other spinal defects (OR = 2.12; 95% CI: 0.39–11.7), large or multiple birthmarks (OR = 1.33; 95% CI: 0.81–2.21). The results were validated through the Utah Population Database (UPDB), a statewide population-based registry linking birth certificates, medical records, and cancer diagnoses. In the UPDB, there were 29 cases and 290 population controls matched 10:1 on sex and birth year. Consistent with the COG findings, kidney/bladder defects were associated with hepatoblastoma. These results confirm the association of HB with kidney/bladder abnormalities. PMID:24934283
Report on Wind Turbine Subsystem Reliability - A Survey of Various Databases (Presentation)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sheng, S.
2013-07-01
Wind industry has been challenged by premature subsystem/component failures. Various reliability data collection efforts have demonstrated their values in supporting wind turbine reliability and availability research & development and industrial activities. However, most information on these data collection efforts are scattered and not in a centralized place. With the objective of getting updated reliability statistics of wind turbines and/or subsystems so as to benefit future wind reliability and availability activities, this report is put together based on a survey of various reliability databases that are accessible directly or indirectly by NREL. For each database, whenever feasible, a brief description summarizingmore » database population, life span, and data collected is given along with its features & status. Then selective results deemed beneficial to the industry and generated based on the database are highlighted. This report concludes with several observations obtained throughout the survey and several reliability data collection opportunities in the future.« less
Franch Nadal, Josep; Mata Cases, Manel; Mauricio Puente, Dídac
2016-11-01
Type 2 diabetes mellitus is currently the most frequent chronic metabolic disease. In spain, according to the di@bet.es study, its prevalence is 13.8% in the adult population (although it is undiagnosed in 6%). The main risk factor for type 2 diabetes mellitus is obesity. The severity of type 2 diabetes mellitus is determined not only by the presence of hyperglycaemia, but also by the coexistence of other risk factors such as hypertension or dyslipidaemia, which are often associated with the disease. Its impact on the presence of chronic diabetic complications varies. While hyperglycaemia mainly influences the presence of microvascular complications, hypertension, dyslipidaemia and smoking play a greater role in macrovascular atherosclerotic disease. One of the most powerful ways to study the epidemiology of the disease is through the use of large databases that analyse the situation in the routine clinical management of huge numbers of patients. Recently, the data provided by the e-Management Project, based on the SIDIAP database, have allowed updating of many data on the health care of diabetic persons in Catalonia. This not only allows determination of the epidemiology of the disease but is also a magnificent starting point for the design of future studies that will provide answers to more questions. However, the use of large databases is not free of certain problems, especially those concerning the reliability of registries. This article analyses some of the data obtained by the e-Management study and other spanish epidemiological studies of equal importance. Copyright © 2016 Elsevier España, S.L.U. All rights reserved.
Austin, P C; Shah, B R; Newman, A; Anderson, G M
2012-09-01
There are limited validated methods to ascertain comorbidities for risk adjustment in ambulatory populations of patients with diabetes using administrative health-care databases. The objective was to examine the ability of the Johns Hopkins' Aggregated Diagnosis Groups to predict mortality in population-based ambulatory samples of both incident and prevalent subjects with diabetes. Retrospective cohorts constructed using population-based administrative data. The incident cohort consisted of all 346,297 subjects diagnosed with diabetes between 1 April 2004 and 31 March 2008. The prevalent cohort consisted of all 879,849 subjects with pre-existing diabetes on 1 January, 2007. The outcome was death within 1 year of the subject's index date. A logistic regression model consisting of age, sex and indicator variables for 22 of the 32 Johns Hopkins' Aggregated Diagnosis Group categories had excellent discrimination for predicting mortality in incident diabetes patients: the c-statistic was 0.87 in an independent validation sample. A similar model had excellent discrimination for predicting mortality in prevalent diabetes patients: the c-statistic was 0.84 in an independent validation sample. Both models demonstrated very good calibration, denoting good agreement between observed and predicted mortality across the range of predicted mortality in which the large majority of subjects lay. For comparative purposes, regression models incorporating the Charlson comorbidity index, age and sex, age and sex, and age alone had poorer discrimination than the model that incorporated the Johns Hopkins' Aggregated Diagnosis Groups. Logistical regression models using age, sex and the John Hopkins' Aggregated Diagnosis Groups were able to accurately predict 1-year mortality in population-based samples of patients with diabetes. © 2011 The Authors. Diabetic Medicine © 2011 Diabetes UK.
Mapel, D; Pearson, M
2002-08-01
Healthcare payers make decisions on funding for treatments for diseases, such as chronic obstructive pulmonary disease (COPD), on a population level, so require evidence of treatment success in appropriate populations, using usual routine care as the comparison for alternative management approaches. Such health outcomes evidence can be obtained from a number of sources. The 'gold standard' method for obtaining evidence of treatment success is usually taken as the randomized controlled prospective clinical trial. Yet the value of such studies in providing evidence for decision-makers can be questioned due to the restricted entry criteria limiting the ability to generalize to real life populations, narrow focus on individual parameters, use of placebo for comparison rather than usual therapy and unrealistic intense monitoring of patients. Evidence obtained from retrospective and observational studies can supplement that from randomized clinical trials, providing that care is taken to guard against bias and confounders. However, very large numbers of patients must be investigated if small differences between drugs and treatment approaches are to be detected. Administrative databases from healthcare systems provide an opportunity to obtain observational data on large numbers of patients. Such databases have shown that high healthcare costs in patients with COPD are associated with co-morbid conditions and current smoking status. Analysis of an administrative database has also shown that elderly patients with COPD who received inhaled corticosteroids within 90 days of discharge from hospital had 24% fewer repeat hospitalizations for COPD and were 29% less likely to die during the 1-year follow-up period. In conclusion, there are a number of sources of meaningful evidence of the health outcomes arising from different therapeutic approaches that should be of value to healthcare payers making decisions on resource allocation.
D'Cunha, Anitha; Pandit, Lekha; Malli, Chaithra
2017-06-01
Indian data have been largely missing from genome-wide databases that provide information on genetic variations in different populations. This hinders association studies for complex disorders in India. This study was aimed to determine whether the complex genetic structure and endogamy among Indians could potentially influence the design of case-control studies for autoimmune disorders in the south Indian population. A total of 12 single nucleotide variations (SNVs) related to genes associated with autoimmune disorders were genotyped in 370 healthy individuals belonging to six different caste groups in southern India. Allele frequencies were estimated; genetic divergence and phylogenetic relationship within the various caste groups and other HapMap populations were ascertained. Allele frequencies for all genotyped SNVs did not vary significantly among the different groups studied. Wright's FSTwas 0.001 per cent among study population and 0.38 per cent when compared with Gujarati in Houston (GIH) population on HapMap data. The analysis of molecular variance results showed a 97 per cent variation attributable to differences within the study population and <1 per cent variation due to differences between castes. Phylogenetic analysis showed a separation of Dravidian population from other HapMap populations and particularly from GIH population. Despite the complex genetic origins of the Indian population, our study indicated a low level of genetic differentiation among Dravidian language-speaking people of south India. Case-control studies of association among Dravidians of south India may not require stratification based on language and caste.
Pregnancy Outcomes from the Branded Glatiramer Acetate Pregnancy Database.
Sandberg-Wollheim, Magnhild; Neudorfer, Orit; Grinspan, Augusto; Weinstock-Guttman, Bianca; Haas, Judith; Izquierdo, Guillermo; Riley, Claire; Ross, Amy Perrin; Baruch, Peleg; Drillman, Talya; Coyle, Patricia K
2018-01-01
Appropriate counseling and treatment for women with multiple sclerosis (MS) who may become pregnant requires an understanding of the effects of exposure to disease-modifying therapies (DMTs) during pregnancy. Current reports and studies are limited in their usefulness, mostly by small sample size. Branded glatiramer acetate (GA) is a DMT approved for the treatment of relapsing forms of MS. For more than 2 decades, it has been shown to be efficacious and to have a favorable safety profile. The Teva Pharmaceutical Industries Ltd global pharmacovigilance database comprises data from more than 7000 pregnancies, during which women with MS were exposed to treatment with branded GA. We analyzed data from Teva's global pharmacovigilance database. Pregnancy outcomes for patients treated with branded GA were compared with reference rates of abnormal pregnancy outcomes reported in two large registries representing the general population. Pregnancies exposed to branded GA were not at higher risk for congenital anomalies than what is expected in the general population. These data provide evidence that branded GA exposure during pregnancy seems safe, without teratogenic effect.
Pregnancy Outcomes from the Branded Glatiramer Acetate Pregnancy Database
Sandberg-Wollheim, Magnhild; Grinspan, Augusto; Weinstock-Guttman, Bianca; Haas, Judith; Izquierdo, Guillermo; Riley, Claire; Ross, Amy Perrin; Baruch, Peleg; Drillman, Talya; Coyle, Patricia K.
2018-01-01
Abstract Background: Appropriate counseling and treatment for women with multiple sclerosis (MS) who may become pregnant requires an understanding of the effects of exposure to disease-modifying therapies (DMTs) during pregnancy. Current reports and studies are limited in their usefulness, mostly by small sample size. Branded glatiramer acetate (GA) is a DMT approved for the treatment of relapsing forms of MS. For more than 2 decades, it has been shown to be efficacious and to have a favorable safety profile. The Teva Pharmaceutical Industries Ltd global pharmacovigilance database comprises data from more than 7000 pregnancies, during which women with MS were exposed to treatment with branded GA. Methods: We analyzed data from Teva's global pharmacovigilance database. Pregnancy outcomes for patients treated with branded GA were compared with reference rates of abnormal pregnancy outcomes reported in two large registries representing the general population. Results: Pregnancies exposed to branded GA were not at higher risk for congenital anomalies than what is expected in the general population. Conclusions: These data provide evidence that branded GA exposure during pregnancy seems safe, without teratogenic effect. PMID:29507538
Processing and population genetic analysis of multigenic datasets with ProSeq3 software.
Filatov, Dmitry A
2009-12-01
The current tendency in molecular population genetics is to use increasing numbers of genes in the analysis. Here I describe a program for handling and population genetic analysis of DNA polymorphism data collected from multiple genes. The program includes a sequence/alignment editor and an internal relational database that simplify the preparation and manipulation of multigenic DNA polymorphism datasets. The most commonly used DNA polymorphism analyses are implemented in ProSeq3, facilitating population genetic analysis of large multigenic datasets. Extensive input/output options make ProSeq3 a convenient hub for sequence data processing and analysis. The program is available free of charge from http://dps.plants.ox.ac.uk/sequencing/proseq.htm.
Jiang, Li; Wei, Yi-Liang; Zhao, Lei; Li, Na; Liu, Tao; Liu, Hai-Bo; Ren, Li-Jie; Li, Jiu-Ling; Hao, Hui-Fang; Li, Qing; Li, Cai-Xia
2018-07-01
Over the last decade, several panels of ancestry-informative markers have been proposed for the analysis of population genetic structure. The differentiation efficiency depends on the discriminatory ability of the included markers and the reference population coverage. We previously developed a small set of 27 autosomal single nucleotide polymorphisms (SNPs) for analyzing African, European, and East Asian ancestries. In the current study, we gathered a high-coverage reference database of 110 populations (10,350 individuals) from across the globe. The discrimination power of the panel was re-evaluated using four continental ancestry groups (as well as Indigenous Americans). We observed that all the 27 SNPs demonstrated stratified population specificity leading to a striking ancestral discrimination. Five markers (rs728404, rs7170869, rs2470102, rs1448485, and rs4789193) showed differences (δ > 0.3) in the frequency profiles between East Asian and Indigenous American populations. Ancestry components of all involved populations were accurately accessed compared with those from previous genome-wide analyses, thereafter achieved broadly population separation. Thus, our ancestral inference panel of a small number of highly informative SNPs in combination with a large-scale reference database provides a high-resolution in estimating ancestry compositions and distinguishing individual origins. We propose extensive usage in biomedical studies and forensics. Copyright © 2018 Elsevier B.V. All rights reserved.
Huang, Huateng; Title, Pascal O.; Donnellan, Stephen C.; Holmes, Iris; Rabosky, Daniel L.
2017-01-01
Genetic diversity is a fundamental characteristic of species and is affected by many factors, including mutation rate, population size, life history and demography. To better understand the processes that influence levels of genetic diversity across taxa, we collected genome-wide restriction-associated DNA data from more than 500 individuals spanning 76 nominal species of Australian scincid lizards in the genus Ctenotus. To avoid potential biases associated with variation in taxonomic practice across the group, we used coalescent-based species delimitation to delineate 83 species-level lineages within the genus for downstream analyses. We then used these genetic data to infer levels of within-population genetic diversity. Using a phylogenetically informed approach, we tested whether variation in genetic diversity could be explained by population size, environmental heterogeneity or historical demography. We find that the strongest predictor of genetic diversity is a novel proxy for census population size: the number of vouchered occurrences in museum databases. However, museum occurrences only explain a limited proportion of the variance in genetic diversity, suggesting that genetic diversity might be difficult to predict at shallower phylogenetic scales. PMID:28469025
Singhal, Sonal; Huang, Huateng; Title, Pascal O; Donnellan, Stephen C; Holmes, Iris; Rabosky, Daniel L
2017-05-17
Genetic diversity is a fundamental characteristic of species and is affected by many factors, including mutation rate, population size, life history and demography. To better understand the processes that influence levels of genetic diversity across taxa, we collected genome-wide restriction-associated DNA data from more than 500 individuals spanning 76 nominal species of Australian scincid lizards in the genus Ctenotus To avoid potential biases associated with variation in taxonomic practice across the group, we used coalescent-based species delimitation to delineate 83 species-level lineages within the genus for downstream analyses. We then used these genetic data to infer levels of within-population genetic diversity. Using a phylogenetically informed approach, we tested whether variation in genetic diversity could be explained by population size, environmental heterogeneity or historical demography. We find that the strongest predictor of genetic diversity is a novel proxy for census population size: the number of vouchered occurrences in museum databases. However, museum occurrences only explain a limited proportion of the variance in genetic diversity, suggesting that genetic diversity might be difficult to predict at shallower phylogenetic scales. © 2017 The Author(s).
Pan, Lang; Zhang, Jian; Wang, Junzhi; Yu, Qin; Bai, Lianyang; Dong, Liyao
2017-05-08
American sloughgrass (Beckmannia syzigachne Steud.) is a weed widely distributed in wheat fields of China. In recent years, the evolution of herbicide (fenoxaprop-P-ethyl)-resistant populations has decreased the susceptibility of B. syzigachne. This study compared 4 B. syzigachne populations (3 resistant and 1 susceptible) using iTRAQ to characterize fenoxaprop-P-ethyl resistance in B. syzigachne at the proteomic level. Through searching the UniProt database, 3104 protein species were identified from 13,335 unique peptides. Approximately 2834 protein species were assigned to 23 functional classifications provided by the COG database. Among these, 2299 protein species were assigned to 125 predicted pathways. The resistant biotype contained 8 protein species that changed in abundance relative to the susceptible biotype; they were involved in photosynthesis, oxidative phosphorylation, and fatty acid biosynthesis pathways. In contrast to previous studies comparing only 1 resistant and 1 susceptible population, our use of 3 fenoxaprop-resistant B. syzigachne populations with different genetic backgrounds minimized irrelevant differential expression and eliminated false positives. Therefore, we could more confidently link the differentially expressed proteins to herbicide resistance. Proteomic analysis demonstrated that fenoxaprop-P-ethyl resistance is associated with photosynthetic capacity, a connection that might be related to the target-site mutations in resistant B. syzigachne. This is the first large-scale proteomics study examining herbicide stress responses in different B. syzigachne biotypes. This study has biological relevance because it is the first to employ proteomic analysis for understanding the mechanisms underlying Beckmannia syzigachne herbicide resistance. The plant is a major weed in China and negatively affects crop yield, but has developed considerable resistance to the most common herbicide, fenoxaprop-P-ethyl. Through comparisons of resistant and sensitive biotypes, our study identified multiple proteins (involved in photosynthesis, oxidative phosphorylation, and fatty acid biosynthesis) that are putatively linked to B. syzigachne herbicide response. This large-scale proteomics study, sorely lacking in weed science, contributes valuable data that can be applied to more fine-tuned analyses on the functions of specific proteins in herbicide resistance. Copyright © 2017 Elsevier B.V. All rights reserved.
van Staa, T-P; Klungel, O; Smeeth, L
2014-06-01
A solid foundation of evidence of the effects of an intervention is a prerequisite of evidence-based medicine. The best source of such evidence is considered to be randomized trials, which are able to avoid confounding. However, they may not always estimate effectiveness in clinical practice. Databases that collate anonymized electronic health records (EHRs) from different clinical centres have been widely used for many years in observational studies. Randomized point-of-care trials have been initiated recently to recruit and follow patients using the data from EHR databases. In this review, we describe how EHR databases can be used for conducting large-scale simple trials and discuss the advantages and disadvantages of their use. © 2014 The Association for the Publication of the Journal of Internal Medicine.
Cutaneous melanoma in situ: translational evidence from a large population-based study.
Mocellin, Simone; Nitti, Donato
2011-01-01
Cutaneous melanoma in situ (CMIS) is a nosologic entity surrounded by health concerns and unsolved debates. We aimed to shed some light on CMIS by means of a large population-based study. Patients with histologic diagnosis of CMIS were identified from the Surveillance Epidemiology End Results (SEER) database. The records of 93,863 cases of CMIS were available for analysis. CMIS incidence has been steadily increasing over the past 3 decades at a rate higher than any other in situ or invasive tumor, including invasive skin melanoma (annual percentage change [APC]: 9.5% versus 3.6%, respectively). Despite its noninvasive nature, CMIS is treated with excision margins wider than 1 cm in more than one third of cases. CMIS is associated with an increased risk of invasive melanoma (standardized incidence ratio [SIR]: 8.08; 95% confidence interval [CI]: 7.66-8.57), with an estimated 3:5 invasive/in situ ratio; surprisingly, it is also associated with a reduced risk of gastrointestinal (SIR: 0.78, CI: 0.72-0.84) and lung (SIR: 0.65, CI: 0.59-0.71) cancers. Relative survival analysis shows that persons with CMIS have a life expectancy equal to that of the general population. CMIS is increasingly diagnosed and is often overtreated, although it does not affect the life expectancy of its carriers. Patients with CMIS have an increased risk of developing invasive melanoma (which warrants their enrollment in screening programs) but also a reduced risk of some epithelial cancers, which raises the intriguing hypothesis that genetic/environmental risk factors for some tumors may oppose the pathogenesis of others.
Phenotip - a web-based instrument to help diagnosing fetal syndromes antenatally.
Porat, Shay; de Rham, Maud; Giamboni, Davide; Van Mieghem, Tim; Baud, David
2014-12-10
Prenatal ultrasound can often reliably distinguish fetal anatomic anomalies, particularly in the hands of an experienced ultrasonographer. Given the large number of existing syndromes and the significant overlap in prenatal findings, antenatal differentiation for syndrome diagnosis is difficult. We constructed a hierarchic tree of 1140 sonographic markers and submarkers, organized per organ system. Subsequently, a database of prenatally diagnosable syndromes was built. An internet-based search engine was then designed to search the syndrome database based on a single or multiple sonographic markers. Future developments will include a database with magnetic resonance imaging findings as well as further refinements in the search engine to allow prioritization based on incidence of syndromes and markers.
MASSCLEANage—Stellar Cluster Ages from Integrated Colors
NASA Astrophysics Data System (ADS)
Popescu, Bogdan; Hanson, M. M.
2010-11-01
We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derive their age using the MASSCLEANage with the same photometry with very good agreement. The MASSCLEANage program is freely available under GNU General Public License.
Shibata, Natsumi; Kimura, Shinya; Hoshino, Takahiro; Takeuchi, Masato; Urushihara, Hisashi
2018-05-11
To date, few large-scale comparative effectiveness studies of influenza vaccination have been conducted in Japan, since marketing authorization for influenza vaccines in Japan has been granted based only on the results of seroconversion and safety in small-sized populations in clinical trial phases not on the vaccine effectiveness. We evaluated the clinical effectiveness of influenza vaccination for children aged 1-15 years in Japan throughout four influenza seasons from 2010 to 2014 in the real world setting. We conducted a cohort study using a large-scale claims database for employee health care insurance plans covering more than 3 million people, including enrollees and their dependents. Vaccination status was identified using plan records for the influenza vaccination subsidies. The effectiveness of influenza vaccination in preventing influenza and its complications was evaluated. To control confounding related to influenza vaccination, odds ratios (OR) were calculated by applying a doubly robust method using the propensity score for vaccination. Total study population throughout the four consecutive influenza seasons was over 116,000. Vaccination rate was higher in younger children and in the recent influenza seasons. Throughout the four seasons, the estimated ORs for influenza onset were statistically significant and ranged from 0.797 to 0.894 after doubly robust adjustment. On age stratification, significant ORs were observed in younger children. Additionally, ORs for influenza complication outcomes, such as pneumonia, hospitalization with influenza and respiratory tract diseases, were significantly reduced, except for hospitalization with influenza in the 2010/2011 and 2012/2013 seasons. We confirmed the clinical effectiveness of influenza vaccination in children aged 1-15 years from the 2010/2011 to 2013/2014 influenza seasons. Influenza vaccine significantly prevented the onset of influenza and was effective in reducing its secondary complications. Copyright © 2018 Elsevier Ltd. All rights reserved.
Lander, Rebecca L; Hambidge, K Michael; Krebs, Nancy F; Westcott, Jamie E; Garces, Ana; Figueroa, Lester; Tejeda, Gabriela; Lokangaka, Adrien; Diba, Tshilenge S; Somannavar, Manjunath S; Honnayya, Ranjitha; Ali, Sumera A; Khan, Umber S; McClure, Elizabeth M; Thorsten, Vanessa R; Stolka, Kristen B
2017-01-01
Background : Our aim was to utilize a feasible quantitative methodology to estimate the dietary adequacy of >900 first-trimester pregnant women in poor rural areas of the Democratic Republic of the Congo, Guatemala, India and Pakistan. This paper outlines the dietary methods used. Methods : Local nutritionists were trained at the sites by the lead study nutritionist and received ongoing mentoring throughout the study. Training topics focused on the standardized conduct of repeat multiple-pass 24-hr dietary recalls, including interview techniques, estimation of portion sizes, and construction of a unique site-specific food composition database (FCDB). Each FCDB was based on 13 food groups and included values for moisture, energy, 20 nutrients (i.e. macro- and micronutrients), and phytate (an anti-nutrient). Nutrient values for individual foods or beverages were taken from recently developed FAO-supported regional food composition tables or the USDA national nutrient database. Appropriate adjustments for differences in moisture and application of nutrient retention and yield factors after cooking were applied, as needed. Generic recipes for mixed dishes consumed by the study population were compiled at each site, followed by calculation of a median recipe per 100 g. Each recipe's nutrient values were included in the FCDB. Final site FCDB checks were planned according to FAO/INFOODS guidelines. Discussion : This dietary strategy provides the opportunity to assess estimated mean group usual energy and nutrient intakes and estimated prevalence of the population 'at risk' of inadequate intakes in first-trimester pregnant women living in four low- and middle-income countries. While challenges and limitations exist, this methodology demonstrates the practical application of a quantitative dietary strategy for a large international multi-site nutrition trial, providing within- and between-site comparisons. Moreover, it provides an excellent opportunity for local capacity building and each site FCDB can be easily modified for additional research activities conducted in other populations living in the same area.
Mean velocity and turbulence measurements in a 90 deg curved duct with thin inlet boundary layer
NASA Technical Reports Server (NTRS)
Crawford, R. A.; Peters, C. E.; Steinhoff, J.; Hornkohl, J. O.; Nourinejad, J.; Ramachandran, K.
1985-01-01
The experimental database established by this investigation of the flow in a large rectangular turning duct is of benchmark quality. The experimental Reynolds numbers, Deans numbers and boundary layer characteristics are significantly different from previous benchmark curved-duct experimental parameters. This investigation extends the experimental database to higher Reynolds number and thinner entrance boundary layers. The 5% to 10% thick boundary layers, based on duct half-width, results in a large region of near-potential flow in the duct core surrounded by developing boundary layers with large crossflows. The turbulent entrance boundary layer case at R sub ed = 328,000 provides an incompressible flowfield which approaches real turbine blade cascade characteristics. The results of this investigation provide a challenging benchmark database for computational fluid dynamics code development.
2011-01-01
Background Guidance documents for the development and validation of patient-reported outcomes (PROs) advise the use of conceptual frameworks, which outline the structure of the concept that a PRO aims to measure. It is unknown whether currently available PROs are based on conceptual frameworks. This study, which was limited to a specific case, had the following aims: (i) to identify conceptual frameworks of physical activity in chronic respiratory patients or similar populations (chronic heart disease patients or the elderly) and (ii) to assess whether the development and validation of PROs to measure physical activity in these populations were based on a conceptual framework of physical activity. Methods Two systematic reviews were conducted through searches of the Medline, Embase, PsycINFO, and Cinahl databases prior to January 2010. Results In the first review, only 2 out of 581 references pertaining to physical activity in the defined populations provided a conceptual framework of physical activity in COPD patients. In the second review, out of 103 studies developing PROs to measure physical activity or related constructs, none were based on a conceptual framework of physical activity. Conclusions These findings raise concerns about how the large body of evidence from studies that use physical activity PRO instruments should be evaluated by health care providers, guideline developers, and regulatory agencies. PMID:21967887
Courtney, Ryan J.; Naicker, Sundresan; Shakeshaft, Anthony; Clare, Philip; Martire, Kristy A.; Mattick, Richard P.
2015-01-01
Background: Smoking cessation research output should move beyond descriptive research of the health problem to testing interventions that can provide causal data and effective evidence-based solutions. This review examined the number and type of published smoking cessation studies conducted in low-socioeconomic status (low-SES) and disadvantaged population groups. Methods: A systematic database search was conducted for two time periods: 2000–2004 (TP1) and 2008–2012 (TP2). Publications that examined smoking cessation in a low-SES or disadvantaged population were coded by: population of interest; study type (reviews, non-data based publications, data-based publications (descriptive, measurement and intervention research)); and country. Intervention studies were coded in accordance with the Cochrane Effective Practice and Organisation of Care data collection checklist and use of biochemical verification of self-reported abstinence was assessed. Results: 278 citations were included. Research output (i.e., all study types) had increased from TP1 27% to TP2 73% (χ² = 73.13, p < 0.001), however, the proportion of data-based research had not significantly increased from TP1 and TP2: descriptive (TP1 = 23% vs. TP2 = 33%) or intervention (TP1 = 77% vs. TP2 = 67%). The proportion of intervention studies adopting biochemical verification of self-reported abstinence had significantly decreased from TP1 to TP2 with an increased reliance on self-reported abstinence (TP1 = 12% vs. TP2 = 36%). Conclusions: The current research output is not ideal or optimal to decrease smoking rates. Research institutions, scholars and funding organisations should take heed to review findings when developing future research and policy. PMID:26062037
Corder, Jennifer Price; Al Ahbabi, Fatima Jaber Sehmi; Al Dhaheri, Hind Saif; Chedid, Fares
2017-09-01
The majority of studies describing demographics and co-occurring conditions in cohorts with Down syndrome come from regions outside of the Middle East, mainly from Europe and North America. This paper describes demographics and co-occurring conditions in a hospital-based cohort of individuals with Down syndrome living in the Middle Eastern country of the United Arab Emirates (UAE). The first dedicated Down syndrome clinic in the UAE was established in 2012 at Tawam Hospital in Al Ain. This paper describes a clinic-based cohort of 221 participants over 4 years from the Gulf Down Syndrome Registry, a new Down syndrome database and contact registry created at Tawam Hospital. Key demographic findings include mean maternal age of 37 years, among the highest described in the literature. Sixty-two percent of mothers are >35 years. Over 90% of mothers received post-natal diagnosis of Down syndrome. High sex ratio, parental consanguinity, and large family size also characterize the group. The spectrum of many co-occurring conditions mirrors that of previously described populations, with some notable differences. Cardiovascular malformations are well represented, however, atrioventricular canal is not the most common. Genitourinary conditions are common, as evidenced by 12% of males with hypospadias and 15% with undescended testes. Glucose-6-phosphate dehydrogenase deficiency, alpha thalassemia trait, hypovitaminosis D, and dental caries are common in our cohort. This study describes a large hospital-based group with Down syndrome presenting to a new dedicated Down syndrome clinic in the UAE, highlighting unique demographic and co-occurring conditions found in that population. © 2017 Wiley Periodicals, Inc.
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures
Pride, David T; Schoenfeld, Thomas
2008-01-01
Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.
Pride, David T; Schoenfeld, Thomas
2008-09-17
Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.
Area-level poverty and preterm birth risk: A population-based multilevel analysis
DeFranco, Emily A; Lian, Min; Muglia, Louis A; Schootman, Mario
2008-01-01
Background Preterm birth is a complex disease with etiologic influences from a variety of social, environmental, hormonal, genetic, and other factors. The purpose of this study was to utilize a large population-based birth registry to estimate the independent effect of county-level poverty on preterm birth risk. To accomplish this, we used a multilevel logistic regression approach to account for multiple co-existent individual-level variables and county-level poverty rate. Methods Population-based study utilizing Missouri's birth certificate database (1989–1997). We conducted a multilevel logistic regression analysis to estimate the effect of county-level poverty on PTB risk. Of 634,994 births nested within 115 counties in Missouri, two levels were considered. Individual-level variables included demographics factors, prenatal care, health-related behavioral risk factors, and medical risk factors. The area-level variable included the percentage of the population within each county living below the poverty line (US census data, 1990). Counties were divided into quartiles of poverty; the first quartile (lowest rate of poverty) was the reference group. Results PTB < 35 weeks occurred in 24,490 pregnancies (3.9%). The rate of PTB < 35 weeks was 2.8% in counties within the lowest quartile of poverty and increased through the 4th quartile (4.9%), p < 0.0001. High county-level poverty was significantly associated with PTB risk. PTB risk (< 35 weeks) was increased for women who resided in counties within the highest quartile of poverty, adjusted odds ratio (adjOR) 1.18 (95% CI 1.03, 1.35), with a similar effect at earlier gestational ages (< 32 weeks), adjOR 1.27 (95% CI 1.06, 1.52). Conclusion Women residing in socioeconomically deprived areas are at increased risk of preterm birth, above other underlying risk factors. Although the risk increase is modest, it affects a large number of pregnancies. PMID:18793437
Area-level poverty and preterm birth risk: a population-based multilevel analysis.
DeFranco, Emily A; Lian, Min; Muglia, Louis A; Schootman, Mario
2008-09-15
Preterm birth is a complex disease with etiologic influences from a variety of social, environmental, hormonal, genetic, and other factors. The purpose of this study was to utilize a large population-based birth registry to estimate the independent effect of county-level poverty on preterm birth risk. To accomplish this, we used a multilevel logistic regression approach to account for multiple co-existent individual-level variables and county-level poverty rate. Population-based study utilizing Missouri's birth certificate database (1989-1997). We conducted a multilevel logistic regression analysis to estimate the effect of county-level poverty on PTB risk. Of 634,994 births nested within 115 counties in Missouri, two levels were considered. Individual-level variables included demographics factors, prenatal care, health-related behavioral risk factors, and medical risk factors. The area-level variable included the percentage of the population within each county living below the poverty line (US census data, 1990). Counties were divided into quartiles of poverty; the first quartile (lowest rate of poverty) was the reference group. PTB < 35 weeks occurred in 24,490 pregnancies (3.9%). The rate of PTB < 35 weeks was 2.8% in counties within the lowest quartile of poverty and increased through the 4th quartile (4.9%), p < 0.0001. High county-level poverty was significantly associated with PTB risk. PTB risk (< 35 weeks) was increased for women who resided in counties within the highest quartile of poverty, adjusted odds ratio (adj OR) 1.18 (95% CI 1.03, 1.35), with a similar effect at earlier gestational ages (< 32 weeks), adj OR 1.27 (95% CI 1.06, 1.52). Women residing in socioeconomically deprived areas are at increased risk of preterm birth, above other underlying risk factors. Although the risk increase is modest, it affects a large number of pregnancies.
Salazar, Jose H; Yang, Jingyan; Shen, Liang; Abdullah, Fizan; Kim, Tae W
2014-12-01
Malignant Hyperthermia (MH) is a potentially fatal metabolic disorder. Due to its rarity, limited evidence exists about risk factors, morbidity, and mortality especially in children. Using the Nationwide Inpatient Sample and the Kid's Inpatient Database (KID), admissions with the ICD-9 code for MH (995.86) were extracted for patients 0-17 years of age. Demographic characteristics were analyzed. Logistic regression was performed to identify patient and hospital characteristics associated with mortality. A subset of patients with a surgical ICD-9 code in the KID was studied to calculate the prevalence of MH in the dataset. A total of 310 pediatric admissions were seen in 13 nonoverlapping years of data. Patients had a mortality of 2.9%. Male sex was predominant (64.8%), and 40.5% of the admissions were treated at centers not identified as children's hospitals. The most common associated diagnosis was rhabdomyolysis, which was present in 26 cases. Regression with the outcome of mortality did not yield significant differences between demographic factors, age, sex race, or hospital type, pediatric vs nonpediatric. Within a surgical subset of 530,449 admissions, MH was coded in 55, giving a rate of 1.04 cases per 10,000 cases. This study is the first to combine two large databases to study MH in the pediatric population. The analysis provides an insight into the risk factors, comorbidities, mortality, and prevalence of MH in the United States population. Until more methodologically rigorous, large-scale studies are done, the use of databases will continue to be the optimal method to study rare diseases. © 2014 John Wiley & Sons Ltd.
ClassLess: A Comprehensive Database of Young Stellar Objects
NASA Astrophysics Data System (ADS)
Hillenbrand, Lynne A.; baliber, nairn
2015-08-01
We have designed and constructed a database intended to house catalog and literature-published measurements of Young Stellar Objects (YSOs) within ~1 kpc of the Sun. ClassLess, so called because it includes YSOs in all stages of evolution, is a relational database in which user interaction is conducted via HTML web browsers, queries are performed in scientific language, and all data are linked to the sources of publication. Each star is associated with a cluster (or clusters), and both spatially resolved and unresolved measurements are stored, allowing proper use of data from multiple star systems. With this fully searchable tool, myriad ground- and space-based instruments and surveys across wavelength regimes can be exploited. In addition to primary measurements, the database self consistently calculates and serves higher level data products such as extinction, luminosity, and mass. As a result, searches for young stars with specific physical characteristics can be completed with just a few mouse clicks. We are in the database population phase now, and are eager to engage with interested experts worldwide on local galactic star formation and young stellar populations.
Fortier, Isabel; Doiron, Dany; Little, Julian; Ferretti, Vincent; L’Heureux, François; Stolk, Ronald P; Knoppers, Bartha M; Hudson, Thomas J; Burton, Paul R
2011-01-01
Background Proper understanding of the roles of, and interactions between genetic, lifestyle, environmental and psycho-social factors in determining the risk of development and/or progression of chronic diseases requires access to very large high-quality databases. Because of the financial, technical and time burdens related to developing and maintaining very large studies, the scientific community is increasingly synthesizing data from multiple studies to construct large databases. However, the data items collected by individual studies must be inferentially equivalent to be meaningfully synthesized. The DataSchema and Harmonization Platform for Epidemiological Research (DataSHaPER; http://www.datashaper.org) was developed to enable the rigorous assessment of the inferential equivalence, i.e. the potential for harmonization, of selected information from individual studies. Methods This article examines the value of using the DataSHaPER for retrospective harmonization of established studies. Using the DataSHaPER approach, the potential to generate 148 harmonized variables from the questionnaires and physical measures collected in 53 large population-based studies (6.9 million participants) was assessed. Variable and study characteristics that might influence the potential for data synthesis were also explored. Results Out of all assessment items evaluated (148 variables for each of the 53 studies), 38% could be harmonized. Certain characteristics of variables (i.e. relative importance, individual targeted, reference period) and of studies (i.e. observational units, data collection start date and mode of questionnaire administration) were associated with the potential for harmonization. For example, for variables deemed to be essential, 62% of assessment items paired could be harmonized. Conclusion The current article shows that the DataSHaPER provides an effective and flexible approach for the retrospective harmonization of information across studies. To implement data synthesis, some additional scientific, ethico-legal and technical considerations must be addressed. The success of the DataSHaPER as a harmonization approach will depend on its continuing development and on the rigour and extent of its use. The DataSHaPER has the potential to take us closer to a truly collaborative epidemiology and offers the promise of enhanced research potential generated through synthesized databases. PMID:21804097
Donor cycle and donor segmentation: new tools for improving blood donor management.
Veldhuizen, I; Folléa, G; de Kort, W
2013-07-01
An adequate donor population is of key importance for the entire blood transfusion chain. For good donor management, a detailed overview of the donor database is therefore imperative. This study offers a new description of the donor cycle related to the donor management process. It also presents the outcomes of a European Project, Donor Management IN Europe (DOMAINE), regarding the segmentation of the donor population into donor types. Blood establishments (BEs) from 18 European countries, the Thalassaemia International Federation and a representative from the South-Eastern Europe Health Network joined forces in DOMAINE. A questionnaire assessed blood donor management practices and the composition of the donor population using the newly proposed DOMAINE donor segmentation. 48 BEs in 34 European countries were invited to participate. The response rate was high (88%). However, only 14 BEs could deliver data on the composition of their donor population. The data showed large variations and major imbalances in the donor population. In 79% of the countries, inactive donors formed the dominant donor type. Only in 21%, regular donors were the largest subgroup, and in 29%, the proportion of first-time donors was higher than the proportion of regular donors. Good donor management depends on a thorough insight into the flow of donors through their donor career. Segmentation of the donor database is an essential tool to understand the influx and efflux of donors. The DOMAINE donor segmentation helps BEs in understanding their donor database and to adapt their donor recruitment and retention practices accordingly. Ways to use this new tool are proposed. © 2013 International Society of Blood Transfusion.
Li, Zhenghui; Zhang, Jian; Zhang, Hantao; Lin, Ziqing; Ye, Jian
2018-05-01
Short tandem repeats (STRs) play a vitally important role in forensics. Population data is needed to improve the field. There is currently no large population data-based data set in Chamdo Tibetan. In our study, the allele frequencies and forensic statistical parameters of 18 autosomal STR loci (D5S818, D21S11, D7S820, CSF1PO, D2S1338, D3S1358, VWA, D8S1179, D16S539, PentaE, TPOX, TH01, D19S433, D18S51, FGA, D6S1043, D13S317, and D12S391) included in the DNATyper™19 kit were investigated in 2249 healthy, unrelated Tibetan subjects living in Tibet Chamdo, Southwest China. The combined power of discrimination and the combined probability of exclusion of all 18 loci were 0.9999999999999999999998174 and 0.99999994704, respectively. Furthermore, the genetic relationship between our Tibetan group and 33 previously published populations was also investigated. Phylogenetic analyses revealed that the Chamdo Tibetan population is more closely related genetically with the Lhasa Tibetan group. Our results suggest that these autosomal STR loci are highly polymorphic in the Tibetan population living in Tibet Chamdo and can be used as a powerful tool in forensics, linguistics, and population genetic analyses.
Algorithm to calculate proportional area transformation factors for digital geographic databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, R.
1983-01-01
A computer technique is described for determining proportionate-area factors used to transform thematic data between large geographic areal databases. The number of calculations in the algorithm increases linearly with the number of segments in the polygonal definitions of the databases, and increases with the square root of the total number of chains. Experience is presented in calculating transformation factors for two national databases, the USGS Water Cataloging Unit outlines and DOT county boundaries which consist of 2100 and 3100 polygons respectively. The technique facilitates using thematic data defined on various natural bases (watersheds, landcover units, etc.) in analyses involving economicmore » and other administrative bases (states, counties, etc.), and vice versa.« less
Feasibility of Linking Population-Based Cancer Registries and Cancer Center Biorepositories
McCusker, Margaret E.; Allen, Mark; Fernandez-Ami, Allyn; Gandour-Edwards, Regina
2012-01-01
Purpose: Biospecimen-based research offers tremendous promise as a way to increase understanding of the molecular epidemiology of cancers. Population-based cancer registries can augment this research by providing more clinical detail and long-term follow-up information than is typically available from biospecimen annotations. In order to demonstrate the feasibility of this concept, we performed a pilot linkage between the California Cancer Registry (CCR) and the University of California, Davis Cancer Center Biorepository (UCD CCB) databases to determine if we could identify patients with records in both databases. Methods: We performed a probabilistic data linkage between 2180 UCD CCB biospecimen records collected during the years 2005–2009 and all CCR records for cancers diagnosed from 1988–2009 based on standard data linkage procedures. Results: The 1040 UCD records with a unique medical record number, tissue site, and pathology date were linked to 3.3 million CCR records. Of these, 844 (81.2%) were identified in both databases. Overall, record matches were highest (100%) for cancers of the cervix and testis/other male genital system organs. For the most common cancers, matches were highest for cancers of the lung and respiratory system (93%), breast (91.7%), and colon and rectum (89.5%), and lower for prostate (72.9%). Conclusions: This pilot linkage demonstrated that information on existing biospecimens from a cancer center biorepository can be linked successfully to cancer registry data. Linkages between existing biorepositories and cancer registries can foster productive collaborations and provide a foundation for virtual biorepository networks to support population-based biospecimen research. PMID:24845042
Makarova, Nataliya; Brand, Tilman; Brünings-Kuppe, Claudia; Pohlabeln, Hermann; Luttmann, Sabine
2016-03-21
The main objective of this study was to explore differences in mortality patterns among two large immigrant groups in Germany: one from Turkey and the other from the former Soviet Union (FSU). To this end, we investigated indicators of premature mortality. This study was conducted as a retrospective population-based study based on mortality register linkage. Using mortality data for the period 2004-2010, we calculated age-standardised death rates (SDR) and standardised mortality ratios (SMR) for premature deaths (
Makarova, Nataliya; Brand, Tilman; Brünings-Kuppe, Claudia; Pohlabeln, Hermann; Luttmann, Sabine
2016-01-01
Objectives The main objective of this study was to explore differences in mortality patterns among two large immigrant groups in Germany: one from Turkey and the other from the former Soviet Union (FSU). To this end, we investigated indicators of premature mortality. Design This study was conducted as a retrospective population-based study based on mortality register linkage. Using mortality data for the period 2004–2010, we calculated age-standardised death rates (SDR) and standardised mortality ratios (SMR) for premature deaths (
Durack, Jeremy C.; Chao, Chih-Chien; Stevenson, Derek; Andriole, Katherine P.; Dev, Parvati
2002-01-01
Medical media collections are growing at a pace that exceeds the value they currently provide as research and educational resources. To address this issue, the Stanford MediaServer was designed to promote innovative multimedia-based application development. The nucleus of the MediaServer platform is a digital media database strategically designed to meet the information needs of many biomedical disciplines. Key features include an intuitive web-based interface for collaboratively populating the media database, flexible creation of media collections for diverse and specialized purposes, and the ability to construct a variety of end-user applications from the same database to support biomedical education and research. PMID:12463820
Durack, Jeremy C; Chao, Chih-Chien; Stevenson, Derek; Andriole, Katherine P; Dev, Parvati
2002-01-01
Medical media collections are growing at a pace that exceeds the value they currently provide as research and educational resources. To address this issue, the Stanford MediaServer was designed to promote innovative multimedia-based application development. The nucleus of the MediaServer platform is a digital media database strategically designed to meet the information needs of many biomedical disciplines. Key features include an intuitive web-based interface for collaboratively populating the media database, flexible creation of media collections for diverse and specialized purposes, and the ability to construct a variety of end-user applications from the same database to support biomedical education and research.
Outline for Research in Large Data Base Resources.
ERIC Educational Resources Information Center
Kahn, Paul
This paper uses a hypothetical application entitled "VAPORTRAILS" to examine how an integrated application can be used to solve the problems of search and retrieval from a range of qualitatively different databases, and the organization of the resulting information into a personal database resource. In addition, four general classes of databases…
Validity of a computerized population registry of dementia based on clinical databases.
Mar, J; Arrospide, A; Soto-Gordoa, M; Machón, M; Iruin, Á; Martinez-Lage, P; Gabilondo, A; Moreno-Izco, F; Gabilondo, A; Arriola, L
2018-05-08
The handling of information through digital media allows innovative approaches for identifying cases of dementia through computerized searches within the clinical databases that include systems for coding diagnoses. The aim of this study was to analyze the validity of a dementia registry in Gipuzkoa based on the administrative and clinical databases existing in the Basque Health Service. This is a descriptive study based on the evaluation of available data sources. First, through review of medical records, the diagnostic validity was evaluated in 2 samples of cases identified and not identified as dementia. The sensitivity, specificity and positive and negative predictive value of the diagnosis of dementia were measured. Subsequently, the cases of living dementia in December 31, 2016 were searched in the entire Gipuzkoa population to collect sociodemographic and clinical variables. The validation samples included 986 cases and 327 no cases. The calculated sensitivity was 80.2% and the specificity was 99.9%. The negative predictive value was 99.4% and positive value was 95.1%. The cases in Gipuzkoa were 10,551, representing 65% of the cases predicted according to the literature. Antipsychotic medication were taken by a 40% and a 25% of the cases were institutionalized. A registry of dementias based on clinical and administrative databases is valid and feasible. Its main contribution is to show the dimension of dementia in the health system. Copyright © 2018 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
Jackson, Rod
2017-01-01
Background Many national cardiovascular disease (CVD) risk factor management guidelines now recommend that drug treatment decisions should be informed primarily by patients’ multi-variable predicted risk of CVD, rather than on the basis of single risk factor thresholds. To investigate the potential impact of treatment guidelines based on CVD risk thresholds at a national level requires individual level data representing the multi-variable CVD risk factor profiles for a country’s total adult population. As these data are seldom, if ever, available, we aimed to create a synthetic population, representing the joint CVD risk factor distributions of the adult New Zealand population. Methods and results A synthetic population of 2,451,278 individuals, representing the actual age, gender, ethnicity and social deprivation composition of people aged 30–84 years who completed the 2013 New Zealand census was generated using Monte Carlo sampling. Each ‘synthetic’ person was then probabilistically assigned values of the remaining cardiovascular disease (CVD) risk factors required for predicting their CVD risk, based on data from the national census national hospitalisation and drug dispensing databases and a large regional cohort study, using Monte Carlo sampling and multiple imputation. Where possible, the synthetic population CVD risk distributions for each non-demographic risk factor were validated against independent New Zealand data sources. Conclusions We were able to develop a synthetic national population with realistic multi-variable CVD risk characteristics. The construction of this population is the first step in the development of a micro-simulation model intended to investigate the likely impact of a range of national CVD risk management strategies that will inform CVD risk management guideline updates in New Zealand and elsewhere. PMID:28384217
Knight, Josh; Wells, Susan; Marshall, Roger; Exeter, Daniel; Jackson, Rod
2017-01-01
Many national cardiovascular disease (CVD) risk factor management guidelines now recommend that drug treatment decisions should be informed primarily by patients' multi-variable predicted risk of CVD, rather than on the basis of single risk factor thresholds. To investigate the potential impact of treatment guidelines based on CVD risk thresholds at a national level requires individual level data representing the multi-variable CVD risk factor profiles for a country's total adult population. As these data are seldom, if ever, available, we aimed to create a synthetic population, representing the joint CVD risk factor distributions of the adult New Zealand population. A synthetic population of 2,451,278 individuals, representing the actual age, gender, ethnicity and social deprivation composition of people aged 30-84 years who completed the 2013 New Zealand census was generated using Monte Carlo sampling. Each 'synthetic' person was then probabilistically assigned values of the remaining cardiovascular disease (CVD) risk factors required for predicting their CVD risk, based on data from the national census national hospitalisation and drug dispensing databases and a large regional cohort study, using Monte Carlo sampling and multiple imputation. Where possible, the synthetic population CVD risk distributions for each non-demographic risk factor were validated against independent New Zealand data sources. We were able to develop a synthetic national population with realistic multi-variable CVD risk characteristics. The construction of this population is the first step in the development of a micro-simulation model intended to investigate the likely impact of a range of national CVD risk management strategies that will inform CVD risk management guideline updates in New Zealand and elsewhere.
Traditional and Current Food Use of Wild Plants Listed in the Russian Pharmacopoeia.
Shikov, Alexander N; Tsitsilin, Andrey N; Pozharitskaya, Olga N; Makarov, Valery G; Heinrich, Michael
2017-01-01
Historically Russia can be regarded as a "herbophilious" society. For centuries the multinational population of Russia has used plants in daily diet and for self-medication. The specificity of dietary uptake of medicinal plants (especially those in the unique and highly developed Russian herbal medical tradition) has remained mostly unknown in other regions. Based on 11th edition of the State Pharmacopoeia of the USSR, we selected 70 wild plant species which have been used in food by local Russian populations. Empirical searches were conducted via the Russian-wide applied online database E-library.ru, library catalogs of public libraries in St-Petersburg, the databases Scopus, Web of Science, PubMed, and search engine Google Scholar. The large majority of species included in Russian Pharmacopoeia are used as food by local population, however, aerial parts are more widely used for food. In this review, we summarize data on medicinal species published in Russia and other countries that are included in the Russian Pharmacopoeia and have being used in food for a long time. Consequently, the Russian Pharmacopoeia is an important source of information on plant species used traditionally at the interface of food and medicine. At the same time, there are the so-called "functional foods", which denotes foods that not only serves to provide nutrition but also can be a source for prevention and cure of various diseases. This review highlights the potential of wild species of Russia monographed in its pharmacopeia for further developing new functional foods and-through the lens of their incorporation into the pharmacopeia-showcases the species' importance in Russia.
Population Education Accessions List, January-April 2000.
ERIC Educational Resources Information Center
United Nations Educational, Scientific and Cultural Organization, Bangkok (Thailand). Principal Regional Office for Asia and the Pacific.
This document contains output from a computerized bibliographic database. This issue is divided into four parts. Part I consists of titles that address various aspects of population education and is arranged by country in the first section, and general materials in the second section. Part II presents knowledge base information and consists of…
Use of administrative medical databases in population-based research.
Gavrielov-Yusim, Natalie; Friger, Michael
2014-03-01
Administrative medical databases are massive repositories of data collected in healthcare for various purposes. Such databases are maintained in hospitals, health maintenance organisations and health insurance organisations. Administrative databases may contain medical claims for reimbursement, records of health services, medical procedures, prescriptions, and diagnoses information. It is clear that such systems may provide a valuable variety of clinical and demographic information as well as an on-going process of data collection. In general, information gathering in these databases does not initially presume and is not planned for research purposes. Nonetheless, administrative databases may be used as a robust research tool. In this article, we address the subject of public health research that employs administrative data. We discuss the biases and the limitations of such research, as well as other important epidemiological and biostatistical key points specific to administrative database studies.
A blue carbon soil database: Tidal wetland stocks for the US National Greenhouse Gas Inventory
NASA Astrophysics Data System (ADS)
Feagin, R. A.; Eriksson, M.; Hinson, A.; Najjar, R. G.; Kroeger, K. D.; Herrmann, M.; Holmquist, J. R.; Windham-Myers, L.; MacDonald, G. M.; Brown, L. N.; Bianchi, T. S.
2015-12-01
Coastal wetlands contain large reservoirs of carbon, and in 2015 the US National Greenhouse Gas Inventory began the work of placing blue carbon within the national regulatory context. The potential value of a wetland carbon stock, in relation to its location, soon could be influential in determining governmental policy and management activities, or in stimulating market-based CO2 sequestration projects. To meet the national need for high-resolution maps, a blue carbon stock database was developed linking National Wetlands Inventory datasets with the USDA Soil Survey Geographic Database. Users of the database can identify the economic potential for carbon conservation or restoration projects within specific estuarine basins, states, wetland types, physical parameters, and land management activities. The database is geared towards both national-level assessments and local-level inquiries. Spatial analysis of the stocks show high variance within individual estuarine basins, largely dependent on geomorphic position on the landscape, though there are continental scale trends to the carbon distribution as well. Future plans including linking this database with a sedimentary accretion database to predict carbon flux in US tidal wetlands.
A comprehensive SNP and indel imputability database.
Duan, Qing; Liu, Eric Yi; Croteau-Chonka, Damien C; Mohlke, Karen L; Li, Yun
2013-02-15
Genotype imputation has become an indispensible step in genome-wide association studies (GWAS). Imputation accuracy, directly influencing downstream analysis, has shown to be improved using re-sequencing-based reference panels; however, this comes at the cost of high computational burden due to the huge number of potentially imputable markers (tens of millions) discovered through sequencing a large number of individuals. Therefore, there is an increasing need for access to imputation quality information without actually conducting imputation. To facilitate this process, we have established a publicly available SNP and indel imputability database, aiming to provide direct access to imputation accuracy information for markers identified by the 1000 Genomes Project across four major populations and covering multiple GWAS genotyping platforms. SNP and indel imputability information can be retrieved through a user-friendly interface by providing the ID(s) of the desired variant(s) or by specifying the desired genomic region. The query results can be refined by selecting relevant GWAS genotyping platform(s). This is the first database providing variant imputability information specific to each continental group and to each genotyping platform. In Filipino individuals from the Cebu Longitudinal Health and Nutrition Survey, our database can achieve an area under the receiver-operating characteristic curve of 0.97, 0.91, 0.88 and 0.79 for markers with minor allele frequency >5%, 3-5%, 1-3% and 0.5-1%, respectively. Specifically, by filtering out 48.6% of markers (corresponding to a reduction of up to 48.6% in computational costs for actual imputation) based on the imputability information in our database, we can remove 77%, 58%, 51% and 42% of the poorly imputed markers at the cost of only 0.3%, 0.8%, 1.5% and 4.6% of the well-imputed markers with minor allele frequency >5%, 3-5%, 1-3% and 0.5-1%, respectively. http://www.unc.edu/∼yunmli/imputability.html
Degli Esposti, Luca; Saragoni, Stefania; Buda, Stefano; Sturani, Alessandra; Degli Esposti, Ezio
2013-01-01
Diabetes is one of the most prevalent chronic diseases, and its prevalence is predicted to increase in the next two decades. Diabetes imposes a staggering financial burden on the health care system, so information about the costs and experiences of collecting and reporting quality measures of data is vital for practices deciding whether to adopt quality improvements or monitor existing initiatives. The aim of this study was to quantify the association between health care costs and level of glycemic control in patients with type 2 diabetes using clinical and administrative databases. A retrospective analysis using a large administrative database and a clinical registry containing laboratory results was performed. Patients were subdivided according to their glycated hemoglobin level. Multivariate analyses were used to control for differences in potential confounding factors, including age, gender, Charlson comorbidity index, presence of dyslipidemia, hypertension, or cardiovascular disease, and degree of adherence with antidiabetic drugs among the study groups. Of the total population of 700,000 subjects, 31,022 were identified as being diabetic (4.4% of the entire population). Of these, 21,586 met the study inclusion criteria. In total, 31.5% of patients had very poor glycemic control and 25.7% had excellent control. Over 2 years, the mean diabetes-related cost per person was: €1291.56 in patients with excellent control; €1545.99 in those with good control; €1584.07 in those with fair control; €1839.42 in those with poor control; and €1894.80 in those with very poor control. After adjustment, compared with the group having excellent control, the estimated excess cost per person associated with the groups with good control, fair control, poor control, and very poor control was €219.28, €264.65, €513.18, and €564.79, respectively. Many patients showed suboptimal glycemic control. Lower levels of glycated hemoglobin were associated with lower diabetes-related health care costs. Integration of administrative databases and a laboratory database appears to be suitable for showing that appropriate management of diabetes can help to achieve better resource allocation.
Developments in Post-marketing Comparative Effectiveness Research
S, Schneeweiss
2010-01-01
Physicians and insurers need to weigh the effectiveness of new drugs against existing therapeutics in routine care to make decisions about treatment and formularies. Because Food and Drug Administration (FDA) approval of most new drugs requires demonstrating efficacy and safety against placebo, there is limited interest by manufacturers in conducting such head-to-head trials. Comparative effectiveness research seeks to provide head-to-head comparisons of treatment outcomes in routine care. Health-care utilization databases record drug use and selected health outcomes for large populations in a timely way and reflect routine care, and therefore may be the preferred data source for comparative effectiveness research. Confounding caused by selective prescribing based on indication, severity, and prognosis threatens the validity of non-randomized database studies that often have limited details on clinical information. Several recent developments may bring the field closer to acceptable validity, including approaches that exploit the concepts of proxy variables using high-dimensional propensity scores, within-patient variation of drug exposure using crossover designs, and between-provider variation in prescribing preference using instrumental variable (IV) analyses. PMID:17554243
Developments in post-marketing comparative effectiveness research.
Schneeweiss, S
2007-08-01
Physicians and insurers need to weigh the effectiveness of new drugs against existing therapeutics in routine care to make decisions about treatment and formularies. Because Food and Drug Administration (FDA) approval of most new drugs requires demonstrating efficacy and safety against placebo, there is limited interest by manufacturers in conducting such head-to-head trials. Comparative effectiveness research seeks to provide head-to-head comparisons of treatment outcomes in routine care. Health-care utilization databases record drug use and selected health outcomes for large populations in a timely way and reflect routine care, and therefore may be the preferred data source for comparative effectiveness research. Confounding caused by selective prescribing based on indication, severity, and prognosis threatens the validity of non-randomized database studies that often have limited details on clinical information. Several recent developments may bring the field closer to acceptable validity, including approaches that exploit the concepts of proxy variables using high-dimensional propensity scores, within-patient variation of drug exposure using crossover designs, and between-provider variation in prescribing preference using instrumental variable (IV) analyses.
Alibhai, Sky; Jewell, Zoe; Evans, Jonah
2017-01-01
Acquiring reliable data on large felid populations is crucial for effective conservation and management. However, large felids, typically solitary, elusive and nocturnal, are difficult to survey. Tagging and following individuals with VHF or GPS technology is the standard approach, but costs are high and these methodologies can compromise animal welfare. Such limitations can restrict the use of these techniques at population or landscape levels. In this paper we describe a robust technique to identify and sex individual pumas from footprints. We used a standardized image collection protocol to collect a reference database of 535 footprints from 35 captive pumas over 10 facilities; 19 females (300 footprints) and 16 males (235 footprints), ranging in age from 1-20 yrs. Images were processed in JMP data visualization software, generating one hundred and twenty three measurements from each footprint. Data were analyzed using a customized model based on a pairwise trail comparison using robust cross-validated discriminant analysis with a Ward's clustering method. Classification accuracy was consistently > 90% for individuals, and for the correct classification of footprints within trails, and > 99% for sex classification. The technique has the potential to greatly augment the methods available for studying puma and other elusive felids, and is amenable to both citizen-science and opportunistic/local community data collection efforts, particularly as the data collection protocol is inexpensive and intuitive.
Performance monitoring in hip fracture surgery--how big a database do we really need?
Edwards, G A D; Metcalfe, A J; Johansen, A; O'Doherty, D
2010-04-01
Systems for collecting information about patient care are increasingly common in orthopaedic practice. Databases can allow various comparisons to be made over time. Significant decisions regarding service delivery and clinical practice may be made based on their results. We set out to determine the number of cases needed for comparison of 30-day mortality, inpatient wound infection rates and mean hospital length of stay, with a power of 80% for the demonstration of an effect at a significance level of p<0.05. We analysed 2 years of prospectively collected data on 1050 hip fracture patients admitted to a city teaching hospital. Detection of a 10% difference in 30-day mortality would require 14,065 patients in each arm of any comparison, demonstration of a 50% difference would require 643 patients in each arm; for wound infections, demonstration of a 10% difference in incidence would require 23,921 patients in each arm and 1127 patients for demonstration of a 50% difference; for length of stay, a difference of 10% would require 1479 patients and 6660 patients for a 50% difference. This study demonstrates the importance of considering the population sizes before comparisons are made on the basis of basic hip fracture outcome data. Our data also help illustrate the impact of sample size considerations when interpreting the results of performance monitoring. Many researchers will be used to the fact that rare outcomes such as inpatient mortality or wound infection require large sample sizes before differences can be reliably demonstrated between populations. This study gives actual figures that researchers could use when planning studies. Statistically meaningful analyses will only be possible with major multi-centre collaborations, as will be possible if hospital Trusts participate in the National Hip Fracture Database. Copyright (c) 2009 Elsevier Ltd. All rights reserved.
Pacaci, Anil; Gonul, Suat; Sinaci, A Anil; Yuksel, Mustafa; Laleci Erturkmen, Gokce B
2018-01-01
Background: Utilization of the available observational healthcare datasets is key to complement and strengthen the postmarketing safety studies. Use of common data models (CDM) is the predominant approach in order to enable large scale systematic analyses on disparate data models and vocabularies. Current CDM transformation practices depend on proprietarily developed Extract-Transform-Load (ETL) procedures, which require knowledge both on the semantics and technical characteristics of the source datasets and target CDM. Purpose: In this study, our aim is to develop a modular but coordinated transformation approach in order to separate semantic and technical steps of transformation processes, which do not have a strict separation in traditional ETL approaches. Such an approach would discretize the operations to extract data from source electronic health record systems, alignment of the source, and target models on the semantic level and the operations to populate target common data repositories. Approach: In order to separate the activities that are required to transform heterogeneous data sources to a target CDM, we introduce a semantic transformation approach composed of three steps: (1) transformation of source datasets to Resource Description Framework (RDF) format, (2) application of semantic conversion rules to get the data as instances of ontological model of the target CDM, and (3) population of repositories, which comply with the specifications of the CDM, by processing the RDF instances from step 2. The proposed approach has been implemented on real healthcare settings where Observational Medical Outcomes Partnership (OMOP) CDM has been chosen as the common data model and a comprehensive comparative analysis between the native and transformed data has been conducted. Results: Health records of ~1 million patients have been successfully transformed to an OMOP CDM based database from the source database. Descriptive statistics obtained from the source and target databases present analogous and consistent results. Discussion and Conclusion: Our method goes beyond the traditional ETL approaches by being more declarative and rigorous. Declarative because the use of RDF based mapping rules makes each mapping more transparent and understandable to humans while retaining logic-based computability. Rigorous because the mappings would be based on computer readable semantics which are amenable to validation through logic-based inference methods.
The Camden & Islington Research Database: Using electronic mental health records for research.
Werbeloff, Nomi; Osborn, David P J; Patel, Rashmi; Taylor, Matthew; Stewart, Robert; Broadbent, Matthew; Hayes, Joseph F
2018-01-01
Electronic health records (EHRs) are widely used in mental health services. Case registers using EHRs from secondary mental healthcare have the potential to deliver large-scale projects evaluating mental health outcomes in real-world clinical populations. We describe the Camden and Islington NHS Foundation Trust (C&I) Research Database which uses the Clinical Record Interactive Search (CRIS) tool to extract and de-identify routinely collected clinical information from a large UK provider of secondary mental healthcare, and demonstrate its capabilities to answer a clinical research question regarding time to diagnosis and treatment of bipolar disorder. The C&I Research Database contains records from 108,168 mental health patients, of which 23,538 were receiving active care. The characteristics of the patient population are compared to those of the catchment area, of London, and of England as a whole. The median time to diagnosis of bipolar disorder was 76 days (interquartile range: 17-391) and median time to treatment was 37 days (interquartile range: 5-194). Compulsory admission under the UK Mental Health Act was associated with shorter intervals to diagnosis and treatment. Prior diagnoses of other psychiatric disorders were associated with longer intervals to diagnosis, though prior diagnoses of schizophrenia and related disorders were associated with decreased time to treatment. The CRIS tool, developed by the South London and Maudsley NHS Foundation Trust (SLaM) Biomedical Research Centre (BRC), functioned very well at C&I. It is reassuring that data from different organizations deliver similar results, and that applications developed in one Trust can then be successfully deployed in another. The information can be retrieved in a quicker and more efficient fashion than more traditional methods of health research. The findings support the secondary use of EHRs for large-scale mental health research in naturalistic samples and settings investigated across large, diverse geographical areas.
NASA Astrophysics Data System (ADS)
Armigliato, Alberto; Pagnoni, Gianluca; Zaniboni, Filippo; Tinti, Stefano
2013-04-01
TRIDEC is a EU-FP7 Project whose main goal is, in general terms, to develop suitable strategies for the management of crises possibly arising in the Earth management field. The general paradigms adopted by TRIDEC to develop those strategies include intelligent information management, the capability of managing dynamically increasing volumes and dimensionality of information in complex events, and collaborative decision making in systems that are typically very loosely coupled. The two areas where TRIDEC applies and tests its strategies are tsunami early warning and industrial subsurface development. In the field of tsunami early warning, TRIDEC aims at developing a Decision Support System (DSS) that integrates 1) a set of seismic, geodetic and marine sensors devoted to the detection and characterisation of possible tsunamigenic sources and to monitoring the time and space evolution of the generated tsunami, 2) large-volume databases of pre-computed numerical tsunami scenarios, 3) a proper overall system architecture. Two test areas are dealt with in TRIDEC: the western Iberian margin and the eastern Mediterranean. In this study, we focus on the western Iberian margin with special emphasis on the Portuguese coasts. The strategy adopted in TRIDEC plans to populate two different databases, called "Virtual Scenario Database" (VSDB) and "Matching Scenario Database" (MSDB), both of which deal only with earthquake-generated tsunamis. In the VSDB we simulate numerically few large-magnitude events generated by the major known tectonic structures in the study area. Heterogeneous slip distributions on the earthquake faults are introduced to simulate events as "realistically" as possible. The members of the VSDB represent the unknowns that the TRIDEC platform must be able to recognise and match during the early crisis management phase. On the other hand, the MSDB contains a very large number (order of thousands) of tsunami simulations performed starting from many different simple earthquake sources of different magnitudes and located in the "vicinity" of the virtual scenario earthquake. In the DSS perspective, the members of the MSDB have to be suitably combined based on the information coming from the sensor networks, and the results are used during the crisis evolution phase to forecast the degree of exposition of different coastal areas. We provide examples from both databases whose members are computed by means of the in-house software called UBO-TSUFD, implementing the non-linear shallow-water equations and solving them over a set of nested grids that guarantee a suitable spatial resolution (few tens of meters) in specific, suitably chosen, coastal areas.
Harnessing Data to Assess Equity of Care by Race, Ethnicity and Language
Gracia, Amber; Cheirif, Jorge; Veliz, Juana; Reyna, Melissa; Vecchio, Mara; Aryal, Subhash
2015-01-01
Objective: Determine any disparities in care based on race, ethnicity and language (REaL) by utilizing inpatient (IP) core measures at Texas Health Resources, a large, faith-based, non-profit health care delivery system located in a large, ethnically diverse metropolitan area in Texas. These measures, which were established by the U.S. Centers for Medicare and Medicaid Services (CMS) and The Joint Commission (TJC), help to ensure better accountability for patient outcomes throughout the U.S. health care system. Methods: Sample analysis to understand the architecture of race, ethnicity and language (REaL) variables within the Texas Health clinical database, followed by development of the logic, method and framework for isolating populations and evaluating disparities by race (non-Hispanic White, non-Hispanic Black, Native American/Native Hawaiian/Pacific Islander, Asian and Other); ethnicity (Hispanic and non-Hispanic); and preferred language (English and Spanish). The study is based on use of existing clinical data for four inpatient (IP) core measures: Acute Myocardial Infarction (AMI), Congestive Heart Failure (CHF), Pneumonia (PN) and Surgical Care (SCIP), representing 100% of the sample population. These comprise a high number of cases presenting in our acute care facilities. Findings are based on a sample of clinical data (N = 19,873 cases) for the four inpatient (IP) core measures derived from 13 of Texas Health’s wholly-owned facilities, formulating a set of baseline data. Results: Based on applied method, Texas Health facilities consistently scored high with no discernable race, ethnicity and language (REaL) disparities as evidenced by a low percentage difference to the reference point (non-Hispanic White) on IP core measures, including: AMI (0.3%–1.2%), CHF (0.7%–3.0%), PN (0.5%–3.7%), and SCIP (0–0.7%). PMID:26703665
Filipino DNA variation at 12 X-chromosome short tandem repeat markers.
Salvador, Jazelyn M; Apaga, Dame Loveliness T; Delfin, Frederick C; Calacal, Gayvelline C; Dennis, Sheila Estacio; De Ungria, Maria Corazon A
2018-06-08
Demands for solving complex kinship scenarios where only distant relatives are available for testing have risen in the past years. In these instances, other genetic markers such as X-chromosome short tandem repeat (X-STR) markers are employed to supplement autosomal and Y-chromosomal STR DNA typing. However, prior to use, the degree of STR polymorphism in the population requires evaluation through generation of an allele or haplotype frequency population database. This population database is also used for statistical evaluation of DNA typing results. Here, we report X-STR data from 143 unrelated Filipino male individuals who were genotyped via conventional polymerase chain reaction-capillary electrophoresis (PCR-CE) using the 12 X-STR loci included in the Investigator ® Argus X-12 kit (Qiagen) and via massively parallel sequencing (MPS) of seven X-STR loci included in the ForenSeq ™ DNA Signature Prep kit of the MiSeq ® FGx ™ Forensic Genomics System (Illumina). Allele calls between PCR-CE and MPS systems were consistent (100% concordance) across seven overlapping X-STRs. Allele and haplotype frequencies and other parameters of forensic interest were calculated based on length (PCR-CE, 12 X-STRs) and sequence (MPS, seven X-STRs) variations observed in the population. Results of our study indicate that the 12 X-STRs in the PCR-CE system are highly informative for the Filipino population. MPS of seven X-STR loci identified 73 X-STR alleles compared with 55 X-STR alleles that were identified solely by length via PCR-CE. Of the 73 sequence-based alleles observed, six alleles have not been reported in the literature. The population data presented here may serve as a reference Philippine frequency database of X-STRs for forensic casework applications. Copyright © 2018 Elsevier B.V. All rights reserved.
Method for a dummy CD mirror server based on NAS
NASA Astrophysics Data System (ADS)
Tang, Muna; Pei, Jing
2002-09-01
With the development of computer network, information sharing is becoming the necessity in human life. The rapid development of CD-ROM and CD-ROM driver techniques makes it possible to issue large database online. After comparing many designs of dummy CD mirror database, which are the embodiment of a main product in CD-ROM database now and in near future, we proposed and realized a new PC based scheme. Our system has the following merits, such as, supporting all kinds of CD format; supporting many network protocol; the independence of mirror network server and the main server; low price, super large capacity, without the need of any special hardware. Preliminarily experiments have verified the validity of the proposed scheme. Encouraged by the promising application future, we are now preparing to put it into market. This paper discusses the design and implement of the CD-ROM server detailedly.
Medication safety research by observational study design.
Lao, Kim S J; Chui, Celine S L; Man, Kenneth K C; Lau, Wallis C Y; Chan, Esther W; Wong, Ian C K
2016-06-01
Observational studies have been recognised to be essential for investigating the safety profile of medications. Numerous observational studies have been conducted on the platform of large population databases, which provide adequate sample size and follow-up length to detect infrequent and/or delayed clinical outcomes. Cohort and case-control are well-accepted traditional methodologies for hypothesis testing, while within-individual study designs are developing and evolving, addressing previous known methodological limitations to reduce confounding and bias. Respective examples of observational studies of different study designs using medical databases are shown. Methodology characteristics, study assumptions, strengths and weaknesses of each method are discussed in this review.
Tatem, Andrew J; Guerra, Carlos A; Kabaria, Caroline W; Noor, Abdisalan M; Hay, Simon I
2008-10-27
The efficient allocation of financial resources for malaria control and the optimal distribution of appropriate interventions require accurate information on the geographic distribution of malaria risk and of the human populations it affects. Low population densities in rural areas and high population densities in urban areas can influence malaria transmission substantially. Here, the Malaria Atlas Project (MAP) global database of Plasmodium falciparum parasite rate (PfPR) surveys, medical intelligence and contemporary population surfaces are utilized to explore these relationships and other issues involved in combining malaria risk maps with those of human population distribution in order to define populations at risk more accurately. First, an existing population surface was examined to determine if it was sufficiently detailed to be used reliably as a mask to identify areas of very low and very high population density as malaria free regions. Second, the potential of international travel and health guidelines (ITHGs) for identifying malaria free cities was examined. Third, the differences in PfPR values between surveys conducted in author-defined rural and urban areas were examined. Fourth, the ability of various global urban extent maps to reliably discriminate these author-based classifications of urban and rural in the PfPR database was investigated. Finally, the urban map that most accurately replicated the author-based classifications was analysed to examine the effects of urban classifications on PfPR values across the entire MAP database. Masks of zero population density excluded many non-zero PfPR surveys, indicating that the population surface was not detailed enough to define areas of zero transmission resulting from low population densities. In contrast, the ITHGs enabled the identification and mapping of 53 malaria free urban areas within endemic countries. Comparison of PfPR survey results showed significant differences between author-defined 'urban' and 'rural' designations in Africa, but not for the remainder of the malaria endemic world. The Global Rural Urban Mapping Project (GRUMP) urban extent mask proved most accurate for mapping these author-defined rural and urban locations, and further sub-divisions of urban extents into urban and peri-urban classes enabled the effects of high population densities on malaria transmission to be mapped and quantified. The availability of detailed, contemporary census and urban extent data for the construction of coherent and accurate global spatial population databases is often poor. These known sources of uncertainty in population surfaces and urban maps have the potential to be incorporated into future malaria burden estimates. Currently, insufficient spatial information exists globally to identify areas accurately where population density is low enough to impact upon transmission. Medical intelligence does however exist to reliably identify malaria free cities. Moreover, in Africa, urban areas that have a significant effect on malaria transmission can be mapped.
Kingfisher: a system for remote sensing image database management
NASA Astrophysics Data System (ADS)
Bruzzo, Michele; Giordano, Ferdinando; Dellepiane, Silvana G.
2003-04-01
At present retrieval methods in remote sensing image database are mainly based on spatial-temporal information. The increasing amount of images to be collected by the ground station of earth observing systems emphasizes the need for database management with intelligent data retrieval capabilities. The purpose of the proposed method is to realize a new content based retrieval system for remote sensing images database with an innovative search tool based on image similarity. This methodology is quite innovative for this application, at present many systems exist for photographic images, as for example QBIC and IKONA, but they are not able to extract and describe properly remote image content. The target database is set by an archive of images originated from an X-SAR sensor (spaceborne mission, 1994). The best content descriptors, mainly texture parameters, guarantees high retrieval performances and can be extracted without losses independently of image resolution. The latter property allows DBMS (Database Management System) to process low amount of information, as in the case of quick-look images, improving time performance and memory access without reducing retrieval accuracy. The matching technique has been designed to enable image management (database population and retrieval) independently of dimensions (width and height). Local and global content descriptors are compared, during retrieval phase, with the query image and results seem to be very encouraging.
Hirano, Yoko; Asami, Yuko; Kuribayashi, Kazuhiko; Kitazaki, Shigeru; Yamamoto, Yuji; Fujimoto, Yoko
2018-05-01
Many pharmacoepidemiologic studies using large-scale databases have recently been utilized to evaluate the safety and effectiveness of drugs in Western countries. In Japan, however, conventional methodology has been applied to postmarketing surveillance (PMS) to collect safety and effectiveness information on new drugs to meet regulatory requirements. Conventional PMS entails enormous costs and resources despite being an uncontrolled observational study method. This study is aimed at examining the possibility of database research as a more efficient pharmacovigilance approach by comparing a health care claims database and PMS with regard to the characteristics and safety profiles of sertraline-prescribed patients. The characteristics of sertraline-prescribed patients recorded in a large-scale Japanese health insurance claims database developed by MinaCare Co. Ltd. were scanned and compared with the PMS results. We also explored the possibility of detecting signals indicative of adverse reactions based on the claims database by using sequence symmetry analysis. Diabetes mellitus, hyperlipidemia, and hyperthyroidism served as exploratory events, and their detection criteria for the claims database were reported by the Pharmaceuticals and Medical Devices Agency in Japan. Most of the characteristics of sertraline-prescribed patients in the claims database did not differ markedly from those in the PMS. There was no tendency for higher risks of the exploratory events after exposure to sertraline, and this was consistent with sertraline's known safety profile. Our results support the concept of using database research as a cost-effective pharmacovigilance tool that is free of selection bias . Further investigation using database research is required to confirm our preliminary observations. Copyright © 2018. Published by Elsevier Inc.
Fernández, José M; Valencia, Alfonso
2004-10-12
Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
Performance analysis of different database in new internet mapping system
NASA Astrophysics Data System (ADS)
Yao, Xing; Su, Wei; Gao, Shuai
2017-03-01
In the Mapping System of New Internet, Massive mapping entries between AID and RID need to be stored, added, updated, and deleted. In order to better deal with the problem when facing a large number of mapping entries update and query request, the Mapping System of New Internet must use high-performance database. In this paper, we focus on the performance of Redis, SQLite, and MySQL these three typical databases, and the results show that the Mapping System based on different databases can adapt to different needs according to the actual situation.
Data-based Non-Markovian Model Inference
NASA Astrophysics Data System (ADS)
Ghil, Michael
2015-04-01
This talk concentrates on obtaining stable and efficient data-based models for simulation and prediction in the geosciences and life sciences. The proposed model derivation relies on using a multivariate time series of partial observations from a large-dimensional system, and the resulting low-order models are compared with the optimal closures predicted by the non-Markovian Mori-Zwanzig formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a very broad generalization and a time-continuous limit of existing multilevel, regression-based approaches to data-based closure, in particular of empirical model reduction (EMR). We show that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the Mori-Zwanzig formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are given for the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a very broad class of MSM applications. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. The resulting reduced model with energy-conserving nonlinearities captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lokta-Volterra model of population dynamics in its chaotic regime. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up. This work is based on a close collaboration with M.D. Chekroun, D. Kondrashov, S. Kravtsov and A.W. Robertson.
Pereira, R; Alves, C; Aler, M; Amorim, A; Arévalo, C; Betancor, E; Braganholi, D; Bravo, M L; Brito, P; Builes, J J; Burgos, G; Carvalho, E F; Castillo, A; Catanesi, C I; Cicarelli, R M B; Coufalova, P; Dario, P; D'Amato, M E; Davison, S; Ferragut, J; Fondevila, M; Furfuro, S; García, O; Gaviria, A; Gomes, I; González, E; Gonzalez-Liñan, A; Gross, T E; Hernández, A; Huang, Q; Jiménez, S; Jobim, L F; López-Parra, A M; Marino, M; Marques, S; Martínez-Cortés, G; Masciovecchio, V; Parra, D; Penacino, G; Pinheiro, M F; Porto, M J; Posada, Y; Restrepo, C; Ribeiro, T; Rubio, L; Sala, A; Santurtún, A; Solís, L S; Souto, L; Streitemberger, E; Torres, A; Vilela-Lamego, C; Yunis, J J; Yurrebaso, I; Gusmão, L
2018-01-01
A collaborative effort was carried out by the Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) to promote knowledge exchange between associate laboratories interested in the implementation of indel-based methodologies and build allele frequency databases of 38 indels for forensic applications. These databases include populations from different countries that are relevant for identification and kinship investigations undertaken by the participating laboratories. Before compiling population data, participants were asked to type the 38 indels in blind samples from annual GHEP-ISFG proficiency tests, using an amplification protocol previously described. Only laboratories that reported correct results contributed with population data to this study. A total of 5839 samples were genotyped from 45 different populations from Africa, America, East Asia, Europe and Middle East. Population differentiation analysis showed significant differences between most populations studied from Africa and America, as well as between two Asian populations from China and East Timor. Low F ST values were detected among most European populations. Overall diversities and parameters of forensic efficiency were high in populations from all continents. Copyright © 2017 Elsevier B.V. All rights reserved.
A mathematical model of neuro-fuzzy approximation in image classification
NASA Astrophysics Data System (ADS)
Gopalan, Sasi; Pinto, Linu; Sheela, C.; Arun Kumar M., N.
2016-06-01
Image digitization and explosion of World Wide Web has made traditional search for image, an inefficient method for retrieval of required grassland image data from large database. For a given input query image Content-Based Image Retrieval (CBIR) system retrieves the similar images from a large database. Advances in technology has increased the use of grassland image data in diverse areas such has agriculture, art galleries, education, industry etc. In all the above mentioned diverse areas it is necessary to retrieve grassland image data efficiently from a large database to perform an assigned task and to make a suitable decision. A CBIR system based on grassland image properties and it uses the aid of a feed-forward back propagation neural network for an effective image retrieval is proposed in this paper. Fuzzy Memberships plays an important role in the input space of the proposed system which leads to a combined neural fuzzy approximation in image classification. The CBIR system with mathematical model in the proposed work gives more clarity about fuzzy-neuro approximation and the convergence of the image features in a grassland image.
ComVisMD - compact visualization of multidimensional data: experimenting with cricket players data
NASA Astrophysics Data System (ADS)
Dandin, Shridhar B.; Ducassé, Mireille
2018-03-01
Database information is multidimensional and often displayed in tabular format (row/column display). Presented in aggregated form, multidimensional data can be used to analyze the records or objects. Online Analytical database Processing (OLAP) proposes mechanisms to display multidimensional data in aggregated forms. A choropleth map is a thematic map in which areas are colored in proportion to the measurement of a statistical variable being displayed, such as population density. They are used mostly for compact graphical representation of geographical information. We propose a system, ComVisMD inspired by choropleth map and the OLAP cube to visualize multidimensional data in a compact way. ComVisMD displays multidimensional data like OLAP Cube, where we are mapping an attribute a (first dimension, e.g. year started playing cricket) in vertical direction, object coloring based on b (second dimension, e.g. batting average), mapping varying-size circles based on attribute c (third dimension, e.g. highest score), mapping numbers based on attribute d (fourth dimension, e.g. matches played). We illustrate our approach on cricket players data, namely on two tables Country and Player. They have a large number of rows and columns: 246 rows and 17 columns for players of one country. ComVisMD’s visualization reduces the size of the tabular display by a factor of about 4, allowing users to grasp more information at a time than the bare table display.
An ab initio electronic transport database for inorganic materials.
Ricci, Francesco; Chen, Wei; Aydemir, Umut; Snyder, G Jeffrey; Rignanese, Gian-Marco; Jain, Anubhav; Hautier, Geoffroy
2017-07-04
Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material's band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. Our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
Architectural Implications for Spatial Object Association Algorithms*
Kumar, Vijay S.; Kurc, Tahsin; Saltz, Joel; Abdulla, Ghaleb; Kohn, Scott R.; Matarazzo, Celeste
2013-01-01
Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server®, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST). PMID:25692244
Cortesi, Paolo A; Assietti, Roberto; Cuzzocrea, Fabrizio; Prestamburgo, Domenico; Pluderi, Mauro; Cozzolino, Paolo; Tito, Patrizia; Vanelli, Roberto; Cecconi, Davide; Borsa, Stefano; Cesana, Giancarlo; Mantovani, Lorenzo G
2017-09-15
Retrospective large population based-study. Assessment of the epidemiologic trends and economic burden of first spinal fusions. No adequate data are available regarding the epidemiology of spinal fusion surgery and its economic impact in Europe. The study population was identified through a data warehouse (DENALI), which matches clinical and economic data of different Healthcare Administrative databases of the Italian Lombardy Region. The study population consisted of all subjects, resident in Lombardy, who, during the period January 2001 to December 2010, underwent spinal fusion surgery (ICD-9-CM codes: 81.04, 81.05, 81.06, 81.07, and 81.08). The first procedure was used as the index event. We estimated the incidence of first spinal fusion surgery, the population and surgery characteristics and the healthcare costs from the National Health Service's perspective. The analysis was performed for the entire population and divided into the main groups of diagnosis. The analysis identified 17,772 [mean age (SD): 54.6 (14.5) years, 55.3% females] spinal fusion surgeries. Almost 67% of the patients suffered from a lumbar degenerative disease. The incidence rate of interventions increased from 11.5 to 18.5 per 100,000 person-year between 2001 and 2006, and was above 20.0 per 100,000 person-year in the last 4 years. The patients' mean age increased during the observational time period from 48.1 to 55.9 years; whereas the median hospital length of stay reported for the index event decreased. The average cost of the spinal fusion surgery increased during the observational period, from &OV0556; 4726 up to &OV0556; 9388. The study showed an increasing incidence of spinal fusion surgery and costs from 2001 to 2010. These results can be used to better understand the epidemiological and economic burden of these interventions, and help to optimize the resources available considering the different clinical approaches accessible today. 4.
Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples
Macpherson, J. Michael; Eriksson, Nick; Saxonov, Serge; Pe'er, Itsik; Mountain, Joanna L.
2012-01-01
Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogenous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ populations samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies. PMID:22509285
2013-01-01
Background Large-scale pharmaco-epidemiological studies of Chinese herbal medicine (CHM) for treatment of urticaria are few, even though clinical trials showed some CHM are effective. The purpose of this study was to explore the frequencies and patterns of CHM prescriptions for urticaria by analysing the population-based CHM database in Taiwan. Methods This study was linked to and processed through the complete traditional CHM database of the National Health Insurance Research Database in Taiwan during 2009. We calculated the frequencies and patterns of CHM prescriptions used for treatment of urticaria, of which the diagnosis was defined as the single ICD-9 Code of 708. Frequent itemset mining, as applied to data mining, was used to analyse co-prescription of CHM for patients with urticaria. Results There were 37,386 subjects who visited traditional Chinese Medicine clinics for urticaria in Taiwan during 2009 and received a total of 95,765 CHM prescriptions. Subjects between 18 and 35 years of age comprised the largest number of those treated (32.76%). In addition, women used CHM for urticaria more frequently than men (female:male = 1.94:1). There was an average of 5.54 items prescribed in the form of either individual Chinese herbs or a formula in a single CHM prescription for urticaria. Bai-Xian-Pi (Dictamnus dasycarpus Turcz) was the most commonly prescribed single Chinese herb while Xiao-Feng San was the most commonly prescribed Chinese herbal formula. The most commonly prescribed CHM drug combination was Xiao-Feng San plus Bai-Xian-Pi while the most commonly prescribed triple drug combination was Xiao-Feng San, Bai-Xian-Pi, and Di-Fu Zi (Kochia scoparia). Conclusions In view of the popularity of CHM such as Xiao-Feng San prescribed for the wind-heat pattern of urticaria in this study, a large-scale, randomized clinical trial is warranted to research their efficacy and safety. PMID:23947955
Chien, Pei-Shan; Tseng, Yu-Fang; Hsu, Yao-Chin; Lai, Yu-Kai; Weng, Shih-Feng
2013-08-15
Large-scale pharmaco-epidemiological studies of Chinese herbal medicine (CHM) for treatment of urticaria are few, even though clinical trials showed some CHM are effective. The purpose of this study was to explore the frequencies and patterns of CHM prescriptions for urticaria by analysing the population-based CHM database in Taiwan. This study was linked to and processed through the complete traditional CHM database of the National Health Insurance Research Database in Taiwan during 2009. We calculated the frequencies and patterns of CHM prescriptions used for treatment of urticaria, of which the diagnosis was defined as the single ICD-9 Code of 708. Frequent itemset mining, as applied to data mining, was used to analyse co-prescription of CHM for patients with urticaria. There were 37,386 subjects who visited traditional Chinese Medicine clinics for urticaria in Taiwan during 2009 and received a total of 95,765 CHM prescriptions. Subjects between 18 and 35 years of age comprised the largest number of those treated (32.76%). In addition, women used CHM for urticaria more frequently than men (female:male = 1.94:1). There was an average of 5.54 items prescribed in the form of either individual Chinese herbs or a formula in a single CHM prescription for urticaria. Bai-Xian-Pi (Dictamnus dasycarpus Turcz) was the most commonly prescribed single Chinese herb while Xiao-Feng San was the most commonly prescribed Chinese herbal formula. The most commonly prescribed CHM drug combination was Xiao-Feng San plus Bai-Xian-Pi while the most commonly prescribed triple drug combination was Xiao-Feng San, Bai-Xian-Pi, and Di-Fu Zi (Kochia scoparia). In view of the popularity of CHM such as Xiao-Feng San prescribed for the wind-heat pattern of urticaria in this study, a large-scale, randomized clinical trial is warranted to research their efficacy and safety.
Yoshikawa, Munemitsu; Yamashiro, Kenji; Miyake, Masahiro; Oishi, Maho; Akagi-Kurashige, Yumiko; Kumagai, Kyoko; Nakata, Isao; Nakanishi, Hideo; Oishi, Akio; Gotoh, Norimoto; Yamada, Ryo; Matsuda, Fumihiko; Yoshimura, Nagahisa
2014-10-21
We investigated the association between refractive error in a Japanese population and myopia-related genes identified in two recent large-scale genome-wide association studies. Single-nucleotide polymorphisms (SNPs) in 51 genes that were reported by the Consortium for Refractive Error and Myopia and/or the 23andMe database were genotyped in 3712 healthy Japanese volunteers from the Nagahama Study using HumanHap610K Quad, HumanOmni2.5M, and/or HumanExome Arrays. To evaluate the association between refractive error and recently identified myopia-related genes, we used three approaches to perform quantitative trait locus analyses of mean refractive error in both eyes of the participants: per-SNP, gene-based top-SNP, and gene-based all-SNP analyses. Association plots of successfully replicated genes also were investigated. In our per-SNP analysis, eight myopia gene associations were replicated successfully: GJD2, RASGRF1, BICC1, KCNQ5, CD55, CYP26A1, LRRC4C, and B4GALNT2.Seven additional gene associations were replicated in our gene-based analyses: GRIA4, BMP2, QKI, BMP4, SFRP1, SH3GL2, and EHBP1L1. The signal strength of the reported SNPs and their tagging SNPs increased after considering different linkage disequilibrium patterns across ethnicities. Although two previous studies suggested strong associations between PRSS56, LAMA2, TOX, and RDH5 and myopia, we could not replicate these results. Our results confirmed the significance of the myopia-related genes reported previously and suggested that gene-based replication analyses are more effective than per-SNP analyses. Our comparison with two previous studies suggested that BMP3 SNPs cause myopia primarily in Caucasian populations, while they may exhibit protective effects in Asian populations. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Designing an Integrated System of Databases: A Workstation for Information Seekers.
ERIC Educational Resources Information Center
Micco, Mary; Smith, Irma
1987-01-01
Proposes a framework for the design of a full function workstation for information retrieval based on study of information seeking behavior. A large amount of local storage of the CD-ROM jukebox variety and full networking capability to both local and external databases are identified as requirements of the prototype. (MES)
Multiple Object Retrieval in Image Databases Using Hierarchical Segmentation Tree
ERIC Educational Resources Information Center
Chen, Wei-Bang
2012-01-01
The purpose of this research is to develop a new visual information analysis, representation, and retrieval framework for automatic discovery of salient objects of user's interest in large-scale image databases. In particular, this dissertation describes a content-based image retrieval framework which supports multiple-object retrieval. The…
Georgitsi, Marianthi; Viennas, Emmanouil; Gkantouna, Vassiliki; Christodoulopoulou, Elena; Zagoriti, Zoi; Tafrali, Christina; Ntellos, Fotios; Giannakopoulou, Olga; Boulakou, Athanassia; Vlahopoulou, Panagiota; Kyriacou, Eva; Tsaknakis, John; Tsakalidis, Athanassios; Poulas, Konstantinos; Tzimas, Giannis; Patrinos, George P
2011-01-01
Population and ethnic group-specific allele frequencies of pharmacogenomic markers are poorly documented and not systematically collected in structured data repositories. We developed the Frequency of Inherited Disorders Pharmacogenomics database (FINDbase-PGx), a separate module of the FINDbase, aiming to systematically document pharmacogenomic allele frequencies in various populations and ethnic groups worldwide. We critically collected and curated 214 scientific articles reporting pharmacogenomic markers allele frequencies in various populations and ethnic groups worldwide. Subsequently, in order to host the curated data, support data visualization and data mining, we developed a website application, utilizing Microsoft™ PivotViewer software. Curated allelic frequency data pertaining to 144 pharmacogenomic markers across 14 genes, representing approximately 87,000 individuals from 150 populations worldwide, are currently included in FINDbase-PGx. A user-friendly query interface allows for easy data querying, based on numerous content criteria, such as population, ethnic group, geographical region, gene, drug and rare allele frequency. FINDbase-PGx is a comprehensive database, which, unlike other pharmacogenomic knowledgebases, fulfills the much needed requirement to systematically document pharmacogenomic allelic frequencies in various populations and ethnic groups worldwide.
Clement, Fiona; Zimmer, Scott; Dixon, Elijah; Ball, Chad G.; Heitman, Steven J.; Swain, Mark; Ghosh, Subrata
2016-01-01
Importance At the turn of the 21st century, studies evaluating the change in incidence of appendicitis over time have reported inconsistent findings. Objectives We compared the differences in the incidence of appendicitis derived from a pathology registry versus an administrative database in order to validate coding in administrative databases and establish temporal trends in the incidence of appendicitis. Design We conducted a population-based comparative cohort study to identify all individuals with appendicitis from 2000 to2008. Setting & Participants Two population-based data sources were used to identify cases of appendicitis: 1) a pathology registry (n = 8,822); and 2) a hospital discharge abstract database (n = 10,453). Intervention & Main Outcome The administrative database was compared to the pathology registry for the following a priori analyses: 1) to calculate the positive predictive value (PPV) of administrative codes; 2) to compare the annual incidence of appendicitis; and 3) to assess differences in temporal trends. Temporal trends were assessed using a generalized linear model that assumed a Poisson distribution and reported as an annual percent change (APC) with 95% confidence intervals (CI). Analyses were stratified by perforated and non-perforated appendicitis. Results The administrative database (PPV = 83.0%) overestimated the incidence of appendicitis (100.3 per 100,000) when compared to the pathology registry (84.2 per 100,000). Codes for perforated appendicitis were not reliable (PPV = 52.4%) leading to overestimation in the incidence of perforated appendicitis in the administrative database (34.8 per 100,000) as compared to the pathology registry (19.4 per 100,000). The incidence of appendicitis significantly increased over time in both the administrative database (APC = 2.1%; 95% CI: 1.3, 2.8) and pathology registry (APC = 4.1; 95% CI: 3.1, 5.0). Conclusion & Relevance The administrative database overestimated the incidence of appendicitis, particularly among perforated appendicitis. Therefore, studies utilizing administrative data to analyze perforated appendicitis should be interpreted cautiously. PMID:27820826
Abou El Hassan, Mohamed; Stoianov, Alexandra; Araújo, Petra A T; Sadeghieh, Tara; Chan, Man Khun; Chen, Yunqi; Randell, Edward; Nieuwesteeg, Michelle; Adeli, Khosrow
2015-11-01
The CALIPER program has established a comprehensive database of pediatric reference intervals using largely the Abbott ARCHITECT biochemical assays. To expand clinical application of CALIPER reference standards, the present study is aimed at transferring CALIPER reference intervals from the Abbott ARCHITECT to Beckman Coulter AU assays. Transference of CALIPER reference intervals was performed based on the CLSI guidelines C28-A3 and EP9-A2. The new reference intervals were directly verified using up to 100 reference samples from the healthy CALIPER cohort. We found a strong correlation between Abbott ARCHITECT and Beckman Coulter AU biochemical assays, allowing the transference of the vast majority (94%; 30 out of 32 assays) of CALIPER reference intervals previously established using Abbott assays. Transferred reference intervals were, in general, similar to previously published CALIPER reference intervals, with some exceptions. Most of the transferred reference intervals were sex-specific and were verified using healthy reference samples from the CALIPER biobank based on CLSI criteria. It is important to note that the comparisons performed between the Abbott and Beckman Coulter assays make no assumptions as to assay accuracy or which system is more correct/accurate. The majority of CALIPER reference intervals were transferrable to Beckman Coulter AU assays, allowing the establishment of a new database of pediatric reference intervals. This further expands the utility of the CALIPER database to clinical laboratories using the AU assays; however, each laboratory should validate these intervals for their analytical platform and local population as recommended by the CLSI. Copyright © 2015 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
Wawrzyniak, Zbigniew M; Paczesny, Daniel; Mańczuk, Marta; Zatoński, Witold A
2011-01-01
Large-scale epidemiologic studies can assess health indicators differentiating social groups and important health outcomes of the incidence and mortality of cancer, cardiovascular disease, and others, to establish a solid knowledgebase for the prevention management of premature morbidity and mortality causes. This study presents new advanced methods of data collection and data management systems with current data quality control and security to ensure high quality data assessment of health indicators in the large epidemiologic PONS study (The Polish-Norwegian Study). The material for experiment is the data management design of the large-scale population study in Poland (PONS) and the managed processes are applied into establishing a high quality and solid knowledge. The functional requirements of the PONS study data collection, supported by the advanced IT web-based methods, resulted in medical data of a high quality, data security, with quality data assessment, control process and evolution monitoring are fulfilled and shared by the IT system. Data from disparate and deployed sources of information are integrated into databases via software interfaces, and archived by a multi task secure server. The practical and implemented solution of modern advanced database technologies and remote software/hardware structure successfully supports the research of the big PONS study project. Development and implementation of follow-up control of the consistency and quality of data analysis and the processes of the PONS sub-databases have excellent measurement properties of data consistency of more than 99%. The project itself, by tailored hardware/software application, shows the positive impact of Quality Assurance (QA) on the quality of outcomes analysis results, effective data management within a shorter time. This efficiency ensures the quality of the epidemiological data and indicators of health by the elimination of common errors of research questionnaires and medical measurements.
Muscatello, David J.; Amin, Janaki; MacIntyre, C. Raina; Newall, Anthony T.; Rawlinson, William D.; Sintchenko, Vitali; Gilmour, Robin; Thackway, Sarah
2014-01-01
Background Historically, counting influenza recorded in administrative health outcome databases has been considered insufficient to estimate influenza attributable morbidity and mortality in populations. We used database record linkage to evaluate whether modern databases have similar limitations. Methods Person-level records were linked across databases of laboratory notified influenza, emergency department (ED) presentations, hospital admissions and death registrations, from the population (∼6.9 million) of New South Wales (NSW), Australia, 2005 to 2008. Results There were 2568 virologically diagnosed influenza infections notified. Among those, 25% of 40 who died, 49% of 1451 with a hospital admission and 7% of 1742 with an ED presentation had influenza recorded on the respective database record. Compared with persons aged ≥65 years and residents of regional and remote areas, respectively, children and residents of major cities were more likely to have influenza coded on their admission record. Compared with older persons and admitted patients, respectively, working age persons and non-admitted persons were more likely to have influenza coded on their ED record. On both ED and admission records, persons with influenza type A infection were more likely than those with type B infection to have influenza coded. Among death registrations, hospital admissions and ED presentations with influenza recorded as a cause of illness, 15%, 28% and 1.4%, respectively, also had laboratory notified influenza. Time trends in counts of influenza recorded on the ED, admission and death databases reflected the trend in counts of virologically diagnosed influenza. Conclusions A minority of the death, hospital admission and ED records for persons with a virologically diagnosed influenza infection identified influenza as a cause of illness. Few database records with influenza recorded as a cause had laboratory confirmation. The databases have limited value for estimating incidence of influenza outcomes, but can be used for monitoring variation in incidence over time. PMID:24875306
Wennberg, David E; Sharp, Sandra M; Bevan, Gwyn; Skinner, Jonathan S; Gottlieb, Daniel J; Wennberg, John E
2014-04-10
To compare the performance of two new approaches to risk adjustment that are free of the influence of observational intensity with methods that depend on diagnoses listed in administrative databases. Administrative data from the US Medicare program for services provided in 2007 among 306 US hospital referral regions. Cross sectional analysis. 20% sample of fee for service Medicare beneficiaries residing in one of 306 hospital referral regions in the United States in 2007 (n = 5,153,877). The effect of health risk adjustment on age, sex, and race adjusted mortality and spending rates among hospital referral regions using four indices: the standard Centers for Medicare and Medicaid Services--Hierarchical Condition Categories (HCC) index used by the US Medicare program (calculated from diagnoses listed in Medicare's administrative database); a visit corrected HCC index (to reduce the effects of observational intensity on frequency of diagnoses); a poverty index (based on US census); and a population health index (calculated using data on incidence of hip fractures and strokes, and responses from a population based annual survey of health from the Centers for Disease Control and Prevention). Estimated variation in age, sex, and race adjusted mortality rates across hospital referral regions was reduced using the indices based on population health, poverty, and visit corrected HCC, but increased using the standard HCC index. Most of the residual variation in age, sex, and race adjusted mortality was explained (in terms of weighted R2) by the population health index: R2=0.65. The other indices explained less: R2=0.20 for the visit corrected HCC index; 0.19 for the poverty index, and 0.02 for the standard HCC index. The residual variation in age, sex, race, and price adjusted spending per capita across the 306 hospital referral regions explained by the indices (in terms of weighted R2) were 0.50 for the standard HCC index, 0.21 for the population health index, 0.12 for the poverty index, and 0.07 for the visit corrected HCC index, implying that only a modest amount of the variation in spending can be explained by factors most closely related to mortality. Further, once the HCC index is visit corrected it accounts for almost none of the residual variation in age, sex, and race adjusted spending. Health risk adjustment using either the poverty index or the population health index performed substantially better in terms of explaining actual mortality than the indices that relied on diagnoses from administrative databases; the population health index explained the majority of residual variation in age, sex, and race adjusted mortality. Owing to the influence of observational intensity on diagnoses from administrative databases, the standard HCC index over-adjusts for regional differences in spending. Research to improve health risk adjustment methods should focus on developing measures of risk that do not depend on observation influenced diagnoses recorded in administrative databases.
Cross-Matching Source Observations from the Palomar Transient Factory (PTF)
NASA Astrophysics Data System (ADS)
Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.
2009-01-01
Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNOA star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database. We programmed the methods as database functions in PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables. The Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table, which associates cross-matched primary keys in the Sources table with primary keys in the MergedSoures table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
Compressing DNA sequence databases with coil.
White, W Timothy J; Hendy, Michael D
2008-05-20
Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
Compressing DNA sequence databases with coil
White, W Timothy J; Hendy, Michael D
2008-01-01
Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
Image-based query-by-example for big databases of galaxy images
NASA Astrophysics Data System (ADS)
Shamir, Lior; Kuminski, Evan
2017-01-01
Very large astronomical databases containing millions or even billions of galaxy images have been becoming increasingly important tools in astronomy research. However, in many cases the very large size makes it more difficult to analyze these data manually, reinforcing the need for computer algorithms that can automate the data analysis process. An example of such task is the identification of galaxies of a certain morphology of interest. For instance, if a rare galaxy is identified it is reasonable to expect that more galaxies of similar morphology exist in the database, but it is virtually impossible to manually search these databases to identify such galaxies. Here we describe computer vision and pattern recognition methodology that receives a galaxy image as an input, and searches automatically a large dataset of galaxies to return a list of galaxies that are visually similar to the query galaxy. The returned list is not necessarily complete or clean, but it provides a substantial reduction of the original database into a smaller dataset, in which the frequency of objects visually similar to the query galaxy is much higher. Experimental results show that the algorithm can identify rare galaxies such as ring galaxies among datasets of 10,000 astronomical objects.
CLAST: CUDA implemented large-scale alignment search tool.
Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken
2014-12-11
Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation.
Dudek, Christian-Alexander; Dannheim, Henning; Schomburg, Dietmar
2017-01-01
The prediction of gene functions is crucial for a large number of different life science areas. Faster high throughput sequencing techniques generate more and larger datasets. The manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable for larger datasets in an acceptable timescale. Primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as reliable source for function prediction of enzymes observed on protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and SwissProt. This allows us to restrict the selection of Swiss-Prot entries, without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance to match more sequences, without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as download. The database can be downloaded and used with the BrEPScmd command line tool for large scale sequence analysis. The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de.
BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation
Schomburg, Dietmar
2017-01-01
The prediction of gene functions is crucial for a large number of different life science areas. Faster high throughput sequencing techniques generate more and larger datasets. The manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable for larger datasets in an acceptable timescale. Primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as reliable source for function prediction of enzymes observed on protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and SwissProt. This allows us to restrict the selection of Swiss-Prot entries, without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance to match more sequences, without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as download. The database can be downloaded and used with the BrEPScmd command line tool for large scale sequence analysis. The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de. PMID:28750104
Data-Mining Techniques in Detecting Factors Linked to Academic Achievement
ERIC Educational Resources Information Center
Martínez Abad, Fernando; Chaparro Caso López, Alicia A.
2017-01-01
In light of the emergence of statistical analysis techniques based on data mining in education sciences, and the potential they offer to detect non-trivial information in large databases, this paper presents a procedure used to detect factors linked to academic achievement in large-scale assessments. The study is based on a non-experimental,…
DEXTER: Disease-Expression Relation Extraction from Text.
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
2018-01-01
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract information on differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNA in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress.Database URL: http://biotm.cis.udel.edu/DEXTER.
Infant feeding practices within a large electronic medical record database.
Bartsch, Emily; Park, Alison L; Young, Jacqueline; Ray, Joel G; Tu, Karen
2018-01-02
The emerging adoption of the electronic medical record (EMR) in primary care enables clinicians and researchers to efficiently examine epidemiological trends in child health, including infant feeding practices. We completed a population-based retrospective cohort study of 8815 singleton infants born at term in Ontario, Canada, April 2002 to March 2013. Newborn records were linked to the Electronic Medical Record Administrative data Linked Database (EMRALD™), which uses patient-level information from participating family practice EMRs across Ontario. We assessed exclusive breastfeeding patterns using an automated electronic search algorithm, with manual review of EMRs when the latter was not possible. We examined the rate of breastfeeding at visits corresponding to 2, 4 and 6 months of age, as well as sociodemographic factors associated with exclusive breastfeeding. Of the 8815 newborns, 1044 (11.8%) lacked breastfeeding information in their EMR. Rates of exclusive breastfeeding were 39.5% at 2 months, 32.4% at 4 months and 25.1% at 6 months. At age 6 months, exclusive breastfeeding rates were highest among mothers aged ≥40 vs. < 20 years (rate ratio [RR] 2.45, 95% confidence interval [CI] 1.62-3.68), urban vs. rural residence (RR 1.35, 95% CI 1.22-1.50), and highest vs. lowest income quintile (RR 1.18, 95% CI 1.02-1.36). Overall, immigrants had similar rates of exclusive breastfeeding as non-immigrants; yet, by age 6 months, among those residing in the lowest income quintile, immigrants were more likely to exclusively breastfeed than their non-immigrant counterparts (RR 1.43, 95% CI 1.12-1.83). We efficiently determined rates and factors associated with exclusive breastfeeding using data from a large EMR database.
Treatment Trends and Outcomes of Small-Cell Carcinoma of the Bladder
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koay, Eugene J.; MD Anderson Cancer Center, Houston, Texas; Teh, Bin S., E-mail: bteh@tmh.org
2012-05-01
Purpose: Treatment for small-cell carcinoma of the bladder is largely guided by case reports, retrospective reviews, and small prospective trials. This study aimed to study outcomes using a large population-based database. Methods: The Surveillance, Epidemiology, and End Results-Medicare database (1991-2005) was used to analyze how different treatment combinations of specific bladder surgeries, chemotherapy, and radiation affected patient outcomes. Trends in the use of these combinations over time were also analyzed. Results: A total of 533 patients were retrieved from the database. A bladder-sparing approach involving transurethral resection of the bladder tumor (TURBT) combined with chemotherapy and radiation yielded no significantmore » difference in overall survival compared with patients undergoing at least a cystectomy (of whom over 90% received radical cystectomy) with chemotherapy (p > 0.05). The analysis of treatment trends indicated that these two general strategies for cure combined to account for fewer than 20% of patients. A majority of patients (54%) received TURBT as their only surgical treatment, and a subset analysis of these patients indicated that chemotherapy played a role in all stages of disease (p < 0.05) whereas radiation improved overall survival in regional-stage disease (p < 0.05). Conclusion: Relatively few patients with small-cell carcinoma of the bladder receive potentially curative therapies. Chemotherapy should be a major component of treatment. Cystectomy and bladder-sparing approaches represent two viable strategies and deserve further investigation to identify the patients who may benefit from organ preservation or not. In addition, the role of radiation in regional-stage disease should be investigated further, because it positively affects survival after TURBT.« less
Xu, X G; He, J; He, Y M; Tao, S D; Ying, Y L; Zhu, F M; Lv, H J; Yan, L X
2011-04-01
The Diego blood group system plays an important role in transfusion medicine. Genotyping of DI1 and DI2 alleles is helpful for the investigation into haemolytic disease of the newborn (HDN) and for the development of rare blood group databases. Here, we set up a polymerase chain reaction sequence-based typing (PCR-SBT) method for genotyping of Diego blood group alleles. Specific primers for exon 19 of the solute carrier family 4, anion exchanger, member1 (SLC4A1) gene were designed, and our PCR-SBT method was established and optimized for Diego genotyping. A total of 1053 samples from the Chinese Han population and the family members of a rare proband with DI1/DI1 genotype were investigated by the PCR-SBT method. An allele-specific primer PCR (PCR-ASP) was used to verify the reliability of the PCR-SBT method. The frequencies of DI1 and DI2 alleles in the Chinese Han population were 0.0247 and 0.9753, respectively. Six new single nucleotide polymorphisms (SNPs) were found in the sequenced regions of the SLC4A1 gene, and four novel SNPs located in the exon 19, in which one SNP could cause an amino acid alteration of Ala858Ser on erythrocyte anion exchanger protein 1. The genotypes for Diego blood group were identical among 41 selected samples with PCR-ASP and PCR-SBT. The PCR-SBT method can be used in Diego genotyping as a substitute of serological technique when the antisera is lacking and was suitable for screening large numbers of donors in rare blood group databases. © 2010 The Author(s). Vox Sanguinis © 2010 International Society of Blood Transfusion.
Active Exploration of Large 3D Model Repositories.
Gao, Lin; Cao, Yan-Pei; Lai, Yu-Kun; Huang, Hao-Zhi; Kobbelt, Leif; Hu, Shi-Min
2015-12-01
With broader availability of large-scale 3D model repositories, the need for efficient and effective exploration becomes more and more urgent. Existing model retrieval techniques do not scale well with the size of the database since often a large number of very similar objects are returned for a query, and the possibilities to refine the search are quite limited. We propose an interactive approach where the user feeds an active learning procedure by labeling either entire models or parts of them as "like" or "dislike" such that the system can automatically update an active set of recommended models. To provide an intuitive user interface, candidate models are presented based on their estimated relevance for the current query. From the methodological point of view, our main contribution is to exploit not only the similarity between a query and the database models but also the similarities among the database models themselves. We achieve this by an offline pre-processing stage, where global and local shape descriptors are computed for each model and a sparse distance metric is derived that can be evaluated efficiently even for very large databases. We demonstrate the effectiveness of our method by interactively exploring a repository containing over 100 K models.
NASA Astrophysics Data System (ADS)
Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng
2013-03-01
Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. Performing a rapid search for a specific vehicle within a large database of compressed videos is often required and can be a time-critical life or death situation. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured in a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic volume conditions.
The Russian effort in establishing large atomic and molecular databases
NASA Astrophysics Data System (ADS)
Presnyakov, Leonid P.
1998-07-01
The database activities in Russia have been developed in connection with UV and soft X-ray spectroscopic studies of extraterrestrial and laboratory (magnetically confined and laser-produced) plasmas. Two forms of database production are used: i) a set of computer programs to calculate radiative and collisional data for the general atom or ion, and ii) development of numeric database systems with the data stored in the computer. The first form is preferable for collisional data. At the Lebedev Physical Institute, an appropriate set of the codes has been developed. It includes all electronic processes at collision energies from the threshold up to the relativistic limit. The ion -atom (and -ion) collisional data are calculated with the methods developed recently. The program for the calculations of the level populations and line intensities is used for spectrical diagnostics of transparent plasmas. The second form of database production is widely used at the Institute of Physico-Technical Measurements (VNIIFTRI), and the Troitsk Center: the Institute of Spectroscopy and TRINITI. The main results obtained at the centers above are reviewed. Plans for future developments jointly with international collaborations are discussed.
Jerlström, Tomas; Gårdmark, Truls; Carringer, Malcolm; Holmäng, Sten; Liedberg, Fredrik; Hosseini, Abolfazl; Malmström, Per-Uno; Ljungberg, Börje; Hagberg, Oskar; Jahnson, Staffan
2014-08-01
Cystectomy combined with pelvic lymph-node dissection and urinary diversion entails high morbidity and mortality. Improvements are needed, and a first step is to collect information on the current situation. In 2011, this group took the initiative to start a population-based database in Sweden (population 9.5 million in 2011) with prospective registration of patients and complications until 90 days after cystectomy. This article reports findings from the first year of registration. Participation was voluntary, and data were reported by local urologists or research nurses. Perioperative parameters and early complications classified according to the modified Clavien system were registered, and selected variables of possible importance for complications were analysed by univariate and multivariate logistic regression. During 2011, 285 (65%) of 435 cystectomies performed in Sweden were registered in the database, the majority reported by the seven academic centres. Median blood loss was 1000 ml, operating time 318 min, and length of hospital stay 15 days. Any complications were registered for 103 patients (36%). Clavien grades 1-2 and 3-5 were noted in 19% and 15%, respectively. Thirty-seven patients (13%) were reoperated on at least once. In logistic regression analysis elevated risk of complications was significantly associated with operating time exceeding 318 min in both univariate and multivariate analysis, and with age 76-89 years only in multivariate analysis. It was feasible to start a national population-based registry of radical cystectomies for bladder cancer. The evaluation of the first year shows an increased risk of complications in patients with longer operating time and higher age. The results agree with some previously published series but should be interpreted with caution considering the relatively low coverage, which is expected to be higher in the future.
A web-based platform for virtual screening.
Watson, Paul; Verdonk, Marcel; Hartshorn, Michael J
2003-09-01
A fully integrated, web-based, virtual screening platform has been developed to allow rapid virtual screening of large numbers of compounds. ORACLE is used to store information at all stages of the process. The system includes a large database of historical compounds from high throughput screenings (HTS) chemical suppliers, ATLAS, containing over 3.1 million unique compounds with their associated physiochemical properties (ClogP, MW, etc.). The database can be screened using a web-based interface to produce compound subsets for virtual screening or virtual library (VL) enumeration. In order to carry out the latter task within ORACLE a reaction data cartridge has been developed. Virtual libraries can be enumerated rapidly using the web-based interface to the cartridge. The compound subsets can be seamlessly submitted for virtual screening experiments, and the results can be viewed via another web-based interface allowing ad hoc querying of the virtual screening data stored in ORACLE.
Remote visual analysis of large turbulence databases at multiple scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methodsmore » supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.« less
Remote visual analysis of large turbulence databases at multiple scales
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin; ...
2018-06-15
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methodsmore » supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.« less
GLAD: a system for developing and deploying large-scale bioinformatics grid.
Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong
2005-03-01
Grid computing is used to solve large-scale bioinformatics problems with gigabytes database by distributing the computation across multiple platforms. Until now in developing bioinformatics grid applications, it is extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware, which exploits the task-based parallelism. Two bioinformatics benchmark applications, such as distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.
NASA Astrophysics Data System (ADS)
Lee, Sangho; Suh, Jangwon; Park, Hyeong-Dong
2015-03-01
Boring logs are widely used in geological field studies since the data describes various attributes of underground and surface environments. However, it is difficult to manage multiple boring logs in the field as the conventional management and visualization methods are not suitable for integrating and combining large data sets. We developed an iPad application to enable its user to search the boring log rapidly and visualize them using the augmented reality (AR) technique. For the development of the application, a standard borehole database appropriate for a mobile-based borehole database management system was designed. The application consists of three modules: an AR module, a map module, and a database module. The AR module superimposes borehole data on camera imagery as viewed by the user and provides intuitive visualization of borehole locations. The map module shows the locations of corresponding borehole data on a 2D map with additional map layers. The database module provides data management functions for large borehole databases for other modules. Field survey was also carried out using more than 100,000 borehole data.
Angermeier, Paul L.; Frimpong, Emmanuel A.
2009-01-01
The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.
Active browsing using similarity pyramids
NASA Astrophysics Data System (ADS)
Chen, Jau-Yuen; Bouman, Charles A.; Dalton, John C.
1998-12-01
In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.
The CDC Hemophilia A Mutation Project (CHAMP) Mutation List: a New Online Resource
Payne, Amanda B.; Miller, Connie H.; Kelly, Fiona M.; Soucie, J. Michael; Hooper, W. Craig
2015-01-01
Genotyping efforts in hemophilia A (HA) populations in many countries have identified large numbers of unique mutations in the Factor VIII gene (F8). To assist HA researchers conducting genotyping analyses, we have developed a listing of F8 mutations including those listed in existing locus-specific databases as well as those identified in patient populations and reported in the literature. Each mutation was reviewed and uniquely identified using Human Genome Variation Society (HGVS) nomenclature standards for coding DNA and predicted protein changes as well as traditional nomenclature based on the mature, processed protein. Listings also include the associated hemophilia severity classified by International Society of Thrombosis and Haemostasis (ISTH) criteria, associations of the mutations with inhibitors, and reference information. The mutation list currently contains 2,537 unique mutations known to cause HA. HA severity caused by the mutation is available for 2,022 mutations (80%) and information on inhibitors is available for 1,816 mutations (72%). The CDC Hemophilia A Mutation Project (CHAMP) Mutation List is available at http://www.cdc.gov/hemophiliamutations for download and search and will be updated quarterly based on periodic literature reviews and submitted reports. PMID:23280990
Cutaneous Melanoma In Situ: Translational Evidence from a Large Population-Based Study
Nitti, Donato
2011-01-01
Background. Cutaneous melanoma in situ (CMIS) is a nosologic entity surrounded by health concerns and unsolved debates. We aimed to shed some light on CMIS by means of a large population-based study. Methods. Patients with histologic diagnosis of CMIS were identified from the Surveillance Epidemiology End Results (SEER) database. Results. The records of 93,863 cases of CMIS were available for analysis. CMIS incidence has been steadily increasing over the past 3 decades at a rate higher than any other in situ or invasive tumor, including invasive skin melanoma (annual percentage change [APC]: 9.5% versus 3.6%, respectively). Despite its noninvasive nature, CMIS is treated with excision margins wider than 1 cm in more than one third of cases. CMIS is associated with an increased risk of invasive melanoma (standardized incidence ratio [SIR]: 8.08; 95% confidence interval [CI]: 7.66–8.57), with an estimated 3:5 invasive/in situ ratio; surprisingly, it is also associated with a reduced risk of gastrointestinal (SIR: 0.78, CI: 0.72–0.84) and lung (SIR: 0.65, CI: 0.59–0.71) cancers. Relative survival analysis shows that persons with CMIS have a life expectancy equal to that of the general population. Conclusions. CMIS is increasingly diagnosed and is often overtreated, although it does not affect the life expectancy of its carriers. Patients with CMIS have an increased risk of developing invasive melanoma (which warrants their enrollment in screening programs) but also a reduced risk of some epithelial cancers, which raises the intriguing hypothesis that genetic/environmental risk factors for some tumors may oppose the pathogenesis of others. PMID:21632457
BioCarian: search engine for exploratory searches in heterogeneous biological databases.
Zaki, Nazar; Tennakoon, Chandana
2017-10-02
There are a large number of biological databases publicly available for scientists in the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in the recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in semantic web. Heterogeneous databases can be converted into Resource Description Format (RDF) and queried using SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and have additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For the advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
A Systematic Review of Rural, Theory-based Physical Activity Interventions.
Walsh, Shana M; Meyer, M Renée Umstattd; Gamble, Abigail; Patterson, Megan S; Moore, Justin B
2017-05-01
This systematic review synthesized the scientific literature on theory-based physical activity (PA) interventions in rural populations. PubMed, PsycINFO, and Web of Science databases were searched to identify studies with a rural study sample, PA as a primary outcome, use of a behavioral theory or model, randomized or quasi-experimental research design, and application at the primary and/or secondary level of prevention. Thirty-one studies met our inclusion criteria. The Social Cognitive Theory (N = 14) and Transtheoretical Model (N = 10) were the most frequently identified theories; however, most intervention studies were informed by theory but lacked higher-level theoretical application and testing. Interventions largely took place in schools (N = 10) and with female-only samples (N = 8). Findings demonstrated that theory-based PA interventions are mostly successful at increasing PA in rural populations but require improvement. Future studies should incorporate higher levels of theoretical application, and should explore adapting or developing rural-specific theories. Study designs should employ more rigorous research methods to decrease bias and increase validity of findings. Follow-up assessments to determine behavioral maintenance and/or intervention sustainability are warranted. Finally, funding agencies and journals are encouraged to adopt rural-urban commuting area codes as the standard for defining rural.
MASSCLEANage-STELLAR CLUSTER AGES FROM INTEGRATED COLORS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Popescu, Bogdan; Hanson, M. M., E-mail: popescb@mail.uc.ed, E-mail: margaret.hanson@uc.ed
2010-11-20
We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC.more » Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derive their age using the MASSCLEANage with the same photometry with very good agreement. The MASSCLEANage program is freely available under GNU General Public License.« less
Going public: accessing urban data and producing population estimates using the urban FIA database
Chris Edgar; Mark Hatfield
2015-01-01
In this presentation we describe the urban forest inventory database (U-FIADB) and demonstrate how to use the database to produce population estimates. Examples from the recently completed City of Austin inventory will be used to demonstrate the capabilities of the database. We will identify several features of U-FIADB that are different from the FIA database (FIADB)...
Nosql for Storage and Retrieval of Large LIDAR Data Collections
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.
2015-08-01
Developments in LiDAR technology over the past decades have made LiDAR to become a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea for a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline. It makes therefore sense to preserve this split. A document oriented NoSQL database can easily emulate this data partitioning, by representing each tile (file) in a separate document. The document stores the metadata of the tile. The actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB a highly scalable document oriented NoSQL database for storing large LiDAR files. MongoDB like any NoSQL database allows for queries on the attributes of the document. As a specialty MongoDB also allows spatial queries. Hence we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to native file system and cloud storage transfer speed.
Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.
Awale, Mahendra; Jin, Xian; Reymond, Jean-Louis
2015-01-01
Tools to explore large compound databases in search for analogs of query molecules provide a strategically important support in drug discovery to help identify available analogs of any given reference or hit compound by ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures). Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects. Graphical abstractAtom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.
Fault-tolerant symmetrically-private information retrieval
NASA Astrophysics Data System (ADS)
Wang, Tian-Yin; Cai, Xiao-Qiu; Zhang, Rui-Ling
2016-08-01
We propose two symmetrically-private information retrieval protocols based on quantum key distribution, which provide a good degree of database and user privacy while being flexible, loss-resistant and easily generalized to a large database similar to the precedent works. Furthermore, one protocol is robust to a collective-dephasing noise, and the other is robust to a collective-rotation noise.
Mining the Galaxy Zoo Database: Machine Learning Applications
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.
2010-01-01
The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunter-provided classifications). Our second case study will focus on similar data mining analyses and machine leaning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
Chen, Yu-Long; Hsu, Chin-Wang; Cheng, Cheng-Chung; Yiang, Giou-Teng; Lin, Chin-Sheng; Lin, Cheng-Li; Sung, Fung-Chang; Liang, Ji-An
2017-06-01
To investigate the relationship between chronic pancreatitis (CP) and inflammatory bowel disease (IBD) in a large population-based cohort study. Data was obtained from the Taiwan National Health Insurance Research Database. The cohort study comprised 17,796 patients newly diagnosed with CP between 2000 and 2010 and 71,164 matched controls. A Cox proportional hazards model was used for evaluating the risk of IBD in the CP and comparison cohorts. When examined with a mean follow-up period of 4.87 and 6.04 years for the CP and comparison cohorts, respectively, the overall incidence of IBD was 10.3 times higher in the CP cohort than in the comparison cohort (5.75 vs. 0.56 per 10,000 person-years). Compared with the comparison cohort, the CP cohort exhibited a higher risk of IBD, irrespective of age, sex, and presence or absence of comorbidities. Moreover, the CP cohort was associated with a significantly higher risk of Crohn's disease (adjusted hazard ratio [aHR] = 12.9, 95% confidence interval [CI] = 5.15-32.5) and ulcerative colitis (aHR = 2.80, 95% CI = 1.00-7.86). This nationwide population-based cohort study revealed a significantly higher risk of IBD in patients with CP compared with control group. Clinicians should notice this association to avoid delayed diagnosis of IBD in patients with CP.
Lam, Raymond; Kruger, Estie; Tennant, Marc
2014-12-01
One disadvantage of the remarkable achievements in dentistry is that treatment options have never been more varied or confusing. This has made the concept of Evidenced Based Dentistry more applicable to modern dental practice. Despite merit in the concept whereby clinical decisions are guided by scientific evidence, there are problems with establishing a scientific base. This is no more challenging than in modern dentistry where the gap between rapidly developing products/procedures and its evidence base are widening. Furthermore, the burden of oral disease continues to remain high at the population level. These problems have prompted new approaches to enhancing research. The aim of this paper is to outline how a modified approach to dental coding may benefit clinical and population level research. Using publically assessable data obtained from the Australian Chronic Disease Dental Scheme and item codes contained within the Australian Schedule of Dental Services and Glossary, a suggested approach to dental informatics is illustrated. A selection of item codes have been selected and expanded with the addition of suffixes. These suffixes provided circumstantial information that will assist in assessing clinical outcomes such as success rates and prognosis. The use of item codes in administering the CDDS yielded a large database of item codes. These codes are amenable to dental informatics which has been shown to enhance research at both the clinical and population level. This is a cost effective method to supplement existing research methods. Copyright © 2014 Elsevier Inc. All rights reserved.
An ab initio electronic transport database for inorganic materials
Ricci, Francesco; Chen, Wei; Aydemir, Umut; ...
2017-07-04
Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present themore » workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.« less
Architectural Implications for Spatial Object Association Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, V S; Kurc, T; Saltz, J
2009-01-29
Spatial object association, also referred to as cross-match of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server R, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation providesmore » insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST).« less
An ab initio electronic transport database for inorganic materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ricci, Francesco; Chen, Wei; Aydemir, Umut
Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present themore » workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.« less
Töpf, A L; Gilbert, M T P; Dumbacher, J P; Hoelzel, A R
2006-01-01
Some of the transitional periods of Britain during the first millennium A.D. are traditionally associated with the movement of people from continental Europe, composed largely of invading armies (e.g., the Roman, Saxon, and Viking invasions). However, the extent to which these were migrations (as opposed to cultural exchange) remains controversial. We investigated the history of migration by women by amplifying mitochondrial DNA (mtDNA) from ancient Britons who lived between approximately A.D. 300-1,000 and compared these with 3,549 modern mtDNA database genotypes from England, Europe, and the Middle East. The objective was to assess the dynamics of the historical population composition by comparing genotypes in a temporal context. Towards this objective we test and calibrate the use of rho statistics to identify relationships between founder and source populations. We find evidence for shared ancestry between the earliest sites (predating Viking invasions) with modern populations across the north of Europe from Norway to Estonia, possibly reflecting common ancestors dating back to the last glacial epoch. This is in contrast with a late Saxon site in Norwich, where the genetic signature is consistent with more recent immigrations from the south, possibly as part of the Saxon invasions.
Perchlorate is a widespread environmental pollutant, and is a thyroid hormone disruptor. A previous population study based on the National Health and Nutrition Examination Survey (NHANES) 2001-2002 database showed that urinary perchlorate concentrations were associated with signi...
Jenkins, Rachel; Kydd, Robert; Mullen, Paul; Thomson, Kenneth; Sculley, James; Kuper, Susan; Carroll, Joanna; Gureje, Oye; Hatcher, Simon; Brownie, Sharon; Carroll, Christopher; Hollins, Sheila; Wong, Mai Luen
2010-01-01
Background Migration of health professionals from low and middle income countries to rich countries is a large scale and long-standing phenomenon, which is detrimental to the health systems in the donor countries. We sought to explore the extent of psychiatric migration. Methods In our study, we use the respective professional databases in each country to establish the numbers of psychiatrists currently registered in the UK, US, New Zealand, and Australia who originate from other countries. We also estimate the impact of this migration on the psychiatrist population ratios in the donor countries. Findings We document large numbers of psychiatrists currently registered in the UK, US, New Zealand and Australia originating from India (4687 psychiatrists), Pakistan (1158), Bangladesh (149) , Nigeria (384) , Egypt (484), Sri Lanka (142), Philippines (1593). For some countries of origin, the numbers of psychiatrists currently registered within high-income countries' professional databases are very small (e.g., 5 psychiatrists of Tanzanian origin registered in the 4 high-income countries we studied), but this number is very significant compared to the 15 psychiatrists currently registered in Tanzania). Without such emigration, many countries would have more than double the number of psychiatrists per 100, 000 population (e.g. Bangladesh, Myanmar, Afghanistan, Egypt, Syria, Lebanon); and some countries would have had five to eight times more psychiatrists per 100,000 (e.g. Philippines, Pakistan, Sri Lanka, Liberia, Nigeria and Zambia). Conclusions Large numbers of psychiatrists originating from key low and middle income countries are currently registered in the UK, US, New Zealand and Australia, with concomitant impact on the psychiatrist/population ratio n the originating countries. We suggest that creative international policy approaches are needed to ensure the individual migration rights of health professionals do not compromise societal population rights to health, and that there are public and fair agreements between countries within an internationally agreed framework. PMID:20140216
Berent, Jarosław
2007-01-01
This paper presents the new DNAStat version 1.2 for processing genetic profile databases and biostatistical calculations. This new version contains, besides all the options of its predecessor 1.0, a calculation-results file export option in .xls format for Microsoft Office Excel, as well as the option of importing/exporting the population base of systems as .txt files for processing in Microsoft Notepad or EditPad
Hein, Misty J.; Waters, Martha A.; Ruder, Avima M.; Stenzel, Mark R.; Blair, Aaron; Stewart, Patricia A.
2010-01-01
Objectives: Occupational exposure assessment for population-based case–control studies is challenging due to the wide variety of industries and occupations encountered by study participants. We developed and evaluated statistical models to estimate the intensity of exposure to three chlorinated solvents—methylene chloride, 1,1,1-trichloroethane, and trichloroethylene—using a database of air measurement data and associated exposure determinants. Methods: A measurement database was developed after an extensive review of the published industrial hygiene literature. The database of nearly 3000 measurements or summary measurements included sample size, measurement characteristics (year, duration, and type), and several potential exposure determinants associated with the measurements: mechanism of release (e.g. evaporation), process condition, temperature, usage rate, type of ventilation, location, presence of a confined space, and proximity to the source. The natural log-transformed measurement levels in the exposure database were modeled as a function of the measurement characteristics and exposure determinants using maximum likelihood methods. Assuming a single lognormal distribution of the measurements, an arithmetic mean exposure intensity level was estimated for each unique combination of exposure determinants and decade. Results: The proportions of variability in the measurement data explained by the modeled measurement characteristics and exposure determinants were 36, 38, and 54% for methylene chloride, 1,1,1-trichloroethane, and trichloroethylene, respectively. Model parameter estimates for the exposure determinants were in the anticipated direction. Exposure intensity estimates were plausible and exhibited internal consistency, but the ability to evaluate validity was limited. Conclusions: These prediction models can be used to estimate chlorinated solvent exposure intensity for jobs reported by population-based case–control study participants that have sufficiently detailed information regarding the exposure determinants. PMID:20418277
NASA Astrophysics Data System (ADS)
Nyitrai, Daniel; Martinho, Filipe; Dolbeth, Marina; Rito, João; Pardal, Miguel A.
2013-12-01
Large-scale and local climate patterns are known to influence several aspects of the life cycle of marine fish. In this paper, we used a 9-year database (2003-2011) to analyse the populations of two estuarine resident fishes, Pomatoschistus microps and Pomatoschistus minutus, in order to determine their relationships with varying environmental stressors operating over local and large scales. This study was performed in the Mondego estuary, Portugal. Firstly, the variations in abundance, growth, population structure and secondary production were evaluated. These species appeared in high densities in the beginning of the study period, with subsequent occasional high annual density peaks, while their secondary production was lower in dry years. The relationships between yearly fish abundance and the environmental variables were evaluated separately for both species using Spearman correlation analysis, considering the yearly abundance peaks for the whole population, juveniles and adults. Among the local climate patterns, precipitation, river runoff, salinity and temperature were used in the analyses, and North Atlantic Oscillation (NAO) index and sea surface temperature (SST) were tested as large-scale factors. For P. microps, precipitation and NAO were the significant factors explaining abundance of the whole population, the adults and the juveniles as well. Regarding P. minutus, for the whole population, juveniles and adults river runoff was the significant predictor. The results for both species suggest a differential influence of climate patterns on the various life cycle stages, confirming also the importance of estuarine resident fishes as indicators of changes in local and large-scale climate patterns, related to global climate change.
Durham, Erin-Elizabeth A; Yu, Xiaxia; Harrison, Robert W
2014-12-01
Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.
Digital geomorphological landslide hazard mapping of the Alpago area, Italy
NASA Astrophysics Data System (ADS)
van Westen, Cees J.; Soeters, Rob; Sijmons, Koert
Large-scale geomorphological maps of mountainous areas are traditionally made using complex symbol-based legends. They can serve as excellent "geomorphological databases", from which an experienced geomorphologist can extract a large amount of information for hazard mapping. However, these maps are not designed to be used in combination with a GIS, due to their complex cartographic structure. In this paper, two methods are presented for digital geomorphological mapping at large scales using GIS and digital cartographic software. The methods are applied to an area with a complex geomorphological setting on the Borsoia catchment, located in the Alpago region, near Belluno in the Italian Alps. The GIS database set-up is presented with an overview of the data layers that have been generated and how they are interrelated. The GIS database was also converted into a paper map, using a digital cartographic package. The resulting largescale geomorphological hazard map is attached. The resulting GIS database and cartographic product can be used to analyse the hazard type and hazard degree for each polygon, and to find the reasons for the hazard classification.
Sinonasal extramedullary plasmacytoma: a population-based incidence and survival analysis.
Patel, Tapan D; Vázquez, Alejandro; Choudhary, Moaz M; Kam, David; Baredes, Soly; Eloy, Jean Anderson
2015-09-01
Sinonasal extramedullary plasmacytoma (SN-EMP) is a rare plasma cell neoplasm. Published literature on this tumor largely consists of case reports and case-series with small sample sizes. This study analyzed population-based data on SN-EMP patients to understand demographic and clinical features as well as incidence and survival trends. The Surveillance, Epidemiology, and End Results (SEER) database was queried for SN-EMP and other head and neck EMP (HN-EMP) cases from 1973 to 2011. Cases were analyzed to determine patient demographics, initial treatment modality, and survival outcomes. Of 778 patients identified with EMP in the head and neck region, 367 patients had SN-EMP and 411 had other HN-EMP. There was a strong male predilection found, with a male-to-female ratio of 3.65:1 in the SN-EMP group and 1.87:1 in the other HN-EMP group. The majority of the patients presented with localized disease in both SN-EMP (84.4%) and other HN-EMP (81.0%) groups. The most common treatment modality reported in this database was surgery with adjuvant radiotherapy in both SN-EMP (46.3%) and other HN-EMP (38.9%) groups, followed by radiotherapy alone (SN-EMP: 40.7%; other HN-EMP: 34.2%). Five-year and 10-year disease-specific survival rates were comparable between SN-EMP (88.2% and 83.3%, respectively) and other HN-EMP (90.0% and 87.4%, respectively) (p = 0.6016 and p = 0.4015, respectively). This study analyzed the largest cohort of SN-EMP patients to date. There was no statistically significant survival advantage found for any 1 particular treatment modality over other treatment modalities in both SN-EMP and other HN-EMP. © 2015 ARS-AAOA, LLC.
Laryngeal adenoid cystic carcinoma: A population-based perspective.
Dubal, Pariket M; Svider, Peter F; Folbe, Adam J; Lin, Ho-Sheng; Park, Richard C; Baredes, Soly; Eloy, Jean Anderson
2015-11-01
Adenoid cystic carcinoma (ACC) occurs infrequently in the larynx. Consequently, no large samples describing its clinical behavior are available in the literature. Our objective was to use a nationally representative population-based resource to evaluate clinical behavior, patient demographics, and outcomes among patients diagnosed with laryngeal ACC (LACC). Retrospective database analysis. The National Cancer Institute's Surveillance, Epidemiology, and End Results database was analyzed for patients diagnosed with LACC between 1973 and 2011. Patient demographics, incidence, treatment, and survival between LACC and other laryngeal malignancies were compared. Of 69 LACC patients, 63.8% were female, 78.2% Caucasian, and the median age was 54 years. LACC patients were much more likely to have subglottic lesions (44.9%) than individuals with other malignancies (1.6%). The incidence of LACC was 0.005/100,000 individuals. The majority of patients with LACC harbored T4 lesions at initial diagnosis, although 87.9% had N0 disease, and only 6.1% had distant metastasis at diagnosis. Disease-specific survival (DSS) was greater at 1 year for LACC compared to other laryngeal malignancies, but not at 5 or 10 years. Five-year DSS was greater for LACC patients who underwent surgery versus those who did not undergo surgery. This analysis notes that LACC has a low incidence with no significant change in incidence over the study period. Compared to other laryngeal malignancies, LACC has a female preponderance, is much more common in the subglottis, presents at a younger age, and more often presents with T4 disease. Surgery was noted to confer a survival advantage in LACC. 4. © 2015 The American Laryngological, Rhinological and Otological Society, Inc.
Verifying the geographic origin of mahogany (Swietenia macrophylla King) with DNA-fingerprints.
Degen, B; Ward, S E; Lemes, M R; Navarro, C; Cavers, S; Sebbenn, A M
2013-01-01
Illegal logging is one of the main causes of ongoing worldwide deforestation and needs to be eradicated. The trade in illegal timber and wood products creates market disadvantages for products from sustainable forestry. Although various measures have been established to counter illegal logging and the subsequent trade, there is a lack of practical mechanisms for identifying the origin of timber and wood products. In this study, six nuclear microsatellites were used to generate DNA fingerprints for a genetic reference database characterising the populations of origin of a large set of mahogany (Swietenia macrophylla King, Meliaceae) samples. For the database, leaves and/or cambium from 1971 mahogany trees sampled in 31 stands from Mexico to Bolivia were genotyped. A total of 145 different alleles were found, showing strong genetic differentiation (δ(Gregorious)=0.52, F(ST)=0.18, G(ST(Hedrick))=0.65) and clear correlation between genetic and spatial distances among stands (r=0.82, P<0.05). We used the genetic reference database and Bayesian assignment testing to determine the geographic origins of two sets of mahogany wood samples, based on their multilocus genotypes. In both cases the wood samples were assigned to the correct country of origin. We discuss the overall applicability of this methodology to tropical timber trading. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
An object model and database for functional genomics.
Jones, Andrew; Hunt, Ela; Wastling, Jonathan M; Pizarro, Angel; Stoeckert, Christian J
2004-07-10
Large-scale functional genomics analysis is now feasible and presents significant challenges in data analysis, storage and querying. Data standards are required to enable the development of public data repositories and to improve data sharing. There is an established data format for microarrays (microarray gene expression markup language, MAGE-ML) and a draft standard for proteomics (PEDRo). We believe that all types of functional genomics experiments should be annotated in a consistent manner, and we hope to open up new ways of comparing multiple datasets used in functional genomics. We have created a functional genomics experiment object model (FGE-OM), developed from the microarray model, MAGE-OM and two models for proteomics, PEDRo and our own model (Gla-PSI-Glasgow Proposal for the Proteomics Standards Initiative). FGE-OM comprises three namespaces representing (i) the parts of the model common to all functional genomics experiments; (ii) microarray-specific components; and (iii) proteomics-specific components. We believe that FGE-OM should initiate discussion about the contents and structure of the next version of MAGE and the future of proteomics standards. A prototype database called RNA And Protein Abundance Database (RAPAD), based on FGE-OM, has been implemented and populated with data from microbial pathogenesis. FGE-OM and the RAPAD schema are available from http://www.gusdb.org/fge.html, along with a set of more detailed diagrams. RAPAD can be accessed by registration at the site.
ADA perceived disability claims: a decision-tree analysis.
Draper, William R; Hawley, Carolyn E; McMahon, Brian T; Reid, Christine A; Barbir, Lara A
2014-06-01
The purpose of this study is to examine the possible interactions of predictor variables pertaining to perceived disability claims contained in a large governmental database. Specifically, it is a retrospective analysis of US Equal Employment Opportunity Commission (EEOC) data for the entire population of workplace discrimination claims based on the "regarded as disabled" prong of the Americans with Disabilities Act (ADA) definition of disability. The study utilized records extracted from a "master database" of over two million charges of workplace discrimination in the Integrated Mission System of the EEOC. This database includes all ADA-related discrimination allegations filed from July 26, 1992 through December 31, 2008. Chi squared automatic interaction detection (CHAID) was employed to analyze interaction effects of relevant variables, such as issue (grievance) and industry type. The research question addressed by CHAID is: What combination of factors are associated with merit outcomes for people making ADA EEOC allegations who are "regarded as" having disabilities? The CHAID analysis shows how merit outcome is predicted by the interaction of relevant variables. Issue was found to be the most prominent variable in determining merit outcome, followed by industry type, but the picture is made more complex by qualifications regarding age and race data. Although discharge was the most frequent grievance among charging parties in the perceived disability group, its merit outcome was significantly less than that for the leading factor of hiring.
Comparison of photo-matching algorithms commonly used for photographic capture-recapture studies.
Matthé, Maximilian; Sannolo, Marco; Winiarski, Kristopher; Spitzen-van der Sluijs, Annemarieke; Goedbloed, Daniel; Steinfartz, Sebastian; Stachow, Ulrich
2017-08-01
Photographic capture-recapture is a valuable tool for obtaining demographic information on wildlife populations due to its noninvasive nature and cost-effectiveness. Recently, several computer-aided photo-matching algorithms have been developed to more efficiently match images of unique individuals in databases with thousands of images. However, the identification accuracy of these algorithms can severely bias estimates of vital rates and population size. Therefore, it is important to understand the performance and limitations of state-of-the-art photo-matching algorithms prior to implementation in capture-recapture studies involving possibly thousands of images. Here, we compared the performance of four photo-matching algorithms; Wild-ID, I3S Pattern+, APHIS, and AmphIdent using multiple amphibian databases of varying image quality. We measured the performance of each algorithm and evaluated the performance in relation to database size and the number of matching images in the database. We found that algorithm performance differed greatly by algorithm and image database, with recognition rates ranging from 100% to 22.6% when limiting the review to the 10 highest ranking images. We found that recognition rate degraded marginally with increased database size and could be improved considerably with a higher number of matching images in the database. In our study, the pixel-based algorithm of AmphIdent exhibited superior recognition rates compared to the other approaches. We recommend carefully evaluating algorithm performance prior to using it to match a complete database. By choosing a suitable matching algorithm, databases of sizes that are unfeasible to match "by eye" can be easily translated to accurate individual capture histories necessary for robust demographic estimates.
COMADRE: a global data base of animal demography.
Salguero-Gómez, Roberto; Jones, Owen R; Archer, C Ruth; Bein, Christoph; de Buhr, Hendrik; Farack, Claudia; Gottschalk, Fränce; Hartmann, Alexander; Henning, Anne; Hoppe, Gabriel; Römer, Gesa; Ruoff, Tara; Sommer, Veronika; Wille, Julia; Voigt, Jakob; Zeh, Stefan; Vieregg, Dirk; Buckley, Yvonne M; Che-Castaldo, Judy; Hodgson, David; Scheuerlein, Alexander; Caswell, Hal; Vaupel, James W
2016-03-01
The open-data scientific philosophy is being widely adopted and proving to promote considerable progress in ecology and evolution. Open-data global data bases now exist on animal migration, species distribution, conservation status, etc. However, a gap exists for data on population dynamics spanning the rich diversity of the animal kingdom world-wide. This information is fundamental to our understanding of the conditions that have shaped variation in animal life histories and their relationships with the environment, as well as the determinants of invasion and extinction. Matrix population models (MPMs) are among the most widely used demographic tools by animal ecologists. MPMs project population dynamics based on the reproduction, survival and development of individuals in a population over their life cycle. The outputs from MPMs have direct biological interpretations, facilitating comparisons among animal species as different as Caenorhabditis elegans, Loxodonta africana and Homo sapiens. Thousands of animal demographic records exist in the form of MPMs, but they are dispersed throughout the literature, rendering comparative analyses difficult. Here, we introduce the COMADRE Animal Matrix Database, an open-data online repository, which in its version 1.0.0 contains data on 345 species world-wide, from 402 studies with a total of 1625 population projection matrices. COMADRE also contains ancillary information (e.g. ecoregion, taxonomy, biogeography, etc.) that facilitates interpretation of the numerous demographic metrics that can be derived from its MPMs. We provide R code to some of these examples. We introduce the COMADRE Animal Matrix Database, a resource for animal demography. Its open-data nature, together with its ancillary information, will facilitate comparative analysis, as will the growing availability of databases focusing on other aspects of the rich animal diversity, and tools to query and combine them. Through future frequent updates of COMADRE, and its integration with other online resources, we encourage animal ecologists to tackle global ecological and evolutionary questions with unprecedented sample size. © 2016 The Authors. Journal of Animal Ecology published by John Wiley & Sons Ltd on behalf of British Ecological Society.
Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.
2015-01-01
Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
2013-01-01
Background A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indication as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs for new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from published literature. Results In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, therefore may have high potential in drug discovery. Conclusions We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base that is resultant of our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks. PMID:23742147
Databases for multilevel biophysiology research available at Physiome.jp.
Asai, Yoshiyuki; Abe, Takeshi; Li, Li; Oka, Hideki; Nomura, Taishin; Kitano, Hiroaki
2015-01-01
Physiome.jp (http://physiome.jp) is a portal site inaugurated in 2007 to support model-based research in physiome and systems biology. At Physiome.jp, several tools and databases are available to support construction of physiological, multi-hierarchical, large-scale models. There are three databases in Physiome.jp, housing mathematical models, morphological data, and time-series data. In late 2013, the site was fully renovated, and in May 2015, new functions were implemented to provide information infrastructure to support collaborative activities for developing models and performing simulations within the database framework. This article describes updates to the databases implemented since 2013, including cooperation among the three databases, interactive model browsing, user management, version management of models, management of parameter sets, and interoperability with applications.
Java Web Simulation (JWS); a web based database of kinetic models.
Snoep, J L; Olivier, B G
2002-01-01
Software to make a database of kinetic models accessible via the internet has been developed and a core database has been set up at http://jjj.biochem.sun.ac.za/. This repository of models, available to everyone with internet access, opens a whole new way in which we can make our models public. Via the database, a user can change enzyme parameters and run time simulations or steady state analyses. The interface is user friendly and no additional software is necessary. The database currently contains 10 models, but since the generation of the program code to include new models has largely been automated the addition of new models is straightforward and people are invited to submit their models to be included in the database.
Organ donation after death in Ontario: a population-based cohort study
Redelmeier, Donald A.; Markel, Frank; Scales, Damon C.
2013-01-01
Background: Shortfalls in deceased organ donation lead to shortages of solid organs available for transplantation. We assessed rates of deceased organ donation and compared hospitals that had clinical services for transplant recipients (transplant hospitals) to those that did not (general hospitals). Methods: We conducted a population-based cohort analysis involving patients who died from traumatic brain injury, subarachnoid hemorrhage, intracerebral hemorrhage or other catastrophic neurologic conditions in Ontario, Canada, between Apr. 1, 1994, and Mar. 31, 2011. We distinguished between acute care hospitals with and without transplant services. The primary outcome was actual organ donation determined through the physician database for organ procurement procedures. Results: Overall, 87 129 patients died from catastrophic neurologic conditions during the study period, of whom 1930 became actual donors. Our primary analysis excluded patients from small hospitals, reducing the total to 79 746 patients, of whom 1898 became actual donors. Patients who died in transplant hospitals had a distribution of demographic characteristics similar to that of patients who died in other large general hospitals. Transplant hospitals had an actual donor rate per 100 deaths that was about 4 times the donor rate at large general hospitals (5.0 v. 1.4, p < 0.001). The relative reduction in donations at general hospitals was accentuated among older patients, persisted among patients who were the most eligible candidates and amounted to about 121 fewer actual donors per year (adjusted odds ratio 0.58, 95% confidence interval 0.36–0.92). Hospital volumes were only weakly correlated with actual organ donation rates. Interpretation: Optimizing organ donation requires greater attention to large general hospitals. These hospitals account for most of the potential donors and missed opportunities for deceased organ donation. PMID:23549970
Batista Rodríguez, Gabriela; Balla, Andrea; Corradetti, Santiago; Martinez, Carmen; Hernández, Pilar; Bollo, Jesús; Targarona, Eduard M
2018-06-01
"Big data" refers to large amount of dataset. Those large databases are useful in many areas, including healthcare. The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) and the National Inpatient Sample (NIS) are big databases that were developed in the USA in order to record surgical outcomes. The aim of the present systematic review is to evaluate the type and clinical impact of the information retrieved through NISQP and NIS big database articles focused on laparoscopic colorectal surgery. A systematic review was conducted using The Meta-Analysis Of Observational Studies in Epidemiology (MOOSE) guidelines. The research was carried out on PubMed database and revealed 350 published papers. Outcomes of articles in which laparoscopic colorectal surgery was the primary aim were analyzed. Fifty-five studies, published between 2007 and February 2017, were included. Articles included were categorized in groups according to the main topic as: outcomes related to surgical technique comparisons, morbidity and perioperatory results, specific disease-related outcomes, sociodemographic disparities, and academic training impact. NSQIP and NIS databases are just the tip of the iceberg for the potential application of Big Data technology and analysis in MIS. Information obtained through big data is useful and could be considered as external validation in those situations where a significant evidence-based medicine exists; also, those databases establish benchmarks to measure the quality of patient care. Data retrieved helps to inform decision-making and improve healthcare delivery.
ERIC Educational Resources Information Center
Shabajee, Paul; Miller, Libby; Dingley, Andy
A group of research projects based at HP-Labs Bristol, the University of Bristol (England) and ARKive (a new large multimedia database project focused on the worlds biodiversity based in the United Kingdom) are working to develop a flexible model for the indexing of multimedia collections that allows users to annotate content utilizing extensible…
Chen, San-Ni; Lian, Iebin; Chen, Yi-Chiao; Ho, Jau-Der
2015-02-01
To investigate peptic ulcer disease and other possible risk factors in patients with central serous chorioretinopathy (CSR) using a population-based database. In this population-based retrospective cohort study, longitudinal data from the Taiwan National Health Insurance Research Database were analyzed. The study cohort comprised 835 patients with CSR and the control cohort comprised 4175 patients without CSR from January 2000 to December 2009. Conditional logistic regression was applied to examine the association of peptic ulcer disease and other possible risk factors for CSR, and stratified Cox regression models were applied to examine whether patients with CSR have an increased chance of peptic ulcer disease and hypertension development. The identifiable risk factors for CSR included peptic ulcer disease (adjusted odd ratio: 1.39, P = 0.001) and higher monthly income (adjusted odd ratio: 1.30, P = 0.006). Patients with CSR also had a significantly higher chance of developing peptic ulcer disease after the diagnosis of CSR (adjusted odd ratio: 1.43, P = 0.009). Peptic ulcer disease and higher monthly income are independent risk factors for CSR. Whereas, patients with CSR also had increased risk for peptic ulcer development.
Slutsky, Jeremiah; Singh, Nilkamal; Khalsa, Sat Bir S.
2015-01-01
Abstract Objective: A comprehensive bibliometric analysis was conducted on publications for yoga therapy research in clinical populations. Methods: Major electronic databases were searched for articles in all languages published between 1967 and 2013. Databases included PubMed, PsychInfo, MEDLINE, IndMed, Indian Citation Index, Index Medicus for South-East Asia Region, Web of Knowledge, Embase, EBSCO, and Google Scholar. Nonindexed journals were searched manually. Key search words included yoga, yoga therapy, pranayama, asana. All studies met the definition of a clinical trial. All styles of yoga were included. The authors extracted the data. Results: A total of 486 articles met the inclusion criteria and were published in 217 different peer-reviewed journals from 29 different countries on 28,080 study participants. The primary result observed is the three-fold increase in number of publications seen in the last 10 years, inclusive of all study designs. Overall, 45% of the studies published were randomized controlled trials, 18% were controlled studies, and 37% were uncontrolled studies. Most publications originated from India (n=258), followed by the United States (n=122) and Canada (n=13). The top three disorders addressed by yoga interventions were mental health, cardiovascular disease, and respiratory disease. Conclusion: A surge in publications on yoga to mitigate disease-related symptoms in clinical populations has occurred despite challenges facing the field of yoga research, which include standardization and limitations in funding, time, and resources. The population at large has observed a parallel surge in the use of yoga outside of clinical practice. The use of yoga as a complementary therapy in clinical practice may lead to health benefits beyond traditional treatment alone; however, to effect changes in health care policy, more high-quality, evidence-based research is needed. PMID:26196166
Fitzpatrick, Tiffany; Rosella, Laura C; Calzavara, Andrew; Petch, Jeremy; Pinto, Andrew D; Manson, Heather; Goel, Vivek; Wodchis, Walter P
2015-08-01
Healthcare spending occurs disproportionately among a very small portion of the population. Research on these high-cost users (HCUs) of health care has been overwhelmingly cross-sectional in nature and limited to the few sociodemographic and clinical characteristics available in health administrative databases. This study is the first to bridge this knowledge gap by applying a population health lens to HCUs. We investigate associations between a broad range of SES characteristics and future HCUs. A cohort of adults from two cycles of large, nationally representative health surveys conducted in 2003 and 2005 was linked to population-based health administrative databases from a universal healthcare plan for Ontario, Canada. Comprehensive person-centered estimates of annual healthcare spending were calculated for the subsequent 5 years following interview. Baseline HCUs (top 5%) were excluded and healthcare spending for non-HCUs was analyzed. Adjusted for predisposition and need factors, the odds of future HCU status (over 5 years) were estimated according to various individual, household, and neighborhood SES factors. Analyses were conducted in 2014. Low income (personal and household); less than post-secondary education; and living in high-dependency neighborhoods greatly increased the odds of future HCUs. After adjustment, future HCU status was most strongly associated with food insecurity, personal income, and non-homeownership. Living in highly deprived or low ethnic concentration neighborhoods also increased the odds of becoming an HCU. Findings suggest that addressing social determinants of health, such as food and housing security, may be important components of interventions aiming to improve health outcomes and reduce costs. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Traditional and Current Food Use of Wild Plants Listed in the Russian Pharmacopoeia
Shikov, Alexander N.; Tsitsilin, Andrey N.; Pozharitskaya, Olga N.; Makarov, Valery G.; Heinrich, Michael
2017-01-01
Historically Russia can be regarded as a “herbophilious” society. For centuries the multinational population of Russia has used plants in daily diet and for self-medication. The specificity of dietary uptake of medicinal plants (especially those in the unique and highly developed Russian herbal medical tradition) has remained mostly unknown in other regions. Based on 11th edition of the State Pharmacopoeia of the USSR, we selected 70 wild plant species which have been used in food by local Russian populations. Empirical searches were conducted via the Russian-wide applied online database E-library.ru, library catalogs of public libraries in St-Petersburg, the databases Scopus, Web of Science, PubMed, and search engine Google Scholar. The large majority of species included in Russian Pharmacopoeia are used as food by local population, however, aerial parts are more widely used for food. In this review, we summarize data on medicinal species published in Russia and other countries that are included in the Russian Pharmacopoeia and have being used in food for a long time. Consequently, the Russian Pharmacopoeia is an important source of information on plant species used traditionally at the interface of food and medicine. At the same time, there are the so-called “functional foods”, which denotes foods that not only serves to provide nutrition but also can be a source for prevention and cure of various diseases. This review highlights the potential of wild species of Russia monographed in its pharmacopeia for further developing new functional foods and—through the lens of their incorporation into the pharmacopeia—showcases the species' importance in Russia. PMID:29209213
Mattioli, Stefano; Baldasseroni, Alberto; Curti, Stefania; Cooke, Robin M T; Bena, Antonella; de Giacomi, Giovanna; dell'Omo, Marco; Fateh-Moghadam, Pirous; Melani, Carla; Biocca, Marco; Buiatti, Eva; Campo, Giuseppe; Zanardi, Francesca; Violante, Francesco S
2008-10-28
Carpal tunnel syndrome (CTS) is a socially relevant condition associated with biomechanical risk factors. We evaluated age-sex-specific incidence rates of in-hospital cases of CTS in central/northern Italy and explored relations with marital status. Seven regions were considered (overall population, 14.9 million) over 3-6-year periods between 1997 and 2002 (when out-of-hospital CTS surgery was extremely rare). Incidence rates of in-hospital cases of CTS were estimated based on 1) codified demographic, diagnostic and intervention data in obligatory discharge records from all Italian public/private hospitals, archived (according to residence) on regional databases; 2) demographic general population data for each region. We compared (using the chiscore test) age-sex-specific rates between married, unmarried, divorced and widowed subsets of the general population. We calculated standardized incidence ratios (SIRs) for married/unmarried men and women. Age-standardized incidence rates (per 100,000 person-years) of in-hospital cases of CTS were 166 in women and 44 in men (106 overall). Married subjects of both sexes showed higher age-specific rates with respect to unmarried men/women. SIRs were calculated comparing married vs unmarried rates of both sexes: 1.59 (95% confidence interval [95% CI], 1.57-1.60) in women, and 1.42 (95% CI, 1.40-1.45) in men. As compared with married women/men, widows/widowers both showed 2-3-fold higher incidence peaks during the fourth decade of life (beyond 50 years of age, widowed subjects showed similar trends to unmarried counterparts). This large population-based study illustrates distinct age-related trends in men and women, and also raises the question whether marital status could be associated with CTS in the general population.
Data base management system for lymphatic filariasis--a neglected tropical disease.
Upadhyayula, Suryanaryana Murty; Mutheneni, Srinivasa Rao; Kadiri, Madhusudhan Rao; Kumaraswamy, Sriram; Nelaturu, Sarat Chandra Babu
2012-01-01
Researchers working in the area of Public Health are being confronted with large volumes of data on various aspects of entomology and epidemiology. To obtain the relevant information out of these data requires particular database management system. In this paper, we have described about the usages of our developed database on lymphatic filariasis. This database application is developed using Model View Controller (MVC) architecture, with MySQL as database and a web based interface. We have collected and incorporated the data on filariasis in the database from Karimnagar, Chittoor, East and West Godavari districts of Andhra Pradesh, India. The importance of this database is to store the collected data, retrieve the information and produce various combinational reports on filarial aspects which in turn will help the public health officials to understand the burden of disease in a particular locality. This information is likely to have an imperative role on decision making for effective control of filarial disease and integrated vector management operations.
Izmerov, N F; Tikhonova, G I; Gorchakova, T Iu
2014-01-01
The purpose of the study was to carry out comparative analysis of the status and trends in mortality of male and female population of working age (15-59 (54) years) in Russia and the EU-27. Based on official Russian (Rosstat) data, on the global database of the World Health Organization's cause of death (The WHO Mortality Database, WHOMD) and databases The Human Mortality Database (HMD) of the sex-age composition of the population and the number of deaths from certain causes of death by age and sex standardized (direct method) mortality rates of working age population from selected causes of death for 1990 and 2011 in Russia and the average for the EU-27 were calculated. Analysis of trends in mortality of male and female population of working age in Russia over the past two decades shows that, despite the positive changes in during last six years, in 2011, age-standardized mortality rates remained above the 1990 level for most causes of death. During the same period in the EU-27 mortality in men (15-59 years) and women (15-54 years) increased from almost all causes ofdeath, which led to an even greatergap between Russia and developed countries on this indicator: standardized mortality rate of the male population of Russia in 1990 was higher than in the EU-27 by 2.1 times, and by 2011 the gap had increased to 3.5 times. The women in the 1990 had 1.5 times higher standardized mortality rates, and by 2011 the gap had increased to 2.7 times. Despite a steady decline in the mortality rates of working age population after 2005, its level in 2012 was still higher than the one of 1990 for both men and women, which led to a further increase in the gap between the age-standardized coefficients of mortality rate of working age population in Russia and the countries of European Community-27 (15-59 (54)). Faster reduction of mortality rate in the working age population will preserve Russian population and its labor potential.
Updated estimate of trans fat intake by the US population.
Doell, D; Folmer, D; Lee, H; Honigfort, M; Carberry, S
2012-01-01
The dietary intake of industrially-produced trans fatty acids (IP-TFA) was estimated for the US population (aged 2 years or more), children (aged 2-5 years) and teenage boys (aged 13-18 years) using the 2003-2006 National Health and Nutrition Examination Survey (NHANES) food consumption database, market share information and trans fat levels based on label survey data and analytical data for packaged and in-store purchased foods. For fast foods, a Monte Carlo model was used to estimate IP-TFA intake. Further, the intake of trans fat was also estimated using trans fat levels reported in the US Department of Agriculture (USDA) National Nutrient Database for Standard Reference, Release 22 (SR 22, 2009) and the 2003-2006 NHANES food consumption database. The cumulative intake of IP-TFA was estimated to be 1.3 g per person per day (g/p/d) at the mean for the US population. Based on this estimate, the mean dietary intake of IP-TFA has decreased significantly from that cited in the 2003 US Food and Drug Administration (FDA) final rule that established labelling requirements for trans fat (4.6 g/p/d for adults). Although the overall intake of IP-TFA has decreased as a result of the implementation of labelling requirements, individuals with certain dietary habits may still consume high levels of IP-TFA if certain brands or types of food products are frequently chosen.
Aging assessment of large electric motors in nuclear power plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villaran, M.; Subudhi, M.
1996-03-01
Large electric motors serve as the prime movers to drive high capacity pumps, fans, compressors, and generators in a variety of nuclear plant systems. This study examined the stressors that cause degradation and aging in large electric motors operating in various plant locations and environments. The operating history of these machines in nuclear plant service was studied by review and analysis of failure reports in the NPRDS and LER databases. This was supplemented by a review of motor designs, and their nuclear and balance of plant applications, in order to characterize the failure mechanisms that cause degradation, aging, and failuremore » in large electric motors. A generic failure modes and effects analysis for large squirrel cage induction motors was performed to identify the degradation and aging mechanisms affecting various components of these large motors, the failure modes that result, and their effects upon the function of the motor. The effects of large motor failures upon the systems in which they are operating, and on the plant as a whole, were analyzed from failure reports in the databases. The effectiveness of the industry`s large motor maintenance programs was assessed based upon the failure reports in the databases and reviews of plant maintenance procedures and programs.« less
Addition of a breeding database in the Genome Database for Rosaceae
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
2013-01-01
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox PMID:24247530
Addition of a breeding database in the Genome Database for Rosaceae.
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
2013-01-01
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox.
Liu, Rong; Li, Xi; Zhang, Wei; Zhou, Hong-Hao
2015-01-01
Objective Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, performances of these algorithms in racially diverse group have never been objectively evaluated and compared. In this literature-based study, we compared the performances of eight machine learning techniques with those of MLR in a large, racially-diverse cohort. Methods MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performances of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performances of these techniques in different races, as well as the dose ranges of therapeutic warfarin were compared. Robust results were obtained after 100 rounds of resampling. Results BART, MARS and SVR were statistically indistinguishable and significantly out performed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than those of MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR optimally performed among the Black population. When patients were grouped in terms of warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20%, and lower MAE (all p values < 0.05) than MLR in the low- and high- dose ranges. Conclusion Overall, machine learning-based techniques, BART, MARS and SVR performed superior than MLR in warfarin pharmacogenetic dosing. Differences of algorithms’ performances exist among the races. Moreover, machine learning-based algorithms tended to perform better in the low- and high- dose ranges than MLR. PMID:26305568
Chiò, A; Logroscino, G; Traynor, BJ; Collins, J; Simeone, JC; Goldstein, LA; White, LA
2014-01-01
Background Amyotrophic lateral sclerosis (ALS) is relatively rare, yet the economic and social burden is substantial. Having accurate incidence and prevalence estimates would facilitate efficient allocation of healthcare resources. Objective To provide a comprehensive and critical review of the epidemiologic literature on ALS. Methods MEDLINE and EMBASE (1995–2011) databases of population-based studies on ALS incidence and prevalence reporting quantitative data were analyzed. Data extracted included study location and time, design and data sources, case ascertainment methods, and incidence and/or prevalence rates. Medians and inter-quartile ranges (IQRs) were calculated, and ALS case estimates derived using 2010 population estimates. Results In all, 37 articles met inclusion criteria. In Europe, the median (IQR) incidence rate (/100,000 population) was 2.08 (1.47–2.43), corresponding to an estimated 15,355 (10,852–17,938) cases. Median (IQR) prevalence (/100,000 population) was 5.40 (4.06–7.89), or 39,863 (29,971–58,244) prevalent cases. Conclusions Disparity in rates among ALS incidence and prevalence studies may be due to differences in study design or true variations in population demographics, such as age, and geography, including environmental factors and genetic predisposition. Additional large-scale studies that use standardized case ascertainment methods are needed to more accurately assess the true global burden of ALS. PMID:23860588
Shared patients: multiple health and social care contact.
Keene, J; Swift, L; Bailey, S; Janacek, G
2001-07-01
The paper describes results from the 'Tracking Project', a new method for examining agency overlap, repeat service use and shared clients/patients amongst social and health care agencies in the community. This is the first project in this country to combine total population databases from a range of social, health care and criminal justice agencies to give a multidisciplinary database for one county (n = 97,162 cases), through standardised anonymisation of agency databases, using SOUNDEX, a software programme. A range of 20 community social and health care agencies were shown to have a large overlap with each other in a two-year period, indicating high proportions of shared patients/clients. Accident and Emergency is used as an example of major overlap: 16.2% (n = 39,992) of persons who attended a community agency had attended Accident and Emergency as compared to 8.2% (n = 775,000) of the total population of the county. Of these, 96% who had attended seven or more different community agencies had also attended Accident and Emergency. Further statistical analysis of Accident and Emergency attendance as a characteristic of community agency populations (n = 39,992) revealed that increasing frequency of attendance at Accident and Emergency was very strongly associated with increasing use of other services. That is, the patients that repeatedly attend Accident and Emergency are much more likely to attend more other agencies, indicating the possibility that they share more problematic or difficult patients. Research questions arising from these data are discussed and future research methods suggested in order to derive predictors from the database and develop screening instruments to identify multiple agency attenders for targeting or multidisciplinary working. It is suggested that Accident and Emergency attendance might serve as an important predictor of multiple agency attendance.
PropBase Query Layer: a single portal to UK subsurface physical property databases
NASA Astrophysics Data System (ADS)
Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham
2013-04-01
Until recently, the delivery of geological information for industry and public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from their own research data collection and also other, often commercially derived data sources. This can be voxelated to incorporate this data into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure their long-term continuity. However these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets making extracting physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation of complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure from each relational database is denormalized in a generalised structure, so that each dataset can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table PRB_DATA which contains all of the data with the following attribution: • a unique identifier • the data source • the unique identifier from the parent database for traceability • the 3D location • the property type • the property value • the units • necessary qualifiers • precision information and an audit trail Data sources, property type and units are constrained by dictionaries, a key component of the structure which defines what properties and inheritance hierarchies are to be coded and also guides the process as to what and how these are extracted from the structure. Data types served by the Query Layer include site investigation derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs as well as lithological and borehole metadata. The size and complexity of the data sets with multiple parent structures requires a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) to keep the layer synchronised with the underlying databases either as regular scheduled jobs (weekly, monthly etc) or invoked on demand. The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
Verberkmoes, Nathan C; Hervey, W Judson; Shah, Manesh; Land, Miriam; Hauser, Loren; Larimer, Frank W; Van Berkel, Gary J; Goeringer, Douglas E
2005-02-01
There is currently a great need for rapid detection and positive identification of biological threat agents, as well as microbial species in general, directly from complex environmental samples. This need is most urgent in the area of homeland security, but also extends into medical, environmental, and agricultural sciences. Mass-spectrometry-based analysis is one of the leading technologies in the field with a diversity of different methodologies for biothreat detection. Over the past few years, "shotgun"proteomics has become one method of choice for the rapid analysis of complex protein mixtures by mass spectrometry. Recently, it was demonstrated that this methodology is capable of distinguishing a target species against a large database of background species from a single-component sample or dual-component mixtures with relatively the same concentration. Here, we examine the potential of shotgun proteomics to analyze a target species in a background of four contaminant species. We tested the capability of a common commercial mass-spectrometry-based shotgun proteomics platform for the detection of the target species (Escherichia coli) at four different concentrations and four different time points of analysis. We also tested the effect of database size on positive identification of the four microbes used in this study by testing a small (13-species) database and a large (261-species) database. The results clearly indicated that this technology could easily identify the target species at 20% in the background mixture at a 60, 120, 180, or 240 min analysis time with the small database. The results also indicated that the target species could easily be identified at 20% or 6% but could not be identified at 0.6% or 0.06% in either a 240 min analysis or a 30 h analysis with the small database. The effects of the large database were severe on the target species where detection above the background at any concentration used in this study was impossible, though the three other microbes used in this study were clearly identified above the background when analyzed with the large database. This study points to the potential application of this technology for biological threat agent detection but highlights many areas of needed research before the technology will be useful in real world samples.
Van Le, Hoa; Beach, Kathleen J; Powell, Gregory; Pattishall, Ed; Ryan, Patrick; Mera, Robertino M
2013-02-01
Different structures and coding schemes may limit rapid evaluation of a large pool of potential drug safety signals using multiple longitudinal healthcare databases. To overcome this restriction, a semi-automated approach utilising common data model (CDM) and robust pharmacoepidemiologic methods was developed; however, its performance needed to be evaluated. Twenty-three established drug-safety associations from publications were reproduced in a healthcare claims database and four of these were also repeated in electronic health records. Concordance and discrepancy of pairwise estimates were assessed between the results derived from the publication and results from this approach. For all 27 pairs, an observed agreement between the published results and the results from the semi-automated approach was greater than 85% and Kappa coefficient was 0.61, 95% CI: 0.19-1.00. Ln(IRR) differed by less than 50% for 13/27 pairs, and the IRR varied less than 2-fold for 19/27 pairs. Reproducibility based on the intra-class correlation coefficient was 0.54. Most covariates (>90%) in the publications were available for inclusion in the models. Once the study populations and inclusion/exclusion criteria were obtained from the literature, the analysis was able to be completed in 2-8 h. The semi-automated methodology using a CDM produced consistent risk estimates compared to the published findings for most selected drug-outcome associations, regardless of original study designs, databases, medications and outcomes. Further assessment of this approach is useful to understand its roles, strengths and limitations in rapidly evaluating safety signals.
Shaw, Souradet Y; Blanchard, James F; Bernstein, Charles N
2015-04-01
Early childhood vaccinations have been hypothesized to contribute to the emergence of paediatric inflammatory bowel disease [IBD] in developed countries. Using linked population-based administrative databases, we aimed to explore the association between vaccination with measles-containing vaccines and the risk for IBD. This was a case-control study using the University of Manitoba IBD Epidemiology Database [UMIBDED]. The UMIBDED was linked to the Manitoba Immunization Monitoring System [MIMS], a population-based database of immunizations administered in Manitoba. All paediatric IBD cases in Manitoba, born after 1989 and diagnosed before March 31, 2008, were included. Controls were matched to cases on the basis of age, sex, and region of residence at time of diagnosis. Measles-containing vaccinations received in the first 2 years of life were documented, with vaccinations categorized as 'None' or 'Complete', with completeness defined according to Manitoba's vaccination schedule. Conditional logistic regression models were fitted to the data, with models adjusted for physician visits in the first 2 years of life and area-level socioeconomic status at case date. A total of 951 individuals [117 cases and 834 controls] met eligibility criteria, with average age of diagnosis among cases at 11 years. The proportion of IBD cases with completed vaccinations was 97%, compared with 94% of controls. In models adjusted for physician visits and area-level socioeconomic status, no statistically significant association was detected between completed measles vaccinations and the risk of IBD (adjusted odds ratio [AOR]: 1.5; 95% confidence interval [CI]: 0.5-4.4; p = 0.419]. No significant association between completed measles-containing vaccination in the first 2 years of life and paediatric IBD could be demonstrated in this population-based study. Copyright © 2015 European Crohn’s and Colitis Organisation (ECCO). Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Development of an online database of typical food portion sizes in Irish population groups.
Lyons, Jacqueline; Walton, Janette; Flynn, Albert
2013-01-01
The Irish Food Portion Sizes Database (available at www.iuna.net) describes typical portion weights for an extensive range of foods and beverages for Irish children, adolescents and adults. The present paper describes the methodologies used to develop the database and some key characteristics of the portion weight data contained therein. The data are derived from three large, cross-sectional food consumption surveys carried out in Ireland over the last decade: the National Children's Food Survey (2003-2004), National Teens' Food Survey (2005-2006) and National Adult Nutrition Survey (2008-2010). Median, 25th and 75th percentile portion weights are described for a total of 545 items across the three survey groups, split by age group or sex as appropriate. The typical (median) portion weights reported for adolescents and adults are similar for many foods, while those reported for children are notably smaller. Adolescent and adult males generally consume larger portions than their female counterparts, though similar portion weights may be consumed where foods are packaged in unit amounts (for example, pots of yoghurt). The inclusion of energy under-reporters makes little difference to the estimation of typical portion weights in adults. The data have wide-ranging applications in dietary assessment and food labelling, and will serve as a useful reference against which to compare future portion size data from the Irish population. The present paper provides a useful context for researchers and others wishing to use the Irish Food Portion Sizes Database, and may guide researchers in other countries in establishing similar databases of their own.
Alving, Berit Elisabeth; Christensen, Janne Buck; Thrysøe, Lars
2018-03-01
The purpose of this literature review is to provide an overview of the information retrieval behaviour of clinical nurses, in terms of the use of databases and other information resources and their frequency of use. Systematic searches carried out in five databases and handsearching were used to identify the studies from 2010 to 2016, with a populations, exposures and outcomes (PEO) search strategy, focusing on the question: In which databases or other information resources do hospital nurses search for evidence based information, and how often? Of 5272 titles retrieved based on the search strategy, only nine studies fulfilled the criteria for inclusion. The studies are from the United States, Canada, Taiwan and Nigeria. The results show that hospital nurses' primary choice of source for evidence based information is Google and peers, while bibliographic databases such as PubMed are secondary choices. Data on frequency are only included in four of the studies, and data are heterogenous. The reasons for choosing Google and peers are primarily lack of time; lack of information; lack of retrieval skills; or lack of training in database searching. Only a few studies are published on clinical nurses' retrieval behaviours, and more studies are needed from Europe and Australia. © 2018 Health Libraries Group.
Analysis of Outcomes After TKA: Do All Databases Produce Similar Findings?
Bedard, Nicholas A; Pugely, Andrew J; McHugh, Michael; Lux, Nathan; Otero, Jesse E; Bozic, Kevin J; Gao, Yubo; Callaghan, John J
2018-01-01
Use of large clinical and administrative databases for orthopaedic research has increased exponentially. Each database represents unique patient populations and varies in their methodology of data acquisition, which makes it possible that similar research questions posed to different databases might result in answers that differ in important ways. (1) What are the differences in reported demographics, comorbidities, and complications for patients undergoing primary TKA among four databases commonly used in orthopaedic research? (2) How does the difference in reported complication rates vary depending on whether only inpatient data or 30-day postoperative data are analyzed? Patients who underwent primary TKA during 2010 to 2012 were identified within the National Surgical Quality Improvement Programs (NSQIP), the Nationwide Inpatient Sample (NIS), the Medicare Standard Analytic Files (MED), and the Humana Administrative Claims database (HAC). NSQIP is a clinical registry that captures both inpatient and outpatient events up to 30 days after surgery using clinical reviewers and strict definitions for each variable. The other databases are administrative claims databases with their comorbidity and adverse event data defined by diagnosis and procedure codes used for reimbursement. NIS is limited to inpatient data only, whereas HAC and MED also have outpatient data. The number of patients undergoing primary TKA from each database was 48,248 in HAC, 783,546 in MED, 393,050 in NIS, and 43,220 in NSQIP. NSQIP definitions for comorbidities and surgical complications were matched to corresponding International Classification of Diseases, 9 Revision/Current Procedural Terminology codes and these coding algorithms were used to query NIS, MED, and HAC. Age, sex, comorbidities, and inpatient versus 30-day postoperative complications were compared across the four databases. Given the large sample sizes, statistical significance was often detected for small, clinically unimportant differences; thus, the focus of comparisons was whether the difference reached an absolute difference of twofold to signify an important clinical difference. Although there was a higher proportion of males in NIS and NSQIP and patients in NIS were younger, the difference was slight and well below our predefined threshold for a clinically important difference. There was variation in the prevalence of comorbidities and rates of postoperative complications among databases. The prevalence of chronic obstructive pulmonary disease (COPD) and coagulopathy in HAC and MED was more than twice that in NIS and NSQIP (relative risk [RR] for COPD: MED versus NIS 3.1, MED versus NSQIP 4.5, HAC versus NIS 3.6, HAC versus NSQIP 5.3; RR for coagulopathy: MED versus NIS 3.9, MED versus NSQIP 3.1, HAC versus NIS 3.3, HAC versus NSQIP 2.7; p < 0.001 for all comparisons). NSQIP had more than twice the obesity as NIS (RR 0.35). Rates of stroke within 30 days of TKA had more than a twofold difference among all databases (p < 0.001). HAC had more than twice the rates of 30-day complications at all endpoints compared with NSQIP and more than twice the 30-day infections as MED. A comparison of inpatient and 30-day complications rates demonstrated more than twice the amount of wound infections and deep vein thromboses is captured when data are analyzed out to 30 days after TKA (p < 0.001 for all comparisons). When evaluating research utilizing large databases, one must pay particular attention to the type of database used (administrative claims, clinical registry, or other kinds of databases), time period included, definitions utilized for specific variables, and the population captured to ensure it is best suited for the specific research question. Furthermore, with the advent of bundled payments, policymakers must meticulously consider the data sources used to ensure the data analytics match historical sources. Level III, therapeutic study.
NASA Astrophysics Data System (ADS)
Nakagawa, Y.; Kawahara, S.; Araki, F.; Matsuoka, D.; Ishikawa, Y.; Fujita, M.; Sugimoto, S.; Okada, Y.; Kawazoe, S.; Watanabe, S.; Ishii, M.; Mizuta, R.; Murata, A.; Kawase, H.
2017-12-01
Analyses of large ensemble data are quite useful in order to produce probabilistic effect projection of climate change. Ensemble data of "+2K future climate simulations" are currently produced by Japanese national project "Social Implementation Program on Climate Change Adaptation Technology (SI-CAT)" as a part of a database for Policy Decision making for Future climate change (d4PDF; Mizuta et al. 2016) produced by Program for Risk Information on Climate Change. Those data consist of global warming simulations and regional downscaling simulations. Considering that those data volumes are too large (a few petabyte) to download to a local computer of users, a user-friendly system is required to search and download data which satisfy requests of the users. We develop "a database system for near-future climate change projections" for providing functions to find necessary data for the users under SI-CAT. The database system for near-future climate change projections mainly consists of a relational database, a data download function and user interface. The relational database using PostgreSQL is a key function among them. Temporally and spatially compressed data are registered on the relational database. As a first step, we develop the relational database for precipitation, temperature and track data of typhoon according to requests by SI-CAT members. The data download function using Open-source Project for a Network Data Access Protocol (OPeNDAP) provides a function to download temporally and spatially extracted data based on search results obtained by the relational database. We also develop the web-based user interface for using the relational database and the data download function. A prototype of the database system for near-future climate change projections are currently in operational test on our local server. The database system for near-future climate change projections will be released on Data Integration and Analysis System Program (DIAS) in fiscal year 2017. Techniques of the database system for near-future climate change projections might be quite useful for simulation and observational data in other research fields. We report current status of development and some case studies of the database system for near-future climate change projections.
Cloud-based interactive analytics for terabytes of genomic variants data.
Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S
2017-12-01
Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and are in the public domain in the US.
Cloud-based interactive analytics for terabytes of genomic variants data
Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S
2017-01-01
Abstract Motivation Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. Results We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Availability and implementation Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Contact cuiping@stanford.edu or ptsao@stanford.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28961771
Development and validation of an administrative case definition for inflammatory bowel diseases
Rezaie, Ali; Quan, Hude; Fedorak, Richard N; Panaccione, Remo; Hilsden, Robert J
2012-01-01
BACKGROUND: A population-based database of inflammatory bowel disease (IBD) patients is invaluable to explore and monitor the epidemiology and outcome of the disease. In this context, an accurate and validated population-based case definition for IBD becomes critical for researchers and health care providers. METHODS: IBD and non-IBD individuals were identified through an endoscopy database in a western Canadian health region (Calgary Health Region, Calgary, Alberta). Subsequently, using a novel algorithm, a series of case definitions were developed to capture IBD cases in the administrative databases. In the second stage of the study, the criteria were validated in the Capital Health Region (Edmonton, Alberta). RESULTS: A total of 150 IBD case definitions were developed using 1399 IBD patients and 15,439 controls in the development phase. In the validation phase, 318,382 endoscopic procedures were searched and 5201 IBD patients were identified. After consideration of sensitivity, specificity and temporal stability of each validated case definition, a diagnosis of IBD was assigned to individuals who experienced at least two hospitalizations or had four physician claims, or two medical contacts in the Ambulatory Care Classification System database with an IBD diagnostic code within a two-year period (specificity 99.8%; sensitivity 83.4%; positive predictive value 97.4%; negative predictive value 98.5%). An alternative case definition was developed for regions without access to the Ambulatory Care Classification System database. A novel scoring system was developed that detected Crohn disease and ulcerative colitis patients with a specificity of >99% and a sensitivity of 99.1% and 86.3%, respectively. CONCLUSION: Through a robust methodology, a reproducible set of criteria to capture IBD patients through administrative databases was developed. The methodology may be used to develop similar administrative definitions for chronic diseases. PMID:23061064
Stallings, Christopher D
2009-01-01
Understanding the current status of predatory fish communities, and the effects fishing has on them, is vitally important information for management. However, data are often insufficient at region-wide scales to assess the effects of extraction in coral reef ecosystems of developing nations. Here, I overcome this difficulty by using a publicly accessible, fisheries-independent database to provide a broad scale, comprehensive analysis of human impacts on predatory reef fish communities across the greater Caribbean region. Specifically, this study analyzed presence and diversity of predatory reef fishes over a gradient of human population density. Across the region, as human population density increases, presence of large-bodied fishes declines, and fish communities become dominated by a few smaller-bodied species. Complete disappearance of several large-bodied fishes indicates ecological and local extinctions have occurred in some densely populated areas. These findings fill a fundamentally important gap in our knowledge of the ecosystem effects of artisanal fisheries in developing nations, and provide support for multiple approaches to data collection where they are commonly unavailable.
Stallings, Christopher D.
2009-01-01
Background Understanding the current status of predatory fish communities, and the effects fishing has on them, is vitally important information for management. However, data are often insufficient at region-wide scales to assess the effects of extraction in coral reef ecosystems of developing nations. Methodology/Principal Findings Here, I overcome this difficulty by using a publicly accessible, fisheries-independent database to provide a broad scale, comprehensive analysis of human impacts on predatory reef fish communities across the greater Caribbean region. Specifically, this study analyzed presence and diversity of predatory reef fishes over a gradient of human population density. Across the region, as human population density increases, presence of large-bodied fishes declines, and fish communities become dominated by a few smaller-bodied species. Conclusions/Significance Complete disappearance of several large-bodied fishes indicates ecological and local extinctions have occurred in some densely populated areas. These findings fill a fundamentally important gap in our knowledge of the ecosystem effects of artisanal fisheries in developing nations, and provide support for multiple approaches to data collection where they are commonly unavailable. PMID:19421312
Spatial trends in leaf size of Amazonian rainforest trees
NASA Astrophysics Data System (ADS)
Malhado, A. C. M.; Malhi, Y.; Whittaker, R. J.; Ladle, R. J.; Ter Steege, H.; Aragão, L. E. O. C.; Quesada, C. A.; Araujo-Murakami, A.; Phillips, O. L.; Peacock, J.; Lopez-Gonzalez, G.; Baker, T. R.; Butt, N.; Anderson, L. O.; Arroyo, L.; Almeida, S.; Higuchi, N.; Killeen, T. J.; Monteagudo, A.; Neill, D.; Pitman, N.; Prieto, A.; Salomão, R. P.; Silva, N.; Vásquez-Martínez, R.; Laurance, W. F.
2009-02-01
Leaf size influences many aspects of tree function such as rates of transpiration and photosynthesis and, consequently, often varies in a predictable way in response to environmental gradients. The recent development of pan-Amazonian databases based on permanent botanical plots (e.g. RAINFOR, ATDN) has now made it possible to assess trends in leaf size across environmental gradients in Amazonia. Previous plot-based studies have shown that the community structure of Amazonian trees breaks down into at least two major ecological gradients corresponding with variations in soil fertility (decreasing south to northeast) and length of the dry season (increasing from northwest to south and east). Here we describe the results of the geographic distribution of leaf size categories based on 121 plots distributed across eight South American countries. We find that, as predicted, the Amazon forest is predominantly populated by tree species and individuals in the mesophyll size class (20.25-182.25 cm2). The geographic distribution of species and individuals with large leaves (>20.25 cm2) is complex but is generally characterized by a higher proportion of such trees in the north-west of the region. Spatially corrected regressions reveal weak correlations between the proportion of large-leaved species and metrics of water availability. We also find a significant negative relationship between leaf size and wood density.
Spatial trends in leaf size of Amazonian rainforest trees
NASA Astrophysics Data System (ADS)
Malhado, A. C. M.; Malhi, Y.; Whittaker, R. J.; Ladle, R. J.; Ter Steege, H.; Phillips, O. L.; Butt, N.; Aragão, L. E. O. C.; Quesada, C. A.; Araujo-Murakami, A.; Arroyo, L.; Peacock, J.; Lopez-Gonzalez, G.; Baker, T. R.; Anderson, L. O.; Almeida, S.; Higuchi, N.; Killeen, T. J.; Monteagudo, A.; Neill, D.; Pitman, N.; Prieto, A.; Salomão, R. P.; Vásquez-Martínez, R.; Laurance, W. F.
2009-08-01
Leaf size influences many aspects of tree function such as rates of transpiration and photosynthesis and, consequently, often varies in a predictable way in response to environmental gradients. The recent development of pan-Amazonian databases based on permanent botanical plots has now made it possible to assess trends in leaf size across environmental gradients in Amazonia. Previous plot-based studies have shown that the community structure of Amazonian trees breaks down into at least two major ecological gradients corresponding with variations in soil fertility (decreasing from southwest to northeast) and length of the dry season (increasing from northwest to south and east). Here we describe the geographic distribution of leaf size categories based on 121 plots distributed across eight South American countries. We find that the Amazon forest is predominantly populated by tree species and individuals in the mesophyll size class (20.25-182.25 cm2). The geographic distribution of species and individuals with large leaves (>20.25 cm2) is complex but is generally characterized by a higher proportion of such trees in the northwest of the region. Spatially corrected regressions reveal weak correlations between the proportion of large-leaved species and metrics of water availability. We also find a significant negative relationship between leaf size and wood density.
Compression technique for large statistical data bases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eggers, S.J.; Olken, F.; Shoshani, A.
1981-03-01
The compression of large statistical databases is explored and are proposed for organizing the compressed data, such that the time required to access the data is logarithmic. The techniques exploit special characteristics of statistical databases, namely, variation in the space required for the natural encoding of integer attributes, a prevalence of a few repeating values or constants, and the clustering of both data of the same length and constants in long, separate series. The techniques are variations of run-length encoding, in which modified run-lengths for the series are extracted from the data stream and stored in a header, which ismore » used to form the base level of a B-tree index into the database. The run-lengths are cumulative, and therefore the access time of the data is logarithmic in the size of the header. The details of the compression scheme and its implementation are discussed, several special cases are presented, and an analysis is given of the relative performance of the various versions.« less
NASA Astrophysics Data System (ADS)
Jung, Chinte; Sun, Chih-Hong
2006-10-01
Motivated by the increasing accessibility of technology, more and more spatial data are being made digitally available. How to extract the valuable knowledge from these large (spatial) databases is becoming increasingly important to businesses, as well. It is essential to be able to analyze and utilize these large datasets, convert them into useful knowledge, and transmit them through GIS-enabled instruments and the Internet, conveying the key information to business decision-makers effectively and benefiting business entities. In this research, we combine the techniques of GIS, spatial decision support system (SDSS), spatial data mining (SDM), and ArcGIS Server to achieve the following goals: (1) integrate databases from spatial and non-spatial datasets about the locations of businesses in Taipei, Taiwan; (2) use the association rules, one of the SDM methods, to extract the knowledge from the integrated databases; and (3) develop a Web-based SDSS GIService as a location-selection tool for business by the product of ArcGIS Server.
NASA Technical Reports Server (NTRS)
Charles, John B.; Richard, Elizabeth E.
2010-01-01
There is currently too little reproducible data for a scientifically valid understanding of the initial responses of a diverse human population to weightlessness and other space flight factors. Astronauts on orbital space flights to date have been extremely healthy and fit, unlike the general human population. Data collection opportunities during the earliest phases of space flights to date, when the most dynamic responses may occur in response to abrupt transitions in acceleration loads, have been limited by operational restrictions on our ability to encumber the astronauts with even minimal monitoring instrumentation. The era of commercial personal suborbital space flights promises the availability of a large (perhaps hundreds per year), diverse population of potential participants with a vested interest in their own responses to space flight factors, and a number of flight providers interested in documenting and demonstrating the attractiveness and safety of the experience they are offering. Voluntary participation by even a fraction of the flying population in a uniform set of unobtrusive biomedical data collections would provide a database enabling statistical analyses of a variety of acute responses to a standardized space flight environment. This will benefit both the space life sciences discipline and the general state of human knowledge.
Computations on Wings With Full-Span Oscillating Control Surfaces Using Navier-Stokes Equations
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2013-01-01
A dual-level parallel procedure is presented for computing large databases to support aerospace vehicle design. This procedure has been developed as a single Unix script within the Parallel Batch Submission environment utilizing MPIexec and runs MPI based analysis software. It has been developed to provide a process for aerospace designers to generate data for large numbers of cases with the highest possible fidelity and reasonable wall clock time. A single job submission environment has been created to avoid keeping track of multiple jobs and the associated system administration overhead. The process has been demonstrated for computing large databases for the design of typical aerospace configurations, a launch vehicle and a rotorcraft.
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context
Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi
2007-01-01
Background Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aide to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. Results lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. Conclusion lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired. PMID:17877794
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.
Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi
2007-09-18
Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aide to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.
ERIC Educational Resources Information Center
Blau, Ina; Hameiri, Mira
2017-01-01
Digital educational data management has become an integral part of school practices. Accessing school database by teachers, students, and parents from mobile devices promotes data-driven educational interactions based on real-time information. This paper analyses mobile access of educational database in a large sample of 429 schools during an…
JEnsembl: a version-aware Java API to Ensembl data systems.
Paterson, Trevor; Law, Andy
2012-11-01
The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).
Open source database of images DEIMOS: extension for large-scale subjective image quality assessment
NASA Astrophysics Data System (ADS)
Vítek, Stanislav
2014-09-01
DEIMOS (Database of Images: Open Source) is an open-source database of images and video sequences for testing, verification and comparison of various image and/or video processing techniques such as compression, reconstruction and enhancement. This paper deals with extension of the database allowing performing large-scale web-based subjective image quality assessment. Extension implements both administrative and client interface. The proposed system is aimed mainly at mobile communication devices, taking into account advantages of HTML5 technology; it means that participants don't need to install any application and assessment could be performed using web browser. The assessment campaign administrator can select images from the large database and then apply rules defined by various test procedure recommendations. The standard test procedures may be fully customized and saved as a template. Alternatively the administrator can define a custom test, using images from the pool and other components, such as evaluating forms and ongoing questionnaires. Image sequence is delivered to the online client, e.g. smartphone or tablet, as a fully automated assessment sequence or viewer can decide on timing of the assessment if required. Environmental data and viewing conditions (e.g. illumination, vibrations, GPS coordinates, etc.), may be collected and subsequently analyzed.
Toward Phase IV, Populating the WOVOdat Database
NASA Astrophysics Data System (ADS)
Ratdomopurbo, A.; Newhall, C. G.; Schwandner, F. M.; Selva, J.; Ueda, H.
2009-12-01
One of challenges for volcanologists is the fact that more and more people are likely to live on volcanic slopes. Information about volcanic activity during unrest should be accurate and rapidly distributed. As unrest may lead to eruption, evacuation may be necessary to minimize damage and casualties. The decision to evacuate people is usually based on the interpretation of monitoring data. Over the past several decades, monitoring volcanoes has used more and more sophisticated instruments. A huge volume of data is collected in order to understand the state of activity and behaviour of a volcano. WOVOdat, The World Organization of Volcano Observatories (WOVO) Database of Volcanic Unrest, will provide context within which scientists can interpret the state of their own volcano, during and between crises. After a decision during the 2000 IAVCEI General Assembly to create WOVOdat, development has passed through several phases, from Concept Development (Phase-I in 2000-2002), Database Design (Phase-II, 2003-2006) and Pilot Testing (Phase-III in 2007-2008). For WOVOdat to be operational, there are still two (2) steps to complete, which are: Database Population (Phase-IV) and Enhancement and Maintenance (Phase-V). Since January 2009, the WOVOdat project is hosted by Earth Observatory of Singapore for at least a 5-year period. According to the original planning in 2002, this 5-year period will be used for completing the Phase-IV. As the WOVOdat design is not yet tested for all types of data, 2009 is still reserved for building the back-end relational database management system (RDBMS) of WOVOdat and testing it with more complex data. Fine-tuning of the WOVOdat’s RDBMS design is being done with each new upload of observatory data. The next and main phase of WOVOdat development will be data population, managing data transfer from multiple observatory formats to WOVOdat format. Data population will depend on two important things, the availability of SQL database in volcano observatories and their data sharing policy. Hence, a strong collaboration with every WOVO observatory is important. For some volcanoes where the data are not in an SQL system, the WOVOdat project will help scientists working on the volcano to start building an SQL database.
Use of the Physiotherapy Evidence Database (PEDro) in Japan
TAKASAKI, Hiroshi; ELKINS, Mark R.; MOSELEY, Anne M.
2016-01-01
Background: The Physiotherapy Evidence Database (PEDro) may help users to overcome some obstacles to evidence-based physiotherapy. Understanding the extent to which Japanese physiotherapists access research evidence via the PEDro website may suggest strategies to enhance evidence-based physiotherapy in Japan. Objectives: To quantify usage of PEDro in Japan, to compare this to usage in other countries, and to examine variations in PEDro usage within Japan. Design: An observational study of PEDro usage with geographic analysis. Methods: Data about visits to the home-page and searches of the database were recorded for 4 years. These data were analysed by each region of the World Confederation for Physical Therapy, each country in the Asia Western Pacific region, and each prefecture in Japan. Results: From 2010 to 2013, users of PEDro made 2.27 million visits to the home-page and ran 6.28 million searches. Usage (ie, number of searches normalised by population) was highest in Europe, followed by North America Carribean, South America, Asia Western Pacific, and Africa. Within the Asia Western Pacific region, population-normalised usage was highest in Australia, then New Zealand and Singapore. Japan ranked 10 among the 26 countries in the region. Within Japan, the highest population-normalised usage was in the Nagano, Kumamoto and Aomori prefectures, which was ten-fold higher usage than in some other prefectures. Conclusions: Although Japan has higher PEDro usage than many other countries in the Asia Western Pacific region, some prefectures had very low usage, suggesting that evidence-based practice may not be being adopted uniformly across Japan. PMID:28289582
Sharp, Sandra M; Bevan, Gwyn; Skinner, Jonathan S; Gottlieb, Daniel J
2014-01-01
Objective To compare the performance of two new approaches to risk adjustment that are free of the influence of observational intensity with methods that depend on diagnoses listed in administrative databases. Setting Administrative data from the US Medicare program for services provided in 2007 among 306 US hospital referral regions. Design Cross sectional analysis. Participants 20% sample of fee for service Medicare beneficiaries residing in one of 306 hospital referral regions in the United States in 2007 (n=5 153 877). Main outcome measures The effect of health risk adjustment on age, sex, and race adjusted mortality and spending rates among hospital referral regions using four indices: the standard Centers for Medicare and Medicaid Services—Hierarchical Condition Categories (HCC) index used by the US Medicare program (calculated from diagnoses listed in Medicare’s administrative database); a visit corrected HCC index (to reduce the effects of observational intensity on frequency of diagnoses); a poverty index (based on US census); and a population health index (calculated using data on incidence of hip fractures and strokes, and responses from a population based annual survey of health from the Centers for Disease Control and Prevention). Results Estimated variation in age, sex, and race adjusted mortality rates across hospital referral regions was reduced using the indices based on population health, poverty, and visit corrected HCC, but increased using the standard HCC index. Most of the residual variation in age, sex, and race adjusted mortality was explained (in terms of weighted R2) by the population health index: R2=0.65. The other indices explained less: R2=0.20 for the visit corrected HCC index; 0.19 for the poverty index, and 0.02 for the standard HCC index. The residual variation in age, sex, race, and price adjusted spending per capita across the 306 hospital referral regions explained by the indices (in terms of weighted R2) were 0.50 for the standard HCC index, 0.21 for the population health index, 0.12 for the poverty index, and 0.07 for the visit corrected HCC index, implying that only a modest amount of the variation in spending can be explained by factors most closely related to mortality. Further, once the HCC index is visit corrected it accounts for almost none of the residual variation in age, sex, and race adjusted spending. Conclusion Health risk adjustment using either the poverty index or the population health index performed substantially better in terms of explaining actual mortality than the indices that relied on diagnoses from administrative databases; the population health index explained the majority of residual variation in age, sex, and race adjusted mortality. Owing to the influence of observational intensity on diagnoses from administrative databases, the standard HCC index over-adjusts for regional differences in spending. Research to improve health risk adjustment methods should focus on developing measures of risk that do not depend on observation influenced diagnoses recorded in administrative databases. PMID:24721838
Interactive Exploration for Continuously Expanding Neuron Databases.
Li, Zhongyu; Metaxas, Dimitris N; Lu, Aidong; Zhang, Shaoting
2017-02-15
This paper proposes a novel framework to help biologists explore and analyze neurons based on retrieval of data from neuron morphological databases. In recent years, the continuously expanding neuron databases provide a rich source of information to associate neuronal morphologies with their functional properties. We design a coarse-to-fine framework for efficient and effective data retrieval from large-scale neuron databases. In the coarse-level, for efficiency in large-scale, we employ a binary coding method to compress morphological features into binary codes of tens of bits. Short binary codes allow for real-time similarity searching in Hamming space. Because the neuron databases are continuously expanding, it is inefficient to re-train the binary coding model from scratch when adding new neurons. To solve this problem, we extend binary coding with online updating schemes, which only considers the newly added neurons and update the model on-the-fly, without accessing the whole neuron databases. In the fine-grained level, we introduce domain experts/users in the framework, which can give relevance feedback for the binary coding based retrieval results. This interactive strategy can improve the retrieval performance through re-ranking the above coarse results, where we design a new similarity measure and take the feedback into account. Our framework is validated on more than 17,000 neuron cells, showing promising retrieval accuracy and efficiency. Moreover, we demonstrate its use case in assisting biologists to identify and explore unknown neurons. Copyright © 2017 Elsevier Inc. All rights reserved.
Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data.
Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard
2014-04-01
Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypothesis was for all traits more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data, without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypothesis and adapted to the distribution of the trait values for the functional identity and range of the communities. Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most out of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypothesis and concludes that they could be useful when studying the ranking of communities for their functional diversity indices.
Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data
Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard
2014-01-01
Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypothesis was for all traits more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data, without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypothesis and adapted to the distribution of the trait values for the functional identity and range of the communities. Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most out of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypothesis and concludes that they could be useful when studying the ranking of communities for their functional diversity indices. PMID:24772273
Mobile object retrieval in server-based image databases
NASA Astrophysics Data System (ADS)
Manger, D.; Pagel, F.; Widak, H.
2013-05-01
The increasing number of mobile phones equipped with powerful cameras leads to huge collections of user-generated images. To utilize the information of the images on site, image retrieval systems are becoming more and more popular to search for similar objects in an own image database. As the computational performance and the memory capacity of mobile devices are constantly increasing, this search can often be performed on the device itself. This is feasible, for example, if the images are represented with global image features or if the search is done using EXIF or textual metadata. However, for larger image databases, if multiple users are meant to contribute to a growing image database or if powerful content-based image retrieval methods with local features are required, a server-based image retrieval backend is needed. In this work, we present a content-based image retrieval system with a client server architecture working with local features. On the server side, the scalability to large image databases is addressed with the popular bag-of-word model with state-of-the-art extensions. The client end of the system focuses on a lightweight user interface presenting the most similar images of the database highlighting the visual information which is common with the query image. Additionally, new images can be added to the database making it a powerful and interactive tool for mobile contentbased image retrieval.
Draper, John; Enot, David P; Parker, David; Beckmann, Manfred; Snowdon, Stuart; Lin, Wanchang; Zubair, Hassan
2009-01-01
Background Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments perform routinely with a mass accuracy of < 5 ppm (parts per million) thus providing potentially a direct method for signal putative annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that will allow searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI). Results Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data. Conclusion We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data. PMID:19622150
Evaluation of Tsunami Run-Up on Coastal Areas at Regional Scale
NASA Astrophysics Data System (ADS)
González, M.; Aniel-Quiroga, Í.; Gutiérrez, O.
2017-12-01
Tsunami hazard assessment is tackled by means of numerical simulations, giving as a result, the areas flooded by tsunami wave inland. To get this, some input data is required, i.e., the high resolution topobathymetry of the study area, the earthquake focal mechanism parameters, etc. The computational cost of these kinds of simulations are still excessive. An important restriction for the elaboration of large scale maps at National or regional scale is the reconstruction of high resolution topobathymetry on the coastal zone. An alternative and traditional method consists of the application of empirical-analytical formulations to calculate run-up at several coastal profiles (i.e. Synolakis, 1987), combined with numerical simulations offshore without including coastal inundation. In this case, the numerical simulations are faster but some limitations are added as the coastal bathymetric profiles are very simply idealized. In this work, we present a complementary methodology based on a hybrid numerical model, formed by 2 models that were coupled ad hoc for this work: a non-linear shallow water equations model (NLSWE) for the offshore part of the propagation and a Volume of Fluid model (VOF) for the areas near the coast and inland, applying each numerical scheme where they better reproduce the tsunami wave. The run-up of a tsunami scenario is obtained by applying the coupled model to an ad-hoc numerical flume. To design this methodology, hundreds of worldwide topobathymetric profiles have been parameterized, using 5 parameters (2 depths and 3 slopes). In addition, tsunami waves have been also parameterized by their height and period. As an application of the numerical flume methodology, the coastal parameterized profiles and tsunami waves have been combined to build a populated database of run-up calculations. The combination was tackled by means of numerical simulations in the numerical flume The result is a tsunami run-up database that considers real profiles shape, realistic tsunami waves, and optimized numerical simulations. This database allows the calculation of the run-up of any new tsunami wave by interpolation on the database, in a short period of time, based on the tsunami wave characteristics provided as an output of the NLSWE model along the coast at a large scale domain (regional or National scale).
Epidemiological Characteristics of Male Sexual Assault in a Criminological Database
ERIC Educational Resources Information Center
Choudhary, Ekta; Gunzler, Douglas; Tu, Xin; Bossarte, Robert M.
2012-01-01
Sexual assault among males, compared with females, is understudied, and may also be significantly underreported. Past studies have relied primarily on population-based survey data to estimate the prevalence of sexual assault and associated health outcomes. However, survey-based studies rely primarily on self-reports of victimization and may not…
The utilization of neural nets in populating an object-oriented database
NASA Technical Reports Server (NTRS)
Campbell, William J.; Hill, Scott E.; Cromp, Robert F.
1989-01-01
Existing NASA supported scientific data bases are usually developed, managed and populated in a tedious, error prone and self-limiting way in terms of what can be described in a relational Data Base Management System (DBMS). The next generation Earth remote sensing platforms (i.e., Earth Observation System, (EOS), will be capable of generating data at a rate of over 300 Mbs per second from a suite of instruments designed for different applications. What is needed is an innovative approach that creates object-oriented databases that segment, characterize, catalog and are manageable in a domain-specific context and whose contents are available interactively and in near-real-time to the user community. Described here is work in progress that utilizes an artificial neural net approach to characterize satellite imagery of undefined objects into high-level data objects. The characterized data is then dynamically allocated to an object-oriented data base where it can be reviewed and assessed by a user. The definition, development, and evolution of the overall data system model are steps in the creation of an application-driven knowledge-based scientific information system.
Arntzen, Magnus Ø; Thiede, Bernd
2012-02-01
Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight in proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the reported changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no.
Arntzen, Magnus Ø.; Thiede, Bernd
2012-01-01
Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight in proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the reported changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no. PMID:22067098
Bourke, Jenny; Wong, Kingsley; Leonard, Helen
2018-01-23
To investigate how well intellectual disability (ID) can be ascertained using hospital morbidity data compared with a population-based data source. All children born in 1983-2010 with a hospital admission in the Western Australian Hospital Morbidity Data System (HMDS) were linked with the Western Australian Intellectual Disability Exploring Answers (IDEA) database. The International Classification of Diseases hospital codes consistent with ID were also identified. The characteristics of those children identified with ID through either or both sources were investigated. Of the 488 905 individuals in the study, 10 218 (2.1%) were identified with ID in either IDEA or HMDS with 1435 (14.0%) individuals identified in both databases, 8305 (81.3%) unique to the IDEA database and 478 (4.7%) unique to the HMDS dataset only. Of those unique to the HMDS dataset, about a quarter (n=124) had died before 1 year of age and most of these (75%) before 1 month. Children with ID who were also coded as such in the HMDS data were more likely to be aged under 1 year, female, non-Aboriginal and have a severe level of ID, compared with those not coded in the HMDS data. The sensitivity of using HMDS to identify ID was 14.7%, whereas the specificity was much higher at 99.9%. Hospital morbidity data are not a reliable source for identifying ID within a population, and epidemiological researchers need to take these findings into account in their study design. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Bourke, Jenny; Wong, Kingsley
2018-01-01
Objectives To investigate how well intellectual disability (ID) can be ascertained using hospital morbidity data compared with a population-based data source. Design, setting and participants All children born in 1983–2010 with a hospital admission in the Western Australian Hospital Morbidity Data System (HMDS) were linked with the Western Australian Intellectual Disability Exploring Answers (IDEA) database. The International Classification of Diseases hospital codes consistent with ID were also identified. Main outcome measures The characteristics of those children identified with ID through either or both sources were investigated. Results Of the 488 905 individuals in the study, 10 218 (2.1%) were identified with ID in either IDEA or HMDS with 1435 (14.0%) individuals identified in both databases, 8305 (81.3%) unique to the IDEA database and 478 (4.7%) unique to the HMDS dataset only. Of those unique to the HMDS dataset, about a quarter (n=124) had died before 1 year of age and most of these (75%) before 1 month. Children with ID who were also coded as such in the HMDS data were more likely to be aged under 1 year, female, non-Aboriginal and have a severe level of ID, compared with those not coded in the HMDS data. The sensitivity of using HMDS to identify ID was 14.7%, whereas the specificity was much higher at 99.9%. Conclusion Hospital morbidity data are not a reliable source for identifying ID within a population, and epidemiological researchers need to take these findings into account in their study design. PMID:29362262
Mining moving object trajectories in location-based services for spatio-temporal database update
NASA Astrophysics Data System (ADS)
Guo, Danhuai; Cui, Weihong
2008-10-01
Advances in wireless transmission and mobile technology applied to LBS (Location-based Services) flood us with amounts of moving objects data. Vast amounts of gathered data from position sensors of mobile phones, PDAs, or vehicles hide interesting and valuable knowledge and describe the behavior of moving objects. The correlation between temporal moving patterns of moving objects and geo-feature spatio-temporal attribute was ignored, and the value of spatio-temporal trajectory data was not fully exploited too. Urban expanding or frequent town plan change bring about a large amount of outdated or imprecise data in spatial database of LBS, and they cannot be updated timely and efficiently by manual processing. In this paper we introduce a data mining approach to movement pattern extraction of moving objects, build a model to describe the relationship between movement patterns of LBS mobile objects and their environment, and put up with a spatio-temporal database update strategy in LBS database based on trajectories spatiotemporal mining. Experimental evaluation reveals excellent performance of the proposed model and strategy. Our original contribution include formulation of model of interaction between trajectory and its environment, design of spatio-temporal database update strategy based on moving objects data mining, and the experimental application of spatio-temporal database update by mining moving objects trajectories.
Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes.
Hiscock, D; Upton, C
2000-05-01
The Viral Genome DataBase (VGDB) contains detailed information of the genes and predicted protein sequences from 15 completely sequenced genomes of large (&100 kb) viruses (2847 genes). The data that is stored includes DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a mySQL database with a user-friendly JAVA GUI. Results of queries can be easily sorted by any of the individual parameters. The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .
NASA Astrophysics Data System (ADS)
Huang, Pei; Wu, Sangyun; Feng, Aiping; Guo, Yacheng
2008-10-01
As littoral areas in possession of concentrated population, abundant resources, developed industry and active economy, the coastal areas are bound to become the forward positions and supported regions for marine exploitation. In the 21st century, the pressure that coastal zones are faced with is as follows: growth of population and urbanization, rise of sea level and coastal erosion, shortage of freshwater resource and deterioration of water resource, and degradation of fishery resource and so on. So the resources of coastal zones should be programmed and used reasonably for the sustainable development of economy and environment. This paper proposes a design research on the construction of coastal zone planning and management information system based on GIS and database technologies. According to this system, the planning results of coastal zones could be queried and displayed expediently through the system interface. It is concluded that the integrated application of GIS and database technologies provides a new modern method for the management of coastal zone resources, and makes it possible to ensure the rational development and utilization of the coastal zone resources, along with the sustainable development of economy and environment.
1992-12-01
Tutorial on Their Data Sharing," The International Journal on Very Large Data Bases (VLDB Journal ), Vol. 1, No. 1, July 1992. Hsiao, D. K., "Federated...Databases and Systems: A Tutorial on Their Resource Consolidation," The International Journal on Very Large Data Bases (VLDB Journal ), Vol. 1, No. 2...Game: Normal Approximation," accepted extensions of games and considers for publication by International possible applications. Journal of Game Theory
Interaction Domains and Suicide: A Population-Based Panel Study of Suicides in Stockholm, 1991-1999
ERIC Educational Resources Information Center
Hedstrom, Peter; Liu, Ka-Yuet; Nordvik, Monica K.
2008-01-01
This article examines how suicides influence suicide risks of others within two interaction domains: the family and the workplace. A distinction is made between dyad-based social-interaction effects and degree-based exposure effects. A unique database including all individuals who ever lived in Stockholm during the 1990s is analyzed. For about 5.6…
Lander, Rebecca L.; Hambidge, K. Michael; Krebs, Nancy F.; Westcott, Jamie E.; Garces, Ana; Figueroa, Lester; Tejeda, Gabriela; Lokangaka, Adrien; Diba, Tshilenge S.; Somannavar, Manjunath S.; Honnayya, Ranjitha; Ali, Sumera A.; Khan, Umber S.; McClure, Elizabeth M.; Thorsten, Vanessa R.; Stolka, Kristen B.
2017-01-01
ABSTRACT Background: Our aim was to utilize a feasible quantitative methodology to estimate the dietary adequacy of >900 first-trimester pregnant women in poor rural areas of the Democratic Republic of the Congo, Guatemala, India and Pakistan. This paper outlines the dietary methods used. Methods: Local nutritionists were trained at the sites by the lead study nutritionist and received ongoing mentoring throughout the study. Training topics focused on the standardized conduct of repeat multiple-pass 24-hr dietary recalls, including interview techniques, estimation of portion sizes, and construction of a unique site-specific food composition database (FCDB). Each FCDB was based on 13 food groups and included values for moisture, energy, 20 nutrients (i.e. macro- and micronutrients), and phytate (an anti-nutrient). Nutrient values for individual foods or beverages were taken from recently developed FAO-supported regional food composition tables or the USDA national nutrient database. Appropriate adjustments for differences in moisture and application of nutrient retention and yield factors after cooking were applied, as needed. Generic recipes for mixed dishes consumed by the study population were compiled at each site, followed by calculation of a median recipe per 100 g. Each recipe’s nutrient values were included in the FCDB. Final site FCDB checks were planned according to FAO/INFOODS guidelines. Discussion: This dietary strategy provides the opportunity to assess estimated mean group usual energy and nutrient intakes and estimated prevalence of the population ‘at risk’ of inadequate intakes in first-trimester pregnant women living in four low- and middle-income countries. While challenges and limitations exist, this methodology demonstrates the practical application of a quantitative dietary strategy for a large international multi-site nutrition trial, providing within- and between-site comparisons. Moreover, it provides an excellent opportunity for local capacity building and each site FCDB can be easily modified for additional research activities conducted in other populations living in the same area. PMID:28469549
NASA Astrophysics Data System (ADS)
Griffin, W. L.; Fisher, N. I.; Friedman, J. H.; O'Reilly, Suzanne Y.; Ryan, C. G.
2002-12-01
Three novel statistical approaches (Cluster Analysis by Regressive Partitioning [CARP], Patient Rule Induction Method [PRIM], and ModeMap) have been used to define compositional populations within a large database (n > 13,000) of Cr-pyrope garnets from the subcontinental lithospheric mantle (SCLM). The variables used are the major oxides and proton-microprobe data for Zn, Ga, Sr, Y, and Zr. Because the rules defining these populations (classes) are expressed in simple compositional variables, they are easily applied to new samples and other databases. The classes defined by the three methods show strong similarities and correlations, suggesting that they are statistically meaningful. The geological significance of the classes has been tested by classifying garnets from 184 mantle-derived peridotite xenoliths and from a smaller database (n > 5400) of garnets analyzed for >20 trace elements by laser ablation microprobe-inductively coupled plasma-mass spectrometry (LAM-ICPMS). The relative abundances of these classes in the lithospheric mantle vary widely across different tectonic settings, and some classes are absent or very rare in either Archean or Phanerozoic SCLM. Their distribution with depth also varies widely within individual lithospheric sections and between different sections of similar tectonothermal age. These garnet classes therefore are a useful tool for mapping the geology of the SCLM. Archean SCLM sections show high degrees of depletion and varying degrees of metasomatism, and they are commonly strongly layered. Several Proterozoic SCLM sections show a concentration of more depleted material near their base, grading upward into more fertile lherzolites. The distribution of garnet classes reflecting low-T phlogopite-related metasomatism and high-T melt-related metasomatism suggests that many of these Proterozoic SCLM sections consist of strongly metasomatized Archean SCLM. The garnet-facies SCLM beneath Phanerozoic terrains is only mildly depleted relative to Primitive Upper Mantle (PUM) compositions. These data emphasize the secular evolution of SCLM composition defined earlier [Griffin et al., 1998, 1999a] and suggest that at least part of this evolutionary trend reflects reworking and refertilization of SCLM formed in the Archean time.
Familial link of otitis media requiring tympanostomy tubes.
Padia, Reema; Alt, Jeremiah A; Curtin, Karen; Muntz, Harlan R; Orlandi, Richard R; Berger, Justin; Meier, Jeremy D
2017-04-01
Placement of tympanostomy tubes for recurrent or chronic otitis media is the most commonly performed ambulatory procedure in the United States. Etiologies have been speculated to be environmentally based, and studies have suggested a genetic component to the disease. However, no large-scale studies have attempted to define a familial component. The objective of this study was to determine the familial risk of otitis media requiring tympanostomy tubes (OMwTT) in a statewide population. Retrospective observational cohort study with population-based matched controls. Using an extensive genealogical database linked to medical records, the familial risk of OMwTT was calculated for relatives of probands (46,249 patients diagnosed with OMwTT from 1996-2013) compared to random population controls matched 5:1 on sex and birth year from logistic regression models. The median age at time of tympanostomy tube placement was 1 year (interquartile range, 0-2 years). First-degree relatives of patients with OMwTT, primarily siblings, had a 5-fold increased risk of OMwTT (P < 10 -16 ). Second-degree relatives were at a 1.5-fold increased risk (P < 10 -15 ). More extended relatives (third, fourth and fifth degree) showed a 1.4-fold increased risk (P < 10 -15 ). In the largest population-based study to date, a significant familial risk is confirmed in OMwTT, suggesting otitis media may have a significant genetic component given the increased risk found in close as well as distant relatives. This could be influenced by shared environments given a five-times risk observed in siblings. Further understanding the genetic basis of OMwTT and its interplay with environmental factors may clarify the etiology and lead to better detection of disease and treatments. 3b. Laryngoscope, 127:962-966, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
ALS and the Military: A Population-Based Study in the Danish Registries
Seals, Ryan M.; Kioumourtzoglou, Marianthi-Anna; Gredal, Ole; Hansen, Johnni; Weisskopf, Marc G.
2016-01-01
Background Prior studies have suggested that military service may be associated with the development of amyotrophic lateral sclerosis. We conducted a population-based case-control study in Denmark to assess whether occupation in the Danish military is associated with an increased risk of developing amyotrophic lateral sclerosis. Methods There were 3,650 incident cases of amyotrophic lateral sclerosis recorded in the Danish National Patient Registry between 1982 and 2009. Each case was matched to 100 age- and sex-matched population controls alive and free of amyotrophic lateral sclerosis on the date of the case diagnosis. Comprehensive occupational history was obtained from the Danish Pension Fund database, which began in 1964. Results 2.4% (n=8,922) of controls had a history of employment in the military prior to the index date. Military employees overall had an elevated rate of ALS (OR=1.3; 95% CI: 1.1-1.6). A ten-year increase in years employed by the military was associated with an odds ratio of 1.2 (95% CI: 1.0-1.4), and all quartiles of time employed were elevated. There was little suggestion of a pattern across calendar year of first employment, but there was some evidence that increasing age at first employment was associated with increased ALS rates. Rates were highest in the decade immediately following the end of employment (OR=1.6; 95% CI: 1.2-2.2). Conclusions In this large population-based case-control study, employment by the military is associated with increased rates of ALS. These findings are consistent with earlier findings that military service or employment may entail exposure to risk factors for ALS. PMID:26583610
Efthimiadis, E N; Afifi, M
1996-01-01
OBJECTIVES: This study examined methods of accessing (for indexing and retrieval purposes) medical research on population groups in the major abstracting and indexing services of the health sciences literature. DESIGN: The study of diseases in specific population groups is facilitated by the indexing of both diseases and populations in a database. The MEDLINE, PsycINFO, and Embase databases were selected for the study. The published thesauri for these databases were examined to establish the vocabulary in use. Indexing terms were identified and examined as to their representation in the current literature. Terms were clustered further into groups thought to reflect an end user's perspective and to facilitate subsequent analysis. The medical literature contained in the three online databases was searched with both controlled vocabulary and natural language terms. RESULTS: The three thesauri revealed shallow pre-coordinated hierarchical structures, rather difficult-to-use terms for post-coordination, and a blurring of cultural, genetic, and racial facets of populations. Post-coordination is difficult because of the system-oriented terminology, which is intended mostly for information professionals. The terminology unintentionally restricts access by the end users who lack the knowledge needed to use the thesauri effectively for information retrieval. CONCLUSIONS: Population groups are not represented adequately in the index languages of health sciences databases. Users of these databases need to be alerted to the difficulties that may be encountered in searching for information on population groups. Information and health professionals may not be able to access the literature if they are not familiar with the indexing policies on population groups. Consequently, the study points to a problem that needs to be addressed, through either the redesign of existing systems or the design of new ones to meet the goals of Healthy People 2000 and beyond. PMID:8883987
G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H
2009-01-01
Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others.Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty to be indexed in a graph database.Our objective is to bridge graph kernel function and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and for fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure is scalable to large database with smaller indexing size, faster indexing construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.