Options for accessing datasets for incidence, mortality, county populations, standard populations, expected survival, and SEER-linked and specialized data. Plus variable definitions, documentation for reporting and using datasets, statistical software (SEER*Stat), and observational research resources.
Standard Populations (Millions) for Age-Adjustment - SEER Population Datasets
Download files containing standard population data for use in statistical software. The files contain the same data distributed with SEER*Stat software. You can also view the standard populations, either 19 age groups or single ages.
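As a concrete illustration of how these weights are applied, the following minimal Python sketch performs direct age adjustment with standard-population weights. The age groups, case counts, person-years, and weights are invented placeholders; a real analysis would use all 19 standard age groups and the published standard million.

```python
# Minimal sketch of direct age adjustment using standard population weights.
# All numbers below are hypothetical, not real SEER data.

def age_adjusted_rate(cases, person_years, std_pop):
    """Directly age-adjusted rate per 100,000 person-years."""
    total_std = sum(std_pop)
    rate = 0.0
    for c, py, s in zip(cases, person_years, std_pop):
        # age-specific rate weighted by the standard population share
        rate += (c / py) * (s / total_std)
    return rate * 100_000

# Illustrative inputs for three age groups: <40, 40-64, 65+
cases        = [10, 120, 310]
person_years = [500_000, 400_000, 150_000]
std_pop      = [550_000, 300_000, 150_000]  # hypothetical standard-million shares

print(f"Age-adjusted rate: {age_adjusted_rate(cases, person_years, std_pop):.1f} per 100,000")
```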
Datasets for U.S. mortality, U.S. populations, standard populations, county attributes, and expected survival. Plus SEER-linked databases (SEER-Medicare, SEER-Medicare Health Outcomes Survey [SEER-MHOS], SEER-Consumer Assessment of Healthcare Providers and Systems [SEER-CAHPS]).
If you have access to SEER Research Data, use SEER*Stat to analyze SEER and other cancer-related databases. View individual records and produce statistics including incidence, mortality, survival, prevalence, and multiple primary. Tutorials and related analytic software tools are available.
SEER Linked Databases - SEER Datasets
The SEER-Medicare database of elderly persons with cancer is useful for epidemiologic and health services research. SEER-MHOS has health-related quality of life information about elderly persons with cancer. The SEER-CAHPS database has clinical, survey, and health services information on people with cancer.
Variable & Recode Definitions - SEER Documentation
Resources that define variables and provide documentation for reporting and using SEER and related datasets. Choose from SEER coding and staging manuals plus instructions for recoding behavior, site, stage, cause of death, insurance, and several additional topics. Also guidance on months survived, calculating Hispanic mortality, and site-specific surgery.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Health Disparities Calculator (HD*Calc) - SEER Software
Statistical software that generates summary measures to evaluate and monitor health disparities. Users can import SEER data or other population-based health data to calculate 11 disparity measurements.
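For intuition about what such summary measures look like, here is a hedged Python sketch of two of the simpler ones (range ratio and index of disparity). The group-specific rates are hypothetical, and this is not HD*Calc's exact implementation of its 11 measures.

```python
# Sketch of two simple summary disparity measures of the kind HD*Calc reports.
# Rates are hypothetical group-specific rates per 100,000.

def range_ratio(rates):
    """Ratio of the highest to the lowest group rate."""
    return max(rates) / min(rates)

def index_of_disparity(rates, reference):
    """Mean absolute deviation of group rates from a reference rate, as a %."""
    groups = [r for r in rates if r != reference]
    return 100 * sum(abs(r - reference) for r in groups) / (len(groups) * reference)

rates = [43.1, 55.7, 61.2, 38.9]   # hypothetical group-specific rates
best = min(rates)                   # lowest rate taken as the reference group
print(f"Range ratio: {range_ratio(rates):.2f}")
print(f"Index of disparity: {index_of_disparity(rates, best):.1f}%")
```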
Software Used to Generate Cancer Statistics - SEER Cancer Statistics
Videos that highlight topics and trends in cancer statistics and definitions of statistical terms. Also software tools for analyzing and reporting cancer statistics, which are used to compile SEER's annual reports.
SHARE: system design and case studies for statistical health information release
Gardner, James; Xiong, Li; Xiao, Yonghui; Gao, Jingjing; Post, Andrew R; Jiang, Xiaoqian; Ohno-Machado, Lucila
2013-01-01
Objectives We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. Materials and Methods SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. Results Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses. PMID:23059729
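A minimal sketch of the core mechanism the paper builds on: a differentially private histogram released via Laplace noise, evaluated with the Kullback-Leibler divergence used in the study. The counts are synthetic, and SHARE's multidimensional and longitudinal methods are considerably more sophisticated than this.

```python
# Sketch of an epsilon-differentially-private histogram release plus the KL
# divergence used to compare released and original distributions. Synthetic data.
import math, random

def dp_histogram(counts, epsilon):
    """Add Laplace(1/epsilon) noise to each count (sensitivity 1 per record)."""
    noisy = [c + random.expovariate(epsilon) - random.expovariate(epsilon)
             for c in counts]               # difference of exponentials ~ Laplace
    return [max(0.0, c) for c in noisy]     # clamp negatives for usability

def kl_divergence(p_counts, q_counts):
    """KL(P || Q) over normalized histograms; zero-mass cells are skipped in this sketch."""
    p_tot, q_tot = sum(p_counts), sum(q_counts)
    kl = 0.0
    for p, q in zip(p_counts, q_counts):
        if p > 0 and q > 0:
            kl += (p / p_tot) * math.log((p / p_tot) / (q / q_tot))
    return kl

true_counts = [120, 45, 80, 30, 25]         # synthetic histogram cells
released = dp_histogram(true_counts, epsilon=0.5)
print(released, kl_divergence(true_counts, released))
```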
Software Technology for Adaptable, Reliable Systems (STARS)
1994-03-25
Timeline (3), SECOMO (3), SEER (3), GSFC Software Engineering Lab Model (1), SLIM (4), SEER-SEM (1), SPQR (2), PRICE-S (2), internally-developed models (3), APMSS (1) ... Timeline - 3, SASET (Software Architecture Sizing Estimating Tool) - 2, MicroMan II - 2, LCM (Logistics Cost Model) - 2, SPQR - 2, PRICE-S - 2
SEER Abstracting Tool (SEER*Abs)
With this customizable tool, registrars can collect and store data abstracted from medical records. Download the software and find technical support and reference manuals. SEER*Abs has features for creating records, managing abstracting work and data, accessing reference data, and integrating edits.
2017-01-01
Background Population datasets and the Internet are playing an ever-growing role in the way cancer information is made available to providers, patients, and their caregivers. The Surveillance, Epidemiology, and End Results Cancer Survival Calculator (SEER*CSC) is a Web-based cancer prognostic tool that uses SEER data, a large population dataset, to provide physicians with highly valid, evidence-based prognostic estimates for increasing shared decision-making and improving patient-provider communication of complex health information. Objective The aim of this study was to develop, test, and implement SEER*CSC. Methods An iterative approach was used to develop the SEER*CSC. Based on input from cancer patient advocacy groups and physicians, an initial version of the tool was developed. Next, providers from 4 health care delivery systems were recruited to do formal usability testing of SEER*CSC. A revised version of SEER*CSC was then implemented in two health care delivery sites using a real-world clinical implementation approach, and usage data were collected. Post-implementation follow-up interviews were conducted with site champions. Finally, patients from two cancer advocacy groups participated in usability testing. Results Overall feedback of SEER*CSC from both providers and patients was positive, with providers noting that the tool was professional and reliable, and patients finding it to be informational and helpful to use when discussing their diagnosis with their provider. However, use during the small-scale implementation was low. Reasons for low usage included time to enter data, not having treatment options in the tool, and the tool not being incorporated into the electronic health record (EHR). Patients found the language in its current version to be too complex. Conclusions The implementation and usability results showed that participants were enthusiastic about the use and features of SEER*CSC, but sustained implementation in a real-world clinical setting faced significant challenges. As a result of these findings, SEER*CSC is being redesigned with more accessible language for a public facing release. Meta-tools, which put different tools in context of each other, are needed to assist in understanding the strengths and limitations of various tools and their place in the clinical decision-making pathway. The continued development and eventual release of prognostic tools should include feedback from multidisciplinary health care teams, various stakeholder groups, patients, and caregivers. PMID:28729232
Deriving the Cost of Software Maintenance for Software Intensive Systems
2011-08-29
more of software maintenance). Figure 4. SEER-SEM Maintenance Effort by Year Report (Reifer, Allen, Fersch, Hitchings, Judy, & Rosa, 2010) ... understand the linear relationship between two variables. The formula for the simple Pearson product-moment correlation is represented in Equation 5 ... standardization is required across the software maintenance community in order to ensure that the data being recorded can be employed beyond the agency or
Access tools for coding Extent of Disease 2018, plus Summary Staging Manual 2000, resources for comparison and mapping between staging systems, UICC information, and Collaborative Stage instructions and software.
Thomas, Kali S; Boyd, Eric; Mariotto, Angela B; Penn, Dolly C; Barrett, Michael J; Warren, Joan L
2018-02-02
The Surveillance, Epidemiology and End Results (SEER)-Medicare data combine clinical information from population-based cancer registries with Medicare claims. These data have been used in many studies to understand cancer screening, treatment, outcomes, and costs. However, until recently, these data included limited information related to the characteristics and outcomes of cancer patients residing in or admitted to nursing homes. To provide an overview of the new linkage between SEER-Medicare data and the Minimum Data Set (MDS), a nursing home resident assessment instrument detailing residents' physical, psychological, and psychosocial functioning as well as any therapies or treatments received. This is a descriptive, retrospective cohort study. Persons in SEER-Medicare diagnosed with cancer from 2004 to 2013 were linked to the 2011-2014 MDS, with 17% of SEER-Medicare patients linked to the MDS data. During 2011-2014, we identified 318,617 cancer patients receiving care in a nursing home and 256,947 cancer patients newly admitted to a total of 10,953 nursing homes. Of these patients, approximately two thirds were Medicare fee-for-service beneficiaries. The timing from cancer diagnoses to nursing home admission varied by cancer. In total, 93% of all patients were admitted directly to a nursing home from an acute care hospital. The majority of patients were cognitively intact, 21% reported some level of depression, and 9% had severe functional limitations. The new SEER-Medicare-MDS dataset provides a valuable resource for understanding the postacute and long-term care experiences of cancer patients receiving care in United States' nursing homes.
A Comparison of Software Schedule Estimators
1990-09-01
[Table-of-contents fragment listing SLIM, SPQR/20, System-4, PRICE-S outputs, COCOMO factors by category, and SPQR/20 activities.] ... actual schedules experienced on the projects. The models analyzed were REVIC, PRICE-S, System-4, SPQR/20, and SEER.
Comparing trends in cancer rates across overlapping regions.
Li, Yi; Tiwari, Ram C
2008-12-01
Monitoring and comparing trends in cancer rates across geographic regions or over different time periods have been major tasks of the National Cancer Institute's (NCI) Surveillance, Epidemiology, and End Results (SEER) Program as it profiles healthcare quality as well as decides healthcare resource allocations within a spatial-temporal framework. A fundamental difficulty, however, arises when such comparisons have to be made for regions or time intervals that overlap, for example, comparing the change in trends of mortality rates in a local area (e.g., the mortality rate of breast cancer in California) with a more global level (i.e., the national mortality rate of breast cancer). In view of the sparsity of available methodologies, this article develops a simple corrected Z-test that accounts for such overlapping. The performance of the proposed test over the two-sample "pooled" t-test that assumes independence across comparison groups is assessed via the Pitman asymptotic relative efficiency as well as Monte Carlo simulations and applications to the SEER cancer data. The proposed test will be important for the SEER*Stat software, maintained by the NCI, for the analysis of the SEER data.
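To make the correction concrete, here is an illustrative Python sketch of the idea: when a local region contributes to the national estimate, the two trend estimates are positively correlated, so the variance of their difference must subtract a covariance term. The simple weighted-average covariance model and all numbers are hypothetical; the paper's exact statistic differs in detail.

```python
# Illustrative corrected Z for comparing a local trend with an overlapping
# national trend. The weighting model for the covariance is an assumption.
import math

def corrected_z(b_local, var_local, b_national, var_national, w):
    """w = share of the national estimate contributed by the local region,
    so cov(b_local, b_national) ≈ w * var_local under a weighted-average model."""
    cov = w * var_local
    return (b_local - b_national) / math.sqrt(var_local + var_national - 2 * cov)

z = corrected_z(b_local=-1.2, var_local=0.09,
                b_national=-0.8, var_national=0.01, w=0.12)
print(f"corrected Z = {z:.2f}")
```

Setting w = 0 recovers the naive pooled test; with positive overlap the naive variance is too large, so the corrected test is more powerful, which is the paper's point about relative efficiency.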
Dong, Xing; Zhang, Kevin; Ren, Yuan; Wilson, Reda; O'Neil, Mary Elizabeth
2016-01-01
Studying population-based cancer survival by leveraging the high-quality cancer incidence data collected by the Centers for Disease Control and Prevention's National Program of Cancer Registries (NPCR) can offer valuable insight into the cancer burden and impact in the United States. We describe the development and validation of a SAS macro tool that calculates population-based cancer site-specific relative survival estimates comparable to those obtained through SEER*Stat. The NPCR relative survival analysis SAS tool (NPCR SAS tool) was developed based on the relative survival method and SAS macros developed by Paul Dickman. NPCR cancer incidence data from 25 states submitted in November 2012 were used, specifically cases diagnosed from 2003 to 2010 with follow-up through 2010. Decennial and annual complete life tables published by the National Center for Health Statistics (NCHS) for 2000 through 2009 were used. To assess comparability between the 2 tools, 5-year relative survival rates were calculated for 25 cancer sites by sex, race, and age group using the NPCR SAS tool and the National Cancer Institute's SEER*Stat 8.1.5 software. A module to create data files for SEER*Stat was also developed for the NPCR SAS tool. Comparison of the results produced by both SAS and SEER*Stat showed comparable and reliable relative survival estimates for NPCR data. For a majority of the sites, the net differences between the NPCR SAS tool and SEER*Stat-produced relative survival estimates ranged from -0.1% to 0.1%. The estimated standard errors were highly comparable between the 2 tools as well. The NPCR SAS tool will allow researchers to produce accurate 5-year cancer relative survival estimates that are comparable to those produced by SEER*Stat for NPCR data. Comparison of output from the NPCR SAS tool and SEER*Stat provided additional quality control capabilities for evaluating data prior to producing NPCR relative survival estimates.
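The core calculation behind such macros is simple to state: cumulative observed survival divided by cumulative expected survival from general-population life tables. A minimal Python sketch, with made-up interval survival probabilities:

```python
# Sketch of cumulative relative survival: observed survival relative to the
# survival expected from life tables. Interval probabilities are hypothetical.

def cumulative_relative_survival(observed_intervals, expected_intervals):
    """Each argument is a list of interval-specific survival probabilities."""
    obs_cum, exp_cum = 1.0, 1.0
    for p_obs, p_exp in zip(observed_intervals, expected_intervals):
        obs_cum *= p_obs
        exp_cum *= p_exp
    return obs_cum / exp_cum

observed = [0.90, 0.93, 0.95, 0.96, 0.97]  # hypothetical annual observed survival
expected = [0.98, 0.98, 0.97, 0.97, 0.97]  # hypothetical life-table expected survival
print(f"5-year relative survival: {cumulative_relative_survival(observed, expected):.3f}")
```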
Healthcare experience among older cancer survivors: Analysis of the SEER-CAHPS dataset.
Halpern, Michael T; Urato, Matthew P; Lines, Lisa M; Cohen, Julia B; Arora, Neeraj K; Kent, Erin E
2018-05-01
Little is known about factors affecting medical care experiences of cancer survivors. This study examined experience of care among cancer survivors and assessed associations of survivors' characteristics with their experience. We used a newly-developed, unique data resource, SEER-CAHPS (NCI's Surveillance Epidemiology and End Results [SEER] data linked to Medicare Consumer Assessment of Healthcare Providers and Systems [CAHPS] survey responses), to examine experiences of care among breast, colorectal, lung, and prostate cancer survivors age >66 years who completed CAHPS >1 year after cancer diagnosis and survived ≥1 year after survey completion. Experience of care was assessed by survivor-provided scores for overall care, health plan, physicians, customer service, doctor communication, and aspects of care. Multivariable logistic regression models assessed associations of survivors' sociodemographic and clinical characteristics with care experience. Among 19,455 cancer survivors with SEER-CAHPS data, higher self-reported general-health status was significantly associated with better care experiences for breast, colorectal, and prostate cancer survivors. In contrast, better mental-health status was associated with better care experience for lung cancer survivors. College-educated and Asian survivors were less likely to indicate high scores for care experiences. Few differences in survivors' experiences were observed by sex or years since diagnosis. The SEER-CAHPS data resource allows assessment of factors influencing experience of care among U.S. cancer survivors. Higher self-reported health status was associated with better experiences of care; other survivor characteristics also predicted care experience. Interventions to improve cancer survivors' health status, such as increased access to supportive care services, may improve experience of care. Copyright © 2017 Elsevier Ltd. All rights reserved.
Valsangkar, Nakul P; Bush, Devon M; Michaelson, James S; Ferrone, Cristina R; Wargo, Jennifer A; Lillemoe, Keith D; Fernández-del Castillo, Carlos; Warshaw, Andrew L; Thayer, Sarah P
2013-02-01
We evaluated the prognostic accuracy of LN variables (N0/N1), numbers of positive lymph nodes (PLN), and lymph node ratio (LNR) in the context of the total number of examined lymph nodes (ELN). Patients from SEER and a single institution (MGH) were reviewed and survival analyses performed in subgroups based on numbers of ELN to calculate excess risk of death (hazard ratio, HR). In SEER and MGH, higher numbers of ELN improved the overall survival for N0 patients. The prognostic significance of nodal status (N0/N1) and of PLN was too variable, as the importance of a single PLN depended on the total number of LN dissected. LNR consistently correlated with survival once a certain number of lymph nodes were dissected (≥13 in SEER and ≥17 in the MGH dataset). Better survival for N0 patients with increasing ELN likely represents improved staging. PLN have some predictive value, but the ELN strongly influence their impact on survival, suggesting the need for a ratio-based classification. LNR strongly correlates with outcome provided that a certain number of lymph nodes is evaluated, suggesting that the prognostic accuracy of any LN variable depends on the total number of ELN.
Ask a SEER Registrar - SEER Registrars
First submit questions to your central registry as required; the registry will forward them to SEER. Use the form on this page to submit questions to SEER staff about coding cancer cases or SEER's reporting guideline materials. Coding and abstracting answers are available on the SEER Inquiry System website.
Tools & Services - SEER Registrars
View glossary for registrars. Access ICD conversion programs, SEER Abstracting Tool, SEER Data Viewer, SEER interactive drug database for coding oncology drugs, data documentation, variable recodes, and SEER Application Programming Interface for developers.
Disparities in Use of Gynecologic Oncologists for Women with Ovarian Cancer in the United States
Austin, Shamly; Martin, Michelle Y; Kim, Yongin; Funkhouser, Ellen M; Partridge, Edward E; Pisu, Maria
2013-01-01
Objective To examine disparities in utilization of gynecologic oncologists (GOs) across race and other sociodemographic factors for women with ovarian cancer. Data Sources Obtained SEER-Medicare linked dataset for 4,233 non-Hispanic White, non-Hispanic African American, Hispanic of any race, and Non-Hispanic Asian women aged ≥66 years old diagnosed with ovarian cancer during 2000–2002 from 17 SEER registries. Physician specialty was identified by linking data to the AMA master file using Unique Physician Identification Numbers. Study Design Retrospective claims data analysis for 1999–2006. Logistic regression models were used to analyze the association between GO utilization and race/ethnicity in the initial, continuing, and final phases of care. Principal Findings GO use decreased from the initial to final phase of care (51.4–28.8 percent). No racial/ethnic differences were found overall and by phase of cancer care. Women >70 years old and those with unstaged disease were less likely to receive GO care compared to their counterparts. GO use was lower in some SEER registries compared to the Atlanta registry. Conclusions GO use for the initial ovarian cancer treatment or for longer term care was low but not different across racial/ethnic groups. Future research should identify factors that affect GO utilization and understand why use of these specialists remains low. PMID:23206237
Impact of Extent of Surgery on Survival for Papillary Thyroid Cancer Patients Younger Than 45 Years
Abdelgadir Adam, Mohamed; Pura, John; Goffredo, Paolo; Dinan, Michaela A.; Hyslop, Terry; Reed, Shelby D.; Scheri, Randall P.; Sosa, Julie A.
2015-01-01
Context: Papillary thyroid cancer (PTC) patients <45 years old are considered to have an excellent prognosis; however, current guidelines recommend total thyroidectomy for PTC tumors >1.0 cm, regardless of age. Objective: Our objective was to examine the impact of extent of surgery on overall survival (OS) in patients <45 years old with stage I PTC of 1.1 to 4.0 cm. Design, Setting, and Patients: Adult patients <45 years of age undergoing surgery for stage I PTC were identified from the National Cancer Data Base (NCDB, 1998–2006) and the Surveillance, Epidemiology, and End Results dataset (SEER, 1988–2006). Main Outcome Measure: Multivariable modeling was used to compare OS for patients undergoing total thyroidectomy vs lobectomy. Results: In total, 29 522 patients in NCDB (3151 lobectomy, 26 371 total thyroidectomy) and 13 510 in SEER (1379 lobectomy, 12 131 total thyroidectomy) were included. Compared with patients undergoing lobectomy, patients having total thyroidectomy more often had extrathyroidal and lymph node disease. At 14 years, unadjusted OS was equivalent between total thyroidectomy and lobectomy in both databases. After adjustment, OS was similar for total thyroidectomy compared with lobectomy across all patients with tumors of 1.1 to 4.0 cm (NCDB: hazard ratio = 1.45 [confidence interval = 0.88–2.51], P = 0.19; SEER: 0.95 (0.70–1.29), P = 0.75) and when stratified by tumor size: 1.1 to 2.0 cm (NCDB: 1.12 [0.50–2.51], P = 0.78; SEER: 0.95 [0.56–1.62], P = 0.86) and 2.1 to 4.0 cm (NCDB: 1.93 [0.88–4.23], P = 0.10; SEER: 0.94 [0.60–1.49], P = 0.80). Conclusions: After adjusting for patient and clinical characteristics, total thyroidectomy compared with thyroid lobectomy was not associated with improved survival for patients <45 years of age with stage I PTC of 1.1 to 4.0 cm. Additional clinical and pathologic factors should be considered when choosing extent of resection. PMID:25337927
Westwick, Harrison J; Shamji, Mohammed F
2015-09-01
Most spinal meningiomas are intradural lesions in the thoracic spine that present with both local pain and myelopathy. By using the large prospective Surveillance, Epidemiology, and End Results (SEER) database, the authors studied the incidence of spinal meningiomas and examined demographic and treatment factors predictive of death. Using SEER*Stat software, the authors queried the SEER database for cases of spinal meningioma between 2000 and 2010. From the results, tumor incidence and demographic statistics were computed; incidence was analyzed as a function of tumor location, pathology, age, sex, and malignancy code. Survival was analyzed by using a Cox proportional hazards ratio in SPSS for age, sex, marital status, primary site, size quartile, treatment modality, and malignancy code. In this analysis, significance was set at a p value of 0.05. The 1709 spinal meningiomas reported in the SEER database represented 30.7% of all primary intradural spinal tumors and 7.9% of all meningiomas. These meningiomas occurred at an age-adjusted incidence of 0.193 (95% CI 0.183-0.202) per 100,000 population and were closely related to sex (337 [19.7%] male patients and 1372 [80.3%] female patients). The Cox hazard function for mortality in males was higher (2.4 [95% CI 1.7-3.5]) and statistically significant, despite the lower lesion incidence in males. All-cause survival was lowest in patients older than 80 years. Primary site and treatment modality were not significant predictors of mortality. Spinal meningiomas represent a significant fraction of all primary intradural spinal tumors and of all meningiomas. The results of this study establish the association of lesion incidence and survival with sex, with a less frequent incidence in but greater mortality among males.
Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B
2017-12-01
Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal to enable comparison of predictive power between the various methods. The prediction is treated like a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods. Copyright © 2017 Elsevier B.V. All rights reserved.
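A hedged sketch of the regression setup described here, using scikit-learn's GradientBoostingRegressor on synthetic stand-ins for the listed attributes and scoring with RMSE; the feature encodings and data are placeholders, not the authors' SEER extract.

```python
# Survival months treated as a continuous target, as in the paper; data synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(1, 5, n),    # tumor grade (synthetic)
    rng.normal(30, 15, n),    # tumor size, mm (synthetic)
    rng.integers(0, 2, n),    # sex (synthetic)
    rng.integers(40, 90, n),  # age (synthetic)
])
# synthetic survival months, loosely decreasing with age
y = np.maximum(0, 120 - 1.0 * X[:, 3] + rng.normal(0, 15, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.2f} months")
```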
The SEER Readability Technique: How Practicable is It?
ERIC Educational Resources Information Center
Duffelmeyer, Frederick A.
1982-01-01
Evaluates the practicability of the Singer Eyeball Estimate of Readability (SEER) techniques with 32 college students. Reveals that only two of the students met SEER's criterion for being considered acceptable judges. Concludes that the criterion is overly stringent and proposes a revised criterion designed to make the SEER technique more…
Rule-Based Flight Software Cost Estimation
NASA Technical Reports Server (NTRS)
Stukes, Sherry A.; Spagnuolo, John N. Jr.
2015-01-01
This paper discusses the fundamental process for the computation of Flight Software (FSW) cost estimates. This process has been incorporated in a rule-based expert system [1] that can be used for Independent Cost Estimates (ICEs), Proposals, and for the validation of Cost Analysis Data Requirements (CADRe) submissions. A high-level directed graph (referred to here as a decision graph) illustrates the steps taken in the production of these estimated costs and serves as a basis of design for the expert system described in this paper. Detailed discussions are subsequently given elaborating upon the methodology, tools, charts, and caveats related to the various nodes of the graph. We present general principles for the estimation of FSW using SEER-SEM as an illustration of these principles when appropriate. Since Source Lines of Code (SLOC) is a major cost driver, a discussion of various SLOC data sources for the preparation of the estimates is given together with an explanation of how contractor SLOC estimates compare with the SLOC estimates used by JPL. Obtaining consistency in code counting will be presented as well as factors used in reconciling SLOC estimates from different code counters. When sufficient data is obtained, a mapping into the JPL Work Breakdown Structure (WBS) from the SEER-SEM output is illustrated. For across the board FSW estimates, as was done for the NASA Discovery Mission proposal estimates performed at JPL, a comparative high-level summary sheet for all missions with the SLOC, data description, brief mission description and the most relevant SEER-SEM parameter values is given to illustrate an encapsulation of the used and calculated data involved in the estimates. The rule-based expert system described provides the user with inputs useful or sufficient to run generic cost estimation programs. This system's incarnation is achieved via the C Language Integrated Production System (CLIPS) and will be addressed at the end of this paper.
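SEER-SEM's internals are proprietary, but the reason SLOC dominates such estimates can be illustrated with the published basic-COCOMO effort equation. The sketch below uses the published embedded-mode coefficients as a stand-in for, not a reproduction of, the SEER-SEM model.

```python
# Illustrative COCOMO-style effort model showing SLOC as the dominant driver.
# a=3.6, b=1.20 are the published basic-COCOMO embedded-mode values (Boehm 1981),
# not SEER-SEM internals.

def effort_person_months(ksloc, a=3.6, b=1.20):
    """Basic COCOMO embedded mode: effort = a * KSLOC^b."""
    return a * ksloc ** b

for ksloc in (10, 50, 100):
    print(f"{ksloc:>4} KSLOC -> {effort_person_months(ksloc):7.1f} person-months")
```

The superlinear exponent is the key point: doubling SLOC more than doubles estimated effort, which is why SLOC reconciliation across code counters matters so much.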
Malignant pineal germ-cell tumors: an analysis of cases from three tumor registries.
Villano, J Lee; Propp, Jennifer M; Porter, Kimberly R; Stewart, Andrew K; Valyi-Nagy, Tibor; Li, Xinyu; Engelhard, Herbert H; McCarthy, Bridget J
2008-04-01
The exact incidence of pineal germ-cell tumors is largely unknown. The tumors are rare, and the number of patients with these tumors, as reported in clinical series, has been limited. The goal of this study was to describe pineal germ-cell tumors in a large number of patients, using data from available brain tumor databases. Three different databases were used: Surveillance, Epidemiology, and End Results (SEER) database (1973-2001); Central Brain Tumor Registry of the United States (CBTRUS; 1997-2001); and National Cancer Data Base (NCDB; 1985-2003). Tumors were identified using the International Classification of Diseases for Oncology, third edition (ICD-O-3), site code C75.3, and categorized according to histology codes 9060-9085. Data were analyzed using SAS/STAT release 8.2, SEER*Stat version 5.2, and SPSS version 13.0 software. A total of 1,467 cases of malignant pineal germ-cell tumors were identified: 1,159 from NCDB, 196 from SEER, and 112 from CBTRUS. All three databases showed a male predominance for pineal germ-cell tumors (>90%), and >72% of patients were Caucasian. The peak number of cases occurred in the 10- to 14-year age group in the CBTRUS data and in the 15- to 19-year age group in the SEER and NCDB data, and declined significantly thereafter. The majority of tumors (73%-86%) were germinomas, and patients with germinomas had the highest survival rate (>79% at 5 years). Most patients were treated with surgical resection and radiation therapy or with radiation therapy alone. The number of patients included in this study exceeds that of any study published to date. The proportions of malignant pineal germ-cell tumors and intracranial germ-cell tumors are in range with previous studies. Survival rates for malignant pineal germ-cell tumors are lower than results from recent treatment trials for intracranial germ-cell tumors, and patients that received radiation therapy in the treatment plan either with surgery or alone survived the longest.
Albany, C; Adra, N; Snavely, A C; Cary, C; Masterson, T A; Foster, R S; Kesler, K; Ulbright, T M; Cheng, L; Chovanec, M; Taza, F; Ku, K; Brames, M J; Hanna, N H; Einhorn, L H
2018-02-01
To report our experience utilizing a multidisciplinary clinic (MDC) at Indiana University (IU) since the publication of the International Germ Cell Cancer Collaborative Group (IGCCCG), and to compare our overall survival (OS) to that of the National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program. We conducted a retrospective analysis of all patients with metastatic germ-cell tumor (GCT) seen at IU from 1998 to 2014. A total of 1611 consecutive patients were identified, of whom 704 patients received an initial evaluation by our MDC (including medical oncology, pathology, urology and thoracic surgery) and started first-line chemotherapy at IU. These 704 patients were eligible for analysis. All patients in this cohort were treated with cisplatin-etoposide-based combination chemotherapy. We compared the progression-free survival (PFS) and OS of patients treated at IU with that of the published IGCCCG cohort. OS of the IU testis cancer primary cohort (n = 622) was further compared with the SEER data of 1283 patients labeled with 'distant' disease. The Kaplan-Meier method was used to estimate PFS and OS. With a median follow-up of 4.4 years, patients with good, intermediate, and poor risk disease by IGCCCG criteria treated at IU had 5-year PFS of 90%, 84%, and 54% and 5-year OS of 97%, 92%, and 73%, respectively. The 5-year PFS for all patients in the IU cohort was 79% [95% confidence interval (CI) 76% to 82%]. The 5-year OS for the IU cohort was 90% (95% CI 87% to 92%). The IU testis cohort had a 5-year OS of 94% (95% CI 91% to 96%) versus 75% (95% CI 73% to 78%) for the SEER 'distant' cohort between 2000 and 2014, P-value <0.0001. The MDC approach to GCT at a high-volume cancer center was associated with improved OS outcomes in this contemporary dataset. OS was significantly higher in the IU cohort compared with the IGCCCG and SEER 'distant' cohorts. © The Author 2017. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Metadata - Surveillance, Epidemiology, and End Results (SEER) Program
The Surveillance, Epidemiology, and End Results (SEER) program is an authoritative source of information on cancer incidence and mortality in the United States. SEER collects and publishes cancer data from a set of 17 population-based registries.
Zhu, Ping; Du, Xianglin L; Lu, Guangrong; Zhu, Jay-Jiguang
2017-07-04
Few population-based analyses have investigated survival change in glioblastoma multiforme (GBM) patients treated with concomitant radiotherapy-temozolomide (RT-TMZ) and adjuvant temozolomide (TMZ) and then bevacizumab (BEV) after Food and Drug Administration (FDA) approval, respectively. We aimed to explore the effects on survival of RT-TMZ, adjuvant TMZ and BEV in the general GBM population based on the Surveillance, Epidemiology, and End Results (SEER) and Texas Cancer Registry (TCR) databases. A total of 28933 GBM patients from SEER (N = 24578) and TCR (N = 4355) between January 2000 and December 2013 were included. Patients were grouped into three calendar periods based on date of diagnosis: pre-RT-TMZ and pre-BEV (1/2000-2/2005, P1), post-RT-TMZ and pre-BEV (3/2005-4/2009, P2), and post-RT-TMZ and post-BEV (5/2009-12/2013, P3). The association between calendar period of diagnosis and survival was analyzed in SEER and TCR, separately, by the Kaplan-Meier method and Cox proportional hazards model. We found a significant increase in median overall survival (OS) across the three periods in both populations. In multivariate models, the risk of death was significantly reduced during P2 and further decreased in P3, which remained unchanged after stratification. Comparison and validation analysis were performed in the combined dataset, and consistent results were observed. We conclude that the OS of GBM patients in a "real-world" setting steadily improved from January 2000 to December 2013, which likely resulted from the administration of TMZ concomitant with RT and adjuvant TMZ for newly diagnosed GBM and then BEV for recurrent GBM after respective FDA approval.
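A minimal sketch of the period analysis described, using the lifelines package: a Cox proportional hazards model of survival on calendar period of diagnosis, adjusted for age. The data are synthetic, and the study's models included additional covariates and stratifications.

```python
# Cox model of hazard by calendar period of diagnosis; synthetic data only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 3000
period = rng.integers(0, 3, n)                  # 0=P1, 1=P2, 2=P3
age = rng.normal(60, 10, n)
# synthetic hazard decreasing across periods and increasing with age
hazard = 0.08 * np.exp(-0.25 * period + 0.02 * (age - 60))
time = rng.exponential(1 / hazard)
event = time < 60                                # administrative censoring at 60 months
df = pd.DataFrame({"time": np.minimum(time, 60), "event": event.astype(int),
                   "period": period, "age": age})

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.print_summary(decimals=2)                    # hazard ratios for period and age
```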
Goffredo, Paolo; Garancini, Mattia; Robinson, Timothy J; Frakes, Jessica; Hoshi, Hisakazu; Hassan, Imran
2018-06-01
The 8th edition of the American Joint Committee on Cancer (AJCC) updated the staging system of anal squamous cell cancer (ASCC) by subdividing stage II into A (T2N0M0) and B (T3N0M0) based on a secondary analysis of the RTOG 98-11 trial. We aimed to validate this new subclassification utilizing two nationally representative databases. The National Cancer Database (NCDB) [2004-2014] and the Surveillance, Epidemiology, and End Results (SEER) database [1988-2013] were queried to identify patients with stage II ASCC. A total of 6651 and 2579 stage IIA (2-5 cm) and 1777 and 641 stage IIB (> 5 cm) patients were identified in the NCDB and SEER databases, respectively. Compared with stage IIB patients, stage IIA patients within the NCDB were more often females with fewer comorbidities. No significant differences were observed between age, race, receipt of chemotherapy and radiation, and mean radiation dose. Demographic, clinical, and pathologic characteristics were comparable between patients in both datasets. The 5-year OS was 72% and 69% for stage IIA versus 57% and 50% for stage IIB in the NCDB and SEER databases, respectively (p < 0.001). After adjustment for available demographic and clinical confounders, stage IIB was significantly associated with worse survival in both cohorts (hazard ratio 1.58 and 2.01, both p < 0.001). This study validates the new AJCC subclassification of stage II anal cancer into A and B based on size (2-5 cm vs. > 5 cm) in the general ASCC population. AJCC stage IIB patients represent a higher risk category that should be targeted with more aggressive/novel therapies.
Min, Hua; Mobahi, Hedyeh; Irvin, Katherine; Avramovic, Sanja; Wojtusiak, Janusz
2017-09-16
Bio-ontologies are becoming increasingly important in knowledge representation and in the machine learning (ML) fields. This paper presents a ML approach that incorporates bio-ontologies and its application to the SEER-MHOS dataset to discover patterns of patient characteristics that impact the ability to perform activities of daily living (ADLs). Bio-ontologies are used to provide computable knowledge for ML methods to "understand" biomedical data. This retrospective study included 723 cancer patients from the SEER-MHOS dataset. Two ML methods were applied to create predictive models for ADL disabilities for the first year after a patient's cancer diagnosis. The first method is a standard rule learning algorithm; the second is that same algorithm additionally equipped with methods for reasoning with ontologies. The models showed that a patient's race, ethnicity, smoking preference, treatment plan and tumor characteristics including histology, staging, cancer site, and morphology were predictors for ADL performance levels one year after cancer diagnosis. The ontology-guided ML method was more accurate at predicting ADL performance levels (P < 0.1) than methods without ontologies. This study demonstrated that bio-ontologies can be harnessed to provide medical knowledge for ML algorithms. The presented method demonstrates that encoding specific types of hierarchical relationships to guide rule learning is possible, and can be extended to other types of semantic relationships present in biomedical ontologies. The ontology-guided ML method achieved better performance than the method without ontologies. The presented method can also be used to promote the effectiveness and efficiency of ML in healthcare, in which use of background knowledge and consistency with existing clinical expertise is critical.
SEER Cancer Registry Biospecimen Research: Yesterday and Tomorrow
Altekruse, Sean F.; Rosenfeld, Gabriel E.; Carrick, Danielle M.; Pressman, Emilee J.; Schully, Sheri D.; Mechanic, Leah E.; Cronin, Kathleen A.; Hernandez, Brenda Y.; Lynch, Charles F.; Cozen, Wendy; Khoury, Muin J.; Penberthy, Lynne T.
2014-01-01
The National Cancer Institute's (NCI) Surveillance, Epidemiology, and End Results (SEER) registries have been a source of biospecimens for cancer research for decades. Recently, registry-based biospecimen studies have become more practical, with the expansion of electronic networks for pathology and medical record reporting. Formalin-fixed paraffin-embedded specimens are now used for next-generation sequencing and other molecular techniques. These developments create new opportunities for SEER biospecimen research. We evaluated 31 research articles published during 2005–2013 based on author confirmation that these studies involved linkage of SEER data to biospecimens. Rather than providing an exhaustive review of all possible articles, our intent was to indicate the breadth of research made possible by such a resource. We also summarize responses to a 2012 questionnaire that was broadly distributed to the NCI intra- and extramural biospecimen research community. This included responses from 30 investigators who had used SEER biospecimens in their research. The survey was not intended to be a systematic sample, but instead to provide anecdotal insight on strengths, limitations, and the future of SEER biospecimen research. Identified strengths of this research resource include biospecimen availability, cost, and annotation of data, including demographic information, stage, and survival. Shortcomings include limited annotation of clinical attributes such as detailed chemotherapy history and recurrence, and timeliness of turnaround following biospecimen requests. A review of selected SEER biospecimen articles, investigator feedback, and technological advances reinforced our view that SEER biospecimen resources should be developed. This would advance cancer biology, etiology, and personalized therapy research. PMID:25472677
Ambient air emissions of polycyclic aromatic hydrocarbons and female breast cancer incidence in US.
Stults, William Parker; Wei, Yudan
2018-05-05
To examine ambient air pollutants, specifically polycyclic aromatic hydrocarbons (PAHs), as a factor in the geographic variation of breast cancer incidence seen in the US, we conducted an ecological study involving counties throughout the US to examine breast cancer incidence in relation to PAH emissions in ambient air. Age-adjusted incidence rates of female breast cancer from the Surveillance, Epidemiology, and End Results (SEER) program of the US National Cancer Institute were collected and analyzed using SEER*Stat 8.3.2. PAH emissions data were obtained from the Environmental Protection Agency. Linear regression analysis was performed using SPSS 23 software for Windows to analyze the association between PAH emissions and breast cancer incidence, adjusting for potential confounders. Age-adjusted incidence rates of female breast cancer were found to be significantly higher in more industrialized metropolitan SEER regions over the years 1973-2013 as compared to less industrialized regions. After adjusting for sex, race, education, socioeconomic status, obesity, and smoking prevalence, PAH emission density was found to be significantly associated with female breast cancer incidence, with an adjusted β of 0.424 (95% CI 0.278, 0.570; p < 0.0001) for emissions from all sources and of 0.552 (95% CI 0.278, 0.826; p < 0.0001) for emissions from the traffic source. This study suggests that PAH exposure from ambient air could play a role in the increased breast cancer risk among women living in urban areas of the US. Further research could provide insight into breast cancer etiology and prevention.
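A hedged sketch of the county-level analysis: ordinary least squares of age-adjusted incidence on PAH emission density with confounder adjustment, via statsmodels. Variable names mirror the abstract, but the data frame is synthetic.

```python
# OLS with confounder adjustment, as described in the abstract; synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "incidence": rng.normal(125, 15, n),    # age-adjusted rate per 100,000 (synthetic)
    "pah_density": rng.gamma(2, 1.5, n),    # emissions per unit area (synthetic)
    "education": rng.normal(0.25, 0.05, n),
    "obesity": rng.normal(0.30, 0.04, n),
    "smoking": rng.normal(0.20, 0.05, n),
})
model = smf.ols("incidence ~ pah_density + education + obesity + smoking",
                data=df).fit()
# adjusted coefficient and 95% CI for the exposure of interest
print(model.params["pah_density"], model.conf_int().loc["pah_density"])
```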
Akinyemiju, Tomi; Waterbor, John W; Pisu, Maria; Moore, Justin Xavier; Altekruse, Sean F
2016-04-01
This study aims to examine if access to healthcare, measured through the availability of medical resources at the neighborhood level, influences colorectal cancer (CRC) stage, treatment and survival using the Surveillance, Epidemiology, and End Results (SEER) dataset (November 2012), linked with the 2004 Area Resource File. A cross-sectional study was conducted to determine the association between availability of healthcare resources and CRC outcomes among non-Hispanic Black (n = 9162) and non-Hispanic White patients (n = 97,264). CRC patients were identified using the SEER*Stat program, and individual socio-demographic, clinical, and county-level healthcare access variables were obtained for each patient. Among NH-W patients, residence in counties with a lower number of oncology hospitals was associated with increased odds of late stage diagnosis (OR 1.09, 95 % CI 1.04-1.14), reduced odds of receiving surgery (OR 0.83, 95 % CI 0.74-0.92) and higher hazard rates (HR 1.09, 95 % CI 1.06-1.12). There were no significant associations among NH-B patients. Increased availability of healthcare resources improves CRC outcomes among NH-W patients. However, future studies are required to better understand healthcare utilization patterns in NH-B neighborhoods, and identify other important dimensions of healthcare access such as affordability, acceptability and accommodation.
Statistical Reference Datasets
National Institute of Standards and Technology Data Gateway
Statistical Reference Datasets (Web, free access) The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.
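The validation pattern these datasets support is straightforward: run the statistical routine, then check its output against the certified values to a tolerance. A minimal sketch, with invented data and "certified" values standing in for a real StRD dataset:

```python
# Sketch of certified-value validation; the data and certified numbers below
# are invented stand-ins, not an actual StRD dataset.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
slope, intercept = np.polyfit(x, y, 1)   # routine under test

certified = {"slope": 2.0, "intercept": 0.02}   # hypothetical certified values
tol = 1e-1                                       # agreement tolerance
assert abs(slope - certified["slope"]) < tol
assert abs(intercept - certified["intercept"]) < tol
print("software agrees with certified results within tolerance")
```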
2010-12-01
processes. Novice estimators must often make use of these complicated cost estimation tools (e.g., ACEIT, SEER-H, SEER-S, PRICE-H, PRICE-S, etc.) until ... However, the thesis will leverage the processes embedded in cost estimation tools such as the Automated Cost Estimating Integration Tool (ACEIT) and the
Duggan, Máire A.; Anderson, William F.; Altekruse, Sean; Penberthy, Lynne; Sherman, Mark E.
2016-01-01
The Surveillance, Epidemiology and End Results (SEER) program of the National Cancer Institute collects data on cancer diagnoses, treatment and survival for approximately 30% of the United States (U.S.) population. To reflect advances in research and oncology practice, approaches to cancer control are evolving from simply enumerating the development of cancers by organ sites in populations to include monitoring of cancer occurrence by histopathologic and molecular subtype, as defined by driver mutations and other alterations. SEER is an important population-based resource for understanding the implications of pathology diagnoses across demographic groups, geographic regions, and time, and provides unique insights into the practice of oncology in the U.S. that are not attainable from other sources. It provides incidence, survival and mortality data for histopathologic cancer subtypes, and data by molecular subtyping is expanding. The program is developing systems to capture additional biomarker data, results from special populations, and expand bio-specimen banking to enable cutting edge cancer research that can improve oncology practice. Pathology has always been central and critical to the effectiveness of SEER, and strengthening this relationship in this modern era of cancer diagnosis could be mutually beneficial. Achieving this goal requires close interactions between pathologists and the SEER program. This review provides a brief overview of SEER, focuses on facets relevant to pathology practice and research, and highlights the opportunities and challenges for pathologists to benefit from and enhance the value of SEER data. PMID:27740970
Bright, C J; Rea, D W; Francis, A; Feltbower, R G
2016-10-01
UK breast cancer incidence rates suggest that upper outer quadrant (UOQ) cancers have risen disproportionately compared with other areas over time. We aimed to provide a comparison of the trend in quadrant-specific breast cancer incidence between the United States (US) and England, and determine whether a disproportionate UOQ increase is present. Surveillance Epidemiology and End Results (SEER) cancer registry data were obtained on 630,007 female breast cancers from 1975 to 2013. English cancer registry data were obtained on 1,121,134 female breast cancers from 1979 to 2013. Temporal incidence changes were analysed using negative binomial regression. Interaction terms determined whether incidence changes were similar between sites. English breast cancer incidence in the UOQ rose significantly from 13% to 28% from 1979 to 2013, whereas no significant increase was observed among SEER data. The significant interaction between quadrant and year of diagnosis (p<0.001) in both SEER and English data indicates that breast cancer incidence in each quadrant changed at a different rate. Incidence in the UOQ rose disproportionately compared to the nipple (SEER IRR=0.81, p<0.001; England IRR=0.78, p<0.001) and axillary tail (SEER IRR=0.87, p=0.018; England IRR=0.69, p<0.001) in both SEER and England. In addition, incidence rose disproportionately in the UOQ compared to non-site-specific tumours in England (Overlapping lesions IRR=0.81, p=0.002; NOS IRR=0.78, p<0.001). The proportion of non-site-specific tumours was substantially higher in England than SEER throughout the study period (62% in England; 39% in SEER). Breast cancer incidence in the UOQ increased disproportionately compared to non-site-specific tumours in England but not in SEER, likely due to the decrease in non-site-specific tumours observed in England over time. There may be real differences in incidence between the two countries, possibly due to differences in aetiology, but this is much more likely to be an artefact of changing data collection methods and improvements in site coding in either country. Copyright © 2016 Elsevier Ltd. All rights reserved.
An infographic describing the functions of NCI’s Surveillance, Epidemiology, and End Results (SEER) program: collecting, analyzing, interpreting, and disseminating reliable population-based statistics.
Wang, Zi-Xian; Qiu, Miao-Zhen; Jiang, Yu-Ming; Zhou, Zhi-Wei; Li, Guo-Xin; Xu, Rui-Hua
2017-01-01
Purpose: Previous studies addressing the optimal nodal staging system in patients with resected gastric cancer have shown inconsistent results, and the optimal system for development of prognostic nomograms remains unclear. In this study, we compared prognostic nomograms based on the metastatic lymph node (MLN) count, lymph node ratio (LNR), and log odds of metastatic lymph nodes (LODDS) to predict the 5-year overall survival in patients with resected gastric cancer. Methods: We analysed 15,320 patients with resected gastric cancer in the Surveillance, Epidemiology, and End Results (SEER) database between 1988 and 2010. Missing data were handled using multiple imputation. When assessed as a continuous covariate with restricted cubic splines, each MLN, LNR, and LODDS variable was incorporated into a nomogram with other significant prognosticators to predict the 5-year overall survival. A two-centre Chinese dataset (1,595 cases) was used as external validation data. Results: The discriminatory abilities of the MLN-, LNR-, and LODDS-based nomograms were comparable (concordance indices: 0.744, 0.741, and 0.744, respectively, in the SEER set, P > 0.152 for all pairwise comparisons; 0.715, 0.712, and 0.713, respectively, in the Chinese set, P > 0.445 for all pairwise comparisons). The discriminatory abilities of the three nomograms were all superior to the American Joint Committee on Cancer (AJCC) TNM classification (concordance indices: 0.713, P < 0.001 for all in the SEER set; and 0.693, P < 0.001 for all in the Chinese set). The discriminatory abilities of the nomograms were comparable regardless of the number of nodes examined. Moreover, decision curve analyses indicated similar net benefits of using the nomograms. Conclusion: MLN, LNR, and LODDS should be considered equally in the development of multivariate prognostic models and nomograms to refine the prediction of survival among patients with resected gastric cancer.
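For reference, the three nodal measures compared in the study, in their standard forms; the 0.5 continuity correction in LODDS follows the convention in the LODDS literature, and the node counts below are illustrative.

```python
# Standard definitions of the nodal measures; example counts are hypothetical.
import math

def lnr(pln, eln):
    """Lymph node ratio: positive nodes / examined nodes."""
    return pln / eln

def lodds(pln, eln):
    """Log odds of metastatic nodes with a 0.5 continuity correction."""
    return math.log((pln + 0.5) / (eln - pln + 0.5))

pln, eln = 4, 18
print(f"MLN={pln}, LNR={lnr(pln, eln):.2f}, LODDS={lodds(pln, eln):.2f}")
```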
Liu, Zheyu; Zhang, Yefei; Franzin, Luisa; Cormier, Janice N; Chan, Wenyaw; Xu, Hua; Du, Xianglin L
2015-04-01
Few studies have examined the cancer incidence trends in the state of Texas, and no study has ever been conducted to compare the temporal trends of breast and colorectal cancer incidence in Texas with those of the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) in the United States. This study aimed to conduct a parallel comparison between the Texas Cancer Registry and the National Cancer Institute's SEER on cancer incidence from 1995 to 2011. A total of 951,899 breast and colorectal cancer patients were included. Age-adjusted breast cancer incidence was 134.74 per 100,000 in Texas and 131.78 per 100,000 in SEER in 1995-2011, whereas age-adjusted colorectal cancer incidence was 50.52 per 100,000 in Texas and 49.44 per 100,000 in SEER. Breast cancer incidence increased from 1995 to 2001, decreased from 2002 to 2006, and then remained relatively stable from 2007 to 2011. For colorectal cancer, the incidence increased in 1995-1997, and then decreased continuously from 1998 to 2011 in Texas and SEER areas. Incidence rates and relative risks by age, gender and ethnicity were identical between Texas and SEER. PMID:25672365
Clegg, Limin X; Reichman, Marsha E; Hankey, Benjamin F; Miller, Barry A; Lin, Yi D; Johnson, Norman J; Schwartz, Stephen M; Bernstein, Leslie; Chen, Vivien W; Goodman, Marc T; Gomez, Scarlett L; Graff, John J; Lynch, Charles F; Lin, Charles C; Edwards, Brenda K
2007-03-01
Population-based cancer registry data from the Surveillance, Epidemiology, and End Results (SEER) Program at the National Cancer Institute are based on medical records and administrative information. Although SEER data have been used extensively in health disparities research, the quality of information concerning race, Hispanic ethnicity, and immigrant status has not been systematically evaluated. The quality of this information was determined by comparing SEER data with self-reported data among 13,538 cancer patients diagnosed between 1973 and 2001 in the SEER-National Longitudinal Mortality Study linked database. The overall agreement was excellent on race (kappa = 0.90, 95% CI = 0.88-0.91), moderate to substantial on Hispanic ethnicity (kappa = 0.61, 95% CI = 0.58-0.64), and low on immigrant status (kappa = 0.21, 95% CI = 0.10-0.23). The effect of these disagreements was that SEER data tended to under-classify patient numbers when compared to self-identifications, except for the non-Hispanic group, which was slightly over-classified. These disagreements translated into varying racial-, ethnic-, and immigrant status-specific cancer statistics, depending on whether self-reported or SEER data were used. In particular, the 5-year Kaplan-Meier survival and the median survival time from all causes for American Indians/Alaska Natives were substantially higher when based on self-classification (59% and 140 months, respectively) than when based on SEER classification (44% and 53 months, respectively), although the number of patients is small. These results can serve as a useful guide to researchers contemplating the use of population-based registry data to ascertain disparities in cancer burden. In particular, the study results caution against evaluating health disparities by using birthplace as a measure of immigrant status and race information for American Indians/Alaska Natives.
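The agreement statistic used throughout this study is Cohen's kappa; a minimal sketch with scikit-learn on synthetic registry versus self-reported labels:

```python
# Cohen's kappa for registry-recorded vs self-reported race; labels synthetic.
from sklearn.metrics import cohen_kappa_score

registry  = ["white", "black", "white", "asian", "white", "black", "white"]
self_rept = ["white", "black", "white", "white", "white", "black", "asian"]
print(f"kappa = {cohen_kappa_score(registry, self_rept):.2f}")
```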
Cancer Registrar Training - SEER Registrars
View questions and answers about becoming a cancer registrar, plus training materials for cancer registration and surveillance, including SEER*Educate and information about an annual training event for advanced topics.
Saad, Anas M; Abushouk, Abdelrahman Ibrahim; Al-Husseini, Muneer J; Salahia, Sami; Alrefai, Anas; Afifi, Ahmed M; Abdel-Daim, Mohamed M
The available literature on the incidence, management and prognosis of primary malignant cardiac tumors [PMCTs] is limited to single-center studies, prone to small sample size and referral bias. We used data from the Surveillance, Epidemiology, and End Results [SEER]-18 registry (between 2000 and 2014) to investigate the distribution, incidence trends and the survival rates of PMCTs. We used SEER*Stat (version 8.3.4) and the National Cancer Institute's Joinpoint Regression software (version 4.5.0.1) to calculate the incidence rates and annual percentage changes [APC] of PMCTs, respectively. We later used SPSS software (version 23) to perform Kaplan-Meier survival tests and covariate-adjusted Cox models. We identified 497 patients with PMCTs, including angiosarcomas (27.3%) and Non-Hodgkin's lymphomas [NHL] (26.9%). Unlike the incidence rate of NHL (0.108 per 10⁶ person-years), which increased significantly (APC=3.56%, 95% CI [1.445 to 5.725], P=.003) over the study period, we detected no significant change (APC=1.73%, 95% CI [-3.354 to 7.081], P=.483) in the incidence of cardiac angiosarcomas (0.107 per 10⁶ person-years). Moreover, our analysis showed that the overall survival of NHL is significantly better than that of angiosarcomas (P<.001). In addition, surgical treatment was associated with a significant improvement (P=.027) in the overall survival of PMCTs. Our analysis showed a significant increase in the incidence of cardiac NHL over the past 14 years, with a significantly better survival than angiosarcomas. To further characterize these rare tumors, future studies should report data on the medical history and diagnostic and treatment modalities in these patients. Copyright © 2017 Elsevier Inc. All rights reserved.
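Within a joinpoint segment, the APC reported by the Joinpoint software is obtained from a log-linear regression of rates on calendar year. A sketch with synthetic rates:

```python
# APC = 100 * (exp(slope) - 1) from a log-linear fit of rate on year; synthetic rates.
import numpy as np

years = np.arange(2000, 2015)
noise = np.random.default_rng(3).normal(0, 0.02, years.size)
rates = 0.10 * np.exp(0.035 * (years - 2000)) * np.exp(noise)

slope = np.polyfit(years, np.log(rates), 1)[0]
apc = 100 * (np.exp(slope) - 1)
print(f"APC = {apc:.2f}% per year")
```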
Data Collection Answers - SEER Registrars
Read clarifications to existing coding rules, which should be implemented immediately. Data collection experts from American College of Surgeons Commission on Cancer, CDC National Program of Cancer Registries, and SEER Program compiled these answers.
SEER Statistics | DCCPS/NCI/NIH
The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute works to provide information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.
Software ion scan functions in analysis of glycomic and lipidomic MS/MS datasets.
Haramija, Marko
2018-03-01
Hardware ion scan functions unique to the tandem mass spectrometry (MS/MS) mode of data acquisition, such as precursor ion scan (PIS) and neutral loss scan (NLS), are important for selective extraction of key structural data from complex MS/MS spectra. However, their software counterparts, software ion scan (SIS) functions, are still not regularly available. Software ion scan functions can be easily coded for additional functionalities, such as software multiple precursor ion scan, software no ion scan, and software variable ion scan functions. These are often necessary, since they allow more efficient analysis of complex MS/MS datasets, often encountered in glycomics and lipidomics. Software ion scan functions can be easily coded using modern scripting languages and can be independent of the instrument manufacturer. Here we demonstrate the utility of SIS functions on a medium-size glycomic MS/MS dataset. Knowledge of sample properties, as well as of diagnostic and conditional diagnostic ions crucial for data analysis, was needed. Based on the tables constructed with the output data from the SIS functions performed, a detailed analysis of a complex MS/MS glycomic dataset could be carried out in a quick, accurate, and efficient manner. Glycomic research is progressing slowly, and with respect to the MS experiments, one of the key obstacles to moving forward is the lack of appropriate bioinformatic tools necessary for fast analysis of glycomic MS/MS datasets. Adding novel SIS functionalities to the glycomic MS/MS toolbox has the potential to significantly speed up the glycomic data analysis process. Similar tools are useful for analysis of lipidomic MS/MS datasets as well, as will be discussed briefly. Copyright © 2017 John Wiley & Sons, Ltd.
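The paper's SIS implementations are not reproduced here, but the idea of a software neutral-loss scan is simple enough to sketch; the spectra, tolerance, and data layout below are hypothetical, not the authors':

```python
# Toy software neutral-loss scan (SIS-style): report precursors whose
# MS/MS spectrum contains a fragment at (precursor - loss) m/z.
# Data structure and tolerance are assumptions, not from the paper.

def neutral_loss_scan(spectra, loss, tol=0.02):
    """spectra: list of (precursor_mz, [fragment_mz, ...]) tuples."""
    hits = []
    for precursor, fragments in spectra:
        target = precursor - loss
        if any(abs(f - target) <= tol for f in fragments):
            hits.append(precursor)
    return hits

spectra = [
    (528.2, [366.1, 204.1, 186.1]),   # loses 162.1 (hexose) -> hit
    (657.3, [512.3, 495.2]),
]
print(neutral_loss_scan(spectra, loss=162.1))   # [528.2]
```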
Reference datasets for bioequivalence trials in a two-group parallel design.
Fuglsang, Anders; Schütz, Helmut; Labes, Detlew
2015-03-01
In order to help companies qualify and validate the software used to evaluate bioequivalence trials with two parallel treatment groups, this work aims to define datasets with known results. This paper puts a total of 11 datasets into the public domain, along with a proposed consensus obtained via evaluations from six different software packages (R, SAS, WinNonlin, OpenOffice Calc, Kinetica, EquivTest). Insofar as possible, datasets were evaluated with and without the assumption of equal variances for the construction of a 90% confidence interval. Not all software packages provide functionality for the assumption of unequal variances (EquivTest, Kinetica), and not all packages can handle datasets with more than 1000 subjects per group (WinNonlin). Where results could be obtained across all packages, one showed questionable results when datasets contained unequal group sizes (Kinetica). A proposal is made for the results that should be used as validation targets.
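As one illustration of the computation these packages are validated on, here is a sketch of the 90% confidence interval for the test/reference geometric mean ratio in a two-group parallel design without assuming equal variances (Welch); the data are synthetic, not one of the paper's reference datasets.

```python
# Sketch: 90% CI for the test/reference ratio of geometric means,
# two-group parallel design, unequal variances (Welch). Synthetic data.
import numpy as np
from scipy import stats

test = np.log([95.1, 88.2, 102.4, 110.3, 97.8, 91.5])
ref  = np.log([100.2, 105.6, 98.4, 112.9, 94.7, 101.1])

diff = test.mean() - ref.mean()
v1, v2 = test.var(ddof=1) / len(test), ref.var(ddof=1) / len(ref)
se = np.sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
df = se**4 / (v1**2 / (len(test) - 1) + v2**2 / (len(ref) - 1))
lo, hi = np.exp(diff + np.array([-1, 1]) * stats.t.ppf(0.95, df) * se)
print(f"90% CI for geometric mean ratio: {lo:.4f} - {hi:.4f}")
```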
This interactive website provides access to cancer statistics (rates and trends) for a cancer site by gender, race, calendar year, stage, and histology. Users can create custom graphs and tables, download data and images, download SEER*Stat sessions, and share results.
Georgakis, Marios K; Papathoma, Paraskevi; Ryzhov, Anton; Zivkovic-Perisic, Snezana; Eser, Sultan; Taraszkiewicz, Łukasz; Sekerija, Mario; Žagar, Tina; Antunes, Luis; Zborovskaya, Anna; Bastos, Joana; Florea, Margareta; Coza, Daniela; Demetriou, Anna; Agius, Domenic; Strahinja, Rajko M; Themistocleous, Marios; Tolia, Maria; Tzanis, Spyridon; Alexiou, George A; Papanikolaou, Panagiotis G; Nomikos, Panagiotis; Kantzanou, Maria; Dessypris, Nick; Pourtsidis, Apostolos; Petridou, Eleni T
2017-11-15
Unique features and worse outcomes have been reported for cancers among adolescents and young adults (AYAs; 15-39 years old). The aim of this study was to explore the mortality and survival patterns of malignant central nervous system (CNS) tumors among AYAs in Southern-Eastern Europe (SEE) in comparison with the US Surveillance, Epidemiology, and End Results (SEER) program. Malignant CNS tumors diagnosed in AYAs during the period spanning 1990-2014 were retrieved from 14 population-based cancer registries in the SEE region (n = 11,438). Age-adjusted mortality rates were calculated and survival patterns were evaluated via Kaplan-Meier curves and Cox regression analyses, and they were compared with respective 1990-2012 figures from SEER (n = 13,573). Mortality rates in SEE (range, 11.9-18.5 deaths per million) were higher overall than the SEER rate (9.4 deaths per million), with decreasing trends in both regions. Survival rates increased during a comparable period (2001-2009) in SEE and SEER. The 5-year survival rate was considerably lower in the SEE registries (46%) versus SEER (67%), mainly because of the extremely low rates in Ukraine; this finding was consistent across age groups and diagnostic subtypes. The highest 5-year survival rates were recorded for ependymomas (76% in SEE and 92% in SEER), and the worst were recorded for glioblastomas and anaplastic astrocytomas (28% in SEE and 37% in SEER). Advancing age, male sex, and rural residency at diagnosis adversely affected outcomes in both regions. Despite definite survival gains over the last years, the considerable outcome disparities between the less affluent SEE region and the United States for AYAs with malignant CNS tumors point to health care delivery inequalities. No considerable prognostic deficits for CNS tumors are evident for AYAs versus children. Cancer 2017;123:4458-71. © 2017 American Cancer Society.
Second Primary Malignancies in Patients with Well-differentiated/Dedifferentiated Liposarcoma.
Jung, Eric; Fiore, Marco; Gronchi, Alessandro; Grignol, Valerie; Pollock, Raphael E; Chong, Susan S; Chopra, Shefali; Hamilton, Ann S; Tseng, William W
2018-06-01
Well-differentiated/dedifferentiated (WD/DD) liposarcoma is a rare malignancy of putative adipocyte origin. To our knowledge, there have only been isolated case reports describing second primary cancer in patients with this disease. We report on a combined case series of such patients and explore the frequency of this occurrence using a national cancer database. Demographics and clinicopathological data were collected from patients with WD/DD liposarcoma who were found to have a concurrent or subsequent second primary cancer at one of three sarcoma referral centers from 2014 to 2016. The Surveillance, Epidemiology and End Results (SEER) database was also queried to identify adult patients diagnosed with WD/DD liposarcoma between 1973 and 2012. Observed/expected (O/E) ratios of second primary malignancies among these cases were calculated by comparison to the age-adjusted cancer incidence in the general population using SEER*Stat software. In total, 26 of 312 consecutive patients (8.3%) with WD/DD liposarcoma at our centers had a second primary cancer identified within 2 years of liposarcoma diagnosis. In the SEER database, among 1,845 patients with WD/DD liposarcoma, 75 (4.1%) had a second cancer within 2 years after liposarcoma diagnosis (O/E ratio=1.81, 99% confidence interval (CI)=1.33-2.40). Patients less than 50 years old at the time of liposarcoma diagnosis had a higher O/E ratio for second primary malignancy compared to older patients. Over the full follow-up period, a total of 269 patients (14.6%) developed a second cancer (O/E=1.33, 99% CI=1.15-1.54). In some patients with WD/DD liposarcoma, there appears to be an increased risk of a second primary cancer. Further validation and investigation are needed, as this finding may have implications (e.g., closer screening) for patients with this disease. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
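The O/E ratios above compare observed second cancers with the count expected from age-adjusted reference incidence. A schematic version of that calculation, with invented person-years and rates, and an exact Poisson confidence interval in the spirit of SEER*Stat's standardized incidence ratio output:

```python
# Schematic observed/expected (O/E) ratio with an exact Poisson CI.
# Person-years and reference rates are illustrative, not SEER values.
from scipy import stats

observed = 75
person_years = [4000, 9000, 12000]          # hypothetical age strata
ref_rates    = [0.0005, 0.0012, 0.0022]     # hypothetical reference rates
# expected = sum over strata of (person-years * reference rate)
expected = sum(py * r for py, r in zip(person_years, ref_rates))

oe = observed / expected
# exact Poisson 99% CI for the observed count, scaled by expected
lo = stats.chi2.ppf(0.005, 2 * observed) / 2 / expected
hi = stats.chi2.ppf(0.995, 2 * (observed + 1)) / 2 / expected
print(f"O/E = {oe:.2f} (99% CI {lo:.2f}-{hi:.2f})")
```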
Mukkamalla, Shiva Kumar R; Naseri, Hussain M; Kim, Byung M; Katz, Steven C; Armenio, Vincent A
2018-04-01
Background: Cholangiocarcinoma (CCA) includes cancers arising from the intrahepatic and extrahepatic bile ducts. The etiology and pathogenesis of CCA remain poorly understood. This is the first study investigating both incidence patterns of CCA from 1973 through 2012 and the demographic, clinical, and treatment variables affecting survival of patients with CCA. Patients and Methods: Using the SEER database, age-adjusted incidence rates were evaluated from 1973 to 2012 using SEER*Stat software. A retrospective cohort of 26,994 patients diagnosed with CCA from 1973 to 2008 was identified for survival analysis. Cox proportional hazards models were used to perform multivariate survival analysis. Results: Overall incidence of CCA increased by 65% from 1973 to 2012. Extrahepatic CCA (ECC) remained more common than intrahepatic CCA (ICC), although the incidence rate for ICC increased by 350% compared with a 20% increase for ECC. Men of races other than African American and Caucasian had the highest incidence rates of CCA. This trend persisted throughout the study period, although African Americans and Caucasians saw 50% and 59% increases in incidence rates, respectively, compared with a 9% increase among other races. Median overall survival (OS) was 8 months in patients with ECC compared with 4 months in those with ICC. Our survival analysis found Hispanic women to have the best 5-year survival outcome (P<.0001). OS diminished with age (P<.0001), and ECC had better survival outcomes than ICC (P<.0001). Patients who were married, were nonsmokers, belonged to a higher income class, and underwent surgery had better survival outcomes than others (P<.0001). Conclusions: This is the most up-to-date study of CCA from the SEER registry showing temporal patterns of increasing incidence of CCA across different races, sexes, and ethnicities. We identified age, sex, race, marital status, income, smoking status, anatomic location of CCA, tumor grade, tumor stage, radiation, and surgery as independent prognostic factors for OS in patients with CCA. Copyright © 2018 by the National Comprehensive Cancer Network.
U.S. Population Data 1969-2016 - SEER Population Data
Download county population estimates used in SEER*Stat to calculate cancer incidence and mortality rates. The estimates are a modification of the U.S. Census Bureau's Population Estimates Program, produced in collaboration with the National Center for Health Statistics.
Summary Staging Manual 2000 - SEER
Access this manual of codes and coding instructions for the summary stage field for cases diagnosed 2001-2017. The 2000 version applies to every anatomic site and uses all information in the medical record. Also called General Staging, California Staging, and SEER Staging.
The effect of multiple primary rules on population-based cancer survival
Weir, Hannah K.; Johnson, Christopher J.; Thompson, Trevor D.
2015-01-01
Purpose Different rules for registering multiple primary (MP) cancers are used by cancer registries throughout the world, making international data comparisons difficult. This study evaluates the effect of Surveillance, Epidemiology, and End Results (SEER) and International Association of Cancer Registries (IACR) MP rules on population-based cancer survival estimates. Methods Data from five US states and six metropolitan area cancer registries participating in the SEER Program were used to estimate age-standardized relative survival (RS%) for first cancers-only and for all first cancers matching the selection criteria according to SEER and IACR MP rules, for all cancer sites combined and for the top 25 cancer site groups among men and women. Results During 1995-2008, the percentage of MP cancers (all sites, both sexes) increased 25.4% by using SEER rules (from 14.6% to 18.4%) and 20.1% by using IACR rules (from 13.2% to 15.8%). More MP cancers were registered among females than among males, and SEER rules registered more MP cancers than IACR rules (15.8% vs. 14.4% among males; 17.2% vs. 14.5% among females). The top 3 cancer sites with the largest differences were melanoma (5.8%), urinary bladder (3.5%), and kidney and renal pelvis (2.9%) among males, and breast (5.9%), melanoma (3.9%), and urinary bladder (3.4%) among females. Five-year survival estimates restricted to first primary cancers-only were higher than estimates using all first site-specific primaries (SEER or IACR rules), both for all sites combined and for 11 of 21 sites among males and 11 of 23 sites among females. SEER estimates are comparable to IACR estimates for all site-specific cancers and marginally higher for all sites combined among females (RS 62.28% vs. 61.96%). Conclusion Survival after diagnosis has improved for many leading cancers. However, cancer patients remain at risk of subsequent cancers. Survival estimates based on first cancers-only exclude a large and increasing number of MP cancers. To produce clinically and epidemiologically relevant and less biased cancer survival estimates, data on all cancers should be included in the analysis. The multiple primary rules (SEER or IACR) used to identify primary cancers do not affect survival estimates if all first cancers matching the selection criteria are used to produce site-specific survival estimates. PMID:23558444
3D reconstruction software comparison for short sequences
NASA Astrophysics Data System (ADS)
Strupczewski, Adam; Czupryński, Błażej
2014-11-01
Large-scale multiview reconstruction has recently become a very popular area of research. There are many open source tools that can be downloaded and run on a personal computer. However, there are few, if any, comparisons of all the available software in terms of accuracy on small datasets that a single user can create. The typical datasets for testing such software are archeological sites or cities, comprising thousands of images. This paper presents a comparison of currently available open source multiview reconstruction software for small datasets. It also compares the open source solutions with a simple structure-from-motion pipeline developed by the authors from scratch with the use of the OpenCV and Eigen libraries.
NASA Astrophysics Data System (ADS)
Carlton, A.; Cahoy, K.
2015-12-01
Reliability of geostationary communication satellites (GEO ComSats) is critical to many industries worldwide. The space radiation environment poses a significant threat, and manufacturers and operators expend considerable effort to maintain reliability for users. Knowledge of the space radiation environment at the orbital location of a satellite is of critical importance for diagnosing and resolving issues resulting from space weather, for optimizing cost and reliability, and for space situational awareness. For decades, operators and manufacturers have collected large amounts of telemetry from geostationary (GEO) communications satellites to monitor system health and performance, yet this data is rarely mined for scientific purposes. The goal of this work is to acquire and analyze archived data from commercial operators using new algorithms that can detect when a space weather (or non-space weather) event of interest has occurred or is in progress. We have developed algorithms, collectively called SEER (System Event Evaluation Routine), to statistically analyze power amplifier current and temperature telemetry by identifying deviations from nominal operations or other events and trends of interest. This paper focuses on our work in progress, which currently includes methods for detection of jumps ("spikes", outliers) and step changes (changes in the local mean) in the telemetry. We then examine available space weather data from the NOAA GOES satellites and the NOAA-computed Kp index and sunspot numbers to see what role, if any, space weather might have played. By combining the results of the algorithm across many components, the spacecraft itself can be used as a "sensor" for the space radiation environment: similar events occurring at one time across many component telemetry streams may be indicative of a space radiation event or a system-wide health and safety concern. Using SEER on representative datasets of telemetry from Inmarsat and Intelsat, we find events that occur across all or many of the telemetry files on certain dates. We compare these system-wide events to known space weather storms, such as the 2003 Halloween storms, and to spacecraft operational events, such as maneuvers. We also present future applications and expansions of SEER for robust space environment sensing and system health and safety monitoring.
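The SEER routines themselves are not published in this abstract, but the two detections it names can be sketched generically; the window sizes, thresholds, and synthetic telemetry below are arbitrary choices of ours, not the authors':

```python
# Generic sketch of the two detections described above, on synthetic
# telemetry: spikes via a rolling-median residual test, step changes
# via a shift in the mean between adjacent windows.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(5.0, 0.05, 500),
                    rng.normal(5.4, 0.05, 500)])   # step change at sample 500
x[200] += 1.0                                      # injected spike

def spikes(x, w=25, k=6.0):
    """Flag samples far from a rolling median, in robust (MAD) units."""
    med = np.array([np.median(x[max(0, i - w):i + w + 1]) for i in range(len(x))])
    resid = x - med
    mad = np.median(np.abs(resid)) or 1e-9
    return np.where(np.abs(resid) > k * 1.4826 * mad)[0]

def steps(x, w=50, thresh=0.2):
    """Flag indices where the mean shifts between adjacent windows."""
    return [i for i in range(w, len(x) - w)
            if abs(x[i:i + w].mean() - x[i - w:i].mean()) > thresh]

print(spikes(x))      # [200]
print(steps(x)[:1])   # first index flagged near the step at 500
```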
Access this manual of codes and coding instructions for the summary stage field for cases diagnosed January 1, 2018 and forward. The 2018 version applies to every site and/or histology combination, including lymphomas and leukemias. Historically also called General Staging, California Staging, and SEER Staging.
Risk of cardiac death among cancer survivors in the United States: a SEER database analysis.
Abdel-Rahman, Omar
2017-09-01
Population-based data on the risk of cardiac death among cancer survivors are needed. This risk was evaluated in cancer survivors (>5 years) registered within the Surveillance, Epidemiology and End Results (SEER) database. The SEER database was queried using SEER*Stat to determine the frequency of cardiac death compared to other causes of death, and to determine heart disease-specific and cancer-specific survival rates in survivors of each of the 10 most common cancers in men and women in the SEER database. For cancer-specific survival, the highest rates were among thyroid cancer survivors, while the lowest were among lung cancer survivors. For heart disease-specific survival, the highest rates were again among thyroid cancer survivors, while the lowest were among both lung cancer and urinary bladder cancer survivors. The following factors were associated with a higher likelihood of cardiac death: male gender, old age at diagnosis, black race, and local treatment with radiotherapy rather than surgery (P < 0.0001 for all parameters). Among cancer survivors (>5 years), cardiac death is a significant cause of death, and there is wide variability among different cancers in the relative importance of cardiac death vs. cancer-related death.
Oweira, Hani; Petrausch, Ulf; Helbling, Daniel; Schmidt, Jan; Mehrabi, Arianeb; Schöb, Othmar; Giryes, Anwar; Abdel-Rahman, Omar
2017-07-01
We evaluated the prognostic value of site-specific extra-hepatic disease in hepatocellular carcinoma (HCC) patients registered within the Surveillance, Epidemiology and End Results (SEER) database. The SEER database (2010-2013) was queried through the SEER*Stat program to determine the prognosis of advanced HCC patients according to the site of extra-hepatic disease. Survival analysis was conducted through Kaplan-Meier analysis. A total of 4396 patients with stage IV HCC were identified in the period from 2010 to 2013 and included in this analysis. Patients with isolated regional lymph node involvement had better outcomes compared to patients with any other site of extra-hepatic disease (P < 0.0001 for both endpoints). Among patients with distant metastases, patients with bone metastases had better outcomes compared to patients with lung metastases (P < 0.0001 for both endpoints). Multivariate analysis revealed that younger age, normal alpha fetoprotein, a single site of extra-hepatic disease, local treatment to the primary tumor, and surgery to the metastatic disease were associated with better overall survival and liver cancer-specific survival. Within the limits of the current SEER analysis, HCC patients with isolated lung metastases seem to have worse outcomes compared to patients with isolated bone or regional nodal metastases.
Future Directions for NCI’s Surveillance Research Program
Since the early 1970s, NCI's SEER program has been an invaluable resource for statistics on cancer in the United States. For the past several years, SEER researchers have been working toward a much broader and more comprehensive goal for providing cancer statistics.
Maximizing Accessibility to Spatially Referenced Digital Data.
ERIC Educational Resources Information Center
Hunt, Li; Joselyn, Mark
1995-01-01
Discusses some widely available spatially referenced datasets, including raster and vector datasets. Strategies for improving accessibility include: acquisition of data in a software-dependent format; reorganization of data into logical geographic units; acquisition of intelligent retrieval software; improving computer hardware; and intelligent…
Rosenberg, Aaron S.; Ruthazer, Robin; Paulus, Jessica K.; Kent, David M.; Evens, Andrew M.; Klein, Andreas K.
2016-01-01
Background Multiple myeloma/plasmacytoma-like post-transplant lymphoproliferative disorder (PTLD-MM) is a rare complication of solid organ transplant. Case series have shown variable outcomes, and survival data in the modern era are lacking. Methods A cohort of 212 PTLD-MM patients was identified in the Scientific Registry of Transplant Recipients between 1999 and 2011. Overall survival (OS) was estimated using the Kaplan-Meier method, and the effects of treatment and patient characteristics on OS were evaluated with Cox proportional hazards models. OS in 185 PTLD-MM patients was compared with that in 4048 matched controls with multiple myeloma (SEER-MM) derived from SEER. Results Men comprised 71% of patients; extramedullary disease was noted in 58%. Novel therapeutic agents were used in 19% of patients (more commonly in 2007-2011 than 1999-2006, P=0.01), reduced immunosuppression in 55%, and chemotherapy in 32%. Median OS was 2.4 years and improved in the later time period (aHR 0.64, P=0.05). Advanced age, creatinine >2, Caucasian race, and use of OKT3 were associated with inferior OS in multivariable analysis. OS of PTLD-MM is significantly inferior to that of SEER-MM patients (aHR 1.6, P<0.001). Improvements in OS over time differed between PTLD-MM and SEER-MM: median OS of patients diagnosed 2000-2005 was shorter for PTLD-MM than SEER-MM patients (18 vs 47 months, P<0.001), while there was no difference among those diagnosed 2006-2010 (44 months vs median not reached, P=0.5; interaction P=0.08). Conclusions Age at diagnosis, elevated creatinine, Caucasian race, and OKT3 were associated with inferior survival in patients with PTLD-MM. Survival of PTLD-MM is inferior to SEER-MM, though significant improvements in survival have been documented. PMID:27771291
Goldwasser, Deborah L
2017-03-15
The National Lung Screening Trial (NLST) demonstrated that non-small cell lung cancer (NSCLC) mortality can be reduced by a program of annual CT screening in high-risk individuals. However, CT screening regimens and adherence vary, potentially impacting the lung cancer mortality benefit. We defined the NSCLC cure threshold as the maximum tumor size at which a given NSCLC would be curable due to early detection. We obtained data from 518,234 NSCLCs documented in the U.S. SEER cancer registry between 1988 and 2012 and 1,769 NSCLCs detected in the NLST. We demonstrated mathematically that the distribution function governing the cure threshold for the most aggressive NSCLCs, G(x|Φ = 1), was embedded in the probability function governing detection of SEER-documented NSCLCs. We determined the resulting probability functions governing detection over a range of G(x|Φ = 1) scenarios and compared them with their expected functional forms. We constructed a simulation framework to determine the cure threshold models most consistent with tumor sizes and outcomes documented in SEER and the NLST. Whereas the median tumor size for lethal NSCLCs documented in SEER is 43 mm (males) and 40 mm (females), a simulation model in which the median cure threshold for the most aggressive NSCLCs is 10 mm (males) and 15 mm (females) best fit the SEER and NLST data. The majority of NSCLCs in the NLST were treated at sizes greater than our median cure threshold estimates. New technology is needed to better distinguish and treat the most aggressive NSCLCs when they are small (i.e., 5-15 mm). © 2016 UICC.
OpinionSeer: interactive visualization of hotel customer feedback.
Wu, Yingcai; Wei, Furu; Liu, Shixia; Au, Norman; Cui, Weiwei; Zhou, Hong; Qu, Huamin
2010-01-01
The rapid development of Web technology has resulted in an increasing number of hotel customers sharing their opinions on hotel services. Effective visual analysis of online customer opinions is needed, as it has a significant impact on building a successful business. In this paper, we present OpinionSeer, an interactive visualization system that can visually analyze a large collection of online hotel customer reviews. The system is built on a new visualization-centric opinion mining technique that considers uncertainty for faithfully modeling and analyzing customer opinions. A new visual representation is developed to convey customer opinions by augmenting well-established scatterplots and radial visualization. To provide multiple-level exploration, we introduce subjective logic to handle and organize subjective opinions with degrees of uncertainty. Several case studies illustrate the effectiveness and usefulness of OpinionSeer in analyzing relationships among multiple data dimensions and comparing opinions of different groups. Aside from hotel customer feedback, OpinionSeer can also be applied to visually analyze customer opinions on other products or services.
American-Indian diabetes mortality in the Great Plains Region 2002–2010
Kelley, Allyson; Giroux, Jennifer; Schulz, Mark; Aronson, Bob; Wallace, Debra; Bell, Ronny; Morrison, Sharon
2015-01-01
Objective To compare American-Indian and Caucasian mortality rates from diabetes among tribal Contract Health Service Delivery Areas (CHSDAs) in the Great Plains Region (GPR) and describe the disparities observed. Research design and methods Mortality data from the National Center for Vital Statistics and SEER*Stat were used to identify diabetes as the underlying cause of death for each decedent in the GPR from 2002 to 2010. Mortality data were abstracted and aggregated for American-Indians and Caucasians for 25 reservation CHSDAs in the GPR. Rate ratios (RRs) with 95% CIs were used, and SEER*Stat v8.0.4 software calculated age-adjusted diabetes mortality rates. Results Age-adjusted mortality rates for American-Indians were significantly higher than those for Caucasians during the 8-year period. In the GPR, American-Indians were 3.44 times more likely to die from diabetes than Caucasians. South Dakota had the highest RR (5.47 times that of Caucasians), and Iowa had the lowest (1.1). Reservation CHSDA RRs ranged from 1.78 to 10.25. Conclusions American-Indians in the GPR have higher diabetes mortality rates than Caucasians in the GPR. These elevated mortality rates persist despite special programs and initiatives aimed at reducing diabetes in these populations. Effective and immediate efforts are needed to address premature diabetes mortality among American-Indians in the GPR. PMID:25926992
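The age-adjusted rates underpinning these comparisons come from direct standardization: stratum-specific rates weighted by a standard population, which is what SEER*Stat computes. A minimal sketch with hypothetical strata:

```python
# Direct age standardization: weight stratum-specific rates by a
# standard population (e.g., the 2000 U.S. standard). All numbers
# below are hypothetical, not the Great Plains values.
deaths       = [10, 40, 150]           # deaths by age stratum
person_years = [50000, 40000, 30000]   # population at risk by stratum
std_weights  = [0.60, 0.25, 0.15]      # standard population weights, sum to 1

rates = [d / py for d, py in zip(deaths, person_years)]
adjusted = sum(w * r for w, r in zip(std_weights, rates))
print(f"{adjusted * 100000:.1f} deaths per 100,000")   # 112.0 here
```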
Li, Zhucui; Lu, Yan; Guo, Yufeng; Cao, Haijie; Wang, Qinhong; Shui, Wenqing
2018-10-31
Data analysis represents a key challenge for untargeted metabolomics studies, as it commonly requires extensive processing of thousands of metabolite peaks in raw high-resolution MS data. Although a number of software packages have been developed to facilitate untargeted data processing, they have not been comprehensively scrutinized for their capability in feature detection, quantification, and marker selection using a well-defined benchmark sample set. In this study, we acquired a benchmark dataset from standard mixtures consisting of 1100 compounds with specified concentration ratios, including 130 compounds with significant variation of concentrations. The five software packages evaluated here (MS-Dial, MZmine 2, XCMS, MarkerView, and Compound Discoverer) showed similar performance in detection of true features derived from compounds in the mixtures. However, significant differences among the packages were observed in relative quantification of true features in the benchmark dataset. MZmine 2 outperformed the other software in terms of quantification accuracy, and it reported the most true discriminating markers together with the fewest false markers. Furthermore, we assessed selection of discriminating markers by the different software using both the benchmark dataset and a real-case metabolomics dataset, and propose combined usage of two packages to increase confidence of biomarker identification. Our findings from this comprehensive evaluation of untargeted metabolomics software should help guide future improvements of these widely used bioinformatics tools and enable users to properly interpret their metabolomics results. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Goseva-Popstojanova, Katerina; Tyo, Jacob
2017-01-01
While some prior research work exists on characteristics of software faults (i.e., bugs) and failures, very little work has been published on analysis of software application vulnerabilities. This paper aims to contribute towards filling that gap by presenting an empirical investigation of application vulnerabilities. The results are based on data extracted from issue tracking systems of two NASA missions. These data were organized in three datasets: Ground mission IVV issues, Flight mission IVV issues, and Flight mission Developers issues. In each dataset, we identified security related software bugs and classified them in specific vulnerability classes. Then, we created the security vulnerability profiles, i.e., determined where and when the security vulnerabilities were introduced and what were the dominating vulnerability classes. Our main findings include: (1) In the IVV issues datasets, the majority of vulnerabilities were code related and were introduced in the Implementation phase. (2) For all datasets, around 90% of the vulnerabilities were located in two to four subsystems. (3) Out of 21 primary classes, five dominated: Exception Management, Memory Access, Other, Risky Values, and Unused Entities. Together, they contributed from 80% to 90% of the vulnerabilities in each dataset.
Registrar Staging Assistant (SEER*RSA) - SEER
Use this site for cases diagnosed 2018 and forward to code Extent of Disease 2018, Summary Stage 2018, Site-Specific Data Items, and Grade. Use it for 2016 and 2017 cases to determine UICC TNM 7th edition stage, Collaborative Stage v.02.05.50, and Site-Specific predictive and prognostic factors.
Karalexi, Maria A; Georgakis, Marios K; Dessypris, Nick; Ryzhov, Anton; Zborovskaya, Anna; Dimitrova, Nadya; Zivkovic, Snezana; Eser, Sultan; Antunes, Luis; Sekerija, Mario; Zagar, Tina; Bastos, Joana; Demetriou, Anna; Agius, Domenic; Florea, Margareta; Coza, Daniela; Bouka, Evdoxia; Dana, Helen; Hatzipantelis, Emmanuel; Kourti, Maria; Moschovi, Maria; Polychronopoulou, Sophia; Stiakaki, Eftichia; Pourtsidis, Apostolos; Petridou, Eleni Th
2017-12-01
Childhood (0-14 years) lymphomas nowadays present a highly curable malignancy compared with other types of cancer. We used readily available cancer registration data to assess mortality and survival disparities among children residing in Southern-Eastern European (SEE) countries and those in the United States. Average age-standardized mortality rates and time trends of Hodgkin (HL) and non-Hodgkin (NHL; including Burkitt [BL]) lymphomas in 14 SEE cancer registries (1990-2014) and the Surveillance, Epidemiology, and End Results Program (SEER, United States; 1990-2012) were calculated. Survival patterns in a total of 8918 cases, also distinguishing BL, were assessed through Kaplan-Meier curves and multivariate Cox regression models. Variable, rather decreasing, mortality trends were noted among SEE registries. Rates were overall higher than the SEER rate (1.02 per 10⁶), which presented a sizeable annual change (-4.8%, P = .0001). Additionally, remarkable survival improvements were manifested in SEER (10 years: 96%, 86%, and 90% for HL, NHL, and BL, respectively), whereas diverse, still lower, rates were noted in SEE. Non-HL was associated with a poorer outcome and an amphi-directional age-specific pattern; specifically, prognosis was inferior in children younger than 5 years compared with those 10 to 14 years old in SEE (hazard ratio 1.58, 95% confidence interval 1.28-1.96) and superior in children 5 to 9 years old in SEER/United States (hazard ratio 0.63, 95% confidence interval 0.46-0.88) compared with those 10 to 14 years old. In conclusion, higher SEE lymphoma mortality rates than those in SEER, but overall decreasing trends, were found. Despite significant survival gains among developed countries, there are still substantial geographic, disease subtype-specific, and age-specific outcome disparities pointing to persisting gaps in the implementation of new treatment modalities and indicating further research needs. Copyright © 2016 John Wiley & Sons, Ltd.
Georgakis, Marios K; Panagopoulou, Paraskevi; Papathoma, Paraskevi; Tragiannidis, Athanasios; Ryzhov, Anton; Zivkovic-Perisic, Snezana; Eser, Sultan; Taraszkiewicz, Łukasz; Sekerija, Mario; Žagar, Tina; Antunes, Luis; Zborovskaya, Anna; Bastos, Joana; Florea, Margareta; Coza, Daniela; Demetriou, Anna; Agius, Domenic; Strahinja, Rajko M; Sfakianos, Georgios; Nikas, Ioannis; Kosmidis, Sofia; Razis, Evangelia; Pourtsidis, Apostolos; Kantzanou, Maria; Dessypris, Nick; Petridou, Eleni Th
2017-11-01
To present the incidence of central nervous system (CNS) tumours among adolescents and young adults (AYAs; 15-39 years) derived from registries of Southern and Eastern Europe (SEE) in comparison to the Surveillance, Epidemiology and End Results (SEER) Program, US, and to explore changes due to etiological parameters or registration improvement via evaluating time trends. Diagnoses of 11,438 incident malignant CNS tumours in AYAs (1990-2014) were retrieved from 14 collaborating SEE cancer registries and 13,573 from the publicly available SEER database (1990-2012). Age-adjusted incidence rates (AIRs) were calculated; Poisson and joinpoint regression analyses were performed for temporal trends. The overall AIR of malignant CNS tumours among AYAs was higher in SEE (28.1/million) compared to SEER (24.7/million). Astrocytomas comprised almost half of the cases in both regions, albeit with a higher proportion of unspecified cases in SEE registries (30% versus 2.5% in SEER). Age and gender distributions were similar across SEE and SEER, with a male-to-female ratio of 1.3 and an overall increase of incidence by age. Increasing temporal trends in incidence were documented in four SEE registries (Greater Poland, Portugal North, Turkey-Izmir and Ukraine) versus an annual decrease in Croatia (-2.5%) and a rather stable rate in SEER (-0.3%). This first report on the descriptive epidemiology of malignant CNS tumours in AYAs in the SEE area shows higher incidence rates as compared to the United States of America and variable temporal trends that may be linked to registration improvements. Hence, it emphasises the need for optimisation of cancer registration processes, so as to enable in-depth evaluation of the observed patterns by disease subtype. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hooper, Michele; Wenkert, Deborah; Bitman, Bojena; Dias, Virgil C; Bartley, Yessenia
2013-10-02
Malignancy risk may be increased in chronic inflammatory conditions that are mediated by tumor necrosis factor (TNF), such as juvenile idiopathic arthritis (JIA), but the role of TNF in human cancer biology is unclear. In response to a 2011 United States Food & Drug Administration requirement of TNF blocker manufacturers, we evaluated reporting rates of all malignancies in patients ≤30 years old who received the TNF blocker etanercept. All malignancies in etanercept-exposed patients aged ≤30 years from the Amgen clinical trial database (CTD) and postmarketing global safety database (PMD) were reviewed. PMD reporting rates were generated using exposure information based on commercial sources. Age-specific incidence rates of malignancy for the general US population were generated from the Surveillance Epidemiology and End Results (SEER) database v7.0.9. There were 2 malignancies in the CTD: 1 each in etanercept and placebo/comparator arms (both in patients 18-30 years old). Postmarketing etanercept exposure was 231,404 patient-years (62,379 patient-years in patients 0-17 years; 168,485 patient-years in patients 18-30 years). Reporting rates of malignancy per 100,000 patient-years in the PMD and incidence rates in SEER were 32.0 and 15.9, respectively, for patients 0-17 years and 46.9 and 42.1 for patients 18-30 years old. Reporting rates were higher than SEER incidence rates for Hodgkin lymphoma in the 0-17 years age group. PMD reporting rates per 100,000 patient-years and SEER incidence rates per 100,000 person-years for Hodgkin lymphoma were 9.54 and 0.9, respectively, for patients 0-17 years and 1.8 and 4.2 for patients 18-30 years old. There were ≥5 cases of leukemia, lymphoma, melanoma, thyroid, and cervical cancers. Leukemia, non-Hodgkin lymphoma, melanoma, thyroid cancer, and cervical cancer rates were similar in the PMD and SEER. Overall PMD malignancy reporting rates in etanercept-treated patients 0-17 years appeared higher than incidence rates in SEER, attributable to rates of Hodgkin lymphoma. Comparison to patients with similar burden of disease cannot be made; JIA, particularly very active disease, may be a risk factor for lymphoma. No increased malignancy reporting rate in the PMD relative to SEER was observed in the young-adult age group.
Nabizadeh, Ramin; Valadi Amin, Maryam; Alimohammadi, Mahmood; Naddafi, Kazem; Mahvi, Amir Hossein; Yousefzadeh, Samira
2013-04-26
Developing a water quality index, which converts a water quality dataset into a single number, is a central task of most water quality monitoring programmes. Because index construction depends on local conditions and constraints, it is not feasible to introduce one definitive index to represent water quality everywhere. In this study, an innovative software application, the Iranian Water Quality Index Software (IWQIS), is presented in order to facilitate calculation of a water quality index based on dynamic weight factors, which helps users compute the index in cases where some parameters are missing from the datasets. A dataset containing 735 samples of drinking water quality from different parts of the country was used to demonstrate the performance of this software using different criteria parameters. The software proved to be an efficient tool to facilitate the setup of water quality indices based on flexible use of variables and water quality databases.
Llamas, César; González, Manuel A; Hernández, Carmen; Vegas, Jesús
2016-10-01
Nearly every practical improvement in modeling human motion is founded on a properly designed collection of data or datasets. These datasets must be made publicly available so that the community can validate and accept them. It is reasonable to expect that a collective, guided effort could produce solid and substantial datasets, in the same way the open-source software community does, and that in this way datasets could be complemented, extended, and expanded in size with, for example, more individuals, samples, and human actions. For this to be possible, collaborators must make certain commitments, one of them being to share the same data acquisition platform. In this paper, we offer an affordable open source hardware and software platform based on inertial wearable sensors such that several groups can cooperate in the construction of datasets through common software suitable for collaboration. Some experimental results about the throughput of the overall system are reported, showing the feasibility of acquiring data from up to 6 sensors with a sampling frequency of no less than 118 Hz. Also, a proof-of-concept dataset is provided, comprising sampled data from 12 subjects suitable for gait analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Westwick, Harrison J; Giguère, Jean-François; Shamji, Mohammed F
2016-01-01
Intradural spinal hemangioblastomas are infrequent, vascular, pathologically benign tumors occurring either sporadically or in association with von Hippel-Lindau disease along the neural axis. With fewer than 1,000 cases described, the literature is variable with respect to the epidemiological factors associated with spinal hemangioblastoma and its treatment. The objective of this study was to evaluate the epidemiology of intradural spinal hemangioblastoma with the Surveillance, Epidemiology and End Results (SEER) database while also presenting an illustrative case. The SEER database was queried for cases of spinal hemangioblastoma between 2000 and 2010 with the use of SEER*Stat software. Incidence was evaluated as a function of age, sex, and race. Survival was evaluated with Cox proportional hazards models using IBM SPSS software, evaluating age, sex, location, treatment modality, pathology, and number of primaries (significance threshold p = 0.05). Descriptive statistics for the same factors were also calculated. The case of a 43-year-old patient with a surgically treated upper cervical intramedullary hemangioblastoma is also presented. In the dataset between 2000 and 2010, there were 133 cases, with an age-adjusted incidence of 0.014 (0.012-0.017) per 100,000, standardized to the standard USA population. Hemangioblastoma was the tenth most common intradural spinal tumor type, representing 2.1% (133 of 6,156) of all spinal tumors. There was no difference in incidence between men and women, with a female:male rate ratio of 1.05 (0.73-1.50), p = 0.86. The average age of patients was 48.0 (45.2-50.9) years, and a lower incidence was noted in patients <15 years compared to all other age groups (p < 0.05). There was no difference in incidence among the different races. Treatment included surgical resection in 106 (79.7%) cases and radiation with surgery in 7 (5.3%) cases; radiation alone was used in only 1 (0.8%) case, and no treatment was performed in 17 (12.8%) cases. Mortality was noted in 12 (9%) cases, with a median survival of 27.5 months (range 1-66 months) over the 10-year period. Mortality was attributable to the malignancy in 3 (2%) cases. There was no statistically significant difference in Cox hazard ratios for mortality by sex, race, treatment modality, pathology, or number of primaries. Spinal hemangioblastomas represent a small fraction of primary intradural spinal tumors, and this study did not identify any difference in incidence between genders. Surgical treatment alone was the most common treatment modality. Overall prognosis is good, with 9% observed mortality over the 10-year period and 2% mortality attributable to the malignancy. © 2015 S. Karger AG, Basel.
The Most Common Geometric and Semantic Errors in CityGML Datasets
NASA Astrophysics Data System (ADS)
Biljecki, F.; Ledoux, H.; Du, X.; Stoter, J.; Soon, K. H.; Khoo, V. H. S.
2016-10-01
To be used as input in most simulation and modelling software, 3D city models should be geometrically and topologically valid, and semantically rich. We investigate in this paper the quality of currently available CityGML datasets, i.e. we validate the geometry/topology of the 3D primitives (Solid and MultiSurface), and we validate whether the semantics of the boundary surfaces of buildings are correct or not. We have analysed all the CityGML datasets we could find, both from portals of cities and on different websites, plus a few that were made available to us. We have thus validated 40M surfaces in 16M 3D primitives and 3.6M buildings found in 37 CityGML datasets originating from 9 countries, and produced by several companies with diverse software and acquisition techniques. The results indicate that CityGML datasets without errors are rare, and those that are nearly valid are mostly simple LOD1 models. We report on the most common errors we have found and analyse them. One main observation is that many of these errors could be automatically fixed or prevented with simple modifications to the modelling software. Our principal aim is to highlight the most common errors so that these are not repeated in the future. We hope that our paper and the open-source software we have developed will help raise awareness of data quality among data providers and 3D GIS software producers.
Forkert, N D; Cheng, B; Kemmling, A; Thomalla, G; Fiehler, J
2014-01-01
The objective of this work is to present the software tool ANTONIA, which has been developed to facilitate quantitative analysis of perfusion-weighted MRI (PWI) datasets in general, as well as subsequent multi-parametric analysis of additional datasets for the specific purpose of evaluating acute ischemic stroke patient datasets. Three different methods for the analysis of DSC or DCE PWI datasets are currently implemented in ANTONIA and can be selected case-specifically based on the study protocol. These methods comprise a curve fitting method as well as a deconvolution-based and a deconvolution-free method integrating a previously defined arterial input function. The perfusion analysis is extended for the purpose of acute ischemic stroke analysis by additional methods that enable an automatic atlas-based selection of the arterial input function, analysis of the perfusion-diffusion and DWI-FLAIR mismatch, and segmentation-based volumetric analyses. For reliability evaluation, the software tool was used by two observers for quantitative analysis of 15 datasets from acute ischemic stroke patients to extract the acute lesion core volume, FLAIR ratio, perfusion-diffusion mismatch volume with manually as well as automatically selected arterial input functions, and follow-up lesion volume. The results of this evaluation revealed that the tool leads to highly reproducible results for all parameters if the automatic arterial input function selection method is used. Due to the broad selection of processing methods available in the tool, ANTONIA is especially helpful in supporting image-based perfusion and acute ischemic stroke research projects.
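Deconvolution-based perfusion methods of the kind named above are commonly implemented with a truncated-SVD inversion of the arterial input function's convolution matrix; a sketch under that assumption (ANTONIA's exact implementation may differ), with entirely synthetic curves:

```python
# Sketch: recover the flow-scaled residue function from a tissue curve
# and an arterial input function (AIF) by truncated-SVD inversion of
# the AIF convolution matrix. Curves, truncation level synthetic/assumed.
import numpy as np
from scipy.linalg import toeplitz

dt = 1.0
t = np.arange(0, 60, dt)
aif = (t / 8)**2 * np.exp(-t / 4)                 # gamma-variate AIF
residue = np.exp(-t / 12)                         # true residue, CBF = 1
tissue = dt * np.convolve(aif, residue)[:len(t)]  # forward model

A = dt * toeplitz(aif, np.zeros_like(aif))        # lower-triangular conv. matrix
U, s, Vt = np.linalg.svd(A)
s_inv = np.where(s > 0.2 * s.max(), 1.0 / s, 0.0) # truncate small singular values
recovered = Vt.T @ (s_inv * (U.T @ tissue))
print(round(recovered.max(), 2))                  # peak ~ flow (CBF) estimate
```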
A longitudinal dataset of five years of public activity in the Scratch online community.
Hill, Benjamin Mako; Monroy-Hernández, Andrés
2017-01-31
Scratch is a programming environment and an online community where young people can create, share, learn, and communicate. In collaboration with the Scratch Team at MIT, we created a longitudinal dataset of public activity in the Scratch online community during its first five years (2007-2012). The dataset comprises 32 tables with information on more than 1 million Scratch users, nearly 2 million Scratch projects, more than 10 million comments, more than 30 million visits to Scratch projects, and more. To help researchers understand this dataset, and to establish the validity of the data, we also include the source code of every version of the software that operated the website, as well as the software used to generate this dataset. We believe this is the largest and most comprehensive downloadable dataset of youth programming artifacts and communication.
Technical Report from Grant Recipient - City of Redlands
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giorgianni, Kathleen Margaret
2016-05-26
The goals and objectives of the HVAC upgrades are to replace equipment as old as twenty-three (23) years in five different facilities. The project will upgrade some facilities from SEER ratings of 9 to SEER ratings of 14, at a savings of 556 kilowatt-hours per ton (savings depend on the specific size of the system).
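The quoted 556 kWh-per-ton figure is consistent with the definition of SEER (BTU of cooling delivered per watt-hour of electricity) once an annual duty assumption is made; a back-of-envelope check, where the full-load-hours figure is our assumption rather than the report's:

```python
# Back-of-envelope check of the quoted 556 kWh/ton savings. SEER is
# BTU of cooling per watt-hour consumed; one ton of cooling is
# 12,000 BTU/h. Annual full-load cooling hours are an assumption
# (not stated in the report); ~1168 h reproduces the figure.
hours = 1168                              # assumed full-load hours per year
btu_per_ton_year = 12_000 * hours

kwh_old = btu_per_ton_year / 9 / 1000     # SEER 9 consumption, kWh/ton-year
kwh_new = btu_per_ton_year / 14 / 1000    # SEER 14 consumption, kWh/ton-year
print(f"savings ~= {kwh_old - kwh_new:.0f} kWh per ton per year")   # ~556
```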
Urbanization in Zambia. An International Urbanization Survey Report to the Ford Foundation.
ERIC Educational Resources Information Center
Simmance, Alan J. F.
This report reviews the "Seers Report," which contained policy guidelines for modern development planning in Zambia, and compares its findings to recent findings during the period 1963-1970. The Seers Report found that Zambia was the most urbanized country in Africa south of the Sahara (excluding South Africa). This report finds that…
Updates to FuncLab, a Matlab based GUI for handling receiver functions
NASA Astrophysics Data System (ADS)
Porritt, Robert W.; Miller, Meghan S.
2018-02-01
Receiver functions are a versatile tool commonly used in seismic imaging. Depending on how they are processed, they can be used to image discontinuity structure within the crust or mantle, or they can be inverted for seismic velocity, either directly or jointly with complementary datasets. However, modern studies generally require large datasets, which can be challenging to handle; therefore, FuncLab was originally written as an interactive Matlab GUI to assist in handling these large datasets. This software uses a project database to allow interactive trace editing, data visualization, H-κ stacking for crustal thickness and Vp/Vs ratio, and common conversion point stacking while minimizing computational costs. Since its initial release, significant advances have been made in the implementation of web services, and changes in the underlying Matlab platform have necessitated a significant revision to the software. Here, we present revisions to the software, including new features such as data downloading via irisFetch.m, receiver function calculations via processRFmatlab, on-the-fly cross-section tools, interface picking, and more. In the descriptions of the tools, we present the software's application to a test dataset in Michigan, Wisconsin, and neighboring areas following the passage of the USArray Transportable Array. The software is made available online at https://robporritt.wordpress.com/software.
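As a flavor of what H-κ stacking does, below is a stripped-down grid search over crustal thickness H and Vp/Vs ratio κ on one synthetic receiver function; FuncLab's real implementation stacks weighted Ps, PpPs, and PpSs+PsPs amplitudes across many records, and every number here is made up:

```python
# Stripped-down H-kappa grid search on one synthetic receiver function:
# stack amplitudes at the predicted Ps and PpPs delays for each (H, k)
# pair and take the maximum. Velocities, weights, and the RF are toy values.
import numpy as np

vp, p, dt = 6.5, 0.06, 0.1               # Vp (km/s), ray parameter (s/km)
t = np.arange(0.0, 30.0, dt)

def delays(h, k):
    s = np.sqrt((k / vp)**2 - p**2)      # vertical S slowness (Vs = Vp/k)
    q = np.sqrt(1 / vp**2 - p**2)        # vertical P slowness
    return h * (s - q), h * (s + q)      # t_Ps, t_PpPs

t1, t2 = delays(35.0, 1.75)              # "true" crust: 35 km, Vp/Vs 1.75
rf = np.exp(-0.5*((t - t1)/0.3)**2) + 0.5*np.exp(-0.5*((t - t2)/0.3)**2)

hs, ks = np.arange(25.0, 45.0, 0.25), np.arange(1.60, 1.95, 0.005)
score = np.array([[0.7 * np.interp(delays(h, k)[0], t, rf) +
                   0.3 * np.interp(delays(h, k)[1], t, rf)
                   for k in ks] for h in hs])
ih, ik = np.unravel_index(score.argmax(), score.shape)
print(f"H = {hs[ih]:.2f} km, Vp/Vs = {ks[ik]:.3f}")   # recovers ~35, ~1.75
```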
GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets.
Jeong, Seongmun; Kim, Jae-Yoon; Jeong, Soon-Chun; Kang, Sung-Taeg; Moon, Jung-Kyung; Kim, Namshin
2017-01-01
Selecting core subsets from plant genotype datasets is important for enhancing cost-effectiveness and shortening the time required for analyses in genome-wide association studies (GWAS), genomics-assisted breeding of crop species, etc. Recently, large numbers of genetic markers (>100,000 single nucleotide polymorphisms) have been identified from high-density single nucleotide polymorphism (SNP) arrays and next-generation sequencing (NGS) data. However, no software has been available for picking out an efficient and consistent core subset from such a huge dataset. Software is needed that can consistently extract the genetically important samples in a population. We here present a new program, GenoCore, which can quickly and efficiently find a core subset representing the entire population. We introduce simple measures of coverage and diversity scores, which reflect genotype errors and genetic variations, and can help to select samples rapidly and accurately for crop genotype datasets. Comparisons of our method to other core collection software using example datasets were performed to validate performance according to genetic distance, diversity, coverage, required system resources, and the number of selected samples. GenoCore selects the smallest, most consistent, and most representative core collection from all samples using less memory with more efficient scores, and shows greater genetic coverage compared to the other software tested. GenoCore was written in the R language, and can be accessed online with an example dataset and test results at https://github.com/lovemun/Genocore.
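GenoCore's exact coverage and diversity scores are defined in the paper; the greedy flavor of core selection it belongs to can be sketched as follows, with a toy genotype encoding that is ours, not GenoCore's:

```python
# Toy greedy core-subset selection: repeatedly add the sample covering
# the most not-yet-covered (marker, allele) pairs. GenoCore's actual
# coverage/diversity scoring differs; this only illustrates the idea.
def greedy_core(genotypes, target_coverage=1.0):
    """genotypes: dict sample -> string of alleles, one per marker."""
    universe = {(m, a) for g in genotypes.values() for m, a in enumerate(g)}
    covered, core = set(), []
    while genotypes and len(covered) < target_coverage * len(universe):
        # pick the sample adding the most new (marker, allele) pairs
        best = max(genotypes,
                   key=lambda s: len({(m, a) for m, a in enumerate(genotypes[s])}
                                     - covered))
        core.append(best)
        covered |= {(m, a) for m, a in enumerate(genotypes[best])}
        genotypes = {s: g for s, g in genotypes.items() if s != best}
    return core

samples = {"s1": "AATG", "s2": "AATG", "s3": "CATG", "s4": "AACC"}
print(greedy_core(samples))   # ['s1', 's4', 's3'] -- s2 adds nothing new
```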
Cary, C; Odisho, A Y; Cooperberg, M R
2016-06-01
We sought to assess variation in the primary treatment of prostate cancer by examining the effect of the population density of the county of residence on treatment for clinically localized prostate cancer, and to quantify the variation in primary treatment attributable to the county and state level. A total of 138,226 men with clinically localized prostate cancer in the Surveillance, Epidemiology and End Results (SEER) database from 2005 through 2008 were analyzed. The main association of interest, between prostate cancer treatment and population density, was assessed using multilevel hierarchical logit models while accounting for the random effects of counties nested within SEER regions. To quantify the effect of county and SEER region on individual treatment, the percent of total variance in treatment attributable to county of residence and SEER site was estimated with residual intraclass correlation coefficients. Men with localized prostate cancer in metropolitan counties had 23% higher odds of being treated with surgery or radiation compared with men in rural counties, controlling for the number of urologists per county as well as clinical and sociodemographic characteristics. Three percent (95% confidence interval (CI): 1.2-6.2%) of the total variation in treatment was attributable to SEER site, while 6% (95% CI: 4.3-9.0%) was attributable to county of residence, adjusting for clinical and sociodemographic characteristics. Variation in treatment for localized prostate cancer exists for men living in counties of differing population density. These findings highlight the importance of comparative effectiveness research to improve understanding of this variation and lead to a reduction in unwarranted variation.
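The residual intraclass correlation for a multilevel logit model is conventionally computed on the latent scale, where the individual-level variance is fixed at π²/3; a sketch with hypothetical variance components (not this paper's estimates):

```python
# Sketch: residual intraclass correlation for a nested logit model
# (patients within counties within SEER regions) on the latent scale.
# Variance components below are hypothetical, not the paper's.
import math

var_region = 0.10            # between-SEER-region variance (assumed)
var_county = 0.20            # between-county variance (assumed)
var_indiv  = math.pi**2 / 3  # logistic individual-level variance

total = var_region + var_county + var_indiv
print(f"ICC(region) = {var_region / total:.3f}")                 # region share
print(f"ICC(county) = {(var_region + var_county) / total:.3f}")  # same county
```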
The effect of multiple primary rules on cancer incidence rates and trends
Weir, Hannah K.; Johnson, Christopher J.; Ward, Kevin C.; Coleman, Michel P.
2018-01-01
Purpose An examination of multiple primary cancers can provide insight into the etiologic roles of genes, the environment, and prior cancer treatment in a cancer patient's risk of developing a subsequent cancer. Different rules for registering multiple primary cancers (MP) are used by cancer registries throughout the world, making data comparisons difficult. Methods We evaluated the effect of SEER and IARC/IACR rules on cancer incidence rates and trends using data from the SEER Program. We estimated age-standardized incidence rates (ASIR) and trends (1975-2011) for the top 26 cancer categories using joinpoint regression analysis. Results ASIRs were higher using SEER compared to IARC/IACR rules for all cancers combined (3%) and, in rank order, melanoma (9%), female breast (7%), urinary bladder (6%), colon (4%), kidney and renal pelvis (4%), oral cavity and pharynx (3%), lung and bronchus (2%), and non-Hodgkin lymphoma (2%). ASIR differences were largest for patients aged 65+ years. Trends were similar using both MP rules, with the exception of cancers of the urinary bladder and kidney and renal pelvis. Conclusions The choice of multiple primary coding rules affects incidence rates and trends. Compared to SEER MP coding rules, IARC/IACR rules are less complex, have not changed over time, and report fewer multiple primary cancers, particularly cancers that occur in paired organs, at the same anatomic site, and with the same or related histologic type. Cancer registries collecting incidence data using SEER rules may want to consider including incidence rates and trends using IARC/IACR rules to facilitate international data comparisons. PMID:26809509
Joeng, Hee-Koung; Chen, Ming-Hui; Kang, Sangwook
2015-01-01
Discrete survival data are routinely encountered in many fields of study, including behavioral science, economics, epidemiology, medicine, and social science. In this paper, we develop a class of proportional exponentiated link transformed hazards (ELTH) models. We carry out a detailed examination of the role of links in fitting discrete survival data and estimating regression coefficients. Several interesting results are established regarding the choice of links and baseline hazards. We also characterize the conditions for improper survival functions and the conditions for existence of the maximum likelihood estimates under the proposed ELTH models. An extensive simulation study is conducted to examine the empirical performance of the parameter estimates under the Cox proportional hazards model by treating discrete survival times as continuous survival times, and the model comparison criteria, AIC and BIC, in determining links and baseline hazards. A SEER breast cancer dataset is analyzed in detail to further demonstrate the proposed methodology. PMID:25772374
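For orientation, discrete-time hazard models of the kind this family generalizes can be fit as binomial GLMs on person-period data, with the link (logit, complementary log-log, ...) as a modeling choice compared by AIC; the sketch below is generic and synthetic, not the authors' ELTH class:

```python
# Generic discrete-time survival fit on person-period data, comparing
# two link functions by AIC. Synthetic data; not the ELTH models.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
events, xs, periods = [], [], []
for xi in rng.normal(size=2000):
    for t in range(1, 6):                              # up to 5 intervals
        h = 1 / (1 + np.exp(-(-2.0 + 0.5 * xi)))       # true logit hazard
        d = rng.random() < h
        events.append(int(d)); xs.append(xi); periods.append(t)
        if d:
            break                                      # event ends follow-up

X = sm.add_constant(np.column_stack([xs, periods]))
for link in (sm.families.links.Logit(), sm.families.links.CLogLog()):
    fit = sm.GLM(events, X, family=sm.families.Binomial(link=link)).fit()
    print(type(link).__name__, "AIC:", round(fit.aic, 1))
```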
Pivot tables for mortality analysis, or who needs life tables anyway?
Wesley, David; Cox, Hugh F
2007-01-01
Actuarial life-table analysis has long been used by life insurance medical directors for mortality abstraction from clinical studies. Ironically, today's life actuary instead uses pivot tables to analyze mortality. Pivot tables (a feature of MS Excel) collapse various dimensions of data that were previously arranged in an "experience study" format. Summary statistics such as actual deaths, actual and expected mortality (usually measured in dollars), and calculated results such as actual-to-expected ratios are then displayed in a 2-dimensional grid. The same analytic process, excluding the dollar focus, can be used for clinical mortality studies. For raw survival data, especially large datasets, this combination of experience-study data and pivot tables has clear advantages over life-table analysis in both accuracy and flexibility. Using the SEER breast cancer data, we compare the results of life-table analysis and pivot-table analysis.
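For readers without Excel at hand, the same collapse-and-ratio workflow translates directly to pandas; the records and column names below are invented for illustration.

```python
import pandas as pd

# Experience-study records: one row per cell of exposure experience.
df = pd.DataFrame({
    "age_band": ["50-59", "50-59", "60-69", "60-69", "60-69"],
    "duration": [1, 2, 1, 2, 3],
    "actual":   [1, 0, 2, 1, 3],            # observed deaths
    "expected": [0.8, 0.9, 1.5, 1.4, 2.2],  # expected deaths
})

# Collapse the dimensions as a pivot table would, then form A/E ratios.
grid = pd.pivot_table(df, values=["actual", "expected"],
                      index="age_band", columns="duration", aggfunc="sum")
print((grid["actual"] / grid["expected"]).round(2))
```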
2013-01-01
Background Developing a water quality index, which converts a water quality dataset into a single number, is a central task of most water quality monitoring programmes. Because the setup of a water quality index depends on local constraints, it is not feasible to define a single universal index that reveals the water quality level everywhere. Findings In this study, an innovative software application, the Iranian Water Quality Index Software (IWQIS), is presented to facilitate the calculation of a water quality index based on dynamic weight factors, which helps users compute the index in cases where some parameters are missing from the dataset. Conclusion A dataset containing 735 drinking water samples from different parts of the country was used to demonstrate the performance of the software across different criteria parameters. The software proved to be an efficient tool for setting up water quality indices based on flexible use of variables and water quality databases. PMID:24499556
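The idea of dynamic weight factors can be sketched in a few lines: when a parameter is missing, its weight is redistributed over the parameters actually measured. This is a generic illustration of that renormalization, not the IWQIS algorithm itself; the sub-index names and weights are invented.

```python
def dynamic_wqi(scores, weights):
    """Weighted-average water quality index in which the weights of
    missing parameters (score None) are renormalized over the
    parameters actually measured."""
    observed = {k: v for k, v in scores.items() if v is not None}
    active = sum(weights[k] for k in observed)
    return sum(weights[k] / active * v for k, v in observed.items())

# Sub-index scores on a 0-100 scale; turbidity is missing here.
scores  = {"pH": 85.0, "DO": 70.0, "turbidity": None, "nitrate": 90.0}
weights = {"pH": 0.2,  "DO": 0.4,  "turbidity": 0.2,  "nitrate": 0.2}
print(round(dynamic_wqi(scores, weights), 1))  # 78.8
```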
2013-01-01
Background Malignancy risk may be increased in chronic inflammatory conditions that are mediated by tumor necrosis factor (TNF), such as juvenile idiopathic arthritis (JIA), but the role of TNF in human cancer biology is unclear. In response to a 2011 United States Food & Drug Administration requirement imposed on TNF blocker manufacturers, we evaluated reporting rates of all malignancies in patients ≤30 years old who received the TNF blocker etanercept. Methods All malignancies in etanercept-exposed patients aged ≤30 years from the Amgen clinical trial database (CTD) and postmarketing global safety database (PMD) were reviewed. PMD reporting rates were generated using exposure information based on commercial sources. Age-specific incidence rates of malignancy for the general US population were generated from the Surveillance, Epidemiology, and End Results (SEER) database v7.0.9. Results There were 2 malignancies in the CTD: 1 each in the etanercept and placebo/comparator arms (both in patients 18–30 years old). Postmarketing etanercept exposure was 231,404 patient-years (62,379 patient-years in patients 0–17 years; 168,485 patient-years in patients 18–30 years). Reporting rates of malignancy per 100,000 patient-years in the PMD and incidence rates in SEER were 32.0 and 15.9, respectively, for patients 0–17 years, and 46.9 and 42.1 for patients 18–30 years old. Reporting rates were higher than SEER incidence rates for Hodgkin lymphoma in the 0–17 years age group: PMD reporting rates per 100,000 patient-years and SEER incidence rates per 100,000 person-years for Hodgkin lymphoma were 9.54 and 0.9, respectively, for patients 0–17 years, and 1.8 and 4.2 for patients 18–30 years old. There were ≥5 cases each of leukemia, lymphoma, melanoma, thyroid cancer, and cervical cancer. Leukemia, non-Hodgkin lymphoma, melanoma, thyroid cancer, and cervical cancer rates were similar in the PMD and SEER. Conclusions Overall PMD malignancy reporting rates in etanercept-treated patients 0–17 years appeared higher than incidence rates in SEER, attributable to the rates of Hodgkin lymphoma. Comparison to patients with a similar burden of disease cannot be made; JIA, particularly very active disease, may be a risk factor for lymphoma. No increased malignancy reporting rate in the PMD relative to SEER was observed in the young-adult age group. PMID:24225257
Zhang, Adah S.; Ostrom, Quinn T.; Kruchko, Carol; Rogers, Lisa; Peereboom, David M.
2017-01-01
Abstract Background. Complete prevalence proportions illustrate the burden of disease in a population. This study estimates the 2010 complete prevalence of malignant primary brain tumors overall and by Central Brain Tumor Registry of the United States (CBTRUS) histology groups, and compares the brain tumor prevalence estimates to the complete prevalence of other common cancers as determined by the Surveillance, Epidemiology, and End Results Program (SEER), by age at prevalence (2010): children (0–14 y), adolescents and young adults (AYA) (15–39 y), and adults (40+ y). Methods. Complete prevalence proportions were estimated using a novel regression method extended from the Completeness Index Method, which combines survival and incidence data from multiple sources. In this study, two datasets, CBTRUS and SEER, were used to calculate the complete prevalence estimates of interest. Results. Complete prevalence for malignant primary brain tumors was 47.59/100,000 population (22.31, 48.49, and 57.75/100,000 for the child, AYA, and adult populations, respectively). The most prevalent cancers by age were childhood leukemia (36.65/100,000), AYA melanoma of the skin (66.21/100,000), and adult female breast cancer (1949.00/100,000). The most prevalent CBTRUS histologies were pilocytic astrocytoma in children and AYA (6.82/100,000 and 5.92/100,000), and glioblastoma in adults (12.76/100,000). Conclusions. The relative impact of malignant primary brain tumors is higher among children than any other age group; it emerges as the second most prevalent cancer among children. Complete prevalence estimates for primary malignant brain tumors fill a gap in overall cancer knowledge, providing critical information for public health and health care planning, including treatment, decision making, funding, and advocacy programs. PMID:28039365
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walker, Gary V.; Giordano, Sharon H.; Williams, Melanie
2013-07-15
Purpose: To evaluate, in the setting of breast cancer, the accuracy of registry radiation therapy (RT) coding compared with the gold standard of Medicare claims. Methods and Materials: Using Surveillance, Epidemiology, and End Results (SEER)–Medicare data, we identified 73,077 patients aged ≥66 years diagnosed with breast cancer in the period 2001-2007. Underascertainment (1 - sensitivity), sensitivity, specificity, κ, and χ² were calculated for RT receipt determined by registry data versus claims. Multivariate logistic regression characterized patient, treatment, and geographic factors associated with underascertainment of RT. Findings in the SEER–Medicare registries were compared with three non-SEER registries (Florida, New York, and Texas). Results: In the SEER–Medicare registries, 41.6% (n=30,386) of patients received RT according to registry coding, versus 49.3% (n=36,047) according to Medicare claims (P<.001). Underascertainment of RT was more likely if patients resided in a newer SEER registry (odds ratio [OR] 1.70, 95% confidence interval [CI] 1.60-1.80; P<.001), in a rural county (OR 1.34, 95% CI 1.21-1.48; P<.001), or if RT was delayed (OR 1.006/day, 95% CI 1.006-1.007; P<.001). Underascertainment of RT receipt in SEER registries was 18.7% (95% CI 18.6-18.8%), compared with 44.3% (95% CI 44.0-44.5%) in non-SEER registries. Conclusions: Population-based tumor registries are highly variable in their ascertainment of RT receipt and should be augmented with other data sources when evaluating the quality of breast cancer care. Future work should identify opportunities for the radiation oncology community to partner with registries to improve the accuracy of treatment data.
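The agreement statistics reported here all derive from a single 2×2 table of registry coding against claims. A minimal sketch, with illustrative counts rather than the study's data:

```python
def agreement_stats(tp, fn, fp, tn):
    """Registry RT coding vs. Medicare claims (treated as gold
    standard). tp/fn: claims say RT, registry agrees/misses;
    fp/tn: claims say no RT, registry says RT/agrees."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    p_obs = (tp + tn) / n                  # observed agreement
    p_exp = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)  # chance-corrected agreement
    return {"underascertainment": 1 - sensitivity,
            "sensitivity": sensitivity,
            "specificity": specificity,
            "kappa": kappa}

# Illustrative counts only (n roughly matching the cohort size):
print(agreement_stats(tp=29000, fn=7000, fp=1400, tn=35700))
```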
Greater absolute risk for all subtypes of breast cancer in the US than Malaysia.
Horne, Hisani N; Beena Devi, C R; Sung, Hyuna; Tang, Tieng Swee; Rosenberg, Philip S; Hewitt, Stephen M; Sherman, Mark E; Anderson, William F; Yang, Xiaohong R
2015-01-01
Hormone receptor (HR) negative breast cancers are relatively more common in low-risk than in high-risk countries and/or populations. However, the absolute variations between these different populations are not well established, given the limited number of cancer registries with incidence rate data by breast cancer subtype. We therefore used two unique population-based resources with molecular data to compare incidence rates for the 'intrinsic' breast cancer subtypes between a low-risk Asian population in Malaysia and the high-risk non-Hispanic white population in the National Cancer Institute's Surveillance, Epidemiology, and End Results 18 registries database (SEER 18). The intrinsic breast cancer subtypes were recapitulated with the joint expression of the HRs (estrogen receptor and progesterone receptor) and human epidermal growth factor receptor-2 (HER2). Invasive breast cancer incidence rates overall were fivefold greater in SEER 18 than in Malaysia. The majority of breast cancers were HR-positive in SEER 18 and HR-negative in Malaysia. Notwithstanding the greater relative share of HR-negative cancers in Malaysia, there was a greater absolute risk for all subtypes in SEER 18; incidence rates were nearly 7-fold higher for HR-positive and 2-fold higher for HR-negative cancers in SEER 18. Despite the well-established relative breast cancer differences between low-risk and high-risk countries and/or populations, there was a greater absolute risk for both HR-positive and HR-negative subtypes in the US than in Malaysia. Additional analytical studies are sorely needed to determine the factors responsible for the elevated risk of all subtypes of breast cancer in high-risk countries like the United States.
Georgakis, Marios K; Dessypris, Nick; Baka, Margarita; Moschovi, Maria; Papadakis, Vassilios; Polychronopoulou, Sophia; Kourti, Maria; Hatzipantelis, Emmanuel; Stiakaki, Eftichia; Dana, Helen; Bouka, Evdoxia; Antunes, Luis; Bastos, Joana; Coza, Daniela; Demetriou, Anna; Agius, Domenic; Eser, Sultan; Gheorghiu, Raluca; Sekerija, Mario; Trojanowski, Maciej; Zagar, Tina; Zborovskaya, Anna; Ryzhov, Anton; Tragiannidis, Athanassios; Panagopoulou, Paraskevi; Steliarova-Foucher, Eva; Petridou, Eleni Th
2018-05-15
Neuroblastoma is the most common neoplasm during infancy (the first year of life). Our study describes the incidence of neuroblastoma in Southern-Eastern Europe (SEE), including, for the first time, the Nationwide Registry for Childhood Hematological Malignancies and Solid Tumors (NARECHEM-ST)/Greece, compared to the US population, while controlling for the human development index (HDI). Age-adjusted incidence rates (AIR) were calculated for 1,859 childhood (0-14 years) neuroblastoma cases retrieved from 13 collaborating SEE registries (1990-2016) and were compared to those of SEER/US (N = 3,166; 1990-2012); temporal trends were assessed using Poisson regression and Joinpoint analyses. The overall AIR was significantly lower in SEE (10.1 per million) compared to SEER (11.7 per million); the difference was greatest during infancy (43.7 vs. 53.3 per million, respectively), when approximately one-third of cases were diagnosed. Incidence rates of neuroblastoma at ages <1 and 1-4 years were positively associated with HDI, whereas a lower median age at diagnosis was correlated with a higher overall AIR. The distribution of primary site and histology was similar in SEE and SEER. Neuroblastoma was slightly more common among males than females (male-to-female ratio: 1.1), mainly among SEE infants. Incidence trends decreased in infants in Slovenia, Cyprus and SEER and increased in Ukraine and Belarus. The lower incidence in SEE compared to SEER, especially among infants living in low-HDI countries, possibly indicates a lower level of overdiagnosis in SEE. Hence, the increases in incidence rates in infancy noted in some subpopulations should be carefully monitored to avoid the unnecessary costs and health impacts of tumors that could potentially regress spontaneously. © 2017 UICC.
Retinoblastoma incidence patterns in the US Surveillance, Epidemiology, and End Results program.
Wong, Jeannette R; Tucker, Margaret A; Kleinerman, Ruth A; Devesa, Susan S
2014-04-01
IMPORTANCE Several studies have found no temporal or demographic differences in the incidence of retinoblastoma except for age at diagnosis, whereas other studies have reported variations in incidence by sex and race/ethnicity. OBJECTIVE To examine updated US retinoblastoma incidence patterns by sex, age at diagnosis, laterality, race/ethnicity, and year of diagnosis. DESIGN, SETTING, AND PARTICIPANTS The Surveillance, Epidemiology, and End Results (SEER) databases were examined for retinoblastoma incidence patterns by demographic and tumor characteristics. We studied 721 children in SEER 18 registries, 659 in SEER 13 registries, and 675 in SEER 9 registries. MAIN OUTCOMES AND MEASURES Incidence rates, incidence rate ratios (IRRs), and annual percent changes in rates. RESULTS During 2000-2009 in SEER 18, there was a significant excess of total retinoblastoma among boys compared with girls (IRR, 1.18; 95% CI, 1.02 to 1.36), in contrast to earlier reports of a female predominance. Bilateral retinoblastoma among white Hispanic boys was significantly elevated relative to white non-Hispanic boys (IRR, 1.81; 95% CI, 1.22 to 2.79) and white Hispanic girls (IRR, 1.75; 95% CI, 1.11 to 2.91) because of less rapid decreases in bilateral rates since the 1990s among white Hispanic boys than among the other groups. Retinoblastoma rates among white non-Hispanics decreased significantly since 1992 among those younger than 1 year and since 1998 among those with bilateral disease. CONCLUSIONS AND RELEVANCE Although changes in the availability of prenatal screening practices for retinoblastoma may have contributed to these incidence patterns, further research is necessary to determine their actual effect on the changing incidence of retinoblastoma in the US population. In addition, consistent with other cancers, an excess of retinoblastoma diagnosed in boys suggests a potential effect of sex on cancer origin.
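The IRRs and confidence intervals quoted in this abstract follow the usual log-scale approximation. A minimal sketch (the case counts below are illustrative, assuming equal person-years in both groups):

```python
import math

def irr_ci(cases1, py1, cases2, py2, z=1.96):
    """Incidence rate ratio with an approximate 95% CI computed on
    the log scale, using SE = sqrt(1/cases1 + 1/cases2)."""
    irr = (cases1 / py1) / (cases2 / py2)
    se = math.sqrt(1 / cases1 + 1 / cases2)
    return irr, irr * math.exp(-z * se), irr * math.exp(z * se)

# e.g., 390 cases in boys vs. 331 in girls, equal person-years:
print(tuple(round(v, 2) for v in irr_ci(390, 1.0, 331, 1.0)))
```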
Georgakis, Marios K; Karalexi, Maria A; Agius, Domenic; Antunes, Luis; Bastos, Joana; Coza, Daniela; Demetriou, Anna; Dimitrova, Nadya; Eser, Sultan; Florea, Margareta; Ryzhov, Anton; Sekerija, Mario; Žagar, Tina; Zborovskaya, Anna; Zivkovic, Snezana; Bouka, Evdoxia; Kanavidis, Prodromos; Dana, Helen; Hatzipantelis, Emmanuel; Kourti, Maria; Moschovi, Maria; Polychronopoulou, Sophia; Stiakaki, Eftichia; Kantzanou, Μaria; Pourtsidis, Apostolos; Petridou, Eleni Th
2016-11-01
To describe the epidemiologic patterns of childhood (0-14 years) lymphomas in the Southern and Eastern European (SEE) region in comparison with the Surveillance, Epidemiology and End Results (SEER) program, USA, and to explore tentative discrepancies. Childhood lymphomas diagnosed during 1990-2014 were retrieved from 14 SEE registries (n = 4,702) and SEER (n = 4,416); incidence rates were estimated and time trends were evaluated. The overall age-adjusted incidence rate was higher in SEE (16.9/10⁶) compared to SEER (13.6/10⁶), because of a higher incidence of Hodgkin lymphoma (HL, 7.5/10⁶ vs. 5.1/10⁶) and Burkitt lymphoma (BL, 3.1/10⁶ vs. 2.3/10⁶), whereas the incidence of non-Hodgkin lymphoma (NHL) was essentially identical overall (5.9/10⁶ vs. 5.8/10⁶), albeit variable among SEE registries. Incidence increased with age, except for BL, which peaked at 4 years; HL in SEE also showed an early male-specific peak at 4 years. The male preponderance was more pronounced for BL and attenuated with increasing age for HL. Increasing trends were noted in SEER for total lymphomas and NHL, and marginally for HL, in contrast to the decreasing HL and NHL trends generally observed in the SEE registries, with the exception of an increasing HL incidence in Portugal; of note, BL incidence followed a male-specific increasing trend in SEE. Registry-based data reveal variable patterns and time trends of childhood lymphomas in SEE and SEER during the last decades, possibly reflecting the diverse levels of socioeconomic development of the populations in the respective areas; optimization of the registration process may allow further exploration of molecular characteristics of disease subtypes.
Synthetic ALSPAC longitudinal datasets for the Big Data VR project.
Avraam, Demetris; Wilson, Rebecca C; Burton, Paul
2017-01-01
Three synthetic datasets, comprising 15,000, 155,000 and 1,555,000 participants respectively, were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain data properties similar to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.
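One simple way to generate synthetic records that preserve means and covariances is to draw from a multivariate normal fitted to the real data. The ALSPAC simulation procedure is not specified here, so the sketch below should be read as a generic illustration of the idea, not the method actually used.

```python
import numpy as np

def synthesize(real: np.ndarray, n_synthetic: int, seed: int = 1):
    """Draw synthetic records preserving the mean vector and
    covariance matrix of the real data, without reusing any row."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_synthetic)

# Toy stand-in for cohort data: 500 participants, 11 variables.
real = np.random.default_rng(0).normal(size=(500, 11))
synthetic = synthesize(real, n_synthetic=15_000)
print(synthetic.shape)  # (15000, 11)
```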
Epidemiology of Medicare Abuse: The Example of Power Wheelchairs
Goodwin, James S.; Nguyen-Oghalai, Tracy U.; Kuo, Yong-Fang; Ottenbacher, Kenneth J.
2007-01-01
Background Press reports and government investigations have uncovered widespread abuse in power wheelchair prescriptions reimbursed by Medicare, with specific targeting of minority neighborhoods for aggressive marketing. Objective We sought to determine the impact of neighborhood ethnic composition on power wheelchair prescriptions. Design The 5% non-cancer sample of Medicare recipients in the Surveillance, Epidemiology and End Results (SEER)-Medicare linked database, from 1994–2001. Setting SEER regions. Participants Individuals covered by Medicare living in SEER regions without a cancer diagnosis. Measurements Individual characteristics (age, gender, ethnicity, justifying diagnosis, and comorbidity), primary diagnoses, neighborhood characteristics (% black, % Hispanic, % with <12 years of education, and median income), and SEER region. Results The rate of power wheelchair prescriptions increased 33-fold from 1994 to 2001, with a shift over time from justifying diagnoses more closely tied to mobility impairment, such as stroke, to less specific medical diagnoses, such as osteoarthritis. In multilevel, multivariate analyses, individuals living in neighborhoods with higher percentages of blacks or Hispanics were more likely to receive power wheelchairs (OR=1.09 for each 10% increase in black residents and 1.23 for each 10% increase in Hispanic residents), after controlling for ethnicity and other characteristics at the individual level. Conclusion These results support allegations that minority neighborhoods have been specifically targeted by marketers promoting power wheelchairs. PMID:17302658
Research on cross-project software defect prediction based on transfer learning
NASA Astrophysics Data System (ADS)
Chen, Ya; Ding, Xiaoming
2018-04-01
To address two challenges in cross-project software defect prediction, namely the distribution differences between the source and target project datasets and the class imbalance within the data, we propose a cross-project software defect prediction method based on transfer learning, named NTrA. First, the class imbalance of the source project data is resolved using the Augmented Neighborhood Cleaning Algorithm. Second, the data gravity method is used to assign different weights to instances on the basis of the attribute similarity between the source and target project data. Finally, a defect prediction model is constructed using the TrAdaBoost algorithm. Experiments were conducted using NASA and SOFTLAB data from the published PROMISE dataset. The results show that the method achieves good recall and F-measure values and good prediction results.
Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang
2016-12-23
A gene regulatory network (GRN) represents the interactions of genes inside a cell or tissue, in which vertexes and edges stand for genes and their regulatory interactions, respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for the comparative exploration of different species and the mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive; they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but struggle to construct GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by the conditional mutual information measurement using a parallel computing framework (hence the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/.
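For orientation, the conditional mutual information at the heart of PCA-CMI-style inference has a closed form under a Gaussian assumption, built from log-determinants of covariance matrices. The sketch below implements that textbook formula; CMIP's actual estimator, thresholding, and parallelization are more involved.

```python
import numpy as np

def gaussian_cmi(x, y, z):
    """I(X;Y|Z) under a Gaussian assumption:
    0.5 * [log|C(x,z)| + log|C(y,z)| - log|C(z)| - log|C(x,y,z)|]."""
    def logdet(*cols):
        c = np.atleast_2d(np.cov(np.vstack(cols)))
        return np.log(np.linalg.det(c))
    return 0.5 * (logdet(x, *z) + logdet(y, *z)
                  - logdet(*z) - logdet(x, y, *z))

rng = np.random.default_rng(0)
z1 = rng.normal(size=2000)
x = z1 + 0.1 * rng.normal(size=2000)  # x and y related only through z1
y = z1 + 0.1 * rng.normal(size=2000)
print(gaussian_cmi(x, y, [z1]))       # near 0: no direct x-y edge
```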
Reference datasets for 2-treatment, 2-sequence, 2-period bioequivalence studies.
Schütz, Helmut; Labes, Detlew; Fuglsang, Anders
2014-11-01
It is difficult to validate statistical software used to assess bioequivalence since very few datasets with known results are in the public domain, and the few that are published are of moderate size and balanced. The purpose of this paper is therefore to introduce reference datasets of varying complexity in terms of dataset size and characteristics (balance, range, outlier presence, residual error distribution) for 2-treatment, 2-period, 2-sequence bioequivalence studies and to report their point estimates and 90% confidence intervals which companies can use to validate their installations. The results for these datasets were calculated using the commercial packages EquivTest, Kinetica, SAS and WinNonlin, and the non-commercial package R. The results of three of these packages mostly agree, but imbalance between sequences seems to provoke questionable results with one package, which illustrates well the need for proper software validation.
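The quantity such validation targets is the 90% confidence interval for the test/reference geometric mean ratio on the log scale. The sketch below shows the core calculation for paired log-scale data; a real 2-treatment, 2-sequence, 2-period crossover evaluation would fit the full ANOVA with sequence, period, and subject effects, so treat this as a simplified illustration with invented numbers.

```python
import numpy as np
from scipy import stats

def be_90ci(log_test, log_ref, alpha=0.10):
    """Point estimate and 90% CI for the test/reference geometric
    mean ratio, from paired log-scale observations."""
    d = np.asarray(log_test) - np.asarray(log_ref)
    se = d.std(ddof=1) / np.sqrt(len(d))
    t = stats.t.ppf(1 - alpha / 2, df=len(d) - 1)
    return tuple(np.exp([d.mean(), d.mean() - t * se, d.mean() + t * se]))

# Invented AUC values for six subjects:
log_test = np.log([95, 102, 88, 110, 97, 105])
log_ref  = np.log([100, 98, 91, 104, 99, 101])
print([round(v, 3) for v in be_90ci(log_test, log_ref)])
```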
Transforming the Geocomputational Battlespace Framework with HDF5
2010-08-01
…layout level, dataset arrays can be stored in chunks or tiles, enabling fast subsetting of large datasets, including compressed datasets. HDF software… Image Base (CIB) image of the AOI: an orthophoto made from rectified grayscale aerial images; b. An IKONOS satellite image made up of 3 spectral…
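The chunked storage the snippet above alludes to is directly visible in the HDF5 APIs; with h5py, for example, declaring a chunk shape and compression lets small spatial subsets be read without touching the rest of the array. The dataset name and sizes below are illustrative.

```python
import numpy as np
import h5py

# Store a large image array in 256x256 chunks with gzip compression so
# that small spatial subsets can be read without decompressing the rest.
with h5py.File("imagery.h5", "w") as f:
    dset = f.create_dataset("orthophoto", shape=(16384, 16384),
                            dtype="uint8", chunks=(256, 256),
                            compression="gzip")
    dset[0:256, 0:256] = np.zeros((256, 256), dtype="uint8")

with h5py.File("imagery.h5", "r") as f:
    tile = f["orthophoto"][512:768, 512:768]  # reads only the needed chunks
    print(tile.shape)
```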
2010-01-01
Background The development of DNA microarrays has facilitated the generation of hundreds of thousands of transcriptomic datasets. The use of a common reference microarray design allows existing transcriptomic data to be readily compared and re-analysed in the light of new data, and the combination of this design with large datasets is ideal for 'systems'-level analyses. One issue is that these datasets are typically collected over many years and may be heterogeneous in nature, containing different microarray file formats and gene array layouts, dye-swaps, and showing varying scales of log2-ratios of expression between microarrays. Excellent software exists for the normalisation and analysis of microarray data but many data have yet to be analysed as existing methods struggle with heterogeneous datasets; options include normalising microarrays on an individual or experimental group basis. Our solution was to develop the Batch Anti-Banana Algorithm in R (BABAR) algorithm and software package which uses cyclic loess to normalise across the complete dataset. We have already used BABAR to analyse the function of Salmonella genes involved in the process of infection of mammalian cells. Results The only input required by BABAR is unprocessed GenePix or BlueFuse microarray data files. BABAR provides a combination of 'within' and 'between' microarray normalisation steps and diagnostic boxplots. When applied to a real heterogeneous dataset, BABAR normalised the dataset to produce a comparable scaling between the microarrays, with the microarray data in excellent agreement with RT-PCR analysis. When applied to a real non-heterogeneous dataset and a simulated dataset, BABAR's performance in identifying differentially expressed genes showed some benefits over standard techniques. Conclusions BABAR is an easy-to-use software tool, simplifying the simultaneous normalisation of heterogeneous two-colour common reference design cDNA microarray-based transcriptomic datasets. We show BABAR transforms real and simulated datasets to allow for the correct interpretation of these data, and is the ideal tool to facilitate the identification of differentially expressed genes or network inference analysis from transcriptomic datasets. PMID:20128918
Is Mammography Useful in Older Women
1999-06-01
…mammography in women age 70 and older. Using the Linked Medicare-SEER Tumor Registry Database, created by the National Cancer Institute and the Health Care… Health Interview Survey) have documented that mammography use decreases with advancing age (11,21,22). In 1993, only 25% of women age 65 and older… related health services research. The linked database contains cancer information on patients 65 years of age and older from NCI's SEER Program and…
SEER*Educate: Use of Abstracting Quality Index Scores to Monitor Improvement of All Employees.
Potts, Mary S; Scott, Tim; Hafterson, Jennifer L
2016-01-01
Integral parts of the continuous improvement model of the Seattle-Puget Sound Cancer Surveillance System registry include the incorporation of SEER*Educate into its training program for all staff and the analysis of assessment results using the Abstracting Quality Index (AQI). The AQI offers a comprehensive measure of overall performance in SEER*Educate, a Web-based application used to personalize learning and diagnostically pinpoint each staff member's place on the AQI continuum. The assessment results are tallied from 6 abstracting standards within 2 domains: incidence reporting and coding accuracy. More than 100 data items are aligned to 1 or more of the 6 standards to build an aggregated score that is placed on a continuum for continuous improvement. The AQI score accurately identifies those individuals who have a good understanding of how to apply the 6 abstracting standards to reliably generate high-quality abstracts.
Gabriel, Abigail; Batey, Jason; Capogreco, Joseph; Kimball, David; Walters, Andy; Tubbs, R Shane; Loukas, Marios
2014-08-25
Despite much epidemiological research on brain cancer in the United States, the etiology of the various subtypes remains elusive. The black population in the United States currently experiences lower incidence but higher survival rates compared to other races. The aim of this study was therefore to analyze the trends in incidence and survival for the 6 most common primary brain tumors in the black population of the United States. The Surveillance, Epidemiology, and End Results (SEER) database was used to analyze the incidence and survival rates for the 6 most common brain tumor subtypes. Joinpoint 3.5.2 software was used to analyze trends in incidence from 1973 to 2008. A Kaplan-Meier curve was generated to analyze mean time to death and survival at 60 months. Joinpoint analysis revealed that the incidence of brain cancer in the U.S. black population increased by 0.11 per year between 1973 and 1989. After this period, a moderate decrease of 0.06 per year was observed from 1989 to 2008. Lymphoma was the most common primary tumor subtype for black individuals ages 20-34, and glioblastoma was the most common tumor subtype for black individuals in the age groups 35-49, 50-64, 65-79, and 80+. This population-based retrospective study of brain cancer in black adults in the United States revealed significant sex and age differences in the incidence of the 6 most common brain tumor subtypes from 1973 to 2008.
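The survival piece of such an analysis is a standard Kaplan-Meier estimate; a minimal sketch with the lifelines package and invented follow-up records (the study itself used SEER case listings):

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Invented follow-up records: months to death or censoring.
df = pd.DataFrame({
    "months":   [5, 12, 34, 60, 60, 8, 22, 47, 60, 15],
    "observed": [1,  1,  1,  0,  0, 1,  1,  1,  0,  1],  # 1 = death
})

kmf = KaplanMeierFitter()
kmf.fit(df["months"], event_observed=df["observed"])
print(kmf.survival_function_at_times(60))  # survival at 60 months
print(kmf.median_survival_time_)
```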
Dynamic Weather Routes Architecture Overview
NASA Technical Reports Server (NTRS)
Eslami, Hassan; Eshow, Michelle
2014-01-01
This document presents the high-level software architecture of Dynamic Weather Routes (DWR), based on the CTAS software framework and the Direct-To automation tool. It also covers external and internal data flows, the required datasets, changes to the Direct-To software for DWR, the collection of software statistics, and the code structure.
BEANS - a software package for distributed Big Data analysis
NASA Astrophysics Data System (ADS)
Hypki, Arkadiusz
2018-03-01
BEANS is a new web-based tool, easy to install and maintain, for storing and analysing massive amounts of data in a distributed way. It provides a clear interface for querying, filtering, aggregating, and plotting data from an arbitrary number of datasets. Its main purpose is to simplify the process of storing, examining and finding new relations in huge datasets. The software answers a growing need of the astronomical community for a versatile tool to store, analyse and compare complex astrophysical numerical simulations with observations (e.g., simulations of the Galaxy or star clusters with the Gaia archive). However, the software was built in a general form and is ready for use in any other research field. It can also be used as a building block for other open source software.
Design and Applications of Rapid Image Tile Producing Software Based on Mosaic Dataset
NASA Astrophysics Data System (ADS)
Zha, Z.; Huang, W.; Wang, C.; Tang, D.; Zhu, L.
2018-04-01
Map tile technology is widely used in web geographic information services, and efficient tile production is the key to serving imagery on the web rapidly. In this paper, we design software for rapidly producing image tile data based on a mosaic dataset and describe the tile production workflow. Key technologies such as cluster processing, map representation, tile checking, tile conversion, and in-memory compression are discussed. The software was implemented and tested on actual image data; the results show that it offers a high degree of automation, effectively reduces the number of I/O operations, and improves tile production efficiency.
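For context, the indexing scheme behind map tile pyramids is simple enough to state in a few lines; the sketch below uses the common web-Mercator convention, whereas the paper's mosaic-dataset pipeline naturally involves far more (cluster processing, checking, compression).

```python
import math

def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Web-Mercator tile scheme: the (x, y) index of the tile that
    contains a coordinate at a given zoom level."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi)
            / 2.0 * n)
    return x, y

print(lonlat_to_tile(114.3, 30.6, 12))  # tile indices at zoom 12
```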
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Carlo, Francesco; Gürsoy, Doğa; Ching, Daniel J.
There is a widening gap between the fast advancement of computational methods for tomographic reconstruction and their successful implementation in production software at various synchrotron facilities. This is due in part to the lack of readily available instrument datasets and phantoms representative of real materials for the validation and comparison of new numerical methods. Recent advancements in detector technology have made sub-second and multi-energy tomographic data collection possible [1], but have also increased the demand for new reconstruction methods able to handle in-situ [2] and dynamic systems [3] that can be quickly incorporated into beamline production software [4]. The X-ray Tomography Data Bank, tomoBank, provides a repository of experimental and simulated datasets with the aim of fostering collaboration among computational scientists, beamline scientists, and experimentalists, and of accelerating the development and implementation of tomographic reconstruction methods for synchrotron facility production software by providing easy access to challenging datasets and their descriptors.
CometQuest: A Rosetta Adventure
NASA Technical Reports Server (NTRS)
Leon, Nancy J.; Fisher, Diane K.; Novati, Alexander; Chmielewski, Artur B.; Fitzpatrick, Austin J.; Angrum, Andrea
2012-01-01
NASA Technical Reports Server (NTRS)
Plesea, Lucian
2012-01-01
This software is a higher-performance implementation of tiled WMS, with integral support for KML and time-varying data. This software is compliant with the Open Geospatial WMS standard, and supports KML natively as a WMS return type, including support for the time attribute. Regionated KML wrappers are generated that match the existing tiled WMS dataset. PNG and JPG formats are supported, and the software is implemented as an Apache 2.0 module that supports a threading execution model capable of sustaining very high request rates. The module intercepts and responds to WMS requests that match certain patterns and returns the existing tiles. If a KML format that matches an existing pyramid and tile dataset is requested, regionated KML is generated and returned to the requesting application. In addition, KML requests that do not match the existing tile datasets generate a KML response that includes the corresponding JPG WMS request, effectively adding KML support to a backing WMS server.
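For readers unfamiliar with the request side, a tiled WMS client simply issues GetMap URLs whose parameters (including the optional TIME attribute mentioned above) follow the OGC WMS standard. The endpoint and layer name below are hypothetical.

```python
from urllib.parse import urlencode

base = "https://example.org/wms"  # hypothetical endpoint
params = {
    "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
    "LAYERS": "global_mosaic", "STYLES": "",   # hypothetical layer
    "SRS": "EPSG:4326", "BBOX": "-180,-90,180,90",
    "WIDTH": 512, "HEIGHT": 256,
    "FORMAT": "image/jpeg",
    "TIME": "2012-01-01",                      # optional time attribute
}
print(f"{base}?{urlencode(params)}")
```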
Thyroid cancer incidence patterns in Sao Paulo, Brazil, and the U.S. SEER program, 1997-2008.
Veiga, Lene H S; Neta, Gila; Aschebrook-Kilfoy, Briseis; Ron, Elaine; Devesa, Susan S
2013-06-01
Thyroid cancer incidence has risen steadily over the last few decades in most of the developed world, but information on incidence trends in developing countries is limited. Sao Paulo, Brazil, has one of the highest rates of thyroid cancer worldwide, higher than in the United States. We examined thyroid cancer incidence patterns using data from the Sao Paulo Cancer Registry (SPCR) in Brazil and the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program in the United States. Data on thyroid cancer cases diagnosed during 1997-2008 were obtained from SPCR (n=15,892) and SEER (n=42,717). Age-adjusted and age-specific rates were calculated by sex and histology and temporal patterns were compared between the two populations. Overall incidence rates increased over time in both populations and were higher in Sao Paulo than in the United States among females (SPCR/SEER incidence rate ratio [IRR]=1.65) and males (IRR=1.23). Papillary was the most common histology in both populations, followed by follicular and medullary carcinomas. Incidence rates by histology were consistently higher in Sao Paulo than in the United States, with the greatest differences for follicular (IRR=2.44) and medullary (IRR=3.29) carcinomas among females. The overall female/male IRR was higher in Sao Paulo (IRR=4.17) than in SEER (IRR=3.10) and did not change over time. Papillary rates rose over time more rapidly in Sao Paulo (annual percentage change=10.3% among females and 9.6% among males) than in the United States (6.9% and 5.7%, respectively). Regardless of sex, rates rose faster among younger people (<50 years) in Sao Paulo, but among older people (≥50 years) in the United States. The papillary to follicular carcinoma ratio rose from <3 to >8 among both Sao Paulo males and females, in contrast to increases from 9 to 12 and from 6 to 7 among U.S. males and females, respectively. Increased diagnostic activity may be contributing to the notable rise in incidence, mainly for papillary type, in both populations, but it is not likely to be the only reason. Differences in iodine nutrition status between Sao Paulo and the U.S. SEER population might have affected the observed incidence patterns.
Park, Henry S; Gross, Cary P; Makarov, Danil V; Yu, James B
2012-08-01
To evaluate the influence of immortal time bias on observational cohort studies of postoperative radiotherapy (PORT) and the effectiveness of sequential landmark analysis to account for this bias. First, we reviewed previous studies of the Surveillance, Epidemiology, and End Results (SEER) database to determine how frequently this bias was considered. Second, we used SEER to select three tumor types (glioblastoma multiforme, Stage IA-IVM0 gastric adenocarcinoma, and Stage II-III rectal carcinoma) for which prospective trials demonstrated an improvement in survival associated with PORT. For each tumor type, we calculated conditional survivals and adjusted hazard ratios of PORT vs. postoperative observation cohorts while restricting the sample at sequential monthly landmarks. Sixty-two percent of previous SEER publications evaluating PORT failed to use a landmark analysis. As expected, delivery of PORT for all three tumor types was associated with improved survival, with the largest associated benefit favoring PORT when all patients were included regardless of survival. Preselecting a cohort with a longer minimum survival sequentially diminished the apparent benefit of PORT. Although the majority of previous SEER articles do not correct for it, immortal time bias leads to altered estimates of PORT effectiveness, which are very sensitive to landmark selection. We suggest the routine use of sequential landmark analysis to account for this bias.
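A landmark analysis of the kind recommended here amounts to refitting the comparison after excluding everyone who did not survive to the landmark. A minimal sketch with lifelines, using an invented cohort and invented column names:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Invented cohort: survival in months, death indicator, PORT indicator.
df = pd.DataFrame({
    "survival_months": [3, 7, 11, 14, 19, 24, 30, 36, 42, 60,
                        2, 5, 9, 12, 18, 22, 28, 34, 40, 60],
    "death": [1, 1, 1, 1, 1, 0, 1, 0, 1, 0,
              1, 1, 1, 1, 1, 1, 0, 1, 1, 0],
    "port":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

def landmark_hr(data, landmark_months):
    """Refit the PORT comparison among patients surviving at least
    `landmark_months`, so early deaths cannot bias toward PORT."""
    cohort = data[data["survival_months"] >= landmark_months]
    cph = CoxPHFitter()
    cph.fit(cohort, duration_col="survival_months", event_col="death")
    return cph.hazard_ratios_["port"]

# Sequential landmarks: the apparent benefit shrinks as the landmark grows.
print([round(landmark_hr(df, m), 2) for m in (0, 3, 6)])
```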
Polednak, Anthony P
2013-01-01
Inaccuracies in primary liver cancer (ie, excluding intrahepatic bile duct [IHBD]) or IHBD cancer as the underlying cause of death on the death certificate vs the cancer site in a cancer registry should be considered in surveillance of mortality rates in the population. Concordance between cancer site on the death record (1999-2010) and diagnosis (1973-2010) in the database for 9 cancer registries of the Surveillance, Epidemiology, and End Results (SEER) Program was examined for decedents with only 1 cancer recorded. Overreporting of deaths coded to liver cancer (ie, lack of confirmation in SEER) was largely balanced by underreporting (ie, a cancer site other than liver cancer in SEER). For IHBD cancer, overreporting was much more frequent than underreporting. Using modified rates, based on the most accurate numerators available, had little impact on trends for liver cancer in the SEER population, which were similar to trends for the entire US population based on routine statistics. An increase in the death rate for IHBD cancer, however, was no longer evident after modification. The findings support the use of routine data on underlying cause of death for surveillance of trends in death rates for liver cancer but not for IHBD cancer. Additional population-based cancer registries could potentially be used for surveillance of recent and future trends in mortality rates from these cancers.
The computational structural mechanics testbed data library description
NASA Technical Reports Server (NTRS)
Stewart, Caroline B. (Compiler)
1988-01-01
The datasets created and used by the Computational Structural Mechanics Testbed software system are documented by this manual. A description of each dataset including its form, contents, and organization is presented.
Architecture of the local spatial data infrastructure for regional climate change research
NASA Astrophysics Data System (ADS)
Titov, Alexander; Gordov, Evgeny
2013-04-01
Georeferenced datasets (meteorological databases, modeling and reanalysis results, etc.) are actively used in the modeling and analysis of climate change at various spatial and temporal scales. Because environmental datasets are inherently heterogeneous and a single dataset can reach tens of terabytes, studies of climate and environmental change require special software support based on the SDI approach. A dedicated architecture of a local spatial data infrastructure for regional climate change analysis using modern web mapping technologies is presented. A geoportal is the key element of any SDI: it allows searching of geoinformation resources (datasets and services) through metadata catalogs, producing geospatial data selections by their parameters (data access functionality), and managing services and applications for cartographic visualization. Due to large dataset volumes, the complexity of the data models used, and the syntactic and semantic differences between datasets, developing services for environmental geodata access, processing, and visualization is a complex task. These circumstances were taken into account while developing the architecture as a universal framework providing geodata services. The architecture presented includes:
1. A model for storing big sets of regional georeferenced data that is effective for search, access, retrieval, and subsequent statistical processing, and that allows frequently used values (such as monthly and annual climate change indices) to be stored, thus providing different temporal views of the datasets.
2. The general architecture of the corresponding software components handling geospatial datasets within the storage model.
3. A metadata catalog, a basic element of the spatial data infrastructure, describing the datasets used in climate research in detail using the ISO 19115 and CF-convention standards, published according to the OGC CSW (Catalog Service Web) specification.
4. Computational and mapping web services for geospatial datasets based on the OWS (OGC Web Services) standards: WMS, WFS, WPS.
5. A geoportal as the key element of the thematic regional spatial data infrastructure, which also provides a software framework for developing dedicated web applications.
To realize the web mapping services, GeoServer is used, since it provides a native WPS implementation as a separate software module. For geospatial metadata services, the GeoNetwork Opensource product (http://geonetwork-opensource.org) is planned to be used, as it supports the ISO 19115/ISO 19119/ISO 19139 metadata standards and the ISO CSW 2.0 profile for both client and server. To implement thematic applications based on geospatial web services within the framework of the local SDI geoportal, the following open-source software has been selected:
1. The OpenLayers JavaScript library, providing basic web mapping functionality for a thin client such as a web browser.
2. The GeoExt/ExtJS JavaScript libraries for building client-side web applications working with geodata services.
The web interface developed will be similar to the interfaces of popular desktop GIS applications such as uDig and QuantumGIS. The work is partially supported by RF Ministry of Education and Science grant 8345, SB RAS Program VIII.80.2.1 and IP 131.
Adamo, Margaret Peggy; Boten, Jessica A; Coyle, Linda M; Cronin, Kathleen A; Lam, Clara J K; Negoita, Serban; Penberthy, Lynne; Stevens, Jennifer L; Ward, Kevin C
2017-02-15
Researchers have used prostate-specific antigen (PSA) values collected by central cancer registries to evaluate tumors for potential aggressive clinical disease. An independent study collecting PSA values suggested a high error rate (18%) related to implied decimal points. To evaluate the error rate in the Surveillance, Epidemiology, and End Results (SEER) program, a comprehensive review of PSA values recorded across all SEER registries was performed. Consolidated PSA values for eligible prostate cancer cases in SEER registries were reviewed and compared with text documentation from abstracted records. Four types of classification errors were identified: implied decimal point errors, abstraction or coding implementation errors, nonsignificant errors, and changes related to "unknown" values. A total of 50,277 prostate cancer cases diagnosed in 2012 were reviewed. Approximately 94.15% of cases did not have meaningful changes (85.85% correct, 5.58% with a nonsignificant change of <1 ng/mL, and 2.80% with no clinical change). Approximately 5.70% of cases had meaningful changes (1.93% due to implied decimal point errors, 1.54% due to abstract or coding errors, and 2.23% due to errors related to unknown categories). Only 419 of the original 50,277 cases (0.83%) resulted in a change in disease stage due to a corrected PSA value. The implied decimal error rate was only 1.93% of all cases in the current validation study, with a meaningful error rate of 5.81%. The reasons for the lower error rate in SEER are likely due to ongoing and rigorous quality control and visual editing processes by the central registries. The SEER program currently is reviewing and correcting PSA values back to 2004 and will re-release these data in the public use research file.
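A simple programmatic check captures the implied-decimal error type described here: the recorded value is exactly ten times the documented one. This is an illustrative validation rule, not the SEER registries' actual editing logic.

```python
def implied_decimal_suspect(psa_recorded: float, psa_documented: float) -> bool:
    """Flag value pairs that differ by exactly a factor of 10, the
    signature of an implied decimal point error (e.g., 125 recorded
    for a documented 12.5)."""
    if psa_documented <= 0:
        return False
    ratio = psa_recorded / psa_documented
    return abs(ratio - 10.0) < 1e-9 or abs(ratio - 0.1) < 1e-9

print(implied_decimal_suspect(125.0, 12.5))  # True
print(implied_decimal_suspect(12.5, 12.5))   # False
```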
A daily global mesoscale ocean eddy dataset from satellite altimetry.
Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin
2015-01-01
Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, can be used to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System.
Towards interoperable and reproducible QSAR analyses: Exchange of datasets.
Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es
2010-06-30
QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and the re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies the setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problem of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets and hence work collectively, but also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurnik, Charles W; Jacobson, David; Metoyer, Jarred
The specific measure described here involves improving the overall efficiency of air-conditioning systems as a whole (compressor, evaporator, condenser, and supply fan). The efficiency rating is expressed as the energy efficiency ratio (EER), seasonal energy efficiency ratio (SEER), or integrated energy efficiency ratio (IEER). The higher the EER, SEER, or IEER, the more efficient the unit is.
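For readers unfamiliar with these ratings, the arithmetic is simple: EER is steady-state cooling output in BTU/h divided by electrical input in watts, and SEER applies the same ratio over a whole cooling season (BTU delivered per watt-hour consumed). A tiny sketch with made-up numbers:

```python
# Illustrative arithmetic only; the input figures below are invented.
def eer(cooling_btu_per_hr: float, power_watts: float) -> float:
    # steady-state cooling output per watt of electrical input
    return cooling_btu_per_hr / power_watts

def seer(seasonal_cooling_btu: float, seasonal_energy_wh: float) -> float:
    # same ratio totaled over the cooling season
    return seasonal_cooling_btu / seasonal_energy_wh

print(eer(36_000, 3_000))           # 12.0 BTU/h per W
print(seer(54_000_000, 3_600_000))  # 15.0 BTU per Wh
```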
A multi-center study benchmarks software tools for label-free proteome quantification
Gillet, Ludovic C; Bernhardt, Oliver M.; MacLean, Brendan; Röst, Hannes L.; Tate, Stephen A.; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I.; Aebersold, Ruedi; Tenzer, Stefan
2016-01-01
The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation window setups. For consistent evaluation we developed LFQbench, an R package to calculate metrics of precision and accuracy in label-free quantitative MS and report the identification performance, robustness and specificity of each software tool. Our reference datasets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics. PMID:27701404
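LFQbench itself is an R package; the Python sketch below only illustrates the style of metric it reports for hybrid proteome benchmarks, where the true mixing ratio of each species is known: accuracy as the deviation of measured log-ratios from the expected value, and precision as their spread. The data are simulated, not taken from the study.

```python
# Benchmark-style metrics for known-composition samples (simulated data).
import numpy as np

def benchmark(log2_measured: np.ndarray, log2_expected: float):
    accuracy = np.median(log2_measured) - log2_expected  # systematic bias
    precision = np.std(log2_measured)                    # dispersion
    return accuracy, precision

# e.g. yeast proteins spiked at a 2:1 ratio -> expected log2 ratio of 1.0
yeast = np.random.default_rng(0).normal(loc=1.05, scale=0.25, size=500)
print(benchmark(yeast, log2_expected=1.0))
```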
Explain the CERES file naming convention
Atmospheric Science Data Center
2014-12-08
... using the dataset name, configuration code and date information which make each file name unique. A Dataset name consists ...
COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA
Wenger, Craig D.; Phanstiel, Douglas H.; Lee, M. Violet; Bailey, Derek J.; Coon, Joshua J.
2011-01-01
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated values files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC–MS/MS datasets. The first is a dataset of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a dataset of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two datasets, COMPASS performs equivalently or better than the current de facto standard, the Trans-Proteomic Pipeline. PMID:21298793
Projection of incidence rates to a larger population using ecologic variables.
Frey, C M; Feuer, E J; Timmel, M J
1994-09-15
There is wide acceptance of direct standardization of vital rates to adjust for differing age distributions according to the representation within age categories of some referent population. One can use a similar process to standardize, and subsequently project, vital rates with respect to continuous, or ratio-scale, ecologic variables. From the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) programme, a 10 per cent subset of the total U.S. population, we obtained county-level breast cancer incidence during 1987-1989 for white women aged 50 and over. We applied regression coefficients that relate ecologic factors to SEER incidence to the full national complement of county-level information to produce an age- and ecologic-factor-adjusted rate that may be more representative of the U.S. than the simple age-adjusted SEER incidence. We conducted a validation study using breast cancer mortality data available for the entire U.S., which supports the appropriateness of this method for projecting rates.
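As background for the projection method, the directly standardized rate the paper starts from is simply a weighted sum of age-specific rates, with weights taken from a standard population. A minimal sketch with invented counts:

```python
# Direct age standardization with invented counts and weights.
import numpy as np

cases      = np.array([30, 120, 340])          # events per age group
population = np.array([50_000, 40_000, 20_000])
standard   = np.array([0.55, 0.30, 0.15])      # standard population weights

age_specific = cases / population              # rate within each age group
adjusted = np.sum(age_specific * standard) * 100_000
print(f"age-adjusted rate: {adjusted:.1f} per 100,000")
```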
Moore, Eider B; Poliakov, Andrew V; Lincoln, Peter; Brinkley, James F
2007-10-15
Three-dimensional (3-D) visualization of multimodality neuroimaging data provides a powerful technique for viewing the relationship between structure and function. A number of applications are available that include some aspect of 3-D visualization, including both free and commercial products. These applications range from highly specific programs for a single modality, to general purpose toolkits that include many image processing functions in addition to visualization. However, few if any of these combine both stand-alone and remote multi-modality visualization in an open source, portable and extensible tool that is easy to install and use, yet can be included as a component of a larger information system. We have developed a new open source multimodality 3-D visualization application, called MindSeer, that has these features: integrated and interactive 3-D volume and surface visualization, Java and Java3D for true cross-platform portability, one-click installation and startup, integrated data management to help organize large studies, extensibility through plugins, transparent remote visualization, and the ability to be integrated into larger information management systems. We describe the design and implementation of the system, as well as several case studies that demonstrate its utility. These case studies are available as tutorials or demos on the associated website: http://sig.biostr.washington.edu/projects/MindSeer. MindSeer provides a powerful visualization tool for multimodality neuroimaging data. Its architecture and unique features also allow it to be extended into other visualization domains within biomedicine.
Tissues from population-based cancer registries: a novel approach to increasing research potential.
Goodman, Marc T; Hernandez, Brenda Y; Hewitt, Stephen; Lynch, Charles F; Coté, Timothy R; Frierson, Henry F; Moskaluk, Christopher A; Killeen, Jeffrey L; Cozen, Wendy; Key, Charles R; Clegg, Limin; Reichman, Marsha; Hankey, Benjamin F; Edwards, Brenda
2005-07-01
Population-based cancer registries, such as those included in the Surveillance, Epidemiology, and End-Results (SEER) Program, offer tremendous research potential beyond traditional surveillance activities. We describe the expansion of SEER registries to gather formalin-fixed, paraffin-embedded tissue from cancer patients on a population basis. Population-based tissue banks have the advantage of providing an unbiased sampling frame for evaluating the public health impact of genes or protein targets that may be used for therapeutic or diagnostic purposes in defined communities. Such repositories provide a unique resource for testing new molecular classification schemes for cancer, validating new biologic markers of malignancy, prognosis and progression, assessing therapeutic targets, and measuring allele frequencies of cancer-associated genetic polymorphisms or germline mutations in representative samples. The assembly of tissue microarrays will allow for the use of rapid, large-scale protein-expression profiling of tumor samples while limiting depletion of this valuable resource. Access to biologic specimens through SEER registries will provide researchers with demographic, clinical, and risk factor information on cancer patients with assured data quality and completeness. Clinical outcome data, such as disease-free survival, can be correlated with previously validated prognostic markers. Furthermore, the anonymity of the study subject can be protected through rigorous standards of confidentiality. SEER-based tissue resources represent a step forward in true, population-based tissue repositories of tumors from US patients and may serve as a foundation for molecular epidemiology studies of cancer in this country.
Cancer Incidence in the U.S. Military Population: Comparison with Rates from the SEER Program
Zhu, Kangmin; Devesa, Susan S.; Wu, Hongyu; Zahm, Shelia H.; Jatoi, Ismail; Anderson, William F.; Peoples, George; Maxwell, Larry G.; Granger, Elder; Potter, John F.; McGlynn, Katherine A.
2009-01-01
The U.S. active-duty military population may differ from the U.S. general population in its exposure to cancer risk factors and access to medical care. Yet, it is not known if cancer incidence rates differ between these two populations. We therefore compared the incidence of four cancers common in U.S. adults (lung, colorectum, prostate, and breast cancers) and two cancers more common in U.S. young adults (testicular and cervical cancers) in the military and general populations. Data from the Department of Defense's Automated Central Tumor Registry (ACTUR) and the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) nine cancer registries for the years 1990-2004 for persons aged 20-59 years were analyzed. Incidence rates were significantly lower in the military population for colorectal cancer in white men, lung cancer in white and black men and white women, and cervical cancer in black women. In contrast, incidence rates of breast and prostate cancers were significantly higher in the military among both whites and blacks. Incidence rates of testicular cancer did not differ between ACTUR and SEER. Although the numbers of diagnoses among military personnel were relatively small for temporal trend analysis, we found a more prominent increase in prostate cancer in ACTUR than in SEER. Overall, these results suggest that cancer patterns may differ between military and non-military populations. Further studies are needed to confirm these findings and explore contributing factors. PMID:19505907
Software Framework for Development of Web-GIS Systems for Analysis of Georeferenced Geophysical Data
NASA Astrophysics Data System (ADS)
Okladnikov, I.; Gordov, E. P.; Titov, A. G.
2011-12-01
Georeferenced datasets (meteorological databases, modeling and reanalysis results, remote sensing products, etc.) are currently actively used in numerous applications, including modeling, interpretation and forecast of climatic and ecosystem changes for various spatial and temporal scales. Due to the inherent heterogeneity of environmental datasets, as well as their size, which might constitute up to tens of terabytes for a single dataset, present studies in the area of climate and environmental change require special software support. A dedicated software framework has been created for the rapid development of information-computational systems, based on Web-GIS technologies, that provide such support. The software framework consists of 3 basic parts: a computational kernel developed using ITTVIS Interactive Data Language (IDL), a set of PHP controllers run within a specialized web portal, and a JavaScript class library for development of typical components of web mapping application graphical user interfaces (GUIs) based on AJAX technology. The computational kernel comprises a number of modules for dataset access, mathematical and statistical data analysis, and visualization of results. The specialized web portal consists of the Apache web server, the OGC-compliant GeoServer software used as a base for presenting cartographic information over the Web, and a set of PHP controllers implementing the web-mapping application logic and governing the computational kernel. The JavaScript library for graphical user interface development is based on the GeoExt library, combining the ExtJS framework and OpenLayers software. Based on the software framework, an information-computational system for complex analysis of large georeferenced data archives was developed. Structured environmental datasets available for processing now include two editions of the NCEP/NCAR Reanalysis, the JMA/CRIEPI JRA-25 Reanalysis, the ECMWF ERA-40 Reanalysis, the ECMWF ERA Interim Reanalysis, the MRI/JMA APHRODITE's Water Resources Project Reanalysis, meteorological observational data for the territory of the former USSR for the 20th century, and others. The current version of the system is already in use for scientific research; in particular, it was recently used for analysis of climate change in Siberia and its impacts in the region. The software framework presented allows rapid development of Web-GIS systems for geophysical data analysis, thus providing specialists involved in multidisciplinary research projects with reliable and practical instruments for complex analysis of climate and ecosystem changes on global and regional scales. This work is partially supported by RFBR grants #10-07-00547, #11-05-01190, and SB RAS projects 4.31.1.5, 4.31.2.7, 4, 8, 9, 50 and 66.
Sonification Prototype for Space Physics
NASA Astrophysics Data System (ADS)
Candey, R. M.; Schertenleib, A. M.; Diaz Merced, W. L.
2005-12-01
As an alternative and adjunct to visual displays, auditory exploration of data via sonification (data controlled sound) and audification (audible playback of data samples) is promising for complex or rapidly/temporally changing visualizations, for data exploration of large datasets (particularly multi-dimensional datasets), and for exploring datasets in frequency rather than spatial dimensions (see also International Conferences on Auditory Display
TomoBank: a tomographic data repository for computational x-ray science
De Carlo, Francesco; Gürsoy, Doğa; Ching, Daniel J.; ...
2018-02-08
There is a widening gap between the fast advancement of computational methods for tomographic reconstruction and their successful implementation in production software at various synchrotron facilities. This is due in part to the lack of readily available instrument datasets and phantoms representative of real materials for validation and comparison of new numerical methods. Recent advancements in detector technology made sub-second and multi-energy tomographic data collection possible [1], but also increased the demand to develop new reconstruction methods able to handle in-situ [2] and dynamic systems [3] that can be quickly incorporated in beamline production software [4]. The X-ray Tomography Data Bank, tomoBank, provides a repository of experimental and simulated datasets with the aim to foster collaboration among computational scientists, beamline scientists, and experimentalists, and to accelerate the development and implementation of tomographic reconstruction methods for synchrotron facility production software by providing easy access to challenging datasets and their descriptors.
Ou, Judy Y; Fowler, Brynn; Ding, Qian; Kirchhoff, Anne C; Pappas, Lisa; Boucher, Kenneth; Akerley, Wallace; Wu, Yelena; Kaphingst, Kimberly; Harding, Garrett; Kepka, Deanna
2018-01-31
Lung cancer is the leading cause of cancer-related mortality in Utah despite the state having the nation's lowest smoking rate. Radon exposure and differences in lung cancer incidence between nonmetropolitan and metropolitan areas may explain this phenomenon. We compared smoking-adjusted lung cancer incidence rates between nonmetropolitan and metropolitan counties by predicted indoor radon level, sex, and cancer stage. We also compared lung cancer incidence by county classification between Utah and all SEER sites. SEER*Stat provided annual age-adjusted rates per 100,000 from 1991 to 2010 for each Utah county and all other SEER sites. County classification, stage, and sex were obtained from SEER*Stat. Smoking was obtained from Environmental Public Health Tracking estimates by Ortega et al. EPA provided low (< 2 pCi/L), moderate (2-4 pCi/L), and high (> 4 pCi/L) indoor radon levels for each county. Poisson models calculated overall, cancer stage-specific, and sex-specific rates and p-values for smoking-adjusted and unadjusted models. LOESS smoothed trend lines compared incidence rates between Utah and all SEER sites by county classification. All metropolitan counties had moderate radon levels; 12 (63%) of the 19 nonmetropolitan counties had moderate predicted radon levels and 7 (37%) had high predicted radon levels. Lung cancer incidence rates were higher in nonmetropolitan counties than metropolitan counties (34.8 vs 29.7 per 100,000, respectively). Incidence of distant stage cancers was significantly higher in nonmetropolitan counties after controlling for smoking (16.7 vs 15.4, p = 0.02). Incidence rates in metropolitan, moderate radon and nonmetropolitan, moderate radon counties were similar. Nonmetropolitan, high radon counties had a significantly higher incidence of lung cancer compared to nonmetropolitan, moderate radon counties after adjustment for smoking (41.7 vs 29.2, p < 0.0001). Lung cancer incidence patterns in Utah were the opposite of metropolitan/nonmetropolitan trends in other SEER sites. Lung cancer incidence and distant stage incidence rates were consistently higher in nonmetropolitan Utah counties than metropolitan counties, suggesting that limited access to preventive screening may play a role in this disparity. Smoking-adjusted incidence rates in nonmetropolitan, high radon counties were significantly higher than in moderate radon counties, suggesting that radon was also a major contributor to lung cancer in these regions. National studies should account for geographic and environmental factors when examining nonmetropolitan/metropolitan differences in lung cancer.
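The abstract does not give the model code, but a county-level, smoking-adjusted Poisson rate model of the kind described is conventionally fit with a log-population offset. A hedged sketch with invented data and column names, using statsmodels:

```python
# Hypothetical stand-in for the study's Poisson models; data, column names,
# and effect sizes are all invented (Utah has 29 counties, hence n=29).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cases":       rng.poisson(30, 29),
    "pop":         rng.integers(5_000, 900_000, 29),
    "nonmetro":    rng.integers(0, 2, 29),
    "high_radon":  rng.integers(0, 2, 29),
    "smoking_pct": rng.uniform(6, 14, 29),
})
model = smf.glm("cases ~ nonmetro + high_radon + smoking_pct", data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["pop"])).fit()  # log-population offset
print(model.summary())
```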
Khan, Hafiz; Saxena, Anshul; Perisetti, Abhilash; Rafiq, Aamrin; Gabbidon, Kemesha; Mende, Sarah; Lyuksyutova, Maria; Quesada, Kandi; Blakely, Summre; Torres, Tiffany; Afesse, Mahlet
2016-12-01
Background: Breast cancer is a worldwide public health concern and is the most prevalent type of cancer in women in the United States. This study concerned the best fit of statistical probability models on the basis of survival times for nine state cancer registries: California, Connecticut, Georgia, Hawaii, Iowa, Michigan, New Mexico, Utah, and Washington. Materials and Methods: A probability random sampling method was applied to select and extract records of 2,000 breast cancer patients from the Surveillance Epidemiology and End Results (SEER) database for each of the nine state cancer registries used in this study. EasyFit software was utilized to identify the best probability models by using goodness-of-fit tests, and to estimate parameters for various statistical probability distributions that fit survival data. Results: Statistical analysis for the summary of statistics is reported for each of the states for the years 1973 to 2012. Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared goodness-of-fit test values were used for survival data, with lower values of a goodness-of-fit statistic indicating a better-fitting survival model for each state. Conclusions: It was found that California, Connecticut, Georgia, Iowa, New Mexico, and Washington followed the Burr probability distribution, while the Dagum probability distribution gave the best fit for Michigan and Utah, and Hawaii followed the Gamma probability distribution. These findings highlight differences between states through selected sociodemographic variables and also demonstrate probability modeling differences in breast cancer survival times. The results of this study can be used to guide healthcare providers and researchers for further investigations into social and environmental factors in order to reduce the occurrence of and mortality due to breast cancer.
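EasyFit is a commercial tool, but the same model-selection step can be sketched in Python with SciPy; for the Kolmogorov-Smirnov statistic, smaller values indicate closer agreement. SciPy's burr12 is used here as a stand-in for the Burr family, and the survival times are simulated rather than SEER records.

```python
# Fit candidate distributions and compare KS statistics (simulated data).
import numpy as np
from scipy import stats

times = stats.gamma(a=1.8, scale=30).rvs(size=2_000, random_state=0)

for name, dist in [("gamma", stats.gamma), ("burr12", stats.burr12)]:
    params = dist.fit(times, floc=0)              # fix location at zero
    ks = stats.kstest(times, name, args=params)   # one-sample KS test
    print(f"{name}: KS statistic = {ks.statistic:.4f}")
```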
Gogoshin, Grigoriy; Boerwinkle, Eric; Rodin, Andrei S
2017-04-01
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypothesis and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on the less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to the heterogeneous datasets containing many data types-single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc.
Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction
Venkatesan, R.
2016-01-01
Effective prediction of defect-prone software modules enables software developers to allocate resources efficiently and to concentrate on quality assurance activities. The software development life cycle basically includes design, analysis, implementation, testing, and release phases. Software testing is a critical task in this process, intended to save time and budget by detecting defects as early as possible and delivering a defect-free (bug-free) product to the customers. To improve the testing process, fault prediction methods identify the parts of the software that are more likely to be defect-prone. This paper proposes a prediction approach based on a conventional radial basis function neural network (RBFNN) and the novel adaptive dimensional biogeography based optimization (ADBBO) model. The developed ADBBO-based RBFNN model is tested with five publicly available datasets from the NASA data program repository. The computed results prove the effectiveness of the proposed ADBBO-RBFNN classifier approach with respect to the considered metrics in comparison with that of the earlier predictors available in the literature for the same datasets. PMID:27738649
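The paper's ADBBO-optimized network is not reproduced here; the sketch below shows only the conventional RBFNN half of the approach, with centers chosen by k-means and a regularized linear output layer, on invented "software metric" data.

```python
# Conventional RBF-network classifier sketch (not the ADBBO variant).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def rbf_features(X, centers, gamma):
    # Gaussian activations from squared Euclidean distances to each center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))   # stand-in for per-module software metrics
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

centers = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X).cluster_centers_
Phi = rbf_features(X, centers, gamma=0.5)
clf = LogisticRegression(max_iter=1000).fit(Phi, y)
print("training accuracy:", clf.score(Phi, y))
```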
LeMasters, Traci J; Madhavan, Suresh S; Sambamoorthi, Usha; Vyas, Ami M
2017-07-01
Although breast cancer is most prevalent among older women, the majority are diagnosed at an early stage. When diagnosed at an early stage, women have the option of breast-conserving surgery (BCS) plus radiation therapy (RT) or mastectomy for the treatment of early-stage breast cancer (ESBC). Omission of RT when receiving BCS increases the risk for recurrence and poor survival. Yet, a small subset of older women may omit RT after BCS. This study examines the current patterns of local treatment for ESBC among older women. This study conducted a retrospective observational analysis using the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked dataset of women age ≥66 diagnosed with stage I-II breast cancer in 2003-2009. SEER-Medicare data were additionally linked with data from the Area Resource File (ARF) to examine the association between area-level healthcare resources and treatment. Two logistic regression models were used to estimate how study factors were associated with receiving (1) BCS versus BCS+RT and (2) mastectomy versus BCS+RT. A stratified analysis was also conducted among women aged <70 years. Among 45,924 patients, 55% received BCS+RT, 23% received mastectomy, and 22% received BCS only. Women with older age, greater comorbidity, more primary care provider visits, stage II disease, or nonwhite race were more likely to receive mastectomy or BCS only than BCS+RT. Women diagnosed in 2004-2006, treated by an oncology surgeon, or residing in metro areas or areas of greater education and income were less likely to receive mastectomy or BCS only than BCS+RT. While women aged <70 years were more likely to receive BCS+RT, socioeconomic factors and physician specialty were associated with receiving BCS only. Over half of older women with ESBC initially receive BCS+RT. The likelihood of mastectomy and BCS only increases with age, comorbidity, and vulnerable socio-demographic characteristics. Findings demonstrate continued treatment disparities among certain vulnerable populations.
Howlader, Nadia; Mariotto, Angela B; Woloshin, Steven; Schwartz, Lisa M
2014-11-01
To isolate progress against cancer from changes in competing causes of death, population cancer registries have traditionally reported cancer prognosis (net measures). But clinicians and cancer patients generally want to understand actual prognosis (crude measures): the chance of surviving, dying from the specific cancer and from competing causes of death in a given time period. To compare cancer and actual prognosis in the United States for four leading cancers (lung, breast, prostate, and colon) by age, comorbidity, and cancer stage and to provide templates to help patients, clinicians, and researchers understand actual prognosis. Using population-based registry data from the Surveillance, Epidemiology, and End Results (SEER) Program, we calculated cancer prognosis (relative survival) and actual prognosis (five-year overall survival and the "crude" probability of dying from cancer and competing causes) for three important prognostic determinants (age, comorbidity [Charlson-score from 2012 SEER-Medicare linkage dataset] and cancer stage at diagnosis). For younger, healthier, and earlier stage cancer patients, cancer and actual prognosis estimates were quite similar. For older and sicker patients, these prognosis estimates differed substantially. For example, the five-year overall survival for an 85-year-old patient with colorectal cancer is 54% (cancer prognosis) versus 22% (actual prognosis), the difference reflecting the patient's substantial chance of dying from competing causes. The corresponding five-year chances of dying from the patient's cancer are 46% versus 37%. Although age and comorbidity lowered actual prognosis, stage at diagnosis was the most powerful factor: The five-year chance of colon cancer death was 10% for localized stage and 83% for distant stage. Both cancer and actual prognosis measures are important. Cancer registries should routinely report both cancer and actual prognosis to help clinicians and researchers understand the difference between these measures and what question they can and cannot answer. We encourage them to use formats like the ones presented in this paper to communicate them clearly. Published by Oxford University Press 2014.
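The gap between net and crude measures can be illustrated with a back-of-envelope discrete-time competing-risks calculation. The annual hazards below are invented; the point is that the same cancer hazard yields a much lower crude probability of cancer death when competing mortality is high.

```python
# Toy discrete-time competing-risks calculation with invented hazards.
def five_year(cancer_h: float, other_h: float, years: int = 5):
    alive, cancer_death = 1.0, 0.0
    for _ in range(years):
        cancer_death += alive * cancer_h        # crude cancer deaths this year
        alive *= (1 - cancer_h - other_h)       # survivors of both causes
    return alive, cancer_death                  # 5-yr survival, crude P(cancer death)

print("younger/healthier:", five_year(0.10, 0.02))  # crude close to net
print("older/sicker:     ", five_year(0.10, 0.15))  # competing causes dominate
```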
Silvestri, Valentina; Barrowdale, Daniel; Mulligan, Anna Marie; Neuhausen, Susan L; Fox, Stephen; Karlan, Beth Y; Mitchell, Gillian; James, Paul; Thull, Darcy L; Zorn, Kristin K; Carter, Natalie J; Nathanson, Katherine L; Domchek, Susan M; Rebbeck, Timothy R; Ramus, Susan J; Nussbaum, Robert L; Olopade, Olufunmilayo I; Rantala, Johanna; Yoon, Sook-Yee; Caligo, Maria A; Spugnesi, Laura; Bojesen, Anders; Pedersen, Inge Sokilde; Thomassen, Mads; Jensen, Uffe Birk; Toland, Amanda Ewart; Senter, Leigha; Andrulis, Irene L; Glendon, Gord; Hulick, Peter J; Imyanitov, Evgeny N; Greene, Mark H; Mai, Phuong L; Singer, Christian F; Rappaport-Fuerhauser, Christine; Kramer, Gero; Vijai, Joseph; Offit, Kenneth; Robson, Mark; Lincoln, Anne; Jacobs, Lauren; Machackova, Eva; Foretova, Lenka; Navratilova, Marie; Vasickova, Petra; Couch, Fergus J; Hallberg, Emily; Ruddy, Kathryn J; Sharma, Priyanka; Kim, Sung-Won; Teixeira, Manuel R; Pinto, Pedro; Montagna, Marco; Matricardi, Laura; Arason, Adalgeir; Johannsson, Oskar Th; Barkardottir, Rosa B; Jakubowska, Anna; Lubinski, Jan; Izquierdo, Angel; Pujana, Miguel Angel; Balmaña, Judith; Diez, Orland; Ivady, Gabriella; Papp, Janos; Olah, Edith; Kwong, Ava; Nevanlinna, Heli; Aittomäki, Kristiina; Perez Segura, Pedro; Caldes, Trinidad; Van Maerken, Tom; Poppe, Bruce; Claes, Kathleen B M; Isaacs, Claudine; Elan, Camille; Lasset, Christine; Stoppa-Lyonnet, Dominique; Barjhoux, Laure; Belotti, Muriel; Meindl, Alfons; Gehrig, Andrea; Sutter, Christian; Engel, Christoph; Niederacher, Dieter; Steinemann, Doris; Hahnen, Eric; Kast, Karin; Arnold, Norbert; Varon-Mateeva, Raymonda; Wand, Dorothea; Godwin, Andrew K; Evans, D Gareth; Frost, Debra; Perkins, Jo; Adlard, Julian; Izatt, Louise; Platte, Radka; Eeles, Ros; Ellis, Steve; Hamann, Ute; Garber, Judy; Fostira, Florentia; Fountzilas, George; Pasini, Barbara; Giannini, Giuseppe; Rizzolo, Piera; Russo, Antonio; Cortesi, Laura; Papi, Laura; Varesco, Liliana; Palli, Domenico; Zanna, Ines; Savarese, Antonella; Radice, Paolo; Manoukian, Siranoush; Peissel, Bernard; Barile, Monica; Bonanni, Bernardo; Viel, Alessandra; Pensotti, Valeria; Tommasi, Stefania; Peterlongo, Paolo; Weitzel, Jeffrey N; Osorio, Ana; Benitez, Javier; McGuffog, Lesley; Healey, Sue; Gerdes, Anne-Marie; Ejlertsen, Bent; Hansen, Thomas V O; Steele, Linda; Ding, Yuan Chun; Tung, Nadine; Janavicius, Ramunas; Goldgar, David E; Buys, Saundra S; Daly, Mary B; Bane, Anita; Terry, Mary Beth; John, Esther M; Southey, Melissa; Easton, Douglas F; Chenevix-Trench, Georgia; Antoniou, Antonis C; Ottini, Laura
2016-02-09
BRCA1 and, more commonly, BRCA2 mutations are associated with increased risk of male breast cancer (MBC). However, only a paucity of data exists on the pathology of breast cancers (BCs) in men with BRCA1/2 mutations. Using the largest available dataset, we determined whether MBCs arising in BRCA1/2 mutation carriers display specific pathologic features and whether these features differ from those of BRCA1/2 female BCs (FBCs). We characterised the pathologic features of 419 BRCA1/2 MBCs and, using logistic regression analysis, contrasted those with data from 9675 BRCA1/2 FBCs and with population-based data from 6351 MBCs in the Surveillance, Epidemiology, and End Results (SEER) database. Among BRCA2 MBCs, grade significantly decreased with increasing age at diagnosis (P = 0.005). Compared with BRCA2 FBCs, BRCA2 MBCs were of significantly higher stage (P for trend = 2 × 10⁻⁵) and higher grade (P for trend = 0.005) and were more likely to be oestrogen receptor-positive [odds ratio (OR) 10.59; 95 % confidence interval (CI) 5.15-21.80] and progesterone receptor-positive (OR 5.04; 95 % CI 3.17-8.04). With the exception of grade, similar patterns of associations emerged when we compared BRCA1 MBCs and FBCs. BRCA2 MBCs also presented with higher grade than MBCs from the SEER database (P for trend = 4 × 10⁻¹²). On the basis of the largest series analysed to date, our results show that BRCA1/2 MBCs display distinct pathologic characteristics compared with BRCA1/2 FBCs, and we identified a specific BRCA2-associated MBC phenotype characterised by a variable suggesting greater biological aggressiveness (i.e., high histologic grade). These findings could lead to the development of gender-specific risk prediction models and guide clinical strategies appropriate for MBC management.
Palmblad, Magnus; van der Burgt, Yuri E M; Dalebout, Hans; Derks, Rico J E; Schoenmaker, Bart; Deelder, André M
2009-05-02
Accurate mass determination enhances peptide identification in mass spectrometry based proteomics. We here describe the combination of two previously published open source software tools to improve mass measurement accuracy in Fourier transform ion cyclotron resonance mass spectrometry (FTICRMS). The first program, msalign, aligns one MS/MS dataset with one FTICRMS dataset. The second software, recal2, uses peptides identified from the MS/MS data for automated internal calibration of the FTICR spectra, resulting in sub-ppm mass measurement errors.
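recal2's actual calibration model is not reproduced here, but the core idea of internal calibration can be sketched as follows: fit a simple (here linear) model of measured versus theoretical m/z over the identified peptides, then invert it to correct all measured masses. The masses below are invented, with a roughly 3 ppm systematic error baked in.

```python
# Internal-calibration sketch: linear mass-error model fit by least squares.
import numpy as np

def fit_recalibration(measured_mz, theoretical_mz):
    # model: measured = a * theoretical + b
    a, b = np.polyfit(theoretical_mz, measured_mz, deg=1)
    return lambda mz: (mz - b) / a            # invert to correct measured m/z

theo = np.array([500.2, 800.4, 1200.6, 1800.9])   # identified peptides
meas = theo * (1 + 3e-6) + 0.0005                 # ~3 ppm systematic error
recal = fit_recalibration(meas, theo)
print((recal(meas) - theo) / theo * 1e6)          # residual error in ppm
```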
MMX-I: data-processing software for multimodal X-ray imaging and tomography.
Bergamaschi, Antoine; Medjoubi, Kadda; Messaoudi, Cédric; Marco, Sergio; Somogyi, Andrea
2016-05-01
A new multi-platform freeware has been developed for the processing and reconstruction of scanning multi-technique X-ray imaging and tomography datasets. The software platform aims to treat different scanning imaging techniques: X-ray fluorescence, phase, absorption and dark field and any of their combinations, thus providing an easy-to-use data processing tool for the X-ray imaging user community. A dedicated data input stream copes with the input and management of large datasets (several hundred GB) collected during a typical multi-technique fast scan at the Nanoscopium beamline and even on a standard PC. To the authors' knowledge, this is the first software tool that aims at treating all of the modalities of scanning multi-technique imaging and tomography experiments.
Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik
2011-10-01
The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. To identify the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa ( http://microbiology.se/software/metaxa/ ), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Cheung, Rex
2016-01-01
This study used receiver operating characteristic curve analysis of Surveillance, Epidemiology and End Results (SEER) adenosquamous carcinoma data to identify predictive models and potential disparities in outcome. The study analyzed socio-economic, staging, and treatment factors available in the SEER database for adenosquamous carcinoma. For the risk modeling, each factor was fitted by a generalized linear model to predict the cause-specific survival. The area under the receiver operating characteristic curve (ROC) was computed. Similar strata were combined to construct the most parsimonious models. A total of 20,712 patients diagnosed from 1973 to 2009 were included in this study. The mean follow-up time (S.D.) was 54.2 (78.4) months. Some 2/3 of the patients were female. The mean (S.D.) age was 63 (13.8) years. SEER stage was the most predictive factor of outcome (ROC area of 0.71). Some 13.9% of the patients were unstaged and had a 61.3% risk of cause-specific death, higher than the 45.3% risk for regional disease and lower than the 70.3% risk for metastatic disease. Sex, site, radiotherapy, and surgery had ROC areas of about 0.55-0.65. Rural residence and race contributed to socioeconomic disparity in treatment outcome. Radiotherapy was underused even in localized and regional stages when the intent was curative. This underuse was most pronounced in older patients. Anatomic stage was predictive and useful in treatment selection. Under-staging may have contributed to poor outcome.
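As an illustration of the approach, the sketch below scores each stratum of a single factor by an assumed risk of cause-specific death and computes the area under the ROC curve for that factor; values near 0.5 indicate little discrimination and values near 0.7 a usefully predictive factor. All numbers are invented, not SEER estimates.

```python
# Single-factor ROC illustration with invented strata and risks.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
stage = rng.integers(0, 3, 5_000)             # 0=localized, 1=regional, 2=distant
risk_by_stage = np.array([0.25, 0.45, 0.70])  # assumed death risks per stratum
died = rng.random(5_000) < risk_by_stage[stage]

# score each patient by their stratum's predicted risk
print("ROC area:", roc_auc_score(died, risk_by_stage[stage]))
```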
Li, Dong; Secher, Jan O.; Mashayekhi, Kaveh; Nielsen, Troels T.; Hyttel, Poul; Freude, Kristine K.
2017-01-01
Previous research has shown that a subpopulation of cells within cultured human dermal fibroblasts, termed multilineage-differentiating stress enduring (Muse) cells, are preferentially reprogrammed into induced pluripotent stem cells. However, controversy exists over whether these cells are the only cells capable of being reprogrammed from a heterogeneous population of fibroblasts. Similarly, there is little research to suggest such cells may exist in embryonic tissues or other species. To address if such a cell population exists in pigs, we investigated porcine embryonic fibroblast populations (pEFs) and identified heterogeneous expression of several key cell surface markers. Strikingly, we discovered a small population of stage-specific embryonic antigen 1 positive cells (SSEA-1+) in Danish Landrace and Göttingen minipig pEFs, which were absent in the Yucatan pEFs. Furthermore, reprogramming of SSEA-1+ sorted pEFs led to higher reprogramming efficiency. Subsequent transcriptome profiling of the SSEA-1+ vs. the SSEA-1neg cell fraction revealed highly comparable gene signatures. However several genes that were found to be upregulated in the SSEA-1+ cells were similarly expressed in mesenchymal stem cells (MSCs). We therefore termed these cells SSEA-1 Expressing Enhanced Reprogramming (SEER) cells. Interestingly, SEER cells were more effective at differentiating into osteocytes and chondrocytes in vitro. We conclude that SEER cells are more amenable for reprogramming and that the expression of mesenchymal stem cell genes is advantageous in the reprogramming process. This data provides evidence supporting the elite theory and helps to delineate which cell types and specific genes are important for reprogramming in the pig. PMID:28426281
Influence of morphology on survival for non-Hodgkin lymphoma in Europe and the United States.
Sant, Milena; Allemani, Claudia; De Angelis, Roberta; Carbone, Antonino; de Sanjosè, Silvia; Gianni, Alessandro M; Giraldo, Pilar; Marchesi, Francesca; Marcos-Gragera, Rafael; Martos-Jiménez, Carmen; Maynadié, Marc; Raphael, Martine; Berrino, Franco
2008-03-01
We explored the influence of morphology on geographic differences in 5-year survival for non-Hodgkin lymphoma (NHL) diagnosed in 1990-1994 and followed for 5 years: 16,955 cases from 27 EUROCARE-3 cancer registries, and 22,713 cases from 9 US SEER registries. Overall 5-year relative survival was 56.1% in EUROCARE west, 47.1% in EUROCARE east and 56.3% in SEER. Relative excess risk (RER) of death was 1.05 (95% confidence interval (CI) 1.01-1.10) in EUROCARE west, 1.52 (95% CI 1.44-1.60) in EUROCARE east (SEER reference). Excess risk of death was significantly above reference (diffuse B lymphoma) for Burkitt's and NOS lymphoma; not different for lymphoblastic and other T-cell; significantly below reference (in the order of decreasing relative excess risk) for NHL NOS, mantle cell/centrocytic, lymphoplasmacytic, follicular, small lymphocytic/chronic lymphocytic leukaemia, other specified NHL and cutaneous morphologies. Interpretation of marked variation in survival with morphology is complicated by classification inconsistencies. The completeness and standardisation of cancer registry morphology data need to be improved.
Hourly simulation of a Ground-Coupled Heat Pump system
NASA Astrophysics Data System (ADS)
Naldi, C.; Zanchini, E.
2017-01-01
In this paper, we present a MATLAB code for the hourly simulation of a whole Ground-Coupled Heat Pump (GCHP) system, based on the g-functions previously obtained by Zanchini and Lazzari. The code applies both to on-off heat pumps and to inverter-driven ones. It is employed to analyse the effects of the inverter and of the total length of the Borehole Heat Exchanger (BHE) field on the mean seasonal COP (SCOP) and on the mean seasonal EER (SEER) of a GCHP system designed for a residential house with 6 apartments in Bologna, north-central Italy, with dominant heating loads. A BHE field with 3 in-line boreholes is considered, with the length of each BHE either 75 m or 105 m. The results show that the increase of the BHE length yields a SCOP enhancement of about 7%, while the SEER remains nearly unchanged. The replacement of the on-off heat pump by an inverter-driven one yields a SCOP enhancement of about 30% and a SEER enhancement of about 50%. The results demonstrate the importance of employing inverter-driven heat pumps for GCHP systems.
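The seasonal figures quoted above reduce to simple ratios: SCOP is total heat delivered over total electricity used across the heating season, and SEER likewise for cooling. A hedged sketch with invented hourly totals standing in for the simulation's output:

```python
# Seasonal-ratio arithmetic only; hourly series below are invented stand-ins
# for the hourly simulation output described in the paper.
import numpy as np

heat_delivered = np.full(2_000, 6.0)   # kWh_th per heating hour
power_heating  = np.full(2_000, 1.8)   # kWh_el per heating hour
cool_delivered = np.full(800, 5.0)     # kWh_th per cooling hour
power_cooling  = np.full(800, 1.4)     # kWh_el per cooling hour

scop = heat_delivered.sum() / power_heating.sum()
seer = cool_delivered.sum() / power_cooling.sum()
print(f"SCOP={scop:.2f}  SEER={seer:.2f}")
```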
Rane, Swati; Plassard, Andrew; Landman, Bennett A.; Claassen, Daniel O.; Donahue, Manus J.
2017-01-01
This work explores the feasibility of combining anatomical MRI data across two public repositories namely, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Progressive Parkinson’s Markers Initiative (PPMI). We compared cortical thickness and subcortical volumes in cognitively normal older adults between datasets with distinct imaging parameters to assess if they would provide equivalent information. Three distinct datasets were identified. Major differences in data were scanner manufacturer and the use of magnetization inversion to enhance tissue contrast. Equivalent datasets, i.e., those providing similar volumetric measurements in cognitively normal controls, were identified in ADNI and PPMI. These were datasets obtained on the Siemens scanner with TI = 900 ms. Our secondary goal was to assess the agreement between subcortical volumes that are obtained with different software packages. Three subcortical measurement applications (FSL, FreeSurfer, and a recent multi-atlas approach) were compared. Our results show significant agreement in the measurements of caudate, putamen, pallidum, and hippocampus across the packages and poor agreement between measurements of accumbens and amygdala. This is likely due to their smaller size and lack of gray matter-white matter tissue contrast for accurate segmentation. This work provides a segue to combine imaging data from ADNI and PPMI to increase statistical power as well as to interrogate common mechanisms in disparate pathologies such as Alzheimer’s and Parkinson’s diseases. It lays the foundation for comparison of anatomical data acquired with disparate imaging parameters and analyzed with disparate software tools. Furthermore, our work partly explains the variability in the results of studies using different software packages. PMID:29756095
ToxMiner Software Interface for Visualizing and Analyzing ToxCast Data
The ToxCast dataset represents a collection of assays and endpoints that will require both standard statistical approaches as well as customized data analysis workflows. To analyze this unique dataset, we have developed an integrated database with a Java-based interface called ToxMi...
Exploratory visualization software for reporting environmental survey results.
Fisher, P; Arnot, C; Bastin, L; Dykes, J
2001-08-01
Environmental surveys yield three principal products: maps, a set of data tables, and a textual report. The relationships between these three elements, however, are often cumbersome to present, making full use of all the information in an integrated and systematic sense difficult. The published paper report is only a partial solution. Modern developments in computing, particularly in cartography, GIS, and hypertext, mean that it is increasingly possible to conceive of an easier and more interactive approach to the presentation of such survey results. Here, we present such an approach which links map and tabular datasets arising from a vegetation survey, allowing users ready access to a complex dataset using dynamic mapping techniques. Multimedia datasets equipped with software like this provide an exciting means of quick and easy visual data exploration and comparison. These techniques are gaining popularity across the sciences as scientists and decision-makers are presented with increasing amounts of diverse digital data. We believe that the software environment actively encourages users to make complex interrogations of the survey information, providing a new vehicle for the reader of an environmental survey report.
Tularosa Basin Play Fairway Analysis Data and Models
Nash, Greg
2017-07-11
This submission includes raster datasets for each layer of evidence used for weights of evidence analysis as well as the deterministic play fairway analysis (PFA). Data representative of heat, permeability and groundwater comprises some of the raster datasets. Additionally, the final deterministic PFA model is provided along with a certainty model. All of these datasets are best used with an ArcGIS software package, specifically Spatial Data Modeler.
Scheuch, Matthias; Höper, Dirk; Beer, Martin
2015-03-03
Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
Cox regression analysis with missing covariates via nonparametric multiple imputation.
Hsu, Chiu-Hsieh; Yu, Mandi
2018-01-01
We consider the situation of estimating Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to a non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to mis-specification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
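A rough Python sketch of the procedure's shape, using the lifelines package: a nearest-neighbour imputing set is built for each missing covariate (here from a single working score, a simplification of the paper's two working models), values are drawn from it to create multiply imputed datasets, and Cox estimates are averaged across imputations. Data are simulated.

```python
# Simplified nearest-neighbour multiple imputation + Cox (lifelines).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({"time":  rng.exponential(10, n),
                   "event": rng.integers(0, 2, n),
                   "x1":    rng.normal(size=n),
                   "x2":    rng.normal(size=n)})
df.loc[rng.random(n) < 0.3, "x2"] = np.nan   # covariate subject to missingness

obs, miss = df[df.x2.notna()], df[df.x2.isna()]
coefs = []
for m in range(10):                           # 10 multiple imputations
    imputed = df.copy()
    for i in miss.index:
        # working score: closeness on the fully observed covariate
        # (stand-in for the paper's two working regression models)
        d = (obs.x1 - df.loc[i, "x1"]).abs()
        pool = obs.loc[d.nsmallest(10).index, "x2"]   # nearest-neighbour set
        imputed.loc[i, "x2"] = pool.sample(1, random_state=m * 1000 + i).iloc[0]
    fit = CoxPHFitter().fit(imputed, duration_col="time", event_col="event")
    coefs.append(fit.params_["x2"])
print("pooled beta for x2:", np.mean(coefs))
```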
NASA Technical Reports Server (NTRS)
Goseva-Popstojanova, Katerina; Tyo, Jacob P.; Sizemore, Brian
2017-01-01
NASA develops, runs, and maintains software systems for which security is of vital importance. It is therefore imperative to develop secure systems and extend current software assurance capabilities to cover the information assurance and cybersecurity concerns of NASA missions. The results presented in this report are based on information provided in the issue tracking systems of one ground mission and one flight mission. The extracted data were used to create three datasets: Ground mission IVV issues, Flight mission IVV issues, and Flight mission Developers issues. In each dataset, we identified the software bugs that are security related and classified them into specific security classes. This information was then used to create security vulnerability profiles (i.e., to determine how, why, where, and when the security vulnerabilities were introduced) and to explore the existence of common trends. The main findings of our work include:
- Code-related security issues dominated both the Ground and Flight mission IVV security issues, at 95% and 92%, respectively. Therefore, enforcing secure coding practices, and verification and validation focused on coding errors, would be cost-effective ways to improve mission security. (The Flight mission Developers issues dataset did not contain data in the Issue Category.)
- In both the Ground and Flight mission IVV issues datasets, the majority of security issues (91% and 85%, respectively) were introduced in the Implementation phase. In most cases, the phase in which the issues were found was the same as the phase in which they were introduced. Most security-related issues in the Flight mission Developers issues dataset were found during Code Implementation, Build Integration, and Build Verification; data on the phase in which these issues were introduced were not available for this dataset.
- The location of security-related issues, like the location of software issues in general, followed the Pareto principle. Specifically, for all three datasets, from 86% to 88% of the security-related issues were located in two to four subsystems.
- The severity levels of most security issues were moderate, in all three datasets.
- Out of 21 primary security classes, five dominated: Exception Management, Memory Access, Other, Risky Values, and Unused Entities. Together, these classes contributed around 80% to 90% of all security issues in each dataset. This again illustrates the Pareto principle of uneven distribution of security issues, in this case across CWE classes, and supports the conclusion that addressing these dominant security classes is the most cost-efficient way to improve mission security.
The findings presented in this report uncovered the security vulnerability profiles and identified the common trends and dominant classes of security issues, which in turn can be used to select the most efficient secure design and coding best practices compiled by the part of the SARP project team associated with NASA's Johnson Space Center. In addition, these findings provide valuable input to the NASA IVV initiative aimed at identifying the top 25 CWEs of ground and flight missions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Apte, Michael G.; Norman, Bourassa; Faulkner, David
An improved HVAC system for portable classrooms was specified to address key problems in existing units: low energy efficiency, poor control of and provision for adequate ventilation, and excessive acoustic noise. Working with industry, a prototype improved heat pump air conditioner (IHPAC) was developed to meet the specification. A one-year, measurement-intensive field test of ten of these IHPAC systems was conducted in occupied classrooms in two distinct California climates. These measurements are compared with those made in parallel in side-by-side portable classrooms equipped with standard 10 SEER heat pump air conditioner equipment. The IHPAC units were found to work as designed, providing predicted annual energy efficiency improvements of about 36 to 42 percent across California's climate zones, relative to 10 SEER units. Classroom ventilation was vastly improved, as evidenced by far lower indoor-minus-outdoor CO2 concentrations. The IHPAC units were found to provide ventilation that meets both California State energy and occupational codes and the ASHRAE minimum ventilation requirements; the classrooms equipped with the 10 SEER equipment universally did not meet these targets. The IHPAC system provided a major improvement in indoor acoustic conditions. HVAC-generated background noise was reduced in fan-only and fan-and-compressor modes, bringing noise levels below the design objective of 45 dB(A) and meeting additional design criteria of the Collaborative for High Performance Schools (CHPS). The IHPAC provided superior ventilation, with indoor-minus-outdoor CO2 concentrations showing that the Title 24 minimum ventilation requirement of 15 CFM per occupant was nearly always met. The opposite was found in the classrooms using the 10 SEER system, where indoor-minus-outdoor CO2 concentrations frequently exceeded levels that reflect inadequate ventilation. The improved ventilation in the IHPAC classrooms led to effective removal of volatile organic compounds and aldehydes, lowering concentrations on average by 57 percent relative to levels in the 10 SEER classrooms. The average IHPAC to 10 SEER formaldehyde ratio was about 67 percent, indicating only a 33 percent reduction of this compound in indoor air. The IHPAC thermal control system provided less variability in occupied classroom temperature than the 10 SEER thermostats. Average room temperatures in all seasons tended to be slightly lower in the IHPAC classrooms, often below the lower limit of the ASHRAE 55 thermal comfort band. Statewide and national energy modeling provided conservative estimates of potential energy savings from use of the IHPAC system, with payback periods far shorter than the lifetime of the equipment. Assuming electricity costs of $0.15/kWh, the per-classroom range of savings is from about $85 to $195 per year in California, and about $89 to $250 per year in the U.S., depending upon the city. These models did not include the non-energy benefits to the classrooms, including better air quality and acoustic conditions, that could lead to improved health and learning in school. Market connection efforts that were part of the study give every indication that this has been a very successful project.
The successes include the specification of the IHPAC equipment in the CHPS portable classroom standards, the release of a commercial product based on the standards that is now being installed in schools around the U.S., and the fact that a public utility company is currently considering the addition of the technology to its customer incentive program. These successes indicate that the IHPAC may reach its potential to improve ventilation and save energy in classrooms.
Swede, Helen; Sarwar, Amna; Magge, Anil; Braithwaite, Dejana; Cook, Linda S; Gregorio, David I; Jones, Beth A; R Hoag, Jessica; Gonsalves, Lou; L Salner, Andrew; Zarfos, Kristen; Andemariam, Biree; Stevens, Richard G; G Dugan, Alicia; Pensa, Mellisa; A Brockmeyer, Jessica
2016-05-01
A comparatively high prevalence of comorbidities among African-Americans/Blacks (AA/B) has been implicated in disparate survival in breast cancer. There is a scarcity of data, however, on whether this effect persists when accounting for the adverse triple-negative breast cancer (TNBC) subtype, which occurs at three times the rate in AA/B compared with white breast cancer patients. We reviewed charts of 214 white and 202 AA/B breast cancer patients in the NCI-SEER Connecticut Tumor Registry who were diagnosed in 2000-2007. We employed the Charlson Comorbidity Index (CCI), a weighted 17-item tool to predict risk of death in cancer populations. Cox survival analyses estimated hazard ratios (HRs) for all-cause mortality in relation to TNBC and CCI, adjusting for clinicopathological factors. Among patients with SEER local stage, TNBC increased the risk of death (HR 2.18, 95% CI 1.14-4.16), which was attenuated when the CCI score was added to the model (Adj. HR 1.50, 95% CI 0.74-3.01). Conversely, the adverse impact of the CCI score persisted when controlling for TNBC (Adj. HR 1.49, 95% CI 1.29-1.71, per one-point increase). Similar patterns were observed in SEER regional stage, but estimated HRs were lower. AA/B patients with a CCI score of ≥3 had a significantly higher risk of death compared with AA/B patients without comorbidities (Adj. HR 5.65, 95% CI 2.90-11.02). A lower and nonsignificant effect was observed for whites with a CCI of ≥3 (Adj. HR 1.90, 95% CI 0.68-5.29). In conclusion, comorbidities at diagnosis increase the risk of death independent of TNBC, and AA/B patients may be disproportionately at risk.
Management and Survival Patterns of Patients with Gliomatosis Cerebri: A SEER-Based Analysis.
Carroll, Kate T; Hirshman, Brian; Ali, Mir Amaan; Alattar, Ali A; Brandel, Michael G; Lochte, Bryson; Lanman, Tyler; Carter, Bob; Chen, Clark C
2017-07-01
We used the SEER (Surveillance Epidemiology and End Results) database (1999-2010) to analyze the clinical practice patterns and overall survival in patients with gliomatosis cerebri (GC), or glioma involving 3 or more lobes of the cerebrum. We identified 111 patients (age ≥18 years) with clinically or microscopically diagnosed GC in the SEER database. Analyses were performed to determine clinical practice patterns for these patients and whether these practices were associated with survival. Fifty-eight percent of the 111 patients with GC received microscopic confirmation of their diagnosis. Of the remaining patients, 40% were diagnosed via imaging or laboratory tests, and 2% had unknown methods of diagnosis. Seven percent of patients who did not have microscopic confirmation of their diagnosis received radiation therapy. Radiation therapy and surgery were not associated with survival. The only variable significantly associated with overall survival was age at diagnosis. Patients aged 18-50 years showed improved survival relative to patients aged >50 years (median survival, 11 and 6 months, respectively; P = 0.03). For patients aged >50 years, improved overall survival was observed in the post-temozolomide era (2005-2010) relative to those treated in the pre-temozolomide era (1999-2004) (median survival, 9 and 4 months, respectively; P = 0.005). In the SEER database, ∼40% of the patients with glioma with imaging findings of GC do not receive microscopic confirmation of their diagnosis. We propose that tissue confirmation is warranted in patients with GC, because genomic analysis of these specimens may provide insights that will contribute to meaningful therapeutic intervention. Copyright © 2017 Elsevier Inc. All rights reserved.
Ali, Arif N; Switchenko, Jeffrey M; Kim, Sungjin; Kowalski, Jeanne; El-Deiry, Mark W; Beitler, Jonathan J
2014-11-15
The current study was conducted to develop a multifactorial statistical model to predict the specific head and neck (H&N) tumor site origin in cases of squamous cell carcinoma confined to the cervical lymph nodes ("unknown primaries"). The Surveillance, Epidemiology, and End Results (SEER) database was analyzed for patients with an H&N tumor site who were diagnosed between 2004 and 2011. The SEER patients were identified according to their H&N primary tumor site and clinically positive cervical lymph node levels at the time of presentation. The SEER patient data set was randomly divided into 2 data sets for the purposes of internal split-sample validation. The effects of cervical lymph node levels, age, race, and sex on H&N primary tumor site were examined using univariate and multivariate analyses. Multivariate logistic regression models and an associated set of nomograms were developed based on relevant factors to provide probabilities of tumor site origin. Analysis of the SEER database identified 20,011 patients with H&N disease with both site-level and lymph node-level data. Sex, race, age, and lymph node levels were associated with primary H&N tumor site (nasopharynx, hypopharynx, oropharynx, and larynx) in the multivariate models. Internal validation techniques affirmed the accuracy of these models on separate data. The incorporation of epidemiologic and lymph node data into a predictive model has the potential to provide valuable guidance to clinicians in the treatment of patients with squamous cell carcinoma confined to the cervical lymph nodes. © 2014 The Authors. Cancer published by Wiley Periodicals, Inc. on behalf of American Cancer Society.
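A minimal sketch of the kind of model the abstract describes, a multinomial logistic regression turning nodal levels and demographics into site probabilities. The predictors, the simulated data, and the use of scikit-learn are illustrative assumptions; the published nomograms were fit to the actual SEER records.

```python
# Toy multinomial logistic regression for primary H&N site prediction.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
X = pd.DataFrame({
    "age": rng.integers(30, 90, n),
    "male": rng.integers(0, 2, n),
    "level_II": rng.integers(0, 2, n),    # clinically positive neck levels
    "level_III": rng.integers(0, 2, n),
    "level_V": rng.integers(0, 2, n),
})
site = rng.choice(["nasopharynx", "hypopharynx", "oropharynx", "larynx"], n)

# internal split-sample validation, mirroring the study design
X_tr, X_te, y_tr, y_te = train_test_split(X, site, test_size=0.5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# probabilities of each primary site for one new "unknown primary" patient
patient = pd.DataFrame([{"age": 62, "male": 1, "level_II": 1,
                         "level_III": 1, "level_V": 0}])
print(dict(zip(model.classes_, model.predict_proba(patient)[0].round(3))))
```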
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halasz, Lia M., E-mail: lhalasz@uw.edu; Harvard Radiation Oncology Program, Harvard Medical School, Boston, Massachusetts; Weeks, Jane C.
2013-02-01
Purpose: The indications for treatment of brain metastases from non-small cell lung cancer (NSCLC) with stereotactic radiosurgery (SRS) remain controversial. We studied patterns, predictors, and cost of SRS use in elderly patients with NSCLC. Methods and Materials: Using the Surveillance, Epidemiology, and End Results-Medicare (SEER-Medicare) database, we identified patients with NSCLC who were diagnosed with brain metastases between 2000 and 2007. Our cohort included patients treated with radiation therapy and not surgical resection as initial treatment for brain metastases. Results: We identified 7684 patients treated with radiation therapy within 2 months after brain metastases diagnosis, of whom 469 (6.1%) had billing codes for SRS. Annual SRS use increased from 3.0% in 2000 to 8.2% in 2005 and varied from 3.4% to 12.5% by specific SEER registry site. After controlling for clinical and sociodemographic characteristics, we found SRS use was significantly associated with increasing year of diagnosis, specific SEER registry, higher socioeconomic status, admission to a teaching hospital, no history of participation in low-income state buy-in programs (a proxy for Medicaid eligibility), no extracranial metastases, and longer intervals from NSCLC diagnosis. The average cost per patient associated with radiation therapy was 2.19 times greater for those who received SRS than for those who did not. Conclusions: The use of SRS in patients with metastatic NSCLC increased almost 3-fold from 2000 to 2005. In addition, we found significant variations in SRS use across SEER registries and socioeconomic quartiles. National practice patterns in this study suggested both a lack of consensus and an overall limited use of the approach among elderly patients before 2008.
Wu, Chao; Chen, Ping; Qian, Jian-Jun; Jin, Sheng-Jie; Yao, Jie; Wang, Xiao-Dong; Bai, Dou-Sheng; Jiang, Guo-Qing
2016-11-29
Marital status has been reported as an independent prognostic factor for survival in various cancers, but it has rarely been studied in hepatocellular carcinoma (HCC) treated by surgical resection. We retrospectively investigated Surveillance, Epidemiology, and End Results (SEER) population-based data and identified 13,408 cases of HCC with surgical treatment between 1998 and 2013. The patients were categorized according to marital status as "married," "never married," "widowed," or "divorced/separated." The 5-year HCC cause-specific survival (HCSS) data were obtained, and Kaplan-Meier methods and multivariate Cox regression models were used to ascertain whether marital status is an independent prognostic factor for survival in HCC. The widowed group had a higher proportion of women, a greater proportion of older (>60 years) patients, more diagnoses in the most recent period (2008-2013), more tumors at TNM stage I/II, and a higher prevalence of localized SEER-stage disease, all statistically significant comparisons (P < 0.001). Marital status was demonstrated to be an independent prognostic factor by multivariate survival analysis (P < 0.001). Married patients had better 5-year HCSS than unmarried patients (46.7% vs 37.8%, P < 0.001); conversely, widowed patients had the lowest HCSS compared with all other patients, overall, at each SEER stage, and for different tumor sizes. Marital status is an important prognostic factor for survival in patients with HCC treated with surgical resection. Widowed patients have the highest risk of death compared with other groups.
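The two analysis steps named in the abstract, Kaplan-Meier estimation by group and a multivariate Cox model, look roughly like this in lifelines; the data frame below is simulated and all variable names are invented.

```python
# Kaplan-Meier survival by marital status plus a Cox model (invented data).
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "months": rng.exponential(40, n),
    "died_of_hcc": rng.integers(0, 2, n),
    "marital": rng.choice(["married", "never", "widowed", "divorced"], n),
})

kmf = KaplanMeierFitter()
for status, grp in df.groupby("marital"):
    kmf.fit(grp["months"], grp["died_of_hcc"], label=status)
    print(status, round(kmf.predict(60), 3))   # survival probability at 5 years

# Cox model with marital status as dummy-coded covariates
cox_df = pd.get_dummies(df, columns=["marital"], drop_first=True, dtype=float)
cph = CoxPHFitter().fit(cox_df, duration_col="months", event_col="died_of_hcc")
print(cph.summary[["exp(coef)", "p"]])
```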
Eil, Robert; Diggs, Brian S; Wang, Samuel J; Dolan, James P; Hunter, John G; Thomas, Charles R
2014-02-15
The survival impact of neoadjuvant chemoradiotherapy (CRT) on esophageal cancer remains difficult to establish for specific patients. The aim of the current study was to create a Web-based prediction tool providing individualized survival projections based on tumor and treatment data. Patients diagnosed with esophageal cancer between 1997 and 2005 were selected from the Surveillance, Epidemiology, and End Results (SEER)-Medicare database. The covariates analyzed were sex, T and N classification, histology, total number of lymph nodes examined, and treatment with esophagectomy or CRT followed by esophagectomy. After propensity score weighting, a log-logistic regression model for overall survival was selected based on the Akaike information criterion. A total of 824 patients with esophageal cancer who were treated with esophagectomy or trimodal therapy met the selection criteria. On multivariate analysis, age, sex, T and N classification, number of lymph nodes examined, treatment, and histology were found to be significantly associated with overall survival and were included in the regression analysis. Preoperative staging data and final surgical margin status were not available within the SEER-Medicare data set and therefore were not included. The model predicted that patients with T4 or lymph node disease benefitted from CRT. The internally validated concordance index was 0.72. The SEER-Medicare database of patients with esophageal cancer can be used to produce a survival prediction tool that: 1) serves as a counseling and decision aid to patients and 2) assists in risk modeling. Patients with T4 or lymph node disease appeared to benefit from CRT. This nomogram may underestimate the benefit of CRT due to its variable downstaging effect on pathologic stage. It is available at skynet.ohsu.edu/nomograms. © 2013 American Cancer Society.
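The model-selection step, fitting parametric accelerated-failure-time survival models and choosing by AIC, can be sketched as follows. The simulated covariates stand in for the SEER-Medicare variables, and the propensity-score weighting used in the study is omitted for brevity.

```python
# Fit several parametric AFT models and pick the lowest-AIC one (toy data).
import numpy as np
import pandas as pd
from lifelines import LogLogisticAFTFitter, WeibullAFTFitter, LogNormalAFTFitter

rng = np.random.default_rng(3)
n = 800
df = pd.DataFrame({
    "months": rng.weibull(1.2, n) * 24,
    "death": rng.integers(0, 2, n),
    "male": rng.integers(0, 2, n),
    "n_positive": rng.integers(0, 2, n),    # nodal disease indicator
    "trimodal": rng.integers(0, 2, n),      # CRT + esophagectomy vs surgery alone
})

for f in [LogLogisticAFTFitter(), WeibullAFTFitter(), LogNormalAFTFitter()]:
    f.fit(df, duration_col="months", event_col="death")
    print(type(f).__name__, round(f.AIC_, 1))   # choose the lowest-AIC model
```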
Data quality can make or break a research infrastructure
NASA Astrophysics Data System (ADS)
Pastorello, G.; Gunter, D.; Chu, H.; Christianson, D. S.; Trotta, C.; Canfora, E.; Faybishenko, B.; Cheah, Y. W.; Beekwilder, N.; Chan, S.; Dengel, S.; Keenan, T. F.; O'Brien, F.; Elbashandy, A.; Poindexter, C.; Humphrey, M.; Papale, D.; Agarwal, D.
2017-12-01
Research infrastructures (RIs) commonly support observational data provided by multiple, independent sources. Uniformity in the data distributed by such RIs is important in most applications, e.g., in comparative studies using data from two or more sources. Achieving uniformity in terms of data quality is challenging, especially considering that many data issues are unpredictable and cannot be detected until a first occurrence of the issue. As a result, many data quality control activities within RIs require a manual, human-in-the-loop element, making them expensive. Our motivating example is the FLUXNET2015 dataset - a collection of ecosystem-level carbon, water, and energy fluxes between land and atmosphere from over 200 sites around the world, some with over 20 years of data. About 90% of the human effort to create the dataset was spent in data quality related activities. Based on this experience, we have been working on solutions to increase the automation of data quality control procedures. Since it is nearly impossible to fully automate all quality-related checks, we have been drawing on techniques used in software development, which shares a few common constraints. In both managing scientific data and writing software, human time is a precious resource; code bases, like science datasets, can be large, complex, and full of errors; and both scientific and software endeavors can be pursued by individuals, but collaborative teams can accomplish a lot more. The lucrative and fast-paced nature of the software industry fueled the creation of methods and tools to increase automation and productivity within these constraints: issue tracking systems, methods for translating problems into automated tests, and powerful version control tools are a few examples. Terrestrial and aquatic ecosystems research relies heavily on many types of observational data. As the volume of data collection increases, ensuring data quality is becoming an unwieldy challenge for RIs, and business-as-usual approaches to data quality do not scale to larger data volumes. We believe RIs can benefit greatly from adapting this body of theory and practice from software quality to data quality, enabling systematic and reproducible safeguards against errors and mistakes in datasets as much as in software.
Software for improved field surveys of nesting marine turtles.
Anastácio, R; Gonzalez, J M; Slater, K; Pereira, M J
2017-09-07
Field data are still recorded on paper in many beach surveys of nesting marine turtles worldwide. The data must subsequently be transferred into an electronic database, which can introduce errors into the dataset. To minimize such errors, the "Turtles" software was developed and piloted for recording field data, with one software user accompanying one Tortuguero on the beaches of Akumal, Quintana Roo, Mexico, during night patrols from June 1st to July 31st. Comparisons were made between data exported from the software and the paper forms entered into a database (henceforth, the traditional method). Preliminary assessment indicated that the software user tended to record a greater number of metrics (an average of 18.3 fields ± 5.4 sd vs. 8.6 fields ± 2.1 sd recorded by the traditional method). The traditional method introduced three types of "errors" into the dataset: missing values in relevant fields (40.1%), different answers for the same value (9.8%), and inconsistent data (0.9%). Only 5.8% of these (missing values) were found with the software methodology. Although only tested by a single user, the results suggest increased efficacy, and the software warrants further examination to accurately assess the merit of replacing traditional methods of data recording in beach monitoring programmes.
Spear, Timothy T; Nishimura, Michael I; Simms, Patricia E
2017-08-01
Advancement in flow cytometry reagents and instrumentation has allowed for simultaneous analysis of large numbers of lineage/functional immune cell markers. Highly complex datasets generated by polychromatic flow cytometry require proper analytical software to answer investigators' questions. A problem among many investigators and flow cytometry Shared Resource Laboratories (SRLs), including our own, is a lack of access to a flow cytometry-knowledgeable bioinformatics team, making it difficult to learn and choose appropriate analysis tool(s). Here, we comparatively assess various multidimensional flow cytometry software packages for their ability to answer a specific biologic question and provide graphical representation output suitable for publication, as well as their ease of use and cost. We assessed polyfunctional potential of TCR-transduced T cells, serving as a model evaluation, using multidimensional flow cytometry to analyze 6 intracellular cytokines and degranulation on a per-cell basis. Analysis of 7 parameters resulted in 128 possible combinations of positivity/negativity, far too complex for basic flow cytometry software to analyze fully. Various software packages were used, analysis methods used in each described, and representative output displayed. Of the tools investigated, automated classification of cellular expression by nonlinear stochastic embedding (ACCENSE) and coupled analysis in Pestle/simplified presentation of incredibly complex evaluations (SPICE) provided the most user-friendly manipulations and readable output, evaluating effects of altered antigen-specific stimulation on T cell polyfunctionality. This detailed approach may serve as a model for other investigators/SRLs in selecting the most appropriate software to analyze complex flow cytometry datasets. Further development and awareness of available tools will help guide proper data analysis to answer difficult biologic questions arising from incredibly complex datasets. © Society for Leukocyte Biology.
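The "128 possible combinations" figure is simply the 2^7 Boolean phenotypes generated by scoring 7 markers as positive or negative per cell, which is easy to verify; the marker names below are invented examples.

```python
# Enumerate the 2**7 = 128 positivity/negativity phenotypes for 7 markers.
from itertools import product

markers = ["IFNg", "TNFa", "IL2", "IL4", "IL17", "IL22", "CD107a"]
phenotypes = list(product([False, True], repeat=len(markers)))
print(len(phenotypes))                      # 128

# render one phenotype the way SPICE-style tools label it, e.g. IFNg+TNFa-...
example = phenotypes[83]
print("".join(m + ("+" if on else "-") for m, on in zip(markers, example)))
```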
LIME: 3D visualisation and interpretation of virtual geoscience models
NASA Astrophysics Data System (ADS)
Buckley, Simon; Ringdal, Kari; Dolva, Benjamin; Naumann, Nicole; Kurz, Tobias
2017-04-01
Three-dimensional and photorealistic acquisition of surface topography, using methods such as laser scanning and photogrammetry, has become widespread across the geosciences over the last decade. With recent innovations in photogrammetric processing software, robust and automated data capture hardware, and novel sensor platforms, including unmanned aerial vehicles, obtaining 3D representations of exposed topography has never been easier. In addition to 3D datasets, fusion of surface geometry with imaging sensors, such as multi/hyperspectral, thermal and ground-based InSAR, and with geophysical methods, creates novel and highly visual datasets that provide a fundamental spatial framework for addressing open geoscience research questions. Although data capture and processing routines are becoming well established and widely reported in the scientific literature, challenges remain in the analysis, co-visualisation and presentation of 3D photorealistic models, especially for new users (e.g. students and scientists new to geomatics methods). Interpretation and measurement are essential for quantitative analysis of 3D datasets, and qualitative methods are valuable for presentation purposes, for planning and in education. Motivated by this background, the current contribution presents LIME, a lightweight, high-performance 3D software package for interpreting and co-visualising 3D models and related image data in geoscience applications. The software focuses on novel data integration and visualisation of 3D topography with image sources such as hyperspectral imagery, logs and interpretation panels, geophysical datasets, and georeferenced maps and images. High-quality visual output can be generated for dissemination purposes, to aid researchers in communicating their research results. The background of the software is described, and case studies from outcrop geology, hyperspectral mineral mapping, and geophysical-geospatial data integration are used to showcase the novel methods developed.
NASA Astrophysics Data System (ADS)
Ulbricht, Damian; Elger, Kirsten; Bertelmann, Roland; Klump, Jens
2016-04-01
With the foundation of DataCite in 2009 and the technical infrastructure installed in the last six years, it has become very easy to create citable dataset DOIs. Nowadays, dataset DOIs are increasingly accepted and required by journals in the reference lists of manuscripts. In addition, DataCite provides usage statistics [1] for assigned DOIs and offers a public search API to make research data count. By linking related information to the data, they become more useful for future generations of scientists. For this purpose, several identifier systems, such as ISBN for books, ISSN for journals, DOI for articles or related data, ORCID for authors, and IGSN for physical samples, can be attached to DOIs using the DataCite metadata schema [2]. While these are good preconditions for publishing data, free and open solutions that combine data curation, publication of research data, and DOI assignment in one software package seem to be rare. At GFZ Potsdam we built a modular software stack made of several free and open software solutions and established 'GFZ Data Services', which provides storage, a metadata editor for publication, and a facility to moderate minted DOIs. All software solutions are connected through web APIs, which makes it possible to reuse and integrate established software. The core component of 'GFZ Data Services' is an eSciDoc [3] middleware that is used as central storage and has been designed along the OAIS reference model for digital preservation. Data are thus stored in self-contained packages made of binary file-based data and XML-based metadata. The eSciDoc infrastructure provides access control to data and is able to handle half-open datasets, which is useful in embargo situations when a subset of the research data is released after an adequate period. The data exchange platform panMetaDocs [4] uses eSciDoc's REST API to upload file-based data into eSciDoc and uses a metadata editor [5] to annotate the files with metadata. The metadata editor has a user-friendly interface with nominal lists, extensive explanations, and an interactive mapping tool to assist scientists in describing the data. It is possible to deposit metadata templates that fill certain fields with default values. The metadata editor generates metadata in the ISO19139, NASA GCMD DIF, and DataCite schemas and could be extended to other schemas. panMetaDocs is able to mint dataset DOIs through DOIDB, our component for moderating dataset DOIs issued through 'GFZ Data Services'. DOIDB accepts metadata in the ISO19139, DIF, and DataCite schemas. In addition, DOIDB provides an OAI-PMH interface to disseminate all deposited metadata to data portals. The presentation of datasets on DOI landing pages is done through XSLT stylesheet transformation of the XML-based metadata. The landing pages have been designed to meet the needs of scientists, and the metadata can be rendered to different layouts. Furthermore, additional information about datasets and publications is assembled into the webpage by querying public databases on the internet. The work presented here focuses on technical details of the software stack. [1] http://stats.datacite.org [2] http://www.dlib.org/dlib/january11/starr/01starr.html [3] http://www.escidoc.org [4] http://panmetadocs.sf.net [5] http://github.com/ulbricht
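The landing-page step, rendering XML metadata through an XSLT stylesheet, reduces to a few lines with lxml. Both the metadata snippet and the stylesheet below are hypothetical stand-ins for the DataCite records and GFZ stylesheets.

```python
# Minimal XSLT transformation of XML metadata into an HTML landing page.
from lxml import etree

metadata = etree.XML(
    "<resource><title>Test dataset</title><year>2016</year></resource>")
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/resource">
    <html><body>
      <h1><xsl:value-of select="title"/></h1>
      <p>Published: <xsl:value-of select="year"/></p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>""")

landing_page = etree.XSLT(stylesheet)(metadata)
print(etree.tostring(landing_page, pretty_print=True).decode())
```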
The polyGeVero® software for fast and easy computation of 3D radiotherapy dosimetry data
NASA Astrophysics Data System (ADS)
Kozicki, Marek; Maras, Piotr
2015-01-01
The polyGeVero® software package was developed for calculations on 3D dosimetry data such as polymer gel dosimetry. It comprises four workspaces designed for: i) calculating calibrations, ii) storing calibrations in a database, iii) calculating 3D dose distribution cubes, and iv) comparing two datasets, e.g., one measured with 3D dosimetry against one calculated with a treatment planning system. To accomplish these calculations, the software is equipped with a number of tools, such as a brachytherapy isotope database, brachytherapy dose-versus-distance calculation based on the line approximation approach, automatic spatial alignment of two 3D dose cubes for comparison purposes, the 3D gamma index, 3D gamma angle, 3D dose difference, Pearson's coefficient, histogram calculations, isodose superimposition for two datasets, and profile calculations in any desired direction. This communication briefly presents the main functions of the software and reports on the speed of calculations performed by polyGeVero®.
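For readers unfamiliar with the central comparison tool, the following is a conceptual, brute-force sketch of a (here 2D, global) gamma-index computation; polyGeVero's own 3D implementation is certainly more optimized and is not reproduced here.

```python
# Brute-force global gamma index between a measured and a calculated dose grid.
import numpy as np

def gamma_index(measured, calculated, spacing_mm, dd=0.03, dta_mm=3.0):
    """For each reference point, take the minimum over the evaluated grid of
    sqrt(dose_term**2 + distance_term**2); gamma <= 1 means the point passes."""
    ny, nx = measured.shape
    ys, xs = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    coords = np.stack([ys, xs], axis=-1) * spacing_mm
    dmax = measured.max()                       # global dose normalization
    gamma = np.empty_like(measured, dtype=float)
    for iy in range(ny):
        for ix in range(nx):
            dist2 = ((coords - coords[iy, ix]) ** 2).sum(axis=-1)
            dose2 = (calculated - measured[iy, ix]) ** 2
            g2 = dist2 / dta_mm**2 + dose2 / (dd * dmax) ** 2
            gamma[iy, ix] = np.sqrt(g2.min())
    return gamma

measured = np.random.default_rng(4).random((32, 32)) * 2.0     # Gy
calculated = measured + np.random.default_rng(5).normal(0, 0.02, (32, 32))
g = gamma_index(measured, calculated, spacing_mm=1.0)
print(f"gamma pass rate (gamma <= 1): {(g <= 1).mean():.1%}")
```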
Software for the Integration of Multiomics Experiments in Bioconductor.
Ramos, Marcel; Schiffer, Lucas; Re, Angela; Azhar, Rimsha; Basunia, Azfar; Rodriguez, Carmen; Chan, Tiffany; Chapman, Phil; Davis, Sean R; Gomez-Cabrero, David; Culhane, Aedin C; Haibe-Kains, Benjamin; Hansen, Kasper D; Kodali, Hanish; Louis, Marie S; Mer, Arvind S; Riester, Markus; Morgan, Martin; Carey, Vince; Waldron, Levi
2017-11-01
Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 American Association for Cancer Research.
Andreadis, Konstantinos M; Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B.; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali
2017-01-01
The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications. PMID:28545077
Numericware i: Identical by State Matrix Calculator
Kim, Bongsong; Beavis, William D
2017-01-01
We introduce Numericware i, software to compute an identical-by-state (IBS) matrix from genotypic data. Calculating an IBS matrix for a large dataset requires a large amount of computer memory and lengthy processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. Multithreading allows computational routines to run concurrently on multiple central processing unit (CPU) processors. Forward chopping addresses memory limitations by dividing a dataset into appropriately sized subsets. Numericware i allows calculation of the IBS matrix for a large genotypic dataset on a laptop or desktop computer. For comparison with other software, we calculated genetic relationship matrices using Numericware i, SPAGeDi, and TASSEL with the same genotypic dataset. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values, including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high at .9972, whereas SPAGeDi showed low correlation with both Numericware i (.0505) and TASSEL (.0587). With a high-dimensional dataset of 500 entities by 10,000,000 SNPs, Numericware i took 382 minutes using 19 CPU threads and 64 GB of memory, dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed with the same dataset. Numericware i is freely available for Windows and Linux under the CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db. PMID:28469375
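The two ideas named in the abstract, IBS coefficients on a 0-2 scale and forward chopping, can be sketched in a few lines of numpy. Multithreading, file handling, and the chunk sizes used by Numericware i itself are left out; the coding convention below (genotypes as 0/1/2 allele counts) is an assumption.

```python
# Chunked ("forward chopping") IBS matrix computation in numpy.
import numpy as np

def ibs_matrix(genotypes, chunk=1000):
    """genotypes: (n_individuals, n_snps) array coded 0/1/2.
    Per-SNP IBS between i and j is 2 - |g_i - g_j|; we average over SNPs."""
    n, m = genotypes.shape
    acc = np.zeros((n, n))
    for start in range(0, m, chunk):               # forward chopping
        block = genotypes[:, start:start + chunk].astype(float)
        # pairwise |g_i - g_j| summed over the SNPs in this chunk
        diff = np.abs(block[:, None, :] - block[None, :, :]).sum(axis=2)
        acc += 2 * block.shape[1] - diff
    return acc / m                                 # values in [0, 2]

g = np.random.default_rng(6).integers(0, 3, size=(50, 5000))
K = ibs_matrix(g)
print(K.shape, round(K.min(), 3), K.diagonal().min())  # diagonal is exactly 2.0
```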
TaxI: a software tool for DNA barcoding using distance methods
Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel
2005-01-01
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses; hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and they are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (the taxon to be barcoded) and each sequence of a user-defined dataset of reference sequences. Because the analysis is based on separate pairwise alignments, the software can also work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e., thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets, fish larvae and juveniles from Lake Constance and juvenile land snails, under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
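At its core, distance-based assignment of a query to its nearest reference can be illustrated with a toy uncorrected p-distance; TaxI works on separate pairwise alignments and supports several substitution models, which this sketch omits. The sequences and species assignments below are invented.

```python
# Toy distance-based barcoding: nearest reference by uncorrected p-distance.
refs = {
    "Coregonus lavaretus": "ACCTGGTAAACGTTAGCTGA",
    "Perca fluviatilis":   "ACTTGGCAAACGATAGCTGC",
    "Rutilus rutilus":     "ACCTGGCAAATGTTAGCAGA",
}
query = "ACCTGGTAAACGTTAGCAGA"   # pre-aligned query sequence

def p_distance(a, b):
    """Proportion of differing sites over compared sites (gaps ignored)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

dists = sorted((p_distance(query, s), name) for name, s in refs.items())
for d, name in dists:
    print(f"{name}: {d:.3f}")
print("best hit:", dists[0][1])
```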
NASA Astrophysics Data System (ADS)
Kakatkar, Aarti S.; Gautam, Raj Kamal; Shashidhar, Ravindranath
2017-01-01
Fish and fishery products are highly perishable. Combinations of chilling with gamma irradiation, edible coatings, the addition of antimicrobials, and similar treatments have been applied to extend shelf life. In the present study, a process to enhance the shelf life of seer fish (Scomberomorus guttatus) steaks is described, using a coating prepared from a gel dispersion of the same fish, incorporated with nisin, in combination with gamma irradiation. Glazing incorporated with nisin, combined with irradiation at 2 kGy or 5 kGy, increased the shelf life of the steaks from 7 days to 34 and 42 days, respectively, in chilled storage.
Cancer incidence among Arab Americans in California, Detroit, and New Jersey SEER registries.
Bergmans, Rachel; Soliman, Amr S; Ruterbusch, Julie; Meza, Rafael; Hirko, Kelly; Graff, John; Schwartz, Kendra
2014-06-01
We calculated cancer incidence for Arab Americans in California; Detroit, Michigan; and New Jersey, and compared rates with non-Hispanic, non-Arab Whites (NHNAWs); Blacks; and Hispanics. We conducted a study using population-based data. We linked new cancers diagnosed in 2000 from the Surveillance, Epidemiology, and End Results Program (SEER) to an Arab surname database. We used standard SEER definitions and methodology for calculating rates. Population estimates were extracted from the 2000 US Census. We calculated incidence and rate ratios. Arab American men and women had similar incidence rates across the 3 geographic regions, and the rates were comparable to NHNAWs. However, the thyroid cancer rate was elevated among Arab American women compared with NHNAWs, Hispanics, and Blacks. For all sites combined, for prostate and lung cancer, Arab American men had a lower incidence than Blacks and higher incidence than Hispanics in all 3 geographic regions. Arab American male bladder cancer incidence was higher than that in Hispanics and Blacks in these regions. Our results suggested that further research would benefit from the federal recognition of Arab Americans as a specified ethnicity to estimate and address the cancer burden in this growing segment of the population.
Atmospheric data access for the geospatial user community
NASA Astrophysics Data System (ADS)
van de Vegte, John; Som de Cerff, Wim-Jan; van den Oord, Gijsbertus H. J.; Sluiter, Raymond; van der Neut, Ian A.; Plieger, Maarten; van Hees, Richard M.; de Jeu, Richard A. M.; Schaepman, Michael E.; Hoogerwerf, Marc R.; Groot, Nikée E.; Domenico, Ben; Nativi, Stefano; Wilhelmi, Olga V.
2007-10-01
Historically, the atmospheric and meteorological communities have been separate worlds with their own data formats and tools for data handling, making the sharing of data difficult and cumbersome. At the same time, these information sources are becoming increasingly interesting outside these communities because of the continuously improving spatial and temporal resolution of, e.g., model and satellite data, and because of interest in historical datasets. New user communities that use geographically based datasets in a cross-domain manner are emerging. This development is supported by progress in Geographical Information System (GIS) software. Current GIS software is not yet ready for the wealth of atmospheric data, although the faint outlines of a new generation of software are already visible: support for HDF and NetCDF and an increasing understanding of temporal issues are only a few of the hints.
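As a taste of the NetCDF support the authors point to, a gridded atmospheric dataset can be sliced by coordinates in a few lines and then treated as a georeferenced layer; the file, variable, and coordinate names below are hypothetical.

```python
# Coordinate-based access to a gridded atmospheric NetCDF file with xarray.
import xarray as xr

ds = xr.open_dataset("era_sample.nc")                  # hypothetical file
t2m = ds["t2m"].sel(time="2007-10-01")                 # one time step
# box over the North Sea; slice order depends on whether lat is descending
subset = t2m.sel(lat=slice(60, 50), lon=slice(0, 10))
print(float(subset.mean()))                            # area-mean 2 m temperature
```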
Murugaiyan, Jayaseelan; Eravci, Murat; Weise, Christoph; Roesler, Uwe
2017-06-01
Here, we provide the dataset associated with our research article 'Label-free quantitative proteomic analysis of harmless and pathogenic strains of infectious microalgae, Prototheca spp.' (Murugaiyan et al., 2017) [1]. This dataset describes liquid chromatography-mass spectrometry (LC-MS)-based protein identification and quantification of a non-infectious strain, Prototheca zopfii genotype 1, and two strains associated with severe and mild infections, respectively, P. zopfii genotype 2 and Prototheca blaschkeae. Protein identification and label-free quantification were carried out by analysing MS raw data using the MaxQuant-Andromeda software suite. The expression-level differences of the identified proteins among the strains were computed using Perseus software, and the results are presented in [1]. This Data in Brief article provides the MaxQuant output file and raw data deposited in the PRIDE repository with the dataset identifier PXD005305.
Iterative non-sequential protein structural alignment.
Salem, Saeed; Zaki, Mohammed J; Bystroff, Christopher
2009-06-01
Structural similarity between proteins gives us insights into their evolutionary relationships when sequence similarity is low. In this paper, we present a novel approach called SNAP for non-sequential pairwise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process consisting of a superposition step and an alignment step, until convergence. We propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of SNAP alignments was assessed by comparing against the manually curated reference alignments in the challenging SISY and RIPC datasets. Moreover, when applied to a dataset of 4410 protein pairs selected from the CATH database, SNAP produced longer alignments with lower RMSD than several state-of-the-art alignment methods. Classification of folds using SNAP alignments was both highly sensitive and highly selective. The SNAP software, along with the datasets, is available online at http://www.cs.rpi.edu/~zaki/software/SNAP.
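The superposition step that SNAP iterates over is classically solved with the Kabsch algorithm; a compact numpy version is sketched below on synthetic coordinates. SNAP's alignment (correspondence) step is the novel part of the paper and is not reproduced here.

```python
# Kabsch superposition: rigid rotation/translation minimising RMSD.
import numpy as np

def kabsch(P, Q):
    """Return a rotated/translated copy of P best superposed onto Q (n x 3)."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return Pc @ R + Q.mean(axis=0)

def rmsd(P, Q):
    return float(np.sqrt(((P - Q) ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(7)
Q = rng.normal(size=(100, 3))               # "reference" C-alpha coordinates
theta = 0.8
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
# P is Q rotated, shifted, and perturbed with small noise
P = Q @ Rz.T + np.array([5.0, -3.0, 1.0]) + rng.normal(0, 0.1, (100, 3))

print("before:", round(rmsd(P, Q), 3), "after:", round(rmsd(kabsch(P, Q), Q), 3))
```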
NASA Astrophysics Data System (ADS)
Changyong, Dou; Huadong, Guo; Chunming, Han; Ming, Liu
2014-03-01
With more and more Earth observation data available to the community, how to manage and share these valuable remote sensing datasets has become an urgent issue. Web-based Geographical Information System (GIS) technology provides a convenient way for users in different locations to share and make use of the same dataset. In order to efficiently use the airborne Synthetic Aperture Radar (SAR) remote sensing data acquired by the Airborne Remote Sensing Center of the Institute of Remote Sensing and Digital Earth (RADI), Chinese Academy of Sciences (CAS), a Web-GIS based platform for airborne SAR data management, distribution, and sharing was designed and developed. The major features of the system include a map-based navigation search interface, full-resolution imagery shown overlaid on the map, and the exclusive use of Open Source Software (OSS). The functions of the platform include browsing imagery on the map-based navigation interface, ordering and downloading data online, and image dataset and user management. At present, the system is under testing at RADI and will enter regular operation soon.
TomoBank: a tomographic data repository for computational x-ray science
NASA Astrophysics Data System (ADS)
De Carlo, Francesco; Gürsoy, Doğa; Ching, Daniel J.; Joost Batenburg, K.; Ludwig, Wolfgang; Mancini, Lucia; Marone, Federica; Mokso, Rajmund; Pelt, Daniël M.; Sijbers, Jan; Rivers, Mark
2018-03-01
There is a widening gap between the fast advancement of computational methods for tomographic reconstruction and their successful implementation in production software at various synchrotron facilities. This is due in part to the lack of readily available instrument datasets and phantoms representative of real materials for validation and comparison of new numerical methods. Recent advancements in detector technology have made sub-second and multi-energy tomographic data collection possible (Gibbs et al 2015 Sci. Rep. 5 11824), but have also increased the demand to develop new reconstruction methods able to handle in situ (Pelt and Batenburg 2013 IEEE Trans. Image Process. 22 5238-51) and dynamic systems (Mohan et al 2015 IEEE Trans. Comput. Imaging 1 96-111) that can be quickly incorporated in beamline production software (Gürsoy et al 2014 J. Synchrotron Radiat. 21 1188-93). The x-ray tomography data bank, tomoBank, provides a repository of experimental and simulated datasets with the aim to foster collaboration among computational scientists, beamline scientists, and experimentalists and to accelerate the development and implementation of tomographic reconstruction methods for synchrotron facility production software by providing easy access to challenging datasets and their descriptors.
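Validating a reconstruction method against a known object, the workflow that tomoBank's simulated datasets support, looks roughly like this with tomopy, one of the toolkits cited above. This is a generic sketch, not tomoBank documentation; a real tomoBank dataset in HDF5 would replace the simulated phantom.

```python
# Phantom-based validation of a tomographic reconstruction with tomopy.
import tomopy

obj = tomopy.shepp3d(size=64)                  # simulated 3D phantom
theta = tomopy.angles(180)                     # projection angles (radians)
proj = tomopy.project(obj, theta)              # simulate data acquisition
rec = tomopy.recon(proj, theta, algorithm="gridrec")   # reconstruct
print(obj.shape, proj.shape, rec.shape)        # compare rec against obj
```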
Flux Tower Eddy Covariance and Meteorological Measurements for Barrow, Alaska: 2012-2016
Dengel, Sigrid; Torn, Margaret; Billesbach, David
2017-08-24
The dataset contains half-hourly eddy covariance flux measurements and determinations, companion meteorological measurements, and ancillary data from the flux tower (US-NGB) on the Barrow Environmental Observatory at Barrow (Utqiagvik), Alaska for the period 2012 through 2016. Data have been processed using EddyPro software and screened by the contributor. The flux tower sits in an Arctic coastal tundra ecosystem. This dataset updates a previous dataset by reprocessing a longer period of record in the same manner. Related dataset "Eddy-Covariance and auxiliary measurements, NGEE-Barrow, 2012-2013" DOI:10.5440/1124200.
Comparison of software tools for kinetic evaluation of chemical degradation data.
Ranke, Johannes; Wöltjen, Janina; Meinecke, Stefan
2018-01-01
For evaluating the fate of xenobiotics in the environment, a variety of degradation or environmental metabolism experiments are routinely conducted. The data generated in such experiments are evaluated by optimizing the parameters of kinetic models so that the model simulation fits the data. No comparison of the main software tools currently in use has been published to date. This article presents a comparison of numerical results as well as an overall, somewhat subjective comparison based on a scoring system using a set of criteria. The scoring was performed separately for two types of use. Type I uses are routine evaluations involving standard kinetic models and up to three metabolites in a single compartment. Evaluations involving non-standard model components, more than three metabolites, or more than a single compartment belong to use type II. For use type I, usability is most important, while the flexibility of the model definition is most important for use type II. Test datasets were assembled that can be used to compare the numerical results of different software tools. These datasets can also be used to ensure that no unintended or erroneous behaviour is introduced in newer versions. In the comparison of numerical results, good agreement between the parameter estimates was observed for datasets with up to three metabolites. For the now-unmaintained reference software DegKinManager/ModelMaker, and for OpenModel, which is still under development, user options were identified that should be taken care of in order to obtain results that are as reliable as possible. Based on the scoring system mentioned above, the software tools gmkin, KinGUII and CAKE received the best scores for use type I. Of the 15 software packages compared with respect to use type II, gmkin and KinGUII were again the first two, followed by the script-based tool mkin, which is the technical basis for gmkin, and by OpenModel. Based on the evaluation using the system of criteria mentioned above and the comparison of numerical results for the suite of test datasets, the software tools gmkin, KinGUII and CAKE are recommended for use type I, and gmkin and KinGUII for use type II. For users who prefer to work with scripts instead of graphical user interfaces, mkin is recommended. For future software evaluations, it is recommended to include in the scoring scheme a measure of the total time that a typical user needs for a kinetic evaluation. It is the hope of the authors that the publication of test data, source code, and overall rankings fosters the evolution of useful and reliable software in the field.
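As a concrete example of the kind of fit these tools perform, the simplest standard model (single first-order, SFO) can be fit with scipy; the observations below are invented, not one of the article's test datasets.

```python
# Single first-order (SFO) degradation fit: M(t) = M0 * exp(-k * t).
import numpy as np
from scipy.optimize import curve_fit

days = np.array([0, 3, 7, 14, 28, 56, 90], dtype=float)
residue = np.array([100.0, 81.2, 70.8, 46.3, 22.1, 6.4, 1.9])  # % applied

def sfo(t, m0, k):
    return m0 * np.exp(-k * t)

(m0, k), cov = curve_fit(sfo, days, residue, p0=(100.0, 0.05))
dt50 = np.log(2) / k                    # half-life implied by the fitted rate
print(f"M0={m0:.1f}%, k={k:.4f} 1/day, DT50={dt50:.1f} days")
```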
NEOview: Near Earth Object Data Discovery and Query
NASA Astrophysics Data System (ADS)
Tibbetts, M.; Elvis, M.; Galache, J. L.; Harbo, P.; McDowell, J. C.; Rudenko, M.; Van Stone, D.; Zografou, P.
2013-10-01
Missions to Near Earth Objects (NEOs) figure prominently in NASA's Flexible Path approach to human space exploration. NEOs offer insight into both the origins of the Solar System and of life, as well as a source of materials for future missions. With NEOview, scientists can locate NEO datasets, explore metadata provided by the archives, and query or combine disparate NEO datasets in the search for NEO candidates for exploration. NEOview is a software system that illustrates how standards-based interfaces facilitate NEO data discovery and research. NEOview software follows a client-server architecture. The server is a configurable implementation of the International Virtual Observatory Alliance (IVOA) Table Access Protocol (TAP), a general interface for tabular data access, which can be deployed as a front end to existing NEO datasets. The TAP client, seleste, is a graphical interface that provides intuitive means of discovering NEO providers, exploring dataset metadata to identify fields of interest, and constructing queries to retrieve or combine data. It features a powerful graphical query builder capable of easing the user's introduction to table searches. Through science use cases, NEOview demonstrates how potential targets for NEO rendezvous could be identified by combining data from complementary sources. Through deployment and operations, it has been shown that the software components are data-independent and configurable to many different data servers. As such, NEOview's TAP server and seleste TAP client can be used to create a seamless environment for data discovery and exploration of tabular data in any astronomical archive.
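The TAP interaction that seleste wraps in a GUI can also be scripted directly, for example with pyvo; the service URL and table/column names below are placeholders, not the actual NEOview endpoints.

```python
# Discovering tables and issuing an ADQL query against a TAP service.
import pyvo

service = pyvo.dal.TAPService("https://example.org/neo/tap")  # hypothetical URL

# discover what the service exposes
for table in service.tables:
    print(table.name)

# ADQL query combining orbital constraints, as a client might issue it
result = service.search(
    "SELECT designation, a, e, i FROM neo.orbits "
    "WHERE a BETWEEN 0.9 AND 1.1 AND e < 0.2"
)
print(result.to_table())
```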
Performance testing of LiDAR exploitation software
NASA Astrophysics Data System (ADS)
Varela-González, M.; González-Jorge, H.; Riveiro, B.; Arias, P.
2013-04-01
Mobile LiDAR systems have been used widely in recent years for many applications in the field of geoscience. One of the most important limitations of this technology is the large computational requirement involved in data processing. Several software solutions for data processing are available in the market, but users are often unaware of methodologies to verify their performance accurately. In this work, a methodology for LiDAR software performance testing is presented and six different suites are studied: QT Modeler, AutoCAD Civil 3D, Mars 7, Fledermaus, Carlson and TopoDOT (all of them in x64). Results show that QT Modeler, TopoDOT and AutoCAD Civil 3D allow the loading of large datasets, while Fledermaus, Mars 7 and Carlson do not achieve this level of performance. AutoCAD Civil 3D needs a long loading time in comparison with the best-performing suites, QT Modeler and TopoDOT. The Carlson suite shows the poorest results of all the software under study: point clouds larger than 5 million points cannot be loaded, and loading times are very long in comparison with the other suites, even for the smaller datasets. AutoCAD Civil 3D, Carlson and TopoDOT use more threads than the other suites (QT Modeler, Mars 7 and Fledermaus).
TDat: An Efficient Platform for Processing Petabyte-Scale Whole-Brain Volumetric Images.
Li, Yuxin; Gong, Hui; Yang, Xiaoquan; Yuan, Jing; Jiang, Tao; Li, Xiangning; Sun, Qingtao; Zhu, Dan; Wang, Zhenyu; Luo, Qingming; Li, Anan
2017-01-01
Three-dimensional imaging of whole mammalian brains at single-neuron resolution has generated terabyte (TB)- and even petabyte (PB)-sized datasets. Due to their size, processing these massive image datasets can be hindered by the computer hardware and software typically found in biological laboratories. To fill this gap, we have developed an efficient platform named TDat, which adopts a novel data-reformatting strategy based on reading cuboid data and employing parallel computing. In data reformatting, TDat is more efficient than existing software tools. In data accessing, we adopted parallelization to fully exploit the data-transmission capacity of the computer. We applied TDat to large-volume rigid registration and to neuron tracing in whole-brain data at single-neuron resolution, which has never been demonstrated in other studies. We also showed its compatibility with various computing platforms, image processing software, and imaging systems.
User's Guide for MapIMG 2: Map Image Re-projection Software Package
Finn, Michael P.; Trent, Jason R.; Buehler, Robert A.
2006-01-01
Scientists routinely accomplish small-scale geospatial modeling in the raster domain, using high-resolution datasets for large parts of continents and low- to high-resolution datasets for the entire globe. Direct implementation of point-to-point transformation with appropriate functions yields the variety of projections available in commercial software packages, but implementation with data other than points requires specific adaptation of the transformation equations or prior preparation of the data to allow the transformation to succeed. It appears that some of these packages use the U.S. Geological Survey's (USGS) General Cartographic Transformation Package (GCTP) or similar point transformations without adaptation to the specific characteristics of raster data (Usery and others, 2003a). Usery and others (2003b) compiled and tabulated the accuracy of categorical areas in projected raster datasets of global extent. Based on the shortcomings identified in these studies, geographers and applications programmers at the USGS expanded and evolved a USGS software package, MapIMG, for raster map projection transformation (Finn and Trent, 2004). Daniel R. Steinwand of Science Applications International Corporation, National Center for Earth Resources Observation and Science, originally developed MapIMG for the USGS, basing it on GCTP. Through previous and continuing efforts at the USGS National Geospatial Technical Operations Center, this program has been transformed from an application based on command-line input into a software package with a graphical user interface for Windows, Linux, and other UNIX machines.
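The point-transformation core that packages like GCTP and MapIMG build on can be illustrated with pyproj: transform the center coordinate of every raster cell. Real raster reprojection additionally needs inverse mapping and resampling rules for categorical data, which MapIMG handles and this sketch omits.

```python
# Forward-project the cell centers of a 1-degree geographic grid to Mollweide.
import numpy as np
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "+proj=moll", always_xy=True)
lon, lat = np.meshgrid(np.arange(-179.5, 180, 1.0), np.arange(-89.5, 90, 1.0))
x, y = transformer.transform(lon, lat)         # projected cell-center coords
print(x.shape, float(x.min()), float(x.max()))
```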
Enhancing GIS Capabilities for High Resolution Earth Science Grids
NASA Astrophysics Data System (ADS)
Koziol, B. W.; Oehmke, R.; Li, P.; O'Kuinghttons, R.; Theurich, G.; DeLuca, C.
2017-12-01
Applications for high performance GIS will continue to increase as Earth system models pursue more realistic representations of Earth system processes. Finer spatial resolution model input and output, unstructured or irregular modeling grids, data assimilation, and regional coordinate systems present novel challenges for GIS frameworks operating in the Earth system modeling domain. This presentation provides an overview of two GIS-driven applications that combine high performance software with big geospatial datasets to produce value-added tools for the modeling and geoscientific community. First, a large-scale interpolation experiment using National Hydrography Dataset (NHD) catchments, a high resolution rectilinear CONUS grid, and the Earth System Modeling Framework's (ESMF) conservative interpolation capability will be described. ESMF is a parallel, high-performance software toolkit that provides capabilities (e.g. interpolation) for building and coupling Earth science applications. ESMF is developed primarily by the NOAA Environmental Software Infrastructure and Interoperability (NESII) group. The purpose of this experiment was to test and demonstrate the utility of high performance scientific software in traditional GIS domains. Special attention will be paid to the nuanced requirements for dealing with high resolution, unstructured grids in scientific data formats. Second, a chunked interpolation application using ESMF and OpenClimateGIS (OCGIS) will demonstrate how spatial subsetting can virtually remove computing resource ceilings for very high spatial resolution interpolation operations. OCGIS is a NESII-developed Python software package designed for the geospatial manipulation of high-dimensional scientific datasets. An overview of the data processing workflow, why a chunked approach is required, and how the application could be adapted to meet operational requirements will be discussed here. In addition, we'll provide a general overview of OCGIS's parallel subsetting capabilities including challenges in the design and implementation of a scientific data subsetter.
Cancer Incidence Among Arab Americans in California, Detroit, and New Jersey SEER Registries
Bergmans, Rachel; Ruterbusch, Julie; Meza, Rafael; Hirko, Kelly; Graff, John; Schwartz, Kendra
2014-01-01
Objectives. We calculated cancer incidence for Arab Americans in California; Detroit, Michigan; and New Jersey, and compared rates with non-Hispanic, non-Arab Whites (NHNAWs); Blacks; and Hispanics. Methods. We conducted a study using population-based data. We linked new cancers diagnosed in 2000 from the Surveillance, Epidemiology, and End Results Program (SEER) to an Arab surname database. We used standard SEER definitions and methodology for calculating rates. Population estimates were extracted from the 2000 US Census. We calculated incidence and rate ratios. Results. Arab American men and women had similar incidence rates across the 3 geographic regions, and the rates were comparable to NHNAWs. However, the thyroid cancer rate was elevated among Arab American women compared with NHNAWs, Hispanics, and Blacks. For all sites combined, for prostate and lung cancer, Arab American men had a lower incidence than Blacks and higher incidence than Hispanics in all 3 geographic regions. Arab American male bladder cancer incidence was higher than that in Hispanics and Blacks in these regions. Conclusions. Our results suggested that further research would benefit from the federal recognition of Arab Americans as a specified ethnicity to estimate and address the cancer burden in this growing segment of the population. PMID:24825237
Incidence and survival of sebaceous carcinoma in the United States.
Tripathi, Raghav; Chen, Zhengyi; Li, Li; Bordeaux, Jeremy S
2016-12-01
Information on risk factors, epidemiology, and clinical characteristics of sebaceous carcinoma (SC) is limited. We sought to analyze trends in SC in the United States from 2000 through 2012. We used data from the 18 registries of the Surveillance, Epidemiology, and End Results (SEER) Program from 2000 to 2012 to calculate the cause of death, relative frequencies/incidences, 5-/10-year Kaplan-Meier survival, hazard ratios, and incidence rates for SC. Each parameter was analyzed by age, location of occurrence (ocular/extraocular), race, sex, and SEER registry. Overall incidence was 0.32 (male) and 0.16 (female) per 100,000 person-years. Incidence significantly increased, primarily because of an increase among men. Incidence among whites was almost 3 times the rate among non-whites. Male sex (P < .0001), black race (P = .01), and extraocular anatomic location (P < .0001) were associated with significantly higher all-cause mortality. However, overall case-specific mortality for SC decreased significantly. Limitations include underregistration of patients in SEER registries, lack of verification of individual diagnoses, and limited staging data owing to low stage-classification rates. The overall incidence of SC is increasing significantly. Male sex, black race, and extraocular occurrences are associated with significantly greater mortality. Copyright © 2016 American Academy of Dermatology, Inc. Published by Elsevier Inc. All rights reserved.
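As a reminder of the rate convention used in the abstract above, incidence per 100,000 person-years is simple arithmetic; the counts below are invented solely to illustrate the calculation, not SEER figures.

    # Illustrative arithmetic only -- invented counts, not SEER data.
    cases = 160                     # new diagnoses observed
    person_years = 50_000_000       # population-time at risk
    rate = cases / person_years * 100_000
    print(f"{rate:.2f} per 100,000 person-years")   # -> 0.32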
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi
2018-01-01
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625
Measurement of skeletal related events in SEER-Medicare: a comparison of claims-based methods.
Aly, Abdalla; Onukwugha, Eberechukwu; Woods, Corinne; Mullins, C Daniel; Kwok, Young; Qian, Yi; Arellano, Jorge; Balakumaran, Arun; Hussain, Arif
2015-08-19
Skeletal related events (SREs) are common in men with metastatic prostate cancer (mPC). Various methods have been used to identify SREs from claims data. The objective of this study was to provide a framework for measuring SREs from claims and compare SRE prevalence and cumulative incidence estimates based on alternative approaches in men with mPC. Several claims-based approaches for identifying SREs were developed and applied to data for men aged ≥66 years newly diagnosed with mPC between 2000 and 2009 in the SEER-Medicare datasets and followed through 2010 or until censoring. Post-diagnosis SREs were identified using claims that indicated spinal cord compression (SCC), pathologic fracture (PF), surgery to bone (BS), or radiation (suggestive of bone palliative radiation, RAD). To measure SRE prevalence, two SRE definitions were created: 'base case' (most commonly used in the literature) and 'alternative', in which different claims were used to identify each type of SRE. To measure cumulative incidence, we used the 'base case' definition and applied three periods in which claims were clustered to episodes: 14-, 21-, and 28-day windows. Among 8997 mPC patients, 46% experienced an SRE according to the 'base case' definition and 43% of patients experienced an SRE according to the 'alternative' definition. Varying the code definition from 'base case' to 'alternative' resulted in an 8% increase in the overall SRE prevalence. Using the 21-day window, a total of 12,930 SRE episodes were observed during follow up. Varying the window length from 21 to 28 days resulted in an 8% decrease in SRE cumulative incidence (RAD: 10%, PF: 8%, SCC: 6%, BS: 0.2%). SRE prevalence was affected by the codes used, with PF being most impacted. The overall SRE cumulative incidence was affected by the window length used, with RAD being most affected. These results underscore the importance of the baseline definitions used to study claims data when attempting to understand relevant clinical events such as SREs in the real world setting.
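The window-based episode logic described above can be sketched in a few lines of Python: sort the claim dates and start a new episode whenever a claim falls outside the window anchored at the current episode's first claim. The dates and the exact clustering rule are illustrative assumptions, not the study's code.

    from datetime import date

    def cluster_episodes(claim_dates, window_days=21):
        """Group claim dates into episodes anchored at each episode's first claim."""
        episodes = []
        for d in sorted(claim_dates):
            if episodes and (d - episodes[-1][0]).days <= window_days:
                episodes[-1].append(d)   # within the window: same episode
            else:
                episodes.append([d])     # outside the window: new episode
        return episodes

    claims = [date(2005, 3, 1), date(2005, 3, 10), date(2005, 5, 2)]
    print(len(cluster_episodes(claims, window_days=21)))   # 2 episodes
    print(len(cluster_episodes(claims, window_days=28)))   # also 2 here

Lengthening the window can only merge claims into fewer episodes, which is why the study observed lower cumulative incidence with the 28-day window than with the 21-day window.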
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jackson, Matthew W., E-mail: matthew.jackson@ucdenver.edu; Rusthoven, Chad G.; Jones, Bernard L.
Background: Primary mediastinal B cell lymphoma (PMBCL) is an uncommon lymphoma for which trials are few with small patient numbers. The role of radiation therapy (RT) after standard immunochemotherapy for early-stage disease has never been studied prospectively. We used the Surveillance, Epidemiology, and End Results (SEER) database to evaluate PMBCL and the impact of RT on outcomes. Methods and Materials: We queried the SEER database for patients with stage I-II PMBCL diagnosed from 2001 to 2011. Retrievable data included age, gender, race (white/nonwhite), stage, extranodal disease, year of diagnosis, and use of RT as a component of definitive therapy. Kaplan-Meier overall survival (OS) estimates, univariate (UVA) log-rank and multivariate (MVA) Cox proportional hazards regression analyses were performed. Results: Two hundred fifty patients with stage I-II disease were identified, with a median follow-up time of 39 months (range, 3-125 months). The median age was 36 years (range, 18-89 years); 61% were female; 76% were white; 45% had stage I disease, 60% had extranodal disease, and 55% were given RT. The 5-year OS for the entire cohort was 86%. On UVA, OS was improved with RT (hazard ratio [HR] 0.446, P=.029) and decreased in association with nonwhite race (HR 2.70, P=.006). The 5-year OS was 79% (no RT) and 90% (RT). On MVA, white race and RT remained significantly associated with improved OS (P=.007 and .018, respectively). The use of RT decreased over time: 61% for the 67 patients whose disease was diagnosed from 2001 to 2005 and 53% in the 138 patients treated from 2006 to 2010. Conclusion: This retrospective population-based analysis is the largest PMBCL dataset to date and demonstrates a significant survival benefit associated with RT. Nearly half of patients treated in the United States do not receive RT, and its use appears to be declining. In the absence of phase 3 data, the use of RT should be strongly considered for its survival benefit in early-stage disease.
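The survival machinery used in the abstract above (Kaplan-Meier estimation, a univariate log-rank test, and a multivariate Cox model) can be reproduced on any similar dataset with the lifelines package; the toy DataFrame below is invented and far too small for real inference.

    import pandas as pd
    from lifelines import KaplanMeierFitter, CoxPHFitter
    from lifelines.statistics import logrank_test

    # Invented toy data: follow-up in months, death indicator, and RT use.
    df = pd.DataFrame({
        "months": [12, 39, 60, 8, 45, 22, 70, 15],
        "died":   [0, 0, 1, 1, 0, 1, 0, 1],
        "rt":     [1, 1, 1, 0, 1, 0, 0, 0],
    })

    kmf = KaplanMeierFitter()
    kmf.fit(df["months"], event_observed=df["died"])   # overall OS curve

    rt, no_rt = df[df["rt"] == 1], df[df["rt"] == 0]
    res = logrank_test(rt["months"], no_rt["months"],
                       event_observed_A=rt["died"],
                       event_observed_B=no_rt["died"])
    print(res.p_value)                                 # univariate comparison

    cph = CoxPHFitter()
    cph.fit(df, duration_col="months", event_col="died")  # multivariate HRs
    cph.print_summary()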
Swede, Helen; Sarwar, Amna; Magge, Anil; Braithwaite, Dejana; Cook, Linda S.; Gregorio, David I.; Jones, Beth A; Hoag, Jessica; Gonsalves, Lou; Salner, Andrew; Zarfos, Kristen; Andemariam, Biree; Stevens, Richard G; Dugan, Alicia; Pensa, Mellisa; Brockmeyer, Jessica
2017-01-01
Purpose A comparatively high prevalence of co-morbidities among African-American/Blacks (AA/B) has been implicated in disparate survival in breast cancer. There is a scarcity of data, however, if this effect persists when accounting for the adverse triple-negative breast cancer (TNBC) subtype which occurs at three-fold the rate in AA/B compared to white breast cancer patients. Methods We reviewed charts of 214 white and 202 AA/B breast cancer patients in the NCI-SEER Connecticut Tumor Registry who were diagnosed in 2000-07. We employed the Charlson Co-Morbidity Index (CCI), a weighted 17-item tool to predict risk of death in cancer populations. Cox Survival Analyses estimated hazard ratios (HR) for all-cause mortality in relation to TNBC and CCI adjusting for clinicopathological factors. Results Among patients with SEER-Local Stage, TNBC increased the risk of death (HR=2.18, 95% CI 1.14-4.16), which was attenuated when the CCI score was added to the model (Adj. HR=1.50, 95% CI 0.74-3.01). Conversely, the adverse impact of the CCI score persisted when controlling for TNBC (Adj. HR=1.49, 95% CI 1.29-1.71; per one point increase). Similar patterns were observed in SEER-Regional Stage but estimated HRs were lower. AA/B patients with a CCI score of ≥3 had a significantly higher risk of death compared to AA/B patients without comorbidities (Adj. HR=5.65, 95% CI 2.90-11.02). A lower and non-significant effect was observed for whites with a CCI of ≥3 (Adj. HR=1.90, 95% CI 0.68-5.29). Conclusions Co-morbidities at diagnosis increase risk of death independent of TNBC, and AA/B patients may be disproportionately at risk. PMID:27000206
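The CCI referenced above is a weighted sum over condition categories; a minimal sketch follows, using a small subset of the classic Charlson weights and an invented patient record.

    # Subset of the classic Charlson weights (illustrative, not all 17 items).
    CHARLSON_WEIGHTS = {
        "myocardial_infarction": 1,
        "diabetes_with_complications": 2,
        "moderate_severe_liver_disease": 3,
        "metastatic_solid_tumor": 6,
    }

    def cci_score(conditions):
        """Sum the weights of the conditions recorded for one patient."""
        return sum(CHARLSON_WEIGHTS.get(c, 0) for c in conditions)

    patient = ["myocardial_infarction", "diabetes_with_complications"]
    print(cci_score(patient))   # 3 -> would fall in the study's CCI >= 3 group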
Petkov, Valentina I; Miller, Dave P; Howlader, Nadia; Gliner, Nathan; Howe, Will; Schussler, Nicola; Cronin, Kathleen; Baehner, Frederick L; Cress, Rosemary; Deapen, Dennis; Glaser, Sally L; Hernandez, Brenda Y; Lynch, Charles F; Mueller, Lloyd; Schwartz, Ann G; Schwartz, Stephen M; Stroup, Antoinette; Sweeney, Carol; Tucker, Thomas C; Ward, Kevin C; Wiggins, Charles; Wu, Xiao-Cheng; Penberthy, Lynne; Shak, Steven
2016-01-01
The 21-gene Recurrence Score assay is validated to predict recurrence risk and chemotherapy benefit in hormone-receptor-positive (HR+) invasive breast cancer. To determine prospective breast-cancer-specific mortality (BCSM) outcomes by baseline Recurrence Score results and clinical covariates, the National Cancer Institute collaborated with Genomic Health and 14 population-based registries in the Surveillance, Epidemiology, and End Results (SEER) Program to electronically supplement cancer surveillance data with Recurrence Score results. The prespecified primary analysis cohort was 40-84 years of age, and had node-negative, HR+, HER2-negative, nonmetastatic disease diagnosed between January 2004 and December 2011 in the entire SEER population, and Recurrence Score results (N = 38,568). Unadjusted 5-year BCSM were 0.4% (n = 21,023; 95% confidence interval (CI), 0.3-0.6%), 1.4% (n = 14,494; 95% CI, 1.1-1.7%), and 4.4% (n = 3,051; 95% CI, 3.4-5.6%) for the Recurrence Score <18, 18-30, and ≥31 groups, respectively (P < 0.001). In multivariable analysis adjusted for age, tumor size, grade, and race, the Recurrence Score result predicted BCSM (P < 0.001). Among patients with node-positive disease (micrometastases and up to three positive nodes; N = 4,691), 5-year BCSM (unadjusted) was 1.0% (n = 2,694; 95% CI, 0.5-2.0%), 2.3% (n = 1,669; 95% CI, 1.3-4.1%), and 14.3% (n = 328; 95% CI, 8.4-23.8%) for the Recurrence Score <18, 18-30, and ≥31 groups, respectively (P < 0.001). Five-year BCSM by Recurrence Score group is reported for important patient subgroups, including age, race, tumor size, grade, and socioeconomic status. This SEER study represents the largest report of prospective BCSM outcomes based on Recurrence Score results for patients with HR+, HER2-negative, node-negative, or node-positive breast cancer, including subgroups often under-represented in clinical trials.
Roberts, Megan C; Miller, Dave P; Shak, Steven; Petkov, Valentina I
2017-06-01
The Oncotype DX® Breast Recurrence Score™ (RS) assay is validated to predict breast cancer (BC) recurrence and adjuvant chemotherapy benefit in select patients with lymph node-positive (LN+), hormone receptor-positive (HR+), HER2-negative BC. We assessed 5-year BC-specific survival (BCSS) in LN+ patients with RS results in SEER databases. In this population-based study, BC cases in SEER registries (diagnosed 2004-2013) were linked to RS results from assays performed by Genomic Health (2004-2014). The primary analysis included only patients (diagnosed 2004-2012) with LN+ (including micrometastases), HR+ (per SEER), and HER2-negative (per RT-PCR) primary invasive BC (N = 6768). BCSS, assessed by RS category and number of positive lymph nodes, was calculated using the actuarial method. The proportion of patients with RS results and LN+ disease (N = 8782) increased over time between 2004 and 2013, and decreased with increasing lymph node involvement from micrometastases to ≥4 lymph nodes. Five-year BCSS outcomes for those with RS < 18 ranged from 98.9% (95% CI 97.4-99.6) for those with micrometastases to 92.8% (95% CI 73.4-98.2) for those with ≥4 lymph nodes. Similar patterns were found for patients with RS 18-30 and RS ≥ 31. RS group was strongly predictive of BCSS among patients with micrometastases or up to three positive lymph nodes (p < 0.001). Overall, 5-year BCSS is excellent for patients with RS < 18 and micrometastases, one or two positive lymph nodes, and worsens with additionally involved lymph nodes. Further analyses should account for treatment variables, and longitudinal updates will be important to better characterize utilization of Oncotype DX testing and long-term survival outcomes.
IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics
Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita
2016-01-01
Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. PMID:27729304
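Since the review notes that IBMWA omits sensitivity, specificity, and a confusion matrix, those quantities are easy to compute separately; a short sketch with invented predictions follows.

    from sklearn.metrics import confusion_matrix

    # Invented binary labels and predictions from any classifier.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    print(sensitivity, specificity)   # 0.75 0.75 for this toy example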
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
These summaries provide statistics for common cancer types. The statistics include incidence, mortality, survival, stage, prevalence, and lifetime risk. Links to additional resources are included. Updated annually.
ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data.
Carter, Kim W; Francis, Richard W; Carter, K W; Francis, R W; Bresnahan, M; Gissler, M; Grønborg, T K; Gross, R; Gunnes, N; Hammond, G; Hornig, M; Hultman, C M; Huttunen, J; Langridge, A; Leonard, H; Newman, S; Parner, E T; Petersson, G; Reichenberg, A; Sandin, S; Schendel, D E; Schalkwyk, L; Sourander, A; Steadman, C; Stoltenberg, C; Suominen, A; Surén, P; Susser, E; Sylvester Vethanayagam, A; Yusof, Z
2016-04-01
Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations. Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage. Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory. ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/]. © The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association.
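The 'virtual pooling' idea above, minus the federation and SSH layers, reduces to fetching each site's harmonized table, concatenating in memory, analyzing, and discarding. The sketch below illustrates only that skeleton; fetch_remote and the file names are hypothetical stand-ins for ViPAR's secure transport.

    import pandas as pd

    def fetch_remote(site_location):
        """Hypothetical stand-in for ViPAR's secure, SSH-based retrieval."""
        return pd.read_csv(site_location)

    sites = ["site_a.csv", "site_b.csv", "site_c.csv"]   # assumed locations

    pooled = pd.concat((fetch_remote(s) for s in sites), ignore_index=True)
    print(pooled.groupby("exposure")["outcome"].mean())  # any pooled analysis
    del pooled   # the virtually pooled data exist only in memory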
GenomeGraphs: integrated genomic data visualization with R.
Durinck, Steffen; Bullard, James; Spellman, Paul T; Dudoit, Sandrine
2009-01-06
Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. We developed GenomeGraphs, as an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.
Integrated web system of geospatial data services for climate research
NASA Astrophysics Data System (ADS)
Okladnikov, Igor; Gordov, Evgeny; Titov, Alexander
2016-04-01
Georeferenced datasets are currently actively used for modeling, interpretation, and forecasting of climatic and ecosystem changes on different spatial and temporal scales. Due to the inherent heterogeneity of environmental datasets, as well as their huge size (up to tens of terabytes for a single dataset), special software supporting studies in the climate and environmental change areas is required. An approach for integrated analysis of georeferenced climatological data sets, based on a combination of web and GIS technologies within the spatial data infrastructure paradigm, is presented. Following this approach, a dedicated data-processing web system for integrated analysis of heterogeneous georeferenced climatological and meteorological data is being developed. It is based on Open Geospatial Consortium (OGC) standards and involves many modern solutions such as an object-oriented programming model, modular composition, and JavaScript libraries based on the GeoExt library, the ExtJS framework, and OpenLayers software. This work is supported by the Ministry of Education and Science of the Russian Federation, Agreement #14.613.21.0037.
NASA Astrophysics Data System (ADS)
Bhattacharjee, T.; Kumar, P.; Fillipe, L.
2018-02-01
Vibrational spectroscopy, especially FTIR and Raman, has shown enormous potential in disease diagnosis, especially in cancers. Their potential for detecting varied pathological conditions is regularly reported. However, to prove their applicability in clinics, large multi-center, multi-national studies need to be undertaken, and these will result in enormous amounts of data. A parallel effort to develop analytical methods, including user-friendly software that can quickly pre-process data and subject them to the required multivariate analysis, is warranted in order to obtain results in real time. This study reports a MATLAB-based script that can automatically import data, preprocess spectra (interpolation, derivatives, normalization), and then carry out Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA) of the first 10 PCs, all with a single click. The software has been verified on data obtained from cell lines, animal models, and in vivo patient datasets, and gives results comparable to the Minitab 16 software. The software can import a variety of file extensions: .asc, .txt, .xls, and many others. Options to ignore noisy data, plot all possible graphs with PCA factors 1 to 5, and save loading factors, confusion matrices, and other parameters are also present. The software can provide results for a dataset of 300 spectra within 0.01 s. We believe that the software will be vital not only in clinical trials using vibrational spectroscopic data, but also for obtaining rapid results when these tools are translated into clinics.
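The PC-LDA scheme described above (preprocessing, PCA to 10 components, then LDA on the scores) is straightforward to reproduce in Python; the random spectra below are placeholders for real FTIR/Raman data, and the preprocessing parameters are assumptions.

    import numpy as np
    from scipy.signal import savgol_filter
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Invented spectra: 300 spectra x 800 points, two classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 800))
    y = rng.integers(0, 2, size=300)

    # Preprocess: Savitzky-Golay first derivative, then vector normalization.
    X = savgol_filter(X, window_length=9, polyorder=3, deriv=1, axis=1)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # PCA to 10 components, then LDA on the PC scores.
    scores = PCA(n_components=10).fit_transform(X)
    lda = LinearDiscriminantAnalysis().fit(scores, y)
    print(lda.score(scores, y))   # resubstitution accuracy (optimistic)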
Informatics research using publicly available pathology data.
Berman, Jules J
2011-01-24
The day has not arrived when pathology departments freely distribute their collected anatomic and clinical data for research purposes. Nonetheless, several valuable public-domain datasets from the U.S. Government are currently available. Two public datasets of special interest to pathologists are the SEER (the U.S. National Cancer Institute's Surveillance, Epidemiology and End Results program) public use data files and the CDC (Centers for Disease Control and Prevention) mortality files. The SEER files contain about 4 million de-identified cancer records, dating from 1973. The CDC mortality files contain approximately 85 million de-identified death records, dating from 1968. This editorial briefly describes both data sources, how they can be obtained, and how they may be used for pathology research.
Software tools for interactive instruction in radiologic anatomy.
Alvarez, Antonio; Gold, Garry E; Tobin, Brian; Desser, Terry S
2006-04-01
To promote active learning in an introductory Radiologic Anatomy course through the use of computer-based exercises. DICOM datasets from our hospital PACS system were transferred to a networked cluster of desktop computers in a medical school classroom. Medical students in the Radiologic Anatomy course were divided into four small groups and assigned to work on a clinical case for 45 minutes. The groups used iPACS viewer software, a free DICOM viewer, to view images and annotate anatomic structures. The classroom instructor monitored and displayed each group's work sequentially on the master screen by running SynchronEyes, a software tool for controlling PC desktops remotely. Students were able to execute the assigned tasks using the iPACS software with minimal oversight or instruction. Course instructors displayed each group's work on the main display screen of the classroom as the students presented the rationale for their decisions. The interactive component of the course received high ratings from the students and overall course ratings were higher than in prior years when the course was given solely in lecture format. DICOM viewing software is an excellent tool for enabling students to learn radiologic anatomy from real-life clinical datasets. Interactive exercises performed in groups can be powerful tools for stimulating students to learn radiologic anatomy.
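Reading the kind of DICOM datasets the students worked with takes only a few lines with the pydicom package; the file path here is hypothetical.

    import pydicom

    ds = pydicom.dcmread("case01/slice_042.dcm")   # hypothetical exported file
    print(ds.Modality, ds.Rows, ds.Columns)        # standard DICOM attributes
    pixels = ds.pixel_array                        # image as a NumPy array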
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, V; Kohli, K
Purpose: Metal artifact reduction (MAR) software in computed tomography (CT) was previously evaluated with phantoms demonstrating the algorithm is capable of reducing metal artifacts without affecting the overall image quality. The goal of this study is to determine the dosimetric impact when calculating with CT datasets reconstructed with and without MAR software. Methods: Twelve head and neck cancer patients with dental fillings and four pelvic cancer patients with hip prosthesis were scanned with a GE Optima RT 580 CT scanner. Images were reconstructed with and without the MAR software. 6MV IMRT and VMAT plans were calculated with AAA on the MAR dataset until all constraints met our clinic's guidelines. Contours from the MAR dataset were copied to the non-MAR dataset. Next, dose calculation on the non-MAR dataset was performed using the same field arrangements and fluence as the MAR plan. Conformality index, D99% and V100% to PTV were compared between MAR and non-MAR plans. Results: Differences between MAR and non-MAR plans were evaluated. For head and neck plans, the largest variations in conformality index, D99% and V100% were −3.8%, −0.9% and −2.1% respectively whereas for pelvic plans, the biggest discrepancies were −32.7%, −0.4% and -33.5% respectively. The dosimetric impact from hip prosthesis is greater because it produces more artifacts compared to dental fillings. Coverage to PTV can increase or decrease depending on the artifacts since dark streaks reduce the HU whereas bright streaks increase the HU. In the majority of the cases, PTV dose in the non-MAR plans is higher than MAR plans. Conclusion: With the presence of metals, MAR algorithm can allow more accurate delineation of targets and OARs. Dose difference between MAR and non-MAR plans depends on the proximity of the organ to the high density material, the streaking artifacts and the beam arrangements of the plan.
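The plan metrics compared above can be computed directly from the dose values inside the PTV; in the sketch below the dose samples are invented, and D99% is taken as the 1st percentile of the PTV dose (the dose covering 99% of the volume).

    import numpy as np

    # Invented dose samples (Gy) for the voxels inside a PTV.
    ptv_dose = np.random.default_rng(1).normal(loc=60.0, scale=1.5, size=10_000)
    prescription = 60.0

    d99 = np.percentile(ptv_dose, 1)                 # dose covering 99% of PTV
    v100 = np.mean(ptv_dose >= prescription) * 100   # % of volume at full dose
    print(f"D99% = {d99:.1f} Gy, V100% = {v100:.1f}%")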
Earth-Science Data Co-Locating Tool
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Pan, Lei; Block, Gary L.
2012-01-01
This software is used to locate Earth-science satellite data and climate-model analysis outputs in space and time. This enables the direct comparison of any set of data with different spatial and temporal resolutions. It is written as three separate modules, clearly separated by functionality, that interface with one another. This enables rapid development of support for any new dataset. In this updated version of the tool, several new front ends were developed for new products. The software finds co-locatable data pairs for given sets of data products and creates new data products that share the same spatial and temporal coordinates. This facilitates direct comparison between two heterogeneous datasets and their comprehensive, synergistic use.
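Co-locating two datasets in space and time is, at its core, a nearest-neighbor search in a scaled (lat, lon, time) space. The sketch below is a generic illustration of that idea, not this tool's implementation; the relative weighting of time against distance is an explicit assumption.

    import numpy as np
    from scipy.spatial import cKDTree

    # Invented samples: (lat, lon, time) for two differently sampled datasets.
    rng = np.random.default_rng(2)
    a = rng.uniform([0, 0, 0], [90, 180, 1000], size=(5000, 3))
    b = rng.uniform([0, 0, 0], [90, 180, 1000], size=(8000, 3))

    scale = np.array([1.0, 1.0, 0.1])   # assumed time-vs-distance trade-off
    tree = cKDTree(b * scale)
    dist, idx = tree.query(a * scale, k=1)   # nearest b-sample for each a-sample

    pairs = [(i, j) for i, (d, j) in enumerate(zip(dist, idx)) if d < 0.5]
    print(len(pairs), "co-located pairs within the chosen tolerance")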
NASA Technical Reports Server (NTRS)
Johnson, Jeffrey R.
2006-01-01
This viewgraph presentation reviews the problems that non-mission researchers have in accessing data for their analyses of Mars. The increasing complexity of Mars datasets results in custom software, developed by instrument teams, that is often the only means to visualize and analyze the data. The proposed solutions are to continue efforts toward synergizing data from multiple missions and making the data, software, and derived products available in standardized, easily accessible formats; to encourage the release of "lite" versions of mission-related software prior to end-of-mission; and to process planetary image data systematically, in a coordinated way, and make them available in an easily accessed form. The recommendations of the Mars Environmental GIS Workshop are reviewed.
MIPE: A metagenome-based community structure explorer and SSU primer evaluation tool
Zhou, Quan
2017-01-01
An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to handle the numerous short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples. PMID:28350876
Search this database of articles and other publications produced by cancer registry staff and Surveillance Research Program staff. Search by author, title, date, and organization. Provides links to PubMed and abstracts.
P-MartCancer–Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Bramer, Lisa M.; Jensen, Jeffrey L.
P-MartCancer is a new interactive web-based software environment that enables biomedical and biological scientists to perform in-depth analyses of global proteomics data without requiring direct interaction with the data or with statistical software. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification and exploratory data analyses driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium (CPTAC) at the peptide, gene and protein levels. P-MartCancer is deployed using Azure technologies (http://pmart.labworks.org/cptac.html); the web service is alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/), and many statistical functions can be utilized directly from an R package available on GitHub (https://github.com/pmartR).
Pre-operative segmentation of neck CT datasets for the planning of neck dissections
NASA Astrophysics Data System (ADS)
Cordes, Jeanette; Dornheim, Jana; Preim, Bernhard; Hertel, Ilka; Strauss, Gero
2006-03-01
For the pre-operative segmentation of CT neck datasets, we developed the software assistant NeckVision. The relevant anatomical structures for neck dissection planning can be segmented, and the resulting patient-specific 3D models are then visualized in another software system for intervention planning. As a first step, we examined the appropriateness of elementary segmentation techniques based on gray values and contour information for extracting the structures in the neck region from CT data. Region growing, interactive watershed transformation, and live-wire are employed for the segmentation of different target structures. We also examined which of the segmentation tasks can be automated. Based on this analysis, the software assistant NeckVision was developed to optimally support the clinicians' image analysis workflow. The usability of NeckVision was tested in a first evaluation with four otorhinolaryngologists from the University Hospital of Leipzig, four computer scientists from the University of Magdeburg, and two laymen in both fields.
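Of the elementary techniques evaluated above, region growing is the simplest to state: starting from a seed voxel, repeatedly absorb neighbors whose gray value lies within a tolerance of the seed's. A minimal 2D Python sketch follows (the CT slice is random placeholder data).

    import numpy as np
    from collections import deque

    def region_grow(img, seed, tol=10):
        """Grow a 4-connected region from `seed`, keeping pixels whose gray
        value is within `tol` of the seed value."""
        seed_val = int(img[seed])
        mask = np.zeros(img.shape, dtype=bool)
        queue = deque([seed])
        while queue:
            y, x = queue.popleft()
            if mask[y, x] or abs(int(img[y, x]) - seed_val) > tol:
                continue
            mask[y, x] = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]:
                    queue.append((ny, nx))
        return mask

    ct_slice = np.random.default_rng(3).integers(0, 255, (64, 64), dtype=np.uint8)
    print(region_grow(ct_slice, (32, 32), tol=30).sum(), "pixels in region")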
Duchman, Kyle R; Gao, Yubo; Miller, Benjamin J
2015-04-01
The current study aims to determine cause-specific survival in patients with Ewing's sarcoma while reporting clinical risk factors for survival. The Surveillance, Epidemiology, and End Results (SEER) Program database was used to identify patients with osseous Ewing's sarcoma from 1991 to 2010. Patient, tumor, and socioeconomic variables were analyzed to determine prognostic factors for survival. There were 1163 patients with Ewing's sarcoma identified in the SEER Program database. The 10-year cause-specific survival for patients with non-metastatic disease at diagnosis was 66.8% and 28.1% for patients with metastatic disease. Black patients demonstrated reduced survival at 10 years, with an increased frequency of metastatic disease at diagnosis compared to patients of other races, while Hispanic patients more frequently presented with tumor size >10 cm. Univariate analysis revealed that metastatic disease at presentation, tumor size >10 cm, axial tumor location, patient age ≥20 years, black race, and male sex were associated with decreased cause-specific survival at 10 years. Metastatic disease at presentation, axial tumor location, tumor size >10 cm, and age ≥20 years remained significant in the multivariate analysis. Patients with Ewing's sarcoma have decreased cause-specific survival at 10 years with metastatic disease at presentation, axial tumor location, tumor size >10 cm, and patient age ≥20 years. Copyright © 2015 Elsevier Ltd. All rights reserved.
Polednak, Anthony P
2014-08-01
To enhance surveillance of mortality from oral cavity-pharynx cancer (OCPC) by considering inaccuracies in the cancer site coded as the underlying cause of death on death certificates vs. cancer site in a population-based cancer registry (as the gold standard). A database was used for 9 population-based cancer registries of the Surveillance, Epidemiology and End Results (SEER) Program, including deaths in 1999-2010 for patients diagnosed in 1973-2010. Numbers of deaths and death rates for OCPC in the SEER population were modified for apparent inaccuracies in the cancer site coded as the underlying cause of death. For age groups <65 years, deaths from OCPC were underestimated by 22-35% by using unmodified (vs. modified) numbers, but temporal declines in death rates were still evident in the SEER population and were similar to declines using routine mortality data for the entire U.S. population. Deaths were underestimated by about 70-80% using underlying cause for tonsillar cancers, strongly associated with human papillomavirus (HPV) infection, but a lack of decline in death rates was still evident. Routine mortality statistics based on underlying cause of death underestimate OCPC deaths but demonstrate trends in OCPC death rates that require continued surveillance in view of increasing incidence rates for HPV-related OCPC. Copyright © 2014 Elsevier Ltd. All rights reserved.
Implications of inaccurate clinical nodal staging in pancreatic adenocarcinoma.
Swords, Douglas S; Firpo, Matthew A; Johnson, Kirsten M; Boucher, Kenneth M; Scaife, Courtney L; Mulvihill, Sean J
2017-07-01
Many patients with stage I-II pancreatic adenocarcinoma do not undergo resection. We hypothesized that (1) clinical staging underestimates nodal involvement, causing stage IIB to have a greater percent of resected patients and (2) this stage-shift causes discrepancies in observed survival. The Surveillance, Epidemiology, and End Results (SEER) research database was used to evaluate cause-specific survival in patients with pancreatic adenocarcinoma from 2004-2012. Survival was compared using the log-rank test. Single-center data on 105 patients who underwent resection of pancreatic adenocarcinoma without neoadjuvant treatment were used to compare clinical and pathologic nodal staging. In SEER data, medium-term survival in stage IIB was superior to IB and IIA, with median cause-specific survival of 14, 9, and 11 months, respectively (P < .001). Seventy-two percent of stage IIB patients underwent resection vs 28% in IB and 36% in IIA (P < .001). In our institutional data, 12.4% of patients had clinical evidence of nodal involvement vs 69.5% by pathologic staging (P < .001). Among clinical stage IA-IIA patients, 71.6% had nodal involvement by pathologic staging. Both SEER and institutional data support substantial underestimation of nodal involvement by clinical staging. This finding has implications in decisions regarding neoadjuvant therapy and analysis of outcomes in the absence of pathologic staging. Copyright © 2017 Elsevier Inc. All rights reserved.
Parallel Index and Query for Large Scale Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chou, Jerry; Wu, Kesheng; Ruebel, Oliver
2011-07-18
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
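The bitmap-index idea behind FastBit/FastQuery can be illustrated loosely with NumPy booleans: each predicate becomes a precomputed bit vector, and a compound query reduces to a bitwise combination rather than a rescan of the raw records. This is an analogy, not the FastQuery API.

    import numpy as np

    # Invented particle data: energy and x-position for 10 million particles.
    rng = np.random.default_rng(4)
    energy = rng.exponential(1.0, 10_000_000)
    x = rng.uniform(0, 100, 10_000_000)

    high_energy = energy > 5.0        # "bitmap" for one predicate
    in_window = (x > 40) & (x < 60)   # "bitmap" for another

    hits = np.flatnonzero(high_energy & in_window)   # bitwise AND of bitmaps
    print(hits.size, "particles satisfy both predicates")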
Cardiovascular imaging environment: will the future be cloud-based?
Kawel-Boehm, Nadine; Bluemke, David A
2017-07-01
In cardiovascular CT and MR imaging, large datasets have to be stored, post-processed, analyzed, and distributed. Besides basic assessment of volume and function in cardiac magnetic resonance imaging, for example, more sophisticated quantitative analysis is requested, requiring specific software. Many institutions cannot afford various types of software or provide the expertise to perform sophisticated analysis. Areas covered: Various cloud services exist for data storage and analysis specific to cardiovascular CT and MR imaging. Instead of on-site data storage, cloud providers offer flexible storage services on a pay-per-use basis. To avoid the purchase and maintenance of specialized software for cardiovascular image analysis, e.g. to assess myocardial iron overload, MR 4D flow, and fractional flow reserve, evaluation can be performed with cloud-based software by the consumer, or the complete analysis is performed by the cloud provider. However, challenges to widespread implementation of cloud services include regulatory issues regarding patient privacy and data security. Expert commentary: If patient privacy and data security are guaranteed, cloud imaging is a valuable option for coping with the storage of large image datasets and for offering sophisticated cardiovascular image analysis to institutions of all sizes.
a Critical Review of Automated Photogrammetric Processing of Large Datasets
NASA Astrophysics Data System (ADS)
Remondino, F.; Nocerino, E.; Toschi, I.; Menna, F.
2017-08-01
The paper reports comparisons between commercial software packages able to automatically process image datasets for 3D reconstruction purposes. The main aspects investigated in the work are the capability to correctly orient large sets of images of complex environments, the metric quality of the results, replicability, and redundancy. Different datasets are employed, each one featuring a different number of images, GSDs at cm and mm resolutions, and ground-truth information to perform statistical analyses of the 3D results. A summary of (photogrammetric) terms is also provided, in order to give rigorous terms of reference for comparisons and critical analyses.
NASA Astrophysics Data System (ADS)
Stevens, S. E.; Nelson, B. R.; Langston, C.; Qi, Y.
2012-12-01
The National Mosaic and Multisensor QPE (NMQ/Q2) software suite, developed at NOAA's National Severe Storms Laboratory (NSSL) in Norman, OK, addresses a large deficiency in the resolution of currently archived precipitation datasets. Current standards, both radar- and satellite-based, provide for nationwide precipitation data with a spatial resolution of up to 4-5 km, with a temporal resolution as fine as one hour. Efforts are ongoing to process archived NEXRAD data for the period of record (1996 - present), producing a continuous dataset providing precipitation data at a spatial resolution of 1 km, on a timescale of only five minutes. In addition, radar-derived precipitation data are adjusted hourly using a wide variety of automated gauge networks spanning the United States. Applications for such a product range widely, from emergency management and flash flood guidance, to hydrological studies and drought monitoring. Results are presented from a subset of the NEXRAD dataset, providing basic statistics on the distribution of rainrates, relative frequency of precipitation types, and several other variables which demonstrate the variety of output provided by the software. Precipitation data from select case studies are also presented to highlight the increased resolution provided by this reanalysis and the possibilities that arise from the availability of data on such fine scales. A previously completed pilot project and steps toward a nationwide implementation are presented along with proposed strategies for managing and processing such a large dataset. Reprocessing efforts span several institutions in both North Carolina and Oklahoma, and data/software coordination are key in producing a homogeneous record of precipitation to be archived alongside NOAA's other Climate Data Records. Methods are presented for utilizing supercomputing capability in expediting processing, to allow for the iterative nature of a reanalysis effort.
Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data
Kumar, Shailesh; Vo, Angie Duy; Qin, Fujun; Li, Hui
2016-01-01
RNA-Seq made possible the global identification of fusion transcripts, i.e. “chimeric RNAs”. Even though various software packages have been developed to serve this purpose, they behave differently in different datasets provided by different developers. It is important for both users, and developers to have an unbiased assessment of the performance of existing fusion detection tools. Toward this goal, we compared the performance of 12 well-known fusion detection software packages. We evaluated the sensitivity, false discovery rate, computing time, and memory usage of these tools in four different datasets (positive, negative, mixed, and test). We conclude that some tools are better than others in terms of sensitivity, positive prediction value, time consumption and memory usage. We also observed small overlaps of the fusions detected by different tools in the real dataset (test dataset). This could be due to false discoveries by various tools, but could also be due to the reason that none of the tools are inclusive. We have found that the performance of the tools depends on the quality, read length, and number of reads of the RNA-Seq data. We recommend that users choose the proper tools for their purpose based on the properties of their RNA-Seq data. PMID:26862001
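The overlap analysis mentioned above amounts to set operations on each tool's call set; a minimal sketch with invented fusion calls:

    # Invented call sets from three fusion-detection tools on one dataset.
    tool_calls = {
        "toolA": {"BCR-ABL1", "EML4-ALK", "TMPRSS2-ERG"},
        "toolB": {"BCR-ABL1", "KIF5B-RET"},
        "toolC": {"BCR-ABL1", "EML4-ALK"},
    }

    consensus = set.intersection(*tool_calls.values())
    union = set.union(*tool_calls.values())
    print("consensus:", consensus)                            # {'BCR-ABL1'}
    print("overall Jaccard:", len(consensus) / len(union))    # 0.25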
More Cancer Types - SEER Cancer Stat Facts
Cancer Statistical Fact Sheets are summaries of common cancer types developed to provide an overview of frequently-requested cancer statistics including incidence, mortality, survival, stage, prevalence, and lifetime risk.
DOE Office of Scientific and Technical Information (OSTI.GOV)
C. Withers, J. Cummings, B. Nigusse, E. Martin
A new generation of central, ducted variable-capacity heat pump systems has come on the market, promising very high cooling and heating efficiency. Instead of cycling on at full capacity and then cycling off when the thermostat is satisfied, they vary their cooling and heating output over a wide range (approximately 40 to 118% of nominal full capacity); thus, staying 'on' for 60% to 100% more hours per day compared to fixed-capacity systems. Current Phase 4 experiments in an instrumented lab home with simulated occupancy evaluate the impact of duct R-value enhancement on the overall operating efficiency of the variable-capacity system compared to the fixed-capacity system.
Did You Know? Video Series - SEER Cancer Statistics
Videos that explain cancer statistics. Choose from topics including survival, statistics overview, survivorship, disparities, and specific cancer types including breast, lung, colorectal, prostate, melanoma of the skin, and others.
Description of the U.S. Geological Survey Geo Data Portal data integration framework
Blodgett, David L.; Booth, Nathaniel L.; Kunicki, Thomas C.; Walker, Jordan I.; Lucido, Jessica M.
2012-01-01
The U.S. Geological Survey has developed an open-standard data integration framework for working efficiently and effectively with large collections of climate and other geoscience data. A web interface accesses catalog datasets to find data services. Data resources can then be rendered for mapping and dataset metadata are derived directly from these web services. Algorithm configuration and information needed to retrieve data for processing are passed to a server where all large-volume data access and manipulation takes place. The data integration strategy described here was implemented by leveraging existing free and open source software. Details of the software used are omitted; rather, emphasis is placed on how open-standard web services and data encodings can be used in an architecture that integrates common geographic and atmospheric data.
bioWeb3D: an online webGL 3D data visualisation tool.
Pettit, Jean-Baptiste; Marioni, John C
2013-06-07
Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for untrained researchers; this is especially true for three-dimensional (3D) data representation. Whilst existing software can provide all necessary functionalities to represent and manipulate biological 3D datasets, very few are easily accessible (browser based), cross-platform and accessible to non-expert users. An online HTML5/WebGL based 3D visualisation tool has been developed to allow biologists to quickly and easily view interactive and customizable three-dimensional representations of their data along with multiple layers of information. Using the WebGL library Three.js written in Javascript, bioWeb3D allows the simultaneous visualisation of multiple large datasets input via a simple JSON, XML or CSV file, which can be read and analysed locally thanks to HTML5 capabilities. Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets.
Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.
2018-01-01
In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888
Manipulation of volumetric patient data in a distributed virtual reality environment.
Dech, F; Ai, Z; Silverstein, J C
2001-01-01
Due to increases in network speed and bandwidth, distributed exploration of medical data in immersive Virtual Reality (VR) environments is becoming increasingly feasible. The volumetric display of radiological data in such environments presents a unique set of challenges. The sheer size and complexity of the datasets involved not only make them difficult to transmit to remote sites, but these datasets also require extensive user interaction in order to make them understandable to the investigator and manageable to the rendering hardware. A sophisticated VR user interface is required in order for the clinician to focus on the aspects of the data that will provide educational and/or diagnostic insight. We will describe a software system of data acquisition, data display, Tele-Immersion, and data manipulation that supports interactive, collaborative investigation of large radiological datasets. The hardware required in this strategy is still at the high end of the graphics workstation market. Future software ports to Linux and NT, along with the rapid development of PC graphics cards, open the possibility for later work with Linux or NT PCs and PC clusters.
NASA Astrophysics Data System (ADS)
Evans, Robert D.; Petropavlovskikh, Irina; McClure-Begley, Audra; McConville, Glen; Quincy, Dorothy; Miyagawa, Koji
2017-10-01
The United States government has operated Dobson ozone spectrophotometers at various sites, starting during the International Geophysical Year (1 July 1957 to 31 December 1958). A network of stations for long-term monitoring of the total column content (thickness of the ozone layer) of the atmosphere was established in the early 1960s and eventually grew to 16 stations, 14 of which are still operational and submit data to the United States of America's National Oceanic and Atmospheric Administration (NOAA). Seven of these sites are also part of the Network for the Detection of Atmospheric Composition Change (NDACC), an organization that maintains its own data archive. Due to recent changes in data processing software, the entire dataset was re-evaluated for possible changes. To evaluate and minimize potential changes caused by the new processing software, the reprocessed data record was compared to the original data record archived in the World Ozone and UV Data Center (WOUDC) in Toronto, Canada. The history of the observations at the individual stations, the instruments used for the NOAA network monitoring at the station, the method for reducing zenith-sky observations to total ozone, and calibration procedures were re-evaluated using data quality control tools built into the new software. At the completion of the evaluation, the new datasets are to be published as an update to the WOUDC and NDACC archives, and the entire dataset is to be made available to the scientific community. The procedure for reprocessing Dobson data and the results of the reanalysis on the archived record are presented in this paper. A summary of historical changes to 14 station records is also provided.
MemAxes Visualization Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hardware advancements such as Intel's PEBS and AMD's IBS, as well as software developments such as the perf_event API in Linux, have made it possible to acquire memory access samples with performance information. MemAxes is a visualization and analysis tool for memory access sample data. By mapping the samples to their associated code, variables, node topology, and application dataset, MemAxes provides intuitive views of the data.
Ai, Weiyun Z; Keegan, Theresa H; Press, David J; Yang, Juan; Pincus, Laura B; Kim, Youn H; Chang, Ellen T
2014-07-01
Mycosis fungoides and Sézary syndrome (MF/SS) are rare in children and young adults, and thus the incidence and outcomes in this patient population are not well studied. To assess the incidence and outcomes of MF/SS in patients diagnosed before 30 years of age, we conducted a retrospective study of 2 population-based cancer registries, the California Cancer Registry (n = 204) and 9 US cancer registries of the Surveillance, Epidemiology, and End Results program (SEER 9; n = 195), for patients diagnosed with MF/SS before 30 years of age. Overall survival was calculated by the Kaplan-Meier method. The risk of a second cancer was assessed by calculating the standardized incidence ratio (SIR) comparing observed cancer incidence in patients with MF/SS with the expected incidence in the age-, sex-, and race-standardized general population. The incidence of MF/SS is rare before 30 years of age, with an incidence rate of 0.05 per 100,000 persons per year before age 20 years and 0.12 per 100,000 persons per year between ages 20 and 29 years in the California Cancer Registry. At 10 years, patients with MF/SS had an overall survival of 94.3% (95% CI, 89.6%-97.2%) in the California Cancer Registry and 88.9% (95% CI, 82.4%-93.2%) in SEER 9. In SEER 9, there was a significant excess risk of all types of second cancers combined (SIR, 3.40; 95% CI, 1.55-6.45), particularly lymphoma (SIR, 12.86; 95% CI, 2.65-37.59) and melanoma (SIR, 9.31; 95% CI, 8.75-33.62). In the California Cancer Registry, the SIR for risk of all types of second cancers was similar to that in SEER 9 (SIR, 3.45; 95% CI, 0.94-8.83), although not statistically significant. Young patients with MF/SS have a favorable outcome, despite a strong suggestion of an increased risk of second primary cancers. Prolonged follow-up is warranted to definitively assess their risk of developing second cancers in a lifetime.
Petkov, Valentina I; Miller, Dave P; Howlader, Nadia; Gliner, Nathan; Howe, Will; Schussler, Nicola; Cronin, Kathleen; Baehner, Frederick L; Cress, Rosemary; Deapen, Dennis; Glaser, Sally L; Hernandez, Brenda Y; Lynch, Charles F; Mueller, Lloyd; Schwartz, Ann G; Schwartz, Stephen M; Stroup, Antoinette; Sweeney, Carol; Tucker, Thomas C; Ward, Kevin C; Wiggins, Charles; Wu, Xiao-Cheng; Penberthy, Lynne; Shak, Steven
2016-01-01
The 21-gene Recurrence Score assay is validated to predict recurrence risk and chemotherapy benefit in hormone-receptor-positive (HR+) invasive breast cancer. To determine prospective breast-cancer-specific mortality (BCSM) outcomes by baseline Recurrence Score results and clinical covariates, the National Cancer Institute collaborated with Genomic Health and 14 population-based registries in the Surveillance, Epidemiology, and End Results (SEER) Program to electronically supplement cancer surveillance data with Recurrence Score results. The prespecified primary analysis cohort was 40–84 years of age, and had node-negative, HR+, HER2-negative, nonmetastatic disease diagnosed between January 2004 and December 2011 in the entire SEER population, and Recurrence Score results (N=38,568). Unadjusted 5-year BCSM were 0.4% (n=21,023; 95% confidence interval (CI), 0.3–0.6%), 1.4% (n=14,494; 95% CI, 1.1–1.7%), and 4.4% (n=3,051; 95% CI, 3.4–5.6%) for Recurrence Score <18, 18–30, and ⩾31 groups, respectively (P<0.001). In multivariable analysis adjusted for age, tumor size, grade, and race, the Recurrence Score result predicted BCSM (P<0.001). Among patients with node-positive disease (micrometastases and up to three positive nodes; N=4,691), 5-year BCSM (unadjusted) was 1.0% (n=2,694; 95% CI, 0.5–2.0%), 2.3% (n=1,669; 95% CI, 1.3–4.1%), and 14.3% (n=328; 95% CI, 8.4–23.8%) for Recurrence Score <18, 18–30, ⩾31 groups, respectively (P<0.001). Five-year BCSM by Recurrence Score group are reported for important patient subgroups, including age, race, tumor size, grade, and socioeconomic status. This SEER study represents the largest report of prospective BCSM outcomes based on Recurrence Score results for patients with HR+, HER2-negative, node-negative, or node-positive breast cancer, including subgroups often under-represented in clinical trials. PMID:28721379
Nilsson, R Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M; Bengtsson-Palme, Johan; Walker, Donald M; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C; Abarenkov, Kessy
2015-01-01
The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric-artificially joined-DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation.
CAMBerVis: visualization software to support comparative analysis of multiple bacterial strains.
Woźniak, Michał; Wong, Limsoon; Tiuryn, Jerzy
2011-12-01
A number of inconsistencies in genome annotations are documented among bacterial strains. Visualization of the differences may help biologists to make correct decisions in spurious cases. We have developed a visualization tool, CAMBerVis, to support comparative analysis of multiple bacterial strains. The software manages simultaneous visualization of multiple bacterial genomes, enabling visual analysis focused on genome structure annotations. The CAMBerVis software is freely available at the project website: http://bioputer.mimuw.edu.pl/camber. Input datasets for Mycobacterium tuberculosis and Staphylococcus aureus are integrated with the software as examples. Contact: m.wozniak@mimuw.edu.pl. Supplementary data are available at Bioinformatics online.
pyPcazip: A PCA-based toolkit for compression and analysis of molecular simulation data
NASA Astrophysics Data System (ADS)
Shkurti, Ardita; Goni, Ramon; Andrio, Pau; Breitmoser, Elena; Bethune, Iain; Orozco, Modesto; Laughton, Charles A.
The biomolecular simulation community is currently in need of novel and optimised software tools that can analyse and process, in reasonable timescales, the large amounts of molecular simulation data being generated. In light of this, we have developed and present here pyPcazip: a suite of software tools for compression and analysis of molecular dynamics (MD) simulation data. The software is compatible with trajectory file formats generated by most contemporary MD engines such as AMBER, CHARMM, GROMACS and NAMD, and is MPI-parallelised to permit the efficient processing of very large datasets. pyPcazip is a Unix-based, open-source software suite (BSD-licensed) written in Python.
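The core idea, projecting trajectory frames onto their leading principal components, can be sketched in a few lines of NumPy. The following is a minimal illustration of PCA-based trajectory compression on synthetic coordinates; it is not the pyPcazip API, and all names are invented.

    import numpy as np

    n_frames, n_atoms = 500, 100
    traj = np.random.rand(n_frames, n_atoms * 3)   # stand-in for real coordinates

    mean = traj.mean(axis=0)
    centered = traj - mean

    # Principal components from the SVD of the centered coordinate matrix.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)

    k = 10                                         # keep the top-k components
    scores = centered @ Vt[:k].T                   # compressed representation

    # Reconstruction from the k scores; error shrinks as k grows.
    approx = scores @ Vt[:k] + mean
    rmse = np.sqrt(np.mean((traj - approx) ** 2))
    print(f"compressed {traj.shape} -> {scores.shape}, RMSE={rmse:.4f}")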
SEER Cancer Query Systems (CanQues)
These applications provide access to cancer statistics including incidence, mortality, survival, prevalence, and probability of developing or dying from cancer. Users can display reports of the statistics or extract them for additional analyses.
Risk Factors for Cancer | Did You Know?
Age, weight, exposure to carcinogens, and genetics can increase the risk of developing cancer. Learn more from this Did You Know? video produced by NCI's Surveillance, Epidemiology, and End Results (SEER) program.
Monographs - SEER Publications
In-depth publications on topics in cancer statistics, including collaborative staging and registry data, cancer survival from a policy and clinical perspective, a description of cancer in American Indians/Alaska Natives, and measures of health disparities.
interPopula: a Python API to access the HapMap Project dataset
2010-01-01
Background The HapMap project is a publicly available catalogue of common genetic variants that occur in humans, currently including several million SNPs across 1115 individuals spanning 11 different populations. This important database does not provide any programmatic access to the dataset; furthermore, no standard relational database interface is provided. Results interPopula is a Python API to access the HapMap dataset. interPopula provides integration facilities with both the Python software ecosystem (e.g. Biopython and matplotlib) and other relevant human population datasets (e.g. Ensembl gene annotation and UCSC Known Genes). A set of guidelines and code examples to address possible inconsistencies across heterogeneous data sources is also provided. Conclusions interPopula is a straightforward and flexible Python API that facilitates the construction of scripts and applications that require access to the HapMap dataset. PMID:21210977
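As a rough illustration of what programmatic access to a HapMap-style variant table can look like (the schema, table, and column names below are hypothetical and are not interPopula's interface), a relational query in Python might be:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE snps (rsid TEXT, chrom TEXT, pos INTEGER, maf REAL)")
    con.executemany("INSERT INTO snps VALUES (?, ?, ?, ?)",
                    [("rs123", "chr1", 10177, 0.42), ("rs456", "chr1", 10352, 0.04)])

    # Common-variant query: SNPs on chr1 with minor allele frequency >= 5%.
    for row in con.execute(
            "SELECT rsid, pos, maf FROM snps WHERE chrom='chr1' AND maf>=0.05"):
        print(row)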
A three-dimensional multivariate representation of atmospheric variability
NASA Astrophysics Data System (ADS)
Žagar, Nedjeljka; Jelić, Damjan; Blaauw, Marten; Jesenko, Blaž
2016-04-01
The recently developed MODES software has been applied to the ECMWF analyses and forecasts and to several reanalysis datasets to describe the global variability of the balanced and inertio-gravity (IG) circulation across many scales, considering both the mass and wind fields and the whole model depth. In particular, the IG spectrum, which has only recently become observable in global datasets, can now be studied in this complete framework. MODES is open-access software that performs the normal-mode function decomposition of 3D global datasets. Its application to the ERA Interim dataset reveals several aspects of the large-scale circulation after it has been partitioned into the linearly balanced and IG components. The global energy distribution is dominated by the balanced energy, while the IG modes contribute around 8% of the total wave energy. However, on subsynoptic scales IG energy dominates, and it is associated with the main features of tropical variability on all scales. The presented energy distribution and features of the zonally-averaged and equatorial circulation provide a reference for the intercomparison of several reanalysis datasets and for the validation of climate models. Features of the global IG circulation are compared in the ERA Interim, MERRA and JRA reanalysis datasets and in several CMIP5 models. Since October 2014 the operational medium-range forecasts of the European Centre for Medium-Range Weather Forecasts (ECMWF) have been analyzed by MODES daily, and an online archive of all the outputs is available at http://meteo.fmf.uni-lj.si/MODES. New outputs are made available daily based on the 00 UTC run and subsequent 12-hour forecast steps up to the 240-hour forecast. In addition to the energy spectra and horizontal circulation on selected levels for the balanced and IG components, the equatorial Kelvin waves are presented in time and space as the most energetic tropical IG modes, propagating vertically and along the equator from their main generation regions in the upper troposphere over the Indian and Pacific region. The validation of the 10-day ECMWF forecasts against analyses in modal space suggests a lack of variability in the tropics in the medium range. References: Žagar, N. et al., 2015: Normal-mode function representation of global 3-D data sets: open-access software for the atmospheric research community. Geosci. Model Dev., 8, 1169-1195, doi:10.5194/gmd-8-1169-2015. Žagar, N., R. Buizza, and J. Tribbia, 2015: A three-dimensional multivariate modal analysis of atmospheric predictability with application to the ECMWF ensemble. J. Atmos. Sci., 72, 4423-4444. The MODES software is available from http://meteo.fmf.uni-lj.si/MODES.
Use of Electronic Health-Related Datasets in Nursing and Health-Related Research.
Al-Rawajfah, Omar M; Aloush, Sami; Hewitt, Jeanne Beauchamp
2015-07-01
Datasets of gigabyte size are common in medical sciences. There is increasing consensus that significant untapped knowledge lies hidden in these large datasets. This review article aims to discuss Electronic Health-Related Datasets (EHRDs) in terms of types, features, advantages, limitations, and possible use in nursing and health-related research. Major scientific databases, MEDLINE, ScienceDirect, and Scopus, were searched for studies or review articles regarding the use of EHRDs in research. A total of 442 articles were located. After application of study inclusion criteria, 113 articles were included in the final review. EHRDs were categorized into Electronic Administrative Health-Related Datasets and Electronic Clinical Health-Related Datasets. Subcategories of each major category were identified. EHRDs are invaluable assets for nursing and health-related research. Advanced research skills such as using analytical software, applying advanced statistical procedures, and dealing with missing data and missing variables will maximize the efficient utilization of EHRDs in research. © The Author(s) 2014.
Golla, Gowtham Kumar; Carlson, Jordan A; Huan, Jun; Kerr, Jacqueline; Mitchell, Tarrah; Borner, Kelsey
2016-10-01
Sedentary behavior of youth is an important determinant of health. However, better measures are needed to improve understanding of this relationship and the mechanisms at play, as well as to evaluate health promotion interventions. Wearable accelerometers are considered the standard for assessing physical activity in research, but do not perform well for assessing posture (i.e., sitting vs. standing), a critical component of sedentary behavior. The machine learning algorithms that we propose for assessing sedentary behavior will allow us to re-examine existing accelerometer data to better understand the association between sedentary time and health in various populations. We collected two datasets, a laboratory-controlled dataset and a free-living dataset. We trained machine learning classifiers separately on each dataset and compared performance across datasets. The classifiers predict five postures: sit, stand, sit-stand, stand-sit, and stand/walk. We compared a manually constructed Hidden Markov model (HMM) with an automated HMM from existing software. The manually constructed HMM gave a higher F1-Macro score on both datasets.
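To make the HMM idea concrete, here is a minimal sketch (not the authors' code) that fits a two-state Gaussian HMM to a fabricated one-dimensional accelerometer feature using the hmmlearn package and decodes a posture sequence:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)
    # Fake 1-D "vertical acceleration" feature: sitting ~0.2, standing ~0.8.
    sit = rng.normal(0.2, 0.05, size=(200, 1))
    stand = rng.normal(0.8, 0.05, size=(200, 1))
    X = np.vstack([sit, stand, sit])               # sit -> stand -> sit

    model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
    model.fit(X)
    states = model.predict(X)                      # decoded posture sequence
    print(states[:5], states[205:210])             # the two regimes get distinct states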
NASA Technical Reports Server (NTRS)
Minow, Joseph I.; Altstatt, Richard L.; Skipworth, William C.
2007-01-01
The Genesis spacecraft launched on 8 August 2001 and sampled solar wind environments at L1 from 2001 to 2004. After the Science Capsule door was opened, numerous foils and samples were exposed to the various solar wind environments, during periods including slow solar wind from the streamer belts, fast solar wind flows from coronal holes, and coronal mass ejections. The Survey and Examination of Eroded Returned Surfaces (SEERS) program, led by NASA's Space Environments and Effects program, had initiated access for the space materials community to the remaining Science Capsule hardware, after the science samples had been removed, for evaluation of materials exposure to the space environment. This presentation will describe the process used to generate a reference Genesis Radiation Environment, developed for the SEERS program for use by the materials science community in their analyses of the Genesis hardware.
Schootman, Mario; Jeffe, Donna B; Lian, Min; Gillanders, William E; Aft, Rebecca
2009-03-01
The authors examined disparities in survival among women aged 66 years or older in association with census-tract-level poverty rate, racial distribution, and individual-level factors, including patient-, treatment-, and tumor-related factors, utilization of medical care, and mammography use. They used linked data from the 1992-1999 Surveillance, Epidemiology, and End Results (SEER) programs, 1991-1999 Medicare claims, and the 1990 US Census. A geographic information system and advanced statistics identified areas of increased or reduced breast cancer survival and possible reasons for geographic variation in survival in 2 of the 5 SEER areas studied. In the Detroit, Michigan, area, one geographic cluster of shorter-than-expected breast cancer survival was identified (hazard ratio (HR) = 1.60). An additional area where survival was longer than expected approached statistical significance (HR = 0.4; P = 0.056). In the Atlanta, Georgia, area, one cluster of shorter- (HR = 1.81) and one cluster of longer-than-expected (HR = 0.72) breast cancer survival were identified. Stage at diagnosis and census-tract poverty (and patient's race in Atlanta) explained the geographic variation in breast cancer survival. No geographic clusters were identified in the 3 other SEER programs. Interventions to reduce late-stage breast cancer, focusing on areas of high poverty and targeting African Americans, may reduce disparities in breast cancer survival in the Detroit and Atlanta areas.
Moore, Eider B; Poliakov, Andrew V; Lincoln, Peter; Brinkley, James F
2007-01-01
Background Three-dimensional (3-D) visualization of multimodality neuroimaging data provides a powerful technique for viewing the relationship between structure and function. A number of applications are available that include some aspect of 3-D visualization, including both free and commercial products. These applications range from highly specific programs for a single modality, to general purpose toolkits that include many image processing functions in addition to visualization. However, few if any of these combine both stand-alone and remote multi-modality visualization in an open source, portable and extensible tool that is easy to install and use, yet can be included as a component of a larger information system. Results We have developed a new open source multimodality 3-D visualization application, called MindSeer, that has these features: integrated and interactive 3-D volume and surface visualization, Java and Java3D for true cross-platform portability, one-click installation and startup, integrated data management to help organize large studies, extensibility through plugins, transparent remote visualization, and the ability to be integrated into larger information management systems. We describe the design and implementation of the system, as well as several case studies that demonstrate its utility. These case studies are available as tutorials or demos on the associated website. Conclusion MindSeer provides a powerful visualization tool for multimodality neuroimaging data. Its architecture and unique features also allow it to be extended into other visualization domains within biomedicine. PMID:17937818
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomas, Mathew; Marshall, Matthew J.; Miller, Erin A.
2014-08-26
Understanding the interactions of structured communities known as “biofilms” and other complex matrixes is possible through X-ray micro tomography imaging of the biofilms. Feature detection and image processing for this type of data focus on efficiently identifying and segmenting biofilms and bacteria in the datasets. The datasets are very large and often require manual intervention due to low contrast between objects and high noise levels. Thus new software is required for the effective interpretation and analysis of the data. This work describes the development and application of software to analyze and visualize high-resolution X-ray micro tomography datasets.
Parallel Planes Information Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bush, Brian
2015-12-26
This software presents a user-provided multivariate dataset as an interactive three-dimensional visualization so that the user can explore the correlation between variables in the observations and the distribution of observations among the variables.
Image Classification Workflow Using Machine Learning Methods
NASA Astrophysics Data System (ADS)
Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.
2016-12-01
Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free-to-use solutions that are currently available come bundled as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy-to-use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, a Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas with a spatial resolution of 60 meters for the years 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software offers the ease of land use classification found in commercial packages without the expensive license.
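A toy version of the unsupervised branch of such a workflow, K-means over per-pixel band vectors, is sketched below. A real run would load the bands with GDAL (e.g. gdal.Open(path).ReadAsArray()); here a random 3-band scene stands in so the snippet is self-contained, and it is not the authors' actual code.

    import numpy as np
    from sklearn.cluster import KMeans

    rows, cols, bands = 100, 100, 3
    scene = np.random.rand(rows, cols, bands)      # stand-in for a Landsat tile

    pixels = scene.reshape(-1, bands)              # one sample per pixel
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(pixels)
    classified = labels.reshape(rows, cols)        # land-use class map

    print(classified.shape, np.unique(classified))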
3D Printing of CT Dataset: Validation of an Open Source and Consumer-Available Workflow.
Bortolotto, Chandra; Eshja, Esmeralda; Peroni, Caterina; Orlandi, Matteo A; Bizzotto, Nicola; Poggi, Paolo
2016-02-01
The broad availability of cheap three-dimensional (3D) printing equipment has raised the need for a thorough analysis of its effects on clinical accuracy. Our aim was to determine whether the accuracy of the 3D printing process is affected by the use of a low-budget workflow based on open source software and consumer-grade, commercially available 3D printers. A group of test objects was scanned with a 64-slice computed tomography (CT) scanner in order to build their 3D copies. CT datasets were elaborated using a software chain based on three free and open source software packages. Objects were printed out with a commercially available 3D printer. Both the 3D copies and the test objects were measured using a digital professional caliper. Overall, the mean absolute difference between test objects and 3D copies is 0.23 mm and the mean relative difference amounts to 0.55%. Our results demonstrate that the accuracy of the 3D printing process remains high despite the use of a low-budget workflow.
Tessera: Open source software for accelerated data science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sego, Landon H.; Hafen, Ryan P.; Director, Hannah M.
2014-06-30
Extracting useful, actionable information from data can be a formidable challenge for the safeguards, nonproliferation, and arms control verification communities. Data scientists are often on the “front-lines” of making sense of complex and large datasets. They require flexible tools that make it easy to rapidly reformat large datasets, interactively explore and visualize data, develop statistical algorithms, and validate their approaches—and they need to perform these activities with minimal lines of code. Existing commercial software solutions often lack extensibility and the flexibility required to address the nuances of the demanding and dynamic environments where data scientists work. To address this need, Pacific Northwest National Laboratory developed Tessera, an open source software suite designed to enable data scientists to interactively perform their craft at the terabyte scale. Tessera automatically manages the complicated tasks of distributed storage and computation, empowering data scientists to do what they do best: tackling critical research and mission objectives by deriving insight from data. We illustrate the use of Tessera with an example analysis of computer network data.
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
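The gist of ROS can be shown in a short Python sketch. This simplified version assumes a single detection limit (the published method handles multiple limits) and invented concentrations: detected values get normal-quantile plotting positions, a line is fit to their logs, and the censored ranks are imputed from the fit.

    import numpy as np
    from scipy import stats

    detected = np.array([2.1, 3.4, 5.0, 7.2, 11.0])   # measured values
    n_censored = 3                                     # "<1.0" reports, DL = 1.0
    n = len(detected) + n_censored

    # Weibull plotting positions for the full sorted sample; with one DL the
    # censored values occupy the lowest ranks.
    ranks = np.arange(1, n + 1)
    pp = ranks / (n + 1)
    z = stats.norm.ppf(pp)                             # normal quantiles

    # Regress log(concentration) on z using only the detected (highest) ranks.
    slope, intercept, *_ = stats.linregress(z[n_censored:], np.log(np.sort(detected)))

    imputed = np.exp(intercept + slope * z[:n_censored])   # values for "<DL" ranks
    full = np.concatenate([imputed, np.sort(detected)])
    print("ROS mean:", full.mean(), "ROS sd:", full.std(ddof=1))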
Data Standards for Flow Cytometry
SPIDLEN, JOSEF; GENTLEMAN, ROBERT C.; HAALAND, PERRY D.; LANGILLE, MORGAN; MEUR, NOLWENN LE; OCHS, MICHAEL F.; SCHMITT, CHARLES; SMITH, CLAYTON A.; TREISTER, ADAM S.; BRINKMAN, RYAN R.
2009-01-01
Flow cytometry (FCM) is an analytical tool widely used for cancer and HIV/AIDS research and treatment, stem cell manipulation, and detecting microorganisms in environmental samples. Current data standards do not capture the full scope of FCM experiments and there is a demand for software tools that can assist in the exploration and analysis of large FCM datasets. We are implementing a standardized approach to capturing, analyzing, and disseminating FCM data that will facilitate both more complex analyses and analysis of datasets that could not previously be efficiently studied. Initial work has focused on developing a community-based guideline for recording and reporting the details of FCM experiments. Open source software tools that implement this standard are being created, with an emphasis on facilitating reproducible and extensible data analyses. As well, tools for electronic collaboration will assist the integrated access and comprehension of experiments to empower users to collaborate on FCM analyses. This coordinated, joint development of bioinformatics standards and software tools for FCM data analysis has the potential to greatly facilitate both basic and clinical research—impacting a notably diverse range of medical and environmental research areas. PMID:16901228
Textual data compression in computational biology: a synopsis.
Giancarlo, Raffaele; Scaturro, Davide; Utro, Filippo
2009-07-01
Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made to apply textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. The main focus of this review is a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used. When possible, a unifying organization of the main ideas and techniques is also provided. It goes without saying that most of the research results reviewed here offer software prototypes to the bioinformatics community. The Supplementary Material provides pointers to software and benchmark datasets for a range of applications of broad interest. In addition to providing references to software, the Supplementary Material also gives a brief presentation of some fundamental results and techniques related to this paper. It is at: http://www.math.unipa.it/~raffaele/suppMaterial/compReview/
Countering Insider Threats - Handling Insider Threats Using Dynamic, Run-Time Forensics
2007-10-01
able to handle the security policy requirements of a large organization containing many decentralized and diverse users, while being easily managed ... contained in the TIF folder. Searching for any text string and sorting is supported also. The cache index file of Internet Explorer is not changed ... containing thousands of malware software signatures. Separate datasets can be created for various classifications of malware such as encryption software
A software tool for determination of breast cancer treatment methods using data mining approach.
Cakır, Abdülkadir; Demirel, Burçin
2011-12-01
In this work, breast cancer treatment methods are determined using data mining. For this purpose, software was developed to help oncologists decide which treatment methods to apply to breast cancer patients. Records of 462 breast cancer patients, obtained from Ankara Oncology Hospital, were used to determine treatment methods for new patients. This dataset was processed with the Weka data mining tool. Classification algorithms were applied one by one to the dataset and the results compared to find the proper treatment method. The developed software program, called "Treatment Assistant", uses different algorithms (IB1, Multilayer Perceptron and Decision Table) to find out which one gives the better result for each attribute to be predicted, through a Java NetBeans interface. Treatment methods are determined for the post-surgical care of breast cancer patients using this developed software tool. At the modeling step of the data mining process, different Weka algorithms were used for the output attributes: for the hormonotherapy output IB1, for the tamoxifen and radiotherapy outputs Multilayer Perceptron, and for the chemotherapy output the Decision Table algorithm showed the best accuracy performance. In conclusion, this work shows that a data mining approach can be a useful tool for medical applications, particularly at the treatment decision step. Data mining helps the doctor to decide in a short time.
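The per-output model selection described here is easy to mirror in other toolkits. Below is a hedged scikit-learn analogue on fabricated data: a 1-nearest-neighbour classifier stands in for Weka's IB1, and a decision tree stands in for the Decision Table learner, which has no direct scikit-learn equivalent.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    X = rng.random((462, 10))                  # 462 patients, 10 coded attributes
    y = (X[:, 0] + X[:, 1] > 1).astype(int)    # stand-in "hormonotherapy" label

    candidates = {
        "IB1 (1-NN)": KNeighborsClassifier(n_neighbors=1),
        "MLP": MLPClassifier(max_iter=1000),
        "DecisionTree": DecisionTreeClassifier(),
    }
    # Pick the classifier with the best cross-validated accuracy per output.
    for name, clf in candidates.items():
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name}: {acc:.3f}")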
Data Resources | Geospatial Data Science | NREL
Geospatial datasets for a variety of renewable energy technologies. These datasets are designed to be used in GIS software applications. Categories include biomass, geothermal, hydrogen, marine and hydrokinetic, solar, and wind data.
X-MATE: a flexible system for mapping short read data
Pearson, John V.; Cloonan, Nicole; Grimmond, Sean M.
2011-01-01
Summary: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. Availability: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/. Contact: n.cloonan@uq.edu.au; s.grimmond@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:21216778
Northwestern University Schizophrenia Data and Software Tool (NUSDAST)
Wang, Lei; Kogan, Alex; Cobia, Derin; Alpert, Kathryn; Kolasny, Anthony; Miller, Michael I.; Marcus, Daniel
2013-01-01
The schizophrenia research community has invested substantial resources in collecting, managing and sharing large neuroimaging datasets. As part of this effort, our group has collected high resolution magnetic resonance (MR) datasets from individuals with schizophrenia, their non-psychotic siblings, healthy controls and their siblings. This effort has resulted in a growing resource, the Northwestern University Schizophrenia Data and Software Tool (NUSDAST), an NIH-funded data sharing project to stimulate new research. This resource resides on XNAT Central, and it contains neuroimaging (MR scans, landmarks and surface maps for deep subcortical structures, and FreeSurfer cortical parcellation and measurement data), cognitive (cognitive domain scores for crystallized intelligence, working memory, episodic memory, and executive function), clinical (demographic, sibling relationship, SAPS and SANS psychopathology), and genetic (20 polymorphisms) data, collected from more than 450 subjects, most with 2-year longitudinal follow-up. A neuroimaging mapping, analysis and visualization software tool, CAWorks, is also part of this resource. Moreover, in making our existing neuroimaging data along with the associated meta-data and computational tools publicly accessible, we have established a web-based information retrieval portal that allows the user to efficiently search the collection. This research-ready dataset meaningfully combines neuroimaging data with other relevant information, and it can be used to help facilitate advancing neuroimaging research. It is our hope that this effort will help to overcome some of the commonly recognized technical barriers in advancing neuroimaging research such as lack of local organization and standard descriptions. PMID:24223551
Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets.
Clark, Alex M; Dole, Krishna; Coulon-Spektor, Anna; McNutt, Andrew; Grass, George; Freundlich, Joel S; Reynolds, Robert C; Ekins, Sean
2015-06-22
On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user's own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.
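For readers who want the flavor of this recipe, the sketch below builds FCFP6-like fingerprints (Morgan radius 3 with feature invariants, as exposed by RDKit) and fits a Bernoulli naive Bayes model with scikit-learn. The molecules and activity labels are invented, and this is not the CDK or CDD Vault implementation.

    import numpy as np
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem
    from sklearn.naive_bayes import BernoulliNB

    smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCC", "c1ccccc1O"]
    labels = [0, 0, 1, 0, 0, 1]                   # hypothetical "active" flags

    def fcfp6(smi, n_bits=1024):
        # Morgan radius 3 with feature invariants approximates FCFP6.
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(
            mol, radius=3, nBits=n_bits, useFeatures=True)
        arr = np.zeros(n_bits, dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)
        return arr

    X = np.array([fcfp6(s) for s in smiles])
    model = BernoulliNB().fit(X, labels)
    print(model.predict_proba(fcfp6("c1ccccc1N").reshape(1, -1)))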
Development and application of GIS-based PRISM integration through a plugin approach
NASA Astrophysics Data System (ADS)
Lee, Woo-Seop; Chun, Jong Ahn; Kang, Kwangmin
2014-05-01
A PRISM (Parameter-elevation Regressions on Independent Slopes Model) QGIS plugin was developed on the Quantum GIS platform in this study. The plugin provides user-friendly graphical user interfaces (GUIs) so that users can obtain gridded meteorological data at high resolution (1 km × 1 km). The software is designed to run on a personal computer, so it requires neither internet access nor a sophisticated computer system, and users can generate PRISM data with ease. The proposed PRISM QGIS plugin is a hybrid statistical-geographic model system that uses coarse-resolution datasets (APHRODITE datasets in this study) together with digital elevation data to generate fine-resolution gridded precipitation. To validate the performance of the software, the Prek Thnot River Basin in Kandal, Cambodia was selected for application. Overall, statistical analysis shows promising outputs generated by the proposed plugin. Error measures such as RMSE (Root Mean Square Error) and MAPE (Mean Absolute Percentage Error) were used to evaluate the performance of the developed PRISM QGIS plugin; evaluation results were 2.76 mm and 4.2%, respectively. This study suggests that the plugin can be used to generate high resolution precipitation datasets for hydrological and climatological studies in watersheds where observed weather datasets are limited.
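The statistical-geographic core, regressing a climate variable on elevation and applying the fit to a fine DEM, can be illustrated in a few lines. The numbers below are invented and the sketch omits PRISM's slope/aspect weighting; it shows only the elevation-regression idea.

    import numpy as np

    # Coarse "stations": elevation (m) and observed precipitation (mm).
    station_elev = np.array([50.0, 200.0, 450.0, 800.0, 1200.0])
    station_precip = np.array([40.0, 55.0, 80.0, 110.0, 150.0])

    # Precipitation-elevation regression (the heart of PRISM-type downscaling).
    slope, intercept = np.polyfit(station_elev, station_precip, deg=1)

    # Apply to a fine-resolution DEM grid to downscale precipitation.
    dem = np.random.uniform(0, 1500, size=(10, 10))
    precip_fine = intercept + slope * dem
    print(f"gradient: {slope:.3f} mm per m of elevation")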
Simulating a base population in honey bee for molecular genetic studies
2012-01-01
Background Over the past years, reports have indicated that honey bee populations are declining and that infestation by an ecto-parasitic mite (Varroa destructor) is one of the main causes. Selective breeding of resistant bees can help to prevent losses due to the parasite, but it requires that a robust breeding program and genetic evaluation are implemented. Genomic selection has emerged as an important tool in animal breeding programs and simulation studies have shown that it yields more accurate breeding value estimates, higher genetic gain and low rates of inbreeding. Since genomic selection relies on marker data, simulations conducted on a genomic dataset are a pre-requisite before selection can be implemented. Although genomic datasets have been simulated in other species undergoing genetic evaluation, simulation of a genomic dataset specific to the honey bee is required since this species has a distinct genetic and reproductive biology. Our software program was aimed at constructing a base population by simulating a random mating honey bee population. A forward-time population simulation approach was applied since it allows modeling of genetic characteristics and reproductive behavior specific to the honey bee. Results Our software program yielded a genomic dataset for a base population in linkage disequilibrium. In addition, information was obtained on (1) the position of markers on each chromosome, (2) allele frequency, (3) χ2 statistics for Hardy-Weinberg equilibrium, (4) a sorted list of markers with a minor allele frequency less than or equal to the input value, (5) average r2 values of linkage disequilibrium between all simulated marker loci pair for all generations and (6) average r2 value of linkage disequilibrium in the last generation for selected markers with the highest minor allele frequency. Conclusion We developed a software program that takes into account the genetic and reproductive biology specific to the honey bee and that can be used to constitute a genomic dataset compatible with the simulation studies necessary to optimize breeding programs. The source code together with an instruction file is freely accessible at http://msproteomics.org/Research/Misc/honeybeepopulationsimulator.html PMID:22520469
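One of the simulator's key outputs, the r² measure of linkage disequilibrium between marker pairs, is straightforward to compute from haplotypes. A small illustration on fabricated 0/1 haplotype data (not the authors' code):

    import numpy as np

    rng = np.random.default_rng(2)
    locus_a = rng.integers(0, 2, size=1000)                      # allele at marker A
    locus_b = (locus_a ^ (rng.random(1000) < 0.1)).astype(int)   # correlated marker B

    pa, pb = locus_a.mean(), locus_b.mean()
    pab = np.mean(locus_a & locus_b)              # haplotype frequency of A1B1
    D = pab - pa * pb                             # disequilibrium coefficient
    r2 = D**2 / (pa * (1 - pa) * pb * (1 - pb))
    print(f"D={D:.4f}  r^2={r2:.3f}")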
A new dataset validation system for the Planetary Science Archive
NASA Astrophysics Data System (ADS)
Manaud, N.; Zender, J.; Heather, D.; Martinez, S.
2007-08-01
The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets formatted in conformance with the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as for the production of reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality standards. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of the archive's content has been missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall make it possible to track anomalies in datasets and to control their completeness. It shall ensure that PSA end-users: (1) can rely on the results of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We define dataset validation as the verification and assessment process that checks dataset content against pre-defined top-level criteria, which represent the general characteristics of good-quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that has been designed to give the user the flexibility to define and implement various types of validation criteria, to iteratively and incrementally validate datasets, and to generate validation reports.
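Criterion-driven validation of this kind reduces to running a set of named checks over a dataset and reporting pass/fail. The toy Python sketch below invents a dataset record and two checks purely for illustration; it has no connection to the PSA tool's real rule set.

    from typing import Callable

    # Hypothetical dataset summary record (field names invented).
    dataset = {"label": "EXAMPLE-RDR-V1.0", "products": 124, "index_rows": 120}

    # Each top-level criterion is a named predicate over the dataset.
    criteria: dict[str, Callable[[dict], bool]] = {
        "has PDS label": lambda d: bool(d.get("label")),
        "index matches product count": lambda d: d["products"] == d["index_rows"],
    }

    report = {name: check(dataset) for name, check in criteria.items()}
    for name, ok in report.items():
        print(("PASS" if ok else "FAIL"), "-", name)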
PyPWA: A partial-wave/amplitude analysis software framework
NASA Astrophysics Data System (ADS)
Salgado, Carlos
2016-05-01
The PyPWA project aims to develop a software framework for partial-wave and amplitude analysis of data, providing the user with software tools to identify resonances from multi-particle final states in photoproduction. Most of the code is written in Python. The software is divided into two main branches: one is a general shell in which amplitude parameters (or any parametric model) are estimated from the data; this branch also includes software to produce simulated datasets using the fitted amplitudes. A second branch contains a specific realization of the isobar model (with room to include Deck-type and other isobar-model extensions) to perform PWA with an interface to the computing resources at Jefferson Lab. We are currently implementing parallelism and vectorization using Intel's Xeon Phi family of coprocessors.
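As a hedged illustration of the "general shell" idea, parameters of a parametric intensity model can be estimated from event data by maximum likelihood; the toy model and data below are illustrative and are not PyPWA's API.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
events = rng.vonmises(mu=0.8, kappa=2.0, size=5000)   # toy angular event sample

def intensity(theta, pars):
    # Toy parametric model (von Mises density); a real PWA model would be the
    # coherent sum of partial-wave amplitudes.
    mu, kappa = pars
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

def nll(pars):
    return -np.sum(np.log(intensity(events, pars)))   # negative log-likelihood

fit = minimize(nll, x0=[0.0, 1.0], method="Nelder-Mead")
print("fitted (mu, kappa):", fit.x)
```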
DOE Office of Scientific and Technical Information (OSTI.GOV)
P-Mart was designed specifically to allow cancer researchers to perform robust statistical processing of publicly available cancer proteomic datasets. To date, no comparable online statistical processing suite for proteomics has existed. The P-Mart software is designed to let statistical programmers use these algorithms through packages in the R programming language, and it also offers a web-based interface built on Azure cloud technology. The Azure cloud technology also allows the software to be released via Docker containers.
Multiple Primary and Histology Coding Rules - SEER
Download the coding manual and training resources for cases diagnosed from 2007 to 2017. Sites included are lung, breast, colon, melanoma of the skin, head and neck, kidney, renal pelvis/ureter/bladder, benign brain, and malignant brain.
Hematopoietic Project - SEER Registrars
Use this manual and corresponding database for coding cases diagnosed January 1, 2010 and forward. The changes do not require recoding of old cases. Contains data collection rules for hematopoietic and lymphoid neoplasms (2010+). Access a database and coding manual.
MICCA: a complete and accurate software for taxonomic profiling of metagenomic data.
Albanese, Davide; Fontana, Paolo; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio
2015-05-19
The introduction of high-throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results while achieving a good compromise between modularity and usability. Moreover, we introduce a de novo clustering algorithm specifically designed for the inference of OTUs. Tests on real and synthetic datasets show that, thanks to the optimized read-filtering process and the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human-associated microbial communities. MICCA is an open source project.
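A toy sketch of de novo greedy clustering, the general idea behind OTU inference: each read joins the first cluster whose seed is within a fixed distance, otherwise it seeds a new cluster. This is illustrative only and is not MICCA's algorithm.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def greedy_otus(reads, max_dist=1):
    seeds, clusters = [], []
    for read in reads:
        for i, seed in enumerate(seeds):
            if len(read) == len(seed) and hamming(read, seed) <= max_dist:
                clusters[i].append(read)
                break
        else:
            seeds.append(read)          # read becomes the seed of a new OTU
            clusters.append([read])
    return clusters

reads = ["ACGTACGT", "ACGTACGA", "TTGTACGT", "TTGTACGT", "ACGTACGT"]
for otu in greedy_otus(reads):
    print(len(otu), otu[0])
```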
bioWeb3D: an online webGL 3D data visualisation tool
2013-01-01
Background Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for researchers without specific training; this is especially true for three-dimensional (3D) data representation. Whilst existing software can provide all necessary functionalities to represent and manipulate biological 3D datasets, very few are easily accessible (browser based), cross-platform and usable by non-expert users. Results An online HTML5/WebGL based 3D visualisation tool has been developed to allow biologists to quickly and easily view interactive and customizable three-dimensional representations of their data along with multiple layers of information. Using the WebGL library Three.js, written in JavaScript, bioWeb3D allows the simultaneous visualisation of multiple large datasets input via simple JSON, XML or CSV files, which can be read and analysed locally thanks to HTML5 capabilities. Conclusions Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets. PMID:23758781
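A minimal sketch of preparing a 3D point dataset as JSON for a browser-based viewer; the schema below is a hypothetical stand-in, not bioWeb3D's actual input format.

```python
import json
import random

# Toy 3D point cloud with a per-point cluster label (hypothetical schema).
points = [{"x": random.gauss(0, 1),
           "y": random.gauss(0, 1),
           "z": random.gauss(0, 1),
           "cluster": random.randint(0, 2)} for _ in range(1000)]

dataset = {"name": "toy-expression-space", "points": points}

with open("dataset.json", "w") as fh:
    json.dump(dataset, fh)
print("wrote", len(points), "points")
```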
Processing and population genetic analysis of multigenic datasets with ProSeq3 software.
Filatov, Dmitry A
2009-12-01
The current tendency in molecular population genetics is to use increasing numbers of genes in the analysis. Here I describe a program for handling and population genetic analysis of DNA polymorphism data collected from multiple genes. The program includes a sequence/alignment editor and an internal relational database that simplify the preparation and manipulation of multigenic DNA polymorphism datasets. The most commonly used DNA polymorphism analyses are implemented in ProSeq3, facilitating population genetic analysis of large multigenic datasets. Extensive input/output options make ProSeq3 a convenient hub for sequence data processing and analysis. The program is available free of charge from http://dps.plants.ox.ac.uk/sequencing/proseq.htm.
Sustainable Data Evolution Technology for Power Grid Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
The SDET Tool is used to create open-access power grid datasets and to facilitate updates of these datasets by the community. Pacific Northwest National Laboratory (PNNL) and its power industry and software vendor partners are developing this innovative sustainable data evolution technology (SDET) for that purpose. The objective is to make this a sustained effort within and beyond the ARPA-E GRID DATA program so that the datasets can evolve over time and meet current and future needs for power grid optimization, and potentially other applications in power grid operation and planning.
GODIVA2: interactive visualization of environmental data on the Web.
Blower, J D; Haines, K; Santokhee, A; Liu, C L
2009-03-13
GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.
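A minimal sketch of how a client might retrieve a rendered map from an OGC Web Map Service of the kind GODIVA2 builds on; the endpoint and layer name are hypothetical.

```python
import requests

params = {
    "service": "WMS", "version": "1.3.0", "request": "GetMap",
    "layers": "OCEAN/sea_surface_temperature",   # hypothetical layer name
    "crs": "CRS:84", "bbox": "-180,-90,180,90",
    "width": 1024, "height": 512, "format": "image/png",
}
resp = requests.get("https://example.org/wms", params=params, timeout=30)
resp.raise_for_status()
with open("sst.png", "wb") as fh:
    fh.write(resp.content)                        # rendered global SST map
```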
TeraStitcher - A tool for fast automatic 3D-stitching of teravoxel-sized microscopy images
2012-01-01
Background Further advances in modern microscopy are leading to teravoxel-sized tiled 3D images at high resolution, increasing the dimension of the stitching problem by at least two orders of magnitude. Existing software solutions do not seem adequate to address the additional requirements arising from these datasets, such as the minimization of memory usage and the need to process just a small portion of the data. Results We propose a free and fully automated 3D stitching tool designed to match the special requirements arising from teravoxel-sized tiled microscopy images, able to stitch them in reasonable time even on workstations with limited resources. The tool was tested on teravoxel-sized whole mouse brain images with micrometer resolution, and it was also compared with state-of-the-art stitching tools on megavoxel-sized publicly available datasets. This comparison confirmed that the solutions we adopted are suited for stitching very large images and also perform well on datasets with different characteristics. Indeed, some of the algorithms embedded in other stitching tools could easily be integrated in our framework if they turned out to be more effective on other classes of images. To this end, we designed a software architecture that separates the strategies for using memory resources efficiently from the algorithms that may depend on the characteristics of the acquired images. Conclusions TeraStitcher is a free tool that enables the stitching of teravoxel-sized tiled microscopy images even on workstations with relatively limited memory (<8 GB) and processing power. It exploits the knowledge of approximate tile positions and uses ad-hoc strategies and algorithms designed for such very large datasets. The produced images can be saved in a multiresolution representation for efficient retrieval and processing. We provide TeraStitcher both as a standalone application and as a plugin of the free software Vaa3D. PMID:23181553
Clearing your Desk! Software and Data Services for Collaborative Web Based GIS Analysis
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Band, L. E.; Merwade, V.; Couch, A.; Hooper, R. P.; Maidment, D. R.; Dash, P. K.; Stealey, M.; Yi, H.; Gan, T.; Gichamo, T.; Yildirim, A. A.; Liu, Y.
2015-12-01
Can your desktop computer crunch the large GIS datasets that are becoming increasingly common across the geosciences? Do you have access to or the know-how to take advantage of advanced high performance computing (HPC) capability? Web based cyberinfrastructure takes work off your desk or laptop computer and onto infrastructure or "cloud" based data and processing servers. This talk will describe the HydroShare collaborative environment and web based services being developed to support the sharing and processing of hydrologic data and models. HydroShare supports the upload, storage, and sharing of a broad class of hydrologic data including time series, geographic features and raster datasets, multidimensional space-time data, and other structured collections of data. Web service tools and a Python client library provide researchers with access to HPC resources without requiring them to become HPC experts. This reduces the time and effort spent in finding and organizing the data required to prepare the inputs for hydrologic models and facilitates the management of online data and execution of models on HPC systems. This presentation will illustrate the use of web based data and computation services from both the browser and desktop client software. These web-based services implement the Terrain Analysis Using Digital Elevation Model (TauDEM) tools for watershed delineation, generation of hydrology-based terrain information, and preparation of hydrologic model inputs. They allow users to develop scripts on their desktop computer that call analytical functions that are executed completely in the cloud, on HPC resources using input datasets stored in the cloud, without installing specialized software, learning how to use HPC, or transferring large datasets back to the user's desktop. These cases serve as examples for how this approach can be extended to other models to enhance the use of web and data services in the geosciences.
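A hedged sketch of the workflow described above: a desktop script submits a terrain-analysis job to a cloud-hosted service instead of computing locally. The endpoint, payload fields and resource id are hypothetical stand-ins, not the actual HydroShare or TauDEM web service API.

```python
import requests

job = {
    "tool": "watershed_delineation",
    "dem_resource_id": "abc123",          # hypothetical dataset id in the cloud store
    "outlet": {"lat": 41.74, "lon": -111.83},
}
resp = requests.post("https://example.org/api/taudem/submit", json=job, timeout=60)
resp.raise_for_status()
print("job status:", resp.json().get("status"))   # the heavy lifting runs on HPC, not locally
```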
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kronewitter, Scott R.; Slysz, Gordon W.; Marginean, Ioan
2014-05-31
Dense LC-MS datasets have convoluted extracted ion chromatograms with multiple chromatographic peaks, which cloud the differentiation between intact compounds with their overlapping isotopic distributions, peaks due to in-source ion fragmentation, and noise. Making this differentiation is critical in glycomics datasets because chromatographic peaks correspond to different intact glycan structural isomers. GlyQ-IQ is targeted, chromatography-centric software designed for chromatogram and mass spectra data processing and subsequent glycan composition annotation. The targeted analysis approach offers several key advantages over traditional LC-MS data processing and annotation algorithms. A priori information about an individual target's elemental composition allows exact isotope profile modeling for improved feature detection, and increases sensitivity by focusing chromatogram generation and peak fitting on the isotopic species in the distribution having the highest intensity and data quality. Glycan target annotation is corroborated by glycan family relationships and in-source fragmentation detection. The GlyQ-IQ software was developed in this work (Part 1) and used to profile N-glycan compositions from human serum LC-MS datasets. The companion manuscript, GlyQ-IQ Part 2, discusses developments in human serum N-glycan sample preparation, glycan isomer separation, and glycan electrospray ionization. A case study is presented to demonstrate how GlyQ-IQ identifies and removes confounding chromatographic peaks from high mannose glycan isomers in human blood serum. In addition, GlyQ-IQ was used to generate a broad N-glycan profile from a high resolution (100K/60K) nESI-LC-MS/MS dataset, including CID and HCD fragmentation, acquired on a Velos Pro mass spectrometer. 101 glycan compositions and 353 isomer peaks were detected from a single sample. 99% of the GlyQ-IQ glycan-feature assignments passed manual validation and are backed by high resolution mass spectra with mass accuracies below 7 ppm.
A Secure Architecture to Provide a Medical Emergency Dataset for Patients in Germany and Abroad.
Storck, Michael; Wohlmann, Jan; Krudwig, Sarah; Vogel, Alexander; Born, Judith; Weber, Thomas; Dugas, Martin; Juhra, Christian
2017-01-01
The ongoing fragmentation of medical care and the mobility of patients severely restrict the exchange of lifesaving information about a patient's medical history in emergencies. The objective of this work is therefore to offer a secure technical solution that supplies medical professionals with emergency-relevant information about the current patient via mobile access. To achieve this goal, the official national emergency dataset was extended with additional features to form a patient summary for emergencies, a software architecture was developed, and data security and data protection issues were taken into account. The patient has sovereignty over his/her data and can therefore decide who may access or change the stored data, while the treating physician composes the validated dataset. Building on the concept introduced here, future activities include the development of user interfaces for the software components of the different user groups as well as functioning prototypes for upcoming field tests.
P-MartCancer-Interactive Online Software to Enable Analysis of Shotgun Cancer Proteomic Datasets.
Webb-Robertson, Bobbie-Jo M; Bramer, Lisa M; Jensen, Jeffrey L; Kobold, Markus A; Stratton, Kelly G; White, Amanda M; Rodland, Karin D
2017-11-01
P-MartCancer is an interactive web-based software environment that enables statistical analyses of peptide or protein data, quantitated from mass spectrometry-based global proteomics experiments, without requiring in-depth knowledge of statistical programming. P-MartCancer offers a series of statistical modules associated with quality assessment, peptide and protein statistics, protein quantification, and exploratory data analyses, driven by the user via customized workflows and interactive visualization. Currently, P-MartCancer offers access to, and the capability to analyze, multiple cancer proteomic datasets generated through the Clinical Proteomics Tumor Analysis Consortium at the peptide, gene, and protein levels. P-MartCancer is deployed as a web service (https://pmart.labworks.org/cptac.html) and is alternatively available via Docker Hub (https://hub.docker.com/r/pnnl/pmart-web/). Cancer Res; 77(21); e47-50. ©2017 AACR.
Kang, Dongwan D.; Froula, Jeff; Egan, Rob; ...
2015-01-01
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. Lastly, it automatically forms hundreds of high quality genome bins from a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
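A minimal sketch of one of the two signals MetaBAT combines, the tetranucleotide frequency (TNF) vector of a contig; the probabilistic distance model itself is the paper's contribution and is not reproduced here.

```python
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(KMERS)}

def tnf(contig: str):
    """Normalized 256-dimensional tetranucleotide frequency vector.

    A production version would also collapse reverse-complement k-mers.
    """
    counts = [0] * 256
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if kmer in INDEX:               # skip windows containing N or other symbols
            counts[INDEX[kmer]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

vec = tnf("ACGTACGTTTGACGTACGTAGCTAGCTAACGT")
print(len(vec), round(sum(vec), 6))     # 256 frequencies summing to 1
```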
Statistical evaluation of manual segmentation of a diffuse low-grade glioma MRI dataset.
Ben Abdallah, Meriem; Blonski, Marie; Wantz-Mezieres, Sophie; Gaudeau, Yann; Taillandier, Luc; Moureaux, Jean-Marie
2016-08-01
Software-based manual segmentation is critical to the supervision of diffuse low-grade glioma patients and to the choice of optimal treatment. However, because manual segmentation is time-consuming, it is difficult to include it in clinical routine. An alternative that circumvents the time cost of manual segmentation could be to share the task among different practitioners, provided it can be reproduced. The goal of our work is to assess the reproducibility of manual segmentation of diffuse low-grade gliomas on MRI scans, with regard to practitioners, their experience and their field of expertise. A panel of 13 experts manually segmented 12 diffuse low-grade glioma clinical MRI datasets using the OsiriX software. A statistical analysis gave promising results: the practitioner factor, medical specialty and years of experience appear to have no significant impact on the average values of the tumor volume variable.
Alford, Sharon Hensley; Schwartz, Kendra; Soliman, Amr; Johnson, Christine Cole; Gruber, Stephen B.; Merajver, Sofia D.
2009-01-01
Background Data from Arab world studies suggest that Arab women may experience a more aggressive breast cancer phenotype. To investigate this finding, we focused on one of the largest settlements of Arabs and Iraqi Christians (Chaldeans) in the US, metropolitan Detroit- a SEER reporting site since 1973. Materials and Methods We identified a cohort of primary breast cancer cases diagnosed 1973–2003. Using a validated name algorithm, women were identified as being of Arab/Chaldean descent if they had an Arab last or maiden name. We compared characteristics at diagnosis (age, grade, histology, SEER stage, and marker status) and overall survival between Arab-, European-, and African-Americans. Results The cohort included 1,652 (2%) women of Arab descent, 13,855 (18%) African-American women, and 63,615 (80%) European-American. There were statistically significant differences between the racial groups for all characteristics at diagnosis. Survival analyses overall and for each SEER stage showed that Arab-American women had the best survival, followed by European-American women. African-American women had the poorest overall survival and were 1.37 (95% confidence interval: 1.23–1.52) times more likely to be diagnosed with an aggressive tumor (adjusting for age, grade, marker status, and year of diagnosis). Conclusion Overall, Arab-American women have a distribution of breast cancer histology similar to European-American women. In contrast, the stage, age, and hormone receptor status at diagnosis among Arab-Americans was more similar to African-American women. However, Arab-American women have a better overall survival than even European-American women. PMID:18415013
DOIDB: Reusing DataCite's search software as metadata portal for GFZ Data Services
NASA Astrophysics Data System (ADS)
Elger, K.; Ulbricht, D.; Bertelmann, R.
2016-12-01
GFZ Data Services is the central service point for the publication of research data at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences (GFZ). It provides data publishing services to scientists of GFZ, associated projects, and associated institutions. The publishing services aim to make research data and physical samples visible and citable by assigning persistent identifiers (DOI, IGSN) and by complementing existing IT infrastructure. To integrate several research domains, a modular software stack made of free software components has been created to manage data and metadata as well as to register persistent identifiers [1]. The pivotal component for the registration of DOIs is the DOIDB. It has been derived from three software components provided by DataCite [2] that moderate the registration of DOIs and the deposition of metadata, allow the dissemination of metadata, and provide a user interface to navigate and discover datasets. The DOIDB acts as a proxy to the DataCite infrastructure and, in addition to the DataCite metadata schema, allows metadata to be deposited and disseminated following the ISO 19139 and NASA GCMD DIF schemas. The search component has been modified to meet the requirements of a geosciences metadata portal. In particular, it has been altered to make use of Apache Solr's capability to index and query spatial coordinates. Furthermore, the user interface has been adjusted to provide a first impression of the data by showing a map, summary information and subjects. DOIDB and its components are available on GitHub [3]. We present a software solution for the registration of DOIs that allows the integration of existing data systems, keeps track of registered DOIs, and provides a metadata portal to discover datasets [4]. [1] Ulbricht, D.; Elger, K.; Bertelmann, R.; Klump, J. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services. ISPRS Int. J. Geo-Inf. 2016, 5, 25. http://doi.org/10.3390/ijgi5030025 [2] https://github.com/datacite [3] https://github.com/ulbricht/search/tree/doidb, https://github.com/ulbricht/mds/tree/doidb, https://github.com/ulbricht/oaip/tree/doidb [4] http://doidb.wdc-terra.org
Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models
Stephens, Zachary D.; Hudson, Matthew E.; Mainzer, Liudmila S.; Taschuk, Morgan; Weber, Matthew R.; Iyer, Ravishankar K.
2016-01-01
An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads. PMID:27893777
Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.
Ernst, Jason; Kellis, Manolis
2015-04-01
With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
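A generic illustration of the imputation idea, predicting a held-out signal track from correlated tracks with an ensemble of regression trees; this uses scikit-learn on synthetic data and is not the ChromImpute implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n_bins = 5000                                       # genomic bins
observed = rng.gamma(2.0, 1.0, size=(n_bins, 6))    # six observed marks/samples
# Synthetic "mark to impute", correlated with three of the observed tracks.
target = observed[:, :3].mean(axis=1) + rng.normal(0, 0.1, n_bins)

train = rng.random(n_bins) < 0.5                    # train on half the genome
model = GradientBoostingRegressor().fit(observed[train], target[train])
imputed = model.predict(observed[~train])           # impute the held-out bins

corr = np.corrcoef(imputed, target[~train])[0, 1]
print(f"correlation of imputed vs observed signal: {corr:.3f}")
```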
Delora, Adam; Gonzales, Aaron; Medina, Christopher S; Mitchell, Adam; Mohed, Abdul Faheem; Jacobs, Russell E; Bearer, Elaine L
2016-01-15
Magnetic resonance imaging (MRI) is a well-developed technique in neuroscience. Limitations in applying MRI to rodent models of neuropsychiatric disorders include the large number of animals required to achieve statistical significance, and the paucity of automation tools for the critical early processing step of brain extraction, which prepares brain images for alignment and voxel-wise statistics. This novel, timesaving automation of template-based brain extraction ("skull-stripping") is capable of quickly and reliably extracting the brain from large numbers of whole head images in a single step. The method is simple to install, requires minimal user interaction, and is equally applicable to different types of MR images. Results were evaluated with Dice and Jaccard similarity indices and compared in 3D surface projections with other stripping approaches. Statistical comparisons demonstrate that individual variation of brain volumes is preserved. A downloadable software package for extraction of brains from whole head images, not otherwise available, is included here. This software tool increases speed, can be used with an atlas or a template from within the dataset, and produces masks that need little further refinement. Our new automation can be applied to any MR dataset, since the starting point is a template mask generated specifically for that dataset. The method reliably and rapidly extracts brain images from whole head images, rendering them usable for subsequent analytical processing. This software tool will accelerate the exploitation of mouse models for the investigation of human brain disorders by MRI. Copyright © 2015 Elsevier B.V. All rights reserved.
MassImager: A software for interactive and in-depth analysis of mass spectrometry imaging data.
He, Jiuming; Huang, Luojiao; Tian, Runtao; Li, Tiegang; Sun, Chenglong; Song, Xiaowei; Lv, Yiwei; Luo, Zhigang; Li, Xin; Abliz, Zeper
2018-07-26
Mass spectrometry imaging (MSI) has become a powerful tool to probe molecular events in biological tissue. However, it is widely held that one of the biggest challenges is the lack of easy-to-use data-processing software for discovering the underlying biological information in complicated and huge MSI datasets. Here, MassImager, a user-friendly and full-featured MSI software suite comprising three subsystems, Solution, Visualization and Intelligence, is developed, focusing on interactive visualization, in-situ biomarker discovery and artificial-intelligence-assisted pathological diagnosis. Simplified data preprocessing and high-throughput MSI data exchange and serialization jointly guarantee quick reconstruction of ion images and rapid analysis of datasets of dozens of gigabytes. It also offers diverse user-defined operations for visual processing, including multiple-ion visualization, multiple-channel superposition, image normalization, visual resolution enhancement and image filtering. Region-of-interest analysis can be performed precisely through interactive visualization between the ion images and mass spectra, aided by an overlaid optical image guide, to directly identify region-specific biomarkers. Moreover, automatic pattern recognition can be achieved through supervised or unsupervised multivariate statistical modeling. Clear discrimination between cancerous and adjacent tissue within an MSI dataset can be seen in the generated pattern image, which shows great potential for visual in-situ biomarker discovery and artificial-intelligence-assisted pathological diagnosis of cancer. All these features are integrated in MassImager to provide a deep MSI processing solution at the in-situ metabolomics level for biomarker discovery and future clinical pathological diagnosis. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Ipsative imputation for a 15-item Geriatric Depression Scale in community-dwelling elderly people.
Imai, Hissei; Furukawa, Toshiaki A; Kasahara, Yoriko; Ishimoto, Yasuko; Kimura, Yumi; Fukutomi, Eriko; Chen, Wen-Ling; Tanaka, Mire; Sakamoto, Ryota; Wada, Taizo; Fujisawa, Michiko; Okumiya, Kiyohito; Matsubayashi, Kozo
2014-09-01
Missing data are inevitable in almost all medical studies. Imputation methods using probabilistic models are common, but they cannot impute individual data and require special software. In contrast, the ipsative imputation method, which substitutes missing items with the mean of the remaining items within the individual, is easy, does not need any special software, and can provide individual scores. The aim of the present study was to evaluate the validity of the ipsative imputation method using data from the 15-item Geriatric Depression Scale. Participants were community-dwelling elderly individuals (n = 1178). A structural equation model was constructed, and model fit indexes were calculated to assess the validity of the imputation method when used for individuals missing 20% of items or less and for those missing 40% of items or less, depending on whether we assumed that their correlation coefficients were the same as those of the dataset with no missing items. Finally, we compared the path coefficients of the dataset imputed by ipsative imputation with those obtained by multiple imputation. All of the model fit indexes were better under the assumption that the dataset missing 20% of items or less was the same as the dataset without missing data than under the assumption that the two differed. Under the same assumption, however, the model fit indexes were worse for the dataset missing 40% of items or less. The path coefficients of the dataset imputed by ipsative imputation and by multiple imputation were compatible with each other when the proportion of missing items was 20% or less. Ipsative imputation therefore appears to be a valid imputation method and can be used to impute data in studies using the 15-item Geriatric Depression Scale if the percentage of missing items is 20% or less. © 2014 The Authors. Psychogeriatrics © 2014 Japanese Psychogeriatric Society.
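A minimal sketch of ipsative imputation as evaluated above: each missing item is replaced by the mean of the individual's remaining items, and respondents missing more than 20% of the 15 items (more than 3) are left for other handling. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
items = [f"gds{i}" for i in range(1, 16)]            # hypothetical item columns
df = pd.DataFrame(rng.integers(0, 2, size=(6, 15)).astype(float), columns=items)
df.iloc[0, [2, 5]] = np.nan                          # 2 missing items (13%): imputable
df.iloc[1, :7] = np.nan                              # 7 missing items (47%): not imputable

def ipsative_impute(row, max_missing=3):
    if row.isna().sum() > max_missing:
        return row                                   # too much missing; leave as-is
    return row.fillna(row.mean())                    # mean of the remaining items

imputed = df.apply(ipsative_impute, axis=1)
print(imputed.round(2))
```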
The National Center for Atmospheric Research (NCAR) Research Data Archive: a Data Education Center
NASA Astrophysics Data System (ADS)
Peng, G. S.; Schuster, D.
2015-12-01
The National Center for Atmospheric Research (NCAR) Research Data Archive (RDA), rda.ucar.edu, is not just another data center or data archive. It is a data education center. We not only serve data, we TEACH data. Weather and climate data is the original "Big Data" dataset and lessons learned while playing with weather data are applicable to a wide range of data investigations. Erroneous data assumptions are the Achilles heel of Big Data. It doesn't matter how much data you crunch if the data is not what you think it is. Each dataset archived at the RDA is assigned to a data specialist (DS) who curates the data. If a user has a question not answered in the dataset information web pages, they can call or email a skilled DS for further clarification. The RDA's diverse staff—with academic training in meteorology, oceanography, engineering (electrical, civil, ocean and database), mathematics, physics, chemistry and information science—means we likely have someone who "speaks your language." Data discovery is another difficult Big Data problem; one can only solve problems with data if one can find the right data. Metadata, both machine and human-generated, underpin the RDA data search tools. Users can quickly find datasets by name or dataset ID number. They can also perform a faceted search that successively narrows the options by user requirements or simply kick off an indexed search with a few words. Weather data formats can be difficult to read for non-expert users; it's usually packed in binary formats requiring specialized software and parameter names use specialized vocabularies. DSs create detailed information pages for each dataset and maintain lists of helpful software, documentation and links of information around the web. We further grow the level of sophistication of the users with tips, tutorials and data stories on the RDA Blog, http://ncarrda.blogspot.com/. How-to video tutorials are also posted on the NCAR Computational and Information Systems Laboratory (CISL) YouTube channel.
Map_plot and bgg_plot: software for integration of geoscience datasets
NASA Astrophysics Data System (ADS)
Gaillot, Philippe; Punongbayan, Jane T.; Rea, Brice
2004-02-01
Since 1985, the Ocean Drilling Program (ODP) has been supporting multidisciplinary research in exploring the structure and history of Earth beneath the oceans. After more than 200 Legs, complementary datasets covering different geological environments, periods and space scales have been obtained and distributed world-wide using the ODP-Janus and Lamont Doherty Earth Observatory-Borehole Research Group (LDEO-BRG) database servers. In Earth Sciences, more than in any other science, the ensemble of these data is characterized by heterogeneous formats and graphical representation modes. In order to fully and quickly assess this information, a set of Unix/Linux and Generic Mapping Tools (GMT)-based C programs has been designed to convert and integrate datasets acquired during the present ODP and the future Integrated ODP (IODP) Legs. Using ODP Leg 199 datasets, we show examples of the capabilities of the proposed programs. The program map_plot is used to easily display datasets on 2-D maps. The program bgg_plot (borehole geology and geophysics plot) displays data with respect to depth and/or time. The latter program includes depth shifting, filtering and plotting of core summary information, continuous and discrete-sample core measurements (e.g. physical properties, geochemistry, etc.), in situ continuous logs, magneto- and bio-stratigraphies, specific sedimentological analyses (lithology, grain size, texture, porosity, etc.), as well as core and borehole wall images. Outputs from both programs are initially produced in PostScript format, which can be easily converted to Portable Document Format (PDF) or standard image formats (GIF, JPEG, etc.) using widely distributed conversion programs. Based on command line operations and customization of parameter files, these programs can be included in other shell or database scripts, automating the plotting of data requests. As open source software, these programs can be customized and interfaced to fulfill any specific plotting need of geoscientists using ODP-like datasets.
Falkner, Jayson; Andrews, Philip
2005-05-15
Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
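A toy illustration of the underlying idea, searching many candidate peptides against a protein sequence with a precomputed finite-state structure (here a simple trie) so that search cost does not grow with the number of peptides; this sketches the concept only, not the paper's PFSM data structure.

```python
def build_trie(peptides):
    trie = {}
    for pep in peptides:
        node = trie
        for aa in pep:
            node = node.setdefault(aa, {})
        node["$"] = pep                     # terminal marker stores the peptide
    return trie

def search(protein, trie):
    """Report (position, peptide) for every peptide occurring in the protein."""
    hits = []
    for start in range(len(protein)):
        node = trie
        for aa in protein[start:]:
            if "$" in node:
                hits.append((start, node["$"]))
            node = node.get(aa)
            if node is None:
                break
        else:
            if "$" in node:
                hits.append((start, node["$"]))
    return hits

trie = build_trie(["PEPTIDE", "TIDEK", "SEQ"])
print(search("MKPEPTIDEKSEQ", trie))        # [(2, 'PEPTIDE'), (5, 'TIDEK'), (10, 'SEQ')]
```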
A tool for the estimation of the distribution of landslide area in R
NASA Astrophysics Data System (ADS)
Rossi, M.; Cardinali, M.; Fiorucci, F.; Marchesini, I.; Mondini, A. C.; Santangelo, M.; Ghosh, S.; Riguer, D. E. L.; Lahousse, T.; Chang, K. T.; Guzzetti, F.
2012-04-01
We have developed a tool in R (the free software environment for statistical computing, http://www.r-project.org/) to estimate the probability density and the frequency density of landslide area. The tool implements parametric and non-parametric approaches, including: (i) Histogram Density Estimation (HDE), (ii) Kernel Density Estimation (KDE), and (iii) Maximum Likelihood Estimation (MLE). The tool is available as a standard Open Geospatial Consortium (OGC) Web Processing Service (WPS) and is accessible through the web using different GIS software clients. We tested the tool to compare Double Pareto and Inverse Gamma models for the probability density of landslide area in different geological, morphological and climatological settings, and to compare landslides shown in inventory maps prepared using different mapping techniques, including (i) field mapping, (ii) visual interpretation of monoscopic and stereoscopic aerial photographs, (iii) visual interpretation of monoscopic and stereoscopic VHR satellite images and (iv) semi-automatic detection and mapping from VHR satellite images. Results show that both models are applicable in different geomorphological settings, and in most cases the two models provided very similar results. Non-parametric estimation methods (i.e., HDE and KDE) provided reasonable results for all the tested landslide datasets. For some of the datasets, MLE failed to provide a result owing to convergence problems. The two tested models (Double Pareto and Inverse Gamma) gave very similar results for large and very large datasets (> 150 samples). Differences in the modeling results were observed for small datasets affected by systematic biases. A distinct rollover was observed in all analyzed landslide datasets, except for a few datasets obtained from landslide inventories prepared through field mapping or by semi-automatic mapping from VHR satellite imagery. The tool can also be used to evaluate the probability density and the frequency density of landslide volume.
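A hedged sketch of the two estimation styles compared above, using Python's scipy in place of the authors' R tool: a parametric maximum-likelihood fit (inverse gamma; the double Pareto model is not available in scipy) and a non-parametric kernel density estimate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic landslide areas in m^2, drawn from an inverse gamma for illustration.
areas = stats.invgamma.rvs(a=1.4, scale=1e3, size=500, random_state=rng)

# Parametric: maximum-likelihood fit of the inverse gamma model (location fixed at 0).
a, loc, scale = stats.invgamma.fit(areas, floc=0)
print(f"MLE inverse gamma: shape={a:.2f}, scale={scale:.1f}")

# Non-parametric: kernel density estimate on log-area, since areas span decades.
kde = stats.gaussian_kde(np.log10(areas))
grid = np.linspace(np.log10(areas.min()), np.log10(areas.max()), 50)
print("KDE mode near 10^%.2f m^2" % grid[np.argmax(kde(grid))])
```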
Jayashree, B; Rajgopal, S; Hoisington, D; Prasanth, V P; Chandra, S
2008-09-24
Structure is a widely used software tool for investigating population genetic structure with multi-locus genotyping data. The software uses an iterative algorithm to group individuals into "K" clusters, representing possibly K genetically distinct subpopulations. The serial implementation of this program is processor-intensive even with small datasets. We describe an implementation of the program within a parallel framework. Speedup was achieved by running different replicates and values of K on each node of the cluster. A web-based, user-oriented GUI has been implemented in PHP, through which the user can specify input parameters for the program. The number of processors to be used can be specified in the background command. A web-based visualization tool, "Visualstruct", written in PHP (with HTML and JavaScript embedded), allows graphical display of the population clusters output by Structure, where each individual may be visualized as a line segment with K colors defining its possible genomic composition with respect to the K genetic subpopulations. The advantage over available programs is the increased number of individuals that can be visualized. Analyses of real datasets indicate a speedup of up to four when comparing execution on clusters of eight processors with execution on one desktop. The software package is freely available to interested users upon request.
Epileptic Seizure Forewarning by Nonlinear Techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hively, L.M.
2002-04-19
This report describes work that was performed under a Cooperative Research and Development Agreement (CRADA) between UT-Battelle, LLC (Contractor) and a commercial participant, VIASYS Healthcare Inc. (formerly Nicolet Biomedical, Inc.). The Contractor has patented technology that forewarns of impending epileptic events via scalp electroencephalograph (EEG) data and successfully demonstrated this technology on 20 datasets from the Participant under pre-CRADA effort. This CRADA sought to bridge the gap between the Contractor's existing research-class software and a prototype medical device for subsequent commercialization by the Participant. The objectives of this CRADA were (1) development of a combination of existing computer hardware and Contractor-patented software into a clinical process for warning of impending epileptic events in human patients, and (2) validation of the epilepsy warning methodology. This work modified the ORNL research-class FORTRAN for forewarning to run under a graphical user interface (GUI). The GUI-FORTRAN software subsequently was installed on desktop computers at five epilepsy monitoring units. The forewarning prototypes have run for more than one year without any hardware or software failures. This work also reported extensive analysis of model and EEG datasets to demonstrate the usefulness of the methodology. However, the Participant recently chose to stop work on the CRADA, due to a change in business priorities. Much work remains to convert the technology into a commercial clinical or ambulatory device for patient use, as discussed in App. H.
Husen, Peter; Tarasov, Kirill; Katafiasz, Maciej; Sokol, Elena; Vogt, Johannes; Baumgart, Jan; Nitsch, Robert; Ekroos, Kim; Ejsing, Christer S
2013-01-01
Global lipidomics analysis across large sample sizes produces high-content datasets that require dedicated software tools supporting lipid identification and quantification, efficient data management and lipidome visualization. Here we present a novel software-based platform for streamlined data processing, management and visualization of shotgun lipidomics data acquired using high-resolution Orbitrap mass spectrometry. The platform features the ALEX framework, designed for automated identification and export of lipid species intensities directly from proprietary mass spectral data files, and an auxiliary workflow using database exploration tools for the integration of sample information, computation of lipid abundance and lipidome visualization. A key feature of the platform is the organization of lipidomics data in "database table format", which provides the user with unsurpassed flexibility for rapid lipidome navigation using selected features within the dataset. To demonstrate the efficacy of the platform, we present a comparative neurolipidomics study of cerebellum, hippocampus and somatosensory barrel cortex (S1BF) from wild-type and knockout mice devoid of the putative lipid phosphate phosphatase PRG-1 (plasticity related gene-1). The presented framework is generic, extendable to the processing and integration of other lipidomic data structures, can be interfaced with post-processing protocols supporting statistical testing and multivariate analysis, and can serve as an avenue for disseminating lipidomics data within the scientific community. The ALEX software is available at www.msLipidomics.info.
Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph.
Benoit, Gaëtan; Lemaitre, Claire; Lavenier, Dominique; Drezen, Erwan; Dayris, Thibault; Uricaru, Raluca; Rizk, Guillaume
2015-09-14
Data volumes generated by next-generation sequencing (NGS) technologies are now a major concern for both data storage and transmission. This has triggered the need for more efficient methods than general-purpose compression tools, such as the widely used gzip. We present a novel reference-free method for compressing data issued from high-throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph by memorizing an anchoring k-mer and a list of bifurcations. The same probabilistic de Bruijn graph is used to perform a lossy transformation of the quality scores, which allows higher compression rates to be obtained without losing pertinent information for downstream analyses. LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq and metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole genome sequencing dataset, LEON divided the original file size by more than 20. LEON is open source software, distributed under the GNU Affero GPL license, available for download at http://gatb.inria.fr/software/leon/.
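A minimal sketch of the core data structure: the k-mers of a read set stored in a Bloom filter so that the de Bruijn graph can be queried probabilistically. The value of k, the filter size and the hash count are illustrative, not LEON's parameters.

```python
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

K = 5
reads = ["ACGTACGTAC", "CGTACGTACG"]
bf = BloomFilter()
for read in reads:
    for i in range(len(read) - K + 1):
        bf.add(read[i:i + K])

# Query the graph neighbors of a k-mer: which single-base extensions exist?
# (Possibly with false positives, hence "probabilistic".)
kmer = "ACGTA"
print([kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf])
```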
Kuharev, Jörg; Navarro, Pedro; Distler, Ute; Jahn, Olaf; Tenzer, Stefan
2015-09-01
Label-free quantification (LFQ) based on data-independent acquisition workflows is currently gaining popularity, and several software tools have recently been published or are commercially available. The present study focuses on the evaluation of three different software packages (Progenesis, synapter, and ISOQuant) supporting ion mobility enhanced data-independent acquisition data. In order to benchmark the LFQ performance of the different tools, we generated two hybrid proteome samples of defined quantitative composition containing tryptically digested proteomes of three different species (mouse, yeast, Escherichia coli). This model dataset simulates complex biological samples containing large numbers of both unregulated (background) proteins and up- and downregulated proteins with exactly known ratios between samples. We determined the number and dynamic range of quantifiable proteins and analyzed the influence of the applied algorithms (retention time alignment, clustering, normalization, etc.) on the quantification results. Analysis of technical reproducibility revealed median coefficients of variation of reported protein abundances below 5% for MS(E) data for Progenesis and ISOQuant. Regarding accuracy of LFQ, evaluation with synapter and ISOQuant yielded superior results compared to Progenesis. In addition, we discuss the reporting formats and user friendliness of the software packages. The data generated in this study have been deposited to the ProteomeXchange Consortium with identifier PXD001240 (http://proteomecentral.proteomexchange.org/dataset/PXD001240). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
POWERLIB: SAS/IML Software for Computing Power in Multivariate Linear Models
Johnson, Jacqueline L.; Muller, Keith E.; Slaughter, James C.; Gurka, Matthew J.; Gribbin, Matthew J.; Simpson, Sean L.
2014-01-01
The POWERLIB SAS/IML software provides convenient power calculations for a wide range of multivariate linear models with Gaussian errors. The software includes the Box, Geisser-Greenhouse, Huynh-Feldt, and uncorrected tests in the “univariate” approach to repeated measures (UNIREP), the Hotelling Lawley Trace, Pillai-Bartlett Trace, and Wilks Lambda tests in “multivariate” approach (MULTIREP), as well as a limited but useful range of mixed models. The familiar univariate linear model with Gaussian errors is an important special case. For estimated covariance, the software provides confidence limits for the resulting estimated power. All power and confidence limits values can be output to a SAS dataset, which can be used to easily produce plots and tables for manuscripts. PMID:25400516
A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.
Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao
2015-06-15
ChIP-seq is a powerful technology for measuring protein binding or histone modification strength on a whole-genome scale. Although a number of methods are available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets that account for control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs are under-developed. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks in all datasets and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through a hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results than existing ones. An R software package, ChIPComp, is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
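A generic sketch of testing one candidate region for differential binding by modelling Poisson-distributed IP read counts with a condition covariate and a library-size offset; this uses statsmodels on toy numbers and is not the ChIPComp model, which additionally accounts for control data and artifacts.

```python
import numpy as np
import statsmodels.api as sm

counts = np.array([85, 92, 150, 161])             # IP read counts, 2 replicates per condition
condition = np.array([0, 0, 1, 1])                # 0 = control, 1 = treatment
libsize = np.array([1.0e7, 1.1e7, 0.9e7, 1.0e7])  # sequencing depth per sample

X = sm.add_constant(condition)                    # intercept + condition effect
model = sm.GLM(counts, X, family=sm.families.Poisson(), offset=np.log(libsize))
res = model.fit()
print(f"log fold-change: {res.params[1]:.2f}, p-value: {res.pvalues[1]:.3g}")
```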
A New Approach to Create Image Control Networks in ISIS
NASA Astrophysics Data System (ADS)
Becker, K. J.; Berry, K. L.; Mapel, J. A.; Walldren, J. C.
2017-06-01
A new approach was used to create a feature-based control point network that required the development of new tools in the Integrated Software for Imagers and Spectrometers (ISIS3) system to process very large datasets.
NASA Astrophysics Data System (ADS)
Schwartz, Richard A.; Zarro, D.; Csillaghy, A.; Dennis, B.; Tolbert, A. K.; Etesi, L.
2009-05-01
We report on our activities to integrate VSO search and retrieval capabilities into standard data access, display, and analysis tools. In addition to its standard Web-based search form, the VSO provides an Interactive Data Language (IDL) client (vso_search) that is available through the Solar Software (SSW) package. We have incorporated this client into an IDL-widget interface program (show_synop) that allows for more simplified searching and downloading of VSO datasets directly into a user's IDL data analysis environment. In particular, we have provided the capability to read VSO datasets into a general purpose IDL package (plotman) that can display different datatypes (lightcurves, images, and spectra) and perform basic data operations such as zooming, image overlays, solar rotation, etc. Currently, the show_synop tool supports access to ground-based and space-based (SOHO, STEREO, and Hinode) observations, and has the capability to include new datasets as they become available. A user encounters two major hurdles when using the VSO: (1) Instrument-specific software (such as level-0 file readers and data-prepping procedures) may not be available in the user's local SSW distribution. (2) Recent calibration files (such as flat-fields) are not automatically distributed with the analysis software. To address these issues, we have developed a dedicated server (prepserver) that incorporates all the latest instrument-specific software libraries and calibration files. The prepserver uses an IDL-Java bridge to read and implement data processing requests from a client and return a processed data file that can be readily displayed with the show_synop/plotman package. The advantage of the prepserver is that the user is only required to install the general branch (gen) of the SSW tree, and is freed from the more onerous task of installing instrument-specific libraries and calibration files. We will demonstrate how the prepserver can be used to read, process, and overlay SOHO/EIT, TRACE, SECCHI/EUVI, and RHESSI images.
A general framework for parametric survival analysis.
Crowther, Michael J; Lambert, Paul C
2014-12-30
Parametric survival models are being increasingly used as an alternative to the Cox model in biomedical research. Through direct modelling of the baseline hazard function, we can gain greater understanding of the risk profile of patients over time, obtaining absolute measures of risk. Commonly used parametric survival models, such as the Weibull, make restrictive assumptions of the baseline hazard function, such as monotonicity, which is often violated in clinical datasets. In this article, we extend the general framework of parametric survival models proposed by Crowther and Lambert (Journal of Statistical Software 53:12, 2013), to incorporate relative survival, and robust and cluster robust standard errors. We describe the general framework through three applications to clinical datasets, in particular, illustrating the use of restricted cubic splines, modelled on the log hazard scale, to provide a highly flexible survival modelling framework. Through the use of restricted cubic splines, we can derive the cumulative hazard function analytically beyond the boundary knots, resulting in a combined analytic/numerical approach, which substantially improves the estimation process compared with only using numerical integration. User-friendly Stata software is provided, which significantly extends parametric survival models available in standard software. Copyright © 2014 John Wiley & Sons, Ltd.
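The framework's starting point, direct maximum likelihood estimation of a parametric hazard, can be sketched as follows for the Weibull case. This is a toy illustration of the censored log-likelihood, not the authors' Stata implementation, and the spline-on-log-hazard extension is omitted; the data are made up.

```python
# Minimal sketch of direct parametric hazard modelling (Weibull case):
# maximize the censored log-likelihood sum(d*log h(t)) - sum(H(t)),
# with hazard h(t) = lam*gam*t^(gam-1) and cumulative hazard H(t) = lam*t^gam.
import numpy as np
from scipy.optimize import minimize

t = np.array([5.0, 8.2, 12.1, 3.4, 20.0, 15.5])  # follow-up times (toy data)
d = np.array([1, 1, 0, 1, 0, 1])                  # 1 = event, 0 = censored

def neg_loglik(params):
    log_lam, log_gam = params                     # log-parametrized for positivity
    lam, gam = np.exp(log_lam), np.exp(log_gam)
    log_h = np.log(lam) + np.log(gam) + (gam - 1) * np.log(t)  # log hazard
    H = lam * t**gam                                           # cumulative hazard
    return -(np.sum(d * log_h) - np.sum(H))

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="BFGS")
print("lambda, gamma:", np.exp(fit.x))
```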
Approaching the exa-scale: a real-world evaluation of rendering extremely large data sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patchett, John M; Ahrens, James P; Lo, Li - Ta
2010-10-15
Extremely large scale analysis is becoming increasingly important as supercomputers and their simulations move from petascale to exascale. The lack of dedicated hardware acceleration for rendering on today's supercomputing platforms motivates our detailed evaluation of the possibility of interactive rendering on the supercomputer. In order to facilitate our understanding of rendering on the supercomputing platform, we focus on scalability of rendering algorithms and architecture envisioned for exascale datasets. To understand tradeoffs for dealing with extremely large datasets, we compare three different rendering algorithms for large polygonal data: software-based ray tracing, software-based rasterization and hardware-accelerated rasterization. We present a case study of strong and weak scaling of rendering extremely large data on both GPU- and CPU-based parallel supercomputers using ParaView, a parallel visualization tool. We use three different datasets: two synthetic and one from a scientific application. At an extreme scale, algorithmic rendering choices make a difference and should be considered while approaching exascale computing, visualization, and analysis. We find software-based ray tracing offers a viable approach for scalable rendering of the projected future massive data sizes.
The igmspec database of public spectra probing the intergalactic medium
NASA Astrophysics Data System (ADS)
Prochaska, J. X.
2017-04-01
We describe v02 of igmspec, a database of publicly available ultraviolet, optical, and near-infrared spectra that probe the intergalactic medium (IGM). This database, a child of the specdb repository in the specdb github organization, comprises 403 277 unique sources and 434 686 spectra obtained with the world's greatest observatories. All of these data are distributed in a single ≈ 25GB HDF5 file maintained at the University of California Observatories and the University of California, Santa Cruz. The specdb software package includes Python scripts and modules for searching the source catalog and spectral datasets, and software links to the linetools package for spectral analysis. The repository also includes software to generate private spectral datasets that are compliant with International Virtual Observatory Alliance (IVOA) protocols and a Python-based interface for IVOA Simple Spectral Access queries. Future versions of igmspec will ingest other sources (e.g. gamma-ray burst afterglows) and other surveys as they become publicly available. The overall goal is to include every spectrum that effectively probes the IGM. Future databases of specdb may include publicly available galaxy spectra (exgalspec) and published supernovae spectra (snspec). The community is encouraged to join the effort on github: https://github.com/specdb.
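A hypothetical illustration of reading such a database directly with h5py (the supported route is the specdb package itself; the file path, group and field names below are assumptions, not igmspec's documented layout):

```python
# Hypothetical illustration of querying a specdb-style HDF5 file with h5py;
# the actual group and column names in igmspec are defined by the specdb
# package, so treat these identifiers as placeholders.
import h5py
import numpy as np

with h5py.File("igmspec_v02.hdf5", "r") as f:     # file path is an assumption
    catalog = f["catalog"][:]                      # structured source table (assumed)
    # select sources in a small RA/DEC box (field names assumed)
    mask = (np.abs(catalog["RA"] - 150.0) < 0.5) & \
           (np.abs(catalog["DEC"] - 2.0) < 0.5)
    print(catalog[mask])
```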
Informed-Proteomics: open-source software package for top-down proteomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Jungkap; Piehowski, Paul D.; Wilkins, Christopher
Top-down proteomics involves the analysis of intact proteins. This approach is very attractive as it allows for analyzing proteins in their endogenous form without proteolysis, preserving valuable information about post-translational modifications, isoforms, proteolytic processing or their combinations, collectively called proteoforms. Moreover, the quality of top-down LC-MS/MS datasets is rapidly increasing due to advances in liquid chromatography and mass spectrometry instrumentation and sample processing protocols. However, top-down mass spectra are substantially more complex compared with the more conventional bottom-up data. To take full advantage of the increasing quality of top-down LC-MS/MS datasets there is an urgent need to develop algorithms and software tools for confident proteoform identification and quantification. In this study we present a new open source software suite for top-down proteomics analysis consisting of an LC-MS feature finding algorithm, a database search algorithm, and an interactive results viewer. The presented tool, along with several other popular tools, was evaluated using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
PNNL, Florida HERO, and Energy Smart Home Plans helped Ravenwood Homes achieve a HERS 15 with PV or HERS 65 without PV on a home in Florida with SEER 16 AC, concrete block and rigid foam walls, high-performance windows, solar water heating, and 5.98 kW PV.
DETECTION OF PATHOGENS IN DRINKING WATER (SEER 2)
Project investigators developed a polymerase chain reaction (PCR)-based technique to detect E. coli O157:H7 cells in environmental samples using previously reported PCR primers for the specific detection of genes involved in biosynthesis of O157 polysacchari...
Chikkagoudar, Satish; Wang, Kai; Li, Mingyao
2011-05-26
Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
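A toy re-creation of the partition-and-parallelize strategy (not GENIE's GPU code): pairs of SNPs are scored in parallel across CPU cores using a logistic regression with a product term. Genotypes and phenotypes are simulated.

```python
# Toy re-creation of GENIE's strategy (not its code): score pairwise SNP
# interactions in parallel across CPU cores; the interaction test here is
# a logistic regression with a SNPxSNP product term.
import itertools
import numpy as np
import statsmodels.api as sm
from multiprocessing import Pool

rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(500, 20))   # 500 subjects x 20 SNPs (simulated)
pheno = rng.integers(0, 2, size=500)        # binary trait (simulated)

def interaction_p(pair):
    i, j = pair
    X = sm.add_constant(np.column_stack(
        [geno[:, i], geno[:, j], geno[:, i] * geno[:, j]]))
    res = sm.Logit(pheno, X).fit(disp=0)
    return i, j, res.pvalues[3]             # p-value of the product term

if __name__ == "__main__":
    pairs = list(itertools.combinations(range(geno.shape[1]), 2))
    with Pool() as pool:                    # one task per SNP pair
        for i, j, p in pool.map(interaction_p, pairs[:10]):
            print(f"SNP{i} x SNP{j}: p = {p:.3g}")
```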
NASA Astrophysics Data System (ADS)
Das, I.; Oberai, K.; Sarathi Roy, P.
2012-07-01
Landslides exhibit themselves in different mass movement processes and are considered among the most complex natural hazards occurring on the earth's surface. Making a landslide database available online via the WWW (World Wide Web) promotes the spread and reach of landslide information to all stakeholders. The aim of this research is to present a comprehensive database for generating landslide hazard scenarios with the help of available historic records of landslides and geo-environmental factors, and to make them available over the Web using geospatial Free & Open Source Software (FOSS). FOSS reduces the cost of the project drastically, as proprietary software is very costly. Landslide data generated for the period 1982 to 2009 were compiled along the national highway road corridor in the Indian Himalayas. All the geo-environmental datasets along with the landslide susceptibility map were served through a WebGIS client interface. The open source University of Minnesota (UMN) MapServer was used as the GIS server software for developing the web-enabled landslide geospatial database. A PHP/MapScript server-side application serves as the front-end application and PostgreSQL with the PostGIS extension serves as the backend for the web-enabled landslide spatio-temporal databases. This dynamic virtual visualization process through a web platform brings an insight into the understanding of landslides, and brings the resulting damage closer to the affected people and user community. The landslide susceptibility dataset is also made available as an Open Geospatial Consortium (OGC) Web Feature Service (WFS) which can be accessed through any OGC-compliant open source or proprietary GIS software.
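A hedged example of consuming such an OGC WFS endpoint from Python with OWSLib; the service URL and layer name are placeholders, not the project's actual service:

```python
# Hedged example of consuming an OGC Web Feature Service like the landslide
# susceptibility layer described above; the endpoint URL and layer name are
# placeholders, not the project's actual service.
from owslib.wfs import WebFeatureService

wfs = WebFeatureService(
    url="http://example.org/cgi-bin/mapserv?map=landslide.map",  # assumed URL
    version="1.1.0")
print(list(wfs.contents))                          # advertised feature types
response = wfs.getfeature(typename=["landslide_susceptibility"],  # assumed name
                          maxfeatures=10)
with open("susceptibility.gml", "wb") as f:
    f.write(response.read())                       # save the returned GML
```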
Biology Needs Evolutionary Software Tools: Let’s Build Them Right
Team, Galaxy; Goecks, Jeremy; Taylor, James
2018-01-01
Abstract Research in population genetics and evolutionary biology has always provided a computational backbone for the life sciences as a whole. Today, evolutionary and population biology reasoning is essential for the interpretation of the large, complex datasets that are characteristic of all domains of today's life sciences, from cancer biology to microbial ecology. This situation makes the algorithms and software tools developed by our community more important than ever before. It also means that we, the developers of software tools for molecular evolutionary analyses, have a shared responsibility to make these tools accessible using modern technological developments, as well as to provide adequate documentation and training. PMID:29688462
Optimized hardware framework of MLP with random hidden layers for classification applications
NASA Astrophysics Data System (ADS)
Zyarah, Abdullah M.; Ramesh, Abhishek; Merkel, Cory; Kudithipudi, Dhireesha
2016-05-01
Multilayer Perceptron Networks with random hidden layers are very efficient at automatic feature extraction and offer significant performance improvements in the training process. They essentially employ a large collection of fixed, random features and are expedient for form-factor-constrained embedded platforms. In this work, a reconfigurable and scalable architecture is proposed for MLPs with random hidden layers, with a customized building block based on the CORDIC algorithm. The proposed architecture also exploits fixed-point operations for area efficiency. The design is validated for classification on two different datasets. An accuracy of ~90% for the MNIST dataset and 75% for gender classification on the LFW dataset was observed. The hardware achieves a 299× speed-up over the corresponding software realization.
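The scheme the hardware accelerates can be sketched in software as follows: a fixed random hidden layer followed by a trained linear readout (here fit by least squares). Sizes and data are synthetic.

```python
# Software sketch of an MLP with a fixed random hidden layer: inputs are
# passed through a random projection and nonlinearity, and only the output
# weights are trained (least squares here). Data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 64))          # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy binary labels

n_hidden = 256
W = rng.standard_normal((64, n_hidden))      # fixed random hidden weights
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)                       # hidden-layer activations

beta, *_ = np.linalg.lstsq(H, y, rcond=None) # train the output layer only
pred = (H @ beta > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```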
Evaluation of copy number variation detection for a SNP array platform
2014-01-01
Background Copy Number Variations (CNVs) are usually inferred from Single Nucleotide Polymorphism (SNP) arrays by software packages implementing particular algorithms. However, there is no clear understanding of the performance of these packages, so it is difficult to select one or several of them for CNV detection on the SNP array platform. We selected four publicly available software packages designed for CNV calling from Affymetrix SNP arrays: Birdsuite, dChip, Genotyping Console (GTC) and PennCNV. A publicly available dataset generated by Array-based Comparative Genomic Hybridization (CGH), with a resolution of 24 million probes per sample, was treated as the "gold standard". Against this gold standard, the success rate, average stability rate, sensitivity, consistency and reproducibility of the four packages were assessed. We also compared the efficiency of detecting CNVs jointly by two, three or all four of the packages with that of a single package. Results In terms of the sheer quantity of detected CNVs, Birdsuite detected the most and GTC the fewest. We found that Birdsuite and dChip had obvious detection bias, and GTC seemed inferior because it detected the fewest CNVs. We then investigated the consistency between the calls of each package and those of the other three; the consistency of dChip was the lowest and that of GTC the highest. Compared with the CGH-based CNV calls, GTC called the most matching CNVs, with PennCNV-Affy ranking second; in the non-overlapping group, GTC called the fewest CNVs. With regard to the reproducibility of CNV calling, larger CNVs were usually replicated better; PennCNV-Affy showed the best consistency while Birdsuite showed the poorest. Conclusion We found that PennCNV outperformed the other three packages in the sensitivity and specificity of CNV calling. Clearly, each calling method has its own limitations and advantages for different data analyses. Therefore, optimized calling might be achieved by using multiple algorithms to evaluate the concordance and discordance of SNP array-based CNV calls. PMID:24555668
Cost Model Comparison: A Study of Internally and Commercially Developed Cost Models in Use by NASA
NASA Technical Reports Server (NTRS)
Gupta, Garima
2011-01-01
NASA makes use of numerous cost models to accurately estimate the cost of various components of a mission - hardware, software, mission/ground operations - during the different stages of a mission's lifecycle. The purpose of this project was to survey these models and determine in which respects they are similar and in which they are different. The initial survey included a study of the cost drivers for each model, the form of each model (linear/exponential/other CER, range/point output, capable of risk/sensitivity analysis), and for what types of missions and for what phases of a mission lifecycle each model is capable of estimating cost. The models taken into consideration consisted of both those that were developed by NASA and those that were commercially developed: GSECT, NAFCOM, SCAT, QuickCost, PRICE, and SEER. Once the initial survey was completed, the next step in the project was to compare the cost models' capabilities in terms of Work Breakdown Structure (WBS) elements. This final comparison was then portrayed in a visual manner with Venn diagrams. All of the materials produced in the process of this study were then posted on the Ground Segment Team (GST) Wiki.
Dual-stroke heat pump field performance
NASA Astrophysics Data System (ADS)
Veyo, S. E.
1984-11-01
Two nearly identical prototype systems, each employing a unique dual-stroke compressor, were built and tested. One was installed in an occupied residence in Jeannette, Pa. It has provided the heating and cooling required from that time to the present. The system has functioned without failure of any prototypical advanced components, although early field experience did suffer from deficiencies in the software for the breadboard microprocessor control system. Analysis of field performance data indicates a heating seasonal performance factor (HSPF) of 8.13 Btu/Wh and a cooling seasonal energy efficiency ratio (SEER) of 8.35 Btu/Wh. Data indicate that the heat pump is oversized for the test house, since the observed lower balance point is 3°F whereas 17°F is optimum. Oversizing, coupled with the use of resistance heat to maintain a delivered air temperature warmer than 90°F, results in the consumption of more resistance heat than expected, more unit cycling, and therefore lower than expected energy efficiency. Our analysis indicates that with optimal sizing the dual-stroke heat pump will yield an HSPF 30% better than a single-capacity heat pump representative of high-efficiency units in the marketplace today, for the observed weather profile.
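For reference, the two seasonal figures of merit quoted above have the standard definitions:

```latex
% Standard definitions of the seasonal heating and cooling figures of merit
\mathrm{HSPF} = \frac{\text{seasonal heating output [Btu]}}{\text{electrical energy consumed [Wh]}},
\qquad
\mathrm{SEER} = \frac{\text{seasonal cooling output [Btu]}}{\text{electrical energy consumed [Wh]}}
```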
High-Performance Tiled WMS and KML Web Server
NASA Technical Reports Server (NTRS)
Plesea, Lucian
2007-01-01
This software is an Apache 2.0 module implementing a high-performance map server to support interactive map viewers and virtual planet client software. It can be used in applications that require access to very-high-resolution geolocated images, such as GIS, virtual planet applications, and flight simulators. It serves Web Map Service (WMS) requests that comply with a given request grid from an existing tile dataset. It also generates the KML super-overlay configuration files required to access the WMS image tiles.
ACHCAR, J. A.; MARTINEZ, E. Z.; RUFFINO-NETTO, A.; PAULINO, C. D.; SOARES, P.
2008-01-01
SUMMARY We considered a Bayesian analysis of the prevalence of tuberculosis cases in New York City from 1970 to 2000. This counting dataset presented two change-points during the period. We modelled the counts using non-homogeneous Poisson processes in the presence of the two change-points. A Bayesian analysis of the data was carried out using Markov chain Monte Carlo methods; simulated Gibbs samples for the parameters of interest were obtained using the WinBUGS software. PMID:18346287
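A loose modern re-expression of a two-change-point Poisson model (the paper itself used WinBUGS); the counts below are simulated stand-ins, and the ordering of the two change-points is not enforced in this simple sketch.

```python
# Sketch of a two-change-point Poisson model in PyMC (the paper used WinBUGS);
# counts are simulated stand-ins for the 1970-2000 annual series, and the
# ordering of tau1/tau2 is not enforced here.
import numpy as np
import pymc as pm

counts = np.random.default_rng(2).poisson(3.0, size=31)  # simulated annual counts
idx = np.arange(31)

with pm.Model():
    tau1 = pm.DiscreteUniform("tau1", lower=1, upper=29)
    tau2 = pm.DiscreteUniform("tau2", lower=1, upper=29)
    rates = pm.Exponential("rates", 1.0, shape=3)         # one rate per segment
    segment = pm.math.switch(idx < tau1, rates[0],
                             pm.math.switch(idx < tau2, rates[1], rates[2]))
    pm.Poisson("obs", mu=segment, observed=counts)
    idata = pm.sample(1000, tune=1000, progressbar=False)  # MCMC, cf. Gibbs

print(idata.posterior["tau1"].mean().item(),
      idata.posterior["tau2"].mean().item())
```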
NASA Astrophysics Data System (ADS)
Klump, J. F.; Ulbricht, D.; Conze, R.
2014-12-01
The Continental Deep Drilling Programme (KTB) was a scientific drilling project from 1987 to 1995 near Windischeschenbach, Bavaria. The main super-deep borehole reached a depth of 9,101 meters into the Earth's continental crust. The project used the most current equipment for data capture and processing. After the end of the project, key data were disseminated through the web portal of the International Continental Scientific Drilling Program (ICDP), and the scientific reports were published as printed volumes. As similar projects have also experienced, it becomes increasingly difficult to maintain a data portal over a long time; changes in software and underlying hardware make a migration of the entire system inevitable. Around 2009 the data presented on the ICDP web portal were migrated to the Scientific Drilling Database (SDDB) and published through DataCite using Digital Object Identifiers (DOIs) as persistent identifiers. The SDDB portal used a relational database with a complex data model to store data and metadata, and a PHP-based Content Management System with custom modifications made it possible to navigate and browse datasets using the metadata and then download them. The data repository software eSciDoc allows storing self-contained packages consistent with the OAIS reference model. Each package consists of binary data files and XML metadata. Using a REST API, the packages can be stored in the eSciDoc repository and searched using the XML metadata. During the last maintenance cycle of the SDDB, the data and metadata were migrated into the eSciDoc repository. Discovery metadata were generated following the GCMD-DIF, ISO 19115 and DataCite schemas. The eSciDoc repository can store an arbitrary number of XML metadata records with each data object. In addition to descriptive metadata, each data object may contain pointers to related materials, such as IGSN metadata linking datasets to physical specimens, or identifiers of literature interpreting the data. Datasets are presented by XSLT stylesheet transformation of the stored metadata. The presentation shows several migration cycles of data and metadata, driven by aging software systems. Currently the datasets reside as self-contained entities in a repository system that is ready for digital preservation.
Annual Report to the Nation on the Status of Cancer - SEER Publications
Report on rates for new cancer cases, cancer deaths, and trends for the most common cancers in the United States. View the report, read a summary of incidence or mortality, or access materials to share on social media.
A Comparative Study of Point Cloud Data Collection and Processing
NASA Astrophysics Data System (ADS)
Pippin, J. E.; Matheney, M.; Gentle, J. N., Jr.; Pierce, S. A.; Fuentes-Pineda, G.
2016-12-01
Over the past decade, there has been dramatic growth in the acquisition of publicly funded high-resolution topographic data for scientific, environmental, engineering and planning purposes. These datasets are valuable for applications of interest across a large and varied user community. However, because of the large volumes of data produced by high-resolution mapping technologies and the expense of aerial data collection, it is often difficult to collect and distribute these datasets. Furthermore, the data can be technically challenging to process, requiring software and computing resources not readily available to many users. This study presents a comparison of advanced computing hardware and software used to collect and process point cloud datasets, such as LiDAR scans. Activities included implementation and testing of open source libraries and applications for point cloud data processing such as MeshLab, Blender, PDAL, and PCL. Additionally, a suite of commercial-scale applications, Skanect and CloudCompare, were applied to raw datasets. Handheld hardware solutions, a Structure Scanner and an Xbox 360 Kinect V1, were tested for their ability to scan at three field locations. The resulting projects successfully scanned and processed subsurface karst features ranging from small stalactites to large rooms, as well as a surface waterfall feature. Outcomes support the feasibility of rapid 3D sensing at field scales.
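A hedged example of scripting one of the open source tools named above, PDAL, through its Python bindings; the input file and filter settings are placeholders.

```python
# Hedged example of scripted point cloud processing with PDAL's Python
# bindings; the input file and filter settings are placeholders, not the
# study's actual workflow.
import pdal

pipeline_json = """
[
    "scan.las",
    {"type": "filters.outlier", "method": "statistical",
     "mean_k": 8, "multiplier": 2.0},
    "scan_cleaned.las"
]
"""
pipeline = pdal.Pipeline(pipeline_json)
n_points = pipeline.execute()           # runs reader -> outlier filter -> writer
print(f"processed {n_points} points")
```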
Rule-based topology system for spatial databases to validate complex geographic datasets
NASA Astrophysics Data System (ADS)
Martinez-Llario, J.; Coll, E.; Núñez-Andrés, M.; Femenia-Ribera, C.
2017-06-01
A rule-based topology software system providing a highly flexible and fast procedure to enforce integrity in the spatial relationships among datasets is presented. This improved topology rule system is built over the spatial extension Jaspa. Both projects are open source, freely available software developed by the corresponding author of this paper. Currently, there is no spatial DBMS that implements a rule-based topology engine (in the sense that the topology rules are designed and executed in the spatial backend). If the topology rules are applied in the frontend (as in many GIS desktop programs), ArcGIS is the most advanced solution. The system presented in this paper has several major advantages over the ArcGIS approach: it can be extended with new topology rules, it has a much wider set of rules, and it can mix feature attributes with topology rules as filters. In addition, the topology rule system can work with various DBMSs, including PostgreSQL, H2 or Oracle, and the logic is performed in the spatial backend. The proposed topology system allows users to check the complex spatial relationships among features (from one or several spatial layers) that complex cartographic datasets require, such as the data specifications proposed by INSPIRE in Europe and the Land Administration Domain Model (LADM) for cadastral data.
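For illustration only, one way to express a "parcels must not overlap" rule as a query executed in the spatial backend (PostGIS), in the spirit rather than the syntax of the rule system described; the table, column and connection details are invented.

```python
# Illustration of a backend-executed topology check in PostGIS: find pairs
# of parcels violating a "must not overlap" rule. Table, column and
# connection names are made up; this is not Jaspa's rule syntax.
import psycopg2

conn = psycopg2.connect("dbname=cadastre user=gis")   # connection is assumed
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT a.gid, b.gid
        FROM parcels a
        JOIN parcels b ON a.gid < b.gid        -- each pair once
        WHERE ST_Overlaps(a.geom, b.geom);     -- violations of the rule
    """)
    for gid_a, gid_b in cur.fetchall():
        print(f"overlap between parcels {gid_a} and {gid_b}")
```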
Busby, Ben; Lesko, Matthew; Federer, Lisa
2016-01-01
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon's conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stessin, Alexander M.; Weill Medical College of Cornell University, New York, NY; Meyer, Joshua E.
2008-11-15
Purpose: Cancer of the exocrine pancreas is the fifth leading cause of cancer death in the United States. Neoadjuvant chemoradiation has been investigated in several trials as a strategy for downstaging locally advanced disease to resectability. The aim of the present study is to examine the effect of neoadjuvant radiation therapy (RT) vs. other treatments on long-term survival for patients with resectable pancreatic cancer in a large population-based sample group. Methods and Materials: The Surveillance, Epidemiology, and End Results (SEER) registry database (1994-2003) was queried for cases of surgically resected pancreatic cancer. Retrospective analysis was performed. The endpoint of the study was overall survival. Results: Using Kaplan-Meier analysis we found that the median overall survival of patients receiving neoadjuvant RT was 23 months vs. 12 months with no RT and 17 months with adjuvant RT. Using Cox regression and controlling for independent covariates (age, sex, stage, grade, and year of diagnosis), we found that neoadjuvant RT results in significantly higher rates of survival than other treatments (hazard ratio [HR], 0.55; 95% confidence interval, 0.38-0.79; p = 0.001). Specifically comparing adjuvant with neoadjuvant RT, we found a significantly lower HR for death in patients receiving neoadjuvant RT rather than adjuvant RT (HR, 0.63; 95% confidence interval, 0.45-0.90; p = 0.03). Conclusions: This analysis of SEER data showed a survival benefit for the use of neoadjuvant RT over surgery alone or surgery with adjuvant RT in treating pancreatic cancer. Therapeutic strategies that use neoadjuvant RT should be further explored for patients with resectable pancreatic cancer.
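The kind of covariate-adjusted Cox model behind such hazard ratios can be sketched with the lifelines package; the toy data frame below stands in for SEER variables and is not the study's dataset.

```python
# Minimal sketch of a covariate-adjusted Cox proportional hazards model with
# lifelines; columns are invented stand-ins for SEER registry variables.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "months": [23, 12, 17, 30, 8, 14],    # survival time
    "death":  [1, 1, 0, 1, 1, 0],         # event indicator (0 = censored)
    "neoadj_rt": [1, 0, 0, 1, 0, 1],      # neoadjuvant RT yes/no
    "age": [61, 70, 66, 58, 74, 63],
    "stage": [2, 3, 2, 1, 3, 2],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="death")
cph.print_summary()    # the exp(coef) column gives the hazard ratios
```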
Kinslow, Connor J; Rajpara, Raj S; Wu, Cheng-Chia; Bruce, Samuel S; Canoll, Peter D; Wang, Shih-Hsiu; Sonabend, Adam M; Sheth, Sameer A; McKhann, Guy M; Sisti, Michael B; Bruce, Jeffrey N; Wang, Tony J C
2017-06-01
Meningeal hemangiopericytoma (m-HPC) is a rare tumor of the central nervous system (CNS), which is distinguished clinically from meningioma by its tendency to recur and metastasize. The histological classification and grading scheme for m-HPC is still evolving and few studies have identified tumor features that are associated with metastasis. All patients at our institution with m-HPC were assessed for patient, tumor, and treatment characteristics associated with survival, recurrence, and metastasis. New findings were validated using the SEER database. Twenty-seven patients were identified in our institutional records with m-HPC with a median follow-up time of 85 months. Invasiveness was the strongest predictor of decreased overall survival (OS) and decreased metastasis-free survival (MFS) (p = 0.004 and 0.001). On subgroup analysis, bone invasion trended towards decreased OS (p = 0.056). Bone invasion and soft tissue invasion were significantly associated with decreased MFS (p = 0.001 and 0.012). An additional 315 patients with m-HPC were identified in the SEER database that had information on tumor invasion and 263 with information on distant metastasis. Invasion was significantly associated with decreased survival (HR = 5.769, p = 0.007) and metastasis (OR 134, p = 0.000) in the SEER data. In this study, the authors identified a previously unreported tumor characteristic, invasiveness, as the strongest factor associated with decreased survival and metastasis. The association of invasion with decreased survival and metastasis was confirmed in a separate, larger, publicly available database. Invasion may be a useful parameter in the histological grading and clinical management of hemangiopericytoma of the CNS.
Kong, Xiangxing; Li, Jun; Cai, Yibo; Tian, Yu; Chi, Shengqiang; Tong, Danyang; Hu, Yeting; Yang, Qi; Li, Jingsong; Poston, Graeme; Yuan, Ying; Ding, Kefeng
2018-01-08
To revise the American Joint Committee on Cancer TNM staging system for colorectal cancer (CRC) based on a nomogram analysis of the Surveillance, Epidemiology, and End Results (SEER) database, and to test the rationale for increasing the weight of the T stage in our previously proposed T-plus staging system. A total of 115,377 non-metastatic CRC patients from SEER were randomly split 1:1 into training and testing sets. The Nomo-staging system was established via three nomograms based on logistic regression analyses of 1-year, 2-year and 3-year disease-specific survival (DSS) in the training set. The predictive value of the Nomo-staging system for the testing set was evaluated by the concordance index (c-index), likelihood ratio (L.R.) and Akaike information criterion (AIC) for 1-year, 2-year and 3-year overall survival (OS) and DSS. Kaplan-Meier survival curves were used to evaluate discrimination and gradient monotonicity, and an external validation was performed on the database of the Second Affiliated Hospital of Zhejiang University (SAHZU). Patients with T1-2N1 and T1N2a disease were classified into stage II while T4N0 patients were classified into stage III in the Nomo-staging system. Kaplan-Meier survival curves of OS and DSS in the testing set showed that the Nomo-staging system performed better in discrimination and gradient monotonicity, and the external validation on the SAHZU database also showed distinctly better discrimination. The Nomo-staging system showed higher values of L.R. and c-index, and lower values of AIC, when predicting OS and DSS in the testing set. The Nomo-staging system showed better performance in prognosis prediction, and the weight of lymph node status in prognosis prediction should be cautiously reconsidered.
Ambient ultraviolet radiation exposure and hepatocellular carcinoma incidence in the United States.
VoPham, Trang; Bertrand, Kimberly A; Yuan, Jian-Min; Tamimi, Rulla M; Hart, Jaime E; Laden, Francine
2017-08-18
Hepatocellular carcinoma (HCC), the most commonly occurring type of primary liver cancer, has been increasing in incidence worldwide. Vitamin D, acquired from sunlight exposure, diet, and dietary supplements, has been hypothesized to impact hepatocarcinogenesis. However, previous epidemiologic studies examining the associations between dietary and serum vitamin D reported mixed results. The purpose of this study was to examine the association between ambient ultraviolet (UV) radiation exposure and HCC risk in the U.S. The Surveillance, Epidemiology, and End Results (SEER) database provided information on HCC cases diagnosed between 2000 and 2014 from 16 population-based cancer registries across the U.S. Ambient UV exposure was estimated by linking the SEER county with a spatiotemporal UV exposure model using a geographic information system. Poisson regression with robust variance estimation was used to calculate incidence rate ratios (IRRs) and 95% confidence intervals (CIs) for the association between ambient UV exposure per interquartile range (IQR) increase (32.4 mW/m²) and HCC risk adjusting for age at diagnosis, sex, race, year of diagnosis, SEER registry, and county-level information on prevalence of health conditions, lifestyle, socioeconomic, and environmental factors. Higher levels of ambient UV exposure were associated with statistically significant lower HCC risk (n = 56,245 cases; adjusted IRR per IQR increase: 0.83, 95% CI 0.77, 0.90; p < 0.01). A statistically significant inverse association between ambient UV and HCC risk was observed among males (p for interaction = 0.01) and whites (p for interaction = 0.01). Higher ambient UV exposure was associated with a decreased risk of HCC in the U.S. UV exposure may be a potential modifiable risk factor for HCC that should be explored in future research.
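A sketch of the analysis pattern described, a Poisson rate model with robust (sandwich) variance and the incidence rate ratio scaled per interquartile range, on invented data:

```python
# Sketch of a Poisson rate model with robust standard errors and an IRR
# scaled per IQR, mirroring the analysis described; data frame and column
# names are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "cases": rng.poisson(5, 200),                 # county case counts (fake)
    "person_years": np.full(200, 1e4),
    "uv": rng.uniform(150, 260, 200),             # ambient UV, mW/m^2 (fake)
})
iqr = df["uv"].quantile(0.75) - df["uv"].quantile(0.25)
df["uv_iqr"] = df["uv"] / iqr                     # 1 unit = one IQR of UV

model = smf.glm("cases ~ uv_iqr", data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["person_years"]))  # person-time offset
res = model.fit(cov_type="HC0")                     # robust variance
print("IRR per IQR:", np.exp(res.params["uv_iqr"]))
```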
Early estimates of SEER cancer incidence, 2014.
Lewis, Denise Riedel; Chen, Huann-Sheng; Cockburn, Myles G; Wu, Xiao-Cheng; Stroup, Antoinette M; Midthune, Douglas N; Zou, Zhaohui; Krapcho, Martin F; Miller, Daniel G; Feuer, Eric J
2017-07-01
Cancer incidence rates and trends for cases diagnosed through 2014, using data reported to the Surveillance, Epidemiology, and End Results (SEER) program in February 2016, are reported, together with a validation of rates and trends for cases diagnosed through 2013 and submitted in February 2015 against the November 2015 submission. New cancer sites include the pancreas, kidney and renal pelvis, corpus and uterus, and childhood cancer sites for ages birth to 19 years inclusive. A new reporting delay model is presented for these estimates, for more consistent results with the model used for the usual November SEER submissions, adjusting for the large case undercount in the February submission. Joinpoint regression methodology was used to assess trends. Delay-adjusted rates and trends were checked for validity between the February 2016 and November 2016 submissions. Validation revealed that the delay model provides similar estimates of eventual counts using either February or November submission data. Trends declined through 2014 for prostate and colon and rectum cancer for males and females, male and female lung cancer, and cervical cancer. Thyroid cancer and liver and intrahepatic bile duct cancer increased. Pancreas (male and female) and corpus and uterus cancer demonstrated a modest increase. Slight increases occurred for male kidney and renal pelvis, and for all childhood cancer sites for ages birth to 19 years. Evaluating early cancer data submissions, adjusted for reporting delay, produces timely and valid incidence rates and trends. The results of the current study support using delay-adjusted February submission data for valid incidence rate and trend estimates over several data cycles. Cancer 2017;123:2524-34. © 2017 American Cancer Society. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
Impact of sex on prognostic host factors in surgical patients with lung cancer.
Wainer, Zoe; Wright, Gavin M; Gough, Karla; Daniels, Marissa G; Choong, Peter; Conron, Matthew; Russell, Prudence A; Alam, Naveed Z; Ball, David; Solomon, Benjamin
2017-12-01
Lung cancer has markedly poorer survival in men. Recognized important prognostic factors are divided into host, tumour and environmental factors. Traditional staging systems that use only tumour factors to predict prognosis are of limited accuracy. By examining sex-based patterns of disease-specific survival in non-small cell lung cancer patients, we determined the effect of sex on the prognostic value of additional host factors. Two cohorts of patients treated surgically with curative intent between 2000 and 2009 were utilized. The primary cohort was from Melbourne, Australia, with an independent validation set from the American Surveillance, Epidemiology and End Results (SEER) database. Univariate and multivariate analyses of validated host-related prognostic factors were performed in both cohorts to investigate the differences in survival between men and women. The Melbourne cohort had 605 patients (61% men) and the SEER cohort comprised 55,681 patients (51% men). Men had statistically significantly poorer disease-specific 5-year survival in both cohorts (P < 0.001): in Melbourne, 53.2% of men versus 68.3% of women were alive at 5 years; in SEER, 53.3% of men and 62.0% of women. Being male was independently prognostic for disease-specific mortality in the Melbourne cohort after adjustment for ethnicity, smoking history, performance status, age, pathological stage and histology (hazard ratio = 1.54, 95% confidence interval: 1.10-2.16, P = 0.012). Sex differences in non-small cell lung cancer are important irrespective of age, ethnicity, smoking, performance status and tumour, node and metastasis stage. Epidemiological findings such as these should be translated into research and clinical paradigms to determine the factors that influence the survival disadvantage experienced by men. © 2016 Royal Australasian College of Surgeons.
Abdel-Rahman, Omar
2018-03-01
Population-based data on the clinical correlates and prognostic value of the pattern of metastases among patients with cutaneous melanoma are needed. The Surveillance, Epidemiology and End Results (SEER) database (2010-2013) was explored through the SEER*Stat program. For each of six distant metastatic sites (bone, brain, liver, lung, distant lymph nodes, and skin/subcutaneous), relevant correlations with baseline characteristics were reported. Survival analysis was conducted through Kaplan-Meier analysis, and multivariate analysis through a Cox proportional hazards model. A total of 2691 patients with metastatic cutaneous melanoma were identified in the period from 2010 to 2013. Patients with isolated skin/subcutaneous metastases have the best overall and melanoma-specific survival (MSS), followed by patients with isolated distant lymph node metastases, followed by patients with isolated lung metastases. Patients with isolated liver, bone, or brain metastases have the worst overall survival and MSS (p < .0001 for both end points). Multivariate analysis revealed that age more than 70 at diagnosis (p = .012), multiple sites of metastases (p < .0001), no surgery to the primary tumor (p < .0001), and no surgery to the metastatic disease (p < .0001) were associated with worse overall survival (OS). For MSS, nodal positivity (p = .038), multiple sites of metastases (p < .0001), no surgery to the primary tumor (p < .0001), and no surgery to the metastatic disease (p < .0001) were associated with worse survival. The prognosis of metastatic cutaneous melanoma patients differs considerably according to the site of distant metastases. Further prospective studies are required to evaluate the role of local treatment in the management of metastatic disease.
McCarthy, Ellen P; Ngo, Long H; Chirikos, Thomas N; Roetzheim, Richard G; Li, Donglin; Drews, Reed E; Iezzoni, Lisa I
2007-01-01
Objective To examine stage at diagnosis and survival for disabled Medicare beneficiaries diagnosed with cancer under age 65 and compare their experiences with those of other persons diagnosed under age 65. Data Sources Surveillance, Epidemiology, and End Results (SEER) Program data and SEER-Medicare linked data for 1988–1999. SEER-11 Program includes 11 population-based tumor registries collecting information on all incident cancers in catchment areas. Tumor registry and Medicare data are linked for persons enrolled in Medicare. Study Design 307,595 incident cases of non-small cell lung (51,963), colorectal (52,092), breast (142,281), and prostate (61,259) cancer diagnosed in persons under age 65 from 1988 to 1999. Persons who qualified for Social Security Disability Insurance and had Medicare (SSDI/Medicare) were identified from Medicare enrollment files. Ordinal polychotomous logistic regression and Cox proportional hazards regression were used to estimate adjusted associations between disability status and later-stage diagnoses and mortality (all-cause and cancer-specific). Principal Findings Persons with SSDI/Medicare had lower rates of Stages III/IV diagnoses than others for lung (63.3 versus 69.5 percent) and prostate (25.5 versus 30.8 percent) cancers, but not for breast or colorectal cancers. After adjustment, they remained less likely to be diagnosed at later stages for lung and prostate cancers. Nevertheless, persons with SSDI/Medicare experienced higher all-cause mortality for each cancer. Cancer-specific mortality was higher among persons with SSDI/Medicare for breast and colorectal cancer patients. Conclusions Disabled Medicare beneficiaries are diagnosed with cancer at similar or earlier stages than others. However, they experience higher rates of cancer-related mortality when diagnosed at the same stage of breast and colorectal cancer. PMID:17362209
Mertens, Ann C; Yong, Jian; Dietz, Andrew; Kreiter, Erin; Yasui, Yutaka; Bleyer, Archie; Armstrong, Gregory T; Robison, Leslie L; Wasilewski-Masker, Karen
2015-01-01
Background Long-term survivors of pediatric cancer are at risk for life-threatening late effects of their cancer. Previous studies have shown excesses in long-term mortality within high-risk groups defined by demographic and treatment characteristics. Methods To investigate conditional survival in a pediatric cancer population, we performed an analysis of conditional survival in the original Childhood Cancer Survivor Study (CCSS) cohort and the Surveillance, Epidemiology and End Results (SEER) registry. The overall probability of death within the subsequent 5 and 10 years for patients who had survived 5, 10, 15, and 20 years since cancer diagnosis, and the probability of cause-specific death within 10 years for 5-year survivors, were estimated using the cumulative incidence method. Results Among CCSS and SEER patients who were alive 5 years post cancer diagnosis, within each diagnosis group at least 92% were alive in the subsequent 5 years, except leukemia patients, of whom only 88% of 5-year survivors remained alive in the subsequent 5 years. The probability of all-cause mortality in the next 10 years for patients who survived at least 5 years after diagnosis was 8.8% in CCSS and 10.6% in SEER, approximately three quarters of which was due to neoplasms as causes of death. Conclusion The 10-year risk of death for pediatric cancer survivors can vary between diagnosis groups by at most 12%, even up to 20 years post diagnosis. This information is clinically important in counseling patients on their conditional survival, particularly when survivors are seen in long-term follow-up. PMID:25557134
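Conditional survival as used here has a simple definition: the probability of surviving an additional s years given survival to t years since diagnosis,

```latex
% Conditional survival: probability of surviving s further years given
% survival to t years since diagnosis
S(s \mid t) \;=\; \frac{S(t + s)}{S(t)},
\qquad\text{e.g. } S(5 \mid 10) = \frac{S(15)}{S(10)}
```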
Rein, David B; Borton, Joshua; Liffmann, Danielle K; Wittenborn, John S
2016-04-01
The aim of this work was to estimate and describe the number of Medicare beneficiaries diagnosed with hepatitis C virus (HCV) in 2009, incremental annual costs by disease stage, and incremental total Medicare HCV payments in 2009, using the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked data covering the years 2002 to 2009. We weighted the 2009 SEER-Medicare data to create estimates of the number of patients with an HCV diagnosis, used an inverse probability-weighted two-part model (probit and generalized linear) to estimate incremental per patient per month costs, and used simulation to estimate the annual 2009 Medicare burden, presented in 2014 dollars. We summarized patient characteristics, diagnoses, and costs from SEER-Medicare files into a person-year panel dataset. We estimated there were 407,786 patients with diagnosed HCV in 2009, of whom 61.4% had one or more comorbidities defined by the study. In 2009, 68% of patients were diagnosed with chronic HCV only, 9% with cirrhosis, 12% with decompensated cirrhosis (DCC), 2% with liver cancer, 2% with a history of transplant, and 8% died. Annual costs for patients with chronic infection only and with DCC were higher than the values used in many previous cost-effectiveness studies, and treatment of DCC accounted for 63.9% of Medicare's total HCV expenditures. Medicare paid $2.7 billion (credible interval: $0.7-$4.6 billion) in incremental costs for HCV in 2009. The costs of HCV to Medicare in 2009 were substantial and are expected to increase over the next decade. © 2015 by the American Association for the Study of Liver Diseases.
Chen, Jie; Chen, Jinggui; Xu, Yu; Long, Ziwen; Zhou, Ye; Zhu, Huiyan; Wang, Yanong; Shi, Yingqiang
2016-06-01
To investigate the impact of age on the clinicopathological features and survival of patients with gastric cancer (GC), with the aim of better defining age-specific patterns of GC and possible associated risk factors, we used the Surveillance, Epidemiology, and End Results (SEER) database to identify patients diagnosed with GC between 2007 and 2011 with a known age. Overall and 5-year gastric cancer-specific survival (CSS) data were obtained using Kaplan-Meier plots. Multivariable Cox regression models were built for the analysis of long-term survival outcomes and risk factors. A total of 7762 GC patients treated with surgery during the 4-year study period were included in the final study cohort and divided into five subgroups by age range. The overall 5-year cause-specific survival (CSS) was 60.3% in Group 1 (below 45 years), 60.3% in Group 2 (45-55 years), 61.2% in Group 3 (56-65 years), 59.2% in Group 4 (66-75 years), and 59.2% in Group 5 (older than 76 years). Kaplan-Meier plots showed that patients older than 76 years had the worst 5-year CSS rate, 56.0%, of all the subgroups. Age, tumor size, primary site, histological type, and Tumor Node Metastasis stage were identified as significant risk factors for poor survival on univariate analysis (all P < 0.001, log-rank test). Additionally, as age increased, the risk of death from GC increased significantly. In conclusion, our analysis of the SEER database revealed that the prognosis of GC varies with age: patients in the 56-65 years group had more favorable clinicopathologic characteristics and better CSS than the other groups.
Robustness of Next Generation Sequencing on Older Formalin-Fixed Paraffin-Embedded Tissue
Carrick, Danielle Mercatante; Mehaffey, Michele G.; Sachs, Michael C.; Altekruse, Sean; Camalier, Corinne; Chuaqui, Rodrigo; Cozen, Wendy; Das, Biswajit; Hernandez, Brenda Y.; Lih, Chih-Jian; Lynch, Charles F.; Makhlouf, Hala; McGregor, Paul; McShane, Lisa M.; Phillips Rohan, JoyAnn; Walsh, William D.; Williams, Paul M.; Gillanders, Elizabeth M.; Mechanic, Leah E.; Schully, Sheri D.
2015-01-01
Next Generation Sequencing (NGS) technologies are used to detect somatic mutations in tumors and study germ line variation. Most NGS studies use DNA isolated from whole blood or fresh frozen tissue. However, formalin-fixed paraffin-embedded (FFPE) tissues are one of the most widely available clinical specimens. Their potential utility as a source of DNA for NGS would greatly enhance population-based cancer studies. While preliminary studies suggest FFPE tissue may be used for NGS, the feasibility of using archived FFPE specimens in population based studies and the effect of storage time on these specimens needs to be determined. We conducted a study to determine whether DNA in archived FFPE high-grade ovarian serous adenocarcinomas from Surveillance, Epidemiology and End Results (SEER) registries Residual Tissue Repositories (RTR) was present in sufficient quantity and quality for NGS assays. Fifty-nine FFPE tissues, stored from 3 to 32 years, were obtained from three SEER RTR sites. DNA was extracted, quantified, quality assessed, and subjected to whole exome sequencing (WES). Following DNA extraction, 58 of 59 specimens (98%) yielded DNA and moved on to the library generation step followed by WES. Specimens stored for longer periods of time had significantly lower coverage of the target region (6% lower per 10 years, 95% CI: 3-10%) and lower average read depth (40x lower per 10 years, 95% CI: 18-60), although sufficient quality and quantity of WES data was obtained for data mining. Overall, 90% (53/59) of specimens provided usable NGS data regardless of storage time. This feasibility study demonstrates FFPE specimens acquired from SEER registries after varying lengths of storage time and under varying storage conditions are a promising source of DNA for NGS. PMID:26222067
Performance testing of 3D point cloud software
NASA Astrophysics Data System (ADS)
Varela-González, M.; González-Jorge, H.; Riveiro, B.; Arias, P.
2013-10-01
LiDAR systems have been used widely in recent years for many applications in the engineering field: civil engineering, cultural heritage, mining, industry and environmental engineering. One of the most important limitations of this technology is the large computational requirement involved in data processing, especially for large mobile LiDAR datasets. Several software solutions for data management are available on the market, including open source suites; however, users often lack methodologies to verify their performance properly. In this work a methodology for LiDAR software performance testing is presented and four different suites are studied: QT Modeler, VR Mesh, AutoCAD 3D Civil and the Point Cloud Library running in software developed at the University of Vigo (SITEGI). The software based on the Point Cloud Library shows better results in point cloud loading time and CPU usage. However, it is not as strong as the commercial suites in the working set and commit size tests.
The EPA Control Strategy Tool (CoST) is a software tool for projecting potential future control scenarios, their effects on emissions and estimated costs. This tool uses the NEI and the Control Measures Dataset as key inputs. CoST outputs are projections of future control scenarios.
Anomaly Detection at Multiple Scales (ADAMS)
2011-11-09
must resort to generating their own data that simulates insider attacks. The Schonlau dataset is the most widely used for academic study. It...measurements are estimated by well-known software plagiarism tools. As explained above, there are many different techniques for code transformation
Eight software applications are compared for their performance in estimating the octanol-water partition coefficient (Kow), melting point, vapor pressure and water solubility for a dataset of polychlorinated biphenyls, polybrominated diphenyl ethers, polychlorinated dibenzodioxin...
SEER Informational Guidebook Training Aids.
ERIC Educational Resources Information Center
Baylis, Paula
This book includes topics on the surveillance, epidemiology, and end results reporting of human cancer. An anatomy section describes various systems of the human body, emphasizing those sites with high incidence of cancer. A general reference section describes weights and measures, pathology and histology, diagnostic techniques, and medical…
Baxter Community—High Performance Green Building
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2009-02-16
This case study describes the Baxter community built by David Weekley Homes, which is reducing their energy demand through a number of techniques including advanced air sealing techniques, the installation of SEER 14 air conditioners, and Low-e windows in conjunction with conventional framing and insulation.
A Seer of Trump's Coming Parses Repeal and Replace.
Kirkner, Richard Mark
2017-03-01
Diana Furchtgott-Roth, a senior fellow at the Manhattan Institute, a free-market think tank, confidently predicted back in October what few people saw coming: Donald Trump's electoral victory. Now she gives her take on the dismantling of the ACA and what might come after.
Breaking the computational barriers of pairwise genome comparison.
Torreno, Oscar; Trelles, Oswaldo
2015-08-11
Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high quality results with a linear-time response and controlled memory consumption, being comparable or faster than the current state-of-the-art methods. We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
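The seed stage of such pairwise comparison, the part the out-of-core strategy keeps within memory limits, can be illustrated with a toy k-mer index; real HSP extension, scoring and disk spilling are omitted.

```python
# Toy illustration of the seed stage of pairwise sequence comparison:
# index the k-mers of one sequence, stream the other against the index,
# and collect candidate seeds for HSP extension (extension itself and
# the out-of-core disk spilling described above are omitted).
from collections import defaultdict

def kmer_hits(seq_a, seq_b, k=8):
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)       # build index of sequence A
    hits = []
    for j in range(len(seq_b) - k + 1):       # stream sequence B against it
        for i in index.get(seq_b[j:j + k], ()):
            hits.append((i, j))               # candidate seed for an HSP
    return hits

print(len(kmer_hits("ACGT" * 500, "ACGTT" * 400)))
```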
Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering
Sun, Peng; Speicher, Nora K.; Röttger, Richard; Guo, Jiong; Baumbach, Jan
2014-01-01
Abstract The explosion of biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic, 'Bi-Force', based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed the biclustering evaluation protocol of a recent review paper by Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279–292) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering, and Bi-Force thus outperformed existing tools at least when following the evaluation protocols of Eren et al. Bi-Force is implemented in Java and integrated into the open source software package BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de. PMID:24682815
Hay, Peter D; Smith, Julie; O'Connor, Richard A
2016-02-01
The aim of this study was to evaluate the benefits to SPECT bone scan image quality of applying resolution recovery (RR) during image reconstruction using software provided by a third-party supplier. Bone SPECT data from 90 clinical studies were reconstructed retrospectively using software supplied independently of the gamma camera manufacturer. The current clinical datasets contain 120×10 s projections and are reconstructed using an iterative method with a Butterworth postfilter. Five further reconstructions were created with the following characteristics: 10 s projections with a Butterworth postfilter (to assess intraobserver variation); 10 s projections with a Gaussian postfilter with and without RR; and 5 s projections with a Gaussian postfilter with and without RR. Two expert observers were asked to rate image quality on a five-point scale relative to our current clinical reconstruction. Datasets were anonymized and presented in random order. The benefits of RR on image scores were evaluated using ordinal logistic regression (visual grading regression). The application of RR during reconstruction increased the probability that both observers scored image quality as better than the current clinical reconstruction, even where the dataset contained half the normal counts. Type of reconstruction and observer were both statistically significant variables in the ordinal logistic regression model. Visual grading regression was found to be a useful method for validating the local introduction of technological developments in nuclear medicine imaging. RR, as implemented by the independent software supplier, improved bone SPECT image quality when applied during image reconstruction. In the majority of clinical cases, acquisition times for bone SPECT intended for the purposes of localization can safely be halved (from 10 s projections to 5 s) when RR is applied.
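Visual grading regression, as used above, is ordinal logistic regression of graded image-quality scores on study factors. A minimal sketch using the OrderedModel class from the Python statsmodels package (available since statsmodels 0.13); the scores below are fabricated placeholders, not the study's data:

    # Ordinal logistic regression of 5-point image-quality scores on
    # reconstruction type (RR on/off) and observer. Toy data only.
    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    score    = np.array([1, 2, 3, 4, 5, 3, 4, 2, 5, 4])  # quality ratings
    rr       = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])  # RR applied?
    observer = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # which expert

    X = np.column_stack([rr, observer])
    res = OrderedModel(score, X, distr="logit").fit(method="bfgs", disp=False)
    print(res.params)  # log-odds effects of RR and observer on the score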
de Andrade, Roberto R S; Vaslin, Maite F S
2014-03-07
Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping the vast amount of information obtained presents a bioinformatics challenge. To bypass the need for command-line use and basic bioinformatics knowledge, we developed mapping software with a graphical interface for assembling viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in the Java language, version 7, using the NetBeans IDE 7.1. The program also allows analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. The program compares each sequenced read present in a library against a chosen reference genome; reads whose Hamming distance is smaller than or equal to an allowed number of mismatches are selected as positives and used to assemble a long nucleotide genome sequence. To validate the software, separate analyses of NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. SearchSmallRNA was able to reconstruct viral genomes from NGS small RNA datasets with a high degree of reliability, so it will be a valuable tool for virus sequencing and discovery. It is freely accessible to all research communities and has the advantage of an easy-to-use graphical interface. SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/.
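The matching rule described, keeping reads whose Hamming distance to a reference window is within an allowed mismatch count, can be sketched in a few lines of Python (brute force and illustrative only; the actual program is a Java GUI application):

    # A read maps to every reference position where its Hamming distance
    # (substitutions only, no indels) is at most max_mismatches.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def map_read(read, reference, max_mismatches=1):
        """Return start positions where the read aligns within the
        allowed number of mismatches."""
        return [i for i in range(len(reference) - len(read) + 1)
                if hamming(read, reference[i:i + len(read)]) <= max_mismatches]

    print(map_read("ACGT", "TTACGTTACGA"))  # -> [2, 7]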
Bayesian correlated clustering to integrate multiple datasets
Kirk, Paul; Griffin, Jim E.; Savage, Richard S.; Ghahramani, Zoubin; Wild, David L.
2012-01-01
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/. Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047558
Theory of mind in dogs: is the perspective-taking task a good test?
Roberts, William A; Macpherson, Krista
2011-12-01
Udell, Dorey, and Wynne (in press) have reported an experiment in which wolves, shelter dogs, and pet dogs all showed a significant preference for begging from a person who faced them (seer) over a person whose back was turned to them (blind experimenter). On tests with the blind person's eyes covered with a bucket, a book, or a camera, pet dogs showed more preference for the seer than did wolves and shelter dogs. We agree with the authors' position that most of these findings are best explained by preexperimental learning experienced by the subjects. We argue, however, that the perspective-taking task is not a good test of the domestication theory or of the theory of mind in dogs. The problem we see is that use of the perspective-taking task, combined with preexperimental learning in all the subjects, strongly biases the outcome in favor of a behavioral learning interpretation. Tasks less influenced by preexperimental training would provide less confounded tests of domestication and theory of mind.
Metadata tables to enable dynamic data modeling and web interface design: the SEER example.
Weiner, Mark; Sherr, Micah; Cohen, Abigail
2002-04-01
A wealth of information addressing health status, outcomes and resource utilization is compiled and made available by various government agencies. While exploration of the data is possible using existing tools, in general, would-be users of the resources must acquire CD-ROMs or download data from the web and upload the data into their own databases. Where web interfaces exist, they are highly structured, limiting the kinds of queries that can be executed. This work develops a web-based database interface engine whose content and structure are generated through interaction with a metadata table. The result is a dynamically generated web interface that can easily accommodate changes in the underlying data model by altering the metadata table, rather than requiring changes to the interface code. This paper discusses the background and implementation of the metadata table and web-based front end and provides examples of its use with the NCI's Surveillance, Epidemiology and End Results (SEER) database.
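The paper's actual metadata schema is not reproduced in the abstract; the toy Python sketch below (invented, SEER-like field names) illustrates the core idea that both the query form and the SQL derive from a metadata table, so the interface adapts when only the metadata changes:

    # Metadata-driven interface sketch: form fields and SQL are both
    # generated from the metadata rows, not hardcoded. Names invented.
    metadata = [
        # (field name, label shown to the user, SQL type)
        ("site",      "Primary cancer site", "TEXT"),
        ("year_dx",   "Year of diagnosis",   "INTEGER"),
        ("age_group", "Age group",           "TEXT"),
    ]

    def render_form(meta):
        """Emit one HTML input per metadata row."""
        rows = [f'<label>{label} <input name="{name}"></label>'
                for name, label, _type in meta]
        return "<form>" + "".join(rows) + "</form>"

    def build_query(meta, user_input):
        """Build a parameterized WHERE clause from the filled-in fields;
        the code never hardcodes the data model."""
        clauses, params = [], []
        for name, _label, _type in meta:
            if user_input.get(name):
                clauses.append(f"{name} = ?")
                params.append(user_input[name])
        return ("SELECT * FROM seer_cases WHERE "
                + (" AND ".join(clauses) or "1=1"), params)

    print(render_form(metadata))
    print(build_query(metadata, {"site": "bladder", "year_dx": "1998"}))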
NASA Astrophysics Data System (ADS)
Das Gupta, P.
2016-01-01
The Indo-Aryans of ancient India observed stars and constellations for ascertaining auspicious times in order to conduct sacrificial rites ordained by the Vedas. Naturally, they would have sighted comets and referred to them in the Vedic texts. In the Rigveda (circa 1700-1500 BC) and Atharvaveda (circa 1150 BC), there are references to dhumaketus and ketus, which stand for comets in Sanskrit. The Rigveda speaks of a fig tree whose aerial roots spread out in the sky (Parpola 2010). Could this imagery have been inspired by the resemblance of a comet's tail to the long, linear roots of a banyan tree (Ficus benghalensis)? Varahamihira (AD 550) and Ballal Sena (circa AD 1100-1200) described a large number of comets recorded by ancient seers, such as Parashara, Vriddha Garga, Narada, and Garga, to name a few. In this article, we propose that an episode in the Mahabharata, in which the radiant king Nahusha rules the heavens and later turns into a serpent after kicking the seer Agastya (also the star Canopus), is a mythological retelling of a cometary event.
Ehrlich, Joshua R.; Schwartz, Michael J.; Ng, Casey K.; Kauffman, Eric C.; Scherr, Douglas S.
2009-01-01
Purpose. To date, no study has examined a population-based registry to determine the impact of multiple malignancies on survival of bladder cancer patients. Our experience suggests that bladder cancer patients with multiple malignancies may have relatively positive outcomes. Materials & Methods. We utilized data from the Surveillance, Epidemiology and End Results (SEER) database to examine survival between patients with only bladder cancer (BO) and with bladder cancer and additional cancer(s) antecedent (AB), subsequent (BS), or antecedent and subsequent to bladder cancer (ABS). Results. Analyses demonstrated diminished survival among AB and ABS cohorts. However, when cohorts were substratified by stage, patients in the high-stage BS cohort appeared to have a survival advantage over high-stage BO patients. Conclusions. Bladder cancer patients with multiple malignancies have diminished survival. The survival advantage of high-stage BS patients is likely a statistical phenomenon. Such findings are important to shape future research and to improve our understanding of patients with multiple malignancies. PMID:20069054
The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping.
Ashish, Naveen; Dewan, Peehoo; Toga, Arthur W
2015-01-01
This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
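GEM's pipeline is not given in the abstract; as a minimal illustration of the unsupervised text-similarity step it describes, this Python/scikit-learn sketch scores candidate mappings between two toy data dictionaries by TF-IDF cosine similarity (GEM's classifiers and active learning are not reproduced):

    # Score candidate element mappings between two data dictionaries by
    # TF-IDF cosine similarity of their descriptions. Entries invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    dict_a = {"MMSE_TOT": "Mini-Mental State Examination total score",
              "SUBJ_AGE": "age of subject at baseline visit in years"}
    dict_b = {"mmse":    "total score on mini mental state exam",
              "age_yrs": "participant age in years at first visit"}

    names_a, texts_a = zip(*dict_a.items())
    names_b, texts_b = zip(*dict_b.items())

    vec = TfidfVectorizer().fit(texts_a + texts_b)
    sim = cosine_similarity(vec.transform(texts_a), vec.transform(texts_b))

    for i, a in enumerate(names_a):
        j = sim[i].argmax()              # best candidate for element a
        print(f"{a} -> {names_b[j]} (cosine={sim[i, j]:.2f})")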
SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.
Chiba, Hirokazu; Uchiyama, Ikuo
2017-02-08
Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users, including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang.
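SPANG's own argument syntax is not spelled out in the abstract; for illustration, this is the kind of generated SPARQL such a client sends, shown here with the generic Python SPARQLWrapper library against a placeholder endpoint:

    # Send a simple SPARQL query to an RDF endpoint. The endpoint URL is
    # a placeholder, and this uses SPARQLWrapper, not SPANG itself.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ep = SPARQLWrapper("https://example.org/sparql")   # hypothetical
    ep.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?gene ?label
        WHERE { ?gene rdfs:label ?label . }
        LIMIT 10
    """)
    ep.setReturnFormat(JSON)

    for row in ep.query().convert()["results"]["bindings"]:
        print(row["gene"]["value"], row["label"]["value"])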
Boubela, Roland N.; Kalcher, Klaudius; Huf, Wolfgang; Našel, Christian; Moser, Ewald
2016-01-01
Technologies for scalable analysis of very large datasets have emerged in the domain of internet computing, but are still rarely used in neuroimaging, despite the existence of data and research questions in need of efficient computation tools, especially in fMRI. In this work, we present software tools for the application of Apache Spark and Graphics Processing Units (GPUs) to neuroimaging datasets, in particular providing distributed file input for 4D NIfTI fMRI datasets in Scala for use in an Apache Spark environment. Examples of using this Big Data platform in graph analysis of fMRI datasets are shown to illustrate how processing pipelines employing it can be developed. With more tools for the convenient integration of neuroimaging file formats and typical processing steps, big data technologies could find wider adoption in the community, leading to a range of potentially useful applications, especially in view of the ongoing collaborative creation of a wealth of large data repositories including thousands of individual fMRI datasets. PMID:26778951
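The paper's tools are Scala on Apache Spark; as a language-neutral illustration of the pipeline shape they enable (load a 4D NIfTI dataset, extract voxel time series, threshold correlations into a graph), here is a single-machine Python sketch using nibabel and numpy, with a placeholder file name:

    # Single-machine sketch of an fMRI graph-analysis pipeline:
    # 4-D NIfTI -> voxel time series -> correlation -> thresholded graph.
    import nibabel as nib
    import numpy as np

    img = nib.load("subject01_bold.nii.gz")   # placeholder file name
    data = img.get_fdata()                    # shape (x, y, z, time)

    ts = data.reshape(-1, data.shape[-1])     # one row per voxel
    ts = ts[ts.std(axis=1) > 0]               # drop constant/empty voxels
    ts = ts[::5000]                           # subsample to keep it small

    corr = np.corrcoef(ts)                    # voxel-by-voxel correlation
    adj = (np.abs(corr) > 0.8) & ~np.eye(len(ts), dtype=bool)
    print("nodes:", len(ts), "edges:", int(adj.sum()) // 2)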
DAQ: Software Architecture for Data Acquisition in Sounding Rockets
NASA Technical Reports Server (NTRS)
Ahmad, Mohammad; Tran, Thanh; Nichols, Heidi; Bowles-Martinez, Jessica N.
2011-01-01
A multithreaded software application was developed by the Jet Propulsion Laboratory (JPL) to collect a set of correlated imagery, Inertial Measurement Unit (IMU) and GPS data for a Wallops Flight Facility (WFF) sounding rocket flight. The dataset will be used to advance Terrain Relative Navigation (TRN) technology algorithms being researched at JPL. This paper describes the software architecture and the tests used to meet the timing and data rate requirements for the software used to collect the dataset. Also discussed are the challenges of using commercial off-the-shelf (COTS) flight hardware and open source software, including multiple Camera Link (C-link) based cameras, a Pentium-M based computer, and the Linux Fedora 11 operating system. Additionally, the paper discusses the history of the software architecture's use in other JPL projects and its applicability to future missions, such as CubeSats, UAVs, and research planes/balloons, as well as the human aspects of the project, especially JPL's Phaeton program, and the results of the launch.
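JPL's flight code is not included in the abstract; purely as a generic illustration of the multithreaded acquisition pattern it describes, the Python sketch below has sensor threads time-stamping simulated samples into a shared queue that a writer thread drains, so logging I/O never blocks acquisition:

    # Generic producer-consumer DAQ pattern (not JPL's flight software):
    # producers time-stamp samples; a single consumer logs them.
    import queue
    import threading
    import time

    buf = queue.Queue()

    def sensor(name, period):
        """Producer: time-stamp samples so streams can be correlated."""
        for i in range(5):
            buf.put((time.monotonic(), name, i))
            time.sleep(period)
        buf.put((time.monotonic(), name, None))   # end-of-stream marker

    def writer(n_sensors):
        """Consumer: drain the queue; slow I/O here never blocks sensors."""
        done = 0
        while done < n_sensors:
            stamp, name, datum = buf.get()
            if datum is None:
                done += 1
            else:
                print(f"{stamp:.3f} {name} {datum}")

    threads = [threading.Thread(target=sensor, args=("imu", 0.01)),
               threading.Thread(target=sensor, args=("gps", 0.05)),
               threading.Thread(target=writer, args=(2,))]
    for th in threads: th.start()
    for th in threads: th.join()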
User's Guide for the Agricultural Non-Point Source (AGNPS) Pollution Model Data Generator
Finn, Michael P.; Scheidt, Douglas J.; Jaromack, Gregory M.
2003-01-01
BACKGROUND Throughout this user guide, we refer to datasets that we used in conjunction with the development of this software to support cartographic research and to produce the datasets needed to conduct that research. However, this software can be used with these datasets or with more 'generic' versions of data of the appropriate type. For example, throughout the guide we refer to national land cover data (NLCD) and digital elevation model (DEM) data from the U.S. Geological Survey (USGS) at a 30-m resolution, but any digital terrain model or land cover data at any appropriate resolution will produce results. Another key point to keep in mind is to use a consistent data resolution for all the datasets in each model run. The U.S. Department of Agriculture (USDA) developed the Agricultural Nonpoint Source (AGNPS) pollution model of watershed hydrology in response to the complex problem of managing nonpoint sources of pollution. AGNPS simulates the behavior of runoff, sediment, and nutrient transport from watersheds that have agriculture as their prime use. The model operates on a cell basis and is a distributed-parameter, event-based model requiring 22 input parameters. Output parameters are grouped primarily into hydrology, sediment, and chemical output (Young and others, 1995). Elevation, land cover, and soil are the base data from which the 22 input parameters required by AGNPS are extracted. For automatic parameter extraction, follow the general process described in this guide: extraction from the geospatial data through the AGNPS Data Generator to generate the input parameters required by the pollution model (Finn and others, 2002).
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks.
Thibodeau, Asa; Márquez, Eladio J; Luo, Oscar; Ruan, Yijun; Menghi, Francesca; Shin, Dong-Guk; Stitzel, Michael L; Vera-Licona, Paola; Ucar, Duygu
2016-06-01
Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network-based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. QuIN's web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available under the GPLv3 license on GitHub: https://github.com/UcarLab/QuIN/.
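QuIN itself is a Java/JavaScript web application; the network representation it is built on can be sketched in Python with networkx, using invented interaction anchors:

    # Represent chromatin interactions (e.g. ChIA-PET anchor pairs) as a
    # graph, then use network measures to prioritize loci. Toy data.
    import networkx as nx

    interactions = [                     # invented anchor coordinates
        ("chr1:1000-2000",   "chr1:50000-51000"),
        ("chr1:50000-51000", "chr1:90000-91000"),
        ("chr1:1000-2000",   "chr1:90000-91000"),
        ("chr2:500-1500",    "chr2:70000-71000"),
    ]

    g = nx.Graph()
    g.add_edges_from(interactions)

    bc = nx.betweenness_centrality(g)    # one simple prioritization measure
    rank = sorted(g.nodes, key=bc.get, reverse=True)
    print("most central anchor:", rank[0])
    print("connected components:", nx.number_connected_components(g))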
Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr.
Privé, Florian; Aschard, Hugues; Ziyatdinov, Andrey; Blum, Michael G B
2017-03-30
Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in tens of thousands of individuals. This increase in data size is a major challenge severely slowing down genomic analyses, leading to some software becoming obsolete and researchers having limited access to diverse analysis tools. Here we present two R packages, bigstatsr and bigsnpr, allowing the analysis of large-scale genomic data to be performed within R. To address large data size, the packages use memory-mapping for accessing data matrices stored on disk instead of in RAM. To perform data pre-processing and data analysis, the packages integrate most of the tools that are commonly used, either through transparent system calls to existing software or through updated or improved implementations of existing methods. In particular, the packages implement fast and accurate computations of principal component analysis and association studies, functions to remove SNPs in linkage disequilibrium, and algorithms to learn polygenic risk scores on millions of SNPs. We illustrate applications of the two R packages by analyzing a case-control genomic dataset for celiac disease, performing an association study and computing polygenic risk scores. Finally, we demonstrate the scalability of the R packages by analyzing a simulated genome-wide dataset including 500,000 individuals and 1 million markers on a single desktop computer. https://privefl.github.io/bigstatsr/ & https://privefl.github.io/bigsnpr/. florian.prive@univ-grenoble-alpes.fr & michael.blum@univ-grenoble-alpes.fr. Supplementary materials are available at Bioinformatics online.
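bigstatsr and bigsnpr are R packages; their central trick, keeping the matrix on disk and touching it block by block, translates directly into this Python numpy.memmap sketch (toy sizes and file name; not the packages' API):

    # Disk-backed matrix analysis: only one column block is ever in RAM.
    import numpy as np

    n, p = 1_000, 50_000                 # individuals x variants (toy)
    geno = np.memmap("geno.dat", dtype=np.int8, mode="w+", shape=(n, p))
    geno[:] = np.random.randint(0, 3, size=(n, p), dtype=np.int8)  # demo fill
    geno.flush()

    m = np.memmap("geno.dat", dtype=np.int8, mode="r", shape=(n, p))
    freq = np.empty(p)
    block = 5_000
    for j in range(0, p, block):         # per-variant allele frequencies
        freq[j:j + block] = m[:, j:j + block].mean(axis=0) / 2.0
    print(freq[:5])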
Enhanced Detection of Sea-Disposed Man-Made Objects in Backscatter Data
NASA Astrophysics Data System (ADS)
Edwards, M.; Davis, R. B.
2016-12-01
The Hawai'i Undersea Military Munitions Assessment (HUMMA) project developed software to increase data visualization capabilities applicable to seafloor reflectivity datasets acquired by a variety of bottom-mapping sonar systems. The purpose of these improvements is to detect different intensity values within an arbitrary amplitude range that may be associated with relative target reflectivity, as well as to extend the overall amplitude range across which detailed dynamic contrast may be effectively displayed. The backscatter dataset used to develop this software imaged tens of thousands of reflective targets resting on the seabed that were systematically sea-disposed south of Oahu, Hawaii, around the end of World War II in waters ranging from 300 to 600 meters in depth. Human-occupied and remotely operated vehicles conducted ground-truth video and photographic reconnaissance of thousands of these reflective targets, documenting and geo-referencing long curvilinear trails of items including munitions, paint cans, airplane parts, scuttled ships, cars and bundled anti-submarine nets. Edwards et al. [2012] determined that most individual trails consist of objects of one particular type. The software described in this presentation, in combination with the ground-truth images, was developed to help recognize different types of objects based on reflectivity, size, and shape from altitudes of tens of meters above the seabed. The fundamental goal of the software is to facilitate rapid underway detection and geo-location of specific sea-disposed objects so their impact on the environment can be assessed.
Calibration of radio-astronomical data on the cloud. LOFAR, the pathway to SKA
NASA Astrophysics Data System (ADS)
Sabater, J.; Sánchez-Expósito, S.; Garrido, J.; Ruiz, J. E.; Best, P. N.; Verdes-Montenegro, L.
2015-05-01
The radio interferometer LOFAR (LOw Frequency ARray) is now fully operational. This Square Kilometre Array (SKA) pathfinder allows the observation of the sky at frequencies between 10 and 240 MHz, a relatively unexplored region of the spectrum. LOFAR is a software-defined telescope: the data are mainly processed using specialized software running in common computing facilities. That means the capabilities of the telescope are virtually defined by software and mainly limited by the available computing power. However, the quantity of data produced can quickly reach huge volumes (several petabytes per day). After the correlation and pre-processing of the data in a dedicated cluster, the final dataset is handed to the user (typically several terabytes). The calibration of these data requires a powerful computing facility in which the specific state-of-the-art software, under heavy continuous development, can be easily installed and updated. That makes this case a perfect candidate for a cloud infrastructure, which adds the advantages of an on-demand, flexible solution. We present our approach to the calibration of LOFAR data using Ibercloud, the cloud infrastructure provided by Ibergrid. With the calibration workflow adapted to the cloud, we can explore calibration strategies for the SKA and show how private or commercial cloud infrastructures (Ibercloud, Amazon EC2, Google Compute Engine, etc.) can help to solve the problems with big datasets that will be prevalent in the future of astronomy.
Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.
Kerepesi, Csaba; Grolmusz, Vince
2016-05-01
DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assigns phylogenetic composition information to the dataset. Here we evaluate three metagenomic analysis software packages (AmphoraNet, a webserver implementation of AMPHORA2; MG-RAST; and MEGAN5) for their ability to assign quantitative phylogenetic information to the data, describing the frequency of appearance of the microorganisms of the same taxa in the sample. The difficulty of the task arises from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software packages assign higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software package to this genome length bias. Therefore, we have made a simple benchmark for evaluating "taxon counting" in a metagenomic sample: we took the same number of copies of three full bacterial genomes of different lengths, broke them up randomly into short reads with an average length of 150 bp, and mixed the reads. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a software package fails on this simple task, it will surely fail on most real metagenomes. We applied the three packages to the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We found that AMPHORA2/AmphoraNet gave the most accurate results and the other two packages underperformed: they assigned each short read quite reliably to its respective taxon, thereby producing the typical genome length bias. The benchmark dataset is available at http://pitgroup.org/static/3RandomGenome-100kavg150bps.fna.
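The genome length bias has a simple arithmetic core: at equal copy number, expected read counts scale with genome length, so unbiased taxon proportions require normalizing counts by length. A toy Python illustration (all numbers invented):

    # Equal copy numbers of three genomes yield read counts proportional
    # to genome length; dividing by length recovers the true proportions.
    lengths = {"taxonA": 2_000_000, "taxonB": 4_000_000, "taxonC": 6_000_000}
    copies, reads_per_base = 100, 0.05   # uniform random fragmentation

    counts = {t: copies * L * reads_per_base for t, L in lengths.items()}
    total = sum(counts.values())
    naive = {t: c / total for t, c in counts.items()}

    per_base = {t: counts[t] / lengths[t] for t in counts}
    norm = sum(per_base.values())
    corrected = {t: v / norm for t, v in per_base.items()}

    print("naive:    ", naive)       # 1/6, 1/3, 1/2 -- length-biased
    print("corrected:", corrected)   # 1/3 each -- true taxon proportions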
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Benthem, Mark H.
2016-05-04
This software is employed for 3D visualization of X-ray diffraction (XRD) data, with functionality for slicing, reorienting, isolating and plotting of 2D color contour maps and 3D renderings of large datasets. The program makes use of the multidimensionality of textured XRD data, where diffracted intensity is not constant over a given set of angular positions (as dictated by the three defined dimensional angles of phi, chi, and two-theta). Datasets are rendered in 3D with intensity as a scalar, represented using a rainbow color scale. A GUI interface and scrolling tools, along with interactive functions via the mouse, allow for fast manipulation of these large datasets so as to perform detailed analysis of diffraction results with full dimensionality of the diffraction space.
AnthropMMD: An R package with a graphical user interface for the mean measure of divergence.
Santos, Frédéric
2018-01-01
The mean measure of divergence is a dissimilarity measure between groups of individuals described by dichotomous variables. It is well suited to datasets with many missing values, and it is generally used to compute distance matrices and represent phenograms. Although often used in biological anthropology and archaeozoology, this method suffers from a lack of implementation in common statistical software. A package for the R statistical software, AnthropMMD, is presented here. Offering a dynamic graphical user interface, it is the first one dedicated to Smith's mean measure of divergence. The package also provides facilities for graphical representations and the crucial step of trait selection, so that the entire analysis can be performed through the graphical user interface. Its use is demonstrated using an artificial dataset, and the impact of trait selection is discussed. Finally, AnthropMMD is compared to three other free tools available for calculating the mean measure of divergence, and is proven to be consistent with them. © 2017 Wiley Periodicals, Inc.
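For reference, one common formulation of Smith's MMD between groups 1 and 2 over r dichotomous traits is given below; conventions vary between authors, so this sketch (the Freeman-Tukey angular transformation with Sjøvold's sample-size correction) may differ in detail from AnthropMMD's defaults. With n_{gi} individuals scored and k_{gi} presenting trait i in group g,

    \theta_{gi} = \frac{1}{2}\left[\arcsin\!\left(1 - \frac{2k_{gi}}{n_{gi}+1}\right) + \arcsin\!\left(1 - \frac{2(k_{gi}+1)}{n_{gi}+1}\right)\right]

    \mathrm{MMD} = \frac{1}{r}\sum_{i=1}^{r}\left[\left(\theta_{1i}-\theta_{2i}\right)^{2} - \left(\frac{1}{n_{1i}+\tfrac{1}{2}} + \frac{1}{n_{2i}+\tfrac{1}{2}}\right)\right]

Negative values produced by the correction term are conventionally truncated to zero when assembling distance matrices.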
Automated Fault Interpretation and Extraction using Improved Supplementary Seismic Datasets
NASA Astrophysics Data System (ADS)
Bollmann, T. A.; Shank, R.
2017-12-01
During the interpretation of seismic volumes, it is necessary to interpret faults along with horizons of interest. With the improvement of technology, the interpretation of faults can be expedited with the aid of different algorithms that create supplementary seismic attributes, such as semblance and coherency. These products highlight discontinuities, but still need a large amount of human interaction to interpret faults and are plagued by noise and stratigraphic discontinuities. Hale (2013) presents a method to improve on these datasets by creating what is referred to as a Fault Likelihood volume. In general, these volumes contain less noise and do not emphasize stratigraphic features. Instead, planar features within a specified strike and dip range are highlighted. Once a satisfactory Fault Likelihood Volume is created, extraction of fault surfaces is much easier. The extracted fault surfaces are then exported to interpretation software for QC. Numerous software packages have implemented this methodology with varying results. After investigating these platforms, we developed a preferred Automated Fault Interpretation workflow.
A web server for mining Comparative Genomic Hybridization (CGH) data
NASA Astrophysics Data System (ADS)
Liu, Jun; Ranka, Sanjay; Kahveci, Tamer
2007-11-01
Advances in cytogenetics and molecular biology have established that chromosomal alterations are critical in the pathogenesis of human cancer. Recurrent chromosomal alterations provide cytological and molecular markers for the diagnosis and prognosis of disease. They also facilitate the identification of genes that are important in carcinogenesis, which in the future may help in the development of targeted therapy. A large and growing amount of cancer genetic data is now publicly available, and there is a need for public-domain tools that allow users to analyze their data and visualize the results. This chapter describes a web-based software tool that allows researchers to analyze and visualize Comparative Genomic Hybridization (CGH) datasets. It employs novel data mining methodologies for clustering and classification of CGH datasets, as well as algorithms for identifying important markers (small sets of genomic intervals with aberrations) that are potentially cancer signatures. The developed software will help in understanding the relationships between genomic aberrations and cancer types.
AstroBlend: An astrophysical visualization package for Blender
NASA Astrophysics Data System (ADS)
Naiman, J. P.
2016-04-01
The rapid growth in scale and complexity of both computational and observational astrophysics over the past decade necessitates efficient and intuitive methods for examining and visualizing large datasets. Here, I present AstroBlend, an open-source Python library for use within the three dimensional modeling software, Blender. While Blender has been a popular open-source software among animators and visual effects artists, in recent years it has also become a tool for visualizing astrophysical datasets. AstroBlend combines the three dimensional capabilities of Blender with the analysis tools of the widely used astrophysical toolset, yt, to afford both computational and observational astrophysicists the ability to simultaneously analyze their data and create informative and appealing visualizations. The introduction of this package includes a description of features, work flow, and various example visualizations. A website - www.astroblend.com - has been developed which includes tutorials, and a gallery of example images and movies, along with links to downloadable data, three dimensional artistic models, and various other resources.
Liu, Yijin; Meirer, Florian; Williams, Phillip A.; Wang, Junyue; Andrews, Joy C.; Pianetta, Piero
2012-01-01
Transmission X-ray microscopy (TXM) has been well recognized as a powerful tool for non-destructive investigation of the three-dimensional inner structure of a sample with spatial resolution down to a few tens of nanometers, especially when combined with synchrotron radiation sources. Recent developments of this technique have presented a need for new tools for both system control and data analysis. Here a software package developed in MATLAB for script command generation and analysis of TXM data is presented. The first toolkit, the script generator, allows automating complex experimental tasks which involve up to several thousand motor movements. The second package was designed to accomplish computationally intense tasks such as data processing of mosaic and mosaic tomography datasets; dual-energy contrast imaging, where data are recorded above and below a specific X-ray absorption edge; and TXM X-ray absorption near-edge structure imaging datasets. Furthermore, analytical and iterative tomography reconstruction algorithms were implemented. The compiled software package is freely available. PMID:22338691
Zhang, Z; Guillaume, F; Sartelet, A; Charlier, C; Georges, M; Farnir, F; Druet, T
2012-10-01
In many situations, genome-wide association studies are performed in populations presenting stratification. Mixed models including a kinship matrix accounting for genetic relatedness among individuals have been shown to correct for population and/or family structure. Here we extend this methodology to generalized linear mixed models, which properly model data under various distributions. In addition, we perform association with ancestral haplotypes inferred using a hidden Markov model. The method was shown to properly account for stratification under various simulated scenarios presenting population and/or family structure. Use of ancestral haplotypes resulted in higher power than SNPs on simulated datasets. Application to real data demonstrates the usefulness of the developed model. Full analysis of a dataset with 4600 individuals and 500 000 SNPs was performed in 2 h 36 min and required 2.28 GB of RAM. The software GLASCOW can be freely downloaded from www.giga.ulg.ac.be/jcms/prod_381171/software. francois.guillaume@jouy.inra.fr Supplementary data are available at Bioinformatics online.
Tian, Jing; Varga, Boglarka; Tatrai, Erika; Fanni, Palya; Somfai, Gabor Mark; Smiddy, William E.
2016-01-01
Over the past two decades a significant number of OCT segmentation approaches have been proposed in the literature. Each methodology has been conceived for and/or evaluated using specific datasets that do not reflect the complexities of the majority of widely available retinal features observed in clinical settings. In addition, there does not exist an appropriate OCT dataset with ground truth that reflects the realities of everyday retinal features observed in clinical settings. While the need for unbiased performance evaluation of automated segmentation algorithms is obvious, validation has usually been performed by comparison with each study's own manual labelings, and there has been a lack of common ground truth. As a result, a performance comparison of different algorithms using the same ground truth has never been performed. This paper reviews research-oriented tools for automated segmentation of retinal tissue on OCT images. It also evaluates and compares the performance of these software tools against a common ground truth. PMID:27159849
IMMAN: free software for information theory-based chemometric analysis.
Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo
2015-05-01
The features and theoretical background of a new and free computational program for chemometric analysis named IMMAN (an acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon's entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software ( http://mobiosd-hub.com/imman-soft/ ), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing-value processing, dataset partitioning, and browsing. Moreover, single-parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, and comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. Graphical representation of Shannon's distributions for MD-calculating software is also provided.
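Among the criteria IMMAN implements, information gain has a particularly compact definition, IG(class; feature) = H(class) - H(class|feature). The small numpy sketch below (toy discretized data, not IMMAN's code) computes it:

    # Information gain of a discretized feature with respect to a class.
    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def information_gain(feature, labels):
        ig = entropy(labels)
        for v in np.unique(feature):
            mask = feature == v
            ig -= mask.mean() * entropy(labels[mask])
        return ig

    y = np.array([0, 0, 0, 1, 1, 1, 1, 0])                    # classes
    x = np.array(["lo", "lo", "lo", "hi", "hi", "hi", "hi", "lo"])
    print(information_gain(x, y))     # 1.0 bit: x fully determines y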
Römpp, Andreas; Schramm, Thorsten; Hester, Alfons; Klinkert, Ivo; Both, Jean-Pierre; Heeren, Ron M A; Stöckli, Markus; Spengler, Bernhard
2011-01-01
Imaging mass spectrometry is the method of scanning a sample of interest and generating an "image" of the intensity distribution of a specific analyte. The datasets consist of a large number of mass spectra which are usually acquired with identical settings. Existing data formats are not sufficient to describe an MS imaging experiment completely. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. For this purpose, the MS imaging data is divided into two separate files. The mass spectral data is stored in a binary file to ensure efficient storage. All metadata (e.g., instrumental parameters, sample details) are stored in an XML file which is based on the standard data format mzML developed by HUPO-PSI. The original mzML controlled vocabulary was extended to include specific parameters of imaging mass spectrometry (such as x/y position and spatial resolution). The two files (XML and binary) are connected by offset values in the XML file and are unambiguously linked by a universally unique identifier. The resulting datasets are comparable in size to the raw data, and the separate metadata file allows flexible handling of large datasets. Several imaging MS software tools already support imzML. This allows choosing from a (growing) number of processing tools: one is no longer limited to proprietary software, but is able to use the processing software best suited for a specific question or application. On the other hand, measurements from different instruments can be compared within one software application using identical settings for data processing. All necessary information for evaluating and implementing imzML can be found at http://www.imzML.org.
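The offset mechanism can be illustrated generically: the XML metadata records where each spectrum's binary array starts and how long it is, and a reader simply seeks to that position. The Python sketch below is a deliberately simplified stand-in, not a conforming imzML reader (real files follow the mzML schema and controlled vocabulary):

    # Toy two-file layout: XML carries byte offsets/lengths into a
    # separate binary file of float32 intensity arrays.
    import struct
    import xml.etree.ElementTree as ET

    META = """<spectra>
      <spectrum x="1" y="1" offset="0"  length="3"/>
      <spectrum x="2" y="1" offset="12" length="3"/>
    </spectra>"""

    with open("intensities.bin", "wb") as f:     # demo binary payload
        f.write(struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

    with open("intensities.bin", "rb") as f:
        for s in ET.fromstring(META):
            f.seek(int(s.get("offset")))         # jump to the spectrum
            n = int(s.get("length"))
            values = struct.unpack(f"<{n}f", f.read(4 * n))
            print((s.get("x"), s.get("y")), values)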
Seer 2008 Session III Discussant Remarks
ERIC Educational Resources Information Center
Medina, Jacquie
2009-01-01
Three research abstracts dealt with program outcomes and the factors that affect them. Morgan (2008) dealt with the potential influence of sensation-seeking personality traits on perceived risk and perceived competence in adventure experiences. Two abstracts by Bobilya, Akey, and Mitchell, Jr. (2008) and Austin, Martin, Mittelstaedt, Schanning,…
Planform: an application and database of graph-encoded planarian regenerative experiments.
Lobo, Daniel; Malone, Taylor J; Levin, Michael
2013-04-15
Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now there has been no database of regenerative experiments, and the current genotype-phenotype ontologies and databases are based on textual descriptions, which are not understandable by computers. To overcome these difficulties, we present here Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset. The database and software tool are freely available at http://planform.daniel-lobo.com.
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.
Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A
2017-05-15
Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html.
Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne
2016-01-30
Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed to finely assess their performances in terms of sensitivity and false discovery rate, by measuring the number of true and false-positive (respectively UPS1 or yeast background proteins found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performances of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performances of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
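The benchmark's scoring rule reduces to set arithmetic: proteins called differential are true positives if they belong to the spiked UPS1 standard and false positives if they come from the yeast background. A toy Python sketch (invented accessions):

    # Sensitivity and false discovery proportion for one workflow's list
    # of proteins called differential. Accessions are placeholders.
    detected = {"P02768ups", "P00915ups", "YGR192C", "P62937ups"}
    ups1     = {"P02768ups", "P00915ups", "P62937ups"}   # spiked standard

    tp = len(detected & ups1)            # spiked proteins found
    fp = len(detected - ups1)            # yeast background called variant
    print(f"TP={tp} FP={fp} "
          f"sensitivity={tp / len(ups1):.2f} FDP={fp / len(detected):.2f}")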
Crowell, Kevin L; Slysz, Gordon W; Baker, Erin S; LaMarche, Brian L; Monroe, Matthew E; Ibrahim, Yehia M; Payne, Samuel H; Anderson, Gordon A; Smith, Richard D
2013-11-01
The addition of ion mobility spectrometry to liquid chromatography-mass spectrometry experiments requires new, or updated, software tools to facilitate data processing. We introduce a command-line software application, LC-IMS-MS Feature Finder, that searches for molecular ion signatures in multidimensional liquid chromatography-ion mobility spectrometry-mass spectrometry (LC-IMS-MS) data by clustering deisotoped peaks with similar monoisotopic mass, charge state, LC elution time and ion mobility drift time values. The software application includes an algorithm for detecting and quantifying co-eluting chemical species, including species that exist in multiple conformations that may have been separated in the IMS dimension. LC-IMS-MS Feature Finder is available as a command-line tool for download at http://omics.pnl.gov/software/LC-IMS-MS_Feature_Finder.php. The Microsoft .NET Framework 4.0 is required to run the software. All other dependencies are included with the software package. Usage of this software is limited to non-profit research (see README). rds@pnnl.gov. Supplementary data are available at Bioinformatics online.
Generating DEM from LIDAR data - comparison of available software tools
NASA Astrophysics Data System (ADS)
Korzeniowska, K.; Lacka, M.
2011-12-01
In recent years many software tools and applications have appeared that offer procedures, scripts and algorithms to process and visualize ALS data. This variety of software tools and of "point cloud" processing methods contributed to the aim of this study: to assess the algorithms available in various software tools for classifying LIDAR "point cloud" data, through a careful examination of the Digital Elevation Models (DEMs) generated from LIDAR data on the basis of these algorithms. The work focused on the most important available software tools, both commercial and open source. Two sites in a mountain area were selected for the study; the area of each site is 0.645 sq km. DEMs generated with the analysed software tools were compared with a reference dataset generated using manual methods to eliminate non-ground points. Surfaces were analysed using raster analysis. Minimum, maximum and mean differences between the reference DEM and the DEMs generated with the analysed software tools were calculated, together with the Root Mean Square Error.
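The raster comparison described reduces to element-wise arithmetic on co-registered grids; a minimal numpy sketch (toy 2×2 rasters, nodata handling omitted):

    # Min/max/mean difference and RMSE between a generated DEM and the
    # reference DEM, both as equally-shaped, co-registered arrays.
    import numpy as np

    reference = np.array([[100.0, 101.2], [102.5, 103.0]])
    generated = np.array([[100.3, 100.9], [102.9, 102.4]])

    diff = generated - reference
    print("min:", diff.min(), "max:", diff.max(), "mean:", diff.mean())
    print("rmse:", np.sqrt((diff ** 2).mean()))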
NASA Astrophysics Data System (ADS)
Seers, T. D.; Hodgetts, D.
2013-12-01
The detection of topological change at the Earth's surface is of considerable scholarly interest, allowing the quantification of the rates of geomorphic processes whilst providing lucid insights into the underlying mechanisms driving landscape evolution. In this regard, the past decade has witnessed an ever-increasing proliferation of studies employing multi-temporal topographic data within the geosciences, bolstered by continuing technical advancements in the acquisition and processing of prerequisite datasets. Provided by workers within the field of computer vision, multiview stereo (MVS) dense surface reconstruction, primed by structure-from-motion (SfM) based camera pose estimation, represents one such development. Providing a cost-effective, operationally efficient data capture medium, the modest requirement of a consumer-grade camera for data collection, coupled with the minimal user intervention required during post-processing, makes SfM-MVS an attractive alternative to terrestrial laser scanners for collecting multi-temporal topographic datasets. However, in similitude to terrestrial scanner derived data, the co-registration of spatially coincident or partially overlapping scans produced by SfM-MVS presents a major technical challenge, particularly in the case of semi non-rigid scenes produced during topographic change detection studies. Moreover, the arbitrary scaling resulting from SfM ambiguity requires that a scale matrix be estimated during the transformation, introducing further complexity into its formulation. Here, we present a novel, fully unsupervised algorithm which utilises non-linearly weighted image features for solving the similarity transform (scale, translation, rotation) between partially overlapping scans produced by SfM-MVS image processing. With the only initialization condition being partial intersection between input image sets, our method has major advantages over conventional iterative least-squares minimization methods (e.g. Iterative Closest Point variants): it acts only on rigid areas of target scenes, is capable of reliably estimating the scaling factor, and requires no initial estimate of the transformation (i.e. no manual rough alignment). Moreover, because the solution is closed form, convergence is considerably more expedient than with most iterative methods. It is hoped that the availability of improved co-registration routines, such as the one presented here, will facilitate the routine collection of multi-temporal topographic datasets by a wider range of geoscience practitioners.
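The abstract does not give the authors' weighted formulation; as a reference point for what such a solver computes, the classical closed-form estimate of a similarity transform (scale, rotation, translation) between matched point sets is Umeyama's SVD method, sketched here in Python:

    # Closed-form similarity transform between matched 3-D point sets,
    # after Umeyama (1991): dst ~ c * R @ src + t. The paper's method
    # additionally weights and matches image features, omitted here.
    import numpy as np

    def umeyama(src, dst):
        mu_s, mu_d = src.mean(0), dst.mean(0)
        var_s = ((src - mu_s) ** 2).sum() / len(src)
        cov = (dst - mu_d).T @ (src - mu_s) / len(src)
        U, D, Vt = np.linalg.svd(cov)
        S = np.eye(3)
        if np.linalg.det(U @ Vt) < 0:    # guard against reflections
            S[2, 2] = -1.0
        R = U @ S @ Vt
        c = np.trace(np.diag(D) @ S) / var_s
        t = mu_d - c * R @ mu_s
        return c, R, t

    # Recover a known transform from synthetic matched points.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(100, 3))
    a = 0.3
    Rz = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0,          0,         1]])
    dst = 2.5 * src @ Rz.T + np.array([1.0, -2.0, 0.5])
    c, R, t = umeyama(src, dst)
    print(round(c, 3), np.allclose(R, Rz), np.round(t, 3))  # 2.5 True [...]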
Bolch, Charlotte A; Chu, Haitao; Jarosek, Stephanie; Cole, Stephen R; Elliott, Sean; Virnig, Beth
2017-07-10
To illustrate the 10-year risks of urinary adverse events (UAEs) among men diagnosed with prostate cancer and treated with different types of therapy, accounting for the competing risk of death. Prostate cancer is the second most common malignancy among adult males in the United States. Few studies have reported the long-term post-treatment risk of UAEs, and those that have did not appropriately account for competing deaths. This paper conducts an inverse probability of treatment (IPT) weighted competing risks analysis to estimate the effects of different prostate cancer treatments on the risk of UAE, using a matched cohort of prostate cancer/non-cancer control patients from the Surveillance, Epidemiology and End Results (SEER) Medicare database. The study dataset included men aged 66 years or older, 83% of whom were white, with a median follow-up time of 4.14 years. Patients who underwent combination radical prostatectomy and external beam radiotherapy experienced the highest risk of UAE (IPT-weighted competing risks: HR 3.65 with 95% CI (3.28, 4.07); 10-yr. cumulative incidence = 36.5%). Findings suggest that IPT-weighted competing risks analysis provides an accurate estimator of the cumulative incidence of UAE, taking into account competing deaths as well as measured confounding bias.
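As an illustration of the IPT-weighting step only (not the study's actual model or covariates), a minimal propensity-score sketch with scikit-learn on toy data; the resulting weights would then feed a weighted competing-risks estimator such as a weighted cumulative incidence function:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical cohort: one row per patient, a binary `treated` flag
    # and baseline confounders (values invented for illustration).
    df = pd.DataFrame({
        "treated":     [1, 0, 1, 0, 1, 0, 1, 0],
        "age":         [68, 71, 66, 74, 69, 70, 72, 67],
        "comorbidity": [1, 2, 0, 3, 1, 2, 2, 0],
    })

    # Propensity score: P(treatment | confounders).
    ps_model = LogisticRegression().fit(df[["age", "comorbidity"]], df["treated"])
    ps = ps_model.predict_proba(df[["age", "comorbidity"]])[:, 1]

    # Unstabilized inverse probability of treatment weights.
    df["iptw"] = np.where(df["treated"] == 1, 1 / ps, 1 / (1 - ps))
    print(df)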
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, Perl and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. Contact: l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences
NASA Astrophysics Data System (ADS)
Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida
2017-04-01
The use of metadata to contextualize datasets is quite extended in Earth System Sciences. There are some initiatives and available tools to help data managers choose the metadata standard that best fits their use case, like the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A well-defined metadata scheme is crucial not only to contextualize our own data but also to integrate datasets from other sources like satellites or meteorological agencies. That is why we have chosen EML (Ecological Metadata Language), which integrates many different elements to define a dataset, including the project context, instrumentation and parameter definitions, and the software used to process the data, provide quality controls and include the publication details. Those metadata elements can help both humans and machines to understand and process the dataset. However, the use of metadata is not enough to fully support the data life cycle, from the Data Management Plan definition to publication and re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, so semantics are needed. Ontologies, being a knowledge representation, can contribute to defining the elements of a research data life cycle, including the DMP, datasets, software, etc. They can also define how the different elements relate to each other and how they interact. The first advantage of developing an ontology of a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines). This way of using ontologies is one of the foundations of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents. To develop the ontology we are using Protégé, a graphical ontology-development tool that supports a rich knowledge model and is open-source and freely available. To process and manage the ontology, we are using Semantic MediaWiki, an extension of MediaWiki that can process queries, perform semantic search and export data in RDF. Our final goal is to integrate our data repository portal and semantic processing engine in order to have a complete system to manage the data life cycle stages and their relationships, including a machine-actionable DMP solution, dataset and software management, computing resources for processing and analysis, and publication features (DOI minting). This way we will be able to reproduce the full data life cycle chain while guaranteeing the FAIR+R principles.
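A minimal sketch of such a semantics layer, using rdflib with a hypothetical namespace and property names (the project's actual EML/CERIF vocabulary is not reproduced here):

    from rdflib import Graph, Namespace, Literal, RDF

    # Hypothetical namespace and terms for a research-data-life-cycle ontology.
    EX = Namespace("http://example.org/datalifecycle#")
    g = Graph()
    g.bind("ex", EX)

    # A dataset produced under a DMP and processed by a software tool.
    g.add((EX.reservoirDataset, RDF.type, EX.Dataset))
    g.add((EX.reservoirDataset, EX.coveredBy, EX.projectDMP))
    g.add((EX.reservoirDataset, EX.processedWith, EX.qualityControlTool))
    g.add((EX.reservoirDataset, EX.title,
           Literal("Water reservoir monitoring 2010-2016")))

    # A SPARQL query over the relationships, as a semantic engine would run it.
    results = g.query("""
        PREFIX ex: <http://example.org/datalifecycle#>
        SELECT ?ds ?sw WHERE { ?ds a ex:Dataset ; ex:processedWith ?sw . }
    """)
    for ds, sw in results:
        print(ds, "processed with", sw)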
NASA Astrophysics Data System (ADS)
Liu, Z.; Acker, J. G.; Kempler, S. J.
2016-12-01
The NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of twelve NASA Science Mission Directorate (SMD) data centers that provide Earth science data, information, and services to research scientists, applications scientists, applications users, and students around the world. The GES DISC is the home (archive) of NASA Precipitation and Hydrology, as well as Atmospheric Composition and Dynamics, remote sensing data and information. To facilitate Earth science data access, the GES DISC has been developing user-friendly data services for users at different levels. Among them, the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI, http://giovanni.gsfc.nasa.gov/) allows users to explore satellite-based data using sophisticated analyses and visualizations without downloading data and software, making it particularly suitable for novices using NASA datasets in STEM activities. In this presentation, we will briefly introduce GIOVANNI and recommend datasets for STEM. Examples of using these datasets in STEM activities will be presented as well.
NASA Technical Reports Server (NTRS)
Liu, Z.; Acker, J.; Kempler, S.
2016-01-01
The NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of twelve NASA Science Mission Directorate (SMD) data centers that provide Earth science data, information, and services to users around the world, including research and application scientists, students, citizen scientists, etc. The GES DISC is the home (archive) of remote sensing datasets for NASA Precipitation and Hydrology, Atmospheric Composition and Dynamics, etc. To facilitate Earth science data access, the GES DISC has been developing user-friendly data services for users at different levels in different countries. Among them, the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (Giovanni, http://giovanni.gsfc.nasa.gov) allows users to explore satellite-based datasets using sophisticated analyses and visualization without downloading data and software, making it particularly suitable for novices (such as students) using NASA datasets in STEM (science, technology, engineering and mathematics) activities. In this presentation, we will briefly introduce Giovanni along with examples for STEM activities.
A program for handling map projections of small-scale geospatial raster data
Finn, Michael P.; Steinwand, Daniel R.; Trent, Jason R.; Buehler, Robert A.; Mattli, David M.; Yamamoto, Kristina H.
2012-01-01
Scientists routinely accomplish small-scale geospatial modeling using raster datasets of global extent. Such use often requires the projection of global raster datasets onto a map or the reprojection from a given map projection associated with a dataset. The distortion characteristics of these projection transformations can have significant effects on modeling results. Distortions associated with the reprojection of global data are generally greater than distortions associated with reprojections of larger-scale, localized areas. The accuracy of areas in projected raster datasets of global extent is dependent on spatial resolution. To address these problems of projection and the associated resampling that accompanies it, methods for framing the transformation space, direct point-to-point transformations rather than gridded transformation spaces, a solution to the wrap-around problem, and an approach to alternative resampling methods are presented. The implementations of these methods are provided in an open-source software package called MapImage (or mapIMG, for short), which is designed to function on a variety of computer architectures.
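As an illustration of direct point-to-point transformation (not mapIMG's own code), a pyproj sketch that projects cell-centre coordinates forward into a Mollweide space, one point at a time rather than through a gridded transformation space:

    import numpy as np
    from pyproj import Transformer

    # Direct point-to-point transformation: each cell centre is projected
    # individually; note the near-antipodal longitudes where the wrap-around
    # problem mentioned above arises. (Illustrative only; mapIMG implements
    # its own framing and resampling.)
    to_moll = Transformer.from_crs("EPSG:4326", "+proj=moll +lon_0=0",
                                   always_xy=True)

    lons = np.array([-180.0, -90.0, 0.0, 90.0, 179.999])
    lats = np.array([60.0, 30.0, 0.0, -30.0, -60.0])
    x, y = to_moll.transform(lons, lats)   # forward projection of cell centres
    for lon, lat, xi, yi in zip(lons, lats, x, y):
        print(f"({lon:8.3f}, {lat:6.2f}) -> ({xi:12.1f}, {yi:12.1f})")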
DOE Office of Scientific and Technical Information (OSTI.GOV)
Warrant, Marilyn M.; Garcia, Rudy J.; Zhang, Pengchu
2004-09-15
Tomcat-Projects_RF is a software package for analyzing sensor data obtained from a database and displaying the results with JavaServer Pages (JSP). SQL views into the dataset are tailored for personnel having different roles in monitoring the items in a storage facility. For example, an inspector, a host treaty compliance officer, a system engineer and software developers were the users identified that would need to access data at different levels of detail. The analysis provides a high-level status of the storage facility and allows the user to go deeper into the data details if desired.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crowell, Kevin L.; Slysz, Gordon W.; Baker, Erin Shammel
2013-09-05
We introduce a command line software application LC-IMS-MS Feature Finder that searches for molecular ion signatures in multidimensional liquid chromatography-ion mobility spectrometry-mass spectrometry (LC-IMS-MS) data by clustering deisotoped peaks with similar monoisotopic mass, charge state, LC elution time, and ion mobility drift time values. The software application includes an algorithm for detecting and quantifying co-eluting chemical species, including species that exist in multiple conformations that may have been separated in the IMS dimension.
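A toy sketch of the clustering idea (tolerances and values hypothetical; the actual LC-IMS-MS Feature Finder algorithm is more involved):

    import numpy as np

    # Toy deisotoped peaks: (monoisotopic mass, charge, LC time, drift time).
    peaks = np.array([
        (1200.602, 2, 31.2, 18.4),
        (1200.604, 2, 31.3, 18.5),   # same feature as the peak above
        (1200.601, 3, 31.2, 14.1),   # different charge -> different feature
        (1456.790, 2, 45.8, 21.0),
    ])

    # Hypothetical tolerances; real tools expose these as user parameters.
    MASS_PPM, LC_TOL, DT_TOL = 20.0, 0.5, 0.3

    clusters = []                    # each cluster: list of peak indices
    for i, p in enumerate(peaks):
        for cl in clusters:
            q = peaks[cl[0]]
            same = (
                abs(p[0] - q[0]) / q[0] * 1e6 <= MASS_PPM  # mass within ppm
                and p[1] == q[1]                           # same charge state
                and abs(p[2] - q[2]) <= LC_TOL             # similar elution time
                and abs(p[3] - q[3]) <= DT_TOL             # similar drift time
            )
            if same:
                cl.append(i)
                break
        else:
            clusters.append([i])
    print(clusters)   # -> [[0, 1], [2], [3]]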
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Bailo, Daniele
2014-05-01
The European Plate Observing System (EPOS) is integrating geoscientific information concerning earth movements in Europe. We are approaching the end of the PP (Preparatory Project) phase and in October 2014 expect to continue with the full project within ESFRI (European Strategic Framework for Research Infrastructures). The key aspects of EPOS concern providing services to allow homogeneous access by end-users over heterogeneous data, software, facilities, equipment and services. The e-infrastructure of EPOS is the heart of the project since it integrates the work on organisational, legal, economic and scientific aspects. Following the creation of an inventory of relevant organisations, persons, facilities, equipment, services, datasets and software (RIDE), the scale of integration required became apparent. The EPOS e-infrastructure architecture has been developed systematically based on recorded primary (user) requirements and secondary (interoperation with other systems) requirements through Strawman, Woodman and Ironman phases, with the specification - and developed confirmatory prototypes - becoming more precise and progressively moving from paper to implemented system. The EPOS architecture is based on global core services (Integrated Core Services - ICS) which access thematic nodes (domain-specific European-wide collections, called Thematic Core Services - TCS), national nodes and specific institutional nodes. The key aspect is the metadata catalogue. In one dimension this is described in 3 levels: (1) discovery metadata, using well-known and commonly used standards such as DC (Dublin Core), to enable users (via an intelligent user interface) to search for objects within the EPOS environment relevant to their needs; (2) contextual metadata, providing the context of the object described in the catalogue to enable a user or the system to determine the relevance of the discovered object(s) to their requirement - the context includes projects, funding, organisations involved, persons involved, related publications, facilities, equipment and others, and utilises the CERIF (Common European Research Information Format) standard (see www.eurocris.org); (3) detailed metadata, which is specific to a domain or to a particular object and includes the schema describing the object to processing software. The other dimension of the metadata concerns the objects described. These are classified into users, services (including software), data and resources (computing, data storage, instruments and scientific equipment). An alternative architecture has been considered: brokering. This technique has been used especially in North American geoscience projects to interoperate datasets. It involves writing software to interconvert between any two node datasets; given n nodes this implies writing n*(n-1) converters. EPOS Working Group 7 (e-infrastructures and virtual community), which deals with the design and implementation of a prototype of the EPOS services, chose an approach which endows the system with extreme flexibility and sustainability: the Metadata Catalogue approach. With the use of the catalogue the EPOS system can: 1. interoperate with software, services, users, organisations, facilities, equipment etc. as well as datasets; 2. avoid writing n*(n-1) software converters: using the information contained in the catalogue, as far as possible only n converters need to be generated. This is a huge saving - especially in maintenance as the datasets (or other node resources) evolve.
We are working on (semi-)automation of converter generation by metadata mapping - this is leading-edge computer science research; 3. make extensive use of contextual metadata, which enables a user or a machine to: (i) improve discovery of resources at nodes; (ii) improve precision and recall in search; (iii) drive the systems for identification, authentication, authorisation, security and privacy, recording the relevant attributes of the node resources and of the user; (iv) manage provenance and long-term digital preservation. The linkage between the Integrated Services, which provide the integration of data and services, and the diverse Thematic Core Services nodes is provided by means of a compatibility layer, which includes the aforementioned metadata catalogue. This layer provides 'connectors' to make local data, software and services available through the EPOS Integrated Services layer. In conclusion, we believe the EPOS e-infrastructure architecture is fit for purpose, including long-term sustainability and pan-European access to data and services.
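The converter-count argument is easy to make concrete; a trivial sketch comparing pairwise converters with catalogue-driven canonical converters:

    # With pairwise converters, n heterogeneous nodes need n*(n-1) converters;
    # with a catalogue-described canonical model, each node needs only one
    # converter to the canonical form (the figure the abstract cites as n).
    def pairwise_converters(n: int) -> int:
        return n * (n - 1)

    def canonical_converters(n: int) -> int:
        return n

    for n in (5, 10, 20):
        print(n, pairwise_converters(n), canonical_converters(n))
    # 5 -> 20 vs 5; 10 -> 90 vs 10; 20 -> 380 vs 20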
VIPER: a visualisation tool for exploring inheritance inconsistencies in genotyped pedigrees
2012-01-01
Background Pedigree genotype datasets are used for analysing genetic inheritance and to map genetic markers and traits. Such datasets consist of hundreds of related animals genotyped for thousands of genetic markers and invariably contain multiple errors in both the pedigree structure and in the associated individual genotype data. These errors manifest as apparent inheritance inconsistencies in the pedigree, and invalidate analyses of marker inheritance patterns across the dataset. Cleaning raw datasets of bad data points (incorrect pedigree relationships, unreliable marker assays, suspect samples, bad genotype results, etc.) requires expert exploration of the patterns of exposed inconsistencies in the context of the inheritance pedigree. In order to assist this process we are developing VIPER (Visual Pedigree Explorer), a software tool that integrates an inheritance-checking algorithm with a novel space-efficient pedigree visualisation, so that reported inheritance inconsistencies are overlaid on an interactive, navigable representation of the pedigree structure. Methods and results This paper describes an evaluation of how VIPER displays the different scales and types of dataset that occur experimentally, with a description of how VIPER's display interface and functionality meet the challenges presented by such data. We examine a range of possible error types found in real and simulated pedigree genotype datasets, demonstrating how these errors are exposed and explored using the VIPER interface, and we evaluate the utility and usability of the interface to the domain expert. Evaluation was performed as a two-stage process with the assistance of domain experts (geneticists). The initial evaluation drove the iterative implementation of further features in the software prototype, as required by the users, prior to a final functional evaluation of the pedigree display for exploring the various error types, data scales and structures. Conclusions The VIPER display was shown to effectively expose the range of errors found in experimental genotyped pedigrees, allowing users to explore the underlying causes of reported inheritance inconsistencies. This interface will provide the basis for a full data cleaning tool that will allow the user to remove isolated bad data points, and reversibly test the effect of removing suspect genotypes and pedigree relationships. PMID:22607476
DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data.
Gaspar, John M; Hart, Ronald P
2017-11-29
DNA methylation is an epigenetic modification that is studied at single-base resolution with bisulfite treatment followed by high-throughput sequencing. After alignment of the sequence reads to a reference genome, methylation counts are analyzed to determine genomic regions that are differentially methylated between two or more biological conditions. Even though a variety of software packages is available for different aspects of the bioinformatics analysis, they often produce results that are biased or have excessive computational requirements. DMRfinder is a novel computational pipeline that identifies differentially methylated regions efficiently. Following alignment, DMRfinder extracts methylation counts and performs a modified single-linkage clustering of methylation sites into genomic regions. It then compares methylation levels using beta-binomial hierarchical modeling and Wald tests. Among its innovative attributes are the analyses of novel methylation sites and methylation linkage, as well as the simultaneous statistical analysis of multiple sample groups. To demonstrate its efficiency, DMRfinder is benchmarked against other computational approaches using a large published dataset. Contrasting two replicates of the same sample yielded minimal genomic regions with DMRfinder, whereas two alternative software packages reported a substantial number of false positives. Further analyses of biological samples revealed fundamental differences between DMRfinder and another software package, despite the fact that they utilize the same underlying statistical basis. For each step, DMRfinder completed the analysis in a fraction of the time required by other software. Among the computational approaches for identifying differentially methylated regions from high-throughput bisulfite sequencing datasets, DMRfinder is the first that integrates all the post-alignment steps in a single package. Compared to other software, DMRfinder is extremely efficient and unbiased in this process. DMRfinder is free and open-source software, available on GitHub ( github.com/jsh58/DMRfinder ); it is written in Python and R, and is supported on Linux.
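A toy sketch of the region-clustering step (parameter names and thresholds are illustrative, not DMRfinder's defaults):

    def cluster_sites(positions, max_gap=100, min_sites=3):
        """Single-linkage-style clustering of sorted methylation-site
        positions on one chromosome: consecutive sites within `max_gap` bp
        join a region; regions with fewer than `min_sites` sites are dropped.
        Returns (start, end, n_sites) tuples."""
        regions, current = [], [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)
            else:
                if len(current) >= min_sites:
                    regions.append((current[0], current[-1], len(current)))
                current = [pos]
        if len(current) >= min_sites:
            regions.append((current[0], current[-1], len(current)))
        return regions

    print(cluster_sites([100, 150, 210, 5000, 5020, 9000]))
    # -> [(100, 210, 3)]  (the 5000/5020 pair falls below min_sites)

Per-region methylation counts from each sample group would then go into the beta-binomial comparison described above.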
ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data.
Ou, Jianhong; Liu, Haibo; Yu, Jun; Kelliher, Michelle A; Castilla, Lucio H; Lawson, Nathan D; Zhu, Lihua Julie
2018-03-01
ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is a recently developed technique for genome-wide analysis of chromatin accessibility. Compared to earlier methods for assaying chromatin accessibility, ATAC-seq is faster and easier to perform, does not require cross-linking, has a higher signal-to-noise ratio, and can be performed on small cell numbers. However, to ensure a successful ATAC-seq experiment, step-by-step quality assurance processes, including both wet lab quality control and in silico quality assessment, are essential. While several tools have been developed or adopted for assessing read quality, identifying nucleosome occupancy and accessible regions from ATAC-seq data, none of the tools provides a comprehensive set of functionalities for preprocessing and quality assessment of aligned ATAC-seq datasets. We have developed a Bioconductor package, ATACseqQC, for easily generating various diagnostic plots to help researchers quickly assess the quality of their ATAC-seq data. In addition, this package contains functions to preprocess aligned ATAC-seq data for subsequent peak calling. Here we demonstrate the utilities of our package using 25 publicly available ATAC-seq datasets from four studies. We also provide guidelines on what the diagnostic plots should look like for an ideal ATAC-seq dataset. This software package has been used successfully for preprocessing and assessing several in-house and public ATAC-seq datasets. Diagnostic plots generated by this package will facilitate the quality assessment of ATAC-seq data, and help researchers to evaluate their own ATAC-seq experiments as well as select high-quality ATAC-seq datasets from public repositories such as GEO to avoid generating hypotheses or drawing conclusions from low-quality ATAC-seq experiments. The software, source code, and documentation are freely available as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html .
10 CFR 429.43 - Commercial heating, ventilating, air conditioning (HVAC) equipment.
Code of Federal Regulations, 2012 CFR
2012-01-01
... seasonal energy efficiency ratio (SEER in British thermal units per Watt-hour (Btu/Wh)), the heating...) Package terminal air conditioners: The energy efficiency ratio (EER in British thermal units per Watt-hour... package vertical air conditioner: The energy efficiency ratio (EER in British thermal units per Watt-hour...
10 CFR 429.43 - Commercial heating, ventilating, air conditioning (HVAC) equipment.
Code of Federal Regulations, 2013 CFR
2013-01-01
... seasonal energy efficiency ratio (SEER in British thermal units per Watt-hour (Btu/Wh)), the heating...) Package terminal air conditioners: The energy efficiency ratio (EER in British thermal units per Watt-hour... package vertical air conditioner: The energy efficiency ratio (EER in British thermal units per Watt-hour...
Nutrient-Chlorophyll Relationships in the Indian River Lagoon, Florida (SEERS)
The Indian River Lagoon is a highly diverse estuary located along Florida’s Atlantic coast. The system is made up of the main stem and two side-lagoons: the Banana River and Mosquito Lagoon. We segmented the main stem into three sections based on spatial trends in water quality ...
A Fifteen-Year Forecast of Information-Processing Technology. Final Report.
ERIC Educational Resources Information Center
Bernstein, George B.
This study developed a variation of the DELPHI approach, a polling technique for systematically soliciting opinions from experts, to produce a technological forecast of developments in the information-processing industry. SEER (System for Event Evaluation and Review) combines the more desirable elements of existing techniques: (1) intuitive…
Awakening the Inner Eye. Intuition in Education.
ERIC Educational Resources Information Center
Noddings, Nel; Shore, Paul J.
This book discusses the meaning, importance, and uses of intuition. In the first chapter the development of the conceptual history of intuition is traced from the ancient seers, religion, art, psychology, and philosophy. In chapter 2, work which has contributed to the development of intuition as a philosophical and psychological concept is…
DOE ZERH Case Study: Sunroc Builders, Bates Avenue, Lakeland, FL
DOE Office of Scientific and Technical Information (OSTI.GOV)
none,
2015-09-01
Case study of a DOE 2015 Housing Innovation Award winning affordable home in the hot-humid climate that achieved HERS 57 without PV, with 6.5” SIP walls and 8.25” SIP roof; uninsulated slab foundation; fresh air intake; SEER 16 ducted air source heat pump.
Yilmaz, E; Kayikcioglu, T; Kayipmaz, S
2017-07-01
In this article, we propose a decision support system for effective classification of dental periapical cyst and keratocystic odontogenic tumor (KCOT) lesions obtained via cone beam computed tomography (CBCT). CBCT has been effectively used in recent years for diagnosing dental pathologies and determining their boundaries and content. Unlike other imaging techniques, CBCT provides detailed and distinctive information about the pathologies by enabling a three-dimensional (3D) image of the region to be displayed. We employed 50 CBCT 3D image dataset files as the full dataset of our study. These datasets were identified by experts as periapical cyst and KCOT lesions according to the clinical, radiographic and histopathologic features. Segmentation operations were performed on the CBCT images using viewer software that we developed. Using the tools of this software, we marked the lesional volume of interest and calculated and applied the order statistics and the 3D gray-level co-occurrence matrix for each CBCT dataset. A feature vector of the lesional region, including 636 different feature items, was created from those statistics. Six classifiers were used for the classification experiments. The Support Vector Machine (SVM) classifier achieved the best classification performance, with 100% accuracy and a 100% F-score (F1), in the experiments in which a ten-fold cross-validation method was used with a forward feature selection algorithm. SVM achieved the best classification performance, with 96.00% accuracy and a 96.00% F1, in the experiments in which a split-sample validation method was used with a forward feature selection algorithm. SVM additionally achieved the best performance, with 94.00% accuracy and a 93.88% F1, when a leave-one-out cross-validation (LOOCV) method was used with a forward feature selection algorithm. Based on the results, we determined that periapical cyst and KCOT lesions can be classified with high accuracy using the models that we built with the new dataset selected for this study. The studies mentioned in this article, along with the selected 3D dataset, the 3D statistics calculated from the dataset, and the performance results of the different classifiers, comprise an important contribution to the field of computer-aided diagnosis of dental apical lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
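A hedged scikit-learn sketch of the evaluation design (SVM, forward feature selection, ten-fold cross-validation) on synthetic stand-in features, since the CBCT feature vectors themselves are not public:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic stand-in for the 636-item order-statistics/texture vectors
    # (50 lesions, 2 classes); 100 features keep the sketch fast.
    X, y = make_classification(n_samples=50, n_features=100, n_informative=8,
                               random_state=0)

    svm = SVC(kernel="rbf")
    model = make_pipeline(
        StandardScaler(),
        SequentialFeatureSelector(svm, n_features_to_select=10,
                                  direction="forward", cv=5),
        svm,
    )
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"10-fold CV accuracy: {scores.mean():.2%} +/- {scores.std():.2%}")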
Busby, Ben; Lesko, Matthew; Federer, Lisa
2016-01-01
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon’s conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team. PMID:27134733
Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed
2017-01-01
The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also through the identification of novel transcripts such as chimeric fusion transcripts. 'Fusion' or 'chimeric' transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimens. Fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope, that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets followed by experimental validations demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of the read lengths and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as a standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web-interface at (https://galaxy.unmc.edu/). PMID:28472320
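A toy sketch of the k-mer fingerprint idea (sequences, gene names and k are invented; ChimeRScope's actual fingerprinting and scoring are more sophisticated):

    def kmer_fingerprint(read: str, k: int = 11) -> set:
        """Alignment-free fingerprint of a sequence: the set of its k-mers.
        k=11 is an arbitrary illustrative choice."""
        return {read[i:i + k] for i in range(len(read) - k + 1)}

    # Hypothetical gene fingerprints built once from reference transcripts.
    gene_fp = {
        "GENE_A": kmer_fingerprint("ATGGCGTACGTTAGCCGATAGGCTTACG"),
        "GENE_B": kmer_fingerprint("TTGACCGGTATTCCGGAAGCTTGGCATG"),
    }

    def assign_read(read: str) -> dict:
        """Score a read against each gene by shared k-mers; a read pair whose
        two mates hit different genes is a fusion candidate."""
        fp = kmer_fingerprint(read)
        return {g: len(fp & ref) for g, ref in gene_fp.items()}

    print(assign_read("GCGTACGTTAGCCGATAGG"))  # mostly GENE_A k-mers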
Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan
2013-06-01
Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.
To generate a finite element model of human thorax using the VCH dataset
NASA Astrophysics Data System (ADS)
Shi, Hui; Liu, Qian
2009-10-01
Purpose: To generate a three-dimensional (3D) finite element (FE) model of the human thorax which may provide the basis of biomechanical simulation for studying the design effect and mechanism of safety belts during vehicle collisions. Methods: Using manual or semi-manual segmentation methods, the areas of interest can be segmented from the VCH (Visible Chinese Human) dataset. The 3D surface model of the thorax is visualized using VTK (Visualization Toolkit) and further translated into STL (Stereo Lithography) format, which approximates the geometry of the solid model by representing the boundaries with triangular facets. The data in STL format then need to be normalized into NURBS surfaces and IGES format using software such as Geomagic Studio to provide an archetype for reverse engineering. The 3D FE model was established using ANSYS software. Results: The generated 3D FE model was an integrated thorax model which could reproduce the complicated structural morphology of the human thorax, including the clavicles, ribs, spine and sternum. It consisted of 1,044,179 elements in total. Conclusions: Compared with previous thorax models, this FE model markedly enhanced the authenticity and precision of the analysis, providing a sound basis for human thorax biomechanics research. Furthermore, using the method above, we can also establish 3D FE models of other organs and tissues utilizing the VCH dataset.
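A minimal VTK sketch of the surface-extraction and STL-export step, assuming a hypothetical binary segmentation file (the authors' full pipeline also involves Geomagic Studio and ANSYS, which are not shown):

    import vtk

    # Hypothetical input: a binary segmentation of the thorax (labels 0/1).
    reader = vtk.vtkNIFTIImageReader()
    reader.SetFileName("thorax_segmentation.nii")

    # Extract the tissue boundary as a triangulated surface, which is the
    # representation STL stores (boundaries as triangular facets).
    mc = vtk.vtkMarchingCubes()
    mc.SetInputConnection(reader.GetOutputPort())
    mc.SetValue(0, 0.5)          # iso-value between background (0) and label (1)

    writer = vtk.vtkSTLWriter()
    writer.SetFileName("thorax_surface.stl")
    writer.SetInputConnection(mc.GetOutputPort())
    writer.Write()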
Computer Simulation of Classic Studies in Psychology.
ERIC Educational Resources Information Center
Bradley, Drake R.
This paper describes DATASIM, a comprehensive software package which generates simulated data for actual or hypothetical research designs. DATASIM is primarily intended for use in statistics and research methods courses, where it is used to generate "individualized" datasets for students to analyze, and later to correct their answers.…
Ontology-Driven Discovery of Scientific Computational Entities
ERIC Educational Resources Information Center
Brazier, Pearl W.
2010-01-01
Many geoscientists use modern computational resources, such as software applications, Web services, scientific workflows and datasets that are readily available on the Internet, to support their research and many common tasks. These resources are often shared via human contact and sometimes stored in data portals; however, they are not necessarily…
Mylona, Anastasia; Carr, Stephen; Aller, Pierre; Moraes, Isabel; Treisman, Richard; Evans, Gwyndaf; Foadi, James
2017-08-04
The present article describes how to use the computer program BLEND to help assemble complete datasets for the solution of macromolecular structures, starting from partial or complete datasets, derived from data collection from multiple crystals. The program is demonstrated on more than two hundred X-ray diffraction datasets obtained from 50 crystals of a complex formed between the SRF transcription factor, its cognate DNA, and a peptide from the SRF cofactor MRTF-A. This structure is currently in the process of being fully solved. While full details of the structure are not yet available, the repeated application of BLEND on data from this structure, as they have become available, has made it possible to produce electron density maps clear enough to visualise the potential location of MRTF sequences.
Combining results of multiple search engines in proteomics.
Shteynberg, David; Nesvizhskii, Alexey I; Moritz, Robert L; Deutsch, Eric W
2013-09-01
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
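A toy agreement-based combiner illustrating the idea (real combiners such as iProphet model engine-specific score distributions rather than simple voting):

    from collections import Counter

    # Toy peptide-spectrum matches (PSMs) from three engines: spectrum -> peptide.
    psms = {
        "engineA": {"s1": "PEPTIDEK", "s2": "LVGK", "s3": "MKWVTF"},
        "engineB": {"s1": "PEPTIDEK", "s2": "LVGK", "s4": "AEFVEVTK"},
        "engineC": {"s1": "PEPTIDEK", "s2": "LVQK", "s3": "MKWVTF"},
    }

    def consensus(psms, min_votes=2):
        """Keep a (spectrum, peptide) call when at least `min_votes` engines
        report it; unique single-engine identifications are discarded here,
        which is exactly what probabilistic combiners try to avoid."""
        votes = Counter((s, p) for engine in psms.values()
                        for s, p in engine.items())
        return {s: p for (s, p), n in votes.items() if n >= min_votes}

    print(consensus(psms))
    # -> {'s1': 'PEPTIDEK', 's2': 'LVGK', 's3': 'MKWVTF'}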
Android Malware Classification Using K-Means Clustering Algorithm
NASA Astrophysics Data System (ADS)
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's knowledge. Attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, VirusTotal and Malgenome, were selected to demonstrate the application of the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each type of dataset, namely Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results with high accuracy when tested using the Random Forest algorithm.
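A scikit-learn sketch of the two-step design (K-Means clustering, then a Random Forest check) on synthetic stand-in features; the paper itself used SPSS and WEKA:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for the nine behavioural features per app sample
    # (Lock/Text/Encryption detected, Threat, Porn, Law, Copyright, ...).
    X, _ = make_blobs(n_samples=300, n_features=9, centers=3, random_state=42)

    # Step 1: unsupervised grouping into ransomware / scareware / goodware.
    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

    # Step 2: assess cluster coherence by how well a supervised learner
    # recovers the cluster labels (the paper's Random Forest check).
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    print(cross_val_score(rf, X, labels, cv=5).mean())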
Dataset on spatial distribution and location of universities in Nigeria.
Adeyemi, G A; Edeki, S O
2018-06-01
Access to a quality educational system and the location of educational institutions are of great importance for the future prospects of youth in any nation. These, in turn, have great effects on the economic growth and development of any country. Thus, the dataset contained in this article examines and explains the spatial distribution of universities in the Nigerian system of education. Data from the universities commission of Nigeria, as of December 2017, are used. These include all 40 federal universities, 44 state universities, and 69 private universities, making a total of 153 universities in the Nigerian system of education. The data analysis is via Geographic Information System (GIS) software. The dataset contained in this article will be of immense assistance to national educational policy makers, parents, and potential students as regards smart and reliable decision making academically.
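A minimal GeoPandas sketch of how such a dataset might be explored (file and column names hypothetical):

    import geopandas as gpd
    import matplotlib.pyplot as plt

    # Hypothetical shapefile of geocoded universities with an `ownership`
    # column (federal / state / private); names are illustrative only.
    unis = gpd.read_file("nigeria_universities.shp")

    # Counts should mirror the article's 40 federal / 44 state / 69 private.
    print(unis["ownership"].value_counts())

    # Simple spatial-distribution view: points coloured by ownership.
    unis.plot(column="ownership", legend=True, markersize=8)
    plt.show()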
Omics Metadata Management Software v. 1 (OMMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installed and run by operators with general system administration and scripting language literacy.
Lohmann, Ingrid
2012-01-01
In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distances between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool for mining statistically and biologically significant TFBS co-occurrences, and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209
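A naive pair-counting sketch of motif co-occurrence with a short-distance constraint (COPS itself uses FP-tree mining and Markov chain models; the data and threshold here are invented):

    from collections import Counter
    from itertools import combinations

    # Toy data: per-region lists of (motif, position) TFBS hits from a scanner.
    regions = [
        [("HOX", 12), ("EXD", 20), ("STAT", 240)],
        [("HOX", 55), ("EXD", 66)],
        [("STAT", 10), ("HOX", 300)],
    ]

    MAX_DIST = 50   # illustrative short-distance constraint between paired sites

    pair_counts = Counter()
    for hits in regions:
        seen = set()
        for (m1, p1), (m2, p2) in combinations(hits, 2):
            pair = tuple(sorted((m1, m2)))
            if abs(p1 - p2) <= MAX_DIST and pair not in seen:
                seen.add(pair)          # count each pair once per region
                pair_counts[pair] += 1
    print(pair_counts)   # -> Counter({('EXD', 'HOX'): 2})

A significance test would then compare these counts against the expectation under a background (e.g. Markov) model.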
PyRAD: assembly of de novo RADseq loci for phylogenetic analyses.
Eaton, Deren A R
2014-07-01
Restriction-site-associated genomic markers are a powerful tool for investigating evolutionary questions at the population level, but are limited in their utility at deeper phylogenetic scales where fewer orthologous loci are typically recovered across disparate taxa. While this limitation stems in part from mutations to restriction recognition sites that disrupt data generation, an additional source of data loss comes from the failure to identify homology during bioinformatic analyses. Clustering methods that allow for lower similarity thresholds and the inclusion of indel variation will perform better at assembling RADseq loci at the phylogenetic scale. PyRAD is a pipeline to assemble de novo RADseq loci with the aim of optimizing coverage across phylogenetic datasets. It uses a wrapper around an alignment-clustering algorithm, which allows for indel variation within and between samples, as well as for incomplete overlap among reads (e.g. paired-end). Here I compare PyRAD with the program Stacks in their performance analyzing a simulated RADseq dataset that includes indel variation. Indels disrupt clustering of homologous loci in Stacks but not in PyRAD, such that the latter recovers more shared loci across disparate taxa. I show through reanalysis of an empirical RADseq dataset that indels are a common feature of such data, even at shallow phylogenetic scales. PyRAD uses parallel processing as well as an optional hierarchical clustering method, which allows it to rapidly assemble phylogenetic datasets with hundreds of sampled individuals. Software is written in Python and freely available at http://www.dereneaton.com/software/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
QuIN: A Web Server for Querying and Visualizing Chromatin Interaction Networks
Thibodeau, Asa; Márquez, Eladio J.; Luo, Oscar; Ruan, Yijun; Shin, Dong-Guk; Stitzel, Michael L.; Ucar, Duygu
2016-01-01
Recent studies of the human genome have indicated that regulatory elements (e.g. promoters and enhancers) at distal genomic locations can interact with each other via chromatin folding and affect gene expression levels. Genomic technologies for mapping interactions between DNA regions, e.g., ChIA-PET and HiC, can generate genome-wide maps of interactions between regulatory elements. These interaction datasets are important resources to infer distal gene targets of non-coding regulatory elements and to facilitate prioritization of critical loci for important cellular functions. With the increasing diversity and complexity of genomic information and public ontologies, making sense of these datasets demands integrative and easy-to-use software tools. Moreover, network representation of chromatin interaction maps enables effective data visualization, integration, and mining. Currently, there is no software that can take full advantage of network theory approaches for the analysis of chromatin interaction datasets. To fill this gap, we developed a web-based application, QuIN, which enables: 1) building and visualizing chromatin interaction networks, 2) annotating networks with user-provided private and publicly available functional genomics and interaction datasets, 3) querying network components based on gene name or chromosome location, and 4) utilizing network based measures to identify and prioritize critical regulatory targets and their direct and indirect interactions. AVAILABILITY: QuIN’s web server is available at http://quin.jax.org. QuIN is developed in Java and JavaScript, utilizing an Apache Tomcat web server and MySQL database, and the source code is available under the GPLV3 license on GitHub: https://github.com/UcarLab/QuIN/. PMID:27336171
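A small networkx sketch of the network representation and one prioritization measure (toy anchors; QuIN supports far richer annotation and queries):

    import networkx as nx

    # Toy ChIA-PET-style interactions between genomic anchors.
    interactions = [
        ("chr1:100-200", "chr1:5000-5100"),
        ("chr1:100-200", "chr1:9000-9100"),
        ("chr1:5000-5100", "chr1:9000-9100"),
        ("chr2:300-400", "chr2:7000-7100"),
    ]

    G = nx.Graph()
    G.add_edges_from(interactions)

    # Prioritize anchors by degree centrality, one example of the
    # network-based measures such a tool exposes for ranking targets.
    for node, score in sorted(nx.degree_centrality(G).items(),
                              key=lambda kv: -kv[1])[:3]:
        print(node, round(score, 2))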
Jones, Nia W; Raine-Fenning, Nick J; Mousa, Hatem A; Bradley, Eileen; Bugg, George J
2011-03-01
Three-dimensional (3-D) power Doppler angiography (3-D-PDA) allows visualisation of Doppler signals within the placenta, and their quantification is possible through the generation of vascular indices by the 4-D View software programme. This study aimed to investigate intra- and interobserver reproducibility of 3-D-PDA analysis of stored datasets at varying gestations, with the ultimate goal being to develop a tool for predicting placental dysfunction. Women with an uncomplicated, viable singleton pregnancy were scanned in the 12-, 16- or 20-week gestational age groups. 3-D-PDA datasets acquired of the whole placenta were analysed using the VOCAL software processing tool. Each volume was analysed by three observers twice in the A plane. Intra- and interobserver reliability was assessed by intraclass correlation coefficients (ICCs) and Bland-Altman plots. In each gestational age group, 20 low-risk women were scanned, resulting in 60 datasets in total. The ICC demonstrated a high level of measurement reliability at each gestation, with intraobserver values >0.90 and interobserver values >0.6 for the vascular indices. Bland-Altman plots also showed high levels of agreement. Systematic bias was seen at 20 weeks in the vascular indices obtained by different observers. This study demonstrates that 3-D-PDA data can be measured reliably by different observers from stored datasets up to 18 weeks gestation. Measurements become less reliable as gestation advances, with bias between observers evident at 20 weeks. Copyright © 2011 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
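A minimal NumPy sketch of the Bland-Altman computation used to assess agreement (toy readings, not study data):

    import numpy as np

    def bland_altman(a, b):
        """Mean bias and 95% limits of agreement between two observers'
        measurements of the same datasets."""
        a, b = np.asarray(a, float), np.asarray(b, float)
        diff = a - b
        bias = diff.mean()
        half_width = 1.96 * diff.std(ddof=1)
        return bias, bias - half_width, bias + half_width

    # Toy vascularisation-index readings from two observers.
    obs1 = [12.1, 15.3, 9.8, 20.4, 17.0]
    obs2 = [11.8, 15.9, 10.1, 19.7, 17.4]
    print(bland_altman(obs1, obs2))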
Lim, Hyun-ju; Weinheimer, Oliver; Wielpütz, Mark O.; Dinkel, Julien; Hielscher, Thomas; Gompelmann, Daniela; Kauczor, Hans-Ulrich; Heussel, Claus Peter
2016-01-01
Objectives Surgical or bronchoscopic lung volume reduction (BLVR) techniques can be beneficial for heterogeneous emphysema. Post-processing software tools for lobar emphysema quantification are useful for patient and target lobe selection, treatment planning and post-interventional follow-up. We aimed to evaluate the inter-software variability of emphysema quantification using fully automated lobar segmentation prototypes. Materials and Methods 66 patients with moderate to severe COPD who underwent CT for planning of BLVR were included. Emphysema quantification was performed using two modified versions of in-house software (without and with prototype advanced lung vessel segmentation; programs 1 [YACTA v.2.3.0.2] and 2 [YACTA v.2.4.3.1]), as well as one commercial program (3 [Pulmo3D VA30A_HF2]) and one pre-commercial prototype (4 [CT COPD ISP ver7.0]). The following parameters were computed for each segmented anatomical lung lobe and the whole lung: lobar volume (LV), mean lobar density (MLD), 15th percentile of lobar density (15th), emphysema volume (EV) and emphysema index (EI). Bland-Altman analysis (limits of agreement, LoA) and linear random effects models were used for comparison between the software. Results Segmentation using programs 1, 3 and 4 was unsuccessful in 1 (1%), 7 (10%) and 5 (7%) patients, respectively. Program 2 could analyze all datasets. The 53 patients with successful segmentation by all 4 programs were included for further analysis. For LV, programs 1 and 4 showed the largest mean difference of 72 ml and the widest LoA of [-356, 499 ml] (p<0.05). Programs 3 and 4 showed the largest mean difference of 4% and the widest LoA of [-7, 14%] for EI (p<0.001). Conclusions Only a single software program was able to successfully analyze all scheduled datasets. Although the mean bias of LV and EV was relatively low in lobar quantification, ranges of disagreement were substantial for both. For longitudinal emphysema monitoring, not only the scanning protocol but also the quantification software needs to be kept constant. PMID:27029047
Wilke, Marko
2018-02-01
This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter) from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1-75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI) were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender) as well as technical (field strength, data quality) predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php.
Anguita, Alberto; García-Remesal, Miguel; Graf, Norbert; Maojo, Victor
2016-04-01
Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin and identify elements (e.g. genes, compounds, pathways) that lead to interesting correlations. Normally, they must refer to additional public databases to enrich the information about the identified entities (e.g. scientific literature, published clinical trial results, etc.). While semantic integration techniques have traditionally focused on providing homogeneous access to private datasets, thus helping automate the first part of the research, and different solutions exist for browsing public data, there is still a need for tools that facilitate merging public repositories with private datasets. This paper presents a framework that automatically locates public data of interest to the researcher and semantically integrates it with existing private datasets. The framework has been designed as an extension of traditional data integration systems, and has been validated with an existing data integration platform from a European research project by integrating a private biological dataset with data from the National Center for Biotechnology Information (NCBI). Copyright © 2016 Elsevier Inc. All rights reserved.
Usability Prediction & Ranking of SDLC Models Using Fuzzy Hierarchical Usability Model
NASA Astrophysics Data System (ADS)
Gupta, Deepak; Ahlawat, Anil K.; Sagar, Kalpna
2017-06-01
Evaluation of software quality is an important aspect of controlling and managing software, and such evaluation enables improvements in the software process. Software quality is significantly dependent on software usability. Many researchers have proposed a number of usability models. Each model considers a set of usability factors but does not cover all usability aspects. Practical implementation of these models is still missing, as there is a lack of a precise definition of usability. Also, it is very difficult to integrate these models into current software engineering practices. In order to overcome these challenges, this paper aims to define the term `usability' using the proposed hierarchical usability model with its detailed taxonomy. The taxonomy considers generic evaluation criteria for identifying the quality components, bringing together factors, attributes and characteristics defined in various HCI and software models. For the first time, the usability model is also implemented to predict more accurate usability values. The proposed system is named the fuzzy hierarchical usability model and can be easily integrated into current software engineering practices. In order to validate the work, a dataset of six software development life cycle models is created and employed. These models are ranked according to their predicted usability values. This research also focuses on a detailed comparison of the proposed model with existing usability models.
Gpufit: An open-source toolkit for GPU-accelerated curve fitting.
Przybylski, Adrian; Thiel, Björn; Keller-Findeisen, Jan; Stock, Bernd; Bates, Mark
2017-11-16
We present a general purpose, open-source software library for estimation of non-linear parameters by the Levenberg-Marquardt algorithm. The software, Gpufit, runs on a Graphics Processing Unit (GPU) and executes computations in parallel, resulting in a significant gain in performance. We measured a speed increase of up to 42 times when comparing Gpufit with an identical CPU-based algorithm, with no loss of precision or accuracy. Gpufit is designed such that it is easily incorporated into existing applications or adapted for new ones. Multiple software interfaces, including C, Python, and Matlab bindings, ensure that Gpufit is accessible from most programming environments. The full source code is published as an open source software repository, making its function transparent to the user and facilitating future improvements and extensions. As a demonstration, we used Gpufit to accelerate an existing scientific image analysis package, yielding significantly improved processing times for super-resolution fluorescence microscopy datasets.
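For orientation, this is the single-fit CPU analogue of what Gpufit parallelizes: a Levenberg-Marquardt fit of a 1-D Gaussian via SciPy (shown instead of Gpufit's own Python binding, which ships with the library).

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, center, width, offset):
    return amplitude * np.exp(-((x - center) ** 2) / (2 * width ** 2)) + offset

# Simulated data for one fit; Gpufit's gain comes from dispatching
# thousands of such fits concurrently on the GPU.
x = np.linspace(-5, 5, 200)
rng = np.random.default_rng(1)
y = gaussian(x, 2.0, 0.5, 1.2, 0.1) + rng.normal(0, 0.05, x.size)

# curve_fit defaults to Levenberg-Marquardt for unconstrained problems.
popt, pcov = curve_fit(gaussian, x, y, p0=[1.0, 0.0, 1.0, 0.0])
print(np.round(popt, 3))
```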
16 CFR Appendix H to Part 305 - Cooling Performance and Cost for Central Air Conditioners
Code of Federal Regulations, 2014 CFR
2014-01-01
... for Central Air Conditioners. Manufacturer's rated cooling capacities (Btu/hr) and range of SEERs (low to high):
- Single Package Units, Central Air Conditioners (Cooling Only): all capacities, 10.6 to 16.5
- Single Package Units, Heat Pumps (Cooling Function): all capacities, 10.6 to 16.0
- Split System Units, Central Air Conditioners (Cooling Only) ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-03-01
PNNL and the Florida Solar Energy Center worked with Habitat for Humanity of Palm Beach County to upgrade an empty 1996 home with a 14.5-SEER (Seasonal Energy Efficiency Ratio) air conditioner, a heat pump water heater, CFLs, additional attic insulation, and air sealing, cutting utility bills by $872 annually.
What We See Is What We Choose: Seers and Seekers with Diversity
ERIC Educational Resources Information Center
Srinivasan, Prasanna
2017-01-01
Educators are always reminded that the act of teaching and learning has to be purposeful and highly relevant to all individuals and groups within particular societies. However, societies are highly complex, and they are traversed by varied categorical groupings based on individual and group identities. Taylor contends that categorical identity…
TableSeer: Automatic Table Extraction, Search, and Understanding
ERIC Educational Resources Information Center
Liu, Ying
2009-01-01
Tables are ubiquitous with a history that pre-dates that of sentential text. Authors often report a summary of their most important findings using tabular structure in documents. For example, scientists widely use tables to present the latest experimental results or statistical data in a condensed fashion. Along with the explosive development of…
Second primary malignancies in chronic myeloid leukemia.
Shah, Binay Kumar; Ghimire, Krishna Bilas
2014-12-01
Survival of patients with chronic myeloid leukemia (CML) has improved with the use of imatinib and other tyrosine kinase inhibitors. There are limited data on second primary malignancies (SPMs) in CML. We analyzed SPM rates among CML patients reported to the Surveillance, Epidemiology, and End Results (SEER) database during the pre-imatinib (1992-2000) and post-imatinib (2002-2009) eras. We used the SEER Multiple Primary-Standardized Incidence Ratio session to calculate standardized incidence ratios (SIRs). Among 8,511 adult CML patients, 446 patients developed 473 SPMs. The SIR for SPMs in CML patients was significantly higher than in the general population, with an observed/expected ratio of 1.27 (P < 0.05) and an absolute excess risk of 32.09 per 10,000 person-years. The rate of SPMs for cancers of all sites in the post-imatinib era was significantly higher than in the pre-imatinib era, with observed/expected ratios of 1.48 versus 1.06 (P = 0.03). This study showed that the risk of SPMs is higher among CML patients. The risk of SPMs is significantly higher in the post-imatinib era compared with the pre-imatinib era.
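For readers unfamiliar with the reported measures: the SIR is observed over expected events, and the absolute excess risk scales the excess to person-years. The sketch below recomputes both; the expected count and person-years are back-derived from the abstract's figures and are therefore approximate.

```python
def sir(observed, expected):
    """Standardized incidence ratio: observed / expected events."""
    return observed / expected

def absolute_excess_risk(observed, expected, person_years, per=10_000):
    """Excess events per `per` person-years of follow-up."""
    return (observed - expected) * per / person_years

# Approximate inputs back-derived from the abstract (473 SPMs, O/E ~1.27,
# AER ~32 per 10,000 person-years); not the study's internal tables.
observed, expected, person_years = 473, 372.4, 31_300
print(f"SIR = {sir(observed, expected):.2f}")
print(f"AER = {absolute_excess_risk(observed, expected, person_years):.1f} per 10,000 PY")
```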
PyFDAP: automated analysis of fluorescence decay after photoconversion (FDAP) experiments.
Bläßle, Alexander; Müller, Patrick
2015-03-15
We developed the graphical user interface PyFDAP for the fitting of linear and non-linear decay functions to data from fluorescence decay after photoconversion (FDAP) experiments. PyFDAP structures and analyses large FDAP datasets and features multiple fitting and plotting options. PyFDAP was written in Python and runs on Ubuntu Linux, Mac OS X and Microsoft Windows operating systems. The software, a user guide and a test FDAP dataset are freely available for download from http://people.tuebingen.mpg.de/mueller-lab. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A decision support system for map projections of small scale data
Finn, Michael P.; Usery, E. Lynn; Posch, Stephan T.; Seong, Jeong Chang
2004-01-01
The use of commercial geographic information system software to process large raster datasets of terrain elevation, population, land cover, vegetation, soils, temperature, and rainfall requires both projection from spherical coordinates to plane coordinate systems and transformation from one plane system to another. Decision support systems deliver information resulting in knowledge that assists in policies, priorities, or processes. This paper presents an approach to handling the problems of raster dataset projection and transformation through the development of a Web-enabled decision support system to aid users of transformation processes with the selection of appropriate map projections based on data type, areal extent, location, and preservation properties.
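A minimal sketch of the projection step such a system would recommend, assuming pyproj is available: reprojecting spherical (longitude/latitude) coordinates into an equal-area CRS, the preservation property a decision support system would select when areal statistics matter.

```python
from pyproj import Transformer

# WGS 84 geographic coordinates -> EPSG:6933 (cylindrical equal-area), the
# kind of property-driven choice a DSS would recommend for areal statistics.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:6933", always_xy=True)

lon, lat = -98.5, 39.8  # a point in the central United States
x, y = transformer.transform(lon, lat)
print(f"{x:.1f} m, {y:.1f} m")
```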
Ray Meta: scalable de novo metagenome assembly and profiling
2012-01-01
Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net. PMID:23259615
Clock Agreement Among Parallel Supercomputer Nodes
Jones, Terry R.; Koenig, Gregory A.
2014-04-30
This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers, including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming applications and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.
User's Guide for the MapImage Reprojection Software Package, Version 1.01
Finn, Michael P.; Trent, Jason R.
2004-01-01
Scientists routinely accomplish small-scale geospatial modeling in the raster domain, using high-resolution datasets (such as 30-m data) for large parts of continents and low-resolution to high-resolution datasets for the entire globe. Recently, Usery and others (2003a) expanded on the previously limited empirical work with real geographic data by compiling and tabulating the accuracy of categorical areas in projected raster datasets of global extent. Geographers and applications programmers at the U.S. Geological Survey's (USGS) Mid-Continent Mapping Center (MCMC) undertook an effort to expand and evolve an internal USGS software package, MapImage, or mapimg, for raster map projection transformation (Usery and others, 2003a). Daniel R. Steinwand of Science Applications International Corporation, Earth Resources Observation Systems Data Center in Sioux Falls, S. Dak., originally developed mapimg for the USGS, basing it on the USGS's General Cartographic Transformation Package (GCTP). It operated as a command line program on the Unix operating system. Through efforts at MCMC, and in coordination with Mr. Steinwand, this program has been transformed from an application based on a command line into a software package based on a graphic user interface for Windows, Linux, and Unix machines. Usery and others (2003b) pointed out that many commercial software packages do not use exact projection equations and that even when exact projection equations are used, the software often results in error and sometimes does not complete the transformation for specific projections, at specific resampling resolutions, and for specific singularities. Direct implementation of point-to-point transformation with appropriate functions yields the variety of projections available in these software packages, but implementation with data other than points requires specific adaptation of the equations or prior preparation of the data to allow the transformation to succeed. Additional constraints apply to global raster data. It appears that some packages use the USGS's GCTP or similar point transformations without adaptation to the specific characteristics of raster data (Usery and others, 2003b). It is most common for programs to compute transformations of raster data in an inverse fashion. Such mapping can result in an erroneous position and replicate data or create pixels not in the original space. As Usery and others (2003a) indicated, mapimg performs a corresponding forward transformation to ensure the same location results from both methods. The primary benefit of this function is to mask cells outside the domain. MapImage 1.01 is now on the Web. You can download the User's Guide, source, and binaries from the following site: http://mcmcweb.er.usgs.gov/carto_research/projection/acc_proj_data.html
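The inverse-then-forward check described above can be sketched as follows; this is not the USGS code (pyproj stands in for GCTP, and the CRS and tolerance are assumptions), but it shows the round-trip logic used to mask cells outside the projection domain.

```python
import numpy as np
from pyproj import Transformer

fwd = Transformer.from_crs("EPSG:4326", "EPSG:6933", always_xy=True)
inv = Transformer.from_crs("EPSG:6933", "EPSG:4326", always_xy=True)

def roundtrip_mask(xs, ys, tol_m=1.0):
    """True where an output cell's inverse->forward round trip is consistent."""
    lon, lat = inv.transform(xs, ys)
    x2, y2 = fwd.transform(lon, lat)
    return np.hypot(x2 - xs, y2 - ys) < tol_m

# Output-grid sample points (meters in the projected space); the last point
# lies outside the projection's valid domain and should be masked out.
xs = np.array([0.0, 5_000_000.0, 0.0])
ys = np.array([0.0, 3_000_000.0, 9.9e6])
print(roundtrip_mask(xs, ys))
```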
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.
Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio
2018-01-01
The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556
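A hedged sketch of the insert-throughput measurement described, assuming a local MongoDB instance and JSON-serialized compositions; the collection and field names are hypothetical, and the real ORBDA benchmark harness is more involved.

```python
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a local MongoDB
collection = client["orbda_bench"]["compositions"]

# Stand-ins for JSON-encoded openEHR compositions (hypothetical fields).
docs = [{"composition_id": i,
         "archetype": "openEHR-EHR-COMPOSITION.encounter.v1",
         "payload": "..."} for i in range(10_000)]

start = time.perf_counter()
collection.insert_many(docs)
elapsed = time.perf_counter() - start
print(f"{len(docs) / elapsed:.0f} inserts/s")
```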
Bridging Archival Standards: Building Software to Translate Metadata Between PDS3 and PDS4
NASA Astrophysics Data System (ADS)
De Cesare, C. M.; Padams, J. H.
2018-04-01
Transitioning datasets from PDS3 to PDS4 requires manual and detail-oriented work. To increase efficiency and reduce human error, we've built the Label Mapping Tool, which compares a PDS3 label to a PDS4 label template and outputs mappings between the two.
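A minimal sketch of the comparison the Label Mapping Tool automates, assuming a simplified PDS3 `KEYWORD = value` label and a hypothetical keyword-to-element mapping table; real PDS3 labels and the PDS4 information model are far richer.

```python
import xml.etree.ElementTree as ET

pds3_label = """\
TARGET_NAME = MARS
INSTRUMENT_ID = HIRISE
START_TIME = 2017-01-01T00:00:00
"""

# Hypothetical mapping from PDS3 keywords to PDS4 element names.
keyword_map = {
    "TARGET_NAME": "name",           # e.g. Target_Identification/name
    "START_TIME": "start_date_time", # e.g. Time_Coordinates/start_date_time
}

pairs = dict(line.split("=", 1) for line in pds3_label.strip().splitlines())
pairs = {k.strip(): v.strip() for k, v in pairs.items()}

root = ET.Element("Product_Observational")
for pds3_key, pds4_elem in keyword_map.items():
    if pds3_key in pairs:
        ET.SubElement(root, pds4_elem).text = pairs[pds3_key]

print(ET.tostring(root, encoding="unicode"))
print("unmapped PDS3 keywords:", sorted(set(pairs) - set(keyword_map)))
```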
NASA Technical Reports Server (NTRS)
Pourmal, Elena
2016-01-01
The HDF Group maintains and evolves the HDF software used by the NASA ESDIS program to manage remote sensing data. In this talk we will discuss new features of HDF (virtual datasets, Single-Writer/Multiple-Reader (SWMR) access, and community-supported HDF5 compression filters) that address the storage and I/O performance requirements of applications that work with ESDIS data products.
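Of the features mentioned, SWMR access is the easiest to show compactly; below is a standard h5py writer-side sketch (file and dataset names are arbitrary).

```python
import h5py
import numpy as np

# Writer: create the file with the latest format, then enable SWMR so that
# concurrent readers can safely poll the growing dataset.
with h5py.File("stream.h5", "w", libver="latest") as f:
    dset = f.create_dataset("samples", shape=(0,), maxshape=(None,), dtype="f4")
    f.swmr_mode = True
    for _ in range(3):
        dset.resize((dset.shape[0] + 100,))
        dset[-100:] = np.random.rand(100)
        dset.flush()  # make the new rows visible to SWMR readers

# A reader in another process would open the same file with:
#   h5py.File("stream.h5", "r", libver="latest", swmr=True)
```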
Software Tools | Office of Cancer Clinical Proteomics Research
The CPTAC program develops new approaches to elucidate aspects of the molecular complexity of cancer from large-scale proteogenomic datasets and to advance them toward precision medicine. Part of the CPTAC mission is to make data and tools available and accessible to the greater research community to accelerate the discovery process.
... WGS 84. The shapefile was generated from the raster dataset and then projected to Geographic Decimal Degrees (WGS 84). The user is granted the right, without any fee or cost, to use, copy, modify, alter, and enhance copies of the data. Further, the user of this data agrees to credit NREL in any publications or software ...
Optimizing Performance of Scientific Visualization Software to Support Frontier-Class Computations
2015-08-01
... (Hypersonic Sciences Branch) for providing sample datasets and permission to use an image of a Q-criterion isosurface for this report; Dr Anders Grimsrud ... EnSight CSM and CFD post-processing; c2014 [accessed 2015 July 6] http://www.ceisoftware.com. XDMF; 2014 Nov 7 [accessed 2015 July 6] http...
Large-Scale Dynamic Observation Planning for Unmanned Surface Vessels
2007-06-01
... programming language. In addition, the useful development software NetBeans IDE is free and makes the use of Java very user-friendly. ... We implemented the greedy and 3PAA algorithms in Java using the NetBeans IDE version 5.5. The test datasets were generated in MATLAB. ...
Carreer, William J.; Flight, Robert M.; Moseley, Hunter N. B.
2013-01-01
New metabolomics applications of ultra-high resolution and accuracy mass spectrometry can provide thousands of detectable isotopologues, with the number of potentially detectable isotopologues increasing exponentially with the number of stable isotopes used in newer isotope tracing methods like stable isotope-resolved metabolomics (SIRM) experiments. This huge increase in usable data requires software capable of correcting the large number of isotopologue peaks resulting from SIRM experiments in a timely manner. We describe the design of a new algorithm and software system capable of handling these high volumes of data, while including quality control methods for maintaining data quality. We validate this new algorithm against a previous single isotope correction algorithm in a two-step cross-validation. Next, we demonstrate the algorithm and correct for the effects of natural abundance for both 13C and 15N isotopes on a set of raw isotopologue intensities of UDP-N-acetyl-D-glucosamine derived from a 13C/15N-tracing experiment. Finally, we demonstrate the algorithm on a full omics-level dataset. PMID:24404440
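The single-isotope case of natural-abundance correction has a compact textbook form, sketched below for 13C: build the binomial correction matrix for a molecule with n carbons and solve the linear system. The toy intensities are invented, and this is not the paper's optimized multi-isotope algorithm.

```python
import math
import numpy as np

def correction_matrix(n_carbons, p=0.0107):
    """M[i, j]: probability that a species with j labeled carbons is observed
    at mass shift i, due to natural 13C in its n - j unlabeled positions."""
    n = n_carbons
    M = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        for i in range(j, n + 1):
            M[i, j] = math.comb(n - j, i - j) * p ** (i - j) * (1 - p) ** (n - i)
    return M

observed = np.array([80.0, 15.0, 40.0, 5.0])   # toy isotopologue intensities
M = correction_matrix(n_carbons=3)
corrected = np.linalg.solve(M, observed)
print(np.round(corrected, 2))
```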
GeneXplorer: an interactive web application for microarray data visualization and analysis.
Rees, Christian A; Demeter, Janos; Matese, John C; Botstein, David; Sherlock, Gavin
2004-10-01
When publishing large-scale microarray datasets, it is of great value to create supplemental websites where either the full data, or selected subsets corresponding to figures within the paper, can be browsed. We set out to create a CGI application containing many of the features of some of the existing standalone software for the visualization of clustered microarray data. We present GeneXplorer, a web application for interactive microarray data visualization and analysis in a web environment. GeneXplorer allows users to browse a microarray dataset in an intuitive fashion. It provides simple access to microarray data over the Internet and uses only HTML and JavaScript to display graphic and annotation information. It provides radar and zoom views of the data, allows display of the nearest neighbors to a gene expression vector based on their Pearson correlations and provides the ability to search gene annotation fields. The software is released under the permissive MIT Open Source license, and the complete documentation and the entire source code are freely available for download from CPAN http://search.cpan.org/dist/Microarray-GeneXplorer/.
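The nearest-neighbor view GeneXplorer provides reduces to ranking genes by Pearson correlation against a query expression vector; a minimal numpy version with a synthetic matrix and a hypothetical gene index:

```python
import numpy as np

rng = np.random.default_rng(2)
expression = rng.normal(size=(500, 40))   # genes x arrays (synthetic)
query_gene = 42

# Pearson correlation of every gene with the query gene's expression vector.
r = np.corrcoef(expression)[query_gene]
neighbors = np.argsort(r)[::-1][1:11]     # top 10, excluding the gene itself
print(neighbors, np.round(r[neighbors], 2))
```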
The Spectral Image Processing System (SIPS): Software for integrated analysis of AVIRIS data
NASA Technical Reports Server (NTRS)
Kruse, F. A.; Lefkoff, A. B.; Boardman, J. W.; Heidebrecht, K. B.; Shapiro, A. T.; Barloon, P. J.; Goetz, A. F. H.
1992-01-01
The Spectral Image Processing System (SIPS) is a software package developed by the Center for the Study of Earth from Space (CSES) at the University of Colorado, Boulder, in response to a perceived need to provide integrated tools for analysis of imaging spectrometer data both spectrally and spatially. SIPS was specifically designed to deal with data from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the High Resolution Imaging Spectrometer (HIRIS), but was tested with other datasets including the Geophysical and Environmental Research Imaging Spectrometer (GERIS), GEOSCAN images, and Landsat TM. SIPS was developed using the 'Interactive Data Language' (IDL). It takes advantage of high speed disk access and fast processors running under the UNIX operating system to provide rapid analysis of entire imaging spectrometer datasets. SIPS allows analysis of single or multiple imaging spectrometer data segments at full spatial and spectral resolution. It also allows visualization and interactive analysis of image cubes derived from quantitative analysis procedures such as absorption band characterization and spectral unmixing. SIPS consists of three modules: SIPS Utilities, SIPS_View, and SIPS Analysis. SIPS version 1.1 is described below.
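Among the quantitative procedures mentioned, linear spectral unmixing has a compact core: each pixel spectrum is modeled as a non-negative mixture of endmember spectra. A generic sketch with SciPy's non-negative least squares, using synthetic endmembers rather than AVIRIS data:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
n_bands, n_endmembers = 224, 3            # AVIRIS-like band count
E = np.abs(rng.normal(size=(n_bands, n_endmembers)))  # endmember spectra

true_abundances = np.array([0.6, 0.3, 0.1])
pixel = E @ true_abundances + rng.normal(0, 0.01, n_bands)

# Non-negative least squares recovers per-pixel abundance fractions.
abundances, residual = nnls(E, pixel)
print(np.round(abundances, 3))
```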
Array data extractor (ADE): a LabVIEW program to extract and merge gene array data.
Kurtenbach, Stefan; Kurtenbach, Sarah; Zoidl, Georg
2013-12-01
Large data sets from gene expression array studies are publicly available, offering information highly valuable for research across many disciplines ranging from fundamental to clinical research. Highly advanced bioinformatics tools have been made available to researchers, but a demand persists for user-friendly software allowing researchers to quickly extract expression information for multiple genes from multiple studies. Here, we present a user-friendly LabVIEW program to automatically extract gene expression data for a list of genes from multiple normalized microarray datasets. Functionality was tested for 288 class A G protein-coupled receptors (GPCRs) and expression data from 12 studies comparing normal and diseased human hearts. Results confirmed known regulation of the beta 1 adrenergic receptor and further indicated novel research targets. Although existing software allows for complex data analyses, the LabVIEW-based program presented here, "Array Data Extractor (ADE)", provides users with a tool to retrieve meaningful information from multiple normalized gene expression datasets in a fast and easy way. Further, the graphical programming language used in LabVIEW allows applying changes to the program without the need for advanced programming knowledge.
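What ADE automates can be approximated in a few lines of pandas for tables already normalized to a shared gene-identifier column; the file names and column labels below are hypothetical.

```python
import pandas as pd

genes_of_interest = ["ADRB1", "ADRB2", "GPR35"]
study_files = ["study_01.csv", "study_02.csv"]  # normalized expression tables

frames = []
for path in study_files:
    df = pd.read_csv(path)                       # expects a 'gene' column
    hits = df[df["gene"].isin(genes_of_interest)].copy()
    hits["study"] = path
    frames.append(hits)

merged = pd.concat(frames, ignore_index=True)
print(merged.head())
```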
Lee, Won-Joon; Wilkinson, Caroline M; Hwang, Hyeon-Shik; Lee, Sang-Mi
2015-05-01
Accuracy is the most important factor supporting the reliability of forensic facial reconstruction (FFR), assessed by comparison with the corresponding actual face. A number of methods have been employed to evaluate the objective accuracy of FFR. Recently, the degree of resemblance between a computer-generated FFR and the actual face has been measured by geometric surface comparison. In this study, three FFRs were produced employing live adult Korean subjects and three-dimensional computerized modeling software. The deviations of the facial surfaces between each FFR and the head scan CT of the corresponding subject were analyzed in reverse modeling software. The results were compared with those from a previous study that applied the same methodology as this study except for the average facial soft tissue depth dataset. The three FFRs of this study, which applied the updated dataset, demonstrated smaller deviation errors between the facial surfaces of the FFR and the corresponding subject than those from the previous study. The results suggest that appropriate average tissue depth data are important for increasing the quantitative accuracy of FFR. © 2015 American Academy of Forensic Sciences.
Developing a new global network of river reaches from merged satellite-derived datasets
NASA Astrophysics Data System (ADS)
Lion, C.; Allen, G. H.; Beighley, E.; Pavelsky, T.
2015-12-01
In 2020, the Surface Water and Ocean Topography (SWOT) satellite, a joint mission of NASA/CNES/CSA/UK, will be launched. One of its major products will be measurements of continental water extent, including the width, height, and slope of rivers and the surface area and elevations of lakes. The mission will improve the monitoring of continental water and also our understanding of the interactions between different hydrologic reservoirs. For rivers, SWOT measurements of slope must be carried out over predefined river reaches. As such, an a priori dataset for rivers is needed in order to facilitate analysis of the raw SWOT data. The information required to produce this dataset includes measurements of river width, elevation, slope, planform, river network topology, and flow accumulation. To produce this product, we have linked two existing global datasets: the Global River Widths from Landsat (GRWL) database, which contains river centerline locations, widths, and a braiding index derived from Landsat imagery, and a modified version of the HydroSHEDS hydrologically corrected digital elevation product, which contains heights and flow accumulation measurements for streams at 3 arcsecond spatial resolution. Merging these two datasets requires considerable care. The difficulties lie, among others, in the difference in resolution (30 m versus 3 arcseconds) and in the ages of the datasets (2000 versus ~2010; some rivers have moved, and the braided sections differ). As such, we have developed custom software to merge the two datasets, taking into account the spatial proximity of river channels in the two datasets and ensuring that flow accumulation in the final dataset always increases downstream. Here, we present our preliminary results for a portion of South America and demonstrate the strengths and weaknesses of the method.
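One of the stated merge constraints, that flow accumulation never decreases downstream, can be enforced with a single downstream pass; a sketch over a hypothetical reach list ordered head to mouth:

```python
def enforce_monotonic_accumulation(reaches):
    """Given reaches ordered upstream-to-downstream, raise each reach's flow
    accumulation to at least the running maximum seen so far."""
    running_max = 0.0
    fixed = []
    for reach_id, accum in reaches:
        running_max = max(running_max, accum)
        fixed.append((reach_id, running_max))
    return fixed

# Hypothetical (reach_id, flow_accumulation_km2) pairs with one bad value.
reaches = [("r1", 120.0), ("r2", 340.0), ("r3", 310.0), ("r4", 900.0)]
print(enforce_monotonic_accumulation(reaches))
```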
Palta, Manisha; Palta, Priya; Bhavsar, Nrupen A; Horton, Janet K; Blitzblau, Rachel C
2015-01-15
The Cancer and Leukemia Group B (CALGB) 9343 randomized phase 3 trial established lumpectomy and adjuvant therapy with tamoxifen alone, rather than both radiotherapy and tamoxifen, as a reasonable treatment course for women aged ≥70 years with clinical stage I (AJCC 7th edition), estrogen receptor-positive breast cancer. An analysis of the Surveillance, Epidemiology, and End Results (SEER) registry was undertaken to assess practice patterns before and after the publication of this landmark study. The SEER database from 2000 to 2009 was used to identify 40,583 women aged ≥70 years who were treated with breast-conserving surgery for clinical stage I, estrogen receptor-positive and/or progesterone receptor-positive breast cancer. The percentage of patients receiving radiotherapy and the type of radiotherapy delivered were assessed over time. Administration of radiotherapy was further assessed across age groups; SEER cohort; and tumor size, grade, and laterality. Approximately 68.6% of patients treated between 2000 and 2004 compared with 61.7% of patients who were treated between 2005 and 2009 received some form of adjuvant radiotherapy (P < .001). Coinciding with a decline in the use of external beam radiotherapy, there was an increase in the use of implant radiotherapy from 1.4% between 2000 and 2004 to 6.2% between 2005 and 2009 (P < .001). There were significant reductions in the frequency of radiotherapy delivery over time across age groups, tumor size, and tumor grade and regardless of laterality (P < .001 for all). Randomized phase 3 data support the omission of adjuvant radiotherapy in elderly women with early-stage breast cancer. Analysis of practice patterns before and after the publication of these data indicates a significant decline in radiotherapy use; however, nearly two-thirds of women continue to receive adjuvant radiotherapy. © 2014 American Cancer Society.
Health Insurance Affects Head and Neck Cancer Treatment Patterns and Outcomes.
Inverso, Gino; Mahal, Brandon A; Aizer, Ayal A; Donoff, R Bruce; Chuang, Sung-Kiang
2016-06-01
The purpose of this study is to examine the effect of insurance coverage on stage of presentation, treatment, and survival of head and neck cancer (HNC). A retrospective study was conducted using the Surveillance, Epidemiology, and End Results (SEER) program to identify patients diagnosed with HNC. The primary variable of interest was insurance analyzed as a dichotomous variable: Patients were considered uninsured if they were classified as "uninsured" by SEER, whereas patients were considered insured if they were defined by SEER as "any Medicaid," "insured," or "insured/no specifics." The outcomes of interest were cancer stage at presentation (M0 vs M1), receipt of definitive treatment, and HNC-specific mortality (HNCSM). Multivariable logistic regression modeled the association between insurance status and stage at presentation, as well as between insurance status and receipt of definitive treatment, whereas HNCSM was modeled using Fine and Gray competing risks. Sensitivity logistic regression analysis was used to determine whether observed interactions remained significant by insurance type (privately insured, Medicaid, and uninsured). Patients without medical insurance were more likely to present with metastatic cancer (adjusted odds ratio, 1.60; P < .001), were more likely to not receive definitive treatment (adjusted odds ratio, 1.64; P < .001), and had a higher risk of HNCSM (adjusted hazard ratio, 1.20; P = .002). Sensitivity analyses showed that when results were stratified by insurance type, significant interactions remained for uninsured patients and patients with Medicaid. Uninsured patients and patients with Medicaid are more likely to present with metastatic disease, are more likely to not be treated definitively, and are at a higher risk of HNCSM. The treatment gap between Medicaid and private insurance observed in this study should serve as an immediate policy target for health care reform. Copyright © 2016 The American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
Ammann, Eric M; Shanafelt, Tait D; Larson, Melissa C; Wright, Kara B; McDowell, Bradley D; Link, Brian K; Chrischilles, Elizabeth A
2017-12-01
Novel targeted therapies offer excellent short-term outcomes in patients with chronic lymphocytic leukemia and small lymphocytic lymphoma (CLL/SLL). However, there is disagreement over how widely these therapies should be used in place of standard chemo-immunotherapy (CIT). We investigated whether stratification on the length of the interval between first-line (T1) and second-line (T2) treatments could identify a subgroup of older patients with relapsed CLL/SLL with an expectation of normal overall survival, and for whom CIT could be an acceptable treatment choice. Patients with relapsed CLL/SLL who received T2 were identified from the SEER-Medicare Linked Database. Five-year relative survival (RS5; ie, the ratio of observed survival to expected survival based on population life tables) was assessed after stratifying patients on the interval between T1 and T2. We then validated our findings in the Mayo Clinic CLL Database. Among 1974 SEER-Medicare patients (median age = 77 years) who received T2 for relapsed CLL/SLL, longer time-to-retreatment was associated with a modestly improved prognosis (P = .01). However, even among those retreated ≥ 3 years after T1, survival was poor compared with the general population (RS5 = 0.50 or lower in SEER-Medicare). Similar patterns were observed in the younger Mayo validation cohort, although prognosis was better overall among the Mayo patients, and patients with favorable fluorescence in situ hybridization retreated ≥ 3 years after T1 had close to normal expected survival (RS5 = 0.87). Further research is needed to quantify the degree to which targeted therapies provide meaningful improvements over CIT in long-term outcomes for older patients with relapsed CLL/SLL. Copyright © 2017 Elsevier Inc. All rights reserved.
Tien, Yu-Yu; Wright, Kara; Halfdanarson, Thorvardur R.; Abu-Hejleh, Taher; Brooks, John M.
2016-01-01
Objectives The purpose of this study was to assess to what extent geographic variation in adjuvant treatment for non-small cell lung cancer (NSCLC) patients would remain, after controlling for patient and area-level characteristics. Materials and Methods A retrospective cohort of 18,410 Medicare beneficiaries with resected, stage I-IIIA NSCLC was identified from the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked database. Adjuvant therapies were classified as adjuvant chemotherapy (ACT), postoperative radiation therapy (PORT), or no adjuvant therapy. Predicted treatment probabilities were estimated for each patient given their clinical, demographic, and area-level characteristics with multivariate logistic regression. Area Treatment Ratios were used to estimate the propensity of patients in a local area to receive an adjuvant treatment, controlling for characteristics of patients in the area. Areas were categorized as low-, mid- and high-use and mapped for two representative SEER registries. Results Overall, 10%, 12%, and 78% of patients received ACT, PORT and no adjuvant therapy, respectively. Age, sex, stage, type and year of surgery, and comorbidity were associated with adjuvant treatment use. Even after adjusting for patient characteristics, substantial geographic treatment variation remained. High- and low-use areas were tightly juxtaposed within and across SEER registries, often within the same county. In some local areas, patients were up to eight times more likely to receive adjuvant therapy than expected, given their characteristics. On the other hand, almost a quarter of patients lived in local areas in which patients were more than three times less likely to receive ACT than would be predicted. Conclusion Controlling for patient and area-level covariates did not remove geographic variation in adjuvant therapies for resected NSCLC patients. A greater proportion of patients were treated less than expected, rather than more than expected. Further research is needed to better understand its causes and potential impact on outcomes. PMID:27040848
Halpern, Michael T; Urato, Matthew P; Kent, Erin E
2017-01-01
Providing high-quality medical care for individuals with cancer during their last year of life involves a range of challenges. An important component of high-quality care during this critical period is ensuring optimal patient satisfaction. The objective of the current study was to assess factors influencing health care ratings among individuals with cancer within 1 year before death. The current study used the Surveillance, Epidemiology, and End Results (SEER)-Consumer Assessment of Healthcare Providers and Systems (CAHPS) data set, a new data resource linking patient-reported information from the CAHPS Medicare Survey with clinical information from the National Cancer Institute's SEER program. The study included 5102 Medicare beneficiaries diagnosed with cancer who completed CAHPS between 1998 and 2011 within 1 year before their death. Multivariable logistic regression analyses examined associations between patient demographic and insurance characteristics with 9 measures of health care experience. Patients with higher general or mental health status were significantly more likely to indicate excellent experience with nearly all measures examined. Sex, race/ethnicity, and education also were found to be significant predictors for certain ratings. Greater time before death predicted an increased likelihood of higher ratings for health plan and specialist physician. Clinical characteristics were found to have few significant associations with experience of care. Individuals in fee-for-service Medicare plans (vs Medicare Advantage) had a greater likelihood of excellent experience with health plans, getting care quickly, and getting needed care. Among patients with cancer within 1 year before death, experience with health plans, physicians, and medical care were found to be associated with sociodemographic, insurance, and clinical characteristics. These findings provide guidance for the development of programs to improve the experience of care among individuals with cancer. Cancer 2017;123:336-344. © 2016 American Cancer Society. © 2016 American Cancer Society. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
Hodgkin disease survival in Europe and the U.S.: prognostic significance of morphologic groups.
Allemani, Claudia; Sant, Milena; De Angelis, Roberta; Marcos-Gragera, Rafael; Coebergh, Jan Willem
2006-07-15
The survival of patients with Hodgkin disease (HD) varies markedly across Europe and generally is shorter than the survival of patients in the U.S. To investigate these differences, the authors compared population-based HD survival in relation to morphologic type among populations in Europe and the U.S. The authors analyzed 6726 patients from 37 cancer registries that participated in EUROCARE-3 and 3442 patients from 9 U.S. Surveillance, Epidemiology, and End Results (SEER) registries. Patients were diagnosed during 1990 to 1994 and were followed for at least 5 years. The European registries were grouped into EUROCARE West, EUROCARE UK, and EUROCARE East. Morphologic groups were nodular sclerosis, mixed cellularity, lymphocyte depletion, lymphocyte predominance, and not otherwise specified (NOS). The influence of morphology on geographic differences in 5-year relative survival was explored by using multiple regression analysis. In the model that was adjusted by age, gender, and years since diagnosis, the relative excess risk (RER) of death was 0.93 (95% confidence interval [95% CI], 0.81-1.05) in EUROCARE West, 1.15 (95% CI, 1.04-1.28) in EUROCARE UK, and 1.39 (95% CI, 1.21-1.60) in EUROCARE East (compared with the SEER data). When morphology was included, EUROCARE UK and SEER no longer differed (RER, 1.06; 95% CI, 0.95-1.18). Morphology distribution varied markedly across Europe and much less in the U.S., with nodular sclerosis less common in Europe (45.9%) than the U.S. (61.7%). The RER data showed that patients who had lymphocyte depletion, NOS, and mixed cellularity had significantly worse prognoses compared with patients who had nodular sclerosis, whereas patients who had lymphocyte predominance had the best prognosis. The current results provide population-based evidence that morphology strongly influences the prognosis of patients with HD. However, differences in the morphologic case mix explain only some of the geographic variations observed in survival.
Chen, Vivien W.; Ruiz, Bernardo A.; Hsieh, Mei-Chin; Wu, Xiao-Cheng; Ries, Lynn; Lewis, Denise R.
2014-01-01
Introduction The American Joint Committee on Cancer (AJCC) 7th edition introduced major changes in the staging of lung cancer, including the Tumor (T), Node (N), Metastasis (M) (TNM) system and new stage/prognostic site-specific factors (SSFs), collected under the Collaborative Stage Version 2 (CSv2) Data Collection System. The intent was to improve staging precision, which could guide treatment options and ultimately lead to better survival. This report examines stage trends, the change in stage distributions from the AJCC 6th to the 7th edition, and findings of the prognostic SSFs for 2010 lung cancer cases. Methods Data were from the November 2012 submission of 18 Surveillance, Epidemiology, and End Results (SEER) Program population-based registries. A total of 344 797 cases of lung cancer, diagnosed in 2004-2010, were analyzed. Results The percentages of small tumors and early stage lung cancer cases increased from 2004 to 2010. The AJCC 7th edition, implemented for the 2010 diagnosis year, subclassified tumor size and reclassified multiple tumor nodules, pleural effusions, and involvement of tumors in the contralateral lung, resulting in a slight decrease in stage IB and stage IIIB and a small increase in stage IIA and stage IV. Overall, about 80% of cases remained in the same stage group under the AJCC 6th and 7th editions. About 21% of lung cancer patients had separate tumor nodules in the ipsilateral (same) lung, and 23% of the surgically resected patients had visceral pleural invasion, both adverse prognostic factors. Conclusion It is feasible for high quality population-based registries such as the SEER Program to collect more refined staging and prognostic SSFs that allow better categorization of lung cancer patients with different clinical outcomes and assessment of their survival. PMID:25412390
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pollom, Erqi L., E-mail: erqiliu@stanford.edu; Wang, Guanying; Harris, Jeremy P.
Purpose: We examined the impact of intensity modulated radiation therapy (IMRT) on hospitalization rates in the Surveillance, Epidemiology, and End Results (SEER)–Medicare population with anal squamous cell carcinoma (SCC). Methods and Materials: We performed a retrospective cohort study using the SEER-Medicare database. We identified patients with nonmetastatic anal SCC diagnosed between 2001 and 2011 and treated with chemoradiation therapy. We assessed the relation between IMRT and first hospitalization by use of a multivariate competing-risk model, as well as instrumental variable analysis, using provider IMRT affinity as our instrument. Results: Of the 1165 patients included in our study, 458 (39%) received IMRT. IMRT use increased over time and was associated more with regional and provider characteristics than with patient characteristics. The 3- and 6-month cumulative incidences of first hospitalization were 41.9% (95% confidence interval [CI], 37.3%-46.4%) and 47.6% (95% CI, 43.0%-52.2%), respectively, for the IMRT cohort and 46.7% (95% CI, 43.0%-50.4%) and 52.1% (95% CI, 48.4%-55.7%), respectively, for the non-IMRT cohort. IMRT was associated with a decreased hazard of first hospitalization compared with 3-dimensional radiation techniques (hazard ratio, 0.70; 95% CI, 0.58-0.84; P=.0002). Instrumental variable analysis suggested an even greater reduction in hospitalizations with IMRT after controlling for unmeasured confounders. There was a trend toward improved overall survival with IMRT, with an adjusted hazard ratio of 0.77 (95% CI, 0.59-1.00; P=.05). Conclusions: The use of IMRT is associated with reduced hospitalizations in elderly patients with anal SCC. Further work is warranted to understand the long-term health and cost impact of IMRT, particularly for patient subgroups most at risk of toxicity and hospitalization.
Confocal imaging of transmembrane voltage by SEER of di-8-ANEPPS.
Manno, Carlo; Figueroa, Lourdes; Fitts, Robert; Ríos, Eduardo
2013-03-01
Imaging, optical mapping, and optical multisite recording of transmembrane potential (Vm) are essential for studying excitable cells and systems. The naphthylstyryl voltage-sensitive dyes, including di-8-ANEPPS, shift both their fluorescence excitation and emission spectra upon changes in Vm. Accordingly, they have been used for monitoring Vm in nonratioing and both emission and excitation ratioing modes. Their changes in fluorescence are usually much less than 10% per 100 mV. Conventional ratioing increases sensitivity to between 3 and 15% per 100 mV. Low sensitivity limits the value of these dyes, especially when imaged with low light systems like confocal scanners. Here we demonstrate the improvement afforded by shifted excitation and emission ratioing (SEER) as applied to imaging membrane potential in flexor digitorum brevis muscle fibers of adult mice. SEER--the ratioing of two images of fluorescence, obtained with different excitation wavelengths in different emission bands--was implemented in two commercial confocal systems. A conventional pinhole scanner, affording optimal setting of emission bands but less than ideal excitation wavelengths, achieved a sensitivity of up to 27% per 100 mV, nearly doubling the value found by conventional ratioing of the same data. A better pair of excitation lights should increase the sensitivity further, to 35% per 100 mV. The maximum acquisition rate with this system was 1 kHz. A fast "slit scanner" increased the effective rate to 8 kHz, but sensitivity was lower. In its high-sensitivity implementation, the technique demonstrated progressive deterioration of action potentials upon fatiguing tetani induced by stimulation patterns at >40 Hz, thereby identifying action potential decay as a contributor to fatigue onset. Using the fast implementation, we could image for the first time an action potential simultaneously at multiple locations along the t-tubule system. These images resolved the radially varying lag associated with propagation at a finite velocity.
Winters, Brian R; Wright, Jonathan L; Holt, Sarah K; Dash, Atreya; Gore, John L; Schade, George R
2017-09-05
Health related quality of life after radical cystectomy and ileal conduit is not well quantified at the population level. We evaluated health related quality of life in patients with bladder cancer compared with noncancer controls and patients with colorectal cancer using data from SEER (Surveillance, Epidemiology and End Results)-MHOS (Medicare Health Outcomes Survey). SEER-MHOS data from 1998 to 2013 were used to identify patients with bladder cancer and those with colorectal cancer who underwent extirpative surgery with ileal conduit or colostomy creation, respectively. A total of 166 patients with bladder cancer treated with radical cystectomy were propensity matched 1:5 to 830 noncancer controls and compared with 154 patients with colorectal cancer. Differences in Mental and Physical Component Summary scores as well as component subscores were determined between patients with bladder cancer, patients with colorectal cancer and noncancer controls. SEER-MHOS patients were more commonly male and white with a mean ± SD age of 77 ± 6 years. Patients treated with radical cystectomy had significantly lower Physical Component Summary scores, select physical subscale scores and all mental subscale scores compared with noncancer controls. These findings were similar in the subset of 40 patients treated with radical cystectomy who had available preoperative and postoperative survey data. Global Mental Component Summary scores did not differ significantly between the groups. No significant differences were observed in global Mental Component Summary, Physical Component Summary or subscale scores between patients with bladder cancer and patients with colorectal cancer. Patients with bladder cancer who undergo radical cystectomy have significant declines in multiple components of physical and mental health related quality of life vs noncancer controls, which mirror those of patients with colorectal cancer. Further longitudinal study is required to better codify the effectors of poor health related quality of life after radical cystectomy to improve patient expectations and outcomes. Copyright © 2018 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Julian C.; Kruser, Tim J.; Gondi, Vinai
Purpose: Comprehensive neck radiation therapy (RT) has been shown to increase cerebrovascular disease (CVD) risk in advanced-stage head-and-neck cancer. We assessed whether more limited neck RT used for early-stage (T1-T2 N0) glottic cancer is associated with increased CVD risk, using the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked database. Methods and Materials: We identified patients ≥66 years of age with early-stage glottic laryngeal cancer from SEER diagnosed from 1992 to 2007. Patients treated with combined surgery and RT were excluded. Medicare CPT codes for carotid interventions, Medicare ICD-9 codes for cerebrovascular events, and SEER data for stroke as the cause of death were collected. Similarly, Medicare CPT and ICD-9 codes for peripheral vascular disease (PVD) were assessed to serve as an internal control between treatment groups. Results: A total of 1413 assessable patients (RT, n=1055; surgery, n=358) were analyzed. The actuarial 10-year risk of CVD was 56.5% (95% confidence interval 51.5%-61.5%) for the RT cohort versus 48.7% (41.1%-56.3%) in the surgery cohort (P=.27). The actuarial 10-year risk of PVD did not differ between the RT (52.7% [48.1%-57.3%]) and surgery cohorts (52.6% [45.2%-60.0%]) (P=.89). Univariate analysis showed an increased association of CVD with more recent diagnosis (P=.001) and increasing age (P=.001). On multivariate Cox analysis, increasing age (P<.001) and recent diagnosis (P=.002) remained significantly associated with a higher CVD risk, whereas the association of RT and CVD remained not statistically significant (HR=1.11 [0.91-1.37], P=.31). Conclusions: Elderly patients with early-stage laryngeal cancer have a high burden of cerebrovascular events after surgical management or RT. RT and surgery are associated with comparable risk for subsequent CVD development after treatment in elderly patients.
Incidence of breast carcinoma in women with thyroid carcinoma.
Vassilopoulou-Sellin, R; Palmer, L; Taylor, S; Cooksley, C S
1999-02-01
Breast carcinoma and differentiated thyroid carcinoma (the most common endocrine malignancy) occur predominantly in women. An association between the two tumors has been suggested by some investigators, but the potential impact of treatment of one of these diseases on the development of the other remains unclear. The authors examined the relation between the occurrence of these two tumors. There were 41,686 patients with breast carcinoma and 3662 with thyroid carcinoma who registered at The University of Texas M. D. Anderson Cancer Center between March 1944 and April 1997. Women who received both diagnoses since 1976 were identified and incidence rates and relative risks of secondary tumor development were calculated. Surveillance, Epidemiology and End Results (SEER) program data on the age-adjusted incidences of these diseases during the same time period were used for the expected incidences in the same population. Among 18,931 women with a diagnosis of breast carcinoma since 1976, 11 developed differentiated thyroid carcinoma > or = 2 years after the diagnosis of breast carcinoma. These breast carcinoma patients contributed 129,336 person-years of follow-up; the observed incidence of thyroid carcinoma in this group was not different from that in a similar age group of women in the SEER database. Among 1013 women with a diagnosis of thyroid carcinoma since 1976, 24 developed breast carcinoma > or = 2 years after the diagnosis of thyroid carcinoma. These thyroid carcinoma patients contributed 8380 person-years of follow-up; the observed incidence of breast carcinoma in women ages 40-49 years was significantly higher than the expected incidence for women in the same age group in the SEER database. Breast carcinoma developing after thyroid carcinoma was diagnosed more frequently than expected in young adult women seen at the study institution since 1976. This potential association and plausible mechanisms of breast carcinoma development after thyroid carcinoma should be evaluated in larger cohorts of patients.
Rosenthal, Mariana; Johnson, Christopher J; Scoppa, Steve; Carter, Kris
2016-01-01
Investigations of suspected cancer clusters are resource intensive and rarely identify true clusters: among 428 publicly reported US investigations during 1990-2011, only 1 etiologic cluster was identified. In 2013, the Cancer Data Registry of Idaho (CDRI) was contacted regarding a suspected cancer cluster at a worksite (Cluster A) and among an occupational cohort (Cluster B). We investigated to determine whether these were true clusters. We derived investigation cohorts for Cluster A from facility-provided employee records and for Cluster B from professional licensing records. We used Registry Plus™ Link Plus to conduct probabilistic linkage of cohort members to the CDRI registry and completed matching through manual review by using LexisNexis®, Accurint®, and the Social Security Death Index. We calculated standardized incidence ratios (SIRs) using the MP-SIR session type in SEER*Stat with Idaho and US referent populations. For Cluster A, we identified 34 cancer cases during 9,689 person-years; compared with Idaho and US rates, 95 percent CIs for SIRs included 1.0 for 24 of 24 primary site categories. For Cluster B, we identified 78 cancer cases during 15,154 person-years; compared with Idaho rates, 95 percent CI for SIRs included 1.0 for 23 of 24 primary site categories and was less than 1.0 for lung and bronchus cancers, and compared with US rates, 95 percent CI for SIRs included 1.0 for 22 of 24 primary site categories and was less than 1.0 for lung and bronchus and colorectal cancers. We identified no statistically significant excess in cancer incidence in either cohort. SEER*Stat's MP-SIR is an efficient tool for performing SIR assessments, a Centers for Disease Control and Prevention/Council of State and Territorial Epidemiologists-recommended step when investigating suspected cancer clusters.
Panagopoulou, Paraskevi; Georgakis, Marios K; Baka, Margarita; Moschovi, Maria; Papadakis, Vassilios; Polychronopoulou, Sophia; Kourti, Maria; Hatzipantelis, Emmanuel; Stiakaki, Eftichia; Dana, Helen; Tragiannidis, Athanasios; Bouka, Evdoxia; Antunes, Luis; Bastos, Joana; Coza, Daniela; Demetriou, Anna; Agius, Domenic; Eser, Sultan; Gheorghiu, Raluca; Šekerija, Mario; Trojanowski, Maciej; Žagar, Tina; Zborovskaya, Anna; Ryzhov, Anton; Dessypris, Nick; Morgenstern, Daniel; Petridou, Eleni Th
2018-06-01
Neuroblastoma outcomes vary with disease characteristics, healthcare delivery and socio-economic indicators. We assessed survival patterns and prognostic factors for patients with neuroblastoma in 11 Southern and Eastern European (SEE) countries versus those in the US, including, for the first time, the Nationwide Registry for Childhood Hematological Malignancies and Solid Tumours (NARECHEM-ST) in Greece. Overall survival (OS) was calculated in 13 collaborating SEE childhood cancer registries (1829 cases, ∼1990-2016) and Surveillance, Epidemiology, and End Results (SEER), US (3072 cases, 1990-2012); Kaplan-Meier curves were used along with multivariable Cox regression models assessing the effect of age, gender, primary tumour site, histology, Human Development Index (HDI) and place of residence (urban/rural) on survival. The 5-year OS rates varied widely among the SEE countries (Ukraine: 45%, Poland: 81%) with the overall SEE rate (59%) being significantly lower than in SEER (77%; p < 0.001). In the common registration period within SEE (2000-2008), no temporal trend was noted as opposed to a significant increase in SEER. Age >12 months (hazard ratio [HR]: 2.8-4.7 in subsequent age groups), male gender (HR: 1.1), residence in rural areas (HR: 1.3), living in high (HR: 2.2) or medium (HR: 2.4) HDI countries and specific primary tumour location were associated with worse outcome; conversely, ganglioneuroblastoma subtype (HR: 0.28) was associated with higher survival rate. Allowing for the disease profile, children with neuroblastoma in SEE, especially those in rural areas and lower HDI countries, fare worse than patients in the US, mainly during the early years after diagnosis; this may be attributed to presumably modifiable socio-economic and healthcare system performance differentials warranting further research. Copyright © 2018 Elsevier Ltd. All rights reserved.
Oweira, Hani; Petrausch, Ulf; Helbling, Daniel; Schmidt, Jan; Mannhart, Meinrad; Mehrabi, Arianeb; Schöb, Othmar; Giryes, Anwar; Decker, Michael; Abdel-Rahman, Omar
2017-03-14
To evaluate the prognostic value of site-specific metastases among patients with metastatic pancreatic carcinoma registered within the Surveillance, Epidemiology and End Results (SEER) database, the SEER database (2010-2013) was queried through the SEER*Stat program to determine the presentation, treatment outcomes and prognostic outcomes of metastatic pancreatic adenocarcinoma according to the site of metastasis. In this study, metastatic pancreatic adenocarcinoma patients were classified according to the site of metastases (liver, lung, bone, brain and distant lymph nodes). We used the chi-square test to compare clinicopathological characteristics among the different sites of metastases, and Kaplan-Meier analysis with log-rank testing for survival comparisons. We employed a Cox proportional hazards model to perform multivariate analyses of the patient population, generating hazard ratios with corresponding 95% CIs. Statistical significance was defined as a two-tailed P value < 0.05. A total of 13233 patients with stage IV pancreatic cancer and known sites of distant metastases were identified in the period from 2010-2013 and included in the current analysis. Patients with isolated distant nodal involvement or lung metastases have better overall and pancreatic cancer-specific survival than patients with isolated liver metastases (for overall survival: lung vs liver metastases: P < 0.0001; distant nodal vs liver metastases: P < 0.0001) (for pancreatic cancer-specific survival: lung vs liver metastases: P < 0.0001; distant nodal vs liver metastases: P < 0.0001). Multivariate analysis revealed that age < 65 years, white race, being married, female gender, surgery to the primary tumor and surgery to the metastatic disease were associated with better overall survival and pancreatic cancer-specific survival. Pancreatic adenocarcinoma patients with isolated liver metastases have worse outcomes than patients with isolated lung or distant nodal metastases. Further research is needed to identify the highly selected subset of patients who may benefit from local treatment of the primary tumor and/or metastatic disease.
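The Kaplan-Meier and log-rank comparisons described map directly onto the lifelines package; a generic sketch with synthetic survival times standing in for the SEER extract:

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(4)
# Synthetic survival months for two metastatic sites (illustrative only).
t_liver = rng.exponential(4.0, 300)
t_lung = rng.exponential(7.0, 120)
e_liver = np.ones_like(t_liver)   # 1 = death observed
e_lung = np.ones_like(t_lung)

kmf = KaplanMeierFitter()
kmf.fit(t_liver, event_observed=e_liver, label="liver metastases")
print(kmf.median_survival_time_)

result = logrank_test(t_liver, t_lung, event_observed_A=e_liver,
                      event_observed_B=e_lung)
print(result.p_value)
```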
Welzel, Tania M; Graubard, Barry I; Zeuzem, Stefan; El-Serag, Hashem B; Davila, Jessica A; McGlynn, Katherine A
2011-08-01
Incidence rates of hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) have increased in the United States. Metabolic syndrome is recognized as a risk factor for HCC and a postulated one for ICC. The magnitude of risk, however, has not been investigated on a population level in the United States. We therefore examined the association between metabolic syndrome and the development of these cancers. All persons diagnosed with HCC and ICC between 1993 and 2005 were identified in the Surveillance, Epidemiology, and End Results (SEER)-Medicare database. For comparison, a 5% sample of individuals residing in the same regions as the SEER registries of the cases was selected. The prevalence of metabolic syndrome as defined by the U.S. National Cholesterol Education Program Adult Treatment Panel III criteria, and other risk factors for HCC (hepatitis B virus, hepatitis C virus, alcoholic liver disease, liver cirrhosis, biliary cirrhosis, hemochromatosis, Wilson's disease) and ICC (biliary cirrhosis, cholangitis, cholelithiasis, choledochal cysts, hepatitis B virus, hepatitis C virus, alcoholic liver disease, cirrhosis, inflammatory bowel disease) were compared among persons who developed cancer and those who did not. Logistic regression was used to calculate odds ratios and 95% confidence intervals. The inclusion criteria were met by 3649 HCC cases, 743 ICC cases, and 195,953 comparison persons. Metabolic syndrome was significantly more common among persons who developed HCC (37.1%) and ICC (29.7%) than the comparison group (17.1%, P<0.0001). In adjusted multiple logistic regression analyses, metabolic syndrome remained significantly associated with increased risk of HCC (odds ratio=2.13; 95% confidence interval=1.96-2.31, P<0.0001) and ICC (odds ratio=1.56; 95% confidence interval=1.32-1.83, P<0.0001). Metabolic syndrome is a significant risk factor for development of HCC and ICC in the general U.S. population. Copyright © 2011 American Association for the Study of Liver Diseases.
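The case-control estimate here (adjusted odds ratios with 95% confidence intervals from logistic regression) maps directly onto a few lines of statsmodels. A minimal sketch, assuming a hypothetical analytic file in which `case` is 1 for HCC cases and 0 for comparison persons, and the covariates are 0/1 indicator columns:

```python
# Sketch: odds ratio and 95% CI for metabolic syndrome via logistic regression.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("seer_medicare_hcc.csv")  # hypothetical analytic file
X = sm.add_constant(df[["metabolic_syndrome", "hcv", "hbv", "alcoholic_liver"]])
fit = sm.Logit(df["case"], X).fit(disp=0)

or_ci = np.exp(fit.conf_int().loc["metabolic_syndrome"])
print(f"OR = {np.exp(fit.params['metabolic_syndrome']):.2f}, "
      f"95% CI = {or_ci[0]:.2f}-{or_ci[1]:.2f}")
```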
ECOALIM: A Dataset of Environmental Impacts of Feed Ingredients Used in French Animal Production.
Wilfart, Aurélie; Espagnol, Sandrine; Dauguet, Sylvie; Tailleur, Aurélie; Gac, Armelle; Garcia-Launay, Florence
2016-01-01
Feeds contribute substantially to the environmental impacts of livestock products. Therefore, formulating low-impact feeds requires data on environmental impacts of feed ingredients with consistent perimeters and methodology for life cycle assessment (LCA). We created the ECOALIM dataset of life cycle inventories (LCIs) and associated impacts of feed ingredients used in animal production in France. It provides several perimeters for LCIs (field gate, storage agency gate, plant gate and harbour gate) with homogeneously collected data from French R&D institutes covering the 2005-2012 period. The dataset of environmental impacts is available as a Microsoft® Excel spreadsheet on the ECOALIM website and provides climate change, acidification, eutrophication, non-renewable and total cumulative energy demand, phosphorus demand, and land occupation. LCIs in the ECOALIM dataset are available in the AGRIBALYSE® database in SimaPro® software. The typology performed on the dataset classified the 149 average feed ingredients into categories of low impact (co-products of plant origin and minerals), high impact (feed-use amino acids, fats and vitamins) and intermediate impact (cereals, oilseeds, oil meals and protein crops). Therefore, the ECOALIM dataset can be used by feed manufacturers and LCA practitioners to investigate the formulation of low-impact feeds. It also provides data for the environmental evaluation of feeds and animal production systems. Included in the AGRIBALYSE® database and SimaPro®, the ECOALIM dataset will benefit from their procedures for maintenance and regular updating. Future uses can also include environmental labelling of commercial products from livestock production. PMID:27930682
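One way such a spreadsheet supports feed formulation is that a feed's impact is simply the inclusion-rate-weighted sum of its ingredients' impacts. A hedged sketch in pandas; the file name, sheet layout, column names and inclusion rates below are assumptions, not ECOALIM's actual schema:

```python
# Sketch: composition-weighted climate impact of a hypothetical feed formula.
import pandas as pd

impacts = pd.read_excel("ecoalim_impacts.xlsx", index_col="ingredient")
formula = {"wheat": 0.60, "soybean_meal": 0.25, "rapeseed_oil": 0.05,
           "minerals": 0.10}  # kg ingredient per kg feed (hypothetical)

feed_impact = sum(share * impacts.loc[name, "climate_change_kgCO2e"]
                  for name, share in formula.items())
print(f"Feed climate impact: {feed_impact:.3f} kg CO2e per kg feed")
```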
A Web-Accessible Framework for Discovery, Visualization and Dissemination of Polar Data
NASA Astrophysics Data System (ADS)
Kirsch, P. J.; Breen, P.; Barnes, T. D.
2007-12-01
A web-accessible information framework, currently under development within the Physical Sciences Division of the British Antarctic Survey, is described. The datasets accessed are generally heterogeneous in nature, from fields including space physics, meteorology, atmospheric chemistry, ice physics, and oceanography. Many of these are returned in near real time over a 24/7 limited-bandwidth link from remote Antarctic stations and ships. The requirement is to provide various user groups, each with disparate interests and demands, a system incorporating a browsable and searchable catalogue, bespoke data-summary visualization, metadata access facilities and download utilities. The system allows timely access to raw and processed datasets through an easily navigable discovery interface. Once discovered, a summary of the dataset can be visualized in a manner prescribed by the particular projects and user communities, or the dataset may be downloaded, subject to any accessibility restrictions that may exist. In addition, access to related ancillary information, including software, documentation, related URLs and information concerning non-electronic media (of particular relevance to some legacy datasets), is made directly available, having automatically been associated with a dataset during the discovery phase. Major components of the framework include the relational database containing the catalogue; the organizational structure of the systems holding the data, enabling automatic updates of the system catalogue and real-time access to data; the user interface design; and administrative and data management scripts allowing straightforward incorporation of utilities, datasets and system maintenance.
Serial femtosecond crystallography datasets from G protein-coupled receptors
White, Thomas A.; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A.; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R.; Yoon, Chun Hong; Yefanov, Oleksandr M.; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E.; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim
2016-01-01
We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data. PMID:27479354
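Because the frames are deposited in an HDF5-based format, they can be inspected outside CrystFEL with generic tools such as h5py. A minimal sketch; the file name and internal dataset path are assumptions (the actual layout is documented with the deposition):

```python
# Sketch: inspect one deposited diffraction frame with h5py.
import h5py

with h5py.File("5HT2B_run1.h5", "r") as f:   # hypothetical file name
    f.visit(print)                            # list dataset paths in the file
    frame = f["/data/data"][()]               # assumed image dataset path
    print(frame.shape, frame.dtype)
```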
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.
Fimereli, Danai; Detours, Vincent; Konopka, Tomasz
2013-04-01
High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
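The core idea, retaining only reads that are likely to map onto a region of interest, can be illustrated with a toy k-mer filter. This is an illustrative re-implementation of the concept, not TriageTools itself:

```python
# Sketch: keep a read if it shares at least one k-mer with the target region.
def kmers(seq, k=25):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def triage(reads, region, k=25):
    """Yield reads likely to map onto `region` (a sequence string)."""
    index = kmers(region, k)
    for read in reads:
        if not index.isdisjoint(kmers(read, k)):
            yield read

region = "ACGT" * 50                       # toy region of interest
reads = ["ACGTACGTACGTACGTACGTACGTACGT", "TTTTGGGGCCCCAAAATTTTGGGGCCCC"]
print(list(triage(reads, region)))         # only the first read survives
```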
Minutes of the CD-ROM Workshop
NASA Technical Reports Server (NTRS)
King, Joseph H.; Grayzeck, Edwin J.
1989-01-01
The workshop described in this document had two goals: (1) to establish guidelines for the CD-ROM as a tool to distribute datasets; and (2) to evaluate current scientific CD-ROM projects as an archive. Workshop attendees were urged to coordinate with European groups to develop CD-ROM, which is already available at low cost in the U.S., as a distribution medium for astronomical datasets. It was noted that NASA has made the CD Publisher at the National Space Science Data Center (NSSDC) available to the scientific community when the Publisher is not needed for NASA work. NSSDC's goal is to provide the Publisher's user with the hardware and software tools needed to design a user's dataset for distribution. This includes producing a master CD and copies. The prerequisite premastering process is described, as well as guidelines for CD-ROM construction. The production of discs was evaluated. CD-ROM projects, guidelines, and problems of the technology were discussed.
HYDRA: Hyperspectral Data Research Application
NASA Astrophysics Data System (ADS)
Rink, T.; Whittaker, T.
2005-12-01
HYDRA is a freely available, easy-to-install tool for visualization and analysis of large local or remote hyper/multi-spectral datasets. HYDRA is implemented on top of the open source VisAD Java library via Jython, the Java implementation of the user-friendly Python programming language. VisAD provides data integration through its generalized data model, user-display interaction and display rendering. Jython has an easy-to-read, concise, scripting-like syntax which eases software development. HYDRA allows data sharing of large datasets through its support of the OpenDAP and OpenADDE server-client protocols. Users can explore and interrogate data, and subset in physical and/or spectral space to isolate key areas of interest for further analysis without having to download an entire dataset. It also has an extensible data input architecture to recognize new instruments and understand different local file formats; currently, NetCDF and HDF4 are supported.
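The server-side subsetting that makes this practical is a general OPeNDAP property: slicing a remote variable transfers only the requested hyperslab, not the whole file. A hedged Python sketch of that idea (HYDRA itself is Jython/VisAD); the URL and variable name are placeholders:

```python
# Sketch: subset a remote dataset over OPeNDAP without a full download.
from netCDF4 import Dataset

ds = Dataset("https://example.org/opendap/hyperspectral_granule.nc")
radiance = ds.variables["radiance"]          # lazily indexed remote variable
subset = radiance[0, 100:200, 100:200]       # only this window is transferred
print(subset.shape, subset.mean())
ds.close()
```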
Automated Analysis of Fluorescence Microscopy Images to Identify Protein-Protein Interactions
Venkatraman, S.; Doktycz, M. J.; Qi, H.; ...
2006-01-01
The identification of protein interactions is important for elucidating biological networks. One obstacle in comprehensive interaction studies is the analysis of large datasets, particularly those containing images. Development of an automated system to analyze an image-based protein interaction dataset is needed. Such an analysis system is described here, to automatically extract features from fluorescence microscopy images obtained from a bacterial protein interaction assay. These features are used to relay quantitative values that aid in the automated scoring of positive interactions. Experimental observations indicate that identifying at least 50% positive cells in an image is sufficient to detect a protein interaction. Based on this criterion, the automated system presents 100% accuracy in detecting positive interactions for a dataset of 16 images. Algorithms were implemented using MATLAB and the software developed is available on request from the authors.
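Once cells have been classified, the 50%-positive scoring rule reduces to simple arithmetic; a sketch of that final step only (the feature extraction, done in MATLAB in the paper, is omitted, and the function name is hypothetical):

```python
# Sketch: call an image positive if >= threshold of its cells are positive.
def score_interaction(cell_labels, threshold=0.5):
    positive = sum(cell_labels)               # labels: 1 = positive cell
    return positive / len(cell_labels) >= threshold

print(score_interaction([1, 1, 0, 1, 0]))     # 3/5 -> True
```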
Development of web-GIS system for analysis of georeferenced geophysical data
NASA Astrophysics Data System (ADS)
Okladnikov, I.; Gordov, E. P.; Titov, A. G.; Bogomolov, V. Y.; Genina, E.; Martynova, Y.; Shulgina, T. M.
2012-12-01
Georeferenced datasets (meteorological databases, modeling and reanalysis results, remote sensing products, etc.) are currently actively used in numerous applications, including modeling, interpretation and forecasting of climatic and ecosystem changes at various spatial and temporal scales. Due to the inherent heterogeneity of environmental datasets, as well as their huge size, which may reach tens of terabytes for a single dataset, present-day studies of climate and environmental change require special software support. A dedicated web-GIS information-computational system for analysis of georeferenced climatological and meteorological data has been created. The system consists of four basic parts: a computational kernel developed using GNU Data Language (GDL); a set of PHP controllers run within a specialized web portal; JavaScript class libraries for development of typical components of a web-mapping application graphical user interface (GUI) based on AJAX technology; and an archive of geophysical datasets. The computational kernel comprises a number of dedicated modules for querying and extraction of data, mathematical and statistical data analysis, visualization, and preparation of output files in geoTIFF and netCDF formats containing processing results. The specialized web portal consists of an Apache web server; GeoServer software, compliant with OGC standards, which is used as the basis for presenting cartographic information over the Web; and a set of PHP controllers implementing the web-mapping application logic and governing the computational kernel. The JavaScript libraries for graphical user interface development are based on the GeoExt library, combining the ExtJS framework and OpenLayers software. The archive of geophysical data consists of a number of structured environmental datasets represented by data files in netCDF, HDF, GRIB and ESRI Shapefile formats. Datasets available for processing include two editions of the NCEP/NCAR Reanalysis, the JMA/CRIEPI JRA-25 Reanalysis, the ECMWF ERA-40 Reanalysis, the ECMWF ERA Interim Reanalysis, the MRI/JMA APHRODITE's Water Resources Project Reanalysis, the DWD Global Precipitation Climatology Centre's data, the GMAO Modern Era-Retrospective analysis for Research and Applications, meteorological observational data for the territory of the former USSR for the 20th century, results of modeling by global and regional climatological models, and others. The system is already in use in scientific research; in particular, it was recently used successfully to analyze climate change in Siberia and its regional impacts. The web-GIS information-computational system for geophysical data analysis provides specialists involved in multidisciplinary research projects with reliable and practical instruments for complex analysis of climate and ecosystem changes on global and regional scales. Even a user without specific knowledge can use it to perform computational processing and visualization of large meteorological, climatological and satellite monitoring datasets through a unified web interface in a common graphical web browser. This work is partially supported by the Ministry of education and science of the Russian Federation (contract #07.514.114044), projects IV.31.1.5, IV.31.2.7, RFBR grants #10-07-00547a, #11-05-01190a, and integrated project SB RAS #131.
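One kernel task mentioned above, extracting a field from a netCDF file and writing it as geoTIFF, can be sketched compactly in Python (the real kernel is written in GDL, and the file and variable names here are hypothetical):

```python
# Sketch: extract a 2-D netCDF field and write it as a GeoTIFF raster.
import numpy as np
from netCDF4 import Dataset
import rasterio
from rasterio.transform import from_origin

ds = Dataset("reanalysis.nc")                           # hypothetical file
field = np.asarray(ds.variables["air_temperature"][0])  # first time step
lat, lon = ds.variables["lat"][:], ds.variables["lon"][:]

transform = from_origin(lon.min(), lat.max(),
                        abs(lon[1] - lon[0]), abs(lat[1] - lat[0]))
with rasterio.open("air_temperature.tif", "w", driver="GTiff",
                   height=field.shape[0], width=field.shape[1], count=1,
                   dtype=str(field.dtype), crs="EPSG:4326",
                   transform=transform) as out:
    out.write(field, 1)
```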
BIAS: Bioinformatics Integrated Application Software.
Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z
2005-04-15
We introduce a development platform especially tailored to bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system, and supports standards and data-exchange protocols common to bioinformatics. BIAS is an open-source project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper with a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events. Contact: hallett@mcb.mcgill.ca.
Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J.
2017-01-01
Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with the datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. PMID:28472340
A method for generating new datasets based on copy number for cancer analysis.
Kim, Shinuk; Kon, Mark; Kang, Hyunsik
2015-01-01
New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used for current methods of analysis. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular those derived from cancers. Here we present a software package capable of preprocessing standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer.
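The preprocessing idea, collapsing copy number calls into a gene-by-sample matrix that expression-analysis tools accept, can be sketched in pandas. A hedged illustration only; the column names mimic a generic (not necessarily Agilent) segment file:

```python
# Sketch: reshape per-call CNV data into a gene x sample matrix.
import pandas as pd

segments = pd.read_csv("cnv_calls.csv")   # columns: sample, gene, log2_ratio
matrix = segments.pivot_table(index="gene", columns="sample",
                              values="log2_ratio", aggfunc="mean")
matrix.to_csv("cnv_gene_matrix.csv")      # now usable like an expression matrix
print(matrix.iloc[:5, :3])
```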
Bi-Force: large-scale bicluster editing and its application to gene expression data biclustering.
Sun, Peng; Speicher, Nora K; Röttger, Richard; Guo, Jiong; Baumbach, Jan
2014-05-01
The explosion of biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic, 'Bi-Force', based on the weighted bicluster editing model, which performs biclustering on arbitrary sets of biological entities given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed the biclustering evaluation protocol of a recent review by Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering, and Bi-Force accordingly outperformed the existing tools, at least when following the evaluation protocols of Eren et al. Bi-Force is implemented in Java and integrated into the open-source software package BiCluE. The software as well as all datasets used are publicly available at http://biclue.mpi-inf.mpg.de. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
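Bi-Force itself is a Java tool; as a stand-in to show what "simultaneous clustering" of rows and columns recovers, the sketch below runs a classical spectral biclustering (explicitly a different algorithm, from scikit-learn) on synthetic checkerboard data:

```python
# Sketch: spectral biclustering on synthetic data (a stand-in, not Bi-Force).
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

data, rows, cols = make_checkerboard(shape=(300, 300), n_clusters=(4, 3),
                                     noise=10, shuffle=True, random_state=0)
model = SpectralBiclustering(n_clusters=(4, 3), method="log", random_state=0)
model.fit(data)
print(model.row_labels_[:10], model.column_labels_[:10])
```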
A benchmark for comparison of cell tracking algorithms
Maška, Martin; Ulman, Vladimír; Svoboda, David; Matula, Pavel; Matula, Petr; Ederra, Cristina; Urbiola, Ainhoa; España, Tomás; Venkatesan, Subramanian; Balak, Deepak M.W.; Karas, Pavel; Bolcková, Tereza; Štreitová, Markéta; Carthel, Craig; Coraluppi, Stefano; Harder, Nathalie; Rohr, Karl; Magnusson, Klas E. G.; Jaldén, Joakim; Blau, Helen M.; Dzyubachyk, Oleh; Křížek, Pavel; Hagen, Guy M.; Pastor-Escuredo, David; Jimenez-Carretero, Daniel; Ledesma-Carbayo, Maria J.; Muñoz-Barrutia, Arrate; Meijering, Erik; Kozubek, Michal; Ortiz-de-Solorzano, Carlos
2014-01-01
Motivation: Automatic tracking of cells in multidimensional time-lapse fluorescence microscopy is an important task in many biomedical applications. A novel framework for objective evaluation of cell tracking algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2013 Cell Tracking Challenge. In this article, we present the logistics, datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. Results: The main contributions of the challenge include the creation of a comprehensive video dataset repository and the definition of objective measures for comparison and ranking of the algorithms. With this benchmark, six algorithms covering a variety of segmentation and tracking paradigms have been compared and ranked based on their performance on both synthetic and real datasets. Given the diversity of the datasets, we do not declare a single winner of the challenge. Instead, we present and discuss the results for each individual dataset separately. Availability and implementation: The challenge Web site (http://www.codesolorzano.com/celltrackingchallenge) provides access to the training and competition datasets, along with the ground truth of the training videos. It also provides access to Windows and Linux executable files of the evaluation software and most of the algorithms that competed in the challenge. Contact: codesolorzano@unav.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24526711
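The challenge's official evaluation measures are more involved, but the core overlap notion behind segmentation scoring can be sketched as an intersection-over-union between a predicted mask and the ground-truth mask for one frame (the toy masks below are stand-ins):

```python
# Sketch: Jaccard index between two boolean segmentation masks.
import numpy as np

def jaccard(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

gt = np.zeros((64, 64), bool); gt[10:30, 10:30] = True
pred = np.zeros((64, 64), bool); pred[12:32, 12:32] = True
print(f"Jaccard = {jaccard(pred, gt):.3f}")
```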
2012-01-01
Background: MicroRNAs (miRNAs) are noncoding RNAs that direct post-transcriptional regulation of protein coding genes. Recent studies have shown miRNAs are important for controlling many biological processes, including nervous system development, and are highly conserved across species. Given their importance, computational tools are necessary for analysis, interpretation and integration of high-throughput (HTP) miRNA data in an increasing number of model species. The Bioinformatics Resource Manager (BRM) v2.3 is a software environment for data management, mining, integration and functional annotation of HTP biological data. In this study, we report recent updates to BRM for miRNA data analysis and cross-species comparisons across datasets. Results: BRM v2.3 has the capability to query predicted miRNA targets from multiple databases, retrieve potential regulatory miRNAs for known genes, integrate experimentally derived miRNA and mRNA datasets, perform ortholog mapping across species, and retrieve annotation and cross-reference identifiers for an expanded number of species. Here we use BRM to show that developmental exposure of zebrafish to 30 µM nicotine from 6–48 hours post fertilization (hpf) results in behavioral hyperactivity in larval zebrafish and alteration of putative miRNA gene targets in whole embryos at developmental stages that encompass early neurogenesis. We show typical workflows for using BRM to integrate experimental zebrafish miRNA and mRNA microarray datasets with example retrievals for zebrafish, including pathway annotation and mapping to human orthologs. Functional analysis of differentially regulated (p<0.05) gene targets in BRM indicates that nicotine exposure disrupts genes involved in neurogenesis, possibly through misregulation of nicotine-sensitive miRNAs. Conclusions: BRM provides the ability to mine complex data for identification of candidate miRNAs or pathways that drive phenotypic outcome and, therefore, is a useful hypothesis generation tool for systems biology. The miRNA workflow in BRM allows for efficient processing of multiple miRNA and mRNA datasets in a single software environment with the added capability to interact with public data sources and visual analytic tools for HTP data analysis at a systems level. BRM is developed using Java™ and other open-source technologies for free distribution (http://www.sysbio.org/dataresources/brm.stm). PMID:23174015
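The integration step described here, joining measured miRNAs to predicted target genes and then to mRNA expression, keeping pairs whose directions are consistent with repression, can be sketched with pandas merges. A hedged illustration of the idea only (BRM is a Java application); every file and column name is hypothetical:

```python
# Sketch: join miRNA results to predicted targets and mRNA expression.
import pandas as pd

mirna = pd.read_csv("mirna_diffexp.csv")        # miRNA, mirna_log2fc, p
targets = pd.read_csv("predicted_targets.csv")  # miRNA, gene
mrna = pd.read_csv("mrna_diffexp.csv")          # gene, mrna_log2fc, p

merged = mirna.merge(targets, on="miRNA").merge(mrna, on="gene")
# Keep anti-correlated, significant pairs (miRNA up + target down, or reverse).
candidates = merged[(merged["mirna_log2fc"] * merged["mrna_log2fc"] < 0)
                    & (merged["p_x"] < 0.05) & (merged["p_y"] < 0.05)]
print(candidates[["miRNA", "gene"]].drop_duplicates().head())
```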
UFO (UnFold Operator) user guide
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kissel, L.; Biggs, F.; Marking, T.R.
UFO is a collection of interactive utility programs for estimating unknown functions of one variable using a wide-ranging class of information as input, for miscellaneous data-analysis applications, for performing feasibility studies, and for supplementing our other software. Inverse problems, which include spectral unfolds, inverse heat-transfer problems, time-domain deconvolution, and unusual or difficult curve-fit problems, are classes of applications for which UFO is well suited. Extensive use of B-splines and (X,Y)-datasets is made to represent functions. The (X,Y)-dataset representation is unique in that it is not restricted to equally spaced data. This feature is used, for example, in a table-generating algorithm that evaluates a function to a user-specified interpolation accuracy while minimizing the number of points stored in the corresponding dataset. UFO offers a variety of miscellaneous data-analysis options, such as plotting, comparing, transforming, scaling, and integrating functions, as well as adding, subtracting, multiplying, and dividing functions together. These options are often needed as intermediate steps in analyzing and solving difficult inverse problems, but they also find frequent use in other applications. Statistical options are available to calculate goodness-of-fit to measurements, specify error bands on solutions, give confidence limits on calculated quantities, and point out the statistical consequences of operations such as smoothing. UFO is designed to do feasibility studies on a variety of engineering measurements. It is also tailored to supplement our Test Analysis and Design codes, SRAD Test-Data Archive software, and Digital Signal Analysis routines.
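Two of the ideas above, representing a function with a B-spline and building an unequally spaced (X,Y) table that meets a stated interpolation accuracy with few points, can be sketched in scipy terms. Purely illustrative, not UFO's algorithm:

```python
# Sketch: B-spline representation plus an adaptive, unequally spaced table.
import numpy as np
from scipy.interpolate import splrep, splev

x = np.linspace(0, 4, 200)
tck = splrep(x, np.exp(-x) * np.sin(5 * x))   # B-spline fit of a test function

def adaptive_table(f, a, b, tol=1e-3):
    """Add points only where linear interpolation misses the tolerance."""
    pts = [a, b]
    while True:
        xs = np.sort(pts)
        mids = (xs[:-1] + xs[1:]) / 2
        err = np.abs(f(mids) - (f(xs[:-1]) + f(xs[1:])) / 2)
        bad = mids[err > tol]
        if bad.size == 0:
            return xs, f(xs)
        pts.extend(bad.tolist())

xs, ys = adaptive_table(lambda t: splev(t, tck), 0.0, 4.0)
print(f"{xs.size} unequally spaced points meet the tolerance")
```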
Indoor Modelling Benchmark for 3D Geometry Extraction
NASA Astrophysics Data System (ADS)
Thomson, C.; Boehm, J.
2014-06-01
A combination of faster, cheaper and more accurate hardware, more sophisticated software, and greater industry acceptance has laid the foundations for an increased demand for accurate 3D parametric models of buildings. Pointclouds are currently the data source of choice, with static terrestrial laser scanning the predominant tool for large, dense-volume measurement. The current importance of pointclouds as the primary source of real-world representation is endorsed by CAD software vendor acquisitions of pointcloud engines in 2011. Both the capture and the modelling of indoor environments require considerable operator time (and therefore cost). Automation is seen as a way to aid this by reducing the workload of the user, and some commercial packages have appeared that provide automation to some degree. In the data capture phase, advances in indoor mobile mapping systems are speeding up the process, albeit currently with a reduction in accuracy. As a result, this paper presents freely accessible pointcloud datasets of two typical areas of a building, each captured with two different capture methods and each with an accurate, wholly manually created model. These datasets are provided as a benchmark for the research community to gauge the performance and improvement of various techniques for indoor geometry extraction. With this in mind, non-proprietary, interoperable formats are provided, such as E57 for the scans and IFC for the reference model. The datasets can be found at: http://indoor-bench.github.io/indoor-bench.
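One simple way to use such a benchmark is to gauge an extracted model against the captured cloud via nearest-neighbour distances. A hedged numpy/scipy sketch; loading E57/IFC is format-specific and omitted, so the random arrays below stand in for real data:

```python
# Sketch: scan-to-model deviation via nearest-neighbour distances.
import numpy as np
from scipy.spatial import cKDTree

scan = np.random.rand(100_000, 3)        # stand-in for the captured pointcloud
model_pts = np.random.rand(20_000, 3)    # stand-in for points sampled on model

dist, _ = cKDTree(model_pts).query(scan)  # distance from each scan point
print(f"RMS deviation: {np.sqrt((dist ** 2).mean()):.4f} "
      f"(95th percentile: {np.percentile(dist, 95):.4f})")
```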
The Kalman Filter and High Performance Computing at NASA's Data Assimilation Office (DAO)
NASA Technical Reports Server (NTRS)
Lyster, Peter M.
1999-01-01
Atmospheric data assimilation is a method of combining actual observations with model simulations to produce a more accurate description of the earth system than the observations alone provide. The output of data assimilation, sometimes called "the analysis", is a set of accurate, regular, gridded datasets of observed and unobserved variables. This is used not only for weather forecasting but is becoming increasingly important for climate research. For example, these datasets may be used to retrospectively assess energy budgets or the effects of trace gases such as ozone. This allows researchers to understand processes driving weather and climate, which have important scientific and policy implications. The primary goal of NASA's Data Assimilation Office (DAO) is to provide datasets for climate research and to support NASA satellite and aircraft missions. This presentation will: (1) describe ongoing work on the advanced Kalman/Lagrangian filter parallel algorithm for the assimilation of trace gases in the stratosphere; and (2) discuss the Kalman filter in relation to other presentations from the DAO on four-dimensional data assimilation at this meeting. Although the designation "Kalman filter" is often used to describe the overarching work, the series of talks will show that the scientific software and the kinds of parallelization techniques being developed at the DAO are very different depending on the type of problem being considered, the extent to which the problem is mission critical, and the degree of software engineering that has to be applied.
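The analysis step at the heart of any such system is the textbook Kalman update, which blends a forecast with observations in proportion to their uncertainties. A toy-sized numpy sketch (the DAO's operational filter is vastly larger and parallelized):

```python
# Sketch: one Kalman analysis step.
# x: state, P: state covariance, y: observations,
# H: observation operator, R: observation-error covariance.
import numpy as np

def kalman_update(x, P, y, H, R):
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ (y - H @ x)            # analysis state
    P_new = (np.eye(len(x)) - K @ H) @ P   # analysis covariance
    return x_new, P_new

x = np.zeros(4); P = np.eye(4)
H = np.eye(2, 4); R = 0.1 * np.eye(2); y = np.array([1.0, -0.5])
x_a, P_a = kalman_update(x, P, y, H, R)
print(x_a)
```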
Opening Up Access to Open Access
ERIC Educational Resources Information Center
Singer, Ross
2008-01-01
As the corpus of gray literature grows and the price of serials rises, it becomes increasingly important to explore ways to integrate the free and open Web seamlessly into one's collections. Users, after all, are discovering these materials all the time via sites such as Google Scholar and Scirus or by searching arXiv.org or CiteSeer directly.…
Dickens, Chesterton, and the Future of English Studies
ERIC Educational Resources Information Center
Rampton, David
2014-01-01
The idea that literature has inspirational qualities and is produced by Great Writers has repeatedly come under attack as literary studies seeks to redefine itself. Yet the ability to think of the writer as genius, seer, moral guide, all the romantic possibilities, in short, is arguably as important as it has always been. Engaging with what G.K.…
Rueckl, Martin; Lenzi, Stephen C; Moreno-Velasquez, Laura; Parthier, Daniel; Schmitz, Dietmar; Ruediger, Sten; Johenning, Friedrich W
2017-01-01
The measurement of activity in vivo and in vitro has shifted from electrical to optical methods. While the indicators for imaging activity have improved significantly over the last decade, tools for analysing optical data have not kept pace. Most available analysis tools are limited in their flexibility and applicability to datasets obtained at different spatial scales. Here, we present SamuROI (Structured analysis of multiple user-defined ROIs), an open source Python-based analysis environment for imaging data. SamuROI simplifies exploratory analysis and visualization of image series of fluorescence changes in complex structures over time and is readily applicable at different spatial scales. In this paper, we show the utility of SamuROI in Ca2+-imaging-based applications at three spatial scales: the micro-scale (i.e., sub-cellular compartments including cell bodies, dendrites and spines); the meso-scale (i.e., whole-cell and population imaging with single-cell resolution); and the macro-scale (i.e., imaging of changes in bulk fluorescence in large brain areas, without cellular resolution). The software described here provides a graphical user interface for intuitive data exploration and region of interest (ROI) management that can be used interactively within Jupyter Notebook: a publicly available interactive Python platform that allows simple integration of our software with existing tools for automated ROI generation and post-processing, as well as custom analysis pipelines. SamuROI software, source code and installation instructions are publicly available on GitHub and documentation is available online. SamuROI reduces the energy barrier for manual exploration and semi-automated analysis of spatially complex Ca2+ imaging datasets, particularly when these have been acquired at different spatial scales.
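The core per-ROI quantity such a tool visualizes, a fluorescence-change (dF/F) trace, can be sketched in a few lines of numpy. A minimal illustration only; the array names, mask, and the percentile baseline are assumptions rather than SamuROI's defaults:

```python
# Sketch: dF/F trace for one ROI from a (time, y, x) image stack.
import numpy as np

stack = np.random.rand(500, 128, 128)        # stand-in for an imaging series
mask = np.zeros((128, 128), bool); mask[40:60, 40:60] = True  # one ROI

trace = stack[:, mask].mean(axis=1)          # mean fluorescence in the ROI
f0 = np.percentile(trace, 10)                # assumed baseline definition
dff = (trace - f0) / f0                      # dF/F over time
print(dff.shape, dff.max())
```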
Recommendations for a service framework to access astronomical archives
NASA Technical Reports Server (NTRS)
Travisano, J. J.; Pollizzi, J.
1992-01-01
There are a large number of astronomical archives and catalogs on-line for network access, with many different user interfaces and features. Some systems are moving towards distributed access, supplying users with client software for their home sites which connects to servers at the archive site. Many of the issues involved in defining a standard framework of services that archive/catalog suppliers can use to achieve a basic level of interoperability are described. Such a framework would simplify the development of client and server programs to access the wide variety of astronomical archive systems. The primary services supplied by current systems include catalog browsing, dataset retrieval, name resolution, and data analysis. The following issues (and probably more) need to be considered in establishing a standard set of client/server interfaces and protocols: archive access (dataset retrieval, delivery, file formats, data browsing, analysis, etc.); catalog access (database management systems, query languages, data formats, synchronous/asynchronous modes of operation, etc.); interoperability (transaction/message protocols, distributed processing mechanisms such as DCE and ONC/SunRPC, networking protocols, etc.); security (user registration, authorization/authentication mechanisms, etc.); service directory (service registration, lookup, port/task mapping, parameters, etc.); and software (public vs. proprietary, client/server software, standard interfaces to client/server functions, software distribution, operating system portability, data portability, etc.). Several archive/catalog groups, notably the Astrophysics Data System (ADS), are already working in many of these areas. In the process of developing StarView, the user interface to the Space Telescope Data Archive and Distribution Service (ST-DADS), these issues and the work of others were analyzed. A framework of standard interfaces for accessing services on any archive system, which would benefit archive users and suppliers alike, is proposed.
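A hedged illustration of the kind of uniform client such a framework would enable: one function speaking a standard catalog-query interface regardless of which archive serves it. The endpoint, parameters and JSON shape below are invented for illustration, not part of any proposed standard:

```python
# Sketch: a generic client for a hypothetical standard catalog service.
import requests

def query_catalog(base_url, target, radius_deg=0.1):
    """Cone-search a (hypothetical) standards-compliant archive catalog."""
    resp = requests.get(f"{base_url}/catalog/search",
                        params={"target": target, "radius": radius_deg},
                        timeout=30)
    resp.raise_for_status()
    return resp.json()["datasets"]

for ds in query_catalog("https://archive.example.org", "M31"):
    print(ds["id"], ds["instrument"])
```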