Science.gov

Sample records for additional independent datasets

  1. Optimal training dataset composition for SVM-based, age-independent, automated epileptic seizure detection.

    PubMed

    Bogaarts, J G; Gommer, E D; Hilkman, D M W; van Kranen-Mastenbroek, V H J M; Reulen, J P H

    2016-08-01

    Automated seizure detection is a valuable asset to health professionals, which makes adequate treatment possible in order to minimize brain damage. Most research focuses on two separate aspects of automated seizure detection: EEG feature computation and classification methods. Little research has been published regarding optimal training dataset composition for patient-independent seizure detection. This paper evaluates the performance of classifiers trained on different datasets in order to determine the optimal dataset for use in classifier training for automated, age-independent, seizure detection. Three datasets are used to train a support vector machine (SVM) classifier: (1) EEG from neonatal patients, (2) EEG from adult patients and (3) EEG from both neonates and adults. To correct for baseline EEG feature differences among patients, feature normalization is essential. Usually dedicated detection systems are developed for either neonatal or adult patients. Normalization might allow for the development of a single seizure detection system for patients irrespective of their age. Two classifier versions are trained on all three datasets: one with feature normalization and one without. This gives us six different classifiers to evaluate using both the neonatal and adult test sets. As a performance measure, the area under the receiver operating characteristics curve (AUC) is used. With feature baseline correction (FBC) applied, performance values were 0.90 and 0.93 for neonatal and adult seizure detection, respectively. For neonatal seizure detection, the classifier trained on EEG from adult patients performed significantly worse compared to both the classifier trained on EEG data from neonatal patients and the classifier trained on both neonatal and adult EEG data. For adult seizure detection, optimal performance was achieved by either the classifier trained on adult EEG data or the classifier trained on both neonatal and adult EEG data. Our results show that age-independent
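
    The six-classifier comparison described above is straightforward to outline in code. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it presumes pre-computed EEG feature matrices with binary seizure labels and per-patient IDs (all names invented here), trains scikit-learn's SVC with or without per-patient feature normalization, and scores each model by AUC.

```python
# Minimal sketch: train an SVM on EEG features with or without per-patient
# baseline normalization, then evaluate on a held-out test set by AUC.
# Feature matrices, labels and patient IDs are assumed inputs.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def normalize_per_patient(X, patient_ids):
    """Z-score features within each patient to remove baseline offsets."""
    Xn = np.empty_like(X, dtype=float)
    for pid in np.unique(patient_ids):
        m = patient_ids == pid
        Xn[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-12)
    return Xn

def train_and_score(X_train, y_train, X_test, y_test):
    """Fit an RBF-kernel SVM and return the test-set AUC."""
    clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

    Running train_and_score on the neonatal, adult, and combined training sets, each with and without normalize_per_patient applied, reproduces the six-classifier design of the study.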

  2. Multi-temporal harmonization of independent land-use/land-cover datasets for the conterminous United States

    NASA Astrophysics Data System (ADS)

    Soulard, C. E.; Acevedo, W.

    2013-12-01

    A wide range of national-scale land-use/land-cover (LULC) classification efforts exist, yet key differences between these data arise because of independent programmatic objectives and methodologies. As part of the USGS Climate and Land Use Change Research and Development Program, researchers on the Land Change Research Project are working to assess correspondence, characterize the uncertainties, and resolve discrepancies between national LULC datasets. A collection of fifteen moderate-resolution land classification datasets was identified and evaluated both qualitatively and quantitatively prior to harmonization using a pixel-based data fusion process. During harmonization, we reconciled nomenclature differences through limited aggregation of classes to facilitate comparison, followed by implementation of a process for checking classification uncertainty against reference imagery and validation datasets that correspond to the time frame of each dataset. Areas with LULC uncertainty between datasets were edited to reflect the classification with the most supporting evidence. Our harmonization process identified pixels that remained unchanged across core dates in input datasets, then reconciled LULC changes between input data across three intervals (1992-2001, 2001-2006, and 2006-2011). By relying on convergence of evidence across numerous independent datasets, Land Change Research seeks to better understand the uncertainties between LULC data and leverage the best elements of readily available data to improve LULC change monitoring across the conterminous United States.
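
    The pixel-based "convergence of evidence" idea can be illustrated with a toy fusion step. The sketch below is an assumption-laden simplification of the workflow described above: it takes co-registered rasters that already share an aggregated class legend, assigns each pixel the class with the most support across datasets, and flags non-unanimous pixels for the kind of review against reference imagery the project describes.

```python
# Toy pixel-based fusion: per-pixel modal class across co-registered LULC
# rasters sharing one legend, plus a disagreement mask for review.
import numpy as np

def fuse(rasters):
    stack = np.stack(rasters)            # shape: (n_datasets, rows, cols)
    n_classes = int(stack.max()) + 1
    # votes per class at each pixel
    counts = np.stack([(stack == c).sum(axis=0) for c in range(n_classes)])
    fused = counts.argmax(axis=0)        # class with the most evidence
    disagreement = counts.max(axis=0) < stack.shape[0]   # not unanimous
    return fused, disagreement
```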

  3. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    SciTech Connect

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three-dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER-positive patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER-negative patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic
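
    The statistics reported above (Kaplan-Meier curves per signature group and an ER-adjusted Cox model) follow a standard survival-analysis recipe. Here is a hedged sketch assuming the lifelines package and a patient table with invented column names (time, event, poor_signature, er_positive); it is not the study's code.

```python
# Sketch: Kaplan-Meier survival by signature group, then a Cox
# proportional-hazards model adjusted for ER status (lifelines package).
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

def analyze(df: pd.DataFrame):
    km = KaplanMeierFitter()
    for label, grp in df.groupby("poor_signature"):
        km.fit(grp["time"], grp["event"], label=f"poor_signature={label}")
        print(km.survival_function_at_times(10.0))   # 10-year survival
    cph = CoxPHFitter()
    cph.fit(df[["time", "event", "poor_signature", "er_positive"]],
            duration_col="time", event_col="event")
    cph.print_summary()   # hazard ratios with 95% confidence intervals
```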

  4. Evaluation of results from genome-wide studies of language and reading in a novel independent dataset.

    PubMed

    Carrion-Castillo, A; van Bergen, E; Vino, A; van Zuijen, T; de Jong, P F; Francks, C; Fisher, S E

    2016-07-01

    Recent genome-wide association scans (GWAS) for reading and language abilities have pinpointed promising new candidate loci. However, the potential contributions of these loci remain to be validated. In this study, we tested 17 of the most significantly associated single nucleotide polymorphisms (SNPs) from these GWAS (P < 10^-6 in the original studies) in a new independent population dataset from the Netherlands, known as Familial Influences on Literacy Abilities. This dataset comprised 483 children from 307 nuclear families and 505 adults (including parents of participating children), and provided adequate statistical power to detect the effects that were previously reported. The following measures of reading and language performance were collected: word reading fluency, nonword reading fluency, phonological awareness and rapid automatized naming. Two SNPs (rs12636438 and rs7187223) were associated with performance in multivariate and univariate testing, but these did not remain significant after correction for multiple testing. Another SNP (rs482700) was only nominally associated in the multivariate test. For the rest of the SNPs, we did not find supportive evidence of association. The findings may reflect differences between our study and the previous investigations with respect to the language of testing, the exact tests used and the recruitment criteria. Alternatively, most of the prior reported associations may have been false positives. A larger scale GWAS meta-analysis than those previously performed will likely be required to obtain robust insights into the genomic architecture underlying reading and language. PMID:27198479
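
    The core replication step (testing each candidate SNP against each trait and correcting for the number of tests) can be sketched compactly. This is an illustrative outline only: it assumes allele-dosage and trait matrices as inputs, uses simple per-SNP linear regression in place of the study's family-based multivariate tests, and applies a Bonferroni threshold.

```python
# Sketch: univariate SNP-trait association with Bonferroni correction.
# genotypes: (n_subjects, n_snps) allele dosages (0/1/2);
# traits: (n_subjects, n_traits) standardized scores. Illustrative only.
import numpy as np
from scipy import stats

def replicate(genotypes, traits, alpha=0.05):
    n_tests = genotypes.shape[1] * traits.shape[1]
    hits = []
    for i in range(genotypes.shape[1]):
        for j in range(traits.shape[1]):
            fit = stats.linregress(genotypes[:, i], traits[:, j])
            if fit.pvalue < alpha / n_tests:     # Bonferroni threshold
                hits.append((i, j, fit.pvalue))
    return hits
```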

  6. Accuracy and Precision in the Southern Hemisphere Additional Ozonesondes (SHADOZ) Dataset in Light of the JOSIE-2000 Results

    NASA Technical Reports Server (NTRS)

    Witte, Jacquelyn C.; Thompson, Anne M.; Schmidlin, F. J.; Oltmans, S. J.; Smit, H. G. J.

    2004-01-01

    Since 1998 the Southern Hemisphere ADditional OZonesondes (SHADOZ) project has provided over 2000 ozone profiles over eleven southern hemisphere tropical and subtropical stations. Balloon-borne electrochemical concentration cell (ECC) ozonesondes are used to measure ozone. The data are archived at: http://croc.gsfc.nasa.gov/shadoz. In an analysis of ozonesonde imprecision within the SHADOZ dataset [Thompson et al., JGR, 108, 8238, 2003], we pointed out that variations in ozonesonde technique (sensor solution strength, instrument manufacturer, data processing) could lead to station-to-station biases within the SHADOZ dataset. Imprecisions and accuracy in the SHADOZ dataset are examined in light of new data. First, SHADOZ total ozone column amounts are compared to version 8 TOMS (2004 release). As for TOMS version 7, satellite total ozone is usually higher than the integrated column amount from the sounding. Discrepancies between the sonde and satellite datasets decline two percentage points on average, compared to version 7 TOMS offsets. Second, the SHADOZ station data are compared to results of chamber simulations (JOSIE-2000, Juelich Ozonesonde Intercomparison Experiment) in which the various SHADOZ techniques were evaluated. The range of JOSIE column deviations from a standard instrument (±10%) in the chamber resembles that of the SHADOZ station data. It appears that some systematic variations in the SHADOZ ozone record are accounted for by differences in solution strength, data processing and instrument type (manufacturer).

  7. Complementary Aerodynamic Performance Datasets for Variable Speed Power Turbine Blade Section from Two Independent Transonic Turbine Cascades

    NASA Technical Reports Server (NTRS)

    Flegel, Ashlie B.; Welch, Gerard E.; Giel, Paul W.; Ames, Forrest E.; Long, Jonathan A.

    2015-01-01

    Two independent experimental studies were conducted in linear cascades on a scaled, two-dimensional mid-span section of a representative Variable Speed Power Turbine (VSPT) blade. The purpose of these studies was to assess the aerodynamic performance of the VSPT blade over large Reynolds number and incidence angle ranges. The influence of inlet turbulence intensity was also investigated. The tests were carried out in the NASA Glenn Research Center Transonic Turbine Blade Cascade Facility and at the University of North Dakota (UND) High Speed Compressible Flow Wind Tunnel Facility. A large database was developed by acquiring total pressure and exit angle surveys and blade loading data for ten incidence angles ranging from +15.8 deg to -51.0 deg. Data were acquired over six flow conditions with exit isentropic Reynolds number ranging from 0.05×10^6 to 2.12×10^6 and at exit Mach numbers of 0.72 (design) and 0.35. Flow conditions were examined within the respective facility constraints. The survey data were integrated to determine average exit total-pressure and flow angle. UND also acquired blade surface heat transfer data at two flow conditions across the entire incidence angle range aimed at quantifying transitional flow behavior on the blade. Comparisons of the aerodynamic datasets were made for three "match point" conditions. The blade loading data at the match point conditions show good agreement between the facilities. This report shows comparisons of other data and highlights the unique contributions of the two facilities. The datasets are being used to advance understanding of the aerodynamic challenges associated with maintaining efficient power turbine operation over a wide shaft-speed range.
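
    The sentence about integrating survey data into average exit quantities refers to a routine reduction step. A hedged sketch, assuming a mass-flux-weighted average over the pitchwise survey coordinate (the report's actual weighting scheme may differ):

```python
# Mass-flux-weighted averaging of a pitchwise exit survey into single
# average exit total pressure and flow angle. Variable names are
# illustrative, not the report's nomenclature.
import numpy as np

def mass_averaged(y, rho, v, p_total, angle_deg):
    w = rho * v                      # local mass flux at each survey point
    mdot = np.trapz(w, y)            # integrated mass flow across the pitch
    p_avg = np.trapz(w * p_total, y) / mdot
    angle_avg = np.trapz(w * angle_deg, y) / mdot
    return p_avg, angle_avg
```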

  8. 10 CFR 431.175 - Additional requirements applicable to non-Voluntary Independent Certification Program participants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 3 2011-01-01 2011-01-01 false Additional requirements applicable to non-Voluntary Independent Certification Program participants. 431.175 Section 431.175 Energy DEPARTMENT OF ENERGY ENERGY... requirements applicable to non-Voluntary Independent Certification Program participants. If you are...

  9. Objective identification of mid-latitude storms in satellite imagery: determination of an independent storm validation dataset.

    NASA Astrophysics Data System (ADS)

    Delsol, C.; Hodges, K.

    2003-04-01

    Current methods of validating GCMs involve comparing model results with re-analysis datasets in which observations have been combined with a model. The quality of this approach depends on the observational data distribution in space and time and on the model formulation. We propose to use an automatic and objective technique that can efficiently provide a dataset of “real” observations against which the models and re-analyses can be validated, based on the identification and tracking of weather systems in satellite imagery. We present results of a boundary-finding method based on Fourier Shape Descriptors for the identification of extra-tropical cyclones in the mid-latitudes using NOAA’s AVHRR IR imagery. The boundary-finding method, initially derived for medical image processing, is designed to incorporate model-based information into a boundary-finding process for continuously deformable objects. This allows us to work with objects that are diverse and irregular in their shape, such as developing weather systems. The method is suited to work in environments that may contain spurious and broken boundaries. The main characteristic features of an extra-tropical system, such as the vortex and associated frontal systems, are identified. This work provides a basis for statistical analyses of extra-tropical cyclones for climatological studies and for the validation of GCMs, making use of the vast amount of satellite archive data available. It is also useful for individual case studies for weather forecast verification.

  10. Antimicrobial combinations: Bliss independence and Loewe additivity derived from mechanistic multi-hit models.

    PubMed

    Baeder, Desiree Y; Yu, Guozhi; Hozé, Nathanaël; Rolff, Jens; Regoes, Roland R

    2016-05-26

    Antimicrobial peptides (AMPs) and antibiotics reduce the net growth rate of bacterial populations they target. It is relevant to understand whether the effects of multiple antimicrobials are synergistic or antagonistic, in particular for AMP responses, because naturally occurring responses involve multiple AMPs. There are several competing proposals describing how multiple types of antimicrobials add up when applied in combination, such as Loewe additivity or Bliss independence. These additivity terms are defined ad hoc from abstract principles explaining the supposed interaction between the antimicrobials. Here, we link these ad hoc combination terms to a mathematical model that represents the dynamics of antimicrobial molecules hitting targets on bacterial cells. In this multi-hit model, bacteria are killed when a certain number of targets are hit by antimicrobials. Using this bottom-up approach reveals that Bliss independence should be the model of choice if no interaction between antimicrobial molecules is expected. Loewe additivity, on the other hand, describes scenarios in which antimicrobials affect the same components of the cell, i.e. are not acting independently. While our approach idealizes the dynamics of antimicrobials, it provides a conceptual underpinning of the additivity terms. The choice of the additivity term is essential to determine synergy or antagonism of antimicrobials. This article is part of the themed issue 'Evolutionary ecology of arthropod antimicrobial peptides'. PMID:27160596
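
    For reference, the two null models contrasted above have simple standard forms. With fractional effects in [0, 1], Bliss independence predicts the combined effect of two independently acting drugs as E_AB = E_A + E_B - E_A*E_B; Loewe additivity declares a dose pair additive when the sum of dose fractions, relative to the iso-effective single-agent doses, equals one. The snippet encodes these textbook definitions (not the paper's multi-hit model):

```python
# Textbook reference models for drug combinations.
def bliss_expected(e_a, e_b):
    """Bliss independence: expected combined fractional effect of two
    independently acting drugs (each effect in [0, 1])."""
    return e_a + e_b - e_a * e_b

def loewe_index(d_a, d_b, D_a, D_b):
    """Loewe combination index: d_a and d_b are doses given together;
    D_a and D_b are the single-agent doses producing the same effect.
    Index == 1 is Loewe-additive, < 1 synergy, > 1 antagonism."""
    return d_a / D_a + d_b / D_b
```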

  11. Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.

    PubMed

    Fan, Jianqing; Feng, Yang; Song, Rui

    2011-06-01

    A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under general nonparametric models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.
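
    The screening idea above reduces, in outline, to ranking predictors by the fit of a marginal nonparametric regression and keeping the top-ranked ones. The sketch below substitutes a cubic polynomial for the paper's B-spline estimator and a fixed cutoff for its data-driven threshold, so it illustrates the principle rather than the exact procedure.

```python
# Nonparametric independence screening (NIS), simplified: rank each
# predictor by the R^2 of a marginal nonlinear fit of y on that predictor.
import numpy as np

def nis_rank(X, y, keep=10):
    scores = []
    for j in range(X.shape[1]):
        coef = np.polyfit(X[:, j], y, deg=3)        # marginal nonlinear fit
        resid = y - np.polyval(coef, X[:, j])
        scores.append(1.0 - resid.var() / y.var())  # marginal R^2
    return np.argsort(scores)[::-1][:keep]          # top `keep` predictors
```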

  12. Automated determination of chemical functionalisation addition routes based on magnetic susceptibility and nucleus independent chemical shifts

    NASA Astrophysics Data System (ADS)

    Van Lier, G.; Ewels, C. P.; Geerlings, P.

    2008-07-01

    We present a modified version of our previously reported meta-code SACHA, for systematic analysis of chemical addition. The code automates the generation of structures, running of quantum chemical codes, and selection of preferential isomers based on chosen selection rules. While the selection rules for the previous version were based on the total system energy, predicting purely thermodynamic addition patterns, we examine here the possibility of using other system parameters, notably magnetic susceptibility as a descriptor of global aromaticity, and nucleus independent chemical shifts (NICS) as local aromaticity descriptor.

  13. Independent effects of warming and nitrogen addition on plant phenology in the Inner Mongolian steppe

    PubMed Central

    Xia, Jianyang; Wan, Shiqiang

    2013-01-01

    Background and Aims Phenology is one of the most sensitive traits of plants in response to regional climate warming. Better understanding of the interactive effects between warming and other environmental change factors, such as increasing atmosphere nitrogen (N) deposition, is critical for projection of future plant phenology. Methods A 4-year field experiment manipulating temperature and N has been conducted in a temperate steppe in northern China. Phenology, including flowering and fruiting date as well as reproductive duration, of eight plant species was monitored and calculated from 2006 to 2009. Key Results Across all the species and years, warming significantly advanced flowering and fruiting time by 0.64 and 0.72 d per season, respectively, which were mainly driven by the earliest species (Potentilla acaulis). Although N addition showed no impact on phenological times across the eight species, it significantly delayed flowering time of Heteropappus altaicus and fruiting time of Agropyron cristatum. The responses of flowering and fruiting times to warming or N addition are coupled, leading to no response of reproductive duration to warming or N addition for most species. Warming shortened reproductive duration of Potentilla bifurca but extended that of Allium bidentatum, whereas N addition shortened that of A. bidentatum. No interactive effect between warming and N addition was found on any phenological event. Such additive effects could be ascribed to the species-specific responses of plant phenology to warming and N addition. Conclusions The results suggest that the warming response of plant phenology is larger in earlier than later flowering species in temperate grassland systems. The effects of warming and N addition on plant phenology are independent of each other. These findings can help to better understand and predict the response of plant phenology to climate warming concurrent with other global change driving factors. PMID:23585496

  14. 10 CFR 431.174 - Additional requirements applicable to Voluntary Independent Certification Program participants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Independent Certification Program participants. 431.174 Section 431.174 Energy DEPARTMENT OF ENERGY ENERGY... requirements applicable to Voluntary Independent Certification Program participants. (a) Description of Voluntary Independent Certification Program participant. For purposes of this subpart, a manufacturer...

  15. Independence.

    ERIC Educational Resources Information Center

    Stephenson, Margaret E.

    2000-01-01

    Discusses the four planes of development and the periods of creation and crystallization within each plane. Identifies the type of independence that should be achieved by the end of the first two planes of development. Maintains that it is through individual work on the environment that one achieves independence. (KB)

  16. The Use of Additional GPS Frequencies to Independently Determine Tropospheric Water Vapor Profiles

    NASA Technical Reports Server (NTRS)

    Herman, B.M.; Feng, D.; Flittner, D. E.; Kursinski, E. R.

    2000-01-01

    It is well known that the currently employed L1 and L2 GPS/MET frequencies (1.2-1.6 GHz) do not allow for the separation of water vapor and density (or temperature) from active microwave occultation measurements in regions of the troposphere warmer than 240 K. Therefore, additional information must be used, from other types of measurements and weather analyses, to recover water vapor (and temperature) profiles. Thus in data sparse regions, these inferred profiles can be subject to larger errors than would result in data rich regions. The use of properly selected additional GPS frequencies enables a direct, independent measurement of the absorption associated with the water vapor profile, which may then be used in the standard GPS/MET retrievals to obtain a more accurate determination of atmospheric temperature throughout the water vapor layer. This study looks at the use of microwave crosslinks in the region of the 22 GHz water vapor absorption line for this purpose. An added advantage of using 22 GHz frequencies is that they are only negligibly affected by the ionosphere, in contrast to the large effect at the GPS frequencies. The retrieval algorithm uses both amplitude and phase measurements to obtain profiles of atmospheric pressure, temperature and water vapor pressure with a vertical resolution of 1 km or better. This technique also provides the cloud liquid water content along the ray path, which is in itself an important element in climate monitoring. Advantages of this method include the ability to make measurements in the presence of clouds and the use of techniques and technology proven through the GPS/MET experiment and several of NASA's planetary exploration missions. Simulations demonstrating this method will be presented for both clear and cloudy sky conditions.

  17. Accuracy and Precision in the Southern Hemisphere Additional Ozonesondes (SHADOZ) Dataset 1998-2000 in Light of the JOSIE-2000 Results

    NASA Technical Reports Server (NTRS)

    Witte, J. C.; Thompson, A. M.; Schmidlin, F. J.; Oltmans, S. J.; McPeters, R. D.; Smit, H. G. J.

    2003-01-01

    A network of 12 southern hemisphere tropical and subtropical stations in the Southern Hemisphere ADditional OZonesondes (SHADOZ) project has provided over 2000 profiles of stratospheric and tropospheric ozone since 1998. Balloon-borne electrochemical concentration cell (ECC) ozonesondes are used with standard radiosondes for pressure, temperature and relative humidity measurements. The archived data are available at: http://croc.gsfc.nasa.gov/shadoz. In Thompson et al., accuracies and imprecisions in the SHADOZ 1998-2000 dataset were examined using ground-based instruments and the TOMS total ozone measurement (version 7) as references. Small variations in ozonesonde technique introduced possible biases from station-to-station. SHADOZ total ozone column amounts are now compared to version 8 TOMS; discrepancies between the two datasets are reduced 2% on average. An evaluation of ozone variations among the stations is made using the results of a series of chamber simulations of ozone launches (JOSIE-2000, Juelich Ozonesonde Intercomparison Experiment) in which a standard reference ozone instrument was employed with the various sonde techniques used in SHADOZ. A number of variations in SHADOZ ozone data are explained when differences in solution strength, data processing and instrument type (manufacturer) are taken into account.

  18. Nitrogen Addition and Warming Independently Influence the Belowground Micro-Food Web in a Temperate Steppe

    PubMed Central

    Li, Qi; Bai, Huahua; Liang, Wenju; Xia, Jianyang; Wan, Shiqiang; van der Putten, Wim H.

    2013-01-01

    Climate warming and atmospheric nitrogen (N) deposition are known to influence ecosystem structure and functioning. However, our understanding of the interactive effect of these global changes on ecosystem functioning is relatively limited, especially when it concerns the responses of soils and soil organisms. We conducted a field experiment to study the interactive effects of warming and N addition on soil food web. The experiment was established in 2006 in a temperate steppe in northern China. After three to four years (2009–2010), we found that N addition positively affected microbial biomass and negatively influenced trophic group and ecological indices of soil nematodes. However, the warming effects were less obvious, only fungal PLFA showed a decreasing trend under warming. Interestingly, the influence of N addition did not depend on warming. Structural equation modeling analysis suggested that the direct pathway between N addition and soil food web components were more important than the indirect connections through alterations in soil abiotic characters or plant growth. Nitrogen enrichment also affected the soil nematode community indirectly through changes in soil pH and PLFA. We conclude that experimental warming influenced soil food web components of the temperate steppe less than N addition, and there was little influence of warming on N addition effects under these experimental conditions. PMID:23544140

  19. Developing Independent Listening Skills for English as an Additional Language Students

    ERIC Educational Resources Information Center

    Picard, Michelle; Velautham, Lalitha

    2016-01-01

    This paper describes an action research project to develop online, self-access listening resources mirroring the authentic academic contexts experienced by graduate university students. Current listening materials for English as an Additional Language (EAL) students mainly use Standard American English or Standard British pronunciation, and far…

  20. Addition of uridines to edited RNAs in trypanosome mitochondria occurs independently of transcription

    SciTech Connect

    Harris, M.E.; Moore, D.R.; Hajduk, S.L. )

    1990-07-05

    RNA editing is a novel RNA processing event of unknown mechanism that results in the introduction of nucleotides not encoded in the DNA into specific RNA molecules. We have examined the post-transcriptional addition of nucleotides into the mitochondrial RNA of Trypanosoma brucei. Utilizing an isolated organelle system we have determined that addition of uridines to edited RNAs does not require ongoing transcription. Trypanosome mitochondria incorporate CTP, ATP, and UTP into RNA in the absence of transcription. GTP is incorporated into RNA only as a result of the transcription process. Post-transcriptional CTP and ATP incorporation can be ascribed to known enzymatic activities. CTP is incorporated into tRNAs as a result of synthesis or turnover of their 3′ CCA sequences. ATP is incorporated into the 3′ CCA of tRNAs and into mitochondrial messenger RNAs due to polyadenylation. In the absence of transcription, UTP is incorporated into transcripts known to undergo editing, and the degree of UTP incorporation is consistent with the degree of editing occurring in these transcripts. Cytochrome b mRNAs, which contain a single editing site near their 5′ ends, are initially transcribed unedited at that site. Post-transcriptional labeling of cytochrome b mRNAs in the organelle with (α-32P)UTP results in the addition of uridines near the 5′ end of the RNA but not in a 3′ region which lacks an editing site. These results indicate that RNA editing is a post-transcriptional process in the mitochondria of trypanosomes.

  1. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  2. Treatment Planning Constraints to Avoid Xerostomia in Head-and-Neck Radiotherapy: An Independent Test of QUANTEC Criteria Using a Prospectively Collected Dataset

    SciTech Connect

    Moiseenko, Vitali; Wu, Jonn; Hovan, Allan; Saleh, Ziad; Apte, Aditya; Deasy, Joseph O.; Harrow, Stephen; Rabuka, Carman; Muggli, Adam; Thompson, Anna

    2012-03-01

    Purpose: The severe reduction of salivary function (xerostomia) is a common complication after radiation therapy for head-and-neck cancer. Consequently, guidelines to ensure adequate function based on parotid gland tolerance dose-volume parameters have been suggested by the QUANTEC group and by Ortholan et al. We performed a validation test of these guidelines against a prospectively collected dataset and compared the results with a previously published dataset. Methods and Materials: Whole-mouth stimulated salivary flow data from 66 head-and-neck cancer patients treated with radiotherapy at the British Columbia Cancer Agency (BCCA) were measured, and treatment planning data were abstracted. Flow measurements were collected from 50 patients at 3 months, and 60 patients at 12-month follow-up. Previously published data from a second institution, Washington University in St. Louis (WUSTL), were used for comparison. A logistic model was used to describe the incidence of Grade 4 xerostomia as a function of the mean dose of the spared parotid gland. The rate of correctly predicting the lack of xerostomia (negative predictive value [NPV]) was computed for both the QUANTEC constraints and the Ortholan et al. recommendation to constrain the total volume of both glands receiving more than 40 Gy to less than 33%. Results: Both datasets showed a rate of xerostomia of less than 20% when the mean dose to the least-irradiated parotid gland is kept to less than 20 Gy. Logistic model parameters for the incidence of xerostomia at 12 months after therapy, based on the least-irradiated gland, were D50 = 32.4 Gy and γ = 0.97. NPVs for the QUANTEC guideline were 94% (BCCA data) and 90% (WUSTL data). For the Ortholan et al. guideline, NPVs were 85% (BCCA) and 86% (WUSTL). Conclusion: These data confirm that the QUANTEC guideline effectively avoids xerostomia, and this is somewhat more effective than constraints on the volume receiving more than 40 Gy.
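
    The fitted logistic dose-response can be written out explicitly. The snippet assumes the common gamma-50 parameterization, NTCP(D) = 1 / (1 + exp(4γ(1 - D/D50))); the paper may use an equivalent algebraic form, so treat this as a sketch of the reported fit rather than its exact equation.

```python
# Logistic dose-response for Grade 4 xerostomia at 12 months, using the
# abstract's fitted parameters and an assumed gamma-50 parameterization.
import math

D50, GAMMA = 32.4, 0.97   # Gy, dimensionless slope (from the abstract)

def xerostomia_risk(mean_dose_gy):
    """Predicted incidence given mean dose to the spared parotid gland."""
    return 1.0 / (1.0 + math.exp(4 * GAMMA * (1 - mean_dose_gy / D50)))

print(f"{xerostomia_risk(20.0):.2f}")   # ~0.19, consistent with the <20%
                                        # rate reported below 20 Gy
```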

  3. Additional Saturday rehabilitation improves functional independence and quality of life and reduces length of stay: a randomized controlled trial

    PubMed Central

    2013-01-01

    Background Many inpatients receive little or no rehabilitation on weekends. Our aim was to determine what effect providing additional Saturday rehabilitation during inpatient rehabilitation had on functional independence, quality of life and length of stay compared to 5 days per week of rehabilitation. Methods This was a multicenter, single-blind (assessors) randomized controlled trial with concealed allocation and 12-month follow-up conducted in two publicly funded metropolitan inpatient rehabilitation facilities in Melbourne, Australia. Patients were eligible if they were adults (aged ≥18 years) admitted for rehabilitation for any orthopedic, neurological or other disabling conditions excluding those admitted for slow stream rehabilitation/geriatric evaluation and management. Participants were randomly allocated to usual care Monday to Friday rehabilitation (control) or to Monday to Saturday rehabilitation (intervention). The additional Saturday rehabilitation comprised physiotherapy and occupational therapy. The primary outcomes were functional independence (functional independence measure (FIM); measured on an 18 to 126 point scale), health-related quality of life (EQ-5D utility index; measured on a 0 to 1 scale, and EQ-5D visual analog scale; measured on a 0 to 100 scale), and patient length of stay. Outcome measures were assessed on admission, discharge (primary endpoint), and at 6 and 12 months post discharge. Results We randomly assigned 996 adults (mean (SD) age 74 (13) years) to Monday to Saturday rehabilitation (n = 496) or usual care Monday to Friday rehabilitation (n = 500). Relative to admission scores, intervention group participants had higher functional independence (mean difference (MD) 2.3, 95% confidence interval (CI) 0.5 to 4.1, P = 0.01) and health-related quality of life (MD 0.04, 95% CI 0.01 to 0.07, P = 0.009) on discharge and may have had a shorter length of stay by 2 days (95% CI 0 to 4, P = 0.1) when compared to

  4. The influence of non-solvent addition on the independent and dependent parameters in roller electrospinning of polyurethane.

    PubMed

    Cengiz-Callioglu, Funda; Jirsak, Oldrich; Dayik, Mehmet

    2013-07-01

    This paper discusses the effects of 1,1,2,2-tetrachloroethylene (TCE) non-solvent addition on the independent parameters (electrical conductivity, dielectric constant, surface tension and the rheological properties of the solution, etc.) and dependent parameters (number of Taylor cones per square meter (NTC/m2), spinning performance for one Taylor cone (SP/TC), total spinning performance (SP), and fiber properties such as diameter, diameter uniformity and non-fibrous area) in roller electrospinning of polyurethane (PU). The same process parameters (voltage, distance of the electrodes, humidity, etc.) were applied for all solutions during the spinning process. According to the results, the effect of TCE non-solvent concentration on the dielectric constant, surface tension, rheological properties of the solution and also spinning performance was statistically significant. Besides these results, TCE non-solvent concentration affects the quality of the fiber and nanoweb structure. Generally, high fiber density, low non-fibrous percentage and uniform nanofibers were obtained from fiber morphology analyses.

  5. Goal-directed and transfer-cue-elicited drug-seeking are dissociated by pharmacotherapy: evidence for independent additive controllers.

    PubMed

    Hogarth, Lee

    2012-07-01

    According to contemporary learning theory, drug-seeking behavior reflects the summation of 2 dissociable controllers. Whereas goal-directed drug-seeking is determined by the expected current incentive value of the drug, stimulus-elicited drug-seeking is determined by the expected probability of the drug independently of its current incentive value, and these 2 controllers contribute additively to observed drug-seeking. One applied prediction of this model is that smoking cessation pharmacotherapies selectively attenuate tonic but not cue-elicited craving because they downgrade the expected incentive value of the drug but leave expected probability intact. To test this, the current study examined whether nicotine replacement therapy (NRT) nasal spray would modify goal-directed tobacco choice in a human outcome devaluation procedure, but leave cue-elicited tobacco choice in a Pavlovian to instrumental transfer (PIT) procedure intact. Smokers (N= 96) first underwent concurrent choice training in which 2 responses earned tobacco or chocolate points, respectively. Participants then ingested either NRT nasal spray (1 mg) or chocolate (147 g) to devalue 1 outcome. Concurrent choice was then tested again in extinction to measure goal-directed control of choice, and in a PIT test to measure the extent to which tobacco and chocolate stimuli enhanced choice of the same outcome. It was found that NRT modified tobacco choice in the extinction test but not the extent to which the tobacco stimulus enhanced choice of the tobacco outcome in the PIT test. This dissociation suggests that the propensity to engage in drug-seeking is determined independently by the expected value and probability of the drug, and that pharmacotherapy has partial efficacy because it selectively affects expected drug value. PMID:22823420

  7. Neck Circumference, along with Other Anthropometric Indices, Has an Independent and Additional Contribution in Predicting Fatty Liver Disease

    PubMed Central

    Huang, Bi-xia; Zhu, Ming-fan; Wu, Ting; Zhou, Jing-ya; Liu, Yan; Chen, Xiao-lin; Zhou, Rui-fen; Wang, Li-jun; Chen, Yu-ming; Zhu, Hui-lian

    2015-01-01

    Background and Aim Previous studies have indicated that neck circumference is a valuable predictor for obesity and metabolic syndrome, but little evidence is available for fatty liver disease. We examined the association of neck circumference with fatty liver disease and evaluated its predictive value in Chinese adults. Methods This cross-sectional study comprised 4053 participants (1617 women and 2436 men, aged 20-88) recruited from the Health Examination Center in Guangzhou, China between May 2009 and April 2010. Anthropometric measurements were taken, abdominal ultrasonography was conducted and blood biochemical parameters were measured. Covariance, logistic regression and receiver operating characteristic curve analyses were employed. Results The mean neck circumference was greater in subjects with fatty liver disease than those without the disease in both women and men after adjusting for age (P<0.001). Logistic regression analysis showed that the age-adjusted ORs (95% CI) of fatty liver disease for quartile 4 (vs. quartile 1) of neck circumference were 7.70 (4.95-11.99) for women and 12.42 (9.22-16.74) for men. After further adjusting for other anthropometric indices, both individually and combined, the corresponding ORs remained significant (all P-trends<0.05) but were attenuated to 1.94-2.53 for women and 1.45-2.08 for men. An additive interaction existed between neck circumference and the other anthropometric measures (all P<0.05). A high neck circumference value was associated with a much greater prevalence of fatty liver disease in participants with both high and normal BMI, waist circumference and waist-to-hip ratio values. Conclusions Neck circumference was an independent predictor for fatty liver disease and provided an additional contribution when applied with other anthropometric measures. PMID:25679378

  8. Phytosterol intake and dietary fat reduction are independent and additive in their ability to reduce plasma LDL cholesterol

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The plasma LDL-cholesterol-lowering effect of plant sterols (PS) appears to be independent of background diet, but definitive proof is lacking. The effect of background diet on plasma concentrations of PS has not been reported. We determined the effects of manipulating dietary contents of PS and f...

  9. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access)   The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.

  10. Identification of novel estrogen receptor (ER) agonists that have additional and complementary anti-cancer activities via ER-independent mechanism.

    PubMed

    Kim, Taelim; Kim, Hye-In; An, Ji-Young; Lee, Jun; Lee, Na-Rae; Heo, Jinyuk; Kim, Ji-Eun; Yu, Jihyun; Lee, Yong Sup; Inn, Kyung-Soo; Kim, Nam-Jung

    2016-04-01

    In this study, a series of bis(4-hydroxy)benzophenone oxime ether derivatives, such as 12c, 12e and 12h, were identified as novel estrogen receptor (ER) agonists that have additional and complementary anti-proliferative activities via an ER-independent mechanism in cancer cells. These compounds are expected to overcome the therapeutic limitation of existing ER agonists such as estradiol and tamoxifen, which have been known to induce the proliferation of cancer cells. PMID:26905830

  11. Segmentation of Unstructured Datasets

    NASA Technical Reports Server (NTRS)

    Bhat, Smitha

    1996-01-01

    Datasets generated by computer simulations and experiments in Computational Fluid Dynamics tend to be extremely large and complex. It is difficult to visualize these datasets using standard techniques like Volume Rendering and Ray Casting. Object Segmentation provides a technique to extract and quantify regions of interest within these massive datasets. This thesis explores basic algorithms to extract coherent amorphous regions from two-dimensional and three-dimensional scalar unstructured grids. The techniques are applied to datasets from Computational Fluid Dynamics and from Finite Element Analysis.

  12. Dataset Lifecycle Policy

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  13. Fixing Dataset Search

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  14. Dataset Modelability by QSAR

    PubMed Central

    Golbraikh, Alexander; Muratov, Eugene; Fourches, Denis; Tropsha, Alexander

    2014-01-01

    We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (Correct Classification Rate above 0.7) for a binary dataset of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of the nearest neighbor pairs of compounds with the same activity class versus the total number of pairs. The MODI values were calculated for more than 100 datasets and the threshold of 0.65 was found to separate non-modelable from the modelable datasets. PMID:24251851
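
    Since the index is fully specified above, it can be computed directly. A minimal implementation, assuming numpy arrays of descriptors X and class labels y, and assuming Euclidean nearest neighbors (the distance metric is an assumption here):

```python
# MODI: activity-class-weighted fraction of compounds whose nearest
# neighbor in descriptor space shares their activity class.
import numpy as np

def modi(X, y):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self as neighbor
    nn_same = y[d.argmin(axis=1)] == y       # nearest neighbor same class?
    return np.mean([nn_same[y == c].mean() for c in np.unique(y)])

# Per the paper's threshold, datasets with modi(X, y) >= 0.65 are
# considered modelable.
```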

  15. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    SciTech Connect

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul; Li, Yaquin; Garg, Seema; Tobin Jr, Kenneth William; Chaum, Edward

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.

  16. STAT4 Associates with SLE Through Two Independent Effects that Correlate with Gene Expression and Act Additively with IRF5 to Increase Risk

    PubMed Central

    Abelson, Anna-Karin; Delgado-Vega, Angélica M.; Kozyrev, Sergey V.; Sánchez, Elena; Velázquez-Cruz, Rafael; Eriksson, Niclas; Wojcik, Jerome; Reddy, Prasad Linga; Lima, Guadalupe; D’Alfonso, Sandra; Migliaresi, Sergio; Baca, Vicente; Orozco, Lorena; Witte, Torsten; Ortego-Centeno, Norberto; Abderrahim, Hadi; Pons-Estel, Bernardo A.; Gutiérrez, Carmen; Suárez, Ana; González-Escribano, Maria Francisca; Martin, Javier; Alarcón-Riquelme, Marta E.

    2013-01-01

    Objectives To confirm and define the genetic association of STAT4 and systemic lupus erythematosus, investigate the possibility of correlations with differential splicing and/or expression levels, and genetic interaction with IRF5. Methods 30 tag SNPs were genotyped in an independent set of Spanish cases and controls. SNPs surviving correction for multiple tests were genotyped in 5 new sets of cases and controls for replication. STAT4 cDNA was analyzed by 5’-RACE PCR and sequencing. Expression levels were measured by quantitative PCR. Results In the fine-mapping, four SNPs were significant after correction for multiple testing, with rs3821236 and rs3024866 as the strongest signals, followed by the previously associated rs7574865, and by rs1467199. Association was replicated in all cohorts. After conditional regression analyses, two major independent signals, represented by SNPs rs3821236 and rs7574865, remained significant across the sets. These SNPs belong to separate haplotype blocks. High levels of STAT4 expression correlated with SNPs rs3821236, rs3024866 (both in the same haplotype block) and rs7574865 but not with other SNPs. We also detected transcription of alternative tissue-specific exons 1, indicating the presence of tissue-specific promoters of potential importance in the expression of STAT4. No interaction with associated SNPs of IRF5 was observed using regression analysis. Conclusions These data confirm STAT4 as a susceptibility gene for SLE and suggest the presence of at least two functional variants affecting levels of STAT4. Our results also indicate that both genes, STAT4 and IRF5, act additively to increase risk for SLE. PMID:19019891

  17. A Risk Score with Additional Four Independent Factors to Predict the Incidence and Recovery from Metabolic Syndrome: Development and Validation in Large Japanese Cohorts

    PubMed Central

    Obokata, Masaru; Negishi, Kazuaki; Ohyama, Yoshiaki; Okada, Haruka; Imai, Kunihiko; Kurabayashi, Masahiko

    2015-01-01

    Background Although many risk factors for metabolic syndrome (MetS) have been reported, there is no clinical score that predicts its incidence. The purposes of this study were to create and validate a risk score for predicting both incidence of and recovery from MetS in a large cohort. Methods Subjects without MetS at enrollment (n = 13,634) were randomly divided into 2 groups and followed to record incidence of MetS. We also examined recovery from it in the remaining 2,743 individuals with prevalent MetS. Results During a median follow-up of 3.0 years, 878 subjects in the derivation and 757 in the validation cohorts developed MetS. Multiple logistic regression analysis identified 12 independent variables from the derivation cohort and an initial score for subsequent MetS was created, which showed good discrimination both in the derivation (c-statistics 0.82) and validation cohorts (0.83). The predictability of the initial score for recovery from MetS was tested in the 2,743 MetS population (906 subjects recovered from MetS), where nine variables (including age, sex, γ-glutamyl transpeptidase, uric acid and five MetS diagnostic criteria constituents) remained significant. Then, the final score was created using the nine variables. This score significantly predicted both the recovery from MetS (c-statistics 0.70, p<0.001, 78% sensitivity and 54% specificity) and incident MetS (c-statistics 0.80) with an incremental discriminative ability over the model derived from five factors used in the diagnosis of MetS (continuous net reclassification improvement: 0.35, p < 0.001 and integrated discrimination improvement: 0.01, p<0.001). Conclusions We identified four additional independent risk factors associated with subsequent MetS, and developed and validated a risk score to predict both incidence of and recovery from MetS. PMID:26230621
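
    The c-statistic used throughout this abstract is the probability that a randomly chosen subject who developed MetS received a higher score than a randomly chosen subject who did not, which is equivalent to the area under the ROC curve. A small rank-based illustration (not the authors' code):

```python
# c-statistic (ROC AUC) from risk scores and binary outcomes.
import numpy as np

def c_statistic(scores, outcomes):
    pos = scores[outcomes == 1]          # scores of subjects with the event
    neg = scores[outcomes == 0]          # scores of subjects without it
    diff = pos[:, None] - neg[None, :]   # all case/non-case pairs
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()
```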

  18. High levels of acute phase proteins and soluble 70 kDa heat shock proteins are independent and additive risk factors for mortality in colorectal cancer

    PubMed Central

    Kocsis, Judit; Mészáros, Tamás; Madaras, Balázs; Tóth, Éva Katalin; Kamondi, Szilárd; Gál, Péter; Varga, Lilian; Prohászka, Zoltán

    2010-01-01

    Recently, we reported that high soluble Hsp70 (sHsp70) level was a significant predictor of mortality during an almost 3-year-long follow-up period in patients with colorectal cancer. This association was the strongest in the group of <70-year-old female patients as well as in those who were in a less advanced stage of the disease at baseline. According to these observations, measurement of the serum level of sHsp70 is a useful, stage-independent prognostic marker in colorectal cancer, especially in patients without distant metastasis. Since many literature data indicated that measurement of C-reactive protein (CRP) and other acute phase proteins (APPs) may also be suitable for predicting the mortality of patients with colorectal cancer, it seemed reasonable to study whether the effect of sHsp70 and other APPs are related or independent. In order to answer this question, we measured the concentrations of CRP as well as of other complement-related APPs (C1 inhibitor, C3, and C9) along with that of the MASP-2 complement component in the sera of 175 patients with colorectal cancer and known levels of sHsp70, which have been used in our previous study. High (above median) levels of CRP, C1 esterase inhibitor (C1-INH), and sHsp70 were found to be independently associated with poor patient survival, whereas no such association was observed with the other proteins tested. According to the adjusted Cox proportional hazards analysis, the additive effect of high sHsp70, CRP, and C1-INH levels on the survival of patients exceeded that of high sHsp70 alone, with a hazard ratio (HR) of 2.83 (1.13–70.9). In some subgroups of patients, such as in females [HR 4.80 (1.07–21.60)] or in ≤70-year-old patients [HR 11.53 (2.78–47.70)], even greater differences were obtained. These findings indicate that the clinical mortality–prediction value of combined measurements of sHsp70, CRP, and C1-INH with inexpensive methods can be very high, especially in specific subgroups of

  19. Independent contributions of the central executive, intelligence, and in-class attentive behavior to developmental change in the strategies used to solve addition problems.

    PubMed

    Geary, David C; Hoard, Mary K; Nugent, Lara

    2012-09-01

    Children's (N=275) use of retrieval, decomposition (e.g., 7=4+3 and thus 6+7=6+4+3), and counting to solve addition problems was longitudinally assessed from first grade to fourth grade, and intelligence, working memory, and in-class attentive behavior were assessed in one or several grades. The goal was to assess the relation between capacity of the central executive component of working memory, controlling for intelligence and in-class attentive behavior, and grade-related changes in children's use of these strategies. The predictor-on-intercept effects from multilevel models revealed that children with higher central executive capacity correctly retrieved more facts and used the most sophisticated counting procedure more frequently and accurately than their lower-capacity peers at the beginning of first grade, but the predictor-on-slope effects indicated that this advantage disappeared (retrieval) or declined in importance (counting) from first grade to fourth grade. The predictor-on-slope effects also revealed that from first grade to fourth grade, children with higher capacity adopted the decomposition strategy more quickly than other children. The results remained robust with controls for children's sex, race, school site, speed of encoding Arabic numerals and articulating number words, and mathematics achievement in kindergarten. The results also revealed that intelligence and in-class attentive behavior independently contributed to children's strategy development.
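
    The predictor-on-intercept versus predictor-on-slope distinction maps onto a standard growth model: the central executive score enters once as a main effect (intercept) and once in interaction with grade (slope). A hedged sketch with statsmodels and invented column names (retrieval, grade, central_exec, child_id), not the study's exact model specification:

```python
# Multilevel growth model: central executive capacity predicting both the
# intercept (main effect) and the slope (interaction with grade) of
# retrieval use, with per-child random intercepts and slopes.
import statsmodels.formula.api as smf

def fit_growth_model(df):
    model = smf.mixedlm("retrieval ~ grade * central_exec",
                        data=df,
                        groups=df["child_id"],
                        re_formula="~grade")
    return model.fit()   # central_exec = intercept effect;
                         # grade:central_exec = slope effect
```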

  20. Plant Functional Diversity Can Be Independent of Species Diversity: Observations Based on the Impact of 4-Yrs of Nitrogen and Phosphorus Additions in an Alpine Meadow

    PubMed Central

    Li, Wei; Cheng, Ji-Min; Yu, Kai-Liang; Epstein, Howard E.; Guo, Liang; Jing, Guang-Hua; Zhao, Jie; Du, Guo-Zhen

    2015-01-01

    Past studies have widely documented the decrease in species diversity in response to nutrient addition; however, functional diversity is often independent of species diversity. In this study, we conducted a field experiment to examine the effect of nitrogen and phosphorus fertilization ((NH4)2HPO4) at 0, 15, 30, and 60 g m⁻² yr⁻¹ (F0, F15, F30, and F60) after 4 years of continuous fertilization on functional diversity and species diversity, and their relationship with productivity in an alpine meadow community on the Tibetan Plateau. For this purpose, three community-weighted mean trait values (specific leaf area, SLA; mature plant height, MPH; and seed size, SS) for 30 common species at each fertilization level were determined, and three components of functional diversity (functional richness, FRic; functional evenness, FEve; and Rao's index of quadratic entropy, FRao) were quantified. Our results showed that: (i) species diversity sharply decreased, but functional diversity remained stable with fertilization; (ii) community-weighted mean traits (SLA and MPH) increased significantly with fertilization level; (iii) aboveground biomass was not correlated with functional diversity, but it was significantly correlated with species diversity and MPH. Our results suggest that decreases in species diversity due to fertilization do not result in corresponding changes in functional diversity. Functional identity of species may be more important than functional diversity in influencing aboveground productivity in this alpine meadow community, and our results also support the mass ratio hypothesis; that is, the traits of the dominant species influenced the community biomass production. PMID:26295345
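
    The community-weighted means and Rao's quadratic entropy used above are straightforward to compute; the sketch below illustrates both with made-up abundances and trait values, not data from the study.

    ```python
    # Sketch: community-weighted mean (CWM) traits and Rao's Q.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    abund = np.array([30.0, 10.0, 5.0, 55.0])   # per-species abundance
    traits = np.array([[22.1, 0.45, 1.2],       # rows: species
                       [15.3, 0.30, 0.8],       # cols: SLA, MPH, SS
                       [18.7, 0.60, 2.1],
                       [25.0, 0.90, 0.5]])

    p = abund / abund.sum()                     # relative abundances
    cwm = p @ traits                            # CWM for each trait
    print("CWM (SLA, MPH, SS):", cwm)

    # Rao's Q: abundance-weighted mean pairwise trait distance.
    d = squareform(pdist(traits / traits.std(axis=0)))
    print("Rao's quadratic entropy:", p @ d @ p)
    ```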

  1. Plant Functional Diversity Can Be Independent of Species Diversity: Observations Based on the Impact of 4-Yrs of Nitrogen and Phosphorus Additions in an Alpine Meadow.

    PubMed

    Li, Wei; Cheng, Ji-Min; Yu, Kai-Liang; Epstein, Howard E; Guo, Liang; Jing, Guang-Hua; Zhao, Jie; Du, Guo-Zhen

    2015-01-01

    Past studies have widely documented the decrease in species diversity in response to nutrient addition; however, functional diversity is often independent of species diversity. In this study, we conducted a field experiment to examine the effect of nitrogen and phosphorus fertilization ((NH4)2HPO4) at 0, 15, 30, and 60 g m⁻² yr⁻¹ (F0, F15, F30, and F60) after 4 years of continuous fertilization on functional diversity and species diversity, and their relationship with productivity in an alpine meadow community on the Tibetan Plateau. For this purpose, three community-weighted mean trait values (specific leaf area, SLA; mature plant height, MPH; and seed size, SS) for 30 common species at each fertilization level were determined, and three components of functional diversity (functional richness, FRic; functional evenness, FEve; and Rao's index of quadratic entropy, FRao) were quantified. Our results showed that: (i) species diversity sharply decreased, but functional diversity remained stable with fertilization; (ii) community-weighted mean traits (SLA and MPH) increased significantly with fertilization level; (iii) aboveground biomass was not correlated with functional diversity, but it was significantly correlated with species diversity and MPH. Our results suggest that decreases in species diversity due to fertilization do not result in corresponding changes in functional diversity. Functional identity of species may be more important than functional diversity in influencing aboveground productivity in this alpine meadow community, and our results also support the mass ratio hypothesis; that is, the traits of the dominant species influenced the community biomass production. PMID:26295345

  2. Prediction of joint algal toxicity of nano-CeO2/nano-TiO2 and florfenicol: Independent action surpasses concentration addition.

    PubMed

    Wang, Zhuang; Wang, Se; Peijnenburg, Willie J G M

    2016-08-01

    Co-exposure of aquatic organisms to engineered nanoparticles (ENPs) and antibiotics is likely to take place in the environment. However, the impacts of co-exposure on aquatic organisms are virtually unknown and understanding the joint toxicity of ENPs and antibiotics is a topic of importance. The independent action (IA) model and the concentration addition (CA) model are two of the most common approaches to mixture toxicity assessment. In this study, the joint toxicity of two ENPs (nCeO2 and nTiO2) and one antibiotic (florfenicol, FLO) to Chlorella pyrenoidosa was determined to compare the applicability of the IA and the CA model. Concentration-response analyses were performed for single toxicants and for binary mixtures containing FLO and one of the ENPs at two suspended particle concentrations. The effect concentrations and the observed effects of the binary mixtures were compared to the predictions of the joint toxicity. The observed toxicity associated with the nCeO2 or nTiO2 exposure was enhanced by the concomitant FLO exposure. The joint toxicity of nCeO2 and FLO was significantly higher than that of nTiO2 and FLO. Predictions based on the IA and CA models tend to underestimate the overall toxicity (in terms of median effect concentration) of the binary mixtures, but IA performs better than CA, irrespective of the effect level under consideration and the types of mixtures studied. This result underpins the need to consider the effects of mixtures of ENPs and organic chemicals on aquatic organisms, and the practicability of the IA and CA methods in toxicity assessment of ENPs.
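
    A minimal sketch of the two reference models compared above, assuming two-parameter log-logistic concentration-response curves for the individual components; all parameter values are illustrative, not the study's fitted values.

    ```python
    # Sketch: independent action (IA) vs. concentration addition (CA).
    import numpy as np

    def effect(c, ec50, slope):
        """Two-parameter log-logistic concentration-response curve."""
        return 1.0 / (1.0 + (ec50 / np.maximum(c, 1e-12)) ** slope)

    ec50 = np.array([12.0, 3.5])   # hypothetical EC50s (ENP, antibiotic)
    slope = np.array([1.8, 2.2])
    conc = np.array([6.0, 1.75])   # concentrations in the mixture

    # IA combines the single-substance effects probabilistically.
    e_ia = 1.0 - np.prod(1.0 - effect(conc, ec50, slope))

    # CA sums toxic units; the mixture reaches 50% effect when the
    # toxic units c_i / EC50_i sum to 1.
    tu = (conc / ec50).sum()

    print(f"IA-predicted effect: {e_ia:.3f}; CA toxic units: {tu:.3f}")
    ```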

  3. Plant Functional Diversity Can Be Independent of Species Diversity: Observations Based on the Impact of 4-Yrs of Nitrogen and Phosphorus Additions in an Alpine Meadow.

    PubMed

    Li, Wei; Cheng, Ji-Min; Yu, Kai-Liang; Epstein, Howard E; Guo, Liang; Jing, Guang-Hua; Zhao, Jie; Du, Guo-Zhen

    2015-01-01

    Past studies have widely documented the decrease in species diversity in response to nutrient addition; however, functional diversity is often independent of species diversity. In this study, we conducted a field experiment to examine the effect of nitrogen and phosphorus fertilization ((NH4)2HPO4) at 0, 15, 30, and 60 g m⁻² yr⁻¹ (F0, F15, F30, and F60) after 4 years of continuous fertilization on functional diversity and species diversity, and their relationship with productivity in an alpine meadow community on the Tibetan Plateau. For this purpose, three community-weighted mean trait values (specific leaf area, SLA; mature plant height, MPH; and seed size, SS) for 30 common species at each fertilization level were determined, and three components of functional diversity (functional richness, FRic; functional evenness, FEve; and Rao's index of quadratic entropy, FRao) were quantified. Our results showed that: (i) species diversity sharply decreased, but functional diversity remained stable with fertilization; (ii) community-weighted mean traits (SLA and MPH) increased significantly with fertilization level; (iii) aboveground biomass was not correlated with functional diversity, but it was significantly correlated with species diversity and MPH. Our results suggest that decreases in species diversity due to fertilization do not result in corresponding changes in functional diversity. Functional identity of species may be more important than functional diversity in influencing aboveground productivity in this alpine meadow community, and our results also support the mass ratio hypothesis; that is, the traits of the dominant species influenced the community biomass production.

  4. The National Hydrography Dataset

    USGS Publications Warehouse

    ,

    1999-01-01

    The National Hydrography Dataset (NHD) is a newly combined dataset that provides hydrographic data for the United States. The NHD is the culmination of recent cooperative efforts of the U.S. Environmental Protection Agency (USEPA) and the U.S. Geological Survey (USGS). It combines elements of USGS digital line graph (DLG) hydrography files and the USEPA Reach File (RF3). The NHD supersedes RF3 and DLG files by incorporating them, not by replacing them. Users of RF3 or DLG files will find the same data in a new, more flexible format. They will find that the NHD is familiar but greatly expanded and refined. The DLG files contribute a national coverage of millions of features, including water bodies such as lakes and ponds, linear water features such as streams and rivers, and also point features such as springs and wells. These files provide standardized feature types, delineation, and spatial accuracy. From RF3, the NHD acquires hydrographic sequencing, upstream and downstream navigation for modeling applications, and reach codes. The reach codes provide a way to integrate data from organizations at all levels by linking the data to this nationally consistent hydrographic network. The feature names are from the Geographic Names Information System (GNIS). The NHD provides comprehensive coverage of hydrographic data for the United States. Some of the anticipated end-user applications of the NHD are multiuse hydrographic modeling and water-quality studies of fish habitats. Although based on 1:100,000-scale data, the NHD is planned so that it can incorporate and encourage the development of the higher resolution data that many users require. The NHD can be used to promote the exchange of data between users at the national, State, and local levels. Many users will benefit from the NHD and will want to contribute to the dataset as well.

  5. National Hydrography Dataset (NHD)

    USGS Publications Warehouse

    ,

    2001-01-01

    The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that make up the nation's surface water drainage system. NHD data was originally developed at 1:100,000 scale and exists at that scale for the whole country. High resolution NHD adds detail to the original 1:100,000-scale NHD. (Data for Alaska, Puerto Rico and the Virgin Islands was developed at high-resolution, not 1:100,000 scale.) Like the 1:100,000-scale NHD, high resolution NHD contains reach codes for networked features and isolated lakes, flow direction, names, stream level, and centerline representations for areal water bodies. Reaches are also defined to represent waterbodies and the approximate shorelines of the Great Lakes, the Atlantic and Pacific Oceans and the Gulf of Mexico. The NHD also incorporates the National Spatial Data Infrastructure framework criteria set out by the Federal Geographic Data Committee.

  6. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging, and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag the different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let a non-expert user easily analyze biomedical datasets. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, making it possible to apply parallel programming paradigms on conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied to the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
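
    A minimal numpy sketch of the Adaboost testing stage described above: each weak classifier is a decision stump on a single feature, and the strong classifier is their weighted vote. This vectorized evaluation is the data-parallel step that the paper maps onto OpenCL work-items; the trained parameters here are made up.

    ```python
    # Sketch: evaluating an AdaBoost ensemble of decision stumps.
    import numpy as np

    def adaboost_test(X, feat_idx, thresholds, polarities, alphas):
        """Label samples X (n x d) with a weighted vote of stumps."""
        # Each stump votes +1/-1 from one feature vs. its threshold.
        votes = polarities * np.sign(X[:, feat_idx] - thresholds)
        # Weighted combination of weak classifiers -> strong classifier.
        return np.sign(votes @ alphas)

    # Hypothetical trained ensemble: 3 stumps over 4 features.
    feat_idx = np.array([0, 2, 3])
    thresholds = np.array([0.5, -1.0, 2.0])
    polarities = np.array([1.0, -1.0, 1.0])
    alphas = np.array([0.9, 0.4, 0.7])

    X = np.random.default_rng(0).normal(size=(8, 4))
    print(adaboost_test(X, feat_idx, thresholds, polarities, alphas))
    ```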

  7. National Elevation Dataset

    USGS Publications Warehouse

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide national elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico, and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and that artificial discontinuities are minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM was produced by older methods, "striping" may still occur.

  8. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades.

    PubMed

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches.

  9. Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades

    PubMed Central

    Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K.; Thakor, Nitish

    2015-01-01

    Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches. PMID:26635513

  10. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to explore the complexities of the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison, and analysis.
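
    A minimal sketch of the two computations mentioned above, AT/GC content and QR-code generation, using the third-party qrcode package; the sequence is a made-up example, not one of the 17 patent sequences.

    ```python
    # Sketch: AT/GC content plus a QR code encoding the sequence.
    import qrcode

    seq = "ATGCGCGTATTAGCGGCCAT"

    gc = sum(seq.count(b) for b in "GC") / len(seq)
    at = sum(seq.count(b) for b in "AT") / len(seq)
    print(f"GC content: {gc:.2%}, AT content: {at:.2%}")

    # Encode the raw sequence for quick identification of the isolate.
    qrcode.make(seq).save("isolate_qr.png")
    ```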

  11. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to explore the complexities of the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison, and analysis. PMID:27408929

  12. Reduced Number of Transitional and Naive B Cells in Addition to Decreased BAFF Levels in Response to the T Cell Independent Immunogen Pneumovax®23.

    PubMed

    Roth, Alena; Glaesener, Stephanie; Schütz, Katharina; Meyer-Bahlburg, Almut

    2016-01-01

    Protective immunity against T cell independent (TI) antigens such as Streptococcus pneumoniae is characterized by antibody production of B cells induced by the combined activation of T cell independent type 1 and type 2 antigens in the absence of direct T cell help. In mice, the main players in TI immune responses have been well defined as marginal zone (MZ) B cells and B-1 cells. However, the existence of human equivalents to these B cell subsets and the nature of the human B cell compartment involved in the immune reaction remain elusive. We therefore analyzed the effect of a TI antigen on the B cell compartment through immunization of healthy individuals with the pneumococcal polysaccharide (PnPS)-based vaccine Pneumovax®23, and subsequent characterization of B cell subpopulations. Our data demonstrate a transient decrease of transitional and naïve B cells, with a concomitant increase of IgA+ but not IgM+ or IgG+ memory B cells and a predominant generation of PnPS-specific IgA+ producing plasma cells. No alterations could be detected in T cells, or proposed human B-1 and MZ B cell equivalents. Consistent with the idea of a TI immune response, antigen-specific memory responses could not be observed. Finally, BAFF, which is supposed to drive class switching to IgA, was unexpectedly found to be decreased in serum in response to Pneumovax®23. Our results demonstrate that a characteristic TI response induced by Pneumovax®23 is associated with distinct phenotypical and functional changes within the B cell compartment. These modulations occur in the absence of any modulation of T cells and without the development of a specific memory response.

  13. Reduced Number of Transitional and Naive B Cells in Addition to Decreased BAFF Levels in Response to the T Cell Independent Immunogen Pneumovax®23

    PubMed Central

    Roth, Alena; Glaesener, Stephanie; Schütz, Katharina; Meyer-Bahlburg, Almut

    2016-01-01

    Protective immunity against T cell independent (TI) antigens such as Streptococcus pneumoniae is characterized by antibody production of B cells induced by the combined activation of T cell independent type 1 and type 2 antigens in the absence of direct T cell help. In mice, the main players in TI immune responses have been well defined as marginal zone (MZ) B cells and B-1 cells. However, the existence of human equivalents to these B cell subsets and the nature of the human B cell compartment involved in the immune reaction remain elusive. We therefore analyzed the effect of a TI antigen on the B cell compartment through immunization of healthy individuals with the pneumococcal polysaccharide (PnPS)-based vaccine Pneumovax®23, and subsequent characterization of B cell subpopulations. Our data demonstrate a transient decrease of transitional and naïve B cells, with a concomitant increase of IgA+ but not IgM+ or IgG+ memory B cells and a predominant generation of PnPS-specific IgA+ producing plasma cells. No alterations could be detected in T cells, or proposed human B-1 and MZ B cell equivalents. Consistent with the idea of a TI immune response, antigen-specific memory responses could not be observed. Finally, BAFF, which is supposed to drive class switching to IgA, was unexpectedly found to be decreased in serum in response to Pneumovax®23. Our results demonstrate that a characteristic TI response induced by Pneumovax®23 is associated with distinct phenotypical and functional changes within the B cell compartment. These modulations occur in the absence of any modulation of T cells and without the development of a specific memory response. PMID:27031098

  14. Five year global dataset: NMC operational analyses (1978 to 1982)

    NASA Technical Reports Server (NTRS)

    Straus, David; Ardizzone, Joseph

    1987-01-01

    This document describes the procedures used in assembling a five-year dataset (1978 to 1982) from NMC Operational Analysis data. These procedures entailed replacing missing and unacceptable data in order to arrive at a complete dataset that is continuous in time. In addition, a subjective assessment of the integrity of all data (both preliminary and final) is presented. Documentation on the tapes comprising the Five Year Global Dataset is also included.

  15. Chemical gas sensor array dataset.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Huerta, Ramón

    2015-06-01

    To address drift in chemical sensing, an extensive dataset was collected over a period of three years. An array of 16 metal-oxide gas sensors was exposed to six different volatile organic compounds at different concentration levels under tightly controlled operating conditions. Moreover, the generated dataset is suitable for tackling a variety of challenges in chemical sensing, such as sensor drift, sensor failure, or system calibration. The data are related to "Chemical gas sensor drift compensation using classifier ensembles", by Vergara et al. [1], and "On the calibration of sensor arrays for pattern recognition using the minimal number of experiments", by Rodriguez-Lujan et al. [2]. The dataset can be accessed publicly at the UCI repository upon citation of: http://archive.ics.uci.edu/ml/datasets/Gas+Sensor+Array+Drift+Dataset+at+Different+Concentrations.
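
    A minimal sketch of loading one batch of the cited UCI dataset, assuming the distribution's svmlight/libsvm-formatted batch files (e.g. batch1.dat); the exact file layout should be checked against the repository.

    ```python
    # Sketch: load one batch of the gas sensor array drift dataset.
    from sklearn.datasets import load_svmlight_file

    X, y = load_svmlight_file("batch1.dat")  # features: sensor responses
    print(X.shape, y.shape)                  # y encodes the analyte class
    ```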

  16. Independent Contributions of the Central Executive, Intelligence, and In-Class Attentive Behavior to Developmental Change in the Strategies Used to Solve Addition Problems

    ERIC Educational Resources Information Center

    Geary, David C.; Hoard, Mary K.; Nugent, Lara

    2012-01-01

    Children's (N = 275) use of retrieval, decomposition (e.g., 7 = 4+3 and thus 6+7 = 6+4+3), and counting to solve addition problems was longitudinally assessed from first grade to fourth grade, and intelligence, working memory, and in-class attentive behavior were assessed in one or several grades. The goal was to assess the relation between…

  17. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
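
    A rough Python analogue of two of the approaches compared above: logistic regression with cluster-adjusted standard errors for the dichotomous outcome, and a maximum-likelihood multilevel model for the continuous outcome. File and variable names are hypothetical.

    ```python
    # Sketch: accounting for small clusters (multiple births).
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("preterm_cohort.csv")  # one row per infant

    # Dichotomous outcome: cluster-robust standard errors.
    logit = smf.logit("outcome ~ gest_age + birthweight", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["pregnancy_id"]})
    print(logit.summary())

    # Continuous outcome: random intercept per pregnancy (ML estimation).
    mlm = smf.mixedlm("lung_function ~ gest_age + birthweight",
                      data=df, groups=df["pregnancy_id"]).fit()
    print(mlm.summary())
    ```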

  18. Evidence for protein kinase C-dependent and -independent activation of mitogen-activated protein kinase in T cells: potential role of additional diacylglycerol binding proteins.

    PubMed

    Puente, L G; Stone, J C; Ostergaard, H L

    2000-12-15

    Activation of mitogen-activated protein kinases (MAPK) is a critical signal transduction event for CTL activation, but the signaling mechanisms responsible are not fully characterized. Protein kinase C (PKC) is thought to contribute to MAPK activation following TCR stimulation. We have found that dependence on PKC varies with the method used to stimulate the T cells. Extracellular signal-regulated kinase (ERK) activation in CTL stimulated with soluble cross-linked anti-CD3 is completely inhibited by the PKC inhibitor bisindolylmaleimide (BIM). In contrast, only the later time points in the course of ERK activation are sensitive to BIM when CTL are stimulated with immobilized anti-CD3, a condition that stimulates CTL degranulation. Surprisingly, MAPK activation in response to immobilized anti-CD3 is strongly inhibited at all time points by the diacylglycerol (DAG)-binding domain inhibitor calphostin C implicating the contribution of a DAG-dependent but PKC-independent pathway in the activation of ERK in CTL clones. Chronic exposure to phorbol ester down-regulates the expression of DAG-responsive PKC isoforms; however, this treatment of CTL clones does not inhibit anti-CD3-induced activation of MAPK. Phorbol ester-treated cells have reduced expression of several isoforms of PKC but still express the recently described DAG-binding Ras guanylnucleotide-releasing protein. These results indicate that the late phase of MAPK activation in CTL clones in response to immobilized anti-CD3 stimulation requires PKC while the early phase requires a DAG-dependent, BIM-resistant component.

  19. Genomic Datasets for Cancer Research

    Cancer.gov

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  20. Providing Geographic Datasets as Linked Data in Sdi

    NASA Astrophysics Data System (ADS)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response are first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on the fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
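
    A minimal sketch of the content-negotiation step described above, using Flask and rdflib: the endpoint serves an RDF description of a spatial object in the serialization requested by the Accept header. The stub triples stand in for the on-the-fly GML-to-RDF transformation of the WFS response.

    ```python
    # Sketch: content negotiation over RDF serializations of a feature.
    from flask import Flask, request, Response
    from rdflib import Graph, Literal, Namespace, URIRef

    app = Flask(__name__)
    GEO = Namespace("http://www.opengis.net/ont/geosparql#")

    FORMATS = {"text/turtle": "turtle",
               "application/rdf+xml": "xml",
               "application/ld+json": "json-ld"}

    @app.route("/feature/<fid>")
    def feature(fid):
        # Real service: transform the WFS GML response on the fly.
        g = Graph()
        uri = URIRef(f"http://example.org/feature/{fid}")
        g.add((uri, GEO.asWKT, Literal("POINT(24.94 60.17)")))

        mime = request.accept_mimetypes.best_match(FORMATS) or "text/turtle"
        return Response(g.serialize(format=FORMATS[mime]), mimetype=mime)
    ```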

  1. The Harvard organic photovoltaic dataset

    PubMed Central

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  2. An integrated pan-tropical biomass map using multiple reference datasets.

    PubMed

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Slik, J W Ferry; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4°N–23.4°S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha⁻¹ vs. 21 and 28 Mg ha⁻¹ for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets.
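
    A minimal numpy sketch of the fusion rule described above: each input map has its stratum-specific bias removed, and the results are averaged with weights inversely proportional to the maps' error variances. All numbers are illustrative.

    ```python
    # Sketch: bias-removed, inverse-variance-weighted map fusion.
    import numpy as np

    def fuse(maps, biases, error_vars):
        """Weighted average of bias-corrected input maps."""
        corrected = np.asarray(maps, float) - np.asarray(biases)[:, None, None]
        w = 1.0 / np.asarray(error_vars)
        return np.tensordot(w, corrected, axes=1) / w.sum()

    saatchi = np.array([[310.0, 120.0], [95.0, 240.0]])   # Mg/ha, toy grid
    baccini = np.array([[280.0, 150.0], [110.0, 200.0]])

    # Per-map bias and error variance within one stratum, as would be
    # estimated from the reference plots (hypothetical numbers).
    print(fuse([saatchi, baccini], biases=[21.0, 28.0],
               error_vars=[900.0, 1400.0]))
    ```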

  3. An integrated pan-tropical biomass map using multiple reference datasets.

    PubMed

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Slik, J W Ferry; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4°N–23.4°S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha⁻¹ vs. 21 and 28 Mg ha⁻¹ for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets. PMID:26499288

  4. A joint dataset of fair-weather atmospheric electricity

    NASA Astrophysics Data System (ADS)

    Tammet, H.

    2009-02-01

    A new open-access dataset, ATMEL2007A (http://ael.physic.ut.ee/tammet/dd/), takes advantage of a diary-type data structure. The dataset comprises measurements of the atmospheric electric field, positive and negative conductivities, air ion concentrations, and accompanying meteorological measurements at 13 stations, including 7 stations of the former World Data Centre network. The dataset incorporates more than half a million diurnal series of hourly averages and can easily be expanded with additional data. The dataset is designed for importing into a personal computer, which makes it possible to append private data while keeping it safely protected from public access. Freely available software allows data excerpts to be extracted in the form of traditional data tables or spreadsheets. Examples show how the dataset can be used in research on correlations and trends in atmospheric electricity and air pollution.

  5. Querying Large Biological Network Datasets

    ERIC Educational Resources Information Center

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  6. Development of a global historic monthly mean precipitation dataset

    NASA Astrophysics Data System (ADS)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    A global historic precipitation dataset is the basis for climate and water cycle research. Several global historic land-surface precipitation datasets have been developed by international data centers such as the US National Climatic Data Center (NCDC), the European Climate Assessment & Dataset project team, and the Met Office, but so far no such dataset has been developed by any research institute in China. In addition, each dataset has its own regional focus, and the existing global precipitation datasets contain only sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take comprehensive historic information into account, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users exploiting these datasets. For this reason, a complete historic precipitation dataset that takes advantage of the various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, and duplicated observations are removed. A consistency test, a correlation coefficient test, a significance t-test at the 95% confidence level, and a significance F-test at the 95% confidence level are conducted first to ensure data reliability. Only those datasets that satisfy all four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10⁷ data records, among which 4152 precipitation time series are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high station-network density, compared to
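
    A minimal pandas sketch of two of the integration steps described above: dropping records duplicated under the same unified station ID, and checking agreement between overlapping series from two sources with a correlation test and a paired t-test. File and column names are hypothetical.

    ```python
    # Sketch: de-duplication and cross-source consistency checks.
    import pandas as pd
    from scipy import stats

    a = pd.read_csv("source_a.csv")   # columns: station_id, date, precip
    b = pd.read_csv("source_b.csv")

    # Remove observations duplicated under the same unified station ID.
    a = a.drop_duplicates(subset=["station_id", "date"])

    # Compare overlapping observations across the two sources.
    merged = a.merge(b, on=["station_id", "date"], suffixes=("_a", "_b"))
    r, p_corr = stats.pearsonr(merged["precip_a"], merged["precip_b"])
    t, p_t = stats.ttest_rel(merged["precip_a"], merged["precip_b"])
    print(f"r={r:.3f} (p={p_corr:.3g}); paired t-test p={p_t:.3g}")
    ```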

  7. Rough Clustering for Cancer Datasets

    NASA Astrophysics Data System (ADS)

    Herawan, Tutut

    Cancer is becoming a leading cause of death among people in the whole world. It is confirmed that early detection and accurate diagnosis of this disease can ensure long survival of patients. Expert systems and machine learning techniques are gaining popularity in this field because of their effective classification and high diagnostic capability. This paper presents the application of rough set theory for clustering two cancer datasets, taken from the UCI ML repository. The method is based on the MDA technique proposed by [11]. To select a clustering attribute, the maximal degree of rough attribute dependency in categorical-valued information systems is used. Further, we use a divide-and-conquer method to partition/cluster the objects. The results show that the MDA technique can be used to cluster the data. Further, we present cluster visualization using a two-dimensional plot. The plot provides user-friendly navigation to understand the clusters obtained.

  8. Source Detection with Interferometric Datasets

    NASA Astrophysics Data System (ADS)

    Trott, Cathryn M.; Wayth, Randall B.; Macquart, Jean-Pierre R.; Tingay, Steven J.

    2012-04-01

    The detection of sources in interferometric radio data typically relies on extracting information from images, formed by Fourier transform of the underlying visibility dataset, and CLEANed of contaminating sidelobes through iterative deconvolution. Variable and transient radio sources span a large range of variability timescales, and their study has the potential to enhance our knowledge of the dynamic universe. Their detection and classification involve large data rates and non-stationary PSFs, commensal observing programs and ambitious science goals, and will demand a paradigm shift in the deployment of next-generation instruments. Optimal source detection and classification in real time requires efficient and automated algorithms. On short time-scales variability can be probed with an optimal matched filter detector applied directly to the visibility dataset. This paper shows the design of such a detector, and some preliminary detection performance results.
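
    A minimal numpy sketch in the spirit of the matched filter detector described above: complex visibilities are correlated against the expected response of a point source at a trial sky position, and the statistic is compared with a detection threshold. The geometry is heavily simplified and all values are illustrative.

    ```python
    # Sketch: matched filter detection directly on visibilities.
    import numpy as np

    rng = np.random.default_rng(1)
    n_vis = 500
    u = rng.uniform(-300, 300, n_vis)   # baseline coords (wavelengths)
    v = rng.uniform(-300, 300, n_vis)

    def response(l, m, flux=1.0):
        """Expected visibilities of a point source at direction (l, m)."""
        return flux * np.exp(-2j * np.pi * (u * l + v * m))

    # Simulated data: a 0.5 Jy source plus complex Gaussian noise.
    sigma = 1.0
    noise = (rng.normal(size=n_vis) + 1j * rng.normal(size=n_vis)) / 2**0.5
    data = response(2e-3, -1e-3, flux=0.5) + sigma * noise

    # Matched filter statistic at the trial position (white-noise case).
    s = response(2e-3, -1e-3)
    snr = np.abs(np.vdot(s, data)) / (sigma * np.linalg.norm(s))
    print(f"detection statistic: {snr:.1f} (threshold e.g. 5)")
    ```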

  9. Geospatial datasets for watershed delineation and characterization used in the Hawaii StreamStats web application

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2012-01-01

    The U.S. Geological Survey Hawaii StreamStats application uses an integrated suite of raster and vector geospatial datasets to delineate and characterize watersheds. The geospatial datasets used to delineate and characterize watersheds on the StreamStats website, and the methods used to develop them, are described in this report. The datasets for Hawaii were derived primarily from 10-meter-resolution National Elevation Dataset (NED) elevation models and the National Hydrography Dataset (NHD), using a set of procedures designed to enforce the drainage pattern from the NHD into the NED, resulting in an integrated suite of elevation-derived datasets. Additional sources of data used for computing basin characteristics include precipitation, land cover, soil permeability, and elevation-derivative datasets. The report also includes links for metadata and downloads of the geospatial datasets.

  10. Independence Is.

    ERIC Educational Resources Information Center

    Stickney, Sharon

    This workbook is designed to help participants of the Independence Training Program (ITP) to achieve a definition of "independence." The program was developed for teenage girls. The process for developing the concept of independence consists of four steps. Step one instructs the participant to create an imaginary situation where she is completely…

  11. The new Planetary Science Archive (PSA): Exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Martinez, Santa; Besse, Sebastien; Heather, Dave; Barbarisi, Isa; Arviset, Christophe; De Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Rios, Carlos; Vallejo, Fran; Saiz, Jaime; ESDC (European Space Data Centre) team

    2016-10-01

    The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces at http://archives.esac.esa.int/psa. All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. The PSA is currently implementing a number of significant improvements, mostly driven by the evolution of the PDS standard and the growing need for better interfaces and advanced applications to support science exploitation. The newly designed PSA will enhance the user experience and significantly reduce the complexity for users to find their data, promoting one-click access to the scientific datasets, with more specialised views when needed. This includes better integration with planetary GIS analysis tools and planetary interoperability services (to search and retrieve data, supporting e.g. PDAP and EPN-TAP). It will also be up to date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's ExoMars and the upcoming BepiColombo missions. Users will have direct access to documentation, information, and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to users to ease their searches (e.g. saving queries, managing default views). This contribution will introduce the new PSA, its key features, and its access interfaces.

  12. ISRUC-Sleep: A comprehensive public dataset for sleep researchers.

    PubMed

    Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano

    2016-02-01

    To facilitate the performance comparison of new methods for sleep pattern analysis, publicly available datasets with quality content are very important and useful. We introduce an open-access comprehensive sleep dataset called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected from among PSG recordings acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session for each of 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared with existing sleep-related public datasets, ISRUC-Sleep provides data for a reasonable number of subjects with different characteristics, such as data useful for studies involving changes in the PSG signals over time, and data from healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data with some characteristics not yet covered. ISRUC-Sleep can be useful for the analysis of new contributions: (i) in biomedical signal processing; (ii) in the development of automatic sleep stage classification (ASSC) methods; and (iii) in sleep physiology studies. To allow new contributions that use this dataset as a benchmark to be evaluated and compared, results of applying a subject-independent ASSC method to the ISRUC-Sleep dataset are presented.

  13. Independence test for sparse data

    NASA Astrophysics Data System (ADS)

    García, J. E.; González-López, V. A.

    2016-06-01

    In this paper a new non-parametric independence test is presented. García and González-López (2014) [1] introduced the LIS test for the hypothesis of independence between two continuous random variables; the test proposed in this work is a generalization of the LIS test. The new test does not require the assumption of continuity for the random variables. The test is applied to two datasets and also compared with Pearson's chi-squared test.
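
    For context, the sketch below runs the baseline comparator named above, Pearson's chi-squared test of independence, on a discretized synthetic sample; the LIS-style test itself is not available in standard libraries.

    ```python
    # Sketch: chi-squared independence test on binned data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 0.5 * x + rng.normal(size=200)   # dependent by construction

    # Discretize into quartile bins and cross-tabulate.
    xb = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))
    yb = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))
    table = np.zeros((4, 4))
    np.add.at(table, (xb, yb), 1)

    chi2, p, dof, _ = stats.chi2_contingency(table)
    print(f"chi2={chi2:.1f}, dof={dof}, p={p:.3g}")
    ```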

  14. Are Independent Probes Truly Independent?

    ERIC Educational Resources Information Center

    Camp, Gino; Pecher, Diane; Schmidt, Henk G.; Zeelenberg, Rene

    2009-01-01

    The independent cue technique has been developed to test traditional interference theories against inhibition theories of forgetting. In the present study, the authors tested the critical criterion for the independence of independent cues: Studied cues not presented during test (and unrelated to test cues) should not contribute to the retrieval…

  15. A polymer dataset for accelerated property prediction and design.

    PubMed

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-01-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. PMID:26927478

  16. A polymer dataset for accelerated property prediction and design.

    PubMed

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  17. A polymer dataset for accelerated property prediction and design

    PubMed Central

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-01-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating the materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided. PMID:26927478

  18. The CMS dataset bookkeeping service

    SciTech Connect

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay; /Fermilab

    2007-10-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS, with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  19. Forecasting medical waste generation using short and extra short datasets: Case study of Lithuania.

    PubMed

    Karpušenkaitė, Aistė; Ruzgas, Tomas; Denafas, Gintaras

    2016-04-01

    The aim of the study is to evaluate the performance of various mathematical modelling methods for forecasting medical waste generation, using Lithuania's annual medical waste data. A hazardous waste collection system that includes medical waste has only recently been created, so the study's access to large sets of relevant data has been somewhat limited. Based on the data that could be obtained, it was decided to develop three short and extra-short datasets with 20, 10, and 6 observations. Spearman's correlation calculation showed that the influence of independent variables, such as visits to hospitals and other medical institutions, the number of children in the region, the number of beds in hospitals and other medical institutions, average life expectancy, and doctor's visits in the region, is the most consistent and common across all three datasets. Tests of the performance of artificial neural networks, multiple linear regression, partial least squares, support vector machines, and four non-parametric regression methods were conducted on the collected datasets. The best and most promising results were demonstrated by generalised additive models (R² = 0.90455) in the regional data case, smoothing spline models (R² = 0.98584) in the long annual data case, and multilayer feedforward artificial neural networks in the short annual data case (R² = 0.61103). PMID:26879908
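
    A minimal sketch of one of the better-performing approaches named above: a smoothing spline fitted to a short annual series, with R² computed on the fitted values. The series is synthetic, not the Lithuanian data.

    ```python
    # Sketch: smoothing spline on a short annual series.
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    years = np.arange(1996, 2016)                   # 20 observations
    waste = 3.0 + 0.08 * (years - 1996) \
          + np.random.default_rng(3).normal(0, 0.15, years.size)

    spline = UnivariateSpline(years, waste, s=0.5)  # s controls smoothing
    fitted = spline(years)

    ss_res = np.sum((waste - fitted) ** 2)
    ss_tot = np.sum((waste - waste.mean()) ** 2)
    print(f"R^2 = {1 - ss_res / ss_tot:.4f}")
    ```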

  20. Using the National Datasets for Faculty Studies.

    ERIC Educational Resources Information Center

    Milam, John

    1999-01-01

    This paper examines 17 national datasets that are available for policy studies and research about college faculty. The datasets include 11 containing faculty information, two about student enrollment, two about degrees awarded, and two about institutional activity. Each of the following datasets is individually described: (1) National Science…

  1. Data Integration for Heterogeneous Datasets

    PubMed Central

    2014-01-01

    Increasingly, the needs of data analysts require the use of data outside the control of their own organizations. The increasing amount of data available on the Web, the new technologies for linking data across datasets, and the increasing need to integrate structured and unstructured data are all driving this trend. In this article, we provide a technical overview of the emerging “broad data” area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data analysis efforts. The article explores some of the emerging themes in data discovery, data integration, linked data, and the combination of structured and unstructured data. PMID:25553272

  2. Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Wilson, K. Van; Clair, Michael G.; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi including watershed and subwatershed boundaries, codes, names, and areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (Hydrologic Unit Codes) were further subdivided into 10-digit watersheds (62.5 to 391 square miles (mi²)) and 12-digit subwatersheds (15.6 to 62.5 mi²) - the exceptions being the Delta part of Mississippi and the Mississippi River inside levees, which were subdivided into 10-digit watersheds only. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, subdivision codes and names, and drainage-area data - are stored in a Geographic Information System database, which is available at: http://ms.water.usgs.gov/. This map shows information on drainage and hydrography in the form of U.S. Geological Survey hydrologic unit boundaries for water-resource 2-digit regions, 4-digit subregions, 6-digit basins (formerly called accounting units), 8-digit subbasins (formerly called cataloging units), 10-digit watersheds, and 12-digit subwatersheds in Mississippi. A description of the project study area, methods used in the development of watershed and subwatershed boundaries for Mississippi, and results are presented in Wilson and others (2008). The data presented in this map and by Wilson and others (2008) supersede the data presented for Mississippi by Seaber and others (1987) and U.S. Geological Survey (1977).

  3. The Johns Hopkins University multimodal dataset for human action recognition

    NASA Astrophysics Data System (ADS)

    Murray, Thomas S.; Mendat, Daniel R.; Pouliquen, Philippe O.; Andreou, Andreas G.

    2015-05-01

    The Johns Hopkins University MultiModal Action (JHUMMA) dataset contains a set of twenty-one actions recorded with four sensor systems in three different modalities. The data was collected with a data acquisition system that includes three independent active sonar devices at three different frequencies and a Microsoft Kinect sensor that provides both RGB and Depth data. We have developed algorithms for human action recognition from active acoustics and provide benchmark baseline recognition performance results.

  4. A comprehensive polymer dataset for accelerated property prediction and design

    NASA Astrophysics Data System (ADS)

    Tran, Huan; Kumar Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. In principle, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to rapidly predict the properties of materials not yet in the dataset, thus accelerating the design of materials with preferable properties. Here, we report the development of a dataset of 1,065 polymers and related materials, which is available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations, with structures obtained either from other sources or by structure search methods. Because the immediate target of this work is to assist the design of high-dielectric-constant polymers, the dataset initially includes the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and by including additional properties calculated for the optimized structures provided. We discuss some information "learned" from the dataset and suggest that it may be used as a playground for further data-mining work.
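    As a hedged illustration of the further data-mining work suggested above (not a method from the paper), one might fit a kernel ridge model mapping structural fingerprints to a computed property such as the band gap; the file name, feature layout and model settings below are invented:

```python
# Hypothetical sketch of property prediction from a polymer dataset; the
# fingerprint features and CSV layout are invented for illustration.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Suppose each row holds a fingerprint vector plus a DFT band gap (eV).
data = np.loadtxt("polymer_fingerprints.csv", delimiter=",")  # hypothetical file
X, y = data[:, :-1], data[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1)  # illustrative settings
model.fit(X_train, y_train)
print("band-gap MAE (eV):", mean_absolute_error(y_test, model.predict(X_test)))
```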

  5. Salam's independence

    NASA Astrophysics Data System (ADS)

    Fraser, Gordon

    2009-01-01

    In his kind review of my biography of the Nobel laureate Abdus Salam (December 2008 pp45-46), John W Moffat wrongly claims that Salam had "independently thought of the idea of parity violation in weak interactions".

  6. Maintaining Independence.

    ERIC Educational Resources Information Center

    Upah-Bant, Marilyn

    1978-01-01

    Describes the over-all business and production operation of the "Daily Illini" at the University of Illinois to show how this college publication has assumed the burdens and responsibilities of true independence. (GW)

  7. AMADA-Analysis of multidimensional astronomical datasets

    NASA Astrophysics Data System (ADS)

    de Souza, R. S.; Ciardi, B.

    2015-09-01

    We present AMADA, an interactive web application for analyzing multidimensional datasets. The user uploads a simple ASCII file and AMADA performs a number of exploratory analyses together with contemporary visualization diagnostics. The package performs a hierarchical clustering in the parameter space, and the user can choose among linear, monotonic or non-linear correlation analyses. AMADA provides a number of clustering visualization diagnostics such as heatmaps, dendrograms, chord diagrams, and graphs. In addition, AMADA has the option to run a standard or robust principal component analysis, displaying the results as polar bar plots. The code is written in R and the web interface was created using the SHINY framework. The AMADA source code is freely available at https://goo.gl/KeSPue, and the shiny-app at http://goo.gl/UTnU7I.
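    AMADA itself is written in R; the sketch below is a rough Python analogue of its core step, hierarchically clustering the variables of an uploaded table by a chosen correlation measure. The input file name is a placeholder:

```python
# Rough Python analogue of AMADA's core step: cluster the *variables* of a
# multidimensional dataset by their pairwise (here monotonic) correlation.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.stats import spearmanr

data = np.loadtxt("catalogue.txt")   # placeholder ASCII table, columns = variables
corr, _ = spearmanr(data)            # Spearman = the "monotonic" option
dist = 1.0 - np.abs(corr)            # turn correlation into a dissimilarity

# linkage() expects a condensed (upper-triangle) distance vector.
iu = np.triu_indices_from(dist, k=1)
Z = linkage(dist[iu], method="average")

dendrogram(Z)                        # one of AMADA's clustering visualizations
plt.show()
```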

  8. Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets

    PubMed Central

    Goodenough, Anne E.; Hart, Adam G.; Stafford, Richard

    2012-01-01

    Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables, and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R2), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R2 values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines. PMID:22479605
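    The sketch below follows the REVS recipe as described in the abstract; quantifying "empirical support" as the summed Akaike weight of every subset model containing a variable is one reasonable reading, not necessarily the authors' exact computation:

```python
# Sketch of the REVS idea: all-subsets regression ranks predictors by
# empirical support, then a nested model series is built in that order.
from itertools import combinations
import numpy as np

def ols_aic(X, y):
    """AIC of an ordinary least-squares fit with intercept (Gaussian errors)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1                     # coefficients + error variance
    return n * np.log(rss / n) + 2 * k

def revs(X, y):
    """Rank predictors by summed Akaike weight, then build nested models."""
    p = X.shape[1]
    subsets, aics = [], []
    for size in range(1, p + 1):
        for s in combinations(range(p), size):
            subsets.append(s)
            aics.append(ols_aic(X[:, list(s)], y))
    aics = np.array(aics)
    w = np.exp(-(aics - aics.min()) / 2.0)
    w /= w.sum()                           # Akaike weights over all subsets
    support = np.zeros(p)
    for s, wi in zip(subsets, w):
        support[list(s)] += wi             # variable support = summed weights
    order = np.argsort(-support)           # most-supported variable first
    nested_aics = [ols_aic(X[:, list(order[:m])], y) for m in range(1, p + 1)]
    return order, nested_aics
```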

  9. Independent Living.

    ERIC Educational Resources Information Center

    Nathanson, Jeanne H., Ed.

    1994-01-01

    This issue of "OSERS" addresses the subject of independent living of individuals with disabilities. The issue includes a message from Judith E. Heumann, the Assistant Secretary of the Office of Special Education and Rehabilitative Services (OSERS), and 10 papers. Papers have the following titles and authors: "Changes in the Rehabilitation Act of…

  10. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    NASA Astrophysics Data System (ADS)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide spatial and temporal coverage of precipitation measurements is key for regional water budget analysis and hydrological operations, which in turn are valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point; they measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself, and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  11. A reanalysis dataset of the South China Sea

    PubMed Central

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  12. A reanalysis dataset of the South China Sea.

    PubMed

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992-2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability.

  13. Internal Consistency of the NVAP Water Vapor Dataset

    NASA Technical Reports Server (NTRS)

    Suggs, Ronnie J.; Jedlovec, Gary J.; Arnold, James E. (Technical Monitor)

    2001-01-01

    The NVAP (NASA Water Vapor Project) dataset is a global dataset at 1 x 1 degree spatial resolution consisting of daily, pentad, and monthly atmospheric precipitable water (PW) products. The analysis blends measurements from the Television and Infrared Operational Satellite (TIROS) Operational Vertical Sounder (TOVS), the Special Sensor Microwave/Imager (SSM/I), and radiosonde observations into a daily collage of PW. The original dataset consisted of five years of data from 1988 to 1992. Recent updates have added three additional years (1993-1995) and incorporated procedural and algorithm changes from the original methodology. Since none of the PW sources (TOVS, SSM/I, and radiosonde) provides global coverage on its own, the sources complement one another by providing spatial coverage over regions and during times where the others are not available. For this type of spatial and temporal blending to be successful, each of the source components should have similar or compatible accuracies. If this is not the case, regional and time-varying biases may be manifested in the NVAP dataset. This study examines the consistency of the NVAP source data by comparing daily collocated TOVS and SSM/I PW retrievals with collocated radiosonde PW observations. The daily PW intercomparisons are performed over the time period of the dataset and for various regions.
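    The consistency check described above reduces to comparing collocated satellite PW retrievals against radiosonde PW. A minimal sketch, assuming pre-collocated arrays (same stations, same days) and using invented sample values in millimetres:

```python
# Minimal collocated-comparison sketch: bias and RMSE of each satellite
# PW source relative to radiosonde PW. Values below are invented.
import numpy as np

def bias_rmse(retrieved, radiosonde):
    diff = retrieved - radiosonde
    return diff.mean(), np.sqrt((diff ** 2).mean())

tovs = np.array([21.3, 18.7, 30.2, 25.4])   # invented PW retrievals (mm)
ssmi = np.array([20.8, 19.5, 29.1, 26.0])
raob = np.array([20.5, 19.0, 29.8, 25.1])   # collocated radiosonde PW (mm)

for name, pw in [("TOVS", tovs), ("SSM/I", ssmi)]:
    b, r = bias_rmse(pw, raob)
    print(f"{name}: bias = {b:+.2f} mm, RMSE = {r:.2f} mm")
```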

  14. Interactive visualization and analysis of multimodal datasets for surgical applications.

    PubMed

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  15. Understanding independence

    NASA Astrophysics Data System (ADS)

    Annan, James; Hargreaves, Julia

    2016-04-01

    In order to perform any Bayesian processing of a model ensemble, we need a prior over the ensemble members. In the case of multimodel ensembles such as CMIP, the historical approach of "model democracy" (i.e. equal weight for all models in the sample) is no longer credible (if it ever was) due to model duplication and inbreeding. The question of "model independence" is central to the question of prior weights. However, although this question has been repeatedly raised, it has not yet been satisfactorily addressed. Here I will discuss the issue of independence and present a theoretical foundation for understanding and analysing the ensemble in this context. I will also present some simple examples showing how these ideas may be applied and developed.

  16. Dataset of Scientific Inquiry Learning Environment

    ERIC Educational Resources Information Center

    Ting, Choo-Yee; Ho, Chiung Ching

    2015-01-01

    This paper presents the dataset collected from student interactions with INQPRO, a computer-based scientific inquiry learning environment. The dataset contains records of 100 students and is divided into two portions. The first portion comprises (1) "raw log data", capturing the student's name, interfaces visited, the interface…

  17. Studying the Independent School Library

    ERIC Educational Resources Information Center

    Cahoy, Ellysa Stern; Williamson, Susan G.

    2008-01-01

    In 2005, the American Association of School Librarians' Independent Schools Section conducted a national survey of independent school libraries. This article analyzes the results of the survey, reporting specialized data and information regarding independent school library budgets, collections, services, facilities, and staffing. Additionally, the…

  18. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    PubMed Central

    2010-01-01

    Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets.
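    The sketch below shows the general idea of a versioned, ontology-referenced descriptor record serialized as XML; the element and attribute names are illustrative only and are not the real QSAR-ML schema:

```python
# Illustrative only: these element/attribute names are NOT the real
# QSAR-ML schema; they show how versioned descriptor references make a
# dataset setup reproducible.
import xml.etree.ElementTree as ET

root = ET.Element("qsarDataset")
desc = ET.SubElement(root, "descriptor", {
    "ontologyId": "http://example.invalid/descr#XLogP",  # hypothetical ontology IRI
    "implementation": "CDK",                             # software providing it
    "version": "1.4.19",                                 # pinned for reproducibility
})
structure = ET.SubElement(root, "structure", {"id": "mol001"})
value = ET.SubElement(structure, "value",
                      {"descriptorRef": desc.get("ontologyId")})
value.text = "2.31"  # the computed descriptor value for this structure

ET.ElementTree(root).write("dataset.qsarml.xml", xml_declaration=True)
```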

  19. Assessment of Northern Hemisphere Snow Water Equivalent Datasets in ESA SnowPEx project

    NASA Astrophysics Data System (ADS)

    Luojus, Kari; Pulliainen, Jouni; Cohen, Juval; Ikonen, Jaakko; Derksen, Chris; Mudryk, Lawrence; Nagler, Thomas; Bojkov, Bojan

    2016-04-01

    Reliable information on snow cover across the Northern Hemisphere and Arctic and sub-Arctic regions is needed for climate monitoring, for understanding the Arctic climate system, and for the evaluation of the role of snow cover and its feedback in climate models. In addition to being of significant interest for climatological investigations, reliable information on snow cover is of high value for the purpose of hydrological forecasting and numerical weather prediction. Terrestrial snow covers up to 50 million km² of the Northern Hemisphere in winter and is characterized by high spatial and temporal variability. Satellite observations therefore provide the best means for timely and complete observations of the global snow cover. There are a number of independent SWE products available that describe the snow conditions on multi-decadal and global scales. Some products are derived using satellite-based information while others rely on meteorological observations and modelling. What is common to practically all the existing hemispheric SWE products is that their retrieval performance on hemispheric and multi-decadal scales is not accurately known. The purpose of the ESA-funded SnowPEx project is to obtain a quantitative understanding of the uncertainty in satellite- as well as model-based SWE products through an internationally coordinated and consistent evaluation exercise. The currently available Northern Hemisphere-wide satellite-based SWE datasets that were assessed include 1) the GlobSnow SWE, 2) the NASA Standard SWE, 3) NASA prototype and 4) NSIDC-SSM/I SWE products. The model-based datasets include: 5) the Global Land Data Assimilation System Version 2 (GLDAS-2) product; 6) the European Centre for Medium-Range Weather Forecasts Interim Land Reanalysis (ERA-I-Land), which uses a simple snow scheme; 7) the Modern Era Retrospective Analysis for Research and Applications (MERRA), which uses an intermediate-complexity snow scheme; and 8) SWE from the Crocus snow scheme, a

  20. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions.

    NASA Astrophysics Data System (ADS)

    Heather, David; Besse, Sebastien; Barbarisi, Isa; Arviset, Christophe; de Marchi, Guido; Barthelemy, Maud; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; Macfarlane, Alan; Martinez, Santa; Rios, Carlos

    2016-04-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid and ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  1. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

    NASA Astrophysics Data System (ADS)

    Heather, David

    2016-07-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid and ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  2. Utilizing Multiple Datasets for Snow Cover Mapping

    NASA Technical Reports Server (NTRS)

    Tait, Andrew B.; Hall, Dorothy K.; Foster, James L.; Armstrong, Richard L.

    1999-01-01

    Snow-cover maps generated from surface data are based on direct measurements; however, they are prone to interpolation errors where climate stations are sparsely distributed. Snow cover is clearly discernible using satellite-attained optical data because of the high albedo of snow, yet the surface is often obscured by cloud cover. Passive microwave (PM) data are unaffected by clouds; however, the snow-cover signature is significantly affected by melting snow, and the microwaves may be transparent to thin snow (less than 3 cm). Both optical and microwave sensors have problems discerning snow beneath forest canopies. This paper describes a method that combines ground and satellite data to produce a Multiple-Dataset Snow-Cover Product (MDSCP). Comparisons with current snow-cover products show that the MDSCP draws together the advantages of each of its component products while minimizing their potential errors. Improved estimates of the snow-covered area are derived through the addition of two snow-cover classes ("thin or patchy" and "high elevation" snow cover) and from the analysis of the climate station data within each class. The compatibility of this method for use with Moderate Resolution Imaging Spectroradiometer (MODIS) data, which will be available in 2000, is also discussed. With the assimilation of these data, the resolution of the MDSCP would be improved both spatially and temporally and the analysis would become completely automated.
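    A minimal sketch of the fusion logic described above, assuming co-registered grids and invented class codes: start from the optical map, fill cloud-obscured cells from passive microwave, and mark a "thin or patchy" class where stations report shallow snow that PM may miss:

```python
# Sketch of multiple-dataset snow-cover fusion; grids are assumed to be
# co-registered, and the class codes below are invented for illustration.
import numpy as np

CLEAR_SNOW, CLEAR_NO_SNOW, CLOUD = 1, 0, 255   # optical map codes (assumed)
THIN_OR_PATCHY = 2                             # added MDSCP-style class

def fuse(optical, pm_swe_mm, station_depth_cm):
    fused = optical.copy()
    cloudy = optical == CLOUD
    # Fill cloud-obscured cells from passive microwave (PM sees through cloud).
    fused[cloudy & (pm_swe_mm > 0)] = CLEAR_SNOW
    fused[cloudy & (pm_swe_mm == 0)] = CLEAR_NO_SNOW
    # PM can be transparent to snow thinner than ~3 cm; trust stations there.
    thin = (station_depth_cm > 0) & (station_depth_cm < 3) & (fused == CLEAR_NO_SNOW)
    fused[thin] = THIN_OR_PATCHY
    return fused
```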

  3. PEViD: privacy evaluation video dataset

    NASA Astrophysics Data System (ADS)

    Korshunov, Pavel; Ebrahimi, Touradj

    2013-09-01

    Visual privacy protection, i.e., obfuscation of personal visual information in video surveillance, is an important and increasingly popular research topic. However, while many datasets are available for testing the performance of various video analytics, little to nothing exists for the evaluation of visual privacy tools. Since surveillance and privacy protection have contradictory objectives, the design principles of the corresponding evaluation datasets should differ too. In this paper, we outline principles that need to be considered when building a dataset for privacy evaluation. Following these principles, we present the new, and to our knowledge first, Privacy Evaluation Video Dataset (PEViD). With the dataset, we provide XML-based annotations of various privacy regions, including face, accessories, skin regions, hair, body silhouette, and other personal information, and their descriptions. Via preliminary subjective tests, we demonstrate the flexibility and suitability of the dataset for privacy evaluations. The evaluation results also show the importance of secondary privacy regions that contain non-facial personal information for the privacy-intelligibility tradeoff. We believe that the PEViD dataset is equally suitable for evaluations of privacy protection tools using objective metrics and subjective assessments.

  4. A high-resolution European dataset for hydrologic modeling

    NASA Astrophysics Data System (ADS)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    inputs to the hydrological calibration and validation of EFAS as well as for establishing long-term discharge "proxy" climatologies which can then in turn be used for statistical analysis to derive return periods or other time series derivatives. In addition, this dataset will be used to assess climatological trends in Europe. Unfortunately, to date no baseline dataset at the European scale exists against which to test the quality of the data presented herein; hence, a comparison against other existing datasets can only be an indication of data quality. Due to availability, a comparison was made for precipitation and temperature only, arguably the most important meteorological drivers for hydrologic models. A variety of analyses was undertaken at country scale against data reported to EUROSTAT and against E-OBS datasets. The comparison revealed that while the datasets showed overall similar temporal and spatial patterns, there were some differences in magnitudes, especially for precipitation. It is not straightforward to define the specific cause of these differences; however, in most cases the comparatively low observation-station density appears to be the principal reason for the differences in magnitude.

  5. A synthetic document image dataset for developing and evaluating historical document processing methods

    NASA Astrophysics Data System (ADS)

    Walker, Daniel; Lund, William; Ringger, Eric

    2012-01-01

    Document images accompanied by OCR output text and ground-truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the performance of such methods often requires further annotation of training and test data (e.g., topical document labels). However, transcribing and labeling historical documents is expensive. As a result, existing real-world document image datasets with such accompanying resources are rare and often relatively small. We introduce synthetic document image datasets of varying levels of noise that have been created from standard (English) text corpora using an existing document degradation model applied in a novel way. Included in the datasets is the OCR output from real OCR engines, including the commercial ABBYY FineReader and the open-source Tesseract engines. These synthetic datasets are designed to exhibit some of the characteristics of an example real-world document image dataset, the Eisenhower Communiqués. The new datasets also benefit from additional metadata that exist due to the nature of their collection and prior labeling efforts. We demonstrate the usefulness of the synthetic datasets by training an existing multi-engine OCR correction method on the synthetic data and then applying the model to reduce word error rates on the historical document dataset. The synthetic datasets will be made available for use by other researchers.

  6. Simulation of Smart Home Activity Datasets.

    PubMed

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  7. Simulation of Smart Home Activity Datasets

    PubMed Central

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-01-01

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation. PMID:26087371

  8. Managing large SNP datasets with SNPpy.

    PubMed

    Mitha, Faheem

    2013-01-01

    Using relational databases to manage SNP datasets is a very useful technique that has significant advantages over alternative methods, including the ability to leverage the power of relational databases to perform data validation, and the use of the powerful SQL query language to export data. SNPpy is a Python program which uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets.
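    As a hedged illustration of the pattern SNPpy automates (this is not SNPpy's actual schema or API), a minimal SQLAlchemy model for storing genotypes in PostgreSQL, with a uniqueness constraint as a simple form of data validation, might look like this:

```python
# Not SNPpy's actual schema or API: a minimal SQLAlchemy sketch of the
# kind of relational SNP storage the chapter describes. The DSN is a
# hypothetical placeholder.
from sqlalchemy import create_engine, Column, Integer, String, UniqueConstraint
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Genotype(Base):
    __tablename__ = "genotype"
    id = Column(Integer, primary_key=True)
    sample_id = Column(String, nullable=False)
    rsid = Column(String, nullable=False)       # SNP identifier, e.g. "rs1234"
    allele1 = Column(String(1), nullable=False)
    allele2 = Column(String(1), nullable=False)
    # One genotype call per (sample, SNP): a simple data-validation rule.
    __table_args__ = (UniqueConstraint("sample_id", "rsid"),)

engine = create_engine("postgresql+psycopg2://user:pass@localhost/snpdb")  # placeholder
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Genotype(sample_id="S1", rsid="rs1234", allele1="A", allele2="G"))
    session.commit()
```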

  9. Self-Aligning Manifolds for Matching Disparate Medical Image Datasets.

    PubMed

    Baumgartner, Christian F; Gomez, Alberto; Koch, Lisa M; Housden, James R; Kolbitsch, Christoph; McClelland, Jamie R; Rueckert, Daniel; King, Andy P

    2015-01-01

    Manifold alignment can be used to reduce the dimensionality of multiple medical image datasets into a single globally consistent low-dimensional space. This may be desirable in a wide variety of problems, from fusion of different imaging modalities for Alzheimer's disease classification to 4DMR reconstruction from 2D MR slices. Unfortunately, most existing manifold alignment techniques require either a set of prior correspondences or comparability between the datasets in high-dimensional space, which is often not possible. We propose a novel technique for the 'self-alignment' of manifolds (SAM) from multiple dissimilar imaging datasets without prior correspondences or inter-dataset image comparisons. We quantitatively evaluate the method on 4DMR reconstruction from realistic, synthetic sagittal 2D MR slices from 6 volunteers and real data from 4 volunteers. Additionally, we demonstrate the technique for the compounding of two free breathing 3D ultrasound views from one volunteer. The proposed method performs significantly better for 4DMR reconstruction than state-of-the-art image-based techniques. PMID:26221687

  10. A polymer dataset for accelerated property prediction and design

    DOE PAGES

    Huan, Tran Doan; Mannodi-Kanakkithodi, Arun; Kim, Chiho; Sharma, Vinit; Pilania, Ghanshyam; Ramprasad, Rampi

    2016-03-01

    Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a sufficiently large dataset of relevant materials. The learned information can then be used to predict the properties of materials not yet in the dataset, thus accelerating materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available at http://khazana.uconn.edu/. This dataset is uniformly prepared using first-principles calculations, with structures obtained either from other sources or by structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, it is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

  11. Using bitmap index for interactive exploration of large datasets

    SciTech Connect

    Wu, Kesheng; Koegler, Wendy; Chen, Jacqueline; Shoshani, Arie

    2003-04-24

    Many scientific applications generate large spatio-temporal datasets. A common way of exploring these datasets is to identify and track regions of interest. Usually these regions are defined as contiguous sets of points whose attributes satisfy some user-defined conditions, e.g. high-temperature regions in a combustion simulation. At each time step, the regions of interest may be identified by first searching for all points that satisfy the conditions and then grouping the points into connected regions. To speed up this process, the searching step may use a tree-based indexing scheme, such as a kd-tree or an octree. However, these indices are efficient only if the searches are limited to one or a small number of selected attributes. Scientific datasets often contain hundreds of attributes, and scientists frequently study these attributes in complex combinations, e.g. finding regions of high temperature yet low shear rate and pressure. Bitmap indexing is an efficient method for searching on multiple criteria simultaneously. We apply a bitmap compression scheme to reduce the size of the indices. In addition, we show that the compressed bitmaps can be used efficiently to perform the region-growing and region-tracking operations. Analyses show that our approach scales well, and our tests on two datasets from simulation of the autoignition process show impressive performance.
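    The multi-criteria search described above reduces to a bitwise AND of per-condition bitmaps. The sketch below uses plain numpy boolean arrays and invented attribute values; production systems (such as FastBit) additionally compress the bitmaps, which is omitted here:

```python
# Multi-criteria search as a bitwise AND of per-condition bitmaps.
# Compression of the bitmaps (as in the paper) is omitted for brevity.
import numpy as np

n = 1_000_000  # invented point count
temperature = np.random.default_rng(0).uniform(300, 2500, n)
shear_rate = np.random.default_rng(1).uniform(0, 10, n)
pressure = np.random.default_rng(2).uniform(1, 50, n)

# One bitmap per condition; combining criteria is a cheap bitwise AND,
# regardless of how many attributes participate in the query.
hot = temperature > 2000
low_shear = shear_rate < 1.0
low_pressure = pressure < 5.0
region_points = np.flatnonzero(hot & low_shear & low_pressure)
print(len(region_points), "points satisfy all three conditions")
```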

  12. Developing independence.

    PubMed

    Turnbull, A P; Turnbull, H R

    1985-03-01

    The transition from living a life as others want (dependence) to living it as the adolescent wants to live it (independence) is extraordinarily difficult for most teen-agers and their families. The difficulty is compounded in the case of adolescents with disabilities. They are often denied access to the same opportunities of life that are accessible to the nondisabled. They face special problems in augmenting their inherent capacities so that they can take fuller advantage of the accommodations that society makes in an effort to grant them access. In particular, they need training designed to increase their capacities to make, communicate, implement, and evaluate their own life-choices. The recommendations made in this paper are grounded in the long-standing tradition of parens patriae and enlightened paternalism; they seek to be deliberately and cautiously careful about the lives of adolescents with disabilities and their families. We based them on the recent tradition of anti-institutionalism and they are also consistent with some of the major policy directions of the past 15-20 years. These include: normalization, integration, and least-restrictive alternatives; the unity and integrity of the family; the importance of opportunities for self-advocacy; the role of consumer consent and choice in consumer-professional relationships; the need for individualized services; the importance of the developmental model as a basis for service delivery; the value of economic productivity of people with disabilities; and the rights of habilitation, amelioration, and prevention. PMID:3156827

  13. Developing independence.

    PubMed

    Turnbull, A P; Turnbull, H R

    1985-03-01

    The transition from living a life as others want (dependence) to living it as the adolescent wants to live it (independence) is extraordinarily difficult for most teen-agers and their families. The difficulty is compounded in the case of adolescents with disabilities. They are often denied access to the same opportunities of life that are accessible to the nondisabled. They face special problems in augmenting their inherent capacities so that they can take fuller advantage of the accommodations that society makes in an effort to grant them access. In particular, they need training designed to increase their capacities to make, communicate, implement, and evaluate their own life-choices. The recommendations made in this paper are grounded in the long-standing tradition of parens patriae and enlightened paternalism; they seek to be deliberately and cautiously careful about the lives of adolescents with disabilities and their families. We based them on the recent tradition of anti-institutionalism and they are also consistent with some of the major policy directions of the past 15-20 years. These include: normalization, integration, and least-restrictive alternatives; the unity and integrity of the family; the importance of opportunities for self-advocacy; the role of consumer consent and choice in consumer-professional relationships; the need for individualized services; the importance of the developmental model as a basis for service delivery; the value of economic productivity of people with disabilities; and the rights of habilitation, amelioration, and prevention.

  14. Enhanced Data Discoverability for in Situ Hyperspectral Datasets

    NASA Astrophysics Data System (ADS)

    Rasaiah, B.; Bellman, C.; Hewson, R. D.; Jones, S. D.; Malthus, T. J.

    2016-06-01

    Field spectroscopy metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and the exploitation of the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue is described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  15. Increasing consistency of disease biomarker prediction across datasets.

    PubMed

    Chikina, Maria D; Sealfon, Stuart C

    2014-01-01

    Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparation, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.
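    A simplified analogue of the surrogate-variable idea (not the authors' modified algorithm) estimates latent structure from the residuals of a model of the known condition and regresses it out:

```python
# Simplified surrogate-variable-style correction: the top singular vectors
# of the residuals stand in for unknown latent variables, which are then
# regressed out. This is an illustration, not the paper's modified SVA.
import numpy as np

def remove_latent(expr, condition, n_sv=2):
    """expr: genes x samples matrix; condition: 0/1 vector over samples."""
    design = np.column_stack([np.ones(len(condition)), condition.astype(float)])
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ beta                 # samples x genes residuals
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    sv = u[:, :n_sv]                               # estimated surrogate variables
    full = np.column_stack([design, sv])
    beta_full, *_ = np.linalg.lstsq(full, expr.T, rcond=None)
    # Subtract only the latent-variable contribution, keeping the condition effect.
    cleaned = expr.T - sv @ beta_full[design.shape[1]:, :]
    return cleaned.T                               # corrected genes x samples
```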

  16. Status and Preliminary Evaluation for Chinese Re-Analysis Datasets

    NASA Astrophysics Data System (ADS)

    Zhao, Bin; Shi, Chunxiang; Zhao, Tianbao; Si, Dong; Liu, Jingwei

    2016-04-01

    Based on the operational T639L60 spectral model combined with the Hybrid_GSI assimilation system, and using meteorological observations including radiosondes, buoys and satellites, a set of Chinese Re-Analysis (CRA) datasets is being developed by the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA). The datasets are run at 30 km (0.28° latitude/longitude) resolution, which is higher than that of most existing reanalysis datasets. The reanalysis is undertaken in an effort to enhance the accuracy of historical synoptic analysis and to aid detailed investigation of various weather and climate systems. The reanalysis is currently in a stage of preliminary experimental analysis. One year of forecast data, covering June 2013 to May 2014, has been simulated and used in synoptic and climate evaluation. We first examine the model's prediction ability with the new assimilation system and find significant improvement in both the Northern and Southern Hemispheres: owing to the addition of new satellite data, upper-level prediction improves markedly and overall prediction stability is enhanced compared with the operational T639L60 model. In the climatological analysis, compared with the ERA-40, NCEP/NCAR and NCEP/DOE reanalyses, surface temperature is simulated slightly low over land and high over ocean, 850-hPa specific humidity shows a weakened anomaly, and the zonal wind anomaly is concentrated in the equatorial tropics. Meanwhile, the reanalysis dataset reproduces various climate indices well, such as the subtropical high index and the ESMI (East-Asia subtropical Summer Monsoon Index), and especially the Indian and western North Pacific monsoon indices. We will further improve the assimilation system and dynamical simulation performance and produce a 40-year (1979-2018) reanalysis dataset, which will provide a more comprehensive basis for synoptic and climate diagnosis.

  17. Interoperability of Multiple Datasets with JMARS

    NASA Astrophysics Data System (ADS)

    Smith, M. E.; Christensen, P. R.; Noss, D.; Anwar, S.; Dickenshied, S.

    2012-12-01

    Planetary science includes all celestial bodies, including Earth. However, in Geographic Information System (GIS) applications, Earth and the other planetary bodies tend to be treated separately. One reason is that we have been studying Earth's properties much longer than those of the other planetary bodies, so the archive of geographic coordinate systems (GCS) and projections is much larger. The first latitude and longitude system for Earth was devised by Eratosthenes (276-194 BC), who was also the first to calculate the circumference of the Earth. As time went on, scientists continued to re-measure the Earth on both local and global scales, creating a large collection of projections and geographic coordinate systems to choose from. This variety can make it a time-consuming task to determine which GCS or projection applies to each dataset and how to convert to the correct one. Another issue arises when determining whether a dataset should be referenced to a geocentric sphere or a geodetic spheroid, which measure latitude differently. This can lead to inconsistent results and frustration for the user. The situation differs for the other planetary bodies. Although the existence of the other planets has been known since early Babylonian times, accurate knowledge of their rotation, size and geologic properties came only hundreds of years later. Therefore, the options for projections or GCSs are far fewer than for Earth data, and even then the projection and GCS options for other celestial bodies are informal. So it can be hard for the user to determine which projection or GCS to apply to the other planets. JMARS (Java Mission Analysis for Remote Sensing) is an open-source suite developed by Arizona State University's Mars Space Flight Facility. The beauty of JMARS is that the tool transforms all datasets behind the scenes
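    The geocentric/geodetic distinction mentioned above is a concrete source of inconsistency, because the two systems assign different latitudes to the same point. A small sketch using the standard relation tan(geocentric latitude) = (1 − e²) tan(geodetic latitude), with the WGS84 flattening as an example value:

```python
# Geodetic vs. geocentric latitude: the same point gets two different
# latitude values depending on the reference surface. Standard relation:
# tan(lat_geocentric) = (1 - e^2) * tan(lat_geodetic).
import math

def geodetic_to_geocentric(lat_deg, flattening=1 / 298.257223563):  # WGS84 Earth
    e2 = flattening * (2 - flattening)   # first eccentricity squared
    return math.degrees(math.atan((1 - e2) * math.tan(math.radians(lat_deg))))

print(geodetic_to_geocentric(45.0))  # ~44.81 deg: a ~0.19 deg (~21 km) shift
```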

  18. Transforming a research-oriented dataset for evaluation of tactical information extraction technologies

    NASA Astrophysics Data System (ADS)

    Roy, Heather; Kase, Sue E.; Knight, Joanne

    2016-05-01

    The most representative and accurate data for testing and evaluating information extraction technologies is real-world data. Real-world operational data can provide important insights into human and sensor characteristics, interactions, and behavior. However, several challenges limit the feasibility of experimentation with real-world operational data. Real-world data lacks the precise knowledge of a "ground truth," a critical factor for benchmarking the progress of developing automated information processing technologies. Additionally, the use of real-world data is often limited by classification restrictions due to the methods of collection, procedures for processing, and tactical sensitivities related to the sources, events, or objects of interest. These challenges, along with an increase in the development of automated information extraction technologies, are fueling an emerging demand for operationally realistic datasets for benchmarking. An approach to meet this demand is to create synthetic datasets, which are operationally realistic yet unclassified in content. The unclassified nature of these synthetic datasets facilitates the sharing of data between military and academic researchers, thus increasing coordinated testing efforts. This paper describes the expansion and augmentation of two synthetic text datasets, one initially developed through academic research collaborations with the Army. Both datasets feature simulated tactical intelligence reports regarding fictitious terrorist activity occurring within a counterinsurgency (COIN) operation. The datasets were expanded and augmented to create two military-relevant datasets. The first resulting dataset was created by augmenting and merging the two to create a single larger dataset containing ground truth. The second resulting dataset was restructured to more realistically represent the format and content of intelligence reports. The dataset transformation effort, the final datasets, and their

  19. Visualizing large geospatial datasets with KML Regions

    NASA Astrophysics Data System (ADS)

    Ilyushchenko, S.; Wheeler, D.; Ummel, K.; Hammer, D.; Kraft, R.

    2008-12-01

    Regions are a powerful KML feature that makes it possible to view very large datasets in Google Earth without sacrificing performance. Data is loaded and drawn only when it falls within the user's view and occupies a certain portion of the screen. Using Regions, it is possible to supply separate levels of detail for the data, so that fine details are loaded only when the data fills a portion of the screen large enough for the details to be visible. It becomes easy to create compelling interactive presentations of geospatial datasets that are meaningful at both large and small scales. We present two example datasets: worldwide past, present and future carbon dioxide emissions by power plants provided by Carbon Monitoring for Action, Center for Global Development (http://carma.org), as well as 2007 US bridge safety ratings from the Federal Highway Administration (http://www.fhwa.dot.gov/BRIDGE/nbi/ascii.cfm).
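    A minimal Region, generated here with Python's ElementTree, attaches a bounding box and a level-of-detail threshold to a NetworkLink so that its data loads only when it occupies enough of the screen; the tile file name is a placeholder:

```python
# Minimal KML Region (KML 2.2 elements): the NetworkLink's data loads only
# when its LatLonAltBox occupies at least minLodPixels on screen.
import xml.etree.ElementTree as ET

kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
link = ET.SubElement(ET.SubElement(kml, "Document"), "NetworkLink")
region = ET.SubElement(link, "Region")
box = ET.SubElement(region, "LatLonAltBox")
for tag, val in [("north", "37.5"), ("south", "37.0"),
                 ("east", "-121.5"), ("west", "-122.0")]:
    ET.SubElement(box, tag).text = val
lod = ET.SubElement(region, "Lod")
ET.SubElement(lod, "minLodPixels").text = "128"  # activate once >= 128 px on screen
href = ET.SubElement(ET.SubElement(link, "Link"), "href")
href.text = "tile_37N_122W.kml"                  # placeholder tile with finer detail

ET.ElementTree(kml).write("regions.kml", xml_declaration=True)
```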

  20. Quality Visualization of Microarray Datasets Using Circos

    PubMed Central

    Koch, Martin; Wiese, Michael

    2012-01-01

    Quality control and normalization are considered the most important steps in the analysis of microarray data. At present there are various methods available for quality assessment of microarray datasets. However, there seems to be no standard visualization routine that also depicts individual microarray quality. Here we present a convenient method for visualizing the results of standard quality control tests using Circos plots. In these plots, various quality measurements are drawn in a circular fashion, allowing for visualization of the quality, and all outliers, of each distinct array within a microarray dataset. The proposed method is intended for use with the Affymetrix Human Genome platforms (i.e., GPL96, GPL570 and GPL571). Circos quality measurement plots are a convenient way to form an initial quality estimate of Affymetrix datasets stored in publicly available databases.

  1. Maximising the value of hospital administrative datasets.

    PubMed

    Nadathur, Shyamala G

    2010-05-01

    Mandatory and standardised administrative data collections are prevalent in the largely public-funded acute sector. In these systems the data collections are used for financial, performance monitoring and reporting purposes. This paper comments on the infrastructure and standards that have been established to support data collection activities, audit and feedback. The routine, local and research uses of these datasets are described using examples from Australian and international literature. The advantages of hospital administrative datasets and opportunities for improvement are discussed under the following headings: accessibility, standardisation, coverage, completeness, cost of obtaining clinical data, recorded Diagnostic Related Groups and International Classification of Diseases codes, linkage and connectivity. In an era of diminishing resources better utilisation of these datasets should be encouraged. Increased study and scrutiny will enhance transparency and help identify issues in the collections. As electronic information systems are increasingly embraced, administrative data collections need to be managed as valuable assets and powerful operational and patient management tools.

  2. 77 FR 15052 - Dataset Workshop-U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-14

    ... restrictions preclude attendance for those who do not RSVP by the deadline. Space is also limited to the first... individual basis once participation has been confirmed through RSVP. Workshop Date and Time: The workshop... will be placed on dataset accuracy and time-dependent biases. Pathways to overcome accuracy and...

  3. Comparison of recent SnIa datasets

    SciTech Connect

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S. E-mail: nesseris@nbi.ku.dk

    2009-11-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w₀ + w₁(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w₀, w₁) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.
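
    To make the quoted FoM definition concrete, the sketch below computes the inverse area of the 95.4% confidence region on a (w₀, w₁) grid. The χ² surface is a toy quadratic stand-in (a real analysis would build it from the SnIa likelihood), and Δχ² ≤ 6.17 is the standard 95.4% threshold for two parameters.

```python
import numpy as np

# Figure of Merit = 1 / (area of the 95.4% contour in the (w0, w1) plane).
w0 = np.linspace(-2.0, 0.0, 400)
w1 = np.linspace(-2.0, 4.0, 400)
W0, W1 = np.meshgrid(w0, w1)

# Toy chi-square surface with its minimum at the LCDM point (w0, w1) = (-1, 0).
chi2 = 8.0 * (W0 + 1.0) ** 2 + 1.5 * W1 ** 2 + 4.0 * (W0 + 1.0) * W1

delta = chi2 - chi2.min()                  # delta chi-square from the best fit
cell = (w0[1] - w0[0]) * (w1[1] - w1[0])   # area of one grid cell
area_954 = np.sum(delta <= 6.17) * cell    # 95.4% region for 2 parameters
print(f"contour area = {area_954:.3f}, FoM = {1.0 / area_954:.2f}")
```

    A tighter dataset shrinks the contour, so its FoM rises; that is the sense in which the record above ranks (C) highest and (S) lowest.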

  4. Introduction of a simple-model-based land surface dataset for Europe

    NASA Astrophysics Data System (ADS)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology is important because it can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for the European continent that consists of soil moisture, runoff and evapotranspiration. It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset covers Europe and extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel approach to calibrate the model, whereby we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), evapotranspiration and streamflow, we identify the best-performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against ERA-Interim/Land and simulations of the Community Land Model Version 4, using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little-validated model results, or proxy data such as precipitation indices. In terms of runoff the SWBM dataset also performs well versus the benchmarks. They all show a slight dry bias, which is probably due to underestimated precipitation used to force the model. The evaluation of the SWBM evapotranspiration dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting evapotranspiration dynamics may not be captured, and quality issues may occur in regions with complex terrain. Furthermore, we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar impact on the
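
    The calibration idea above (sample many parameter sets, score each against independent validation data, keep the best) can be sketched in a few lines. Everything below, from the toy bucket model to the parameter names, ranges, and synthetic forcing, is an illustrative stand-in for the actual SWBM, not the model itself.

```python
import numpy as np

# Random-search calibration: draw 300 parameter sets from plausible ranges,
# run a toy water balance model, keep the best-scoring set.
rng = np.random.default_rng(42)
PARAM_RANGES = {"max_storage_mm": (50.0, 500.0),   # hypothetical parameters
                "runoff_exponent": (1.0, 8.0),
                "et_alpha": (0.5, 1.5)}

def run_model(params, precip, pet):
    """Toy bucket model returning a daily soil-moisture series."""
    s_max = params["max_storage_mm"]
    s, series = 0.5 * s_max, []
    for p, e in zip(precip, pet):
        s = min(s + p, s_max)                               # fill with rain
        runoff = p * (s / s_max) ** params["runoff_exponent"]
        et = params["et_alpha"] * e * (s / s_max)           # moisture-limited ET
        s = max(s - runoff - et, 0.0)
        series.append(s)
    return np.array(series)

def score(sim, obs):
    return np.corrcoef(sim, obs)[0, 1]

precip = rng.gamma(0.8, 4.0, 365) * (rng.random(365) < 0.4)  # synthetic forcing
pet = np.clip(rng.normal(2.0, 0.5, 365), 0.0, None)
obs = run_model({"max_storage_mm": 200.0, "runoff_exponent": 4.0,
                 "et_alpha": 1.0}, precip, pet)              # stand-in "truth"

candidates = [{k: rng.uniform(*v) for k, v in PARAM_RANGES.items()}
              for _ in range(300)]
best = max(candidates, key=lambda p: score(run_model(p, precip, pet), obs))
print(best)
```

    In the study itself, the 300 candidate sets are scored against several independent soil moisture, evapotranspiration and streamflow datasets rather than against a single synthetic series.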

  5. Food additives

    PubMed Central

    Spencer, Michael

    1974-01-01

    Food additives are discussed from the food technology point of view. The reasons for their use are summarized: (1) to protect food from chemical and microbiological attack; (2) to even out seasonal supplies; (3) to improve their eating quality; (4) to improve their nutritional value. The various types of food additives are considered, e.g. colours, flavours, emulsifiers, bread and flour additives, preservatives, and nutritional additives. The paper concludes with consideration of those circumstances in which the use of additives is (a) justified and (b) unjustified. PMID:4467857

  6. The Role of Datasets on Scientific Influence within Conflict Research.

    PubMed

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested whether a coherent field of inquiry in human conflict research emerged, in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis, on this citation network (~1.5 million works) to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas didn't persist, in that multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from earlier interstate studies on the democracy of peace along the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the

  7. Identification of druggable cancer driver genes amplified across TCGA datasets.

    PubMed

    Chen, Ying; McGee, Jeremy; Chen, Xianming; Doman, Thompson N; Gong, Xueqian; Zhang, Youyan; Hamm, Nicole; Ma, Xiwen; Higgs, Richard E; Bhagwat, Shripad V; Buchanan, Sean; Peng, Sheng-Bin; Staschke, Kirk A; Yadav, Vipin; Yue, Yong; Kouros-Mehr, Hosein

    2014-01-01

    The Cancer Genome Atlas (TCGA) projects have advanced our understanding of the driver mutations, genetic backgrounds, and key pathways activated across cancer types. Analyses of TCGA datasets have mostly focused on somatic mutations and translocations, with less emphasis placed on gene amplifications. Here we describe a bioinformatics screening strategy to identify putative cancer driver genes amplified across TCGA datasets. We carried out GISTIC2 analysis of TCGA datasets spanning 16 cancer subtypes and identified 486 genes that were amplified in two or more datasets. The list was narrowed to 75 cancer-associated genes with potential "druggable" properties. The majority of the genes were localized to 14 amplicons spread across the genome. To identify potential cancer driver genes, we analyzed gene copy number and mRNA expression data from individual patient samples and identified 42 putative cancer driver genes linked to diverse oncogenic processes. Oncogenic activity was further validated by siRNA/shRNA knockdown and by referencing the Project Achilles datasets. The amplified genes represented a number of gene families, including epigenetic regulators, cell cycle-associated genes, DNA damage response/repair genes, metabolic regulators, and genes linked to the Wnt, Notch, Hedgehog, JAK/STAT, NF-κB and MAPK signaling pathways. Among the 42 putative driver genes were known driver genes, such as EGFR, ERBB2 and PIK3CA. Wild-type KRAS was amplified in several cancer types, and KRAS-amplified cancer cell lines were most sensitive to KRAS shRNA, suggesting that KRAS amplification was an independent oncogenic event. A number of MAP kinase adapters were co-amplified with their receptor tyrosine kinases, such as the FGFR adapter FRS2 and the EGFR family adapters GRB2 and GRB7. The ubiquitin-like ligase DCUN1D1 and the histone methyltransferase NSD3 were also identified as novel putative cancer driver genes. We discuss the patient tailoring implications for existing cancer
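
    The core of the screen described above reduces to two filters: recurrence of amplification across datasets, and copy-number-to-expression correlation within samples. A minimal pandas sketch, with all gene symbols, tables, and thresholds as toy stand-ins rather than the study's actual data:

```python
import pandas as pd

# Filter 1: keep genes amplified in two or more datasets (e.g. GISTIC2 calls).
amp_calls = pd.DataFrame({
    "gene":    ["EGFR", "EGFR", "KRAS", "NSD3", "KRAS"],
    "dataset": ["LUAD", "GBM",  "LUAD", "BRCA", "STAD"],
})
n_sets = amp_calls.groupby("gene")["dataset"].nunique()
recurrent = set(n_sets[n_sets >= 2].index)        # amplified in >= 2 datasets

# Filter 2: copy number must track mRNA expression across patient samples.
def putative_driver(gene, copy_number, expression, r_min=0.5):
    """Recurrently amplified and copy number correlates with expression."""
    r = copy_number.corr(expression, method="spearman")
    return gene in recurrent and r >= r_min

# Example with toy per-sample values for one gene:
cn = pd.Series([2, 3, 4, 6, 8])                    # copy number
expr = pd.Series([1.0, 1.4, 2.1, 3.0, 3.8])        # expression level
print(putative_driver("KRAS", cn, expr))           # True for this toy data
```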

  8. The Role of Datasets on Scientific Influence within Conflict Research

    PubMed Central

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped

  9. Future weather dataset for fourteen UK sites.

    PubMed

    Liu, Chunde

    2016-09-01

    This Future weather dataset is used for assessing the risk of overheating and thermal discomfort or heat stress in free-running buildings. The weather files are in the .epw format, which can be used in building simulation packages such as EnergyPlus, DesignBuilder, IES, etc. PMID:27570809

  10. Bacterial clinical infectious diseases ontology (BCIDO) dataset.

    PubMed

    Gordon, Claire L; Weng, Chunhua

    2016-09-01

    This article describes the Bacterial Clinical Infectious Diseases Ontology (BCIDO) dataset related to research published in http://dx.doi.org/10.1016/j.jbi.2015.07.014 [1], and contains the Protégé OWL files required to run BCIDO in the Protégé environment. BCIDO contains 1719 classes and 39 object properties. PMID:27508237

  11. Thesaurus Dataset of Educational Technology in Chinese

    ERIC Educational Resources Information Center

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  12. Efficiently Finding Individuals from Video Dataset

    NASA Astrophysics Data System (ADS)

    Hao, Pengyi; Kamata, Sei-Ichiro

    We are interested in retrieving video shots or videos containing particular people from a video dataset. Owing to the large variations in pose, illumination conditions, occlusions, hairstyles and facial expressions, face tracks have recently been researched in the fields of face recognition, face retrieval and name labeling from videos. However, when the number of face tracks is very large, conventional methods, which match all or some pairs of faces in face tracks, will not be effective. Therefore, in this paper, an efficient method for finding a given person in a video dataset is presented. In our study, in addition to investigating face tracks within a single video, we also consider how to organize all the faces in the videos of a dataset and how to improve the search quality in the query process. Different videos may include the same person; thus, the management of individuals across different videos will be useful for their retrieval. The proposed method includes the following three points. (i) Face tracks of the same person appearing for a period in each video are first connected on the basis of scene information with a time constraint, then all the people in one video are organized by a proposed hierarchical clustering method. (ii) After obtaining the organizational structure of all the people in one video, the people are organized into an upper layer by affinity propagation. (iii) Finally, in the process of querying, a remeasuring method based on the index structure of the videos is performed to improve the retrieval accuracy. We also build a video dataset that contains six types of videos: films, TV shows, educational videos, interviews, press conferences and domestic activities. The formation of face tracks in the six types of videos is first investigated, then experiments are performed on this video dataset containing more than 1 million faces and 218,786 face tracks. The results show that the proposed approach has high search quality and a short search time.
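
    Step (ii) above groups per-video individuals into a cross-video upper layer with affinity propagation, which chooses exemplars without a preset cluster count. A minimal sketch using scikit-learn's implementation on random stand-in face descriptors (the real system would use descriptors aggregated from the hierarchical clustering of step (i)):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# One descriptor per person-track; 60 tracks x 128 dims of random stand-ins.
rng = np.random.default_rng(0)
face_descriptors = rng.normal(size=(60, 128))

# Affinity propagation picks exemplars itself; no cluster count is fixed.
ap = AffinityPropagation(damping=0.9, max_iter=1000, random_state=0)
labels = ap.fit_predict(face_descriptors)
print(f"{labels.max() + 1} cross-video identity clusters")
```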

  13. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source.

    PubMed

    Atwood, Robert C; Bodey, Andrew J; Price, Stephen W T; Basham, Mark; Drakopoulos, Michael

    2015-06-13

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an 'orthogonal' fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and 'facility-independent': it can run on standard cluster infrastructure at any institution. PMID:25939626

  14. A computationally efficient Bayesian sequential simulation approach for the assimilation of vast and diverse hydrogeophysical datasets

    NASA Astrophysics Data System (ADS)

    Nussbaumer, Raphaël; Gloaguen, Erwan; Mariéthoz, Grégoire; Holliger, Klaus

    2016-04-01

    Bayesian sequential simulation (BSS) is a powerful geostatistical technique, which notably has shown significant potential for the assimilation of datasets that are diverse with regard to their spatial resolution and mutual relationship. However, these types of applications of BSS require a large number of realizations to adequately explore the solution space and to assess the corresponding uncertainties. Moreover, such simulations generally need to be performed on very fine grids in order to adequately exploit the technique's potential for characterizing heterogeneous environments. Correspondingly, the computational cost of BSS algorithms in their classical form is very high, which so far has limited an effective application of this method to large models and/or vast datasets. In this context, it is also important to note that the inherent assumption regarding the independence of the considered datasets is generally regarded as too strong in the context of sequential simulation. To alleviate these problems, we have revisited the classical implementation of BSS and incorporated two key features to increase the computational efficiency. The first feature is a combined quadrant spiral - superblock search, which targets run-time savings on large grids and adds flexibility with regard to the selection of neighboring points, using equal directional sampling and treating hard data and previously simulated points separately. The second feature is a constant path of simulation, which enhances the efficiency for multiple realizations. We have also modified the aggregation operator to be more flexible with regard to the assumption of independence of the considered datasets. This is achieved through log-linear pooling, which essentially allows for attributing weights to the various data components. Finally, a multi-grid simulation path was created to enforce large-scale variance and to allow for adapting parameters, such as, for example, the log-linear weights or the type
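
    Log-linear pooling, the aggregation operator mentioned above, combines conditional distributions as a weighted geometric mean, so the weights tune how strongly each dataset influences the simulated value. A minimal sketch for discrete distributions (the weights and example probabilities are illustrative):

```python
import numpy as np

def log_linear_pool(p1, p2, w1=0.5, w2=0.5):
    """Weighted geometric mean of two discrete distributions, renormalized."""
    pooled = np.power(p1, w1) * np.power(p2, w2)
    return pooled / pooled.sum()

# Example: facies probabilities from a geophysical and a borehole dataset.
p_geophysics = np.array([0.6, 0.3, 0.1])
p_borehole   = np.array([0.2, 0.5, 0.3])
print(log_linear_pool(p_geophysics, p_borehole))             # balanced
print(log_linear_pool(p_geophysics, p_borehole, 0.8, 0.2))   # trust geophysics
```

    Note that with both weights set to 1 the formula reduces to a normalized product of the two distributions, i.e. the independence-style aggregation the weighting is meant to relax.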

  15. A new method for evaluating age distributions of detrital zircon datasets by incorporating discordant data

    NASA Astrophysics Data System (ADS)

    Reimink, Jesse; Davies, Joshua; Rojas, Xavier; Waldron, John

    2015-04-01

    U-Pb ages from detrital zircons play an important role in sediment provenance studies. However, U-Pb ages from detrital zircon populations often contain a discordant component, which is traditionally removed before the age data are interpreted. Many different processes can create discordant analyses, with the most important being Pb-loss and mixing of distinct zircon age domains during analysis. Discordant ages contain important information regarding the history of a detrital zircon population, for example the timing of Pb-loss or metamorphism, and removing these analyses may significantly bias a zircon dataset. Here we present a new technique for analyzing detrital zircon populations that uses all U-Pb analyses, independent of discordance. We have developed computer code that evaluates the relative likelihood of discordia lines based on their proximity to discordant data points. When two or more data points lie on or near a discordia line the likelihood associated with that line increases. The upper and lower intercepts of each discordia line, as well as the relative likelihood along that line, are stored, and the likelihood of upper and lower intercepts are plotted with age. There are many benefits to using this technique for analysis of detrital zircon datasets. By utilizing the discordant analyses we allow for the addition of upper and lower intercept information to conventional analysis techniques (i.e. probability density functions or kernel density estimators). We are then able to use a much stricter discordance filter (e.g. < 3%) when analyzing 'concordant' data, thereby increasing the reliability of Pb/Pb ages used in the traditional analysis. Additionally, by not rejecting discordant data from zircon datasets we potentially reduce the overall bias in the analysis, which is a critical step in detrital zircon studies. This new technique is relatively quick and uses traditional analytical results, while the upper and lower intercept information is obtained
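
    The likelihood construction described above can be caricatured in a short script: each candidate discordia line is a chord of the Wetherill concordia between a trial lower and upper intercept age, and its likelihood grows with the number of analyses lying close to it. The decay constants are the standard 238U and 235U values; the age grid, the Gaussian width, and the toy data points are illustrative, and the authors' actual code will differ in detail.

```python
import numpy as np

L238, L235 = 1.55125e-10, 9.8485e-10          # decay constants, 1/yr

def concordia(t_yr):
    """Wetherill concordia (207Pb/235U, 206Pb/238U) at age t in years."""
    return np.exp(L235 * t_yr) - 1.0, np.exp(L238 * t_yr) - 1.0

def chord_likelihood(data, t_upper, t_lower, sigma=0.05):
    """Sum of Gaussian proximities of data points to one discordia chord."""
    (x1, y1), (x2, y2) = concordia(t_lower), concordia(t_upper)
    a, b = np.array([x1, y1]), np.array([x2, y2])
    t = np.clip(((data - a) @ (b - a)) / ((b - a) @ (b - a)), 0.0, 1.0)
    d = np.linalg.norm(data - (a + t[:, None] * (b - a)), axis=1)
    return np.exp(-0.5 * (d / sigma) ** 2).sum()

# Score a grid of (upper, lower) intercept pairs for toy measured ratios.
data = np.array([[12.0, 0.45], [8.0, 0.32], [5.0, 0.22]])
ages = np.linspace(0.2e9, 3.5e9, 100)
grid = [(tu, tl, chord_likelihood(data, tu, tl))
        for tu in ages for tl in ages if tl < tu]
best = max(grid, key=lambda g: g[2])
print(f"best upper = {best[0]/1e9:.2f} Ga, lower = {best[1]/1e9:.2f} Ga")
```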

  16. FTSPlot: fast time series visualization for large datasets.

    PubMed

    Riss, Michael

    2014-01-01

    The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log n); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delay. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture, currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes/1 TiB or 1.3 × 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.

  17. FTSPlot: Fast Time Series Visualization for Large Datasets

    PubMed Central

    Riss, Michael

    2014-01-01

    The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log n); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delay. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture, currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes/1 TiB or 1.3 × 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments. PMID:24732865

  18. Spatially-based quality control for daily precipitation datasets

    NASA Astrophysics Data System (ADS)

    Serrano-Notivoli, Roberto; de Luis, Martín; Beguería, Santiago; Ángel Saz, Miguel

    2016-04-01

    There are many reasons why erroneous data can appear in original precipitation datasets, but their common characteristic is that they do not correspond to the natural variability of the climate variable. For this reason, a comprehensive analysis of the data from each station on each day is necessary to be certain that the final dataset will be consistent and reliable. Most quality control techniques applied to daily precipitation are based on the comparison of each observed value with the rest of the values in the same series or in reference series built from the nearest stations. These methods are inherited from monthly precipitation studies, but at the daily scale the variability is greater and the methods have to be different. A common characteristic shared by these approaches is that they make reconstructions based on the best-correlated reference series, which can be a biased decision because, for example, an extreme precipitation event occurring on one day at more than one station could be flagged as erroneous. We propose a method based on the specific conditions of the day and location to determine the reliability of each observation. This method keeps the local variance of the variable and the independence of the time structure. To do this, individually for each daily value, we first compute the probability of precipitation occurrence through a multivariate logistic regression using the 10 nearest observations in binary mode (0 = dry; 1 = wet); this produces a binomial prediction (PB) between 0 and 1. Then, we compute a prediction of precipitation magnitude (PM) with the raw data of the same 10 nearest observations. Through these predictions we examine the original data for each day and location using five criteria: 1) suspect data; 2) suspect zero; 3) suspect outlier; 4) suspect wet; and 5) suspect dry. Tests on different datasets showed that the flagged data depend mainly on the number of available observations and on how homogeneously they are distributed.
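
    The two predictions at the heart of the method, PB from a logistic regression on neighbouring wet/dry states and PM from the neighbours' raw amounts, can be sketched as follows. The synthetic "station" data below are stand-ins; a real implementation would fit the models per day and location from actual neighbouring records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Synthetic records: a shared regional rain signal scaled per station.
rng = np.random.default_rng(1)
regional = rng.gamma(0.6, 5.0, 500) * (rng.random(500) < 0.55)
neighbors = regional[:, None] * rng.uniform(0.4, 1.6, (500, 10))  # 10 stations
target = regional * rng.uniform(0.4, 1.6, 500)                    # station to QC

# PB: occurrence probability from the neighbours' binary wet/dry states.
pb_model = LogisticRegression().fit(neighbors > 0, (target > 0).astype(int))
# PM: magnitude prediction from the neighbours' raw amounts.
pm_model = LinearRegression().fit(neighbors, target)

day = neighbors[[0]]                       # the day/location being screened
pb = pb_model.predict_proba(day > 0)[0, 1]
pm = pm_model.predict(day)[0]
print(f"PB = {pb:.2f}, PM = {pm:.1f}")     # compare against the observed value
```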

  19. Method of generating features optimal to a dataset and classifier

    DOEpatents

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    2016-10-18

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  20. 3DSEM: A 3D microscopy dataset.

    PubMed

    Tafti, Ahmad P; Kirkpatrick, Andrew B; Holz, Jessica D; Owen, Heather A; Yu, Zeyun

    2016-03-01

    The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines, including the biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs still remain 2D images. To effectively measure and visualize surface properties, we need to restore the true 3D shape model from the 2D SEM images. Having 3D surfaces provides the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. PMID:26779561

  1. 3DSEM: A 3D microscopy dataset

    PubMed Central

    Tafti, Ahmad P.; Kirkpatrick, Andrew B.; Holz, Jessica D.; Owen, Heather A.; Yu, Zeyun

    2015-01-01

    The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines, including the biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs still remain 2D images. To effectively measure and visualize surface properties, we need to restore the true 3D shape model from the 2D SEM images. Having 3D surfaces provides the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. PMID:26779561

  2. Global Precipitation Measurement: Methods, Datasets and Applications

    NASA Technical Reports Server (NTRS)

    Tapiador, Francisco; Turk, Francis J.; Petersen, Walt; Hou, Arthur Y.; Garcia-Ortega, Eduardo; Machado, Luiz, A. T.; Angelis, Carlos F.; Salio, Paola; Kidd, Chris; Huffman, George J.; De Castro, Manuel

    2011-01-01

    This paper reviews the many aspects of precipitation measurement that are relevant to providing an accurate global assessment of this important environmental parameter. Methods discussed include ground data, satellite estimates and numerical models. First, the methods for measuring, estimating, and modeling precipitation are discussed. Then, the most relevant datasets gathering precipitation information from those three sources are presented. The third part of the paper illustrates a number of the many applications of those measurements and databases. The aim of the paper is to organize the many links and feedbacks between precipitation measurement, estimation and modeling, indicating the uncertainties and limitations of each technique in order to identify areas requiring further attention, and to show the limits within which datasets can be used.

  3. Detecting Novel Associations in Large Datasets

    PubMed Central

    Reshef, David N.; Reshef, Yakir A.; Finucane, Hilary K.; Grossman, Sharon R.; McVean, Gilean; Turnbaugh, Peter J.; Lander, Eric S.; Mitzenmacher, Michael; Sabeti, Pardis C.

    2012-01-01

    Identifying interesting relationships between pairs of variables in large datasets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to datasets in global health, gene expression, major-league baseball, and the human gut microbiota, and identify known and novel relationships. PMID:22174245
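
    A quick way to reproduce the qualitative behaviour described above is the third-party minepy package, one publicly available implementation of the MINE statistics (its availability and exact defaults are an assumption here, not something stated in the record): MIC scores high for a noisy functional relationship and near zero for an unrelated pair.

```python
import numpy as np
from minepy import MINE   # third-party MINE/MIC implementation

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
y_func = np.sin(6 * np.pi * x) + rng.normal(0, 0.2, 1000)  # functional + noise
y_rand = rng.normal(size=1000)                              # no association

mine = MINE(alpha=0.6, c=15)          # commonly used default settings
mine.compute_score(x, y_func)
print("MIC(functional) =", round(mine.mic(), 2))   # high score expected
mine.compute_score(x, y_rand)
print("MIC(random)     =", round(mine.mic(), 2))   # near zero expected
```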

  4. 3DSEM: A 3D microscopy dataset.

    PubMed

    Tafti, Ahmad P; Kirkpatrick, Andrew B; Holz, Jessica D; Owen, Heather A; Yu, Zeyun

    2016-03-01

    The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines, including the biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs still remain 2D images. To effectively measure and visualize surface properties, we need to restore the true 3D shape model from the 2D SEM images. Having 3D surfaces provides the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples.

  5. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Astronomy Data Centre, Canadian

    2014-01-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.

  6. Scalable Machine Learning for Massive Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  7. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression.

    PubMed

    Lin, Dongdong; Zhang, Jigang; Li, Jingyao; He, Hao; Deng, Hong-Wen; Wang, Yu-Ping

    2014-01-01

    A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies from different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparsity constraint on groups of variables to overcome the "small sample, large number of variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, had been identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed from other
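
    As a hedged illustration of how the two regularizers combine (the paper's exact objective may differ), a generic sparse group multitask objective can be written as follows, where each study or platform is a task t with design matrix X_t, response y_t and coefficients β_t, B stacks the β_t, and B_g collects the rows of B belonging to variable group g of size p_g:

```latex
% Generic sparse group lasso objective for multitask regression
% (illustrative form, not necessarily the authors' exact objective):
%  - the Frobenius-norm group term couples a group's coefficients across
%    tasks and drives whole groups to zero (group sparsity);
%  - the l1 term additionally zeroes individual coefficients.
\begin{equation*}
\min_{B}\;\frac{1}{2}\sum_{t=1}^{T}\bigl\lVert y_t - X_t\beta_t \bigr\rVert_2^2
\;+\;\lambda_1\sum_{g=1}^{G}\sqrt{p_g}\,\bigl\lVert B_g \bigr\rVert_F
\;+\;\lambda_2\,\lVert B \rVert_1
\end{equation*}
```

    Swapping the group term's norm for its square would give a ridge-flavoured variant along the lines of the sparse group ridge penalty mentioned above.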

  8. Data Assimilation and Model Evaluation Experiment Datasets.

    NASA Astrophysics Data System (ADS)

    Lai, Chung-Chieng A.; Qian, Wen; Glenn, Scott M.

    1994-05-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: 1) collection of observational data; 2) analysis and interpretation; 3) interpolation using the Optimum Thermal Interpolation System package; 4) quality control and re-analysis; and 5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usage include 1) ocean modeling and data assimilation studies, 2) diagnosis and theoretical studies, and 3) comparisons with locally detailed observations.

  9. Projecting global datasets to achieve equal areas

    USGS Publications Warehouse

    Usery, E.L.; Finn, M.P.; Cox, J.D.; Beard, T.; Ruhl, S.; Bearden, M.

    2003-01-01

    Scientists routinely accomplish global modeling in the raster domain, but recent research has indicated that the transformation of large areas through map projection equations leads to errors. This research attempts to gauge the extent of map projection and resampling effects on the tabulation of categorical areas by comparing the results of three datasets for seven common projections. The datasets, Global Land Cover, Holdridge Life Zones, and Global Vegetation, were compiled at resolutions of 30 arc-second, 1/2 degree, and 1 degree, respectively. These datasets were projected globally from spherical coordinates to plane representations. Results indicate significant problems in the implementation of global projection transformations in commercial software, as well as differences in areal accuracy across projections. The level of raster resolution directly affects the accuracy of areal tabulations, with higher resolution yielding higher accuracy. If the raster resolution is high enough for individual pixels to approximate points, the areal error tends to zero. The 30-arc-second cells appear to approximate this condition.

  10. The combined inhibitory effect of the adenosine A1 and cannabinoid CB1 receptors on cAMP accumulation in the hippocampus is additive and independent of A1 receptor desensitization.

    PubMed

    Serpa, André; Correia, Sara; Ribeiro, Joaquim A; Sebastião, Ana M; Cascalheira, José F

    2015-01-01

    Adenosine A1 and cannabinoid CB1 receptors are highly expressed in the hippocampus, where they trigger similar transduction pathways. We investigated how the combined acute activation of A1 and CB1 receptors modulates cAMP accumulation in rat hippocampal slices. The CB1 agonist WIN55212-2 (0.3-30 μM) decreased forskolin-stimulated cAMP accumulation with an EC50 of 6.6±2.7 μM and an Emax of 31%±2%, whereas for the A1 agonist N6-cyclopentyladenosine (CPA, 10-150 nM), an EC50 of 35±19 nM and an Emax of 29%±5% were obtained. The combined inhibitory effect of WIN55212-2 (30 μM) and CPA (100 nM) on cAMP accumulation was 41%±6% (n=4), which did not differ (P>0.7) from the sum of the individual effects of each agonist (43%±8%) but was different (P<0.05) from the effects of CPA or WIN55212-2 alone. Preincubation with CPA (100 nM) for 95 min caused desensitization of adenosine A1 receptor activity, which did not modify the effect of WIN55212-2 (30 μM) on cAMP accumulation. In conclusion, the combined effect of CB1 and A1 receptors on cAMP formation is additive, and CB1 receptor activity is not affected by short-term A1 receptor desensitization.

  11. Quantifying uncertainty in observational rainfall datasets

    NASA Astrophysics Data System (ADS)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There are also a further three papers that the authors know to be under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, along with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground, space and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques, and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  12. Likelihood-based population independent component analysis

    PubMed Central

    Eloyan, Ani; Crainiceanu, Ciprian M.; Caffo, Brian S.

    2013-01-01

    Independent component analysis (ICA) is a widely used technique for blind source separation, used heavily in several scientific research areas including acoustics, electrophysiology, and functional neuroimaging. We propose a scalable two-stage iterative true group ICA methodology for analyzing population level functional magnetic resonance imaging (fMRI) data where the number of subjects is very large. The method is based on likelihood estimators of the underlying source densities and the mixing matrix. As opposed to many commonly used group ICA algorithms, the proposed method does not require significant data reduction by a 2-fold singular value decomposition. In addition, the method can be applied to a large group of subjects since the memory requirements are not restrictive. The performance of our approach is compared with a commonly used group ICA algorithm via simulation studies. Furthermore, the proposed method is applied to a large collection of resting state fMRI datasets. The results show that established brain networks are well recovered by the proposed algorithm. PMID:23314416
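
    For orientation, the simplest group ICA baseline concatenates subjects along time and runs a single ICA; the sketch below uses scikit-learn's FastICA on random stand-in data. Note this is a conventional concatenation approach, not the likelihood-based two-stage estimator proposed in the record, which avoids the large SVD-based data reduction that concatenation pipelines typically require.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Random stand-ins for per-subject (time x voxel) fMRI matrices.
rng = np.random.default_rng(0)
subjects = [rng.normal(size=(120, 2000)) for _ in range(5)]

group = np.vstack(subjects)                 # temporal concatenation
ica = FastICA(n_components=20, random_state=0, max_iter=500)
time_courses = ica.fit_transform(group)     # (subjects * time, components)
spatial_maps = ica.components_              # (components, voxels)
print(spatial_maps.shape)
```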

  13. The Development of a Noncontact Letter Input Interface “Fingual” Using Magnetic Dataset

    NASA Astrophysics Data System (ADS)

    Fukushima, Taishi; Miyazaki, Fumio; Nishikawa, Atsushi

    We have developed a new noncontact letter input interface called “Fingual”. Fingual uses a glove mounted with inexpensive and small magnetic sensors. Using the glove, users can input letters by forming the finger alphabets, a kind of sign language. The proposed method uses a dataset which consists of magnetic field measurements and the corresponding letter information. In this paper, we show two recognition methods using the dataset. The first method uses the Euclidean norm, and the second additionally uses a Gaussian function as a weighting function. We then conducted verification experiments on the recognition rate of each method in two situations: in one, subjects used their own dataset; in the other, they used another person's dataset. As a result, the proposed method could recognize letters at a high rate in both situations, even though it is better to use one's own dataset than another person's. Though Fingual needs to collect a magnetic dataset for each letter in advance, its feature is the ability to recognize letters without complicated calculations such as inverse problems. This paper shows the results of the recognition experiments and demonstrates the utility of the proposed system “Fingual”.
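
    Both recognition schemes described above are simple template matches against the stored magnetic dataset, differing only in whether a single nearest neighbour or Gaussian-weighted votes decide the letter. A minimal sketch with random stand-in sensor vectors (the vector length and sigma are illustrative):

```python
import numpy as np

# Stored dataset: 10 magnetic-sensor readings per letter, 15 values each.
rng = np.random.default_rng(0)
templates = rng.normal(size=(26 * 10, 15))
letters = np.repeat(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), 10)

def classify_nearest(query):
    """Method 1: smallest Euclidean distance to any stored sample."""
    d = np.linalg.norm(templates - query, axis=1)
    return letters[np.argmin(d)]

def classify_gaussian(query, sigma=1.0):
    """Method 2: Gaussian-weighted votes summed per letter."""
    w = np.exp(-np.linalg.norm(templates - query, axis=1) ** 2
               / (2 * sigma ** 2))
    scores = {c: w[letters == c].sum() for c in set(letters)}
    return max(scores, key=scores.get)

q = templates[42] + rng.normal(0, 0.1, 15)   # noisy sample of a stored letter
print(classify_nearest(q), classify_gaussian(q))
```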

  14. Development of a SPARK Training Dataset

    SciTech Connect

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  15. Phosphazene additives

    DOEpatents

    Harrup, Mason K; Rollins, Harry W

    2013-11-26

    An additive comprising a phosphazene compound that has at least two reactive functional groups and at least one capping functional group bonded to phosphorus atoms of the phosphazene compound. One of the at least two reactive functional groups is configured to react with cellulose and the other of the at least two reactive functional groups is configured to react with a resin, such as an amine resin or a polycarboxylic acid resin. The at least one capping functional group is selected from the group consisting of a short chain ether group, an alkoxy group, or an aryloxy group. Also disclosed are an additive-resin admixture, a method of treating a wood product, and a wood product.

  16. Potlining Additives

    SciTech Connect

    Rudolf Keller

    2004-08-10

    In this project, a concept to improve the performance of aluminum production cells by introducing potlining additives was examined and tested. Boron oxide was added to cathode blocks, and titanium was dissolved in the metal pool; this resulted in the formation of titanium diboride and caused the molten aluminum to wet the carbonaceous cathode surface. Such wetting reportedly leads to operational improvements and extended cell life. In addition, boron oxide suppresses cyanide formation. This final report presents and discusses the results of this project. Substantial economic benefits for the practical implementation of the technology are projected, especially for modern cells with graphitized blocks. For example, with an energy savings of about 5% and an increase in pot life from 1500 to 2500 days, a cost savings of $0.023 per pound of aluminum produced is projected for a 200 kA pot.

  17. National Hydropower Plant Dataset, Version 1

    DOE Data Explorer

    Samu, Nicole; Kao, Shih-Chieh; O'Connor, Patrick

    2016-09-30

    The 2016 National Hydropower Plant Dataset, Version 1, includes geospatial point-level locations and key characteristics of online existing hydropower plants in the United States that are currently licensed, exempt, or awaiting relicensing. These data are a subset extracted from NHAAP’s Existing Hydropower Assets (EHA) internal database, which is a cornerstone of NHAAP’s EHA effort that has supported multiple U.S. hydropower R&D research initiatives related to market acceleration, environmental impact reduction, technology-to-market activities, and climate change impact assessment. For more information on NHAAP’s EHA effort, please visit the project web page at: http://nhaap.ornl.gov/existing-hydropower.

  18. COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets

    PubMed Central

    Lohmann, Ingrid

    2012-01-01

    In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distance between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209
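
    Stripped of the FP-tree machinery and the statistics, the quantity COPS evaluates starts from counts like the ones below: how often pairs of motif hits fall in the same region, and how often they do so within a short distance of each other. The motif names, positions, and distance cutoff are toy values; COPS itself mines these patterns efficiently with a Frequent-Pattern tree and scores them with association rules and Markov chain models.

```python
from itertools import combinations
from collections import Counter

# Per-region motif hits as (motif, position) pairs -- toy stand-in data.
regions = [
    [("HOX", 10), ("EXD", 18), ("STAT", 90)],
    [("HOX", 40), ("EXD", 46)],
    [("STAT", 5)],
    [("HOX", 70), ("EXD", 81), ("EXD", 300)],
]

pair_counts, close_counts = Counter(), Counter()
MAX_DIST = 20                                  # "closely adjoining" cutoff, bp
for hits in regions:
    for (m1, p1), (m2, p2) in combinations(sorted(hits), 2):
        if m1 == m2:
            continue                           # only heterotypic pairs
        pair_counts[(m1, m2)] += 1             # co-occurrence in same region
        if abs(p1 - p2) <= MAX_DIST:
            close_counts[(m1, m2)] += 1        # also within the cutoff

print(pair_counts.most_common(3))              # most frequent motif pairs
print(close_counts)                            # pairs at preferred distances
```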

  19. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching

    PubMed Central

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs, from which corresponding vertex pairs are detected. The map transformation is then performed with the vertex pairs. Since these pairs are detected independently for each corresponding cell-set pair, the method maintains its matching performance regardless of locally uneven positional discrepancies between the datasets. The proposed method was applied to complicated synthetic cell datasets treated as a cadastral map and a topographical map, and showed an improved result, with an F-measure of 0.84, compared to a previous matching method with an F-measure of 0.48. PMID:27348229
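
    For reference, the F-measure quoted above is the harmonic mean of precision and recall over the detected vertex-pair matches; a minimal sketch with hypothetical match counts:

      def f_measure(tp: int, fp: int, fn: int) -> float:
          """Harmonic mean of precision and recall for detected matches."""
          precision = tp / (tp + fp)
          recall = tp / (tp + fn)
          return 2 * precision * recall / (precision + recall)

      # Hypothetical counts: 84 correct, 16 spurious and 16 missed matches
      # reproduce an F-measure of 0.84.
      print(round(f_measure(84, 16, 16), 2))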

  20. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching.

    PubMed

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs, from which corresponding vertex pairs are detected. The map transformation is then performed with the vertex pairs. Since these pairs are detected independently for each corresponding cell-set pair, the method maintains its matching performance regardless of locally uneven positional discrepancies between the datasets. The proposed method was applied to complicated synthetic cell datasets treated as a cadastral map and a topographical map, and showed an improved result, with an F-measure of 0.84, compared to a previous matching method with an F-measure of 0.48. PMID:27348229

  1. Evaluation of anomalies in GLDAS-1996 dataset.

    PubMed

    Zhou, Xinyao; Zhang, Yongqiang; Yang, Yonghui; Yang, Yanmin; Han, Shumin

    2013-01-01

    Global Land Data Assimilation System (GLDAS) data are widely used for land-surface flux simulations. The simulation accuracy achieved with the GLDAS dataset is therefore largely contingent upon the accuracy of the dataset itself. Runoff simulated by the GLDAS land-surface models exhibits strong anomalies for 1996. These anomalies are investigated by evaluating four GLDAS meteorological forcing variables (precipitation, air temperature, downward shortwave radiation and downward longwave radiation) in six large basins across the world (Danube, Mississippi, Yangtze, Congo, Amazon and Murray-Darling basins). Precipitation data from the Global Precipitation Climatology Centre (GPCC) are also compared with the GLDAS forcing precipitation data. Large errors and a lack of monthly variability in the GLDAS-1996 precipitation data are the main sources of the anomalies in the simulated runoff. The impact of the precipitation data on simulated runoff for 1996 is investigated with the Community Atmosphere Biosphere Land Exchange (CABLE) land-surface model in the Yangtze basin, for which high-quality local precipitation data are available from the China Meteorological Administration (CMA). The CABLE model is driven by GLDAS daily precipitation data and by CMA daily precipitation data in turn. The daily and monthly runoffs simulated from the CMA data are noticeably better than those obtained from the GLDAS data, suggesting that the GLDAS-1996 precipitation data are not reliable enough for land-surface flux simulations. PMID:23579825
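
    The anomaly described here manifests as a collapse of within-year monthly variability; the check below sketches how such a year can be flagged against a reference dataset (a Python toy on synthetic monthly series, not the study's analysis; all values are hypothetical):

      import numpy as np

      rng = np.random.default_rng(0)
      years = (1995, 1996, 1997)
      gldas = rng.gamma(2.0, 40.0, (3, 12))   # monthly precipitation, mm
      gpcc = rng.gamma(2.0, 40.0, (3, 12))    # reference dataset
      gldas[1] = gldas[1].mean()              # 1996: monthly variability lost

      # Flag years whose monthly standard deviation collapses relative to the
      # reference, the symptom reported for the GLDAS-1996 forcing.
      for y, g, r in zip(years, gldas.std(axis=1), gpcc.std(axis=1)):
          flag = " <- anomalous" if g < 0.2 * r else ""
          print(y, round(float(g), 1), round(float(r), 1), flag)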

  2. Lifting Object Detection Datasets into 3D.

    PubMed

    Carreira, Joao; Vicente, Sara; Agapito, Lourdes; Batista, Jorge

    2016-07-01

    While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat, and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions, such as 3D scanning or manual design, that scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from: (i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate camera viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest in joint object recognition and 3D reconstruction from a single image. PMID:27295458

  3. Land cover trends dataset, 1973-2000

    USGS Publications Warehouse

    Soulard, Christopher E.; Acevedo, William; Auch, Roger F.; Sohl, Terry L.; Drummond, Mark A.; Sleeter, Benjamin M.; Sorenson, Daniel G.; Kambly, Steven; Wilson, Tamara S.; Taylor, Janis L.; Sayler, Kristi L.; Stier, Michael P.; Barnes, Christopher A.; Methven, Steven C.; Loveland, Thomas R.; Headley, Rachel; Brooks, Mark S.

    2014-01-01

    The U.S. Geological Survey Land Cover Trends Project is releasing a 1973–2000 time-series land-use/land-cover dataset for the conterminous United States. The dataset contains 5 dates of land-use/land-cover data for 2,688 sample blocks randomly selected within 84 ecological regions. The nominal dates of the land-use/land-cover maps are 1973, 1980, 1986, 1992, and 2000. The land-use/land-cover maps were classified manually from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery using a modified Anderson Level I classification scheme. The resulting land-use/land-cover data have a 60-meter resolution, and the projection is set to Albers Equal-Area Conic, North American Datum of 1983. The files are labeled using a standard file naming convention that contains the number of the ecoregion, sample block, and Landsat year. The downloadable files are organized by ecoregion and are available in the ERDAS IMAGINE (.img) raster file format.

  4. Evaluation of anomalies in GLDAS-1996 dataset.

    PubMed

    Zhou, Xinyao; Zhang, Yongqiang; Yang, Yonghui; Yang, Yanmin; Han, Shumin

    2013-01-01

    Global Land Data Assimilation System (GLDAS) data are widely used for land-surface flux simulations. The simulation accuracy achieved with the GLDAS dataset is therefore largely contingent upon the accuracy of the dataset itself. Runoff simulated by the GLDAS land-surface models exhibits strong anomalies for 1996. These anomalies are investigated by evaluating four GLDAS meteorological forcing variables (precipitation, air temperature, downward shortwave radiation and downward longwave radiation) in six large basins across the world (Danube, Mississippi, Yangtze, Congo, Amazon and Murray-Darling basins). Precipitation data from the Global Precipitation Climatology Centre (GPCC) are also compared with the GLDAS forcing precipitation data. Large errors and a lack of monthly variability in the GLDAS-1996 precipitation data are the main sources of the anomalies in the simulated runoff. The impact of the precipitation data on simulated runoff for 1996 is investigated with the Community Atmosphere Biosphere Land Exchange (CABLE) land-surface model in the Yangtze basin, for which high-quality local precipitation data are available from the China Meteorological Administration (CMA). The CABLE model is driven by GLDAS daily precipitation data and by CMA daily precipitation data in turn. The daily and monthly runoffs simulated from the CMA data are noticeably better than those obtained from the GLDAS data, suggesting that the GLDAS-1996 precipitation data are not reliable enough for land-surface flux simulations.

  5. Independent Peer Reviews

    SciTech Connect

    2012-03-16

    Independent Assessments: DOE's Systems Integrator convenes independent technical reviews to gauge progress toward meeting specific technical targets and to provide technical information necessary for key decisions.

  6. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E.coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298
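
    As a reading aid, the percentages above are relative differences between the two services; a minimal sketch of the computation (the input figures are hypothetical, not the paper's measurements):

      def percent_more(emr: float, gce: float) -> float:
          """How much larger the EMR value is than the GCE value, in percent."""
          return 100.0 * (emr - gce) / gce

      # Hypothetical costs (USD) for one E.coli assembly run.
      print(round(percent_more(25.0, 7.0), 1))  # EMR ~257.1% more expensive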

  7. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E.coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  8. Multiresolution comparison of precipitation datasets for large-scale models

    NASA Astrophysics Data System (ADS)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving the large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparing gridded precipitation products against each other, along with ground observations, provides another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin-plate spline smoothing algorithm (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  9. A comparison of clustering methods for biogeography with fossil datasets

    PubMed Central

    2016-01-01

    Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to the non-Euclidean k-means and NERC methods. Based on this analysis, the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and neighbor-joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place. PMID:26966658
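
    The objective NERC maximizes, average within-cluster similarity, can be sketched with a greedy reassignment loop (a Python toy on a random similarity matrix standing in for a non-Euclidean index such as Jaccard; this is not the published R implementation):

      import numpy as np

      def mean_within_similarity(sim: np.ndarray, labels: np.ndarray) -> float:
          """Average pairwise similarity within clusters (diagonal excluded)."""
          total, count = 0.0, 0
          for c in np.unique(labels):
              idx = np.where(labels == c)[0]
              if len(idx) > 1:
                  block = sim[np.ix_(idx, idx)]
                  total += block.sum() - np.trace(block)
                  count += len(idx) * (len(idx) - 1)
          return total / count if count else 0.0

      rng = np.random.default_rng(1)
      sim = rng.random((10, 10))
      sim = (sim + sim.T) / 2
      np.fill_diagonal(sim, 1.0)

      # Greedily move points between two clusters while the objective improves.
      labels = rng.integers(0, 2, 10)
      improved = True
      while improved:
          improved = False
          for i in range(10):
              for c in (0, 1):
                  trial = labels.copy()
                  trial[i] = c
                  if mean_within_similarity(sim, trial) > mean_within_similarity(sim, labels):
                      labels, improved = trial, True
      print(labels)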

  10. Publicly Releasing a Large Simulation Dataset with NDS Labs

    NASA Astrophysics Data System (ADS)

    Goldbaum, Nathan

    2016-03-01

    Optimally, all publicly funded research should be accompanied by the tools, code, and data necessary to fully reproduce the analysis performed in the journal articles describing the research. This ideal can be difficult to attain, particularly when dealing with large (>10 TB) simulation datasets. In this lightning talk, we describe the process of publicly releasing a large simulation dataset to accompany the submission of a journal article. The simulation was performed using Enzo, an open-source, community-developed N-body/hydrodynamics code, and was analyzed using a wide range of community-developed tools in the scientific Python ecosystem. Having performed and analyzed the simulation with an ecosystem of sustainably developed tools, we further enable sustainable science by making the data itself publicly available. Combining the data release with the NDS Labs infrastructure adds substantial value, including web-based access to analysis and visualization using the yt analysis package through an IPython notebook interface. In addition, we are able to accompany the paper submission to the arXiv preprint server with links to the raw simulation data as well as interactive real-time data visualizations that readers can explore on their own or share with colleagues during journal club discussions. It is our hope that the value added by these services will substantially increase the impact and readership of the paper.

  11. Benchmarking Undedicated Cloud Computing Providers for Analysis of Genomic Datasets

    PubMed Central

    Yazar, Seyhan; Gooden, George E. C.; Mackey, David A.; Hewitt, Alex W.

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5–78.2) for E.coli and 53.5% (95% CI: 34.4–72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5–303.1) and 173.9% (95% CI: 134.6–213.1) more expensive for the E.coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298

  12. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    PubMed

    Vardell, Emily

    2016-01-01

    SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis. PMID:27391182

  13. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    PubMed

    Vardell, Emily

    2016-01-01

    SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis.

  14. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered as key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  15. National hydrography dataset--linear referencing

    USGS Publications Warehouse

    Simley, Jeffrey; Doumbouya, Ariel

    2012-01-01

    Geospatial data normally have a certain set of standard attributes, such as an identification number, the type of feature, and the name of the feature. These standard attributes are typically embedded into the default attribute table, which is directly linked to the geospatial features. However, it is impractical to embed too much information, because doing so can create a complex, inflexible, and hard-to-maintain geospatial dataset. Many scientists prefer to create a modular, or relational, data design where the information about the features is stored and maintained separately, then linked to the geospatial data. For example, information about the water chemistry of a lake can be maintained in a separate file and linked to the lake. A Geographic Information System (GIS) can then relate the water chemistry to the lake and analyze it as one piece of information. For example, the GIS can select all lakes larger than 50 acres with turbidity greater than 1.5 milligrams per liter.
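
    A minimal sketch of the relational design and the closing query, using an in-memory SQLite database (table names and values are hypothetical):

      import sqlite3

      # Lake attributes in one table, water chemistry maintained separately,
      # related at query time: the modular design described above.
      db = sqlite3.connect(":memory:")
      db.executescript("""
          CREATE TABLE lakes (lake_id INTEGER PRIMARY KEY, name TEXT, acres REAL);
          CREATE TABLE chemistry (lake_id INTEGER, turbidity_mg_l REAL);
          INSERT INTO lakes VALUES (1, 'Clear Lake', 120.0), (2, 'Mud Pond', 35.0),
                                   (3, 'Long Lake', 80.0);
          INSERT INTO chemistry VALUES (1, 0.4), (2, 2.9), (3, 1.8);
      """)

      # The selection described in the text: lakes over 50 acres with
      # turbidity greater than 1.5 milligrams per liter.
      rows = db.execute("""
          SELECT l.name, l.acres, c.turbidity_mg_l
          FROM lakes l JOIN chemistry c USING (lake_id)
          WHERE l.acres > 50 AND c.turbidity_mg_l > 1.5
      """).fetchall()
      print(rows)  # [('Long Lake', 80.0, 1.8)]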

  16. Internationally coordinated glacier monitoring: strategy and datasets

    NASA Astrophysics Data System (ADS)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass-balance reporting contains information for about 125 glaciers, with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events, including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc., related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast-evolving datasets from new technologies.

  17. VAST Contest Dataset Use in Education

    SciTech Connect

    Whiting, Mark A.; North, Chris; Endert, Alexander; Scholtz, Jean; Haack, Jereme N.; Varley, Caroline F.; Thomas, James J.

    2009-12-13

    The IEEE Visual Analytics Science and Technology (VAST) Symposium has held a contest each year since its inception in 2006. These events are designed to provide visual analytics researchers and developers with analytic challenges similar to those encountered by professional information analysts. The VAST contest has had an extended life outside of the symposium, however, as materials are being used in universities and other educational settings, either to help teachers of visual analytics-related classes or for student projects. We describe how we develop the VAST contest datasets, a process that results in products that can be used in different settings, and review some specific examples of the adoption of the VAST contest materials in the classroom. The examples are drawn from graduate and undergraduate courses at Virginia Tech and from the Visual Analytics "Summer Camp" run by the National Visualization and Analytics Center in 2008. We finish with a brief discussion of evaluation metrics for education.

  18. LIMS Version 6 Level 3 Dataset

    NASA Technical Reports Server (NTRS)

    Remsberg, Ellis E.; Lingenfelser, Gretchen

    2010-01-01

    This report describes the Limb Infrared Monitor of the Stratosphere (LIMS) Version 6 (V6) Level 3 data products and the assumptions used for their generation. A sequential estimation algorithm was used to obtain daily, zonal Fourier coefficients of the several parameters of the LIMS dataset for 216 days of 1978-79. The coefficients are available at up to 28 pressure levels, at every two degrees of latitude from 64 S to 84 N, and at the synoptic time of 12 UT. Example plots were prepared and archived from the data at 10 hPa for January 1, 1979, to illustrate the overall coherence of the features obtained with the LIMS-retrieved parameters.
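
    A zonal Fourier representation of this kind can be evaluated at any longitude from its daily coefficients; a minimal sketch (the coefficient values are hypothetical, not actual LIMS data):

      import numpy as np

      def zonal_value(a0, coeffs, lon_deg):
          """Evaluate a0 + sum_k [a_k cos(k lon) + b_k sin(k lon)] at one longitude."""
          lon = np.radians(lon_deg)
          return a0 + sum(a * np.cos(k * lon) + b * np.sin(k * lon)
                          for k, (a, b) in enumerate(coeffs, start=1))

      # Hypothetical wave-1 and wave-2 coefficients at one latitude and pressure level.
      print(zonal_value(3.1, [(0.4, -0.1), (0.05, 0.02)], lon_deg=45.0))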

  19. Visualization of cosmological particle-based datasets.

    PubMed

    Navratil, Paul; Johnson, Jarrett; Bromm, Volker

    2007-01-01

    We describe our visualization process for a particle-based simulation of the formation of the first stars and their impact on cosmic history. The dataset consists of several hundred time-steps of point simulation data, with each time-step containing approximately two million point particles. For each time-step, we interpolate the point data onto a regular grid using a method taken from the radiance estimate of photon mapping. We import the resulting regular grid representation into ParaView, with which we extract isosurfaces across multiple variables. Our images provide insights into the evolution of the early universe, tracing the cosmic transition from an initially homogeneous state to one of increasing complexity. Specifically, our visualizations capture the build-up of regions of ionized gas around the first stars, their evolution, and their complex interactions with the surrounding matter. These observations will guide the upcoming James Webb Space Telescope, the key astronomy mission of the next decade.
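
    The gridding step can be sketched as a k-nearest-neighbour gather in the spirit of the photon-mapping radiance estimate mentioned above (a Python toy on random particles; the grid size, k and normalization are assumptions, not the authors' parameters):

      import numpy as np
      from scipy.spatial import cKDTree

      rng = np.random.default_rng(2)
      pts = rng.random((2_000, 3))   # particle positions in the unit cube
      vals = rng.random(2_000)       # e.g. a per-particle density estimate

      tree = cKDTree(pts)
      grid = np.stack(np.meshgrid(*[np.linspace(0, 1, 16)] * 3), axis=-1).reshape(-1, 3)

      # Gather the k nearest particles at each grid node and normalize by the
      # cross-section of the gathering sphere, as in a radiance estimate.
      k = 32
      dist, idx = tree.query(grid, k=k)
      r = dist[:, -1]
      gridded = vals[idx].sum(axis=1) / (np.pi * r**2)
      print(gridded.shape)  # (4096,)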

  20. Asteroids in the EXPLORE II Dataset

    NASA Astrophysics Data System (ADS)

    Schmoll, S.; Mallen-Ornelas, G.; Holman, M.

    2005-12-01

    The inner asteroid belt holds information about the solar system's history and future. The currently accepted theory of planet formation is that smaller rocky bodies collided and formed the planets of the inner solar system, and asteroids are relics of this past. Furthermore, near-Earth objects that could potentially collide with us usually originate in the main belt. Determining the size distribution of the main-belt asteroids is key to unlocking the processes of planet formation and possible problems with near-Earth objects. Here the EXtra Solar PLanet Occultation (EXPLORE) II data, taken with the CFH12K mosaic CCD prime-focus camera on the CFHT 3.6-m telescope, are used to find the size distribution of main-belt asteroids. The EXPLORE Project is an extrasolar planet detection survey that focuses on one patch of the sky per observing run. The resultant data have more observations per asteroid than any preceding deep asteroid search. Here a pipeline is presented to find the asteroids in this dataset, along with the other four EXPLORE datasets. This is done by processing the data with an image subtraction package called ISIS (Alard et al. 1997) and custom masking using IRAF. Asteroids are found using SExtractor (Bertin et al. 1996) and a set of custom C programs that detect moving objects in a series of images. Light curves are then created for each asteroid found. Sizes can be estimated based on the absolute magnitudes of the asteroids. We present absolute magnitudes and a preliminary size distribution for the >52 asteroids found thus far. This research was made possible by the NSF and SAO REU Program.
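
    The size estimate mentioned at the end rests on the standard relation between diameter, absolute magnitude H and an assumed geometric albedo; a minimal sketch (the albedo value is an assumption typical of inner-belt asteroids):

      import math

      def diameter_km(abs_mag_h: float, albedo: float = 0.15) -> float:
          """Standard asteroid size estimate: D = (1329 km / sqrt(p_V)) * 10**(-H/5)."""
          return 1329.0 / math.sqrt(albedo) * 10 ** (-abs_mag_h / 5.0)

      print(round(diameter_km(16.0), 2))  # roughly a 2 km main-belt object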

  1. Analysis Summary of an Assembled Western U.S. Dataset

    SciTech Connect

    Ryall, F

    2005-03-22

    The dataset for this report is described in Walter et al. (2004) and consists primarily of Nevada Test Site (NTS) explosions, hole collapses and earthquakes. In addition, there were several earthquakes in California and Utah; earthquakes recorded near Cataract Creek, Arizona; mine blasts at two areas in Arizona; and two mine collapses in Wyoming. In the vicinity of NTS there were mainshock/aftershock sequences at Little Skull Mountain, Scotty's Junction and Hector Mine. All the events were shallow, and distances ranged from about 0.1 degree to regional distances. All of the data for these events were carefully reviewed and analyzed. In the following sections of the report, we describe the analysis procedures, problems with the data and results of the analysis.

  2. An Alternative Measure of Solar Activity from Detailed Sunspot Datasets

    NASA Astrophysics Data System (ADS)

    Muraközy, J.; Baranyi, T.; Ludmány, A.

    2016-05-01

    The sunspot number is analyzed by using detailed sunspot data, including aspects of observability, sunspot sizes, and proper identification of sunspot groups as discrete entities of solar activity. The tests show that in addition to the subjective factors there are also objective causes of the ambiguities in the series of sunspot numbers. To introduce an alternative solar-activity measure, the physical meaning of the sunspot number has to be reconsidered. It contains two components whose numbers are governed by different physical mechanisms and this is one source of the ambiguity. This article suggests an activity index, which is the amount of emerged magnetic flux. The only long-term proxy measure is the detailed sunspot-area dataset with proper calibration to the magnetic flux. The Debrecen sunspot databases provide an appropriate source for the establishment of the suggested activity index.
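
    The two-component structure referred to here is explicit in the classical relative sunspot number; a minimal sketch (the observer factor k is assumed to be 1, and the counts are hypothetical):

      def wolf_number(groups: int, spots: int, k: float = 1.0) -> float:
          """Relative (Wolf) sunspot number R = k * (10 g + s): the group count g
          and spot count s are governed by different physical mechanisms, which
          is one source of the ambiguity discussed in the article."""
          return k * (10 * groups + spots)

      print(wolf_number(4, 23))  # 63.0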

  3. Non-local gravity and comparison with observational datasets

    SciTech Connect

    Dirian, Yves; Foffa, Stefano; Kunz, Martin; Maggiore, Michele; Pettorino, Valeria

    2015-04-01

    We study the cosmological predictions of two recently proposed non-local modifications of General Relativity. Both models have the same number of parameters as ΛCDM, with a mass parameter m replacing the cosmological constant. We implement the cosmological perturbations of the non-local models into a modification of the CLASS Boltzmann code, and we make a full comparison to CMB, BAO and supernova data. We find that the non-local models fit these datasets very well, at the same level as ΛCDM. Among the vast literature on modified gravity models, this is, to our knowledge, the only example which fits the data as well as ΛCDM without requiring any additional parameter. For both non-local models, parameter estimation using Planck+JLA+BAO data gives a value of H₀ slightly higher than in ΛCDM.

  4. EVALUATION OF LAND USE/LAND COVER DATASETS FOR URBAN WATERSHED MODELING

    SciTech Connect

    S.J. BURIAN; M.J. BROWN; T.N. MCPHERSON

    2001-08-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus using a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size.
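
    The "simple equation" approach described above can be sketched in a few lines: annual runoff as runoff coefficient times rainfall depth times area, with coefficients keyed to LULC class (all coefficient and area values below are hypothetical):

      RUNOFF_COEFF = {"residential": 0.40, "commercial": 0.75, "open": 0.15}

      def annual_runoff_m3(lulc_areas_m2: dict, rainfall_m: float) -> float:
          """Sum of C * A * P over LULC classes: coefficient, area, rainfall depth."""
          return sum(RUNOFF_COEFF[lulc] * area * rainfall_m
                     for lulc, area in lulc_areas_m2.items())

      catchment = {"residential": 2.0e6, "commercial": 5.0e5, "open": 1.5e6}
      print(annual_runoff_m3(catchment, rainfall_m=0.38))  # cubic meters per year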

  5. Evaluation of land use/land cover datasets for urban watershed modeling

    SciTech Connect

    Burian, S. J.; Brown, M. J.; McPherson, T. N.

    2001-01-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus using a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size.

  6. Dry spell characteristics over India based on IMD and APHRODITE datasets

    NASA Astrophysics Data System (ADS)

    Sushama, L.; Ben Said, S.; Khaliq, M. N.; Nagesh Kumar, D.; Laprise, R.

    2014-12-01

    Selected characteristics of dry spells and associated trends over India during the 1951-2007 period are studied using two gridded datasets: the Indian Meteorological Department (IMD) and the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of the water resources (APHRODITE) datasets. Two precipitation thresholds, 1 and 3 mm, are used to define a dry day (and therefore dry spells) in this study. The spatial patterns of the dry-spell characteristics (mean number of dry days, mean number of dry spells, mean and maximum duration of dry spells) for the annual and summer monsoon periods obtained with the two datasets agree overall, except for the northernmost part of India. The number of dry days obtained with APHRODITE is larger for this region than with IMD, which is consistent with the smaller precipitation for the region in APHRODITE. These differences are also visible in the spatial patterns of mean and maximum dry-spell durations. Analysis of the field significance associated with trends, at the level of 34 predefined meteorological subdivisions over the mainland, suggests better agreement between the two datasets on the positive trends in the number of dry days for the annual and summer monsoon periods, for both thresholds. Important differences between the two datasets are noted in the field significance associated with the negative trends. While negative trends in the annual maximum duration of dry spells appear field significant for the desert regions according to both datasets, they are found field significant for two regions (Punjab and South Interior Karnataka) for the monsoon period in both datasets. This study, in addition to providing information on the spatial and temporal patterns associated with dry-spell characteristics, also allows identification of the regions and characteristics on which the two datasets agree or disagree.
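
    The basic building block of such an analysis, extracting dry-spell lengths from a daily precipitation series given a dry-day threshold, can be sketched as follows (a minimal Python sketch; the series is hypothetical):

      import numpy as np

      def dry_spell_lengths(precip_mm: np.ndarray, threshold: float = 1.0) -> list:
          """Lengths of consecutive runs of dry days, a dry day having
          precipitation below the threshold (1 mm or 3 mm in the study)."""
          lengths, run = [], 0
          for is_dry in precip_mm < threshold:
              if is_dry:
                  run += 1
              elif run:
                  lengths.append(run)
                  run = 0
          if run:
              lengths.append(run)
          return lengths

      print(dry_spell_lengths(np.array([0.0, 0.2, 5.0, 0.0, 0.0, 0.0, 2.1])))  # [2, 3]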

  7. Evaluation of land use/land cover datasets for urban watershed modeling.

    PubMed

    Burian, S J; Brown, M J; McPherson, T N

    2002-01-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus using a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size. PMID:12079113

  8. Independent EEG Sources Are Dipolar

    PubMed Central

    Delorme, Arnaud; Palmer, Jason; Onton, Julie; Oostenveld, Robert; Makeig, Scott

    2012-01-01

    Independent component analysis (ICA) and blind source separation (BSS) methods are increasingly used to separate individual brain and non-brain source signals mixed by volume conduction in electroencephalographic (EEG) and other electrophysiological recordings. We compared results of decomposing thirteen 71-channel human scalp EEG datasets by 22 ICA and BSS algorithms, assessing the pairwise mutual information (PMI) in scalp channel pairs, the remaining PMI in component pairs, the overall mutual information reduction (MIR) effected by each decomposition, and decomposition ‘dipolarity’ defined as the number of component scalp maps matching the projection of a single equivalent dipole with less than a given residual variance. The least well-performing algorithm was principal component analysis (PCA); best performing were AMICA and other likelihood/mutual information based ICA methods. Though these and other commonly-used decomposition methods returned many similar components, across 18 ICA/BSS algorithms mean dipolarity varied linearly with both MIR and with PMI remaining between the resulting component time courses, a result compatible with an interpretation of many maximally independent EEG components as being volume-conducted projections of partially-synchronous local cortical field activity within single compact cortical domains. To encourage further method comparisons, the data and software used to prepare the results have been made available (http://sccn.ucsd.edu/wiki/BSSComparison). PMID:22355308
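
    A rough sketch of the pairwise-mutual-information statistic used in such comparisons, with synthetic signals standing in for EEG (a histogram-based toy, not the authors' estimator):

      import numpy as np
      from sklearn.metrics import mutual_info_score

      def mean_pairwise_mi(data: np.ndarray, bins: int = 16) -> float:
          """Mean mutual information over all channel pairs after binning."""
          digitized = np.array([np.digitize(x, np.histogram_bin_edges(x, bins))
                                for x in data])
          n_ch = len(digitized)
          mis = [mutual_info_score(digitized[i], digitized[j])
                 for i in range(n_ch) for j in range(i + 1, n_ch)]
          return float(np.mean(mis))

      # Mixing independent sources raises pairwise MI; unmixing should lower it.
      rng = np.random.default_rng(3)
      sources = rng.laplace(size=(4, 5_000))
      mixed = rng.random((4, 4)) @ sources
      print(mean_pairwise_mi(mixed), mean_pairwise_mi(sources))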

  9. Reconstructing thawing quintessence with multiple datasets

    NASA Astrophysics Data System (ADS)

    Lima, Nelson A.; Liddle, Andrew R.; Sahlén, Martin; Parkinson, David

    2016-03-01

    In this work we model the quintessence potential in a Taylor series expansion, up to second order, around the present-day value of the scalar field. The field is evolved in a thawing regime assuming zero initial velocity. We use the latest data from the Planck satellite, baryonic acoustic oscillation observations from the Sloan Digital Sky Survey, and supernova luminosity distance information from Union2.1 to constrain our model's parameters, and also include perturbation growth data from the WiggleZ, BOSS, and 6dF surveys. The supernova data provide the strongest individual constraint on the potential parameters. We show that the growth data are competitive with the other datasets in constraining the dark energy parameters we introduce. We also conclude that the combined constraints we obtain for our model parameters, when compared to previous works of nearly a decade ago, show only modest improvement, even with new growth-of-structure data added to previously existing types of data.
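
    Written out, the expansion described above takes the form (a LaTeX restatement of the stated second-order Taylor form, with \phi_0 the present-day field value):

      V(\phi) \simeq V(\phi_0) + V'(\phi_0)\,(\phi - \phi_0)
               + \tfrac{1}{2}\,V''(\phi_0)\,(\phi - \phi_0)^2,

    so the potential is specified by the three coefficients V(\phi_0), V'(\phi_0) and V''(\phi_0), with the thawing initial condition \dot{\phi} = 0.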

  10. Classification of antimicrobial peptides with imbalanced datasets

    NASA Astrophysics Data System (ADS)

    Camacho, Francy L.; Torres, Rodrigo; Ramos Pollán, Raúl

    2015-12-01

    In recent years, pattern recognition has been applied to several fields to solve problems in science and technology, for example in protein prediction. This methodology can be useful for predicting the activity of biological molecules, e.g. for determining the antimicrobial activity of synthetic and natural peptides. In this work, we evaluate the performance of different physico-chemical properties of peptides (descriptor groups) in the presence of imbalanced datasets, when facing the task of detecting whether a peptide has antimicrobial activity. We evaluate undersampling and class-weighting techniques to deal with the class imbalance, using different classification methods and descriptor groups. Our classification model showed an estimated precision of 96%, showing that the descriptors used to encode the amino acid sequences contain enough information to correlate the peptide sequences with their antimicrobial activity by means of machine learning. Moreover, we show that certain descriptor groups (pseudo-amino acid composition type I) work better with imbalanced datasets while others (dipeptide composition) work better with balanced ones.
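
    Class weighting, one of the two imbalance strategies evaluated, can be sketched with scikit-learn (synthetic stand-in descriptors; the model and weights are illustrative, not the study's configuration):

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.utils.class_weight import compute_class_weight

      rng = np.random.default_rng(4)
      X = rng.normal(size=(500, 20))            # stand-in peptide descriptors
      y = (rng.random(500) < 0.1).astype(int)   # ~10% positives: imbalanced

      # Weight the rare (antimicrobial) class up instead of discarding data.
      weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
      clf = LogisticRegression(class_weight=dict(enumerate(weights)))
      clf.fit(X, y)
      print(dict(enumerate(np.round(weights, 2))))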

  11. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    SciTech Connect

    Lopez Torres, E.; Fiorina, E.; Pennazio, F.; Peroni, C.; Saletta, M.; Cerello, P.; Camarlinghi, N.; Fantacci, M. E.

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on a voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization given the large difference in size between the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with sensitivities of about 70% and 80%, respectively, at eight false-positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground-glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as could an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  12. ASSESSING THE ACCURACY OF NATIONAL LAND COVER DATASET AREA ESTIMATES AT MULTIPLE SPATIAL EXTENTS

    EPA Science Inventory

    Site specific accuracy assessments provide fine-scale evaluation of the thematic accuracy of land use/land cover (LULC) datasets; however, they provide little insight into LULC accuracy across varying spatial extents. Additionally, LULC data are typically used to describe lands...

  13. Validity and Reliability of Stillbirth Data Using Linked Self-Reported and Administrative Datasets

    PubMed Central

    Hure, Alexis J.; Chojenta, Catherine L.; Powers, Jennifer R.; Byles, Julie E.; Loxton, Deborah

    2015-01-01

    Background A high rate of stillbirth was previously observed in the Australian Longitudinal Study of Women’s Health (ALSWH). Our primary objective was to test the validity and reliability of self-reported stillbirth data linked to state-based administrative datasets. Methods Self-reported data, collected as part of the ALSWH cohort born in 1973–1978, were linked to three administrative datasets for women in New South Wales, Australia (n = 4374): the Midwives Data Collection; Admitted Patient Data Collection; and Perinatal Death Review Database. Linkages were obtained from the Centre for Health Record Linkage for the period 1996–2009. True cases of stillbirth were defined by being consistently recorded in two or more independent data sources. Sensitivity, specificity, positive predictive value, negative predictive value, percent agreement, and kappa statistics were calculated for each dataset. Results Forty-nine women reported 53 stillbirths. No dataset was 100% accurate. The administrative datasets performed better than self-reported data, with high accuracy and agreement. Self-reported data showed high sensitivity (100%) but low specificity (30%), meaning women who had a stillbirth always reported it, but there was also over-reporting of stillbirths. About half of the misreported cases in the ALSWH could be removed by identifying inconsistencies in longitudinal data. Conclusions Data linkage provides a great opportunity to assess the validity and reliability of self-reported study data. Conversely, self-reported study data can help to resolve inconsistencies in administrative datasets. Quantifying the strengths and limitations of both self-reported and administrative data can improve epidemiological research, especially by guiding methods and interpretation of findings. PMID:25367675
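
    The agreement statistics reported here derive from a 2x2 table against the reference standard; a minimal sketch (the counts below are hypothetical, chosen only to mimic the perfect-sensitivity, low-specificity pattern described):

      def validity_stats(tp: int, fp: int, fn: int, tn: int) -> dict:
          """Validity and agreement of a self-report against a reference standard."""
          n = tp + fp + fn + tn
          po = (tp + tn) / n                                            # observed agreement
          pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2   # chance agreement
          return {
              "sensitivity": tp / (tp + fn),
              "specificity": tn / (tn + fp),
              "ppv": tp / (tp + fp),
              "npv": tn / (tn + fn),
              "kappa": (po - pe) / (1 - pe),
          }

      print(validity_stats(tp=40, fp=14, fn=0, tn=6))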

  14. Provenance Challenges for Earth Science Dataset Publication

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2011-01-01

    Modern science is increasingly dependent on computational analysis of very large data sets. Organizing, referencing, and publishing those data has become a complex problem. Published research that depends on such data often fails to cite the data in sufficient detail to allow an independent scientist to reproduce the original experiments and analyses. This paper explores some of the challenges related to data identification, equivalence and reproducibility in the domain of data-intensive scientific processing. It will use the example of Earth Science satellite data, but the challenges also apply to other domains.

  15. Synthesizing plant phenological indicators from multispecies datasets

    NASA Astrophysics Data System (ADS)

    Rutishauser, This; Peñuelas, Josep; Filella, Iolanda; Gehrig, Regula; Scherrer, Simon C.; Röthlisberger, Christian

    2014-05-01

    Changes in the seasonality of plant life cycles derived from phenological observations are traditionally analysed at the species level. Trends and correlations with the main environmental driving variables show a coherent picture across the globe. The question arises whether there is an integrated phenological signal across species that describes common interannual variability. Is there a way to express synthetic phenological indicators from multispecies datasets that serve decision makers as useful tools? Can these indicators be derived in such a robust way that systematic updates yield the information necessary for adaptation measures? We address these questions by analysing multi-species phenological datasets with leaf-unfolding and flowering observations from 30 sites across Europe between 40° and 63°N, including data from PEP725, the Swiss Plant Phenological Observation Network and one legacy dataset. Starting in 1951, the datasets were synthesized by multivariate analysis (Principal Component Analysis). The representativeness of the site-specific indicator was tested against subsets including only leaf-unfolding or flowering phases, and by a comparison with a 50% random sample of the available phenophases for 500 time steps. Results show that a synthetic indicator explains up to 79% of the variance at each site, usually 40-50% or more. Robust linear trends over the common period 1971-2000 indicate an overall change of the indicator of -0.32 days/year, with lower uncertainty than previous studies. Advances were more pronounced in southern and northern Europe. The indicator-based analysis provides a promising tool for synthesizing site-based plant phenological records and is a companion to, and validating data for, an increasing number of phenological measurements derived from phenological models and satellite sensors.
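
    The synthesis step, standardizing many phases at one site and taking the leading principal component as the indicator, can be sketched as follows (synthetic data with a shared interannual signal; not the study's observations):

      import numpy as np

      rng = np.random.default_rng(5)
      years, phases = 40, 12
      common = rng.normal(size=years)                 # shared interannual signal
      data = (common[:, None] * rng.uniform(0.5, 1.5, phases)
              + rng.normal(scale=0.6, size=(years, phases)))

      z = (data - data.mean(0)) / data.std(0)         # standardize each phase
      _, s, vt = np.linalg.svd(z, full_matrices=False)
      indicator = z @ vt[0]                           # PC1 score per year
      explained = s[0]**2 / (s**2).sum()
      print(f"PC1 explains {explained:.0%} of the variance")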

  16. A reference GNSS tropospheric dataset over Europe.

    NASA Astrophysics Data System (ADS)

    Pacione, Rosa; Di Tomaso, Simona

    2016-04-01

    The present availability of 18 years of GNSS data belonging to the European Permanent Network (EPN, http://www.epncb.oma.be/) is a valuable database for the development of a climate data record of GNSS tropospheric products over Europe. This dataset has high potential for monitoring trends and variability in atmospheric water vapour, improving the knowledge of climatic trends of atmospheric water vapour, and being useful for global and regional NWP reanalyses as well as climate model simulations. In the framework of EPN-Repro2, a second reprocessing campaign of the EPN, five Analysis Centres have homogeneously reprocessed the EPN network for 1996-2013. Three Analysis Centres provide homogeneously reprocessed solutions for the entire network, produced with three different software packages: Bernese, GAMIT and GIPSY-OASIS. Smaller subnetworks based on Bernese 5.2 are also provided. A huge effort is made to provide solutions that are the basis for deriving new coordinates, velocities and troposphere parameters, Zenith Tropospheric Delays and Horizontal Gradients, for the entire EPN. These individual contributions are combined in order to provide the official EPN reprocessed products. A preliminary tropospheric combined solution for the period 1996-2013 has been carried out. It is based on all the available homogeneously reprocessed solutions and offers the possibility to assess each of them prior to the ongoing final combination. We will present the results of the EPN-Repro2 tropospheric combined products and how the climate community will benefit from them. Acknowledgment: The EPN-Repro2 working group is acknowledged for providing the EPN solutions used in this work. E-GEOS activity is carried out in the framework of ASI contract 2015-050-R.0.

  17. Integrating diverse datasets improves developmental enhancer prediction.

    PubMed

    Erwin, Genevieve D; Oksenberg, Nir; Truty, Rebecca M; Kostka, Dennis; Murphy, Karl K; Ahituv, Nadav; Pollard, Katherine S; Capra, John A

    2014-06-01

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable
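
    At its core, the multiple kernel learning step combines one kernel per data type with nonnegative weights; a schematic sketch (the kernels and weights are hypothetical, not EnhancerFinder's trained model):

      import numpy as np

      def combine_kernels(kernels, weights):
          """Weighted sum of per-data-type kernel matrices (e.g. sequence motifs,
          conservation, functional genomics), with weights normalized to one."""
          w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
          w /= w.sum()
          return sum(wi * K for wi, K in zip(w, kernels))

      rng = np.random.default_rng(6)
      Ks = [a @ a.T for a in (rng.random((5, 3)) for _ in range(3))]  # toy PSD kernels
      print(combine_kernels(Ks, [0.5, 0.3, 0.2]).shape)  # (5, 5)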

  18. The need for a national LIDAR dataset

    USGS Publications Warehouse

    Stoker, Jason M.; Harding, David; Parrish, Jay

    2008-01-01

    On May 21st and 22nd 2008, the U.S. Geological Survey (USGS), the National Aeronautics and Space Administration (NASA), and the Association of American State Geologists (AASG) hosted the Second National Light Detection and Ranging (Lidar) Initiative Strategy Meeting at USGS Headquarters in Reston, Virginia. The USGS is taking the lead in cooperation with many partners to design and implement a future high-resolution National Lidar Dataset. Initial work is focused on determining viability, developing requirements and specifications, establishing what types of information contained in a lidar signal are most important, and identifying key stakeholders and their respective roles. In February 2007, USGS hosted the first National Lidar Initiative Strategy Meeting at USGS Headquarters in Virginia. The presentations and a published summary report from the first meeting can be found on the Center for Lidar Information Coordination and Knowledge (CLICK) Website: http://lidar.cr.usgs.gov. The first meeting demonstrated the public need for consistent lidar data at the national scale. The goals of the second meeting were to further expand on the ideas and information developed in the first meeting, to bring more stakeholders together, to both refine and expand on the requirements and capabilities needed, and to discuss an organizational and funding approach for an initiative of this magnitude. The approximately 200 participants represented Federal, State, local, commercial and academic interests. The second meeting included a public solicitation for presentations and posters to better democratize the workshop. All of the oral presentation abstracts that were submitted were accepted, and the 25 poster submissions augmented and expanded upon the oral presentations. The presentations from this second meeting, including audio, can be found on CLICK at http://lidar.cr.usgs.gov/national_lidar_2008.php. Based on the presentations and the discussion sessions, the following

  19. Clementine: Anticipated scientific datasets from the Moon and Geographos

    NASA Technical Reports Server (NTRS)

    Mcewen, A. S.

    1993-01-01

    The Clementine spacecraft mission is designed to test the performance of new lightweight and low-power detectors developed at the Lawrence Livermore National Laboratory (LLNL) for the Strategic Defense Initiative Office (SDIO). A secondary objective of the mission is to acquire useful scientific data, principally of the Moon and the near-Earth asteroid Geographos. The spacecraft will be in an elliptical polar orbit about the Moon for about 2 months beginning in February of 1994, and it will fly by Geographos on August 31. Clementine will carry seven detectors, each weighing less than about 1 kg: two Star Trackers (wide-angle); uv/vis (wide-angle); Short Wavelength IR (SWIR); Long-Wavelength IR (LWIR); and LIDAR (Laser Image Detection And Ranging) narrow-angle imaging and ranging. Additional presentations about the mission, detectors, and related science issues are in this volume. If fully successful, Clementine will return about 3 million lunar images, a dataset with nearly as many bits of data (uncompressed) as the first cycle of Magellan, and more than 5000 images of Geographos. The complete and efficient analysis of such large data sets requires systematic processing efforts. Described below are concepts for two such efforts for the Clementine mission: global multispectral imaging of the Moon and videos of the Geographos flyby. Other anticipated datasets for which systematic processing might be desirable include multispectral observations of Earth; LIDAR altimetry of the Moon with high-resolution imaging along each ground track; high-resolution LIDAR color along each lunar ground track, which could be used to identify potential titanium-rich deposits at scales of a few meters; and thermal IR imaging along each lunar ground track (including nighttime observations near the poles).

  20. Application of Huang-Hilbert Transforms to Geophysical Datasets

    NASA Technical Reports Server (NTRS)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational net.
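
    The Hilbert step of the method is straightforward to sketch: after empirical mode decomposition (not shown here; EMD is not in SciPy, so an external implementation is assumed), each intrinsic mode function yields instantaneous amplitude and frequency via the analytic signal. A minimal sketch:

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_spectrum(imf, fs=1.0):
    """Instantaneous amplitude and frequency of one intrinsic mode function."""
    analytic = hilbert(imf)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)  # cycles per sample unit
    return amplitude, frequency
```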

  1. Framework for Interactive Parallel Dataset Analysis on the Grid

    SciTech Connect

    Alexander, David A.; Ananthan, Balamurali; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.
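
    The distribute-and-collect pattern the framework describes can be illustrated locally; the sketch below uses Python's standard library as a stand-in for Grid worker nodes and Web Services, with a placeholder analysis function.

```python
from concurrent.futures import ProcessPoolExecutor

def analyze(partition):
    """Placeholder for user-composed physics analysis applied to one dataset partition."""
    return sum(x * x for x in partition)

if __name__ == "__main__":
    # partitions stand in for dataset chunks shipped to worker nodes
    partitions = [range(i, i + 1000) for i in range(0, 10000, 1000)]
    with ProcessPoolExecutor() as pool:
        partial_results = list(pool.map(analyze, partitions))  # distribute
    total = sum(partial_results)  # collect and merge on the client
    print(total)
```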

  2. Pgu-Face: A dataset of partially covered facial images.

    PubMed

    Salari, Seyed Reza; Rostami, Habib

    2016-12-01

    In this article we introduce a human face image dataset. Images were taken in close to real-world conditions using several cameras, often mobile phone cameras. The dataset contains 224 subjects imaged under four different appearances (a nearly clean-shaven countenance, a nearly clean-shaven countenance with sunglasses, an unshaven or stubble countenance, an unshaven or stubble countenance with sunglasses) in up to two recording sessions. The partially covered face images in this dataset can be used to assess the robustness and efficiency of facial image processing algorithms. In this work we present the dataset and explain the recording method. PMID:27668275

  3. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    SciTech Connect

    Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul; Li, Yaquin; Garg, Seema; Tobin Jr, Kenneth William; Chaum, Edward

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment, DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition, and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR dataset (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.
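
    As an illustration of the kind of colour and wavelet features described (not the authors' exact feature set; the lesion-segmentation features are omitted), here is a minimal sketch:

```python
import numpy as np

def haar_energies(channel):
    """Single-level 2-D Haar detail energies as crude wavelet texture features."""
    h, w = (channel.shape[0] // 2) * 2, (channel.shape[1] // 2) * 2
    c = channel[:h, :w].astype(float)
    a, b = c[::2, ::2], c[::2, 1::2]
    d, e = c[1::2, ::2], c[1::2, 1::2]
    return [float(np.mean(x ** 2)) for x in (a - b + d - e, a + b - d - e, a - b - d + e)]

def dme_features(rgb):
    """Per-channel colour histograms plus green-channel wavelet energies."""
    hists = [np.histogram(rgb[..., ch], bins=8, range=(0, 256))[0] for ch in range(3)]
    return np.concatenate(hists + [haar_energies(rgb[..., 1])])  # exudates stand out in green
```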

  4. Identifying reproducible cancer-associated highly expressed genes with important functional significances using multiple datasets

    PubMed Central

    Huang, Haiyan; Li, Xiangyu; Guo, You; Zhang, Yuncong; Deng, Xusheng; Chen, Lufei; Zhang, Jiahui; Guo, Zheng; Ao, Lu

    2016-01-01

    Identifying differentially expressed (DE) genes between cancer and normal tissues is of basic importance for studying cancer mechanisms. However, current methods, such as the commonly used Significance Analysis of Microarrays (SAM), are biased to genes with low expression levels. Recently, we proposed an algorithm, named the pairwise difference (PD) algorithm, to identify highly expressed DE genes based on reproducibility evaluation of top-ranked expression differences between paired technical replicates of cells under two experimental conditions. In this study, we extended the application of the algorithm to the identification of DE genes between two types of tissue samples (biological replicates) based on several independent datasets or sub-datasets of a dataset, by constructing multiple paired average gene expression profiles for the two types of samples. Using multiple datasets for lung and esophageal cancers, we demonstrated that PD could identify many DE genes highly expressed in both cancer and normal tissues that tended to be missed by the commonly used SAM. These highly expressed DE genes, including many housekeeping genes, were significantly enriched in many conserved pathways, such as ribosome, proteasome, phagosome and TNF signaling pathways with important functional significances in oncogenesis. PMID:27796338
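
    The pairwise difference idea can be sketched schematically: rank genes by absolute expression difference within each paired profile, then keep genes whose top ranking reproduces across pairs. This is an illustrative simplification, not the published algorithm:

```python
import numpy as np

def top_diff_genes(profile_a, profile_b, k=100):
    """Indices of the k genes with the largest paired expression differences."""
    return set(np.argsort(np.abs(profile_a - profile_b))[-k:].tolist())

def reproducible_de_genes(profile_pairs, k=100):
    """Genes whose top-k difference ranking is shared by every pair of average profiles."""
    tops = [top_diff_genes(a, b, k) for a, b in profile_pairs]
    return set.intersection(*tops)
```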

  5. The Challenge of Assimilating Older Data and Samples into Digital Datasets and Sample Collections

    NASA Astrophysics Data System (ADS)

    Leinen, M.

    2015-12-01

    The geosciences are especially dependent on past observations of the planet to understand both processes and planetary history. As digital storage became less expensive - and conversion of written and published material to digital format became easier - many of us assumed that existing files of data, and even notebooks and 'shoeboxes' of data, would be assimilated into larger curated datasets. While publications are rapidly becoming available digitally, the data in them, let alone data that were never published, are not being integrated into readily available datasets. Negative data, while critical, are especially at risk. Samples are even more vulnerable because of the space needed and the cost of maintenance. Universities are more frequently being called on to manage the data and collections of faculty who are no longer active, or to find other collections that are willing to take them on, in most cases with no additional resources. Examples from datasets and collections maintained by Scripps Institution of Oceanography will be used to illustrate these challenges.

  6. Public Availability to ECS Collected Datasets

    NASA Astrophysics Data System (ADS)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have pledged to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  7. Dataset for a case report of a homozygous PEX16 F332del mutation.

    PubMed

    Bacino, Carlos; Chao, Yu-Hsin; Seto, Elaine; Lotze, Tim; Xia, Fan; Jones, Richard O; Moser, Ann; Wangler, Michael F

    2016-03-01

    This dataset provides a clinical description along with extensive biochemical and molecular characterization of a patient with a homozygous mutation in PEX16 with an atypical phenotype. This patient, described in Molecular Genetics and Metabolism Reports, was ultimately diagnosed with an atypical peroxisomal disorder on exome sequencing. A clinical timeline and diagnostic summary, and results of an extensive plasma and fibroblast analysis of this patient's peroxisomal profile, are provided. In addition, a table of additional variants from the exome analysis is provided.

  8. Sharing Clouds: Showing, Distributing, and Sharing Large Point Datasets

    NASA Astrophysics Data System (ADS)

    Grigsby, S.

    2012-12-01

    Sharing large data sets with colleagues and the general public presents a unique technological challenge for scientists. In addition to large data volumes, there are significant challenges in representing data that is often irregular, multidimensional, and spatial in nature. For derived data products, additional challenges exist in displaying and providing provenance data. For this presentation, several open source technologies are demonstrated for the remote display and access of large irregular point data sets. These technologies and techniques include the remote viewing of point data using HTML5 and OpenGL, which provides a highly accessible preview of the data sets for a range of audiences. Intermediate levels of accessibility and high levels of interactivity are accomplished with technologies such as WebDAV, which allows collaborators to run analysis on local clients, using data stored and administered on remote servers. Remote processing and analysis, including provenance tracking, will be discussed at the workgroup level. The data sets used for this presentation include data acquired from the NSF-funded National Center for Airborne Laser Mapping (NCALM), and data acquired for research and instructional use in NASA's Student Airborne Research Program (SARP). These datasets include Light Detection And Ranging (LiDAR) point clouds ranging in size from several hundred thousand to several hundred million data points; the techniques and technologies discussed are applicable to other forms of irregular point data.

  9. Normalization of transposon-mutant library sequencing datasets to improve identification of conditionally essential genes.

    PubMed

    DeJesus, Michael A; Ioerger, Thomas R

    2016-06-01

    Sequencing of transposon-mutant libraries using next-generation sequencing (TnSeq) has become a popular method for determining which genes and non-coding regions are essential for growth under various conditions in bacteria. For methods that rely on quantitative comparison of counts of reads at transposon insertion sites, proper normalization of TnSeq datasets is vitally important. Real TnSeq datasets are often noisy and exhibit a significant skew that can be dominated by high counts at a small number of sites (often for non-biological reasons). If two datasets that are not appropriately normalized are compared, the artifactual appearance of Differentially Essential (DE) genes can result in a statistical test, constituting type I errors (false positives). In this paper, we propose a novel method for normalization of TnSeq datasets that corrects for the skew of read-count distributions by fitting them to a Beta-Geometric distribution. We show that this read-count correction procedure reduces the number of false positives when comparing replicate datasets grown under the same conditions (for which no genuine differences in essentiality are expected). We compare these results to those obtained with other normalization procedures, and show that our method yields a greater reduction in the number of false positives. In addition, we investigate the effects of normalization on the detection of DE genes.
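
    The Beta-Geometric pmf can be fitted to insertion-site read counts by maximum likelihood; a rough sketch, assuming the pmf P(X=k) = B(a+1, b+k-1)/B(a, b) for k >= 1 (the published procedure then rescales counts based on the fitted distribution, which is omitted here):

```python
import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize

def betageom_logpmf(k, a, b):
    # log P(X = k) = ln B(a+1, b+k-1) - ln B(a, b), for integer k >= 1
    return betaln(a + 1.0, b + k - 1.0) - betaln(a, b)

def fit_betageom(counts):
    """Maximum-likelihood fit of (a, b), parameterized on the log scale for positivity."""
    k = np.asarray(counts, dtype=float)
    k = k[k >= 1]
    nll = lambda p: -np.sum(betageom_logpmf(k, np.exp(p[0]), np.exp(p[1])))
    res = minimize(nll, x0=np.zeros(2), method="Nelder-Mead")
    return np.exp(res.x)
```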

  10. Self-Reported Juvenile Firesetting: Results from Two National Survey Datasets

    PubMed Central

    Howell Bowling, Carrie; Merrick, Joav; Omar, Hatim A.

    2013-01-01

    The main purpose of this study was to address gaps in existing research by examining the relationship of academic performance and attention problems with juvenile firesetting. Two datasets from the Achenbach System for Empirically Based Assessment (ASEBA) were used. Analysis of the Factor Analysis Dataset (N = 975) indicated that adolescents who report lower academic performance are more likely to set fires. Additionally, adolescents who report a poor attitude toward school are even more likely to set fires. Results also indicated that attention problems are predictive of self-reported firesetting. The National Survey Dataset (N = 1158) was analyzed to determine the prevalence of firesetting in a normative sample and to examine whether these children reported higher levels of internalizing and externalizing behavior problems. It was found that 4.5% of adolescents in the generalized sample reported firesetting. Firesetters reported more internalizing, externalizing, and total problems than their non-firesetting peers. In this normative sample, firesetters were found to have lower academic performance and more attention problems. Limitations include the low overall number of firesetters in each dataset (Factor Analysis n = 123 and National Survey n = 53) and the inclusion of children who had been referred for services in the Factor Analysis Dataset. PMID:24350229

  11. Quantifying the reliability of four global datasets for drought monitoring over a semiarid region

    NASA Astrophysics Data System (ADS)

    Katiraie-Boroujerdy, Pari-Sima; Nasrollahi, Nasrin; Hsu, Kuo-lin; Sorooshian, Soroosh

    2016-01-01

    Drought is one of the most relevant natural disasters, especially in arid regions such as Iran. One of the requirements for reliable drought monitoring is long-term and continuous high-resolution precipitation data. Different climatic and global databases are being developed and made available in real time or near real time by different agencies and centers; however, for this purpose, these databases must be evaluated regionally and in different local climates. In this paper, a near real-time global climate model, a data assimilation system, and two gridded gauge-based datasets over Iran are evaluated. The ground truth data include 50 gauges from the period of 1980 to 2010. Drought analysis was carried out by means of the Standardized Precipitation Index (SPI) at 2-, 3-, 6-, and 12-month timescales. Although the results show spatial variations, overall the two gauge-based datasets perform better than the models. In addition, the results are more reliable for the western portion of the Zagros Range and the eastern region of the country. The analysis of the onsets of the 6-month moderate drought with at least 3 months' persistence indicates that all datasets perform better over the western portion of the Zagros Range, but display poor performance over the coast of the Caspian Sea. Based on the results of this study, the Modern-Era Retrospective Analysis for Research and Applications (MERRA) dataset is a preferred alternative for drought analysis in the region when gauge-based datasets are not available.
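
    For reference, the SPI at a given timescale is computed by fitting a distribution (commonly a gamma, with the zero-rainfall probability handled separately) to aggregated precipitation and mapping the cumulative probabilities to standard normal quantiles. A minimal sketch, assuming a gamma fit:

```python
import numpy as np
from scipy import stats

def spi(monthly_precip, scale=6):
    """Standardized Precipitation Index over running 'scale'-month totals."""
    totals = np.convolve(monthly_precip, np.ones(scale), mode="valid")
    zero = totals == 0
    q = zero.mean()  # probability mass at zero rainfall
    a, loc, b = stats.gamma.fit(totals[~zero], floc=0)
    cdf = q + (1 - q) * stats.gamma.cdf(totals, a, loc=loc, scale=b)
    return stats.norm.ppf(np.clip(cdf, 1e-6, 1 - 1e-6))  # SPI values
```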

  12. Advancements in Wind Integration Study Input Data Modeling: The Wind Integration National Dataset (WIND) Toolkit

    NASA Astrophysics Data System (ADS)

    Hodge, B.; Orwig, K.; McCaa, J. R.; Harrold, S.; Draxl, C.; Jones, W.; Searight, K.; Getman, D.

    2013-12-01

    Regional wind integration studies in the United States, such as the Western Wind and Solar Integration Study (WWSIS), Eastern Wind Integration and Transmission Study (EWITS), and Eastern Renewable Generation Integration Study (ERGIS), perform detailed simulations of the power system to determine the impact of high wind and solar energy penetrations on power systems operations. Some of the specific aspects examined include: infrastructure requirements, impacts on grid operations and conventional generators, ancillary service requirements, as well as the benefits of geographic diversity and forecasting. These studies require geographically broad and temporally consistent wind and solar power production input datasets that realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of wind and solar power plant production, and are time-synchronous with load profiles. The original western and eastern wind datasets were generated independently for 2004-2006 using numerical weather prediction (NWP) models run on a ~2 km grid with 10-minute resolution. Each utilized its own site selection process to augment existing wind plants with simulated sites of high development potential. The original dataset also included day-ahead simulated forecasts. These datasets were the first of their kind and many lessons were learned from their development. For example, the modeling approach used generated periodic false ramps that later had to be removed due to unrealistic impacts on ancillary service requirements. For several years, stakeholders have been requesting an updated dataset that: 1) covers more recent years; 2) spans four or more years to better evaluate interannual variability; 3) uses improved methods to minimize false ramps and spatial seams; 4) better incorporates solar power production inputs; and 5) is more easily accessible. To address these needs, the U.S. Department of Energy (DOE) Wind and Solar Programs have funded two

  13. Accuracy assessment of gridded precipitation datasets in the Himalayas

    NASA Astrophysics Data System (ADS)

    Khan, A.

    2015-12-01

    Accurate precipitation data are vital for hydro-climatic modelling and water resources assessments. Based on mass balance calculations and Turc-Budyko analysis, this study investigates the accuracy of twelve widely used gridded precipitation datasets for sub-basins in the Upper Indus Basin (UIB) in the Himalayas-Karakoram-Hindukush (HKH) region. These datasets are: 1) Global Precipitation Climatology Project (GPCP), 2) Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP), 3) NCEP/NCAR, 4) Global Precipitation Climatology Centre (GPCC), 5) Climatic Research Unit (CRU), 6) Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), 7) Tropical Rainfall Measuring Mission (TRMM), 8) European Reanalysis (ERA) interim data, 9) PRINCETON, 10) European Reanalysis-40 (ERA-40), 11) Willmott and Matsuura, and 12) WATCH Forcing Data based on ERA-Interim (WFDEI). Precipitation accuracy and consistency were assessed by a physical mass balance involving the sum of annual measured flow, estimated actual evapotranspiration (average of 4 datasets), estimated glacier mass balance melt contribution (average of 4 datasets), and groundwater recharge (average of 3 datasets), during 1999-2010. The mass balance assessment was complemented by non-dimensional Turc-Budyko analysis, where annual precipitation, measured flow, and potential evapotranspiration (average of 5 datasets) data were used for the same period. Both analyses suggest that all tested precipitation datasets significantly underestimate precipitation in the Karakoram sub-basins. For the Hindukush and Himalayan sub-basins most datasets underestimate precipitation, except ERA-Interim and ERA-40. The analysis indicates that for this large region with complicated terrain features and stark spatial precipitation gradients the reanalysis datasets have better consistency with flow measurements than datasets derived from records of only sparsely distributed climatic
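
    Schematically, the mass-balance check compares each dataset's annual precipitation with the value implied by the other balance terms; a hedged rendering with assumed sign conventions is

\[
P_{\mathrm{implied}} = Q + ET_{a} + R_{gw} - M_{gl},
\]

    where Q is measured flow, ET_a actual evapotranspiration, R_gw groundwater recharge, and M_gl the glacier mass-balance melt contribution; a dataset that reports P well below P_implied underestimates basin precipitation.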

  14. Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets.

    PubMed

    Grarup, Niels; Sulem, Patrick; Sandholt, Camilla H; Thorleifsson, Gudmar; Ahluwalia, Tarunveer S; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F; Magnusson, Olafur T; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

    2013-06-01

    Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low-frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies, we did not find consistent association of the variants with cardiovascular diseases, cancers, or Alzheimer's disease, although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations.

  15. Cloud and Precipitation Properties Merged Dataset from Vertically Pointing ARM Radars During GOAmazon

    NASA Astrophysics Data System (ADS)

    Toto, T.; Giangrande, S. E.; Troyan, D.; Jensen, M. P.; Bartholomew, M. J.; Johnson, K. L.

    2014-12-01

    The Green Ocean Amazon (GOAmazon) field campaign is in its first year of a two-year deployment in the Amazon Basin to study aerosol and cloud lifecycles as they relate to cloud-aerosol-precipitation interactions. Insights from GOAmazon datasets will fill gaps in our understanding, ultimately improving constraints in tropical rain forest climate model parameterizations. As part of GOAmazon, the Atmospheric Radiation Measurement (ARM) Mobile Facility (AMF) has been collecting a unique set of observations near Manacapuru, Brazil, a site known to experience both the pristine conditions of its locale and, at times, the effects of the Manaus mega-city pollution plume. In order to understand the effects of anthropogenic aerosol on clouds, radiative balance, and climate, documentation of cloud and precipitation properties in the absence and presence of the Manaus plume is a necessary complement to the aerosol measurements collected during the campaign. The AMF is uniquely equipped to capture the most complete and continuous record of cloud and precipitation column properties using the UHF (915 MHz) ARM zenith radar (UAZR) and the vertically pointing W-Band (95 GHz) ARM Cloud Radar (WACR). Together, these radars provide multiple methods (e.g., moment-based, dual-frequency, and Doppler spectral techniques) to retrieve properties of the cloud field that may be influenced by aerosols. This includes drop size distributions, dynamical and microphysical properties (e.g., vertical air motion, latent heat retrievals), and associated uncertainties. Additional quality assurance is available from independent rain gauge and column platforms. Here, we merge data from the UAZR and WACR (WACR-ARSCL VAP) radars, along with ARM sounding observations and optical parsivel measurement constraints, to present a first look at select convective and stratiform events, their precipitation properties, and statistical profile characterization.

  16. Vikodak - A Modular Framework for Inferring Functional Potential of Microbial Communities from 16S Metagenomic Datasets

    PubMed Central

    Nagpal, Sunil; Haque, Mohammed Monzoorul; Mande, Sharmila S.

    2016-01-01

    Background The overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by various interacting microbes residing in that niche. Consequently, prior (collated) information pertaining to genes and enzymes encoded by the resident microbes can aid in indirectly (re)constructing/inferring the metabolic/functional potential of a given microbial community (given its taxonomic abundance profile). In this study, we present Vikodak, a multi-modular package that is based on the above assumption and automates inferring and/or comparing the functional characteristics of an environment using taxonomic abundance generated from one or more environmental sample datasets. With the underlying assumptions of co-metabolism and independent contributions of different microbes in a community, a concerted effort has been made to accommodate microbial co-existence patterns in the various modules incorporated in Vikodak. Results Validation experiments on over 1400 metagenomic samples have confirmed the utility of Vikodak in (a) deciphering enzyme abundance profiles of any KEGG metabolic pathway, (b) functional resolution of distinct metagenomic environments, (c) inferring patterns of functional interaction between resident microbes, and (d) automating statistical comparison of functional features of studied microbiomes. Novel features incorporated in Vikodak also facilitate automatic removal of false positives and spurious functional predictions. Conclusions With novel provisions for comprehensive functional analysis, inclusion of microbial co-existence pattern based algorithms, automated inter-environment comparisons, in-depth analysis of individual metabolic pathways, and greater flexibility at the user end, Vikodak is expected to be an important value addition to the family of existing tools for 16S-based function prediction. Availability and Implementation A web implementation of Vikodak
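
    The underlying computation in tools of this kind reduces to weighting a precomputed taxon-by-function gene-content matrix by the observed taxonomic abundances. A toy sketch (the matrix and abundances are invented placeholders):

```python
import numpy as np

# rows: taxa, columns: enzymes/pathways; gene copy numbers would come
# from curated genome annotations in a real workflow
gene_content = np.array([[2, 0, 1],
                         [0, 3, 1],
                         [1, 1, 0]], dtype=float)

abundance = np.array([0.5, 0.3, 0.2])  # relative 16S-derived taxon abundances

# community-level functional potential: abundance-weighted gene copies
functional_profile = abundance @ gene_content
print(functional_profile)  # one value per enzyme/pathway
```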

  17. Genetic Architecture of Vitamin B12 and Folate Levels Uncovered Applying Deeply Sequenced Large Datasets

    PubMed Central

    Thorleifsson, Gudmar; Ahluwalia, Tarunveer S.; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F.; Magnusson, Olafur T.; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I.; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

    2013-01-01

    Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low-frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies, we did not find consistent association of the variants with cardiovascular diseases, cancers, or Alzheimer's disease, although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations. PMID:23754956

  18. Really big data: Processing and analysis of large datasets

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  19. Primary Datasets for Case Studies of River-Water Quality

    ERIC Educational Resources Information Center

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  1. Interface between astrophysical datasets and distributed database management systems (DAVID)

    NASA Technical Reports Server (NTRS)

    Iyengar, S. S.

    1988-01-01

    This is a status report on the progress of the DAVID (Distributed Access View Integrated Database Management System) project being carried out at Louisiana State University, Baton Rouge, Louisiana. The objective is to implement an interface between astrophysical datasets and DAVID. Design details and implementation specifics of the interface between DAVID and astrophysical datasets are discussed.

  2. Finding Spatio-Temporal Patterns in Large Sensor Datasets

    ERIC Educational Resources Information Center

    McGuire, Michael Patrick

    2010-01-01

    Spatial or temporal data mining tasks are performed in the context of the relevant space, defined by a spatial neighborhood, and the relevant time period, defined by a specific time interval. Furthermore, when mining large spatio-temporal datasets, interesting patterns typically emerge where the dataset is most dynamic. This dissertation is…

  3. Querying Patterns in High-Dimensional Heterogenous Datasets

    ERIC Educational Resources Information Center

    Singh, Vishwakarma

    2012-01-01

    The recent technological advancements have led to the availability of a plethora of heterogenous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  4. Operational Dataset Update Functionality Included in the NCAR Research Data Archive Management System

    NASA Astrophysics Data System (ADS)

    Ji, Z.; Worley, S. J.; Schuster, D. C.

    2011-12-01

    at hourly, daily, monthly, and yearly intervals. DSUPDT is also fully scalable and continues to support addition of new data streams. This paper will introduce the powerful functionality of the RDAMS for operational dataset updates, and provide examples of its use

  5. NCAR's Research Data Archive: OPeNDAP Access for Complex Datasets

    NASA Astrophysics Data System (ADS)

    Dattore, R.; Worley, S. J.

    2014-12-01

    Many datasets have complex structures including hundreds of parameters and numerous vertical levels, grid resolutions, and temporal products. Making these data accessible is a challenge for a data provider. OPeNDAP is a powerful protocol for delivering, in real time, multi-file datasets that can be ingested by many analysis and visualization tools, but for these datasets there are too many choices about how to aggregate. Simple aggregation schemes can fail to support, or at least make very challenging, many potential studies based on complex datasets. We address this issue by using a rich file-content metadata collection to create a real-time customized OPeNDAP service to match the full suite of access possibilities for complex datasets. The Climate Forecast System Reanalysis (CFSR) and its extension, the Climate Forecast System Version 2 (CFSv2), datasets produced by the National Centers for Environmental Prediction (NCEP) and hosted by the Research Data Archive (RDA) at the Computational and Information Systems Laboratory (CISL) at NCAR, are examples of complex datasets that are difficult to aggregate with existing data server software. CFSR and CFSv2 contain 141 distinct parameters on 152 vertical levels, six grid resolutions, and 36 products (analyses, n-hour forecasts, multi-hour averages, etc.), where not all parameter/level combinations are available at all grid resolution/product combinations. These data are archived in the RDA with the data structure provided by the producer; no additional re-organization or aggregation has been applied. Since 2011, users have been able to request customized subsets (e.g., temporal, parameter, spatial) from the CFSR/CFSv2, which are processed in delayed mode and then downloaded to a user's system. Until now, the complexity has made it difficult to provide real-time OPeNDAP access to the data. We have developed a service that leverages the already-existing subsetting interface and allows users to create a virtual dataset

  6. New model for datasets citation and extraction reproducibility in VAMDC

    NASA Astrophysics Data System (ADS)

    Zwölf, Carlo Maria; Moreau, Nicolas; Dubernet, Marie-Lise

    2016-09-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated with individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favor reproducibility of dataset extraction, and facilitate the systematic citation of the authors having originally measured and/or calculated the extracted atomic and molecular data.

  7. Increasing spatial resolution of CHIRPS rainfall datasets for Cyprus with artificial neural networks

    NASA Astrophysics Data System (ADS)

    Tymvios, Filippos; Michaelides, Silas; Retalis, Adrianos; Katsanos, Dimitrios; Lelieveld, Jos

    2016-08-01

    The use of high-resolution rainfall datasets is an alternative way of studying climatological regions where conventional rain measurements are sparse or not available. Spanning 1981 to near-present, the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) dataset incorporates 5 km × 5 km resolution satellite imagery with in-situ station data to create gridded rainfall time series for trend analysis, severe-event and seasonal drought monitoring. The aim of this work is to further increase the resolution of the rainfall dataset for Cyprus to 1 km × 1 km by correlating the CHIRPS dataset with elevation information, the NDVI index (Normalized Difference Vegetation Index) from satellite images at 1 km × 1 km, and precipitation measurements from the official raingauge network of Cyprus' Department of Meteorology, utilizing Artificial Neural Networks. The Artificial Neural Network architecture that was implemented is the Multi-Layer Perceptron (MLP) trained with the backpropagation method, which is widely used in environmental studies. Seven different network architectures were tested, all with two hidden layers. The number of neurons ranged from 3 to 10 in the first hidden layer and from 5 to 25 in the second hidden layer. The dataset was separated into a randomly selected training set, a validation set, and a testing set; the latter is independently used for the final assessment of the models' performance. Using the Artificial Neural Network approach, a new map of the spatial analysis of rainfall is constructed which exhibits a considerable increase in its spatial resolution. A statistical assessment of the new spatial analysis was made using the rainfall ground measurements from the raingauge network. The assessment indicates that the methodology is promising for several applications.
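
    A minimal sketch of the described setup using scikit-learn (synthetic placeholder data; predictors per 1 km cell are the CHIRPS estimate, elevation, and NDVI, while the real study trains against raingauge measurements):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
chirps = rng.gamma(2.0, 30.0, n)    # coarse rainfall estimate (mm)
elev = rng.uniform(0.0, 2000.0, n)  # elevation (m)
ndvi = rng.uniform(0.0, 0.8, n)     # vegetation index
X = np.column_stack([chirps, elev, ndvi])
y = 0.8 * chirps + 0.01 * elev + 20.0 * ndvi + rng.normal(0.0, 5.0, n)  # synthetic gauge rainfall

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(10, 25),  # two hidden layers, as in the study
                   max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)
print("held-out R^2:", mlp.score(X_te, y_te))
```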

  8. Fostering Musical Independence

    ERIC Educational Resources Information Center

    Shieh, Eric; Allsup, Randall Everett

    2016-01-01

    Musical independence has always been an essential aim of musical instruction. But this objective can refer to everything from high levels of musical expertise to more student choice in the classroom. While most conceptualizations of musical independence emphasize the demonstration of knowledge and skills within particular music traditions, this…

  9. Independent vs. Laboratory Papers.

    ERIC Educational Resources Information Center

    Wilson, Clint C., II

    1981-01-01

    Comparisons of independent and laboratory newspapers at selected California colleges indicated that (1) the independent newspapers were superior in editorial opinion and leadership characteristics; (2) the laboratory newspapers made better use of photography, art, and graphics; and (3) professional journalists highly rated their laboratory…

  10. Independence of Internal Auditors.

    ERIC Educational Resources Information Center

    Montondon, Lucille; Meixner, Wilda F.

    1993-01-01

    A survey of 288 college and university auditors investigated patterns in their appointment, reporting, and supervisory practices as indicators of independence and objectivity. Results indicate a weakness in the positioning of internal auditing within institutions, possibly compromising auditor independence. Because the auditing function is…

  11. American Independence. Fifth Grade.

    ERIC Educational Resources Information Center

    Crosby, Annette

    This fifth grade teaching unit covers early conflicts between the American colonies and Britain, battles of the American Revolutionary War, and the Declaration of Independence. Knowledge goals address the pre-revolutionary acts enforced by the British, the concepts of conflict and independence, and the major events and significant people from the…

  12. Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis

    PubMed Central

    Hanauer, David A; Saeed, Mohammed; Zheng, Kai; Mei, Qiaozhu; Shedden, Kerby; Aronson, Alan R; Ramakrishnan, Naren

    2014-01-01

    Objective We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset comprising International Classification of Disease, V.9 (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel. Methods Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations. Results The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations. Discussion Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations. Conclusions In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility. PMID:24928177
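
    The overlap computation itself is simple to sketch: form the set of co-occurring code pairs in each corpus and measure their intersection. Hypothetical toy records:

```python
from itertools import combinations

def pairwise_associations(records):
    """Set of unordered co-occurring code pairs across records (each record is a set of codes)."""
    pairs = set()
    for codes in records:
        pairs.update(frozenset(p) for p in combinations(sorted(codes), 2))
    return pairs

clinical = [{"250.00", "401.9"}, {"250.00", "585.9"}]  # toy patient code sets
medline = [{"250.00", "401.9"}]                        # toy citation code sets
c, m = pairwise_associations(clinical), pairwise_associations(medline)
print(len(c & m) / len(c))  # fraction of clinical associations also found in Medline
```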

  13. Evaluating Reanalysis - Independent Observations and Observation Independence

    NASA Astrophysics Data System (ADS)

    Wahl, S.; Bollmeyer, C.; Danek, C.; Friederichs, P.; Keller, J. D.; Ohlwein, C.

    2014-12-01

    Reanalyses on global to regional scales are widely used for validation of meteorological or hydrological models and for many climate applications. However, the evaluation of the reanalyses themselves is still a crucial task. A major challenge is the lack of independent observations, since most of the available observational data are already included, e.g., through the data assimilation scheme. Here, we focus on the evaluation of dynamical reanalyses, which are obtained by using numerical weather prediction models with a fixed data assimilation scheme. Precipitation is generally not assimilated in dynamical reanalyses (except via, e.g., latent heat nudging) and thereby provides valuable data for the evaluation of reanalyses. Since precipitation results from complex dynamical and microphysical atmospheric processes, an accurate representation of precipitation is often used as an indicator of good model performance. Here, we use independent observations of daily precipitation accumulations from European rain gauges (E-OBS) for the years 2008 and 2009 for the intercomparison of various regional reanalysis products for the European CORDEX domain (Hirlam reanalysis at 0.2°, Met Office UM reanalysis at 0.11°, COSMO reanalysis at 0.055°). This allows for assessing the benefits of increased horizontal resolution compared to global reanalyses. Furthermore, the effect of latent heat nudging (assimilation of radar-derived rain rates) is investigated using an experimental setup of the COSMO reanalysis with 6 km and 2 km resolution for summer 2011. Further, we present an observation-independent evaluation based on kinetic energy spectra. Such spectra should follow a k^(-3) dependence on the wavenumber k at larger scales, and a k^(-5/3) dependence on the mesoscale. We compare the spectra of the aforementioned regional reanalyses in order to investigate the general capability of the reanalyses to resolve events on the mesoscale (e.g., effective resolution). The intercomparison and
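
    The spectral check can be sketched by estimating the log-log slope of a kinetic energy spectrum; a minimal one-dimensional version (the real diagnostic is computed from the reanalysis wind fields):

```python
import numpy as np

def spectral_slope(wind, k_min=5, k_max=50):
    """Fit the power-law slope of the energy spectrum over a wavenumber band."""
    spectrum = np.abs(np.fft.rfft(wind - wind.mean())) ** 2
    k = np.arange(spectrum.size)
    band = (k >= k_min) & (k <= k_max)  # requires len(wind) > 2 * k_max
    slope, _ = np.polyfit(np.log(k[band]), np.log(spectrum[band]), 1)
    return slope  # ~ -3 expected at synoptic scales, ~ -5/3 on the mesoscale
```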

  14. Efficient segmentation of 3D fluoroscopic datasets from mobile C-arm

    NASA Astrophysics Data System (ADS)

    Styner, Martin A.; Talib, Haydar; Singh, Digvijay; Nolte, Lutz-Peter

    2004-05-01

    The emerging mobile fluoroscopic 3D technology linked with a navigation system combines the advantages of CT-based and C-arm-based navigation. The intra-operative, automatic segmentation of 3D fluoroscopy datasets enables the combined visualization of surgical instruments and anatomical structures for enhanced planning, surgical eye-navigation, and landmark digitization. We performed a thorough evaluation of several segmentation algorithms using a large set of data from different anatomical regions and man-made phantom objects. The analyzed segmentation methods include automatic thresholding, morphological operations, an adapted region growing method, and an implicit 3D geodesic snake method. In regard to computational efficiency, all methods performed within acceptable limits on a standard desktop PC (30 s to 5 min). In general, the best results were obtained with datasets from long bones, followed by extremities. The segmentations of spine, pelvis, and shoulder datasets were generally of poorer quality. As expected, the threshold-based methods produced the worst results. The combined thresholding and morphological operations method was considered appropriate for a smaller set of clean images. The region growing method performed much better in regard to computational efficiency and segmentation correctness, especially for datasets of joints and lumbar and cervical spine regions. The less efficient implicit snake method was able to additionally remove wrongly segmented skin tissue regions. This study presents a step towards efficient intra-operative segmentation of 3D fluoroscopy datasets, but there is room for improvement. Next, we plan to study model-based approaches for datasets from the knee and hip joint regions, which would thenceforth be applied to all anatomical regions in our continuing development of an ideal segmentation procedure for 3D fluoroscopic images.
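
    Of the methods compared, region growing is the easiest to sketch; a simple 6-connected, intensity-tolerance version for a 3-D volume (illustrative only, not the adapted method evaluated in the paper):

```python
import numpy as np
from collections import deque

def region_grow(volume, seed, tol=100.0):
    """Grow a region from a seed voxel, accepting voxels within 'tol' of the seed intensity."""
    mask = np.zeros(volume.shape, dtype=bool)
    ref = float(volume[seed])
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        if mask[z, y, x] or abs(float(volume[z, y, x]) - ref) > tol:
            continue
        mask[z, y, x] = True
        for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < volume.shape[0] and 0 <= ny < volume.shape[1]
                    and 0 <= nx < volume.shape[2] and not mask[nz, ny, nx]):
                queue.append((nz, ny, nx))
    return mask
```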

  15. Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets

    PubMed Central

    Gratzl, Samuel; Gehlenborg, Nils; Lex, Alexander; Pfister, Hanspeter; Streit, Marc

    2016-01-01

    Answering questions about complex issues often requires analysts to take into account information contained in multiple interconnected datasets. A common strategy in analyzing and visualizing large and heterogeneous data is dividing it into meaningful subsets. Interesting subsets can then be selected and the associated data and the relationships between the subsets visualized. However, neither the extraction and manipulation nor the comparison of subsets is well supported by state-of-the-art techniques. In this paper we present Domino, a novel multiform visualization technique for effectively representing subsets and the relationships between them. By providing comprehensive tools to arrange, combine, and extract subsets, Domino allows users to create both common visualization techniques and advanced visualizations tailored to specific use cases. In addition to the novel technique, we present an implementation that enables analysts to manage the wide range of options that our approach offers. Innovative interactive features such as placeholders and live previews support rapid creation of complex analysis setups. We introduce the technique and the implementation using a simple example and demonstrate scalability and effectiveness in a use case from the field of cancer genomics. PMID:26356916

  16. The Centennial Trends Greater Horn of Africa precipitation dataset.

    PubMed

    Funk, Chris; Nicholson, Sharon E; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded 'Centennial Trends' precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.
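
    As a reference for the interpolation step, ordinary kriging solves a small linear system per target location; a toy sketch with an assumed exponential semivariogram (the CenTrends variogram parameters are not given in this abstract, so everything here is illustrative):

```python
import numpy as np

def semivariogram(h, sill=1.0, range_=50.0):
    """Assumed exponential semivariogram."""
    return sill * (1.0 - np.exp(-h / range_))

def ordinary_krige(points, values, target):
    """Estimate the field at 'target' from scattered (x, y) points and anomaly values."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = semivariogram(d)
    A[-1, -1] = 0.0  # Lagrange multiplier row/column enforces unbiasedness
    b = np.ones(n + 1)
    b[:n] = semivariogram(np.linalg.norm(points - target, axis=1))
    weights = np.linalg.solve(A, b)[:n]
    return weights @ values
```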

  17. The Centennial Trends Greater Horn of Africa precipitation dataset

    PubMed Central

    Funk, Chris; Nicholson, Sharon E.; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded ‘Centennial Trends’ precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data. PMID:26451250

  18. Identification of rogue datasets in serial crystallography

    PubMed Central

    Assmann, Greta; Brehm, Wolfgang; Diederichs, Kay

    2016-01-01

    Advances in beamline optics, detectors and X-ray sources allow new techniques of crystallographic data collection. In serial crystallography, a large number of partial datasets from crystals of small volume are measured. Merging of datasets from different crystals in order to enhance data completeness and accuracy is only valid if the crystals are isomorphous, i.e. sufficiently similar in cell parameters, unit-cell contents and molecular structure. Identification and exclusion of non-isomorphous datasets is therefore indispensable and must be done by means of suitable indicators. To identify rogue datasets, the influence of each dataset on CC1/2 [Karplus & Diederichs (2012). Science, 336, 1030–1033], the correlation coefficient between pairs of intensities averaged in two randomly assigned subsets of observations, is evaluated. The presented method employs a precise calculation of CC1/2 that avoids the random assignment, and instead of using an overall CC1/2, an average over resolution shells is employed to obtain sensible results. The selection procedure was verified by measuring the correlation of observed (merged) intensities and intensities calculated from a model. It is found that inclusion and merging of non-isomorphous datasets may bias the refined model towards those datasets, and measures to reduce this effect are suggested. PMID:27275144
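
    The standard random-half CC1/2 that the paper refines can be sketched as follows (the published method replaces the random assignment with a precise calculation and averages over resolution shells; dataset influence is then assessed by recomputing CC1/2 with each dataset left out):

```python
import numpy as np

def cc_half(reflections, rng=None):
    """CC1/2 from per-reflection arrays of repeated intensity observations."""
    rng = rng or np.random.default_rng(0)
    half_a, half_b = [], []
    for intensities in reflections:
        if len(intensities) < 2:
            continue  # need at least one observation per half
        shuffled = rng.permutation(intensities)
        mid = len(shuffled) // 2
        half_a.append(shuffled[:mid].mean())
        half_b.append(shuffled[mid:].mean())
    return np.corrcoef(half_a, half_b)[0, 1]
```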

  19. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
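
    The pairwise comparison quoted above (5.8% to 18% difference) is a simple computation on aligned sequences; a minimal sketch:

```python
def percent_difference(seq_a, seq_b):
    """Percent amino acid difference between two aligned sequences, ignoring gap columns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    mismatches = sum(a != b for a, b in pairs)
    return 100.0 * mismatches / len(pairs)

print(percent_difference("MKTAYIAK-Q", "MKSAYLAKEQ"))  # toy aligned fragments
```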

  1. Global Drought Assessment using a Multi-Model Dataset

    NASA Astrophysics Data System (ADS)

    VanLanen, H.; Huijgevoort, M. V.; Corzo Perez, G.; Wanders, N.; Hazenberg, P.; Loon, A. V.; Estifanos, S.; Melsen, L.

    2011-12-01

    Large-scale models are often applied to study past drought (forced with global reanalysis datasets) and to assess future drought (using downscaled, bias-corrected forcing from climate models). The EU project WATer and global CHange (WATCH) provides a 0.5° global dataset of meteorological forcing (i.e. WATCH Forcing Data, WFD), which was used as input for a suite of global hydrological models (GHMs) and land surface models (LSMs). Ten GHMs and LSMs have been run for the second half of the 20th century and seven for the whole century. Spatio-temporal drought characteristics were derived from gridded time series of daily and monthly aggregated runoff using the threshold level method, with both non-contiguous and contiguous approaches. GHMs and LSMs were intercompared and to some extent also tested against observations to explore to what level these models capture past drought events. This paper will present an overview of results. Global maps showing drought summary statistics (e.g. average duration) and the distribution of drought clusters across the globe for major documented drought events will be presented. In addition, area in drought and the occurrence of the maximum drought cluster will be discussed. The main results from a number of studies are: (i) drought characteristics across the globe vary depending on the selected window of years, (ii) GHMs and LSMs broadly identified major drought events in a number of large river basins around the world, (iii) drought events obtained with individual GHMs and LSMs may substantially deviate from those derived with a catchment scale hydrological model (selected EU WATCH river basins), but the multi-model ensemble mean agrees rather well, (iv) use of different calculation methods for reference evapotranspiration has little to substantial influence on drought characteristics dependent on the climate region (Köppen-Geiger), (v) groundwater systems are as important as climate for the development of drought in runoff. Understanding of past
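
    The threshold level method named above identifies a drought wherever runoff stays below a reference discharge. A minimal sketch (a fixed 20th-percentile threshold is assumed here; the WATCH analyses also used other variants, e.g. monthly varying thresholds):

    ```python
    import numpy as np

    def drought_events(runoff, quantile=0.2):
        """Return (start, end, deficit volume) for below-threshold spells."""
        thr = np.quantile(runoff, quantile)
        events, start = [], None
        for t, low in enumerate(runoff < thr):
            if low and start is None:
                start = t
            elif not low and start is not None:
                events.append((start, t - 1, thr * (t - start) - runoff[start:t].sum()))
                start = None
        if start is not None:
            t = len(runoff)
            events.append((start, t - 1, thr * (t - start) - runoff[start:].sum()))
        return events

    daily_runoff = np.random.default_rng(1).gamma(2.0, 1.0, 365)
    print(drought_events(daily_runoff)[:3])
    ```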

  2. Dataset of aggregate producers in New Mexico

    USGS Publications Warehouse

    Orris, Greta J.

    2000-01-01

    This report presents data, including latitude and longitude, for aggregate sites in New Mexico that were believed to be active in the period 1997-1999. The data are presented in paper form in Part A of this report and as Microsoft Excel 97 and Data Interchange Format (DIF) files in Part B. The work was undertaken as part of the effort to update information for the National Atlas. This compilation includes data from: the files of U.S. Geological Survey (USGS); company contacts; the New Mexico Bureau of Mines and Mineral Resources, New Mexico Bureau of Mine Inspection, and the Mining and Minerals Division of the New Mexico Energy, Minerals and Natural Resources Department (Hatton and others, 1998); the Bureau of Land Management Information; and direct communications with some of the aggregate operators. Additional information on most of the sites is available in Hatton and others (1998).

  3. Length-independent structural similarities enrich the antibody CDR canonical class model

    PubMed Central

    Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M.

    2016-01-01

    Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length variable structural clusters show strong sequence patterns suggesting either that they evolved from the same original structure or result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing. PMID:26963563

  4. The Transition of NASA EOS Datasets to WFO Operations: A Model for Future Technology Transfer

    NASA Technical Reports Server (NTRS)

    Darden, C.; Burks, J.; Jedlovec, G.; Haines, S.

    2007-01-01

    The collocation of a National Weather Service (NWS) Forecast Office with atmospheric scientists from NASA/Marshall Space Flight Center (MSFC) in Huntsville, Alabama has afforded a unique opportunity for science sharing and technology transfer. Specifically, the NWS office in Huntsville has interacted closely with research scientists within the SPoRT (Short-term Prediction Research and Transition) Center at MSFC. One significant technology transfer that has reaped dividends is the transition of unique NASA EOS polar orbiting datasets into NWS field operations. NWS forecasters primarily rely on the AWIPS (Advanced Weather Interactive Processing System) decision support system for their day-to-day forecast and warning decision making. Unfortunately, the transition of data from operational polar orbiters or low inclination orbiting satellites into AWIPS has been relatively slow for a variety of reasons. The ability to integrate these high resolution NASA datasets into operations has yielded several benefits. The MODIS (Moderate Resolution Imaging Spectroradiometer) instrument flying on the Aqua and Terra satellites provides a broad spectrum of multispectral observations at resolutions as fine as 250 m. Forecasters routinely utilize these datasets to locate fine lines, boundaries, smoke plumes, locations of fog or haze fields, and other mesoscale features. In addition, these important datasets have been transitioned to other WFOs for a variety of local uses. For instance, WFO Great Falls, Montana utilizes the MODIS snow cover product for hydrologic planning purposes, while several coastal offices utilize the output from the MODIS and AMSR-E instruments to supplement observations in the data-sparse regions of the Gulf of Mexico and western Atlantic. In the short term, these datasets have benefited local WFOs in a variety of ways. In the longer term, the process by which these unique datasets were successfully transitioned to operations will benefit the planning and

  5. The COST-HOME monthly benchmark dataset with temperature and precipitation data for testing homogenisation algorithms

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.

    2009-04-01

    As part of the COST Action HOME (Advances in homogenisation methods of climate series: an integrated approach) a dataset is generated that will serve as a benchmark for homogenisation algorithms. Members of the Action and third parties are invited to homogenise this dataset. The results of this exercise will be analysed by the HOME Working Groups (WG) on detection (WG2) and correction (WG3) algorithms to obtain recommendations for a standard homogenisation procedure for climate data. This talk will introduce this benchmark dataset. Based upon a survey among homogenisation experts we chose to start our work with monthly values for temperature and precipitation. Temperature and precipitation are selected because most participants consider these elements the most relevant for their studies. Furthermore, they represent two important types of statistics (additive and multiplicative). The benchmark will have three different types of datasets: real data, surrogate data and synthetic data. Real datasets will allow comparing the different homogenisation methods with the most realistic type of data and inhomogeneities. Thus this part of the benchmark is important for a faithful comparison of algorithms with each other. However, as in this case the truth is not known, it is not possible to quantify the improvements due to homogenisation. Therefore, the benchmark also has two datasets with artificial data to which we inserted known inhomogeneities: surrogate and synthetic data. The aim of surrogate data is to reproduce the structure of measured data accurately enough that it can be used as a substitute for measurements. The surrogate climate networks have the spatial and temporal auto- and cross-correlation functions of real homogenised networks as well as the (non-Gaussian) exact distribution of each station. The idealised synthetic data is based on the surrogate networks. The change is that the difference between the stations has been modelled as uncorrelated Gaussian white noise.
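
    Inserting known inhomogeneities into artificial series is central to the benchmark's design. A minimal sketch of step-type break insertion (break magnitudes are illustrative assumptions; additive shifts mimic temperature breaks and multiplicative factors mimic precipitation breaks, matching the two statistics types above):

    ```python
    import numpy as np

    def insert_breaks(series, n_breaks=2, multiplicative=False, seed=None):
        """Perturb a homogeneous series with random step inhomogeneities."""
        rng = np.random.default_rng(seed)
        x = np.asarray(series, dtype=float).copy()
        for pos in sorted(rng.choice(len(x), n_breaks, replace=False)):
            if multiplicative:
                x[pos:] *= rng.uniform(0.8, 1.2)   # e.g. rain gauge change
            else:
                x[pos:] += rng.normal(0.0, 0.8)    # e.g. relocation shift, K
        return x

    monthly_temp = 10 + np.random.default_rng(2).normal(size=600)
    inhomogeneous = insert_breaks(monthly_temp, n_breaks=3, seed=0)
    ```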

  6. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    PubMed

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data System (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are generally of very good quality; nevertheless, some findings and recommendations for improving the quality of Life Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  7. Data Machine Independence

    1994-12-30

    Data-machine independence achieved by using four technologies (ASN.1, XDR, SDS, and ZEBRA) has been evaluated by encoding two different applications in each of the above and comparing the results against the standard programming method using C.
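
    Of the four technologies, XDR is the simplest to illustrate: it fixes big-endian byte order and fixed-width fields, so encoded data decodes identically on any machine. A hedged sketch in that spirit using Python's struct module (not the study's C implementation):

    ```python
    import struct

    # Big-endian, fixed-width encoding in the spirit of XDR: the byte
    # stream is identical regardless of the writing machine's endianness.
    record = (42, 3.14)
    wire = struct.pack(">if", *record)        # 4-byte int + 4-byte float
    decoded = struct.unpack(">if", wire)
    assert decoded[0] == 42                   # round-trips on any platform
    ```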

  8. Media independent interface

    NASA Technical Reports Server (NTRS)

    1987-01-01

    The work done on the Media Independent Interface (MII) Interface Control Document (ICD) program is described, and recommendations based on it are made. Explanations and rationale for the content of the ICD itself are presented.

  9. Hadley cell dynamics in Japanese Reanalysis-55 dataset: evaluation using other reanalysis datasets and global radiosonde network observations

    NASA Astrophysics Data System (ADS)

    Mathew, Sneha Susan; Kumar, Karanam Kishore; Subrahmanyam, Kandula Venkata

    2016-02-01

    Hadley circulation (HC) is a planetary scale circulation spanning one-third of the globe from the tropics to the sub-tropics. Recent changes in HC width and its temporal variability are topics of paramount interest because of the climate implications they carry. The present study attempts to bring out the subtropical climate change indications in the comparatively new Japanese Re-analysis (JRA55) dataset by means of the mean meridional stream function (MSF). The observed features of HC in JRA55 are found to be reproduced in the NCEP, MERRA and ECMWF datasets, with notable differences in the magnitudes of MSF. The calculated annual cycle of HC edges, center and total width from this dataset closely resembles the annual cycle of the respective parameters derived from the rest of the datasets, with very little inter-annual variability. For the first time, MSF estimated using the four reanalysis datasets (JRA55, NCEP, MERRA and ECMWF) is verified against observations from the Integrated Global Radiosonde Archive, using the process of subsampling. The features so estimated show a high degree of similarity amongst each other as well as with observations. The monthly trend in the total width of the HC is quantified to show a maximum expansion during the month of July, which is significant at the 95 % confidence level for all datasets. The present paper also discusses the presence of a `minor circulation' feature in the northern hemisphere which is centered on 34°N during the June and July months, but not in all years. The significance of the present study lies in evaluating the relatively new JRA55 dataset against widely used reanalysis datasets and radiosonde observations, and in the revelation of a minor circulation not discussed hitherto in the context of HC dynamics.
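
    The mean meridional stream function used throughout this study follows from vertically integrating the zonal-mean meridional wind. A sketch of that standard diagnostic (not the authors' code; the level ordering and units are assumptions stated in the docstring):

    ```python
    import numpy as np

    A_EARTH, G = 6.371e6, 9.81  # Earth radius (m), gravity (m s-2)

    def meridional_streamfunction(v_zm, lat_deg, p_hpa):
        """psi(lat, p) = (2 pi a cos(lat) / g) * integral_top^p [v] dp'.

        v_zm: zonal-mean meridional wind (n_levels, n_lat), levels ordered
        from low to high pressure. Returns psi in kg s-1; HC edges are
        commonly read off as sign changes of psi at ~500 hPa.
        """
        p = np.asarray(p_hpa, dtype=float) * 100.0     # Pa
        dp = np.diff(p)
        layer = 0.5 * (v_zm[1:] + v_zm[:-1]) * dp[:, None]
        integral = np.vstack([np.zeros(v_zm.shape[1]), np.cumsum(layer, axis=0)])
        return 2.0 * np.pi * A_EARTH * np.cos(np.deg2rad(lat_deg)) * integral / G

    lat = np.linspace(-87.5, 87.5, 36)
    p = np.linspace(100.0, 1000.0, 10)                 # hPa
    v = np.random.default_rng(0).normal(size=(10, 36)) # placeholder winds
    print(meridional_streamfunction(v, lat, p).shape)  # (10, 36)
    ```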

  10. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets.

    PubMed

    Wang, Juan; Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. So far, many methods have been presented in this area, in which most efficient methods are based on the incompatible graph, such as the CASS, the LNETWORK, and the BIMLR. This paper will research the commonness of the methods based on the incompatible graph, the relationship between incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find out all the simplest datasets for a topology G and construct a network for every dataset. For any one dataset 𝒞, we can compute a network from the network representing the simplest dataset which is isomorphic to 𝒞. This process will save more time for the algorithms when constructing networks. PMID:27547759

  11. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets

    EPA Science Inventory

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at whic...

  12. Comparison of Eight Different Precipitation Datasets for South America

    NASA Astrophysics Data System (ADS)

    Pinto, L. C.; Costa, M. H.; Diniz, L. F.

    2007-05-01

    Long and continuous meteorological data series for large areas are hard to obtain, so several groups have developed climate datasets generated through the combination of models with observed and remote sensing data, including reanalysis products. This study compares eight different precipitation datasets for South America (NCEP/NCAR-2, ERA-40, CMAP, GPCP, CRU, CPTEC, TRMM, Legates and Willmott, Leemans and Cramer). For each dataset, we analyze the four moments of the data distribution (mean, variance, skewness, kurtosis), for latitudinal variation, for the major river basins and for the major vegetation types in the continent, allowing us to identify the geographical variations in each dataset. We verified that significant differences exist among the precipitation products.
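
    The four distribution moments compared here are quick to compute once a dataset is flattened to a sample. A minimal sketch with SciPy (illustrative only; the paper's stratification by latitude, basin and vegetation type is omitted):

    ```python
    import numpy as np
    from scipy import stats

    def four_moments(precip):
        """Mean, variance, skewness and excess kurtosis of a rainfall sample."""
        x = np.asarray(precip, dtype=float)
        return {"mean": x.mean(),
                "variance": x.var(ddof=1),
                "skewness": stats.skew(x),
                "kurtosis": stats.kurtosis(x)}   # Fisher: normal -> 0

    sample = np.random.default_rng(3).gamma(2.0, 3.0, 10_000)  # mm/month
    print(four_moments(sample))
    ```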

  13. A daily global mesoscale ocean eddy dataset from satellite altimetry.

    PubMed

    Faghmous, James H; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days as identified in the AVISO dataset over the period 1993-2014. This dataset, along with the open-source eddy identification software, allows users to extract eddies with any parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System.
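
    Filtering the trajectory collection by eddy parameters, as described above, is a simple group-and-filter operation. A sketch with pandas on a toy table (the column names are illustrative; the published files define their own schema):

    ```python
    import pandas as pd

    # Toy trajectory table standing in for the published dataset.
    tracks = pd.DataFrame({
        "track_id":  [1, 1, 1, 2, 2],
        "day":       [0, 1, 2, 0, 1],
        "radius_km": [120, 125, 118, 60, 58],
    })
    # Keep eddies tracked for >= 3 days with radius >= 100 km.
    long_lived = tracks.groupby("track_id").filter(lambda t: len(t) >= 3)
    large = long_lived[long_lived["radius_km"] >= 100]
    print(sorted(large["track_id"].unique()))   # -> [1]
    ```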

  14. Constructing Phylogenetic Networks Based on the Isomorphism of Datasets

    PubMed Central

    Zhang, Zhibin; Li, Yanjuan

    2016-01-01

    Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. So far, many methods have been presented in this area, in which most efficient methods are based on the incompatible graph, such as the CASS, the LNETWORK, and the BIMLR. This paper will research the commonness of the methods based on the incompatible graph, the relationship between incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find out all the simplest datasets for a topology G and construct a network for every dataset. For any one dataset 𝒞, we can compute a network from the network representing the simplest dataset which is isomorphic to 𝒞. This process will save more time for the algorithms when constructing networks. PMID:27547759

  15. Bayes classifiers for imbalanced traffic accidents datasets.

    PubMed

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident datasets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than those classified under the slight injuries class (majority). This, however, poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared across the imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents.
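
    Of the three balancing techniques, random oversampling is the easiest to sketch: minority rows are replicated until the classes balance. A hedged illustration with scikit-learn (not the authors' implementation):

    ```python
    import numpy as np
    from sklearn.utils import resample

    def oversample_minority(X, y, minority_label=1, seed=0):
        """Replicate minority-class rows until both classes are equal size."""
        idx_min = np.flatnonzero(y == minority_label)
        idx_maj = np.flatnonzero(y != minority_label)
        idx_up = resample(idx_min, replace=True,
                          n_samples=len(idx_maj), random_state=seed)
        keep = np.concatenate([idx_maj, idx_up])
        return X[keep], y[keep]

    rs = np.random.default_rng(4)
    X = rs.normal(size=(1000, 5))
    y = (rs.random(1000) < 0.05).astype(int)    # ~5% severe/killed class
    X_bal, y_bal = oversample_minority(X, y)
    print(np.bincount(y_bal))                   # balanced classes
    ```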

  16. Efficient genotype compression and analysis of large genetic variation datasets

    PubMed Central

    Layer, Ryan M.; Kindlon, Neil; Karczewski, Konrad J.; Quinlan, Aaron R.

    2015-01-01

    Genotype Query Tools (GQT) is a new indexing strategy that expedites analyses of genome variation datasets in VCF format based on sample genotypes, phenotypes and relationships. GQT’s compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size. We show substantial (up to 443-fold) performance gains over existing methods and demonstrate GQT’s utility for exploring massive datasets involving thousands to millions of genomes. PMID:26550772
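
    The core idea of querying genotypes through per-class indexes rather than decompressed records can be caricatured in a few lines. This toy uses plain boolean arrays; GQT's actual compressed, word-aligned index layout is far more involved:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    genotypes = rng.integers(0, 3, size=(1000, 500))   # variants x samples
    # One bitmap per genotype class (0=hom-ref, 1=het, 2=hom-alt).
    bitmaps = [genotypes == g for g in range(3)]
    # Sample-subset queries then become fast bitwise operations:
    case_mask = np.zeros(500, dtype=bool)
    case_mask[:250] = True
    het_in_cases = (bitmaps[1] & case_mask).sum(axis=1)  # per-variant counts
    print(het_in_cases[:5])
    ```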

  17. Sampling Within k-Means Algorithm to Cluster Large Datasets

    SciTech Connect

    Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George

    2011-08-01

    Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study, both on more varied test datasets and on real weather datasets; this is especially important considering that this preliminary study was performed on rather tame datasets. Future studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes; we would like to analyze this further to see how accurate the algorithm is for even lower sample sizes, finding the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
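
    The sampling idea itself is compact: fit k-means on a random subset, then assign every point to the learned centroids. A hedged sketch (the report's sample-size rules based on width and confidence level are not reproduced here):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def sampled_kmeans(X, k, frac=0.1, seed=0):
        """Fit k-means on a sample; label the full dataset with its centroids."""
        rs = np.random.default_rng(seed)
        size = max(int(frac * len(X)), k)
        sample = X[rs.choice(len(X), size, replace=False)]
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(sample)
        return km, km.predict(X)

    X = np.random.default_rng(5).normal(size=(100_000, 4))
    model, labels = sampled_kmeans(X, k=8)      # ~10x less data in the fit
    ```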

  18. Toward computational cumulative biology by combining models of biological datasets.

    PubMed

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations; for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
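
    The decomposition step, expressing a new dataset as contributions from earlier models, can be caricatured as a non-negative least-squares problem. This is an analogy, not the paper's actual combination model; all names and shapes below are invented for the example:

    ```python
    import numpy as np
    from scipy.optimize import nnls

    rs = np.random.default_rng(1)
    B = rs.random((500, 12))    # columns: summaries of 12 earlier dataset models
    d = B @ np.array([0.7, 0.3] + [0.0] * 10) + 0.01 * rs.normal(size=500)
    weights, residual = nnls(B, d)      # non-negative contributions
    print(np.round(weights, 2))         # mass concentrates on models 0 and 1
    ```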

  19. Toward Computational Cumulative Biology by Combining Models of Biological Datasets

    PubMed Central

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176

  20. Precipitation comparison for the CFSR, MERRA, TRMM3B42 and Combined Scheme datasets in Bolivia

    NASA Astrophysics Data System (ADS)

    Blacutt, Luis A.; Herdies, Dirceu L.; de Gonçalves, Luis Gustavo G.; Vila, Daniel A.; Andrade, Marcos

    2015-09-01

    An overwhelming number of applications depend on reliable precipitation estimations. However, over complex terrain in regions such as the Andes or the southwestern Amazon, the spatial coverage of rain gauges is scarce. Two reanalysis datasets, a satellite algorithm and a scheme that combines surface observations with satellite estimations were selected for studying rainfall in the following areas of Bolivia: the central Andes, Altiplano, southwestern Amazonia, and Chaco. These Bolivian regions can be divided into three main basins: the Altiplano, La Plata, and Amazon. The selected reanalyses were the Modern-Era Retrospective Analysis for Research and Applications, which has a horizontal resolution (~ 50 km) suited to studying rainfall in relatively small precipitation systems, and the Climate Forecast System Reanalysis and Reforecast, which features an improved horizontal resolution (~ 38 km). The third dataset was the seventh version of the Tropical Rainfall Measuring Mission 3B42 algorithm, which provides rainfall estimates at an ~ 25 km horizontal resolution. The fourth dataset utilizes a new technique known as the Combined Scheme, which successfully removes satellite bias. All four of these datasets were aggregated to a coarser resolution. Additionally, the daily totals were calculated to match the cumulative daily values of the ground observations. This research aimed to describe and compare precipitation from the two reanalysis datasets, the satellite-algorithm dataset, and the Combined Scheme with ground observations. Two seasons were selected for studying the precipitation estimates: the rainy season (December-February) and the dry season (June-August). The average, bias, standard deviation, correlation coefficient, and root mean square error were calculated. Moreover, a contingency table was generated to calculate the accuracy, frequency bias, probability of detection, false alarm ratio, and equitable threat score. All four datasets correctly
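
    The contingency-table scores listed above follow standard definitions. A minimal sketch (the 1 mm/day rain/no-rain threshold is an assumption made for the example):

    ```python
    import numpy as np

    def skill_scores(obs, est, threshold=1.0):
        """POD, FAR and ETS from a 2x2 rain/no-rain contingency table."""
        o, e = np.asarray(obs) >= threshold, np.asarray(est) >= threshold
        hits = np.sum(o & e)
        misses = np.sum(o & ~e)
        false_alarms = np.sum(~o & e)
        hits_random = (hits + misses) * (hits + false_alarms) / o.size
        pod = hits / (hits + misses)
        far = false_alarms / (hits + false_alarms)
        ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
        return pod, far, ets

    obs = np.array([0.0, 3.0, 0.0, 5.0, 2.0])   # gauge totals, mm/day
    est = np.array([0.0, 2.0, 1.5, 4.0, 0.0])   # gridded estimate
    print(skill_scores(obs, est))
    ```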

  1. Evaluation of Global Observations-Based Evapotranspiration Datasets and IPCC AR4 Simulations

    NASA Technical Reports Server (NTRS)

    Mueller, B.; Seneviratne, S. I.; Jimenez, C.; Corti, T.; Hirschi, M.; Balsamo, G.; Ciais, P.; Dirmeyer, P.; Fisher, J. B.; Guo, Z.; Jung, M.; Maignan, F.; McCabe, M. F.; Reichle, R.; Reichstein, M.; Rodell, M.; Sheffield, J.; Teuling, A. J.; Wang, K.; Wood, E. F.; Zhang, Y.

    2011-01-01

    Quantification of global land evapotranspiration (ET) has long been associated with large uncertainties due to the lack of reference observations. Several recently developed products now provide the capacity to estimate ET at global scales. These products, partly based on observational data, include satellite-based products, land surface model (LSM) simulations, atmospheric reanalysis output, estimates based on empirical upscaling of eddy-covariance flux measurements, and atmospheric water balance datasets. The LandFlux-EVAL project aims to evaluate and compare these newly developed datasets. Additionally, an evaluation of IPCC AR4 global climate model (GCM) simulations is presented, providing an assessment of their capacity to reproduce flux behavior relative to the observations-based products. Though differently constrained with observations, the analyzed reference datasets display similar large-scale ET patterns. ET from the IPCC AR4 simulations was significantly smaller than that from the other products for India (up to 1 mm/d) and parts of eastern South America, and larger in the western USA, Australia and China. The inter-product variance is lower across the IPCC AR4 simulations than across the reference datasets in several regions, which indicates that uncertainties may be underestimated in the IPCC AR4 models due to shared biases of these simulations.

  2. Trajectory-Based Flow Feature Tracking in Joint Particle/Volume Datasets.

    PubMed

    Sauer, Franz; Yu, Hongfeng; Ma, Kwan-Liu

    2014-12-01

    Studying the dynamic evolution of time-varying volumetric data is essential in countless scientific endeavors. The ability to isolate and track features of interest allows domain scientists to better manage large complex datasets both in terms of visual understanding and computational efficiency. This work presents a new trajectory-based feature tracking technique for use in joint particle/volume datasets. While traditional feature tracking approaches generally require a high temporal resolution, this method utilizes the indexed trajectories of corresponding Lagrangian particle data to efficiently track features over large jumps in time. Such a technique is especially useful for situations where the volume dataset is either temporally sparse or too large to efficiently track a feature through all intermediate timesteps. In addition, this paper presents a few other applications of this approach, such as the ability to efficiently track the internal properties of volumetric features using variables from the particle data. We demonstrate the effectiveness of this technique using real world combustion and atmospheric datasets and compare it to existing tracking methods to justify its advantages and accuracy. PMID:26356970
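
    Tracking over large temporal jumps works because particles carry identity between timesteps: a feature at a later time matches the earlier feature whose member particles it shares. A toy sketch of matching by particle-ID overlap (the Jaccard threshold is an assumed parameter; the paper's trajectory indexing is more elaborate):

    ```python
    def match_features(features_t0, features_t1, threshold=0.3):
        """Map each t0 feature to the t1 feature sharing most particle IDs."""
        matches = {}
        for fid0, ids0 in features_t0.items():
            best, score = None, threshold
            for fid1, ids1 in features_t1.items():
                jaccard = len(ids0 & ids1) / len(ids0 | ids1)
                if jaccard > score:
                    best, score = fid1, jaccard
            matches[fid0] = best
        return matches

    t0 = {0: {1, 2, 3, 4}, 1: {10, 11}}     # feature id -> particle IDs
    t1 = {5: {2, 3, 4, 99}, 6: {42}}
    print(match_features(t0, t1))           # -> {0: 5, 1: None}
    ```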

  3. Northern Hemisphere winter storm track trends since 1959 derived from multiple reanalysis datasets

    NASA Astrophysics Data System (ADS)

    Chang, Edmund K. M.; Yau, Albert M. W.

    2016-09-01

    In this study, a comprehensive comparison of Northern Hemisphere winter storm track trends since 1959 derived from multiple reanalysis datasets and rawinsonde observations has been conducted. In addition, trends in terms of variance and cyclone track statistics have been compared. Previous studies, based largely on the National Centers for Environmental Prediction-National Center for Atmospheric Research Reanalysis (NNR), have suggested that both the Pacific and Atlantic storm tracks significantly intensified between the 1950s and 1990s. Comparison with trends derived from rawinsonde observations suggests that the trends derived from NNR are significantly biased high, while those from the European Centre for Medium-Range Weather Forecasts 40-year Reanalysis and the Japanese 55-year Reanalysis are much less biased but still too high. Those from the two twentieth century reanalysis datasets are most consistent with observations but may exhibit slight biases of opposite signs. Between 1959 and 2010, Pacific storm track activity has likely increased by 10 % or more, while Atlantic storm track activity has likely increased by <10 %. Our analysis suggests that trends in Pacific and Atlantic basin-wide storm track activity prior to the 1950s derived from the two twentieth century reanalysis datasets are unlikely to be reliable due to changes in the density of surface observations. Nevertheless, these datasets may provide useful information on interannual variability, especially over the Atlantic.

  4. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. PMID:25690853
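
    The imputation idea, predicting a missing signal track from correlated marks and samples with tree ensembles, can be sketched with a stock regressor. This is an analogy only; ChromImpute's feature construction and ensemble differ from this toy:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rs = np.random.default_rng(0)
    # Hypothetical predictors: other marks in this sample plus the same
    # mark in other samples, one row per genomic position.
    features = rs.random((5000, 15))
    target = features @ rs.random(15) + 0.1 * rs.normal(size=5000)
    observed = rs.random(5000) < 0.5          # positions with measured target
    model = RandomForestRegressor(n_estimators=50, n_jobs=-1)
    model.fit(features[observed], target[observed])
    imputed = model.predict(features[~observed])   # imputed signal values
    ```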

  5. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

    PubMed

    Ernst, Jason; Kellis, Manolis

    2015-04-01

    With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

  6. Nine martian years of dust optical depth observations: A reference dataset

    NASA Astrophysics Data System (ADS)

    Montabone, Luca; Forget, Francois; Kleinboehl, Armin; Kass, David; Wilson, R. John; Millour, Ehouarn; Smith, Michael; Lewis, Stephen; Cantor, Bruce; Lemmon, Mark; Wolff, Michael

    2016-07-01

    We present a multi-annual reference dataset of the horizontal distribution of airborne dust from martian year 24 to 32 using observations of the martian atmosphere from April 1999 to June 2015 made by the Thermal Emission Spectrometer (TES) aboard Mars Global Surveyor, the Thermal Emission Imaging System (THEMIS) aboard Mars Odyssey, and the Mars Climate Sounder (MCS) aboard Mars Reconnaissance Orbiter (MRO). Our methodology to build the dataset works by gridding the available retrievals of column dust optical depth (CDOD) from TES and THEMIS nadir observations, as well as the estimates of this quantity from MCS limb observations. The resulting (irregularly) gridded maps (one per sol) were validated with independent observations of CDOD by PanCam cameras and Mini-TES spectrometers aboard the Mars Exploration Rovers "Spirit" and "Opportunity", by the Surface Stereo Imager aboard the Phoenix lander, and by the Compact Reconnaissance Imaging Spectrometer for Mars aboard MRO. Finally, regular maps of CDOD are produced by spatially interpolating the irregularly gridded maps using a kriging method. These latter maps are used as dust scenarios in the Mars Climate Database (MCD) version 5, and are useful in many modelling applications. The two datasets (daily irregularly gridded maps and regularly kriged maps) for the nine available martian years are publicly available as NetCDF files and can be downloaded from the MCD website at the URL: http://www-mars.lmd.jussieu.fr/mars/dust_climatology/index.html

  7. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging

    PubMed Central

    Rosa, Maria J.; Mehta, Mitul A.; Pich, Emilio M.; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A. T. S.; Williams, Steve C. R.; Dazzan, Paola; Doyle, Orla M.; Marquand, Andre F.

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow. PMID:26528117
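
    The sparsity-inducing regularization in SCCA can be sketched with alternating soft-thresholded power iterations on the cross-covariance, a simplified penalized-matrix-decomposition scheme rather than the authors' exact algorithm; the penalty values here are illustrative:

    ```python
    import numpy as np

    def soft(a, lam):
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    def scca(X, Y, lam_u=0.1, lam_v=0.1, iters=200):
        """One pair of sparse canonical vectors for column-centred X, Y."""
        C = X.T @ Y
        v = np.ones(C.shape[1]) / np.sqrt(C.shape[1])
        for _ in range(iters):
            u = soft(C @ v, lam_u)
            u /= np.linalg.norm(u) + 1e-12
            v = soft(C.T @ u, lam_v)
            v /= np.linalg.norm(v) + 1e-12
        return u, v

    rs = np.random.default_rng(6)
    X = rs.normal(size=(80, 40))
    Y = np.hstack([X[:, :3] @ rs.normal(size=(3, 5)), rs.normal(size=(80, 20))])
    u, v = scca(X - X.mean(0), Y - Y.mean(0))
    print(np.flatnonzero(u), np.flatnonzero(v))   # sparse supports
    ```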

  8. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging.

    PubMed

    Rosa, Maria J; Mehta, Mitul A; Pich, Emilio M; Risterucci, Celine; Zelaya, Fernando; Reinders, Antje A T S; Williams, Steve C R; Dazzan, Paola; Doyle, Orla M; Marquand, Andre F

    2015-01-01

    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation, by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labeling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow. PMID:26528117

  9. Making of a solar spectral irradiance dataset I: observations, uncertainties, and methods

    NASA Astrophysics Data System (ADS)

    Schöll, Micha; Dudok de Wit, Thierry; Kretzschmar, Matthieu; Haberreiter, Margit

    2016-03-01

    Context. Changes in the spectral solar irradiance (SSI) are a key driver of the variability of the Earth's environment, strongly affecting the upper atmosphere, but also impacting climate. However, its measurements have been sparse and of different quality. The "First European Comprehensive Solar Irradiance Data Exploitation project" (SOLID) aims at merging the complete set of European irradiance data, complemented by archive data that include data from non-European missions. Aims: As part of SOLID, we present all available space-based SSI measurements, reference spectra, and relevant proxies in a unified format with regular temporal re-gridding, interpolation, gap-filling as well as associated uncertainty estimations. Methods: We apply a coherent methodology to all available SSI datasets. Our pipeline approach consists of the pre-processing of the data, the interpolation of missing data by utilizing the spectral coherency of SSI, the temporal re-gridding of the data, an instrumental outlier detection routine, and a proxy-based interpolation for missing and flagged values. In particular, to detect instrumental outliers, we combine an autoregressive model with proxy data. We independently estimate the precision and stability of each individual dataset and flag all changes due to processing in an accompanying quality mask. Results: We present a unified database of solar activity records with accompanying meta-data and uncertainties. Conclusions: This dataset can be used for further investigations of the long-term trend of solar activity and the construction of a homogeneous SSI record.
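
    The outlier-detection step, an autoregressive model combined with proxy data, can be sketched as flagging large residuals from an AR(1)-plus-proxy fit. This is a crude stand-in for the SOLID pipeline, and the 5-sigma cut is an assumed parameter:

    ```python
    import numpy as np

    def flag_outliers(x, proxy, k=5.0):
        """Flag points whose AR(1)+proxy prediction residual exceeds k sigma."""
        A = np.column_stack([x[:-1], proxy[1:], np.ones(len(x) - 1)])
        coef, *_ = np.linalg.lstsq(A, x[1:], rcond=None)
        resid = x[1:] - A @ coef
        bad = np.abs(resid) > k * resid.std()
        return np.concatenate([[False], bad])

    rs = np.random.default_rng(8)
    proxy = np.sin(np.arange(500) / 50.0)          # e.g. a solar proxy
    x = 0.8 * proxy + 0.05 * rs.normal(size=500)   # instrument record
    x[200] += 1.0                                  # injected glitch
    print(np.flatnonzero(flag_outliers(x, proxy))) # flags ~index 200
    ```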

  10. Assessing global land cover reference datasets for different user communities

    NASA Astrophysics Data System (ADS)

    Tsendbazar, N. E.; de Bruin, S.; Herold, M.

    2015-05-01

    Global land cover (GLC) maps and assessments of their accuracy provide important information for different user communities. To date, there are several GLC reference datasets which are used for assessing the accuracy of specific maps. Despite significant efforts put into generating them, their availability and role in applications outside their intended use have been very limited. This study analyses metadata information from 12 existing and forthcoming GLC reference datasets and assesses their characteristics and potential uses in the context of 4 GLC user groups, i.e., climate modellers requiring data on Essential Climate Variables (ECV), global forest change analysts, the GEO Community of Practice for Global Agricultural Monitoring and GLC map producers. We assessed user requirements with respect to the sampling scheme, thematic coverage, spatial and temporal detail and quality control of the GLC reference datasets. Suitability of the datasets is highly dependent upon specific applications by the user communities considered. The LC-CCI, GOFC-GOLD, FAO-FRA and Geo-Wiki datasets had the broadest applicability for multiple uses. The re-usability of the GLC reference datasets would be greatly enhanced by making them publicly available in an expert framework that guides users on how to use them for specific applications.

  11. Identification of sample annotation errors in gene expression datasets.

    PubMed

    Lohr, Miriam; Hellwig, Birte; Edlund, Karolina; Mattsson, Johanna S M; Botling, Johan; Schmidt, Marcus; Hengstler, Jan G; Micke, Patrick; Rahnenführer, Jörg

    2015-12-01

    The comprehensive transcriptomic analysis of clinically annotated human tissue has found widespread use in oncology, cell biology, immunology, and toxicology. In cancer research, microarray-based gene expression profiling has successfully been applied to subclassify disease entities, predict therapy response, and identify cellular mechanisms. Public accessibility of raw data, together with corresponding information on clinicopathological parameters, offers the opportunity to reuse previously analyzed data and to gain statistical power by combining multiple datasets. However, results and conclusions obviously depend on the reliability of the available information. Here, we propose gene expression-based methods for identifying sample misannotations in public transcriptomic datasets. Sample mix-up can be detected by a classifier that differentiates between samples from male and female patients. Correlation analysis identifies multiple measurements of material from the same sample. The analysis of 45 datasets (including 4913 patients) revealed that erroneous sample annotation, affecting 40 % of the analyzed datasets, may be a more widespread phenomenon than previously thought. Removal of erroneously labelled samples may influence the results of the statistical evaluation in some datasets. Our methods may help to identify individual datasets that contain numerous discrepancies and could be routinely included into the statistical analysis of clinical gene expression data.
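
    The correlation check for repeated measurements of the same material reduces to flagging suspiciously similar expression profiles. A minimal sketch (the 0.99 cutoff is an illustrative choice, not the paper's tuned threshold):

    ```python
    import numpy as np

    def flag_duplicate_samples(expr, threshold=0.99):
        """Return sample pairs whose profiles correlate above threshold.

        expr: genes x samples expression matrix.
        """
        r = np.corrcoef(expr.T)
        iu = zip(*np.triu_indices_from(r, k=1))
        return [(i, j, r[i, j]) for i, j in iu if r[i, j] > threshold]

    rs = np.random.default_rng(9)
    expr = rs.normal(size=(2000, 8))
    expr[:, 7] = expr[:, 0] + 0.01 * rs.normal(size=2000)  # hidden duplicate
    print(flag_duplicate_samples(expr))                    # flags pair (0, 7)
    ```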

  12. Developing a regional retrospective ensemble precipitation dataset for watershed hydrology modeling, Idaho, USA

    NASA Astrophysics Data System (ADS)

    Flores, A. N.; Smith, K.; LaPorte, P.

    2011-12-01

    Applications like flood forecasting, military trafficability assessment, and slope stability analysis necessitate the use of models capable of resolving hydrologic states and fluxes at spatial scales of hillslopes (e.g., 10s to 100s m). These models typically require precipitation forcings at spatial scales of kilometers or better and time intervals of hours. Yet in especially rugged terrain that typifies much of the Western US and throughout much of the developing world, precipitation data at these spatiotemporal resolutions is difficult to come by. Ground-based weather radars have significant problems in high-relief settings and are sparsely located, leaving significant gaps in coverage and high uncertainties. Precipitation gages provide accurate data at points but are very sparsely located and their placement is often not representative, yielding significant coverage gaps in a spatial and physiographic sense. Numerical weather prediction efforts have made precipitation data, including critically important information on precipitation phase, available globally and in near real-time. However, these datasets present watershed modelers with two problems: (1) spatial scales of many of these datasets are tens of kilometers or coarser, (2) numerical weather models used to generate these datasets include a land surface parameterization that in some circumstances can significantly affect precipitation predictions. We report on the development of a regional precipitation dataset for Idaho that leverages: (1) a dataset derived from a numerical weather prediction model, (2) gages within Idaho that report hourly precipitation data, and (3) a long-term precipitation climatology dataset. Hourly precipitation estimates from the Modern Era Retrospective-analysis for Research and Applications (MERRA) are stochastically downscaled using a hybrid orographic and statistical model from their native resolution (1/2 x 2/3 degrees) to a resolution of approximately 1 km. Downscaled
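
    The downscaling chain can be caricatured as interpolation plus a multiplicative climatology adjustment. This sketch is far simpler than the hybrid orographic/stochastic model described above, and all grids below are invented for the example:

    ```python
    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    def downscale(precip_c, climo_c, lat_c, lon_c, climo_f, lat_f, lon_f):
        """Bilinear interpolation rescaled by a fine-grid climatology."""
        pts = np.stack(np.meshgrid(lat_f, lon_f, indexing="ij"), -1).reshape(-1, 2)

        def on_fine(field):
            f = RegularGridInterpolator((lat_c, lon_c), field,
                                        bounds_error=False, fill_value=None)
            return f(pts).reshape(len(lat_f), len(lon_f))

        scale = climo_f / np.maximum(on_fine(climo_c), 1e-6)
        return on_fine(precip_c) * scale

    rs = np.random.default_rng(10)
    lat_c, lon_c = np.linspace(42, 49, 8), np.linspace(-117, -111, 7)
    lat_f, lon_f = np.linspace(42, 49, 71), np.linspace(-117, -111, 61)
    precip_c, climo_c = rs.random((8, 7)), 0.5 + rs.random((8, 7))
    climo_f = 0.5 + rs.random((71, 61))
    print(downscale(precip_c, climo_c, lat_c, lon_c, climo_f, lat_f, lon_f).shape)
    ```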

  13. Independent NOAA considered

    NASA Astrophysics Data System (ADS)

    Richman, Barbara T.

    A proposal to pull the National Oceanic and Atmospheric Administration (NOAA) out of the Department of Commerce and make it an independent agency was the subject of a recent congressional hearing. Supporters within the science community and in Congress said that an independent NOAA will benefit by being more visible and by not being tied to a cabinet-level department whose main concerns lie elsewhere. The proposal's critics, however, cautioned that making NOAA independent could make it even more vulnerable to the budget axe and would sever the agency's direct access to the President.The separation of NOAA from Commerce was contained in a June 1 proposal by President Ronald Reagan that also called for all federal trade functions under the Department of Commerce to be reorganized into a new Department of International Trade and Industry (DITI).

  14. Independent technical review, handbook

    SciTech Connect

    Not Available

    1994-02-01

    Purpose: Provide an independent engineering review of the major projects being funded by the Department of Energy, Office of Environmental Restoration and Waste Management. The independent engineering review will address questions of whether the engineering practice is sufficiently developed to a point where a major project can be executed without significant technical problems. The independent review will focus on questions related to: (1) Adequacy of development of the technical base of understanding; (2) Status of development and availability of technology among the various alternatives; (3) Status and availability of the industrial infrastructure to support project design, equipment fabrication, facility construction, and process and program/project operation; (4) Adequacy of the design effort to provide a sound foundation to support execution of the project; (5) Ability of the organization to fully integrate the system, and direct, manage, and control the execution of a complex major project.

  15. SALSA: A Novel Dataset for Multimodal Group Behavior Analysis.

    PubMed

    Alameda-Pineda, Xavier; Staiano, Jacopo; Subramanian, Ramanathan; Batrinca, Ligia; Ricci, Elisa; Lepri, Bruno; Lanz, Oswald; Sebe, Nicu

    2016-08-01

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) To alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising the microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa. PMID:26540677

  16. Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment

    PubMed Central

    2013-01-01

    Background Biopsies taken from individual tumours exhibit extensive differences in their cellular composition due to the inherent heterogeneity of cancers and vagaries of sample collection. As a result, genes expressed in specific cell types, or associated with certain biological processes, are detected at widely variable levels across samples in transcriptomic analyses. This heterogeneity also means that the expression level of genes expressed specifically in a given cell type or process will vary in line with the number of those cells within samples or the activity of the pathway, and such genes will therefore be correlated in their expression. Results Using a novel 3D network-based approach we have analysed six large human cancer microarray datasets derived from more than 1,000 individuals. Based upon this analysis, and without needing to isolate the individual cells, we have defined a broad spectrum of cell-type- and pathway-specific gene signatures present in cancer expression data which were also found to be largely conserved in a number of independent datasets. Conclusions The conserved signature of the tumour-associated macrophage is shown to be largely independent of tumour cell type. All stromal cell signatures have some degree of correlation with each other, since they must all be inversely correlated with the tumour component. However, viewed in the context of established tumours, the interactions between stromal components appear to be multifactorial, given that the level of one component, e.g. vasculature, does not correlate tightly with another, such as the macrophage. PMID:23845084

  17. Discovery and Analysis of Intersecting Datasets: JMARS as a Comparative Science Platform

    NASA Astrophysics Data System (ADS)

    Carter, S.; Christensen, P. R.; Dickenshied, S.; Anwar, S.; Noss, D.

    2014-12-01

    sources under the given area. JMARS has the ability to geographically locate and display a vast array of remote sensing data for a user. In addition to its powerful searching ability, it also enables users to compare datasets using the Data Spike and Data Profile techniques. Plots and tables from this data can be exported and used in presentations, papers, or external software for further study.

  18. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    NASA Astrophysics Data System (ADS)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be
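
    GLEAM's evaporation estimates build on a Priestley and Taylor-type formulation for potential evaporation. A textbook sketch of that equation (the constants are standard values, not GLEAM's exact configuration):

    ```python
    import numpy as np

    def priestley_taylor(rn, g, t_celsius, alpha=1.26):
        """Potential evaporation (W m-2): alpha * D/(D+gamma) * (Rn - G)."""
        gamma = 0.066                                           # kPa K-1
        es = 0.6108 * np.exp(17.27 * t_celsius / (t_celsius + 237.3))
        delta = 4098.0 * es / (t_celsius + 237.3) ** 2          # kPa K-1
        return alpha * delta / (delta + gamma) * (rn - g)

    print(priestley_taylor(rn=400.0, g=40.0, t_celsius=25.0))   # ~336 W m-2
    ```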

  19. The CRUTEM4 land-surface air temperature dataset: construction, previous versions and dissemination via Google Earth

    NASA Astrophysics Data System (ADS)

    Osborn, T. J.; Jones, P. D.

    2013-10-01

    The CRUTEM4 (Climatic Research Unit Temperature version 4) land-surface air temperature dataset is one of the most widely used records of the climate system. Here we provide an important additional dissemination route for this dataset: online access to monthly, seasonal and annual data values and timeseries graphs via Google Earth. This is achieved via an interface written in Keyhole Markup Language (KML) and also provides access to the underlying weather station data used to construct the CRUTEM4 dataset. A mathematical description of the construction of the CRUTEM4 dataset (and its predecessor versions) is also provided, together with an archive of some previous versions and a recommendation for identifying the precise version of the dataset used in a particular study. The CRUTEM4 dataset used here is available from doi:10.5285/EECBA94F-62F9-4B7C-88D3-482F2C93C468.

  20. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity

    PubMed Central

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-01-01

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103–189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven’s Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association. PMID:26978040

  1. A test-retest dataset for assessing long-term reliability of brain morphology and resting-state brain activity.

    PubMed

    Huang, Lijie; Huang, Taicheng; Zhen, Zonglei; Liu, Jia

    2016-03-15

    We present a test-retest dataset for evaluation of long-term reliability of measures from structural and resting-state functional magnetic resonance imaging (sMRI and rfMRI) scans. The repeated scan dataset was collected from 61 healthy adults in two sessions using highly similar imaging parameters at an interval of 103-189 days. However, as the imaging parameters were not completely identical, the reliability estimated from this dataset shall reflect the lower bounds of the true reliability of sMRI/rfMRI measures. Furthermore, in conjunction with other test-retest datasets, our dataset may help explore the impact of different imaging parameters on reliability of sMRI/rfMRI measures, which is especially critical for assessing datasets collected from multiple centers. In addition, intelligence quotient (IQ) was measured for each participant using Raven's Advanced Progressive Matrices. The data can thus be used for purposes other than assessing reliability of sMRI/rfMRI alone. For example, data from each single session could be used to associate structural and functional measures of the brain with the IQ metrics to explore brain-IQ association.
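
    Long-term test-retest reliability of a scan-rescan measure of this kind is conventionally summarized with an intraclass correlation coefficient. The sketch below implements the standard ICC(2,1) formula (Shrout and Fleiss: two-way random effects, absolute agreement) on simulated data shaped like this dataset, 61 subjects by 2 sessions. It is a generic illustration, not code from the dataset's authors.

```python
import numpy as np

def icc_2_1(x):
    """Two-way random effects, absolute agreement ICC(2,1).
    x: (n subjects, k sessions) matrix of one brain measure."""
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)
    sse = np.sum((x - x.mean(axis=1, keepdims=True)
                    - x.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = sse / ((n - 1) * (k - 1))
    return ((ms_rows - ms_err)
            / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n))

rng = np.random.default_rng(9)
trait = rng.normal(100, 15, 61)                   # stable per-subject effect
scans = np.stack([trait + rng.normal(0, 5, 61) for _ in range(2)], axis=1)
print(f"ICC(2,1) = {icc_2_1(scans):.2f}")         # high value = reliable measure
```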

  2. Fluoroscopic "heart chamber" anatomy - the case for imaging modality-independent terminology.

    PubMed

    Piazza, Nicolo; Mylotte, Darren; Theriault Lauzier, Pascal

    2016-09-18

    Interventional cardiologists have traditionally relied upon fluoroscopic imaging for percutaneous coronary interventions. Transcatheter structural heart interventions, however, require additional imaging modalities such as echocardiography and multislice computed tomography (MSCT) for pre-, intra- and post-procedural assistance. MSCT has emerged as the critical imaging modality for patient and device selection prior to transcatheter structural heart interventions. MSCT is unique as it provides a complete 3-dimensional (3D) dataset of the heart and vasculature that is amenable to multiplanar reconstruction for 2-dimensional (2D) or volume-rendered interpretations. Herein, we present a modality-independent terminology for understanding volumetric images in the context of transcatheter heart valve therapies. The goal of this system is to allow physicians to readily interpret the orientation of fluoroscopic, MSCT, echocardiographic and MRI images, thus generalising their understanding of cardiac anatomy to all imaging modalities. PMID:27640046

  3. Independence and Survival.

    ERIC Educational Resources Information Center

    James, H. Thomas

    Independent schools that are of viable size, well managed, and strategically located to meet competition will survive and prosper past the current financial crisis. We live in a complex technological society with insatiable demands for knowledgeable people to keep it running. The future will be marked by the orderly selection of qualified people,…

  4. Independence, Disengagement, and Discipline

    ERIC Educational Resources Information Center

    Rubin, Ron

    2012-01-01

    School disengagement is linked to a lack of opportunities for students to fulfill their needs for independence and self-determination. Young people have little say about what, when, where, and how they will learn, the criteria used to assess their success, and the content of school and classroom rules. Traditional behavior management discourages…

  5. Caring about Independent Lives

    ERIC Educational Resources Information Center

    Christensen, Karen

    2010-01-01

    With the rhetoric of independence, new cash for care systems were introduced in many developed welfare states at the end of the 20th century. These systems allow local authorities to pay people who are eligible for community care services directly, to enable them to employ their own careworkers. Despite the obvious importance of the careworker's…

  6. Postcard from Independence, Mo.

    ERIC Educational Resources Information Center

    Archer, Jeff

    2004-01-01

    This article reports results showing that the Independence, Missouri school district failed to meet almost every one of its improvement goals under the No Child Left Behind Act. The state accreditation system stresses improvement over past scores, while the federal law demands specified amounts of annual progress toward the ultimate goal of 100…

  7. Independent School Governance.

    ERIC Educational Resources Information Center

    Beavis, Allan K.

    Findings of a study that examined the role of the governing body in the independent school's self-renewing processes are presented in this paper. From the holistic paradigm, the school is viewed as a self-renewing system that is able to maintain its identity despite environmental changes through existing structures that define and create…

  8. Experimental Investigation of Three Machine Learning Algorithms for ITS Dataset

    NASA Astrophysics Data System (ADS)

    Yearwood, J. L.; Kang, B. H.; Kelarev, A. V.

    The present article is devoted to an experimental investigation of the performance of three machine learning algorithms for the ITS dataset, in terms of their ability to achieve agreement with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, where rather sophisticated alignment scores have to be used as a measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space. This is why it is necessary to develop novel machine learning approaches to the analysis of datasets of this sort. This paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. It turns out that all three machine learning algorithms are efficient and can be used to automate future biologically significant classifications for datasets of this kind. A simplified version of a synthetic dataset, where the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers, is also presented.
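
    The key constraint described here, that alignment scores do not form a Minkowski metric and sequences cannot be embedded as points, means any classifier must consume a precomputed distance matrix directly. Below is a minimal sketch of the Nearest Neighbour baseline under that constraint, using a synthetic distance matrix in place of the paper's ITS alignment scores.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

# Pretend pairwise "alignment distances" between 30 sequences in two
# classes: smaller distances within a class than between classes.
labels = np.array([0] * 15 + [1] * 15)
same = labels[:, None] == labels[None, :]
d = np.where(same,
             rng.uniform(0.0, 0.4, (30, 30)),   # within-class distances
             rng.uniform(0.6, 1.0, (30, 30)))   # between-class distances
dist = (d + d.T) / 2                            # symmetrize
np.fill_diagonal(dist, 0.0)

# metric="precomputed" lets k-NN consume the distance matrix directly,
# with no embedding of the sequences into a vector space.
train, test = np.arange(0, 24), np.arange(24, 30)
knn = KNeighborsClassifier(n_neighbors=1, metric="precomputed")
knn.fit(dist[np.ix_(train, train)], labels[train])
print(knn.predict(dist[np.ix_(test, train)]))   # expect all 1s
```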

  9. Realistic computer network simulation for network intrusion detection dataset generation

    NASA Astrophysics Data System (ADS)

    Payer, Garrett

    2015-05-01

    The KDD-99 Cup dataset is dead. While it can continue to be used as a toy example, the age of this dataset makes it all but useless for intrusion detection research and data mining. Many of the attacks used within the dataset are obsolete and do not reflect the features important for intrusion detection in today's networks. Creating a new dataset encompassing a large cross section of the attacks found on the Internet today could be useful, but would eventually fall to the same problem as the KDD-99 Cup; its usefulness would diminish after a period of time. To continue research into intrusion detection, the generation of new datasets needs to be as dynamic and as quick as the attacker. Simply examining existing network traffic and using domain experts such as intrusion analysts to label traffic is inefficient, expensive, and not scalable. The only viable methodology is simulation using technologies including virtualization, attack-toolsets such as Metasploit and Armitage, and sophisticated emulation of threat and user behavior. Simulating actual user behavior and network intrusion events dynamically not only allows researchers to vary scenarios quickly, but enables online testing of intrusion detection mechanisms by interacting with data as it is generated. As new threat behaviors are identified, they can be added to the simulation to make quicker determinations as to the effectiveness of existing and ongoing network intrusion technology, methodology and models.

  10. The LANDFIRE Refresh strategy: updating the national dataset

    USGS Publications Warehouse

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  11. Dataset from chemical gas sensor array in turbulent wind tunnel.

    PubMed

    Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón

    2015-06-01

    The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements were sampling directly the environment. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors that were positioned at 6 different locations of the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data is related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al.[1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings.

  12. Securely measuring the overlap between private datasets with cryptosets.

    PubMed

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
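
    The core trick, estimating overlap from publicly shareable summaries, can be illustrated in a few lines. The sketch below is a simplified, non-cryptographic stand-in: each party hashes its identifiers into a fixed-length count vector, and the inner product of two vectors, corrected for chance collisions, estimates the overlap. The published method adds the cryptographic security analysis; all names and parameters here are invented.

```python
import hashlib

L = 512  # cryptoset length: fixed and public

def cryptoset(items):
    """Public summary of a private set: counts of items hashed into L bins."""
    counts = [0] * L
    for item in items:
        h = int(hashlib.sha256(item.encode()).hexdigest(), 16)
        counts[h % L] += 1
    return counts

def estimated_overlap(a, b, n_a, n_b):
    # Inner product, corrected for the expected chance collisions among
    # non-shared items under uniform hashing (~ n_a * n_b / L of the dot).
    dot = sum(x * y for x, y in zip(a, b))
    return (dot - n_a * n_b / L) / (1 - 1 / L)

set_a = {f"patient-{i}" for i in range(300)}
set_b = {f"patient-{i}" for i in range(200, 450)}   # 100 shared records
ca, cb = cryptoset(set_a), cryptoset(set_b)
print(round(estimated_overlap(ca, cb, len(set_a), len(set_b))))  # ~100
```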

  13. The role of dataset selection in cloud microphysics parameterization development

    NASA Astrophysics Data System (ADS)

    Kogan, Y. L.

    2009-12-01

    A number of cloud microphysical parameterizations have been developed during the last decade using various datasets of cloud drop spectra. These datasets can be obtained from observations, artificially produced by a drop size spectra generator (e.g. by solving the coagulation equation under different input conditions), or obtained as output of an LES model that predicts cloud drop spectra explicitly. Each of these methods has its deficiencies, for example in-situ aircraft observations being constrained to the flight path, and solutions of the coagulation equation depending on the input conditions. The ultimate aim is to create a cloud drop spectra dataset that realistically mimics drop parameters in real clouds. These parameters are closely related to the distribution of thermodynamical conditions, which are difficult, if not impossible, to obtain a priori. Using an LES model with explicit microphysics (SAMEX), we have demonstrated the high sensitivity of cloud parameterizations to the choice of dataset. We emphasize that the development of accurate parameterizations requires the use of a dynamically balanced cloud drop spectra dataset. The accuracy of conversion rates can be increased by scaling them with precipitation intensity. We also demonstrate that the accuracy of the saturation adjustment scheme employed in calculations of latent heat release can be increased by accounting for the aerosol load. Finally, we show how to formulate the new saturation adjustment in the framework of a two-moment cloud physics parameterization.

  14. Securely measuring the overlap between private datasets with cryptosets.

    PubMed

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898

  15. Dataset for a case report of a homozygous PEX16 F332del mutation

    PubMed Central

    Bacino, Carlos; Chao, Yu-Hsin; Seto, Elaine; Lotze, Tim; Xia, Fan; Jones, Richard O.; Moser, Ann; Wangler, Michael F.

    2015-01-01

    This dataset provides a clinical description along with extensive biochemical and molecular characterization of a patient with a homozygous mutation in PEX16 with an atypical phenotype. This patient, described in Molecular Genetics and Metabolism Reports, was ultimately diagnosed with an atypical peroxisomal disorder on exome sequencing. A clinical timeline and diagnostic summary, along with results of an extensive plasma and fibroblast analysis of this patient's peroxisomal profile, are provided. In addition, a table of additional variants from the exome analysis is provided. PMID:26870756

  16. Why Additional Presentations Help Identify a Stimulus

    ERIC Educational Resources Information Center

    Guest, Duncan; Kent, Christopher; Adelman, James S.

    2010-01-01

    Nosofsky (1983) reported that additional stimulus presentations within a trial increase discriminability in absolute identification, suggesting that each presentation creates an independent stimulus representation, but it remains unclear whether exposure duration or the formation of independent representations improves discrimination in such…

  17. obs4MIPS: Satellite Datasets for Model Evaluation

    NASA Astrophysics Data System (ADS)

    Ferraro, R.; Waliser, D. E.; Gleckler, P. J.

    2013-12-01

    This poster will review the current status of the obs4MIPs project, whose purpose is to provide a limited collection of well-established and documented datasets for comparison with Earth system models. These datasets have been reformatted to correspond with the CMIP5 model output requirements, and include technical documentation specifically targeted for their use in model output evaluation. There are currently over 50 datasets containing observations that directly correspond to CMIP5 model output variables. We will review the rationale and requirements for obs4MIPs contributions, and provide summary information about the current obs4MIPs holdings on the Earth System Grid Federation. We will also provide some usage statistics, an update on governance for the obs4MIPs project, and plans for supporting CMIP6.

  18. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    NASA Technical Reports Server (NTRS)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
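
    A merged analysis of this kind typically combines the input fields using their error estimates. The sketch below shows inverse-error-variance weighting for a single grid cell; the numbers are invented, and this illustrates the general principle rather than the GPCP merge algorithm itself.

```python
import numpy as np

# Monthly precipitation estimates (mm) for one 2.5-degree grid cell,
# with per-source error estimates; all values are invented.
estimates = np.array([120.0, 95.0, 105.0])   # microwave, IR, gauge
errors = np.array([30.0, 40.0, 10.0])        # 1-sigma error per source

# Inverse-error-variance weighting: the most certain source dominates.
w = 1.0 / errors**2
merged = np.sum(w * estimates) / np.sum(w)
merged_error = np.sqrt(1.0 / np.sum(w))
print(f"merged: {merged:.1f} mm +/- {merged_error:.1f} mm")
```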

  19. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need to culture individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our method in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins, and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.
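
    MetaPath itself combines sequence data with pathway structure, but the elementary operation, testing each pathway for differential abundance between two groups of metagenomes, can be sketched with a rank-sum test. This is a hypothetical stand-in for illustration, not the MetaPath algorithm.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)

# Relative abundance of 5 pathways in 8 "obese" and 8 "lean" gut
# metagenomes (rows: pathways, columns: samples). Purely synthetic.
obese = rng.dirichlet(np.array([5, 1, 1, 1, 1]), size=8).T
lean = rng.dirichlet(np.array([1, 1, 1, 1, 5]), size=8).T

# Rank-sum test per pathway; small p suggests differential abundance.
for p in range(5):
    stat, pval = mannwhitneyu(obese[p], lean[p], alternative="two-sided")
    flag = " <- differentially abundant" if pval < 0.05 else ""
    print(f"pathway {p}: p = {pval:.3f}{flag}")
```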

  20. Modes of independence while informal caregiving.

    PubMed

    Tellioğlu, Hilda; Hensely-Schinkinger, Susanne; Pinatti De Carvalho, Aparecido Fabiano

    2015-01-01

    This paper is about understanding and conceptualizing the notion of independence in the context of caregiving. Based on the current studies and on our ethnographic and design research in an AAL project (TOPIC) we introduce a model of independence consisting of four dimensions: action, finance, decision, and emotion. These interrelated dimensions are described and discussed in the setting of informal caregiving. Some additional examples are shown to illustrate how to reduce the dependence of informal caregivers before concluding the paper. PMID:26294578

  1. Publishing datasets with eSciDoc and panMetaDocs

    NASA Astrophysics Data System (ADS)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    publishing scientific datasets as electronic data supplements to research papers. Publication of research manuscripts has an already well established workflow that shares junctures with other processes and involves several parties in the process of dataset publication. Activities of the author, the reviewer, the print publisher and the data publisher have to be coordinated into a common data publication workflow. The case of data publication at GFZ Potsdam displays some specifics, e.g. the DOIDB webservice. The DOIDB is a proxy service at GFZ for the DataCite [4] DOI registration and its metadata store. DOIDB provides a local summary of the dataset DOIs registered through GFZ as a publication agent. An additional use case for the DOIDB is its function to enrich the DataCite metadata with additional custom attributes, like a geographic reference in a DIF record. These attributes are at the moment not available in the DataCite metadata schema but would be valuable elements for the compilation of data catalogues in the earth sciences and for dissemination of catalogue data via OAI-PMH. [1] http://www.escidoc.org , eSciDoc, FIZ Karlsruhe, Germany [2] http://panmetadocs.sf.net , panMetaDocs, GFZ Potsdam, Germany [3] http://metaworks.pangaea.de , panMetaWorks, Dr. R. Huber, MARUM, Univ. Bremen, Germany [4] http://www.datacite.org

  2. Climate Model Datasets on Earth System Grid II (ESG II)

    DOE Data Explorer

    Earth System Grid (ESG) is a project that combines the power and capacity of supercomputers, sophisticated analysis servers, and datasets on the scale of petabytes. The goal is to provide a seamless distributed environment that allows scientists in many locations to work with large-scale data, perform climate change modeling and simulation, and share results in innovative ways. Though ESG is more about the computing environment than the data, several catalogs of data are available at the web site and can be browsed or searched. Most of the datasets are restricted to registered users, but several are open to any access.

  3. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    SciTech Connect

    Draxl, Caroline (NREL)

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, and must be time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  4. Dataset of mitochondrial genome variants associated with asymptomatic atherosclerosis

    PubMed Central

    Sazonova, Margarita A.; Zhelankin, Andrey V.; Barinova, Valeria A.; Sinyov, Vasily V.; Khasanova, Zukhra B.; Postnov, Anton Y.; Sobenin, Igor A.; Bobryshev, Yuri V.; Orekhov, Alexander N.

    2016-01-01

    This dataset report is dedicated to mitochondrial genome variants associated with asymptomatic atherosclerosis. These data were obtained using the method of next generation pyrosequencing (NGPS). The whole mitochondrial genome of a sample of patients from the Moscow region was analyzed. In this article, the dataset, including anthropometric, biochemical and clinical parameters along with detected mtDNA variants in patients with carotid atherosclerosis and healthy individuals, is presented. Among 58 of the most common homoplasmic mtDNA variants found in the observed sample, 7 variants occurred more often in patients with atherosclerosis and 16 variants occurred more often in healthy individuals. PMID:27222855

  5. An evaluation of the global 1-km AVHRR land dataset

    USGS Publications Warehouse

    Teillet, P.M.; El Saleous, N.; Hansen, M.C.; Eidenshink, Jeffery C.; Justice, C.O.; Townshend, J.R.G.

    2000-01-01

    This paper summarizes the steps taken in the generation of the global 1-km AVHRR land dataset, and it documents an evaluation of the data product with respect to the original specifications and its usefulness in research and applications to date. The evaluation addresses data characterization, processing, compositing and handling issues. Examples of the main scientific outputs are presented and options for improved processing are outlined and prioritized. The dataset has made a significant contribution, and a strong recommendation is made for its reprocessing and continuation to produce a long-term record for global change research.

  6. A synthetic Longitudinal Study dataset for England and Wales.

    PubMed

    Dennett, Adam; Norman, Paul; Shelton, Nicola; Stuchbury, Rachel

    2016-12-01

    This article describes the new synthetic England and Wales Longitudinal Study 'spine' dataset designed for teaching and experimentation purposes. In the United Kingdom, there exist three Census-based longitudinal micro-datasets, known collectively as the Longitudinal Studies. The England and Wales Longitudinal Study (LS) is a 1% sample of the population of England and Wales (around 500,000 individuals), linking individual person records from the 1971 to 2011 Censuses. The synthetic data presented contains a similar number of individuals to the original data and accurate longitudinal transitions between 2001 and 2011 for key demographic variables, but unlike the original data, is open access. PMID:27656667

  7. A synthetic Longitudinal Study dataset for England and Wales.

    PubMed

    Dennett, Adam; Norman, Paul; Shelton, Nicola; Stuchbury, Rachel

    2016-12-01

    This article describes the new synthetic England and Wales Longitudinal Study 'spine' dataset designed for teaching and experimentation purposes. In the United Kingdom, there exist three Census-based longitudinal micro-datasets, known collectively as the Longitudinal Studies. The England and Wales Longitudinal Study (LS) is a 1% sample of the population of England and Wales (around 500,000 individuals), linking individual person records from the 1971 to 2011 Censuses. The synthetic data presented contains a similar number of individuals to the original data and accurate longitudinal transitions between 2001 and 2011 for key demographic variables, but unlike the original data, is open access.

  8. Benchmark three-dimensional eye-tracking dataset for visual saliency prediction on stereoscopic three-dimensional video

    NASA Astrophysics Data System (ADS)

    Banitalebi-Dehkordi, Amin; Nasiopoulos, Eleni; Pourazad, Mahsa T.; Nasiopoulos, Panos

    2016-01-01

    Visual attention models (VAMs) predict the location of image or video regions that are most likely to attract human attention. Although saliency detection is well explored for two-dimensional (2-D) image and video content, there have been only a few attempts to design three-dimensional (3-D) saliency prediction models. Newly proposed 3-D VAMs have to be validated on large-scale video saliency prediction datasets that also contain eye-tracking results. There are several publicly available eye-tracking datasets for 2-D image and video content. In the case of 3-D, however, there is still a need for large-scale video saliency datasets for the research community to validate different 3-D VAMs. We introduce a large-scale dataset containing eye-tracking data collected from 24 subjects free-viewing 61 stereoscopic 3-D videos (and the 2-D versions of those). We evaluate the performance of the existing saliency detection methods on the proposed dataset. In addition, we created an online benchmark for validating the performance of existing 2-D and 3-D VAMs and facilitating the addition of new VAMs to the benchmark. Our benchmark currently contains 50 different VAMs.
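
    Benchmarks of this kind commonly score a visual attention model by how well its saliency map separates fixated from non-fixated locations, often via an AUC metric. Below is a minimal sketch of that style of evaluation on random data; the benchmark's exact protocol may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

h, w = 64, 64
saliency = rng.random((h, w))                 # model's predicted saliency map
fix_y = rng.integers(0, h, 50)                # recorded fixation coordinates
fix_x = rng.integers(0, w, 50)

# Positives: saliency values at fixated pixels; negatives: values at
# randomly sampled pixels. AUC = 0.5 means chance-level prediction.
pos = saliency[fix_y, fix_x]
neg = saliency[rng.integers(0, h, 50), rng.integers(0, w, 50)]
scores = np.concatenate([pos, neg])
labels = np.concatenate([np.ones(50), np.zeros(50)])
print("AUC:", roc_auc_score(labels, scores))
```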

  9. Multiplexed MS/MS for Improved Data Independent Acquisition

    PubMed Central

    Egertson, Jarrett D.; Kuehn, Andreas; Merrihew, Gennifer E.; Bateman, Nicholas W.; MacLean, Brendan X.; Ting, Ying S.; Canterbury, Jesse D.; Marsh, Donald M.; Kellmann, Markus; Zabrouskov, Vlad; Wu, Christine C.; MacCoss, Michael J.

    2013-01-01

    In mass spectrometry based proteomics, data-independent acquisition (DIA) strategies have the ability to acquire a single dataset useful for identification and quantification of detectable peptides in a complex mixture. Despite this, DIA is often overlooked due to noisier data resulting from a typical five- to tenfold reduction in precursor selectivity compared to data dependent acquisition or selected reaction monitoring. We demonstrate a multiplexing technique which improves precursor selectivity five-fold. PMID:23793237

  10. Agent independent task planning

    NASA Technical Reports Server (NTRS)

    Davis, William S.

    1990-01-01

    Agent-Independent Planning is a technique that allows the construction of activity plans without regard to the agent that will perform them. Once generated, a plan is then validated and translated into instructions for a particular agent, whether a robot, crewmember, or software-based control system. Because Space Station Freedom (SSF) is planned for orbital operations for approximately thirty years, it will almost certainly experience numerous enhancements and upgrades, including upgrades in robotic manipulators. Agent-Independent Planning provides the capability to construct plans for SSF operations, independent of specific robotic systems, by combining techniques of object oriented modeling, nonlinear planning and temporal logic. Since a plan is validated using the physical and functional models of a particular agent, new robotic systems can be developed and integrated with existing operations in a robust manner. This technique also provides the capability to generate plans for crewmembers with varying skill levels, and later apply these same plans to more sophisticated robotic manipulators made available by evolutions in technology.

  11. International exploration by independents

    SciTech Connect

    Bertagne, R.G.

    1991-03-01

    Recent industry trends indicate that the smaller US independents are looking at foreign exploration opportunities as one of the alternatives for growth in the new age of exploration. It is usually accepted that foreign finding costs per barrel are substantially lower than domestic because of the large reserve potential of international plays. To get involved overseas requires, however, an adaptation to different cultural, financial, legal, operational, and political conditions. Generally foreign exploration proceeds at a slower pace than domestic because concessions are granted by the government, or are explored in partnership with the national oil company. First, a mid- to long-term strategy, tailored to the goals and the financial capabilities of the company, must be prepared; it must be followed by an ongoing evaluation of quality prospects in various sedimentary basins, and a careful planning and conduct of the operations. To successfully explore overseas also requires the presence on the team of a minimum number of explorationists and engineers thoroughly familiar with the various exploratory and operational aspects of foreign work, having had a considerable amount of onsite experience in various geographical and climatic environments. Independents that are best suited for foreign expansion are those that have been financially successful domestically, and have a good discovery track record. When properly approached foreign exploration is well within the reach of smaller US independents and presents essentially no greater risk than domestic exploration; the reward, however, can be much larger and can catapult the company into the big leagues.

  12. International exploration by independent

    SciTech Connect

    Bertagne, R.G.

    1992-04-01

    Recent industry trends indicate that the smaller U.S. independents are looking at foreign exploration opportunities as one of the alternatives for growth in the new age of exploration. Foreign finding costs per barrel usually are accepted to be substantially lower than domestic costs because of the large reserve potential of international plays. To get involved in overseas exploration, however, requires the explorationist to adapt to different cultural, financial, legal, operational, and political conditions. Generally, foreign exploration proceeds at a slower pace than domestic exploration because concessions are granted by a country's government, or are explored in partnership with a national oil company. First, the explorationist must prepare a mid- to long-term strategy, tailored to the goals and the financial capabilities of the company; next, is an ongoing evaluation of quality prospects in various sedimentary basins, and careful planning and conduct of the operations. To successfully explore overseas also requires the presence of a minimum number of explorationists and engineers thoroughly familiar with the various exploratory and operational aspects of foreign work. Ideally, these team members will have had a considerable amount of on-site experience in various countries and climates. Independents best suited for foreign expansion are those who have been financially successful in domestic exploration. When properly approached, foreign exploration is well within the reach of smaller U.S. independents, and presents essentially no greater risk than domestic exploration; however, the reward can be much larger and can catapult the company into the 'big leagues.'

  13. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    NASA Astrophysics Data System (ADS)

    Lary, D. J.

    2013-12-01

    A BigData case study is described in which multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster with an automated workflow. The resulting global particulate dataset is relevant to global public health studies and would not be possible to produce without the combination of these multiple big datasets, in-situ data and machine learning. To greatly reduce development time and enhance functionality, a high-level language capable of parallel processing (Matlab) was used. Key considerations for the system are high-speed access to the large data volumes, their persistence, and precise process-time scheduling.

  14. Modular reorganization of brain resting state networks and its independent validation in Alzheimer's disease patients.

    PubMed

    Chen, Guangyu; Zhang, Hong-Ying; Xie, Chunming; Chen, Gang; Zhang, Zhi-Jun; Teng, Gao-Jun; Li, Shi-Jiang

    2013-01-01

    Previous studies have demonstrated disruption of structural and functional connectivity in Alzheimer's disease (AD). However, it is not known how these disruptions alter brain network organization. With the modular analysis method of graph theory, and datasets acquired by the resting-state functional connectivity MRI (R-fMRI) method, we investigated and compared brain organization patterns between the AD group and the cognitively normal control (CN) group. Our main finding is that the largest homotopic module (defined as the insula module) in the CN group was broken into pieces in the AD group. Specifically, it was discovered that eight pairs of bilateral regions (the opercular part of the inferior frontal gyrus, area triangularis, insula, putamen, globus pallidus, transverse temporal gyri, superior temporal gyrus, and superior temporal pole) of the insula module had lost their symmetric functional connection properties, and the corresponding gray matter concentration (GMC) was significantly lower in the AD group. We further quantified the functional connectivity changes with an index (index A) and the structural changes with the GMC index in the insula module to demonstrate their great potential as AD biomarkers. We further validated these results with six additional independent datasets (271 subjects in six groups). Our results demonstrated specific underlying structural and functional reorganization from young to old and for diseased subjects. Further, it is suggested that by combining the structural GMC analysis and functional modular analysis in the insula module, a new biomarker can be developed at the single-subject level.
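
    The modular analysis referred to here starts from a functional connectivity matrix, the pairwise correlations of regional R-fMRI time series, and decomposes the resulting graph into modules. Below is a minimal sketch with synthetic time series and a standard modularity-based community detector; it is illustrative only, not the study's pipeline.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(4)

# Toy ROI time series: 20 regions x 200 volumes, with two built-in modules.
ts = rng.normal(size=(20, 200))
ts[:10] += 0.8 * rng.normal(size=200)   # module 1 (e.g. an "insula module")
ts[10:] += 0.8 * rng.normal(size=200)   # module 2

# Functional connectivity = pairwise correlation; threshold into a graph.
fc = np.corrcoef(ts)
g = nx.Graph()
ii, jj = np.where(np.triu(fc, k=1) > 0.3)
g.add_edges_from(zip(ii.tolist(), jj.tolist()))

# Modular decomposition; in the study, AD patients showed the largest
# module breaking apart relative to controls.
for m, module in enumerate(greedy_modularity_communities(g)):
    print(f"module {m}: regions {sorted(module)}")
```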

  15. Revisiting Spitzer Transit Observations with Independent Component Analysis: New Results for Exoplanetary Systems

    NASA Astrophysics Data System (ADS)

    Morello, G.; Waldmann, I. P.; Tinetti, G.; Howarth, I. D.; Micela, G.

    2015-10-01

    Blind source separation techniques are used to reanalyse several exoplanetary transit lightcurves of a few exoplanets recorded with the infrared camera IRAC on board the Spitzer Space Telescope during the "cold" era. These observations, together with observations at other IR wavelengths, are crucial to characterise the atmospheres of the planets. Previous analyses of the same datasets reported discrepant results, hence the necessity of the reanalyses. The method used here is based on the Independent Component Analysis (ICA) statistical technique, which ensures a high degree of objectivity. The use of ICA to detrend single photometric observations in a self-consistent way is novel in the literature. The advantage of our reanalyses over previous work is that we do not have to make any assumptions about the structure of the unknown instrumental systematics. We obtained for the first time coherent and repeatable results over different epochs for the exoplanets HD189733b and GJ436b [Morello et al. (2014), Morello et al. (2015)]. The technique has also been tested on simulated datasets with different instrument properties, proving its validity in a more general context [Morello et al. (2015b)]. We will present the technique and the results of its application to different observations, in addition to those already published. A uniform re-analysis of other archive data with this technique will provide improved parameters for a list of exoplanets, and in particular will address some results debated in the literature.
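
    The appeal of ICA in this setting is that it separates the astrophysical signal from instrument systematics without assuming any structure for the systematics. Below is a toy demonstration with scikit-learn's FastICA, recovering a box-shaped transit from simulated mixed lightcurves; it is not the authors' actual pipeline.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)

n = 500
t = np.linspace(-0.05, 0.05, n)
transit = 1.0 - 0.01 * (np.abs(t) < 0.02).astype(float)  # box-shaped transit
ramp = np.exp(-np.arange(n) / 200.0)                     # instrument systematic

# Several simulated pixel lightcurves = different mixtures of the same
# underlying sources plus noise (the blind-source-separation setting).
mix = rng.uniform(0.5, 1.5, (5, 2))
obs = mix @ np.vstack([transit, ramp]) + rng.normal(0, 1e-4, (5, n))

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(obs.T).T   # recovered components, shape (2, n)

# One recovered component should correlate strongly with the transit shape
# (ICA leaves sign and scale ambiguous, hence the absolute value).
for k, s in enumerate(sources):
    print(f"component {k}: |corr with transit| = "
          f"{abs(np.corrcoef(s, transit)[0, 1]):.2f}")
```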

  16. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    NASA Astrophysics Data System (ADS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed.

  17. In-depth evaluation of software tools for data-independent acquisition based label-free quantification.

    PubMed

    Kuharev, Jörg; Navarro, Pedro; Distler, Ute; Jahn, Olaf; Tenzer, Stefan

    2015-09-01

    Label-free quantification (LFQ) based on data-independent acquisition workflows currently experiences increasing popularity. Several software tools have been recently published or are commercially available. The present study focuses on the evaluation of three different software packages (Progenesis, synapter, and ISOQuant) supporting ion mobility enhanced data-independent acquisition data. In order to benchmark the LFQ performance of the different tools, we generated two hybrid proteome samples of defined quantitative composition containing tryptically digested proteomes of three different species (mouse, yeast, Escherichia coli). This model dataset simulates complex biological samples containing large numbers of both unregulated (background) proteins as well as up- and downregulated proteins with exactly known ratios between samples. We determined the number and dynamic range of quantifiable proteins and analyzed the influence of applied algorithms (retention time alignment, clustering, normalization, etc.) on quantification results. Analysis of technical reproducibility revealed median coefficients of variation of reported protein abundances below 5% for MS(E) data for Progenesis and ISOQuant. Regarding accuracy of LFQ, evaluation with synapter and ISOQuant yielded superior results compared to Progenesis. In addition, we discuss reporting formats and user friendliness of the software packages. The data generated in this study have been deposited to the ProteomeXchange Consortium with identifier PXD001240 (http://proteomecentral.proteomexchange.org/dataset/PXD001240).
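
    Benchmarking LFQ tools on hybrid proteome samples reduces to two checks: precision, the coefficient of variation across technical replicates, and accuracy, how well the recovered ratios match the known spike-in ratios. Below is a minimal sketch with invented abundances and an assumed 2:1 spike ratio; it mirrors the style of evaluation described, not the study's actual code.

```python
import numpy as np

rng = np.random.default_rng(6)

# Reported abundances for 1,000 proteins in 3 technical replicates of each
# hybrid sample; one species is "spiked" at a true 2:1 ratio between samples.
true = rng.lognormal(10, 1, 1000)
rep_a = true[:, None] * rng.normal(1.0, 0.04, (1000, 3))         # sample A
rep_b = (true / 2)[:, None] * rng.normal(1.0, 0.04, (1000, 3))   # sample B

# Precision: median coefficient of variation across technical replicates.
cv = rep_a.std(axis=1, ddof=1) / rep_a.mean(axis=1)
print(f"median CV: {100 * np.median(cv):.1f}%")

# Accuracy: the distribution of log2 ratios should centre on log2(2) = 1.
log2_ratio = np.log2(rep_a.mean(axis=1) / rep_b.mean(axis=1))
print(f"median log2(A/B): {np.median(log2_ratio):.2f} (expected 1.00)")
```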

  18. Solar Cycle Variability in New Merged Satellite Ozone Datasets

    NASA Astrophysics Data System (ADS)

    Kuchar, A.; Pisoft, P.

    2014-12-01

    Studies using coupled chemistry climate model simulations of the solar cycle in the ozone field reveal agreement with the observed "double-peaked" ozone anomaly in the original satellite observations represented by the SBUV(/2), HALOE and SAGE datasets. The motivation of our analysis is to examine whether the solar signal in the latest generation of reanalysis datasets (i.e. MERRA and ERA-Interim) is consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. Since an analysis of the solar cycle response requires long-term, temporally homogeneous time series of the ozone profile and no single satellite instrument has covered the entire period since 1984, satellite measurements in our study are represented by new merged satellite ozone datasets, i.e. the GOZCARDS, SBUV MOD and SWOOSH datasets. The results of the presented study are based on an attribution analysis using multiple nonlinear techniques in addition to the traditional linear approach based on multiple linear models. The study results are supplemented by a frequency analysis using pseudo-2D wavelet transform algorithms.

  19. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    NASA Technical Reports Server (NTRS)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of the turbulence of jet flows. The present report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., establishing uncertainties for the data. This paper covers the following five tasks: (1) Document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare the PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those in the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  20. Accounting For Uncertainty in The Application Of High Throughput Datasets

    EPA Science Inventory

    The use of high throughput screening (HTS) datasets will need to adequately account for uncertainties in the data generation process and propagate these uncertainties through to ultimate use. Uncertainty arises at multiple levels in the construction of predictors using in vitro ...

  1. A global experimental dataset for assessing grain legume production

    PubMed Central

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-01-01

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required, to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide. PMID:27676125

  2. Automated single particle detection and tracking for large microscopy datasets.

    PubMed

    Wilson, Rhodri S; Yang, Lei; Dun, Alison; Smyth, Annya M; Duncan, Rory R; Rickman, Colin; Lu, Weiping

    2016-05-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells with very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provides a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.
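
    The "caged in nanodomains" conclusion is the kind of statement a mean-squared-displacement (MSD) analysis supports: freely diffusing particles show an MSD that grows linearly with lag time, while confined particles show a plateau. Below is a toy check of that diagnostic on synthetic tracks, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(7)

def msd(track, max_lag=50):
    """Mean squared displacement of one 2D track, per lag time."""
    return np.array([np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
                     for lag in range(1, max_lag)])

# Free Brownian track vs. a track crudely confined to a ~100 nm "cage".
steps = rng.normal(0, 10, (2000, 2))   # nm of displacement per frame
free = np.cumsum(steps, axis=0)
caged = np.clip(free, -50, 50)         # crude confinement to a box

msd_free, msd_caged = msd(free), msd(caged)
print("free MSD grows:    ", msd_free[[0, 10, 40]].round(0))
print("caged MSD plateaus:", msd_caged[[0, 10, 40]].round(0))
```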

  3. Eastern Renewable Generation Integration Study Solar Dataset (Presentation)

    SciTech Connect

    Hummon, M.

    2014-04-01

    The National Renewable Energy Laboratory produced solar power production data for the Eastern Renewable Generation Integration Study (ERGIS) including "real time" 5-minute interval data, "four hour ahead forecast" 60-minute interval data, and "day-ahead forecast" 60-minute interval data for the year 2006. This presentation provides a brief overview of the three solar power datasets.

  4. Automated single particle detection and tracking for large microscopy datasets

    PubMed Central

    Wilson, Rhodri S.; Yang, Lei; Dun, Alison; Smyth, Annya M.; Duncan, Rory R.; Rickman, Colin

    2016-01-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells with very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provides a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates. PMID:27293801

  5. Mutual-information-based registration for ultrasound and CT datasets

    NASA Astrophysics Data System (ADS)

    Firle, Evelyn A.; Wesarg, Stefan; Dold, Christian

    2004-05-01

    In many applications for minimally invasive surgery the acquisition of intra-operative medical images is helpful, if not absolutely necessary. For brachytherapy especially, imaging is critically important to the safe delivery of the therapy. Modern computed tomography (CT) and magnetic resonance (MR) scanners allow minimally invasive procedures to be performed under direct imaging guidance. However, conventional scanners do not have real-time imaging capability and are expensive technologies requiring a special facility. Ultrasound (U/S) is much cheaper and one of the most flexible imaging modalities. It can be moved to the application room as required, and the physician sees what is happening as it occurs. Nevertheless, it may be easier to interpret these 3D intra-operative U/S images if they are used in combination with less noisy preoperative data such as CT. The purpose of our current investigation is to develop a registration tool for automatically combining pre-operative CT volumes with intra-operatively acquired 3D U/S datasets. The applied alignment procedure is based on the information-theoretic approach of maximizing the mutual information of two arbitrary datasets from different modalities. Since the CT datasets include a much bigger field of view, we introduced a bounding box to narrow down the region of interest within the CT dataset. We conducted a phantom experiment using a CIRS Model 53 U/S Prostate Training Phantom to evaluate the feasibility and accuracy of the proposed method.
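
    The registration criterion itself is compact: mutual information computed from the joint intensity histogram of the two images, maximized over candidate transforms. Below is a minimal sketch of the criterion, with synthetic 2D images standing in for the CT and U/S volumes.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """MI of two equally shaped images, from their joint histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)       # marginal of image A
    py = pxy.sum(axis=0, keepdims=True)       # marginal of image B
    nz = pxy > 0                              # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(8)
ct = rng.random((64, 64))
us_aligned = ct + rng.normal(0, 0.1, ct.shape)   # intensities related to CT
us_shifted = np.roll(us_aligned, 8, axis=1)      # misregistered version

# Registration searches for the transform that maximizes this criterion.
print("MI aligned:    ", round(mutual_information(ct, us_aligned), 3))
print("MI misaligned: ", round(mutual_information(ct, us_shifted), 3))
```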

  6. Using Real Datasets for Interdisciplinary Business/Economics Projects

    ERIC Educational Resources Information Center

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  7. Mining Institutional Datasets to Support Policy Making and Implementation

    ERIC Educational Resources Information Center

    Yorke, Mantz; Barnett, Greg; Evanson, Peter; Haines, Chris; Jenkins, Don; Knight, Peter; Scurry, Dave; Stowell, Marie; Woolf, Harvey

    2005-01-01

    Datasets are often under-exploited by institutions, yet they contain evidence that is potentially of high value for planning and decision-making. This article shows how institutional data were used to determine whether the demographic background of students might have an influence on their performance: this is a matter of particular interest where…

  8. Oregon Cascades Play Fairway Analysis: Raster Datasets and Models

    SciTech Connect

    Adam Brandt

    2015-11-15

    This submission includes maps of the spatial distribution of basaltic and felsic rocks in the Oregon Cascades. It also includes a final Play Fairway Analysis (PFA) model, with the heat and permeability composite risk segments (CRS) supplied separately. Metadata for each raster dataset can be found within the zip files, in the TIF images.

  9. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    SciTech Connect

    WINCHELL,D.F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  10. A Dataset for Visual Navigation with Neuromorphic Methods.

    PubMed

    Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi

    2016-01-01

    Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets. PMID:26941595

  11. A Dataset for Breast Cancer Histopathological Image Classification.

    PubMed

    Spanhol, Fabio A; Oliveira, Luiz S; Petitjean, Caroline; Heutte, Laurent

    2016-07-01

    Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Different evaluation measures may be used, making it difficult to compare the methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired from 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes, which would be a valuable computer-aided diagnosis tool for the clinician. In order to assess the difficulty of this task, we show some preliminary results obtained with state-of-the-art image classification systems. The accuracy ranges from 80% to 85%, showing that there is still room for improvement. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to bring together researchers from both the medical and machine learning fields to advance toward this clinical application.

  12. A Dataset for Visual Navigation with Neuromorphic Methods

    PubMed Central

    Barranco, Francisco; Fermuller, Cornelia; Aloimonos, Yiannis; Delbruck, Tobi

    2016-01-01

    Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets. PMID:26941595

  13. Fitting Meta-Analytic Structural Equation Models with Complex Datasets

    ERIC Educational Resources Information Center

    Wilson, Sandra Jo; Polanin, Joshua R.; Lipsey, Mark W.

    2016-01-01

    A modification of the first stage of the standard procedure for two-stage meta-analytic structural equation modeling for use with large complex datasets is presented. This modification addresses two common problems that arise in such meta-analyses: (a) primary studies that provide multiple measures of the same construct and (b) the correlation…

  14. A daily global mesoscale ocean eddy dataset from satellite altimetry

    PubMed Central

    Faghmous, James H.; Frenger, Ivy; Yao, Yuanshun; Warmka, Robert; Lindell, Aron; Kumar, Vipin

    2015-01-01

    Mesoscale ocean eddies are ubiquitous coherent rotating structures of water with radial scales on the order of 100 kilometers. Eddies play a key role in the transport and mixing of momentum and tracers across the World Ocean. We present a global daily mesoscale ocean eddy dataset that contains ~45 million mesoscale features and 3.3 million eddy trajectories that persist at least two days, as identified in the AVISO dataset over the period 1993–2014. This dataset, along with the open-source eddy identification software, can be used to extract eddies with any choice of parameters (minimum size, lifetime, etc.), to study global eddy properties and dynamics, and to empirically estimate the impact eddies have on mass or heat transport. Furthermore, our open-source software may be used to identify mesoscale features in model simulations and compare them to observed features. Finally, this dataset can be used to study the interaction between mesoscale ocean eddies and other components of the Earth System. PMID:26097744

  15. Finding the Maine Story in Huge, Cumbersome National Monitoring Datasets

    EPA Science Inventory

    What’s a manager, analyst, or concerned citizen to do with the complex datasets generated by State and Federal monitoring efforts? Is it possible to use such information to address Maine’s environmental issues without having a degree in informatics and statistics? This presentati...

  16. Estimated Perennial Streams of Idaho and Related Geospatial Datasets

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2009-01-01

    The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of the perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine the perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine whether a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft3/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft3/s plus or minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of
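
    The classification rule described above reduces to a threshold with an uncertainty band. A minimal Python sketch, assuming precomputed grids of 7Q2 estimates and their standard errors (hypothetical inputs named here for illustration):

        import numpy as np

        CUTOFF = 0.1  # ft3/s, the Idaho Administrative Code criterion

        def classify_streams(q7q2, stderr):
            """Label each grid cell: 0=intermittent, 1=transitional, 2=perennial."""
            labels = np.zeros_like(q7q2, dtype=np.int8)      # default: intermittent
            labels[q7q2 >= CUTOFF] = 2                       # perennial
            labels[np.abs(q7q2 - CUTOFF) <= stderr] = 1      # within +/- 1 SE
            return labels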

  17. Correcting OCR text by association with historical datasets

    NASA Astrophysics Data System (ADS)

    Hauser, Susan E.; Schlaifer, Jonathan; Sabir, Tehseen F.; Demner-Fushman, Dina; Straughan, Scott; Thoma, George R.

    2003-01-01

    The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting algorithms to generate electronic bibliographic citation data from paper biomedical journal articles. The OCR server incorporated in MARS performs well in general, but fares less well with text printed in small or italic fonts. Affiliations are often printed in small italic fonts in the journals processed by MARS. Consequently, although the automatic processes generate much of the citation data correctly, the affiliation field frequently contains incorrect data, which must be manually corrected by verification operators. In contrast, author names are usually printed in large, normal fonts that are correctly converted to text by the OCR server. The National Library of Medicine's MEDLINE database contains 11 million indexed citations for biomedical journal articles. This paper documents our effort to use the historical author-affiliation relationships from this large dataset to find potential correct affiliations for MARS articles based on the author and the affiliation in the OCR output. Preliminary tests using a table of about 400,000 author/affiliation pairs extracted from the corrected data from MARS indicated that about 44% of the author/affiliation pairs were repeats and that about 47% of newly converted author names would be found in this set. A text-matching algorithm was developed to determine the likelihood that an affiliation found in the table corresponding to the OCR text of the first author was the current, correct affiliation. This matching algorithm compares an affiliation found in the author/affiliation table (found with the OCR text of the first author) to the OCR output affiliation, and calculates a score indicating the similarity of the affiliation found in the table to the OCR affiliation. Using a ground truth set of 519 OCR author/OCR affiliation/correct affiliation
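
    The affiliation-matching step lends itself to a standard string-similarity score. The sketch below substitutes Python's difflib for the authors' own matching algorithm, and the one-entry lookup table is purely illustrative:

        from difflib import SequenceMatcher

        def affiliation_score(table_affil, ocr_affil):
            """Similarity (0-1) of a historical affiliation to the OCR'd text."""
            return SequenceMatcher(None, table_affil.lower(), ocr_affil.lower()).ratio()

        # Toy author/affiliation table; the real system used ~400,000 pairs.
        lookup = {"smith j": ["Lister Hill National Center, NLM, Bethesda, MD"]}
        candidates = lookup.get("smith j", [])
        best = max(candidates,
                   key=lambda c: affiliation_score(c, "Lister HiII Nat Ctr, NLM"),
                   default=None)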

  18. Comparison and validation of gridded precipitation datasets for Spain

    NASA Astrophysics Data System (ADS)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges, and they are also compared to the low-resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al. 2010) and has recently been applied to Spain (Quintana-Seguí et al. 2015). SAFRAN uses an optimal interpolation (OI) algorithm and all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and spans from September 1979 to August 2014. This dataset has been produced mainly for use in large-scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high-quality precipitation dataset for Spain, based on a dense network of quality-controlled stations, with different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM validation and statistical downscaling. ERA-Interim is a well-known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental- and global-scale studies, and also in smaller-scale studies in data-poor countries. Thus, its comparison with higher-resolution products of a data-rich country such as Spain allows us to quantify the errors made when using such datasets for national-scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely

  19. Spatial Disaggregation of the 0.25-degree GLDAS Air Temperature Dataset to 30-arcsec Resolution

    NASA Astrophysics Data System (ADS)

    Ji, L.; Senay, G. B.; Verdin, J. P.; Velpuri, N. M.

    2015-12-01

    Air temperature is a key input variable in ecological and hydrological models for simulating the hydrological cycle and water budget. Several global reanalysis products have been developed at different organizations, which provide gridded air temperature datasets at resolutions ranging from 0.25° to 2.5° (or 27.8 - 278.3 km at the equator). However, gridded air temperature products at high resolution (≤1 km) are available only for limited areas of the world. To meet the needs of global eco-hydrological modeling, we aim to produce a continuous daily air temperature dataset at 1-km resolution with global coverage. In this study, we developed a technique that spatially disaggregates the 0.25° Global Land Data Assimilation System (GLDAS) daily air temperature data to 30-arcsec (0.928 km at the equator) resolution by integrating the GLDAS data with the 30-arcsec WorldClim 1950 - 2000 monthly normal air temperature data. The method was tested using the GLDAS and WorldClim maximum and minimum air temperature datasets from 2002 and 2010 for the conterminous United States and Africa. The 30-arcsec disaggregated GLDAS (GLDASd) air temperature dataset retains the mean values of the original GLDAS data while adding spatial variability inherited from the WorldClim data. A great improvement in the GLDAS disaggregation is seen in mountain areas, where complex terrain features have a strong impact on temperature. We validated the disaggregation method by comparing the GLDASd product with daily meteorological observations archived in the Global Historical Climatology Network (GHCN) and the Global Surface Summary of the Day (GSOD) datasets. Additionally, the 30-arcsec TopoWX daily air temperature product was used for comparison with the GLDASd data over the conterminous United States. The proposed data disaggregation method provides a convenient and efficient tool for generating a global high-resolution air temperature dataset, which will be beneficial to global eco
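
    One plausible reading of the disaggregation scheme, interpolating the coarse field and adding back the sub-grid anomaly carried by the climatology, can be sketched as follows. The interpolation order and block-averaging are illustrative assumptions, not the authors' exact implementation:

        import numpy as np
        from scipy.ndimage import zoom

        def disaggregate(t_coarse, wc_fine, factor):
            """Downscale a coarse temperature field using fine-scale climatology.

            t_coarse -- GLDAS daily temperature on the coarse grid
            wc_fine  -- WorldClim monthly normal on the fine grid
            factor   -- linear resolution ratio (fine dims must be exact multiples)
            """
            t_interp = zoom(t_coarse, factor, order=1)   # smooth coarse-to-fine
            ny, nx = wc_fine.shape
            wc_coarse = wc_fine.reshape(ny // factor, factor,
                                        nx // factor, factor).mean(axis=(1, 3))
            wc_smooth = zoom(wc_coarse, factor, order=1)
            return t_interp + (wc_fine - wc_smooth)      # mean preserved, detail added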

  20. Evaluation of a Moderate Resolution, Satellite-Based Impervious Surface Map Using an Independent, High-Resolution Validation Dataset

    EPA Science Inventory

    Given the relatively high cost of mapping impervious surfaces at regional scales, substantial effort is being expended in the development of moderate-resolution, satellite-based methods for estimating impervious surface area (ISA). To rigorously assess the accuracy of these data ...

  1. Proteome dataset of pre-ovulatory follicular fluids from less fertile dairy cows.

    PubMed

    Zachut, Maya; Sood, Pankaj; Livshitz, Lilya; Kra, Gitit; Levin, Yishai; Moallem, Uzi

    2016-06-01

    This article contains raw and processed data related to research published in Zachut et al. (2016) [1]. Proteomics data from preovulatory follicles in cows were obtained by liquid chromatography-mass spectrometry following protein extraction. Differential expression between control and less fertile cows (LFC) was quantified using label-free quantification based on MS1 intensity. The only previous proteomic analysis of bovine FF detected merely 40 proteins in follicular cysts obtained from the slaughterhouse (Maniwa et al., 2005) [2], and the abundance of proteins in the bovine preovulatory FF remains unknown. Therefore, the objectives were to establish the first dataset of the FF proteome in preovulatory follicles of cows, and to examine differentially expressed proteins in FF obtained in vivo from preovulatory follicles of less fertile cows (also termed "repeat breeder") and control (CTL) cows. The proteome of FF from 10 preovulatory follicles that were aspirated in vivo (estradiol/progesterone>1) was analyzed. This novel dataset contains 219 identified and quantified proteins in FF, consisting mainly of binding proteins, proteases, receptor ligands, enzymes and transporters. In addition, differential abundance of 8 proteins relevant to follicular function was found in LFC compared to CTL; these findings are discussed in our recent research article Zachut et al. (2016) [1]. The present dataset of the bovine FF proteome can be used as a reference for any study involving disorders of follicular development in dairy cows or in comparative studies between species. PMID:27182550

  2. Proteome dataset of pre-ovulatory follicular fluids from less fertile dairy cows.

    PubMed

    Zachut, Maya; Sood, Pankaj; Livshitz, Lilya; Kra, Gitit; Levin, Yishai; Moallem, Uzi

    2016-06-01

    This article contains raw and processed data related to research published in Zachut et al. (2016) [1]. Proteomics data from preovulatory follicles in cows were obtained by liquid chromatography-mass spectrometry following protein extraction. Differential expression between control and less fertile cows (LFC) was quantified using label-free quantification based on MS1 intensity. The only previous proteomic analysis of bovine FF detected merely 40 proteins in follicular cysts obtained from the slaughterhouse (Maniwa et al., 2005) [2], and the abundance of proteins in the bovine preovulatory FF remains unknown. Therefore, the objectives were to establish the first dataset of the FF proteome in preovulatory follicles of cows, and to examine differentially expressed proteins in FF obtained in vivo from preovulatory follicles of less fertile cows (also termed "repeat breeder") and control (CTL) cows. The proteome of FF from 10 preovulatory follicles that were aspirated in vivo (estradiol/progesterone>1) was analyzed. This novel dataset contains 219 identified and quantified proteins in FF, consisting mainly of binding proteins, proteases, receptor ligands, enzymes and transporters. In addition, differential abundance of 8 proteins relevant to follicular function was found in LFC compared to CTL; these findings are discussed in our recent research article Zachut et al. (2016) [1]. The present dataset of the bovine FF proteome can be used as a reference for any study involving disorders of follicular development in dairy cows or in comparative studies between species.

  3. Development of an Ensemble Gridded Hydrometeorological Forcing Dataset over the Contiguous United States

    NASA Astrophysics Data System (ADS)

    Newman, Andrew; Clark, Martyn; Craig, Jason; Nijssen, Bart; Wood, Andrew; Gutmann, Ethan; Mizukami, Naoki; Brekke, Levi; Arnold, Jeff

    2015-04-01

    Gridded hydrometeorological forcing datasets are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Generally, uncertainty estimates are not included in gridded products; or if present, they may be included in an ad-hoc manner. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. We present a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012. Statistical verification of the ensemble indicates that it provides generally good reliability and discrimination of events of various magnitudes, but has a small dry bias for high-probability events. The ensemble mean is similar to other widely used hydrometeorological forcing datasets (e.g., Maurer et al. (2002), Daymet, NLDAS-2) but with some important differences. The ensemble product is able to produce a more realistic probability-of-precipitation field, which impacts the empirical derivation of other fields used in land-surface and hydrologic modeling. Additionally, uncertainty in daily maximum temperature, minimum temperature, and precipitation accumulation can be estimated through the use of the ensemble variance. These types of datasets will help improve the data assimilation and probabilistic forecast components of land-surface and hydrological modeling systems and provide a quantitative estimate of observation uncertainty for use in NWP forecast verification.
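
    The gridded uncertainty and probability-of-precipitation fields described above follow directly from ensemble statistics. A minimal sketch, assuming one day's precipitation fields stacked along the first axis:

        import numpy as np

        def ensemble_stats(members):
            """members: array (n_members, ny, nx) of one day's precipitation."""
            mean = members.mean(axis=0)         # best-estimate field
            var = members.var(axis=0, ddof=1)   # gridded uncertainty estimate
            pop = (members > 0.0).mean(axis=0)  # probability of precipitation
            return mean, var, pop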

  4. Cary Potter on Independent Education

    ERIC Educational Resources Information Center

    Potter, Cary

    1978-01-01

    Cary Potter was President of the National Association of Independent Schools from 1964-1978. As he leaves NAIS he gives his views on education, on independence, on the independent school, on public responsibility, on choice in a free society, on educational change, and on the need for collective action by independent schools. (Author/RK)

  5. Myth or Truth: Independence Day.

    ERIC Educational Resources Information Center

    Gardner, Traci

    Most Americans think of the Fourth of July as Independence Day, but is it really the day the U.S. declared and celebrated independence? By exploring myths and truths surrounding Independence Day, this lesson asks students to think critically about commonly believed stories regarding the beginning of the Revolutionary War and the Independence Day…

  6. A gridded hourly rainfall dataset for the UK applied to a national physically-based modelling system

    NASA Astrophysics Data System (ADS)

    Lewis, Elizabeth; Blenkinsop, Stephen; Quinn, Niall; Freer, Jim; Coxon, Gemma; Woods, Ross; Bates, Paul; Fowler, Hayley

    2016-04-01

    An hourly gridded rainfall product has great potential for use in many hydrological applications that require high temporal resolution meteorological data. One important example of this is flood risk management, with flooding in the UK highly dependent on sub-daily rainfall intensities amongst other factors. Knowledge of sub-daily rainfall intensities is therefore critical to designing hydraulic structures or flood defences to appropriate levels of service. Sub-daily rainfall rates are also essential inputs for flood forecasting, allowing for estimates of peak flows and stage for flood warning and response. In addition, an hourly gridded rainfall dataset has significant potential for practical applications such as better representation of extremes and pluvial flash flooding, validation of high resolution climate models and improving the representation of sub-daily rainfall in weather generators. A new 1km gridded hourly rainfall dataset for the UK has been created by disaggregating the daily Gridded Estimates of Areal Rainfall (CEH-GEAR) dataset using comprehensively quality-controlled hourly rain gauge data from over 1300 observation stations across the country. Quality control measures include identification of frequent tips, daily accumulations and dry spells, comparison of daily totals against the CEH-GEAR daily dataset, and nearest neighbour checks. The quality control procedure was validated against historic extreme rainfall events and the UKCP09 5km daily rainfall dataset. General use of the dataset has been demonstrated by testing the sensitivity of a physically-based hydrological modelling system for Great Britain to the distribution and rates of rainfall and potential evapotranspiration. Of the sensitivity tests undertaken, the largest improvements in model performance were seen when an hourly gridded rainfall dataset was combined with potential evapotranspiration disaggregated to hourly intervals, with 61% of catchments showing an increase in NSE between

  7. Orientation-independent measures of ground motion

    USGS Publications Warehouse

    Boore, D.M.; Watson-Lamprey, Jennie; Abrahamson, N.A.

    2006-01-01

    The geometric mean of the response spectra for two orthogonal horizontal components of motion, commonly used as the response variable in predictions of strong ground motion, depends on the orientation of the sensors as installed in the field. This means that the measure of ground-motion intensity could differ for the same actual ground motion. This dependence on sensor orientation is most pronounced for strongly correlated motion (the extreme example being linearly polarized motion), such as often occurs at periods of 1 sec or longer. We propose two new measures of the geometric mean, GMRotDpp and GMRotIpp, that are independent of the sensor orientations. Both are based on a set of geometric means computed from the as-recorded orthogonal horizontal motions rotated through all possible non-redundant rotation angles. GMRotDpp is determined as the ppth percentile of the set of geometric means for a given oscillator period. For example, GMRotD00, GMRotD50, and GMRotD100 correspond to the minimum, median, and maximum values, respectively. The rotations that lead to GMRotDpp depend on period, whereas a single period-independent rotation is used for GMRotIpp, the angle being chosen to minimize the spread of the rotation-dependent geometric mean (normalized by GMRotDpp) over the usable range of oscillator periods. GMRotI50 is the ground-motion intensity measure being used in the development of new ground-motion prediction equations by the Pacific Earthquake Engineering Research Center Next Generation Attenuation project. Comparisons with as-recorded geometric means for a large dataset show that the new measures are systematically larger than the geometric-mean response spectra using the as-recorded values of ground acceleration, but only by a small amount (less than 3%). The theoretical advantage of the new measures is that they remove sensor orientation as a contributor to aleatory uncertainty. Whether the reduction is of practical significance awaits detailed studies of large
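
    The construction of GMRotDpp can be illustrated with a simplified numpy sketch that uses the peak amplitude of the rotated traces as the intensity measure; the published measure uses oscillator response spectra at each period, so this is a sketch of the idea rather than the definition:

        import numpy as np

        def gmrotd(acc1, acc2, pp=50):
            """GMRotDpp analogue from two as-recorded horizontal acceleration traces."""
            gms = []
            for theta in np.deg2rad(np.arange(90)):  # non-redundant rotation angles
                r1 = acc1 * np.cos(theta) + acc2 * np.sin(theta)
                r2 = -acc1 * np.sin(theta) + acc2 * np.cos(theta)
                gms.append(np.sqrt(np.max(np.abs(r1)) * np.max(np.abs(r2))))
            return np.percentile(gms, pp)  # pp=50 gives the GMRotD50 analogue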

  8. A consistent aerosol optical depth (AOD) dataset over mainland China by integration of several AOD products

    NASA Astrophysics Data System (ADS)

    Xu, H.; Guang, J.; Xue, Y.; de Leeuw, Gerrit; Che, Y. H.; Guo, Jianping; He, X. W.; Wang, T. K.

    2015-08-01

    The Moderate Resolution Imaging Spectroradiometer (MODIS), the Multiangle Imaging Spectroradiometer (MISR) and the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) provide validated aerosol optical depth (AOD) products over both land and ocean. However, the values of the AOD provided by each of these satellites may show spatial and temporal differences due to the instrument characteristics and aerosol retrieval algorithms used for each instrument. In this article we present a method to produce an AOD dataset over Asia for the year 2007 based on fusion of the data provided by different instruments and/or algorithms. First, the bias of each satellite-derived AOD product was calculated by comparison with ground-based AOD data derived from the AErosol RObotic NETwork (AERONET) and the China Aerosol Remote Sensing NETwork (CARSNET) for different values of the surface albedo and the AOD. Then, these multiple AOD products were combined using the maximum likelihood estimate (MLE) method, with weights derived from the root mean square error (RMSE) associated with the accuracies of the original AOD products. The original and merged AOD datasets have been validated by comparison with AOD data from CARSNET. Results show that the mean bias error (MBE) and mean absolute error (MAE) of the merged AOD dataset are not larger than those of any of the original AOD products. In addition, for the merged AOD dataset the fraction of pixels with no data is significantly smaller than that of any of the original products, thus increasing the spatial coverage. The fraction of retrievable area is about 50% for the merged AOD dataset and between 5% and 20% for the MISR, SeaWiFS, MODIS-DT and MODIS-DB algorithms.
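
    A common form of the MLE combination with RMSE-derived weights is an inverse-error-variance average per pixel; the sketch below assumes that form and is not necessarily identical to the authors' implementation:

        import numpy as np

        def merge_aod(products, rmses):
            """Combine co-located AOD retrievals by inverse-error-variance weights.

            products -- array (n_products, ny, nx); NaN where a product is absent
            rmses    -- per-product RMSE versus ground-based AOD (length n_products)
            """
            w = (1.0 / np.asarray(rmses, float) ** 2)[:, None, None]
            w = w * np.isfinite(products)     # zero weight where no retrieval
            filled = np.nan_to_num(products)  # NaNs contribute nothing
            with np.errstate(invalid="ignore"):
                return (w * filled).sum(axis=0) / w.sum(axis=0)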

  9. A high resolution 7-Tesla resting-state fMRI test-retest dataset with cognitive and physiological measures.

    PubMed

    Gorgolewski, Krzysztof J; Mendes, Natacha; Wilfling, Domenica; Wladimirow, Elisabeth; Gauthier, Claudine J; Bonnen, Tyler; Ruby, Florence J M; Trampel, Robert; Bazin, Pierre-Louis; Cozatl, Roberto; Smallwood, Jonathan; Margulies, Daniel S

    2015-01-01

    Here we present a test-retest dataset of functional magnetic resonance imaging (fMRI) data acquired at rest. 22 participants were scanned during two sessions spaced one week apart. Each session includes two 1.5 mm isotropic whole-brain scans and one 0.75 mm isotropic scan of the prefrontal cortex, giving a total of six time-points. Additionally, the dataset includes measures of mood, sustained attention, blood pressure, respiration, pulse, and the content of self-generated thoughts (mind wandering). These data enable the investigation of sources of both intra- and inter-session variability, including not only physiological changes but also alterations in cognitive and affective states, at high spatial resolution. The dataset is accompanied by a detailed experimental protocol and the source code of all stimuli used.

  10. A high resolution 7-Tesla resting-state fMRI test-retest dataset with cognitive and physiological measures

    PubMed Central

    Gorgolewski, Krzysztof J; Mendes, Natacha; Wilfling, Domenica; Wladimirow, Elisabeth; Gauthier, Claudine J; Bonnen, Tyler; Ruby, Florence J.M; Trampel, Robert; Bazin, Pierre-Louis; Cozatl, Roberto; Smallwood, Jonathan; Margulies, Daniel S

    2015-01-01

    Here we present a test-retest dataset of functional magnetic resonance imaging (fMRI) data acquired at rest. 22 participants were scanned during two sessions spaced one week apart. Each session includes two 1.5 mm isotropic whole-brain scans and one 0.75 mm isotropic scan of the prefrontal cortex, giving a total of six time-points. Additionally, the dataset includes measures of mood, sustained attention, blood pressure, respiration, pulse, and the content of self-generated thoughts (mind wandering). These data enable the investigation of sources of both intra- and inter-session variability, including not only physiological changes but also alterations in cognitive and affective states, at high spatial resolution. The dataset is accompanied by a detailed experimental protocol and the source code of all stimuli used. PMID:25977805

  11. Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization

    PubMed Central

    2010-01-01

    This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets and the previously studied datasets. This methodology is rigorous and statistically grounded, and ultimately culminates in a Wilcoxon significance test that demonstrates the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. Background Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. Results Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. Conclusions We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of

  12. Improving the terrestial gravity dataset in South-Estonia

    NASA Astrophysics Data System (ADS)

    Oja, T.; Gruno, A.; Bloom, A.; Mäekivi, E.; Ellmann, A.; All, T.; Jürgenson, H.; Michelson, M.

    2009-04-01

    The only gravity dataset covering the whole of Estonia was observed from 1949 to 1958. This historic dataset has been used as the main input source for many applications, including geoid determination, the realization of the height system, and geological mapping. However, some recent studies have indicated remarkable systematic biases in the dataset. For instance, a comparison of modern gravity control points with the historic data revealed unreasonable discrepancies in a large region in South Estonia. However, the distribution of the gravity control was scarce, which did not allow a full assessment of the quality of the historic data in the study area. In 2008 a pilot project was launched as a cooperation between the Estonian Land Board, the Geological Survey of Estonia, Tallinn University of Technology and the Estonian University of Life Sciences to densify the detected problematic area (about 2000 km^2) with new and reliable gravity data. Field work was carried out in October and November 2008; GPS RTK and a relative Scintrex gravimeter CG5 were used for precise positioning and gravity determinations, respectively. Altogether more than 140 new points were determined along the roads. Despite bad weather conditions and the unstable observation base of the gravimeter (mostly on the bank of the road), an uncertainty better than ±0.1 mGal (1 mGal = 10^-5 m/s^2) was estimated from the adjustment of the gravimeter's readings. A separate gravity dataset of the Estonian Geological Survey was also incorporated into the gravity database of the project for further analysis. Those data were collected within several geological mapping projects in 1981-2007 and have an uncertainty better than ±0.25 mGal. After the collection of the new gravity data, Kriging with proper variogram modeling was applied to form Bouguer anomaly grids of the historic and the new datasets. The comparison of the resulting grids revealed biases up to -4 mGal in certain regions

  13. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919–2014

    PubMed Central

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-01-01

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919–2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks. PMID:27116565

  14. SCSPOD14, a South China Sea physical oceanographic dataset derived from in situ measurements during 1919-2014.

    PubMed

    Zeng, Lili; Wang, Dongxiao; Chen, Ju; Wang, Weiqiang; Chen, Rongyu

    2016-04-26

    In addition to the oceanographic data available for the South China Sea (SCS) from the World Ocean Database (WOD) and Array for Real-time Geostrophic Oceanography (Argo) floats, a suite of observations has been made by the South China Sea Institute of Oceanology (SCSIO) starting from the 1970s. Here, we assemble a SCS Physical Oceanographic Dataset (SCSPOD14) based on 51,392 validated temperature and salinity profiles collected from these three datasets for the period 1919-2014. A gridded dataset of climatological monthly mean temperature, salinity, and mixed and isothermal layer depth derived from an objective analysis of profiles is also presented. Comparisons with the World Ocean Atlas (WOA) and IFREMER/LOS Mixed Layer Depth Climatology confirm the reliability of the new dataset. This unique dataset offers an invaluable baseline perspective on the thermodynamic processes, spatial and temporal variability of water masses, and basin-scale and mesoscale oceanic structures in the SCS. We anticipate improvements and regular updates to this product as more observations become available from existing and future in situ networks.

  15. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification.

    PubMed

    Pongor, Lőrinc S; Vera, Roberto; Ligeti, Balázs

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and, as a result, species- or strain-level identification is often inaccurate and low-abundance pathogens can sometimes be missed. We have developed Taxoner, an open-source taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than approaches that use small marker databases but is more sensitive due to its comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain-level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.

  16. Identification of common prognostic gene expression signatures with biological meanings from microarray gene expression datasets.

    PubMed

    Yao, Jun; Zhao, Qi; Yuan, Ying; Zhang, Li; Liu, Xiaoming; Yung, W K Alfred; Weinstein, John N

    2012-01-01

    Numerous prognostic gene expression signatures for breast cancer were generated previously, with little overlap and limited insight into the biology of the disease. Here we introduce a novel algorithm named SCoR (Survival analysis using Cox proportional hazard regression and Random resampling) that applies random resampling and clustering methods to identify gene features correlated with time-to-event data. This is shown to reduce the overfitting noise involved in microarray data analysis and to discover functional gene sets linked to patient survival. SCoR independently identified a common poor-prognosis signature composed of cell proliferation genes in six out of eight breast cancer datasets. Furthermore, a sequential SCoR analysis on highly proliferative breast cancers repeatedly identified T/B cell markers as favorable prognosis factors. In glioblastoma, SCoR identified a common good-prognosis signature of chromosome 10 genes in two gene expression datasets (TCGA and REMBRANDT), recapitulating the fact that loss of one copy of chromosome 10 (which harbors the tumor suppressor PTEN) is linked to poor survival in glioblastoma patients. SCoR also identified prognostic genes on sex chromosomes in lung adenocarcinomas, suggesting patient gender might be used to predict outcome in this disease. These results demonstrate the power of SCoR to identify common and biologically meaningful prognostic gene expression signatures.
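
    The resampling idea at the heart of SCoR, repeatedly refitting a Cox model on random subsamples and counting how often a feature stays significant, can be sketched with the lifelines package. The column names, subsample fraction, and univariate form are illustrative assumptions, not the published algorithm:

        import numpy as np
        from lifelines import CoxPHFitter

        def resampled_cox_hits(df, gene, n_rounds=100, frac=0.8, alpha=0.05):
            """Fraction of random subsamples in which `gene` is significant.

            df must contain the expression column `gene` plus 'time' and 'event'.
            """
            rng = np.random.default_rng(0)
            hits = 0
            for _ in range(n_rounds):
                sub = df.sample(frac=frac, random_state=int(rng.integers(2**31 - 1)))
                cph = CoxPHFitter().fit(sub[[gene, "time", "event"]],
                                        duration_col="time", event_col="event")
                hits += int(cph.summary.loc[gene, "p"] < alpha)
            return hits / n_rounds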

  17. Identification of Common Prognostic Gene Expression Signatures with Biological Meanings from Microarray Gene Expression Datasets

    PubMed Central

    Yao, Jun; Zhao, Qi; Yuan, Ying; Zhang, Li; Liu, Xiaoming; Yung, W. K. Alfred; Weinstein, John N.

    2012-01-01

    Numerous prognostic gene expression signatures for breast cancer were generated previously, with little overlap and limited insight into the biology of the disease. Here we introduce a novel algorithm named SCoR (Survival analysis using Cox proportional hazard regression and Random resampling) that applies random resampling and clustering methods to identify gene features correlated with time-to-event data. This is shown to reduce the overfitting noise involved in microarray data analysis and to discover functional gene sets linked to patient survival. SCoR independently identified a common poor-prognosis signature composed of cell proliferation genes in six out of eight breast cancer datasets. Furthermore, a sequential SCoR analysis on highly proliferative breast cancers repeatedly identified T/B cell markers as favorable prognosis factors. In glioblastoma, SCoR identified a common good-prognosis signature of chromosome 10 genes in two gene expression datasets (TCGA and REMBRANDT), recapitulating the fact that loss of one copy of chromosome 10 (which harbors the tumor suppressor PTEN) is linked to poor survival in glioblastoma patients. SCoR also identified prognostic genes on sex chromosomes in lung adenocarcinomas, suggesting patient gender might be used to predict outcome in this disease. These results demonstrate the power of SCoR to identify common and biologically meaningful prognostic gene expression signatures. PMID:23029298

  18. Climatic Analysis of Oceanic Water Vapor Transports Based on Satellite E-P Datasets

    NASA Technical Reports Server (NTRS)

    Smith, Eric A.; Sohn, Byung-Ju; Mehta, Vikram

    2004-01-01

    Understanding the climatically varying properties of water vapor transports from a robust observational perspective is an essential step in calibrating climate models. This is tantamount to measuring year-to-year changes of monthly- or seasonally-averaged, divergent water vapor transport distributions. This cannot be done effectively with conventional radiosonde data over ocean regions where sounding data are generally sparse. This talk describes how a methodology designed to derive atmospheric water vapor transports over the world oceans from satellite-retrieved precipitation (P) and evaporation (E) datasets circumvents the problem of inadequate sampling. Ultimately, the method is intended to take advantage of the relatively complete and consistent coverage, as well as continuity in sampling, associated with E and P datasets obtained from satellite measurements. Independent P and E retrievals from Special Sensor Microwave Imager (SSM/I) measurements, along with P retrievals from Tropical Rainfall Measuring Mission (TRMM) measurements, are used to obtain transports by solving a potential function for the divergence of water vapor transport as balanced by large scale E - P conditions.
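
    The balance the method exploits can be written compactly. This is the standard potential-function formulation, reconstructed here for clarity rather than quoted from the talk:

        \nabla \cdot \mathbf{Q} = E - P, \qquad
        \mathbf{Q}_{\mathrm{div}} = \nabla \chi, \qquad
        \nabla^{2} \chi = E - P,

    where \mathbf{Q} is the vertically integrated water vapor transport and \chi its potential function; solving the Poisson equation for \chi over the ocean domain recovers the divergent component of the transport from the satellite E and P fields alone.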

  19. Tests of diffusion-free scaling behaviors in numerical dynamo datasets

    NASA Astrophysics Data System (ADS)

    Cheng, J. S.; Aurnou, J. M.

    2016-02-01

    Many dynamo studies extrapolate numerical model results to planetary conditions by empirically constructing scaling laws. The seminal work of Christensen and Aubert (2006) proposed a set of scaling laws that have been used throughout the geoscience community. These scalings make use of specially-constructed parameters that are independent of fluid diffusivities, anticipating that large-scale turbulent processes will dominate the physics in planetary dynamo settings. With these 'diffusion-free' parameterizations, the results of current numerical dynamo models extrapolate directly to fully-turbulent planetary core systems; the effects of realistic fluid properties merit no further investigation. In this study, we test the validity of diffusion-free heat transfer scaling arguments and their applicability to planetary conditions. We do so by constructing synthetic heat transfer datasets and examining their scaling properties alongside those proposed by Christensen and Aubert (2006). We find that the diffusion-free parameters compress and stretch the heat transfer data, eliminating information and creating an artificial alignment of the data. Most significantly, diffusion-free heat transfer scalings are found to be unrelated to bulk turbulence and are instead controlled by the onset of non-magnetic rotating convection, itself determined by the viscous diffusivity of the working fluid. Ultimately, our results, in conjunction with those of Stelzer and Jackson (2013) and King and Buffett (2013), show that diffusion-free scalings are not validated by current-day numerical dynamo datasets and cannot yet be extrapolated to planetary conditions.

  20. MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

    PubMed

    Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2014-01-01

    A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA.
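
    The cluster-then-assemble flow maps naturally onto repeated CAP3 invocations. In the sketch below the cap3 command-line usage is real, but cluster_reads and concatenate are hypothetical placeholders for the binning and bookkeeping steps:

        import subprocess
        from pathlib import Path

        def assemble_cluster(fasta_path):
            """Run CAP3 on one cluster's reads; return the contigs file path.

            Assumes the cap3 binary is on PATH; `cap3 <file>` writes
            <file>.cap.contigs alongside the input.
            """
            subprocess.run(["cap3", str(fasta_path)], check=True,
                           stdout=subprocess.DEVNULL)
            return Path(str(fasta_path) + ".cap.contigs")

        # Two-stage flow (cluster_reads/concatenate are hypothetical helpers):
        # contigs = [assemble_cluster(f) for f in cluster_reads("metagenome.fasta")]
        # final = assemble_cluster(concatenate(contigs, "unassembled_reads.fasta"))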

  1. Improving the Fundamental Understanding of Regional Seismic Signal Processing with a Unique Western U.S. Dataset

    SciTech Connect

    Walter, W R; Smith, K; O'Boyle, J; Hauk, T F; Ryall, F; Ruppert, S D; Myers, S C; Anderson, M; Dodge, D A

    2003-07-18

    recovered and reformatted old event-segmented data from the LLNL- and SNL-managed stations for past nuclear tests and earthquakes. We then used the preferred origin catalog to extract waveforms from continuous data and associate event-segmented waveforms within the database. The result is a well-organized regional western US dataset with hundreds of nuclear tests, thousands of mining explosions and hundreds of thousands of earthquakes. In the second stage of the project we have chosen a subset of approximately 125 events that are well located and cover a range of magnitudes, source types, and locations. Ms. Flori Ryall, an experienced seismic analyst, is reviewing this dataset. She is picking all arrival onsets with quantitative uncertainties and making note of data problems (timing errors, glitches, dropouts) and issues. The resulting arrivals and comments will then be loaded into the database for future researcher use. During the summer of 2003 we will be carrying out some analysis and quality control on this subset. It is anticipated that this set of consistently picked, independently located data will provide an effective test set for regional sparse-station location algorithms. In addition, because the set will include nuclear tests, earthquakes, and mine-related events, each with related source parameters, it will provide a valuable test set for regional discrimination and magnitude estimation as well. A final relational database of the approximately 125 events in the high-quality subset will be put onto a CD-ROM and distributed for other researchers to use in benchmarking regional algorithms after the conclusion of the project.

  2. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    ERIC Educational Resources Information Center

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  3. Frame independent cosmological perturbations

    SciTech Connect

    Prokopec, Tomislav; Weenink, Jan E-mail: j.g.weenink@uu.nl

    2013-09-01

    We compute the third order gauge invariant action for scalar-graviton interactions in the Jordan frame. We demonstrate that the gauge invariant action for scalar and tensor perturbations on one physical hypersurface only differs from that on another physical hypersurface via terms proportional to the equation of motion and boundary terms, such that the evolution of non-Gaussianity may be called unique. Moreover, we demonstrate that the gauge invariant curvature perturbation and graviton on uniform field hypersurfaces in the Jordan frame are equal to their counterparts in the Einstein frame. These frame independent perturbations are therefore particularly useful in relating results in different frames at the perturbative level. On the other hand, the field perturbation and graviton on uniform curvature hypersurfaces in the Jordan and Einstein frame are non-linearly related, as are their corresponding actions and n-point functions.

  4. A Computational Approach to Qualitative Analysis in Large Textual Datasets

    PubMed Central

    Evans, Michael S.

    2014-01-01

    In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern. PMID:24498398
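
    Probabilistic topic modeling of this kind is commonly run with an off-the-shelf LDA implementation. A minimal gensim sketch on a toy corpus (the two-document docs list obviously stands in for the 14,952-document newspaper sample):

        from gensim import corpora
        from gensim.models import LdaModel

        # Token lists; tokenization and stop-word removal assumed upstream.
        docs = [["science", "policy", "debate", "funding"],
                ["school", "funding", "policy", "reform"]]

        dictionary = corpora.Dictionary(docs)
        corpus = [dictionary.doc2bow(d) for d in docs]
        lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
        for topic_id, words in lda.print_topics(num_words=4):
            print(topic_id, words)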

  5. geoknife: Reproducible web-processing of large gridded datasets

    USGS Publications Warehouse

    Read, Jordan S.; Walker, Jordan I.; Appling, Alison P.; Blodgett, David L.; Read, Emily Kara; Winslow, Luke A.

    2016-01-01

    Geoprocessing of large gridded data according to overlap with irregular landscape features is common to many large-scale ecological analyses. The geoknife R package was created to facilitate reproducible analyses of gridded datasets found on the U.S. Geological Survey Geo Data Portal web application or elsewhere, using a web-enabled workflow that eliminates the need to download and store large datasets that are reliably hosted on the Internet. The package provides access to several data subset and summarization algorithms that are available on remote web processing servers. Outputs from geoknife include spatial and temporal data subsets, spatially-averaged time series values filtered by user-specified areas of interest, and categorical coverage fractions for various land-use types.

  6. Robust Machine Learning Applied to Terascale Astronomical Datasets

    NASA Astrophysics Data System (ADS)

    Ball, N. M.; Brunner, R. J.; Myers, A. D.

    2008-08-01

    We present recent results from the Laboratory for Cosmological Data Mining (http://lcdm.astro.uiuc.edu) at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petascale datasets.
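
    Of the algorithms named above, k-nearest neighbor regression for photometric redshifts is the most compact to illustrate. The sketch below trains scikit-learn's regressor on synthetic colors and redshifts, which merely stand in for real SDSS photometry:

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        rng = np.random.default_rng(0)
        colors = rng.normal(size=(1000, 4))  # stand-ins for u-g, g-r, r-i, i-z
        z_true = np.abs(colors @ np.array([0.10, 0.20, 0.10, 0.05])
                        + rng.normal(0, 0.02, 1000))

        knn = KNeighborsRegressor(n_neighbors=10, weights="distance")
        knn.fit(colors[:800], z_true[:800])  # "spectroscopic" training set
        z_phot = knn.predict(colors[800:])   # photometric redshift estimates
        print("RMS error:", np.sqrt(np.mean((z_phot - z_true[800:]) ** 2)))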

  7. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset.

    PubMed

    Neto, Joana P; Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R

    2016-08-01

    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo "paired-recordings" such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals.

  8. Serial femtosecond crystallography datasets from G protein-coupled receptors.

    PubMed

    White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-08-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data.
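
    A minimal sketch of inspecting one of the deposited files with h5py; the file name and internal dataset path below are hypothetical, since the actual layout is fixed by the deposition and the CrystFEL conventions.

    ```python
    # Sketch: list the HDF5 tree of a deposited diffraction file and read
    # one frame. File name and dataset path are hypothetical placeholders.
    import h5py

    with h5py.File("ser2b_run0001.h5", "r") as f:   # hypothetical file name
        f.visit(print)                              # print every group/dataset name
        frame = f["/data/data"][0]                  # hypothetical dataset path
        print(frame.shape, frame.dtype)
    ```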

  9. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset.

    PubMed

    Neto, Joana P; Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R

    2016-08-01

    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo "paired-recordings" such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. PMID:27306671

  10. Serial femtosecond crystallography datasets from G protein-coupled receptors

    PubMed Central

    White, Thomas A.; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A.; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R.; Yoon, Chun Hong; Yefanov, Oleksandr M.; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E.; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-01-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data. PMID:27479354

  11. A computational approach to qualitative analysis in large textual datasets.

    PubMed

    Evans, Michael S

    2014-01-01

    In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern.
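
    A minimal sketch of the kind of probabilistic topic modeling described, using scikit-learn's LDA; the four stand-in documents and the topic count are placeholders for the 14,952-article newspaper sample.

    ```python
    # Sketch: discover qualitatively distinct subjects of discussion with
    # latent Dirichlet allocation over a (placeholder) document collection.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "court ruling on science funding policy",
        "new telescope survey releases dataset",
        "election coverage and public opinion polls",
        "genome sequencing cost drops again",
    ]  # stand-in corpus

    tf = CountVectorizer(stop_words="english")
    X = tf.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)

    # Top words per topic, one line per discovered subject.
    terms = tf.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        top = [terms[i] for i in comp.argsort()[-3:][::-1]]
        print(f"topic {k}:", ", ".join(top))
    ```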

  12. Serial femtosecond crystallography datasets from G protein-coupled receptors.

    PubMed

    White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-01-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data. PMID:27479354

  13. Validating silicon polytrodes with paired juxtacellular recordings: method and dataset

    PubMed Central

    Neto, Joana P.; Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R.

    2016-01-01

    Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo “paired-recordings” such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. PMID:27306671

  14. Development of a video tampering dataset for forensic investigation.

    PubMed

    Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali

    2016-09-01

    Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of video modifications. This paper discusses a comprehensive proposal to create a dataset composed of modified videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is for use in video forensics, where it can support reliable verification using dynamic and static camera recognition. To the best of the authors' knowledge, no similar library exists in the research community. Videos were sourced from YouTube and by exploring social networking sites extensively, observing posted videos and rating their feedback. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories of video tampering: (1) copy-move, (2) splicing, and (3) frame swapping. Compared to existing datasets, this is a higher number of tampered videos, with longer durations. Every video is 16 s long, with a 1280×720 resolution and a frame rate of 30 frames per second. Moreover, all videos share the same format and quality (720p HD, .avi). Both temporal and spatial video features were considered carefully during selection of the videos, and complete information on the doctored regions is provided for every modified video in the VTD dataset. The database has been made publicly available, with ground truth, for research on splicing, frame swapping, and copy-move tampering, and on various other video tampering detection issues. It has been utilised by many international researchers and research groups.

  15. Wind Integration Datasets from the National Renewable Energy Laboratory (NREL)

    DOE Data Explorer

    The Wind Integration Datasets provide time-series wind data for 2004, 2005, and 2006. They are intended to be used by energy professionals such as transmission planners, utility planners, project developers, and university researchers, helping them to perform comparisons of sites and estimate power production from hypothetical wind plants. NREL cautions that the information from modeled data may not match wind resource information shown on NREL's state wind maps, as they were created for different purposes and using different methodologies.

  16. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    USGS Publications Warehouse

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to its variable and high content (6.2–81.7 wt.%) and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
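
    The quartz-free transformation described above amounts to renormalizing each element's concentration by the non-quartz mass fraction; a minimal sketch with made-up concentrations:

    ```python
    # Sketch of the quartz-free basis: remove the quartz dilution effect by
    # dividing each element's bulk concentration by the non-quartz fraction.
    import numpy as np

    quartz_wt = np.array([6.2, 35.0, 81.7])   # quartz content, wt.% (made up)
    ca_wt = np.array([2.1, 1.4, 0.3])         # Ca in bulk soil, wt.% (made up)

    ca_quartz_free = ca_wt / (1.0 - quartz_wt / 100.0)
    print(ca_quartz_free)  # concentrations with the dilution effect removed
    ```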

  17. Microscopic images dataset for automation of RBCs counting.

    PubMed

    Abbas, Sherif

    2015-12-01

    A method for Red Blood Corpuscle (RBC) counting has been developed using RBC light-microscopic images and a Matlab algorithm. The dataset consists of the RBC images and their corresponding segmented images. A detailed description using a flow chart is given in order to show how to produce the RBC mask. The RBC mask was used to count the number of RBCs in the blood smear image.
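
    The pipeline itself is implemented in Matlab; an analogous Python sketch of the same threshold-mask-count idea, with a hypothetical input image, might look like this:

    ```python
    # Sketch: produce an RBC mask from a smear image by thresholding, then
    # count connected components. Input file name is a hypothetical example.
    from skimage import io, filters, measure, morphology

    img = io.imread("blood_smear.png", as_gray=True)   # hypothetical image

    mask = img < filters.threshold_otsu(img)           # cells darker than background
    mask = morphology.remove_small_objects(mask, min_size=50)  # drop specks

    labels = measure.label(mask)                       # one label per RBC
    print("RBC count:", labels.max())
    ```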

  18. Development of a video tampering dataset for forensic investigation.

    PubMed

    Ismael Al-Sanjary, Omar; Ahmed, Ahmed Abdullah; Sulong, Ghazali

    2016-09-01

    Forgery is an act of modifying a document, product, image or video, among other media. Video tampering detection research requires an inclusive database of video modification. This paper aims to discuss a comprehensive proposal to create a dataset composed of modified videos for forensic investigation, in order to standardize existing techniques for detecting video tampering. The primary purpose of developing and designing this new video library is for usage in video forensics, which can be consciously associated with reliable verification using dynamic and static camera recognition. To the best of the author's knowledge, there exists no similar library among the research community. Videos were sourced from YouTube and by exploring social networking sites extensively by observing posted videos and rating their feedback. The video tampering dataset (VTD) comprises a total of 33 videos, divided among three categories in video tampering: (1) copy-move, (2) splicing, and (3) swapping-frames. Compared to existing datasets, this is a higher number of tampered videos, and with longer durations. The duration of every video is 16s, with a 1280×720 resolution, and a frame rate of 30 frames per second. Moreover, all videos possess the same formatting quality (720p(HD).avi). Both temporal and spatial video features were considered carefully during selection of the videos, and there exists complete information related to the doctored regions in every modified video in the VTD dataset. This database has been made publically available for research on splicing, Swapping frames, and copy-move tampering, and, as such, various video tampering detection issues with ground truth. The database has been utilised by many international researchers and groups of researchers. PMID:27574113

  19. Quantification of NSW Ambulance Record Linkages with Multiple External Datasets.

    PubMed

    Carroll, Therese; Muecke, Sandy; Simpson, Judy; Irvine, Katie; Jenkins, André

    2015-01-01

    This study has two aims: 1) to describe linkage rates between ambulance data and external datasets for "episodes of care" and "patient only" linkages in New South Wales (NSW), Australia; and 2) to detect and report any systematic issues with linkage that relate to patients, and operational or clinical variables that may introduce bias in subsequent studies if not adequately addressed. During 2010-11, the Centre for Health Record Linkage (CHeReL) in NSW linked the records for patients attended by NSW Ambulance paramedics for the period July 2006 to June 2009 with four external datasets: Emergency Department Data Collection; Admitted Patient Data Collection; NSW Registry of Births, Deaths and Marriages death registration data; and the Australian Bureau of Statistics mortality data. This study reports linkage rates in terms of those "expected" to link and those "not expected" to link with external databases within 24 hours of paramedic attendance. Following thorough data preparation processes, 2,041,728 NSW Ambulance care episodes for 1,116,509 patients fulfilled the inclusion criteria. The overall episode-specific hospital linkage rate was 97.2%. Where a patient was not transported to hospital following paramedic care, 8.6% of these episodes resulted in an emergency department attendance within 24 hours. For all care episodes, 5.2% linked to a death record at some time within the 3-year period, with 2.4% of all death episodes occurring within 7 days of a paramedic encounter. For NSW Ambulance episodes of care that were expected to link to an external dataset but did not, nonlinkage to hospital admission records tended to decrease with age. For all other variables, issues relating to rates of linkage and nonlinkage were more indiscriminate. This quantification of the limitations of this large linked dataset will underpin the interpretation and results of ensuing studies that will inform future clinical and operational policies and practices at NSW Ambulance.

  20. Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea).

    PubMed

    Hemery, Lenaïg G; Améziane, Nadia; Eléaume, Marc

    2013-01-01

    This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of the Hemery et al. (2012) paper was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). Over one thousand three hundred specimens (1307) used in this study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d'Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was controlled by an expert taxonomist specializing in crinoids (Marc Eléaume, Muséum national d'Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used by studies dealing with, among other interests, Antarctic and/or crinoid diversity (species richness, distribution patterns), biogeography or habitat / ecological niche modeling. This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke.

  1. GLEAM version 3: Global Land Evaporation Datasets and Model

    NASA Astrophysics Data System (ADS)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to host these data for the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ observations.

  2. Multiresolution persistent homology for excessively large biomolecular datasets

    NASA Astrophysics Data System (ADS)

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
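
    As a sketch of the filtration input described above, the snippet below evaluates a rigidity-style density on a grid as a sum of Gaussian kernels over atom positions, with eta as the tunable resolution; the kernel form and all numbers are illustrative assumptions, not the paper's exact construction.

    ```python
    # Sketch: rigidity-like density field whose resolution parameter eta
    # selects the scale at which topology is subsequently analyzed.
    import numpy as np
    from scipy.spatial.distance import cdist

    atoms = np.random.default_rng(1).uniform(0, 10, size=(200, 3))  # toy atoms
    eta = 2.0  # larger eta -> coarser resolution, larger-scale features

    axes = [np.linspace(0, 10, 20)] * 3
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

    density = np.exp(-(cdist(grid, atoms) / eta) ** 2).sum(axis=1).reshape(20, 20, 20)

    # 'density' can now feed a cubical-complex filtration for persistent
    # homology at the chosen scale.
    print(density.shape, float(density.max()))
    ```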

  3. Automatic run-time provenance capture for scientific dataset generation

    NASA Astrophysics Data System (ADS)

    Frew, J.; Slaughter, P.

    2008-12-01

    Provenance---the directed graph of a dataset's processing history---is difficult to capture effectively. Human-generated provenance, as narrative metadata, is labor-intensive and thus often incorrect, incomplete, or simply not recorded. Workflow systems capture some provenance implicitly in workflow specifications, but these systems are not ubiquitous or standardized, and a workflow specification may not capture all of the factors involved in a dataset's production. System audit trails capture potentially all processing activities, but not the relationships between them. We describe a system that transparently (i.e., without any modification to science codes) and automatically (i.e., without any human intervention) captures the low-level interactions (files read/written, parameters accessed, etc.) between scientific processes, and then synthesizes these relationships into a provenance graph. This system---the Earth System Science Server (ES3)---is sufficiently general that it can accommodate any combination of stand-alone programs, interpreted codes (e.g., IDL), and command-language scripts. Provenance in ES3 can be published in well-defined XML formats (including formats suitable for graphical visualization), and queried to determine the ancestors or descendants of any specific data file or process invocation. We demonstrate how ES3 can be used to capture the provenance of a large operational ocean color dataset.
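
    A toy sketch of the synthesis step: turning low-level (process, file, read/write) events into a provenance graph that can be queried for ancestors. ES3 captures such events transparently at run time; here they are simply given as data.

    ```python
    # Sketch: synthesize a provenance DAG from I/O events, then query the
    # ancestors (upstream files and processes) of a data product.
    from collections import defaultdict

    events = [                       # (process, file, mode), as audited
        ("calibrate", "raw.dat", "read"),
        ("calibrate", "cal.dat", "write"),
        ("composite", "cal.dat", "read"),
        ("composite", "map.png", "write"),
    ]

    parents = defaultdict(set)       # node -> nodes it was derived from
    for proc, fname, mode in events:
        if mode == "read":
            parents[proc].add(fname)     # process depends on its inputs
        else:
            parents[fname].add(proc)     # output depends on the process

    def ancestors(node, seen=None):
        """All upstream nodes that contributed to `node`."""
        seen = set() if seen is None else seen
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                ancestors(p, seen)
        return seen

    print(ancestors("map.png"))  # {'composite', 'cal.dat', 'calibrate', 'raw.dat'}
    ```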

  4. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.
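
    The framework itself is a custom implementation, but the workflow it optimizes can be sketched with a stock random forest on synthetic microarray-shaped data (far more variables than observations), including the variable-importance output highlighted above.

    ```python
    # Sketch: random forest on p >> n data with variable importances.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5000))             # 60 samples, 5000 "genes"
    y = (X[:, 42] + X[:, 7] > 0).astype(int)    # two informative genes

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X, y)

    top = np.argsort(rf.feature_importances_)[-5:][::-1]
    print("most important genes:", top)         # genes 42 and 7 should rank highly
    ```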

  5. Variability in docking success rates due to dataset preparation.

    PubMed

    Corbeil, Christopher R; Williams, Christopher I; Labute, Paul

    2012-06-01

    The results of cognate docking with the prepared Astex dataset provided by the organizers of the "Docking and Scoring: A Review of Docking Programs" session at the 241st ACS national meeting are presented. The MOE software with the newly developed GBVI/WSA dG scoring function is used throughout the study. For 80 % of the Astex targets, the MOE docker produces a top-scoring pose within 2 Å of the X-ray structure. For 91 % of the targets a pose within 2 Å of the X-ray structure is produced in the top 30 poses. Docking failures, defined as cases where the top scoring pose is greater than 2 Å from the experimental structure, are shown to be largely due to the absence of bound waters in the source dataset, highlighting the need to include these and other crucial information in future standardized sets. Docking success is shown to depend heavily on data preparation. A "dataset preparation" error of 0.5 kcal/mol is shown to cause fluctuations of over 20 % in docking success rates. PMID:22566074
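
    A sketch of the success criterion used throughout: a pose counts as a success when its RMSD to the X-ray structure is at most 2 Å. Coordinates below are synthetic and assumed to share atom ordering.

    ```python
    # Sketch: the 2 Angstrom pose-success test on matched coordinate arrays.
    import numpy as np

    def rmsd(pose, xray):
        """Root-mean-square deviation between two (N, 3) coordinate arrays."""
        return np.sqrt(((pose - xray) ** 2).sum(axis=1).mean())

    xray = np.random.default_rng(0).normal(size=(30, 3))               # toy atoms
    pose = xray + 0.05 * np.random.default_rng(1).normal(size=(30, 3))

    print("success" if rmsd(pose, xray) <= 2.0 else "failure")
    ```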

  6. Securely Measuring the Overlap between Private Datasets with Cryptosets

    PubMed Central

    Swamidass, S. Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data—collected by different groups or across large collaborative networks—into a combined analysis. Unfortunately, some of the most interesting and powerful datasets—like health records, genetic data, and drug discovery data—cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset’s contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach “information-theoretic” security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure. PMID:25714898
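
    A toy sketch in the spirit of a cryptoset: each party publishes only a fixed-length vector of hashed-identifier counts, and overlap is estimated from the dot product after subtracting the chance level. The paper's exact estimator and security analysis are more involved; this form is an assumption for illustration.

    ```python
    # Sketch: estimate private-set overlap from shareable count vectors.
    import hashlib

    L = 512  # cryptoset length; shorter -> more private, noisier estimate

    def cryptoset(ids):
        v = [0] * L
        for s in ids:
            v[int(hashlib.sha256(s.encode()).hexdigest(), 16) % L] += 1
        return v

    a_ids = {f"patient{i}" for i in range(1000)}
    b_ids = {f"patient{i}" for i in range(700, 1500)}   # true overlap: 300

    A, B = cryptoset(a_ids), cryptoset(b_ids)
    dot = sum(x * y for x, y in zip(A, B))
    est = dot - len(a_ids) * len(b_ids) / L             # subtract chance level
    print(f"estimated overlap: {est:.0f} (true: 300; estimate is noisy)")
    ```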

  7. Nanocubes for real-time exploration of spatiotemporal datasets.

    PubMed

    Lins, Lauro; Klosowski, James T; Scheidegger, Carlos

    2013-12-01

    Consider real-time exploration of large multidimensional spatiotemporal datasets with billions of entries, each defined by a location, a time, and other attributes. Are certain attributes correlated spatially or temporally? Are there trends or outliers in the data? Answering these questions requires aggregation over arbitrary regions of the domain and attributes of the data. Many relational databases implement the well-known data cube aggregation operation, which in a sense precomputes every possible aggregate query over the database. Data cubes are sometimes assumed to take a prohibitively large amount of space, and to consequently require disk storage. In contrast, we show how to construct a data cube that fits in a modern laptop's main memory, even for billions of entries; we call this data structure a nanocube. We present algorithms to compute and query a nanocube, and show how it can be used to generate well-known visual encodings such as heatmaps, histograms, and parallel coordinate plots. When compared to exact visualizations created by scanning an entire dataset, nanocube plots have bounded screen error across a variety of scales, thanks to a hierarchical structure in space and time. We demonstrate the effectiveness of our technique on a variety of real-world datasets, and present memory, timing, and network bandwidth measurements. We find that the timings for the queries in our examples are dominated by network and user-interaction latencies.
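
    A tiny sketch of the data-cube idea underlying nanocubes: materialize counts for every combination of (spatial cell, hour, category), with a wildcard level per dimension, so aggregate queries become single lookups. Real nanocubes reach their memory bounds by sharing subtrees across keys, which this sketch ignores.

    ```python
    # Sketch: a fully materialized mini data cube with wildcard levels.
    from collections import Counter

    events = [                                  # (spatial cell, hour, category)
        ((1, 2), 9, "tweet"), ((1, 2), 9, "photo"),
        ((1, 3), 10, "tweet"), ((4, 4), 9, "tweet"),
    ]

    cube = Counter()
    for cell, hour, cat in events:
        for c in (cell, None):                  # None acts as "any"
            for h in (hour, None):
                for k in (cat, None):
                    cube[(c, h, k)] += 1

    print(cube[(None, 9, "tweet")])    # tweets at hour 9, anywhere -> 2
    print(cube[((1, 2), None, None)])  # all events in cell (1, 2)  -> 2
    ```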

  8. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    PubMed Central

    Levin, Barnaby D.A.; Padgett, Elliot; Chen, Chien-Chun; Scott, M.C.; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D.; Robinson, Richard D.; Ercius, Peter; Kourkoutis, Lena F.; Miao, Jianwei; Muller, David A.; Hovden, Robert

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data. PMID:27272459

  9. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy.

    PubMed

    Levin, Barnaby D A; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data. PMID:27272459

  10. Web-based 2-d Visualization with Large Datasets

    NASA Astrophysics Data System (ADS)

    Goldina, T.; Roby, W.; Wu, X.; Ly, L.

    2015-09-01

    Modern astronomical surveys produce large catalogs. Modern archives are web-based. As the science becomes more and more data driven, the pressure on visualization tools to support large datasets increases. While tables can render one page at a time, image overlays showing the returned catalog entries or XY plots showing the relationship between table columns must cover all of the rows to be meaningful. A large dataset could easily overwhelm the browser's capabilities. Therefore the amount of data to be transported or rendered must be reduced. IRSA's catalog visualization is based on the Firefly package, developed at IPAC (Roby 2013). Firefly is used by multiple web-based tools and archives maintained by IRSA: Catalog Search, Spitzer, WISE, Planck, etc. Its distinctive feature is the tri-view: table, image overlay, and XY plot. All three highly interactive components are integrated together. The tri-view presentation allows an astronomer to dissect a dataset in various ways and to detect underlying structure and anomalies in the data, which makes it a handy tool for data exploration. Many challenges are encountered when only a subset of data is used in place of the full data set. Preserving coherence and maintaining the ability to select and filter data become issues. This talk addresses how we have solved problems in large dataset visualization.
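
    One standard reduction behind such tools can be sketched directly: aggregate the catalog into a fixed-size 2-D grid of counts on the server so that only the grid, not the rows, reaches the browser. The data and grid size below are arbitrary for the example.

    ```python
    # Sketch: reduce a million XY points to a 128x128 count grid for plotting.
    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(2, 1_000_000))      # stand-in catalog columns

    counts, xedges, yedges = np.histogram2d(x, y, bins=128)
    print(counts.shape, int(counts.sum()))      # (128, 128) grid covering 1e6 rows
    ```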

  11. Multiresolution persistent homology for excessively large biomolecular datasets

    PubMed Central

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-01-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs. PMID:26450288

  12. Igloo-Plot: a tool for visualization of multidimensional datasets.

    PubMed

    Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S

    2014-01-01

    Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets which are being generated from various research areas especially in the domain of biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout, not only facilitates an easy identification of clusters of data-points having similar feature compositions, but also the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/.

  13. Multiresolution persistent homology for excessively large biomolecular datasets

    SciTech Connect

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  14. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy.

    PubMed

    Levin, Barnaby D A; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-06-07

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data.

  15. Image segmentation evaluation for very-large datasets

    NASA Astrophysics Data System (ADS)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to facilitate the fully automated measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.
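
    Quantitative segmentation evaluation of the kind used here to triage cases for review can be as simple as an overlap score between two segmentations; below is a minimal Dice-coefficient sketch with toy masks and an assumed review threshold.

    ```python
    # Sketch: Dice overlap between two segmentation masks; low scores are
    # flagged for visual review instead of inspecting every case.
    import numpy as np

    def dice(a, b):
        """Dice coefficient between two boolean masks."""
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    seg_a = np.zeros((64, 64), bool)
    seg_a[10:40, 10:40] = True          # outcome of algorithm A
    seg_b = np.zeros((64, 64), bool)
    seg_b[12:42, 12:42] = True          # outcome of algorithm B

    score = dice(seg_a, seg_b)
    print("flag for review" if score < 0.9 else "accept", f"(Dice={score:.3f})")
    ```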

  16. Boosting medical diagnostics by pooling independent judgments

    PubMed Central

    Kurvers, Ralf H. J. M.; Herzog, Stefan M.; Hertwig, Ralph; Krause, Jens; Carney, Patricia A.; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max

    2016-01-01

    Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches. PMID:27432950
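
    The pooling rule studied above is straightforward to simulate; the sketch below compares a majority vote of three independent doctors against the best single doctor, with made-up accuracies. With similar accuracies the vote wins; widely spread values (e.g. 0.6, 0.6, 0.9) let the best doctor win, illustrating the similarity condition.

    ```python
    # Sketch: majority vote over independent diagnoses vs. the best doctor.
    import numpy as np

    rng = np.random.default_rng(0)
    accuracies = [0.75, 0.78, 0.80]          # similar doctors (made up)
    truth = rng.integers(0, 2, size=20000)   # 0 = benign, 1 = malignant

    votes = np.array([
        np.where(rng.random(truth.size) < acc, truth, 1 - truth)
        for acc in accuracies                # each doctor errs independently
    ])
    majority = (votes.sum(axis=0) >= 2).astype(int)

    print("best doctor accuracy:", max(accuracies))
    print("majority vote accuracy:", (majority == truth).mean())
    ```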

  17. Boosting medical diagnostics by pooling independent judgments.

    PubMed

    Kurvers, Ralf H J M; Herzog, Stefan M; Hertwig, Ralph; Krause, Jens; Carney, Patricia A; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max

    2016-08-01

    Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors' diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches.

  18. Boosting medical diagnostics by pooling independent judgments.

    PubMed

    Kurvers, Ralf H J M; Herzog, Stefan M; Hertwig, Ralph; Krause, Jens; Carney, Patricia A; Bogart, Andy; Argenziano, Giuseppe; Zalaudek, Iris; Wolf, Max

    2016-08-01

    Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors' diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches. PMID:27432950

  19. Production of a national 1:1,000,000-scale hydrography dataset for the United States: feature selection, simplification, and refinement

    USGS Publications Warehouse

    Gary, Robin H.; Wilson, Zachary D.; Archuleta, Christy-Ann M.; Thompson, Florence E.; Vrabel, Joseph

    2009-01-01

    During 2006-09, the U.S. Geological Survey, in cooperation with the National Atlas of the United States, produced a 1:1,000,000-scale (1:1M) hydrography dataset comprising streams and waterbodies for the entire United States, including Puerto Rico and the U.S. Virgin Islands, for inclusion in the recompiled National Atlas. This report documents the methods used to select, simplify, and refine features in the 1:100,000-scale (1:100K) (1:63,360-scale in Alaska) National Hydrography Dataset to create the national 1:1M hydrography dataset. Custom tools and semi-automated processes were created to facilitate generalization of the 1:100K National Hydrography Dataset (1:63,360-scale in Alaska) to 1:1M on the basis of existing small-scale hydrography datasets. The first step in creating the new 1:1M dataset was to address feature selection and optimal data density in the streams network. Several existing methods were evaluated. The production method that was established for selecting features for inclusion in the 1:1M dataset uses a combination of the existing attributes and network in the National Hydrography Dataset and several of the concepts from the methods evaluated. The process for creating the 1:1M waterbodies dataset required a similar approach to that used for the streams dataset. Geometric simplification of features was the next step. Stream reaches and waterbodies indicated in the feature selection process were exported as new feature classes and then simplified using a geographic information system tool. The final step was refinement of the 1:1M streams and waterbodies. Refinement was done through the use of additional geographic information system tools.

  20. Evaluation of catchment delineation methods for the medium-resolution National Hydrography Dataset

    USGS Publications Warehouse

    Johnston, Craig M.; Dewald, Thomas G.; Bondelid, Timothy R.; Worstell, Bruce B.; McKay, Lucinda D.; Rea, Alan; Moore, Richard B.; Goodall, Jonathan L.

    2009-01-01

    Different methods for determining catchments (incremental drainage areas) for stream segments of the medium-resolution (1:100,000-scale) National Hydrography Dataset (NHD) were evaluated by the U.S. Geological Survey (USGS), in cooperation with the U.S. Environmental Protection Agency (USEPA). The NHD is a comprehensive set of digital spatial data that contains information about surface-water features (such as lakes, ponds, streams, and rivers) of the United States. The need for NHD catchments was driven primarily by the goal to estimate NHD streamflow and velocity to support water-quality modeling. The application of catchments for this purpose also demonstrates the broader value of NHD catchments for supporting landscape characterization and analysis. Five catchment delineation methods were evaluated. Four of the methods use topographic information for the delineation of the NHD catchments. These methods include the Raster Seeding Method; two variants of a method first used in a USGS New England study, termed the 'New England Methods' (one used the Watershed Boundary Dataset (WBD) and the other did not); and the Outlet Matching Method. For these topographically based methods, the elevation data source was the 30-meter (m) resolution National Elevation Dataset (NED), as this was the highest resolution available for the conterminous United States and Hawaii. The fifth method evaluated, the Thiessen Polygon Method, uses distance to the nearest NHD stream segments to determine catchment boundaries. Catchments were generated using each method for NHD stream segments within six hydrologically and geographically distinct Subbasins to evaluate the applicability of the method across the United States. The five methods were evaluated by comparing the resulting catchments with the boundaries and the computed area measurements available from several verification datasets that were developed independently using manual methods. The results of the evaluation indicated that the two

  1. Online Visualization and Analysis of Merged Global Geostationary Satellite Infrared Dataset

    NASA Technical Reports Server (NTRS)

    Liu, Zhong; Ostrenga, D.; Leptoukh, G.; Mehta, A.

    2008-01-01

    The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is home to the Tropical Rainfall Measuring Mission (TRMM) data archive. The global merged IR product, also known as the NCEP/CPC 4-km Global (60 degrees N - 60 degrees S) IR Dataset, is one of the TRMM ancillary datasets. It consists of globally merged (60 degrees N - 60 degrees S) pixel-resolution (4 km) IR brightness temperature data (equivalent blackbody temperatures) from all available geostationary satellites (GOES-8/10, METEOSAT-7/5 and GMS). The availability of data from METEOSAT-5, which is located at 63E at the present time, yields a unique opportunity for total global (60 degrees N - 60 degrees S) coverage. The GES DISC has collected over 8 years of the data, beginning in February 2000. This high-temporal-resolution dataset can not only provide additional background information to TRMM and other satellite missions, but also allows a wide range of meteorological phenomena to be observed from space, such as mesoscale convective systems, tropical cyclones, and hurricanes. The dataset can also be used to verify model simulations. Although the data can be downloaded via FTP, their large volume poses a challenge for many users. A single file occupies about 70 MB of disk space, and there are approximately 73,000 files (approximately 4.5 TB) for the past 8 years. To facilitate data access, we have developed a web prototype that allows users to conduct online visualization and analysis of this dataset. With a web browser and a few mouse clicks, users have full access to over 8 years and more than 4.5 TB of data and can generate black-and-white IR imagery and animations without downloading any software or data. In short, you can make your own images! Basic functions include selection of the area of interest, single imagery or animation, a time-skip capability for different temporal resolutions, and image size. Users can save an animation as a file (animated GIF) and import it into other applications.

  2. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    NASA Astrophysics Data System (ADS)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated from prior domain knowledge as well as from a number of external inputs, and producing a range of outputs including datasets, studies and peer-reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, public sources of grey literature are important, such as informal papers (e.g., arXiv deposits), reports and studies. The climate science community is no exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals.

  3. Can atmospheric reanalysis datasets be used to reproduce flood characteristics?

    NASA Astrophysics Data System (ADS)

    Andreadis, K.; Schumann, G.; Stampoulis, D.

    2014-12-01

    Floods are one of the costliest natural disasters, and the ability to understand their characteristics and their interactions with population, land cover and climate changes is of paramount importance. In order to accurately reproduce flood characteristics such as inundation extent and water heights in both river channels and floodplains, hydrodynamic models are required. Most of these models operate at very high resolutions and are computationally very expensive, making their application over large areas difficult. However, a need exists for such models to be applied at regional to global scales so that the effects of climate change with regard to flood risk can be examined. We use the LISFLOOD-FP hydrodynamic model to simulate a 40-year history of flood characteristics at the continental scale, specifically over Australia. LISFLOOD-FP is a 2-D hydrodynamic model that solves the approximate Saint-Venant equations at large scales (on the order of 1 km) using a sub-grid representation of the river channel. This implementation is part of an effort towards a global 1-km flood modeling framework that will allow the reconstruction of a long-term flood climatology. The components of this framework include a hydrologic model (the widely used Variable Infiltration Capacity model) and a meteorological dataset that forces it. In order to extend the simulated flood climatology to 50-100 years in a consistent manner, reanalysis datasets have to be used. The objective of this study is the evaluation of multiple atmospheric reanalysis datasets (ERA, NCEP, MERRA, JRA) as inputs to the VIC/LISFLOOD-FP model. Comparisons of the simulated flood characteristics are made with both satellite observations of inundation and a benchmark simulation of LISFLOOD-FP forced by observed flows. Finally, the implications of the availability of a global flood modeling framework for producing flood hazard maps and disseminating disaster information are discussed.

  4. Determining similarity of scientific entities in annotation datasets

    PubMed Central

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057
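
    The 1-1 maximum-weight bipartite matching at the heart of AnnSim can be sketched with an off-the-shelf assignment solver; the toy term-to-term similarity (Jaccard over ancestor sets) and the Dice-style normalization below are illustrative stand-ins, not the exact formulation of the paper.

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      def term_sim(anc_a, anc_b):
          """Jaccard similarity of two ontology terms via their ancestor sets."""
          union = len(anc_a | anc_b)
          return len(anc_a & anc_b) / union if union else 0.0

      def ann_sim(annots1, annots2, ancestors):
          """Relate two entities by 1-1 matching of their annotation sets."""
          w = np.array([[term_sim(ancestors[a], ancestors[b]) for b in annots2]
                        for a in annots1])
          rows, cols = linear_sum_assignment(-w)   # maximize total weight
          # Dice-like normalization: 2 * matched weight / (|A1| + |A2|)
          return 2.0 * w[rows, cols].sum() / (len(annots1) + len(annots2))

      # Toy ontology: each term mapped to its ancestor set (including itself).
      ancestors = {
          "t1": {"t1", "root"}, "t2": {"t2", "t1", "root"},
          "t3": {"t3", "root"}, "t4": {"t4", "t3", "root"},
      }
      print(ann_sim(["t1", "t3"], ["t2", "t4"], ancestors))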

  5. Determining similarity of scientific entities in annotation datasets.

    PubMed

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057

  6. Rapid Global Fitting of Large Fluorescence Lifetime Imaging Microscopy Datasets

    PubMed Central

    Warren, Sean C.; Margineanu, Anca; Alibhai, Dominic; Kelly, Douglas J.; Talbot, Clifford; Alexandrov, Yuriy; Munro, Ian; Katan, Matilda

    2013-01-01

    Fluorescence lifetime imaging (FLIM) is widely applied to obtain quantitative information from fluorescence signals, particularly using Förster Resonant Energy Transfer (FRET) measurements to map, for example, protein-protein interactions. Extracting FRET efficiencies or population fractions typically entails fitting data to complex fluorescence decay models but such experiments are frequently photon constrained, particularly for live cell or in vivo imaging, and this leads to unacceptable errors when analysing data on a pixel-wise basis. Lifetimes and population fractions may, however, be more robustly extracted using global analysis to simultaneously fit the fluorescence decay data of all pixels in an image or dataset to a multi-exponential model under the assumption that the lifetime components are invariant across the image (dataset). This approach is often considered to be prohibitively slow and/or computationally expensive but we present here a computationally efficient global analysis algorithm for the analysis of time-correlated single photon counting (TCSPC) or time-gated FLIM data based on variable projection. It makes efficient use of both computer processor and memory resources, requiring less than a minute to analyse time series and multiwell plate datasets with hundreds of FLIM images on standard personal computers. This lifetime analysis takes account of repetitive excitation, including fluorescence photons excited by earlier pulses contributing to the fit, and is able to accommodate time-varying backgrounds and instrument response functions. We demonstrate that this global approach allows us to readily fit time-resolved fluorescence data to complex models including a four-exponential model of a FRET system, for which the FRET efficiencies of the two species of a bi-exponential donor are linked, and polarisation-resolved lifetime data, where a fluorescence intensity and bi-exponential anisotropy decay model is applied to the analysis of live cell
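
    The variable projection idea, i.e. solving the linear amplitudes exactly for each trial of the shared nonlinear lifetimes, can be sketched as below; this toy version ignores the instrument response, repetitive excitation and background handling described above.

      import numpy as np
      from scipy.optimize import minimize

      t = np.linspace(0.0, 10.0, 256)                   # time axis, ns

      def residual_norm(log_taus, data):
          """Project out per-pixel amplitudes for candidate lifetimes."""
          taus = np.exp(log_taus)                        # enforce tau > 0
          basis = np.exp(-t[:, None] / taus[None, :])    # (n_t, n_components)
          amps, *_ = np.linalg.lstsq(basis, data.T, rcond=None)
          return np.sum((data.T - basis @ amps) ** 2)

      # Synthetic two-component test data: many "pixels", shared lifetimes.
      rng = np.random.default_rng(0)
      true_basis = np.exp(-t[:, None] / np.array([0.8, 3.5])[None, :])
      data = (true_basis @ rng.uniform(0.2, 1.0, size=(2, 500))).T
      data += rng.normal(scale=0.01, size=data.shape)

      fit = minimize(residual_norm, x0=np.log([1.0, 2.0]), args=(data,),
                     method="Nelder-Mead")
      print("recovered lifetimes (ns):", np.round(np.exp(fit.x), 2))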

  7. Fast methods for training Gaussian processes on large datasets.

    PubMed

    Moore, C J; Chua, A J K; Berry, C P L; Gair, J R

    2016-05-01

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large datasets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences. PMID:27293793
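
    A minimal sketch of the quantity at stake, assuming a zero-mean GP with a squared-exponential kernel: the log marginal likelihood, whose repeated O(n^3) Cholesky factorization dominates hyperparameter learning and model comparison for large datasets.

      import numpy as np

      def log_marginal_likelihood(x, y, length, amp, noise):
          """log p(y | x, theta) for a zero-mean GP with an SE kernel."""
          d2 = (x[:, None] - x[None, :]) ** 2
          K = amp**2 * np.exp(-0.5 * d2 / length**2) + noise**2 * np.eye(len(x))
          L = np.linalg.cholesky(K)                      # the O(n^3) step
          alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
          return (-0.5 * y @ alpha
                  - np.log(np.diag(L)).sum()
                  - 0.5 * len(x) * np.log(2.0 * np.pi))

      rng = np.random.default_rng(1)
      x = np.sort(rng.uniform(0, 10, 200))
      y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
      print(log_marginal_likelihood(x, y, length=1.0, amp=1.0, noise=0.1))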

  8. Visualization and data sharing of COSMIC radio occultation dataset

    NASA Astrophysics Data System (ADS)

    Ho, Y.; Weber, W. J.; Chastang, J.; Murray, D.; McWhirter, J.; Integrated Data Viewer

    2010-12-01

    Visualization of the trajectories and sounding profiles of the COSMIC netCDF dataset, and of their evolution through time, has been developed in Unidata's Integrated Data Viewer (IDV). The COSMIC radio occultation data are hosted on a remote data server called RAMADDA, a content management system for earth science data. The combination of these two software packages provides powerful visualization and analysis tools for sharing real-time and archived data for research and education. In this presentation we demonstrate the development and usage of these two software packages.

  9. Agile data management for curation of genomes to watershed datasets

    NASA Astrophysics Data System (ADS)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    A software platform is being developed for data management and assimilation (DMA) as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities, and development is based on agile techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data and laboratory analytical results of water and sediment samples in a database, (c) providing automated QA/QC analysis of data, and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last is driven by data management needs. The project needs and priorities are reassessed regularly with the users, and after each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for ongoing scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrate diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature, i.e. finding the data needed for analyses through an intuitive interface. Once the data are found, the user can immediately plot and download them.

  10. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    NASA Astrophysics Data System (ADS)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that are relevant for science, and particularly the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data-service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as (a) scaling the project to cover big portions of the Internet at a reasonable cost, (b) making sense of very diverse and non-homogeneous data, and (c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all
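
    A rough sketch of the triple-storage step follows, using rdflib with DCAT/Dublin Core terms as placeholder vocabulary; the project's actual schema and URIs are not specified here, so everything below the imports is an illustrative assumption.

      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import DCTERMS, RDF

      DCAT = Namespace("http://www.w3.org/ns/dcat#")

      g = Graph()
      ds = URIRef("http://example.org/crawled/dataset/42")  # hypothetical URI
      g.add((ds, RDF.type, DCAT.Dataset))
      g.add((ds, DCTERMS.title, Literal("Sea-ice extent, 1979-2014")))
      g.add((ds, DCAT.downloadURL, URIRef("http://example.org/data/seaice.nc")))

      print(g.serialize(format="turtle"))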

  11. Fast methods for training Gaussian processes on large datasets

    PubMed Central

    Moore, C. J.; Berry, C. P. L.; Gair, J. R.

    2016-01-01

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large datasets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences. PMID:27293793

  12. Fast methods for training Gaussian processes on large datasets.

    PubMed

    Moore, C J; Chua, A J K; Berry, C P L; Gair, J R

    2016-05-01

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large datasets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences.

  13. The Maunder minimum: A reassessment from multiple dataset

    NASA Astrophysics Data System (ADS)

    Usoskin, Ilya; Arlt, Rainer; Asvestari, Eleanna; Kovaltsov, Gennady; Krivova, Natalie; Lockwood, Michael; Käpylä, Maarit; Owens, Matthew; Sokoloff, Dmitry D.; Solanki, Sami; Soon, Willie; Vaquero, Jose; Scott, Chris

    2015-08-01

    The Maunder minimum (MM) of 1645-1715 was the period of lowest solar activity ever recorded via sunspot numbers since 1610. Since it is the only Grand minimum of solar activity directly observed, it forms a benchmark for solar variability studies. It is therefore crucially important to assess the level and other features of temporal and spatial solar magnetic variability during that time. However, because of uncertainties related mostly to the ambiguity of some historical sunspot observation records, the exact level of solar activity during the MM is somewhat unclear, leaving room for continuing discussion and speculation. Many of these issues were addressed by Jack Eddy in his cornerstone papers of 1976 and 1983, but since then numerous new pieces of evidence and datasets have appeared, making it possible to verify the paradigm of the Maunder minimum with far greater certainty than before. Here we provide a full reassessment of the Maunder minimum using all the available datasets: augmented sunspot counts and drawings; revisited historical archives; both well-known and newly revealed records of auroral observations; and cosmic ray variability via cosmogenic isotope records of 14C in tree trunks, 10Be in ice cores and 44Ti in fallen meteorites. We show that, while the exact level of the activity is not easy to determine, the Sun indeed exhibited exceptionally low magnetic activity during the MM in comparison with other periods of moderate or decreased activity, such as the Dalton minimum (ca. 1800), the Gleissberg minimum (ca. 1900) and the present weak solar cycle 24. We show that a scenario of moderate or strong activity during the MM contradicts all the available datasets. Thus, we confirm, using all the presently available datasets of different nature, that the period of the Maunder minimum in 1645-1715 was indeed a Grand minimum, with very low solar surface magnetic activity, low intensity of the interplanetary magnetic field, as well as lower

  14. Septic tank additive impacts on microbial populations.

    PubMed

    Pradhan, S; Hoover, M T; Clark, G H; Gumpertz, M; Wollum, A G; Cobb, C; Strock, J

    2008-01-01

    Environmental health specialists, other onsite wastewater professionals, scientists, and homeowners have questioned the effectiveness of septic tank additives. This paper describes an independent, third-party, field-scale research study of the effects of three liquid bacterial septic tank additives and a control (no additive) on septic tank microbial populations. Microbial populations were measured quarterly in a field study for 12 months in 48 full-size, functioning septic tanks. Bacterial populations in the 48 septic tanks were statistically analyzed with a mixed linear model. Additive effects were assessed for three septic tank maintenance levels (low, intermediate, and high). Dunnett's t-test for tank bacteria (alpha = .05) indicated that none of the treatments were significantly different, overall, from the control at the statistical level tested. In addition, the additives had no significant effects on septic tank bacterial populations at any of the septic tank maintenance levels. Additional controlled, field-based research is warranted, however, to address additional additives and experimental conditions.

  15. Robust language-independent OCR system

    NASA Astrophysics Data System (ADS)

    Lu, Zhidong A.; Bazzi, Issam; Kornai, Andras; Makhoul, John; Natarajan, Premkumar S.; Schwartz, Richard

    1999-01-01

    We present a language-independent optical character recognition system that is capable, in principle, of recognizing printed text from most of the world's languages. For each new language or script the system requires sample training data along with ground truth at the text-line level; there is no need to specify the location of either the lines or the words and characters. The system uses hidden Markov modeling technology to model each character. In addition to language independence, the technology enhances performance for degraded data, such as fax, by using unsupervised adaptation techniques. Thus far, we have demonstrated the language-independence of this approach for Arabic, English, and Chinese. Recognition results are presented in this paper, including results on faxed data.

  16. [Food additives and healthiness].

    PubMed

    Heinonen, Marina

    2014-01-01

    Additives are used for improving food structure or preventing its spoilage, for example. Many substances used as additives are also naturally present in food. The safety of additives is evaluated according to commonly agreed principles. If high concentrations of an additive cause adverse health effects for humans, a limit of acceptable daily intake (ADI) is set for it. An additive is a risk only when ADI is exceeded. The healthiness of food is measured on the basis of nutrient density and scientifically proven effects.

  17. Introducing A Global Dataset Of Open Permanent Water Bodies

    NASA Astrophysics Data System (ADS)

    Santoro, Maurizio; Lamarche, Celine; Bontemps, Sophie; Wegmuller, Urs; Kalogirou, Vasileios; Arino, Oliver; Defourny, Pierre

    2013-12-01

    This paper introduces a 300-m global map of open permanent water bodies derived from multi-temporal synthetic aperture radar (SAR) data. The SAR dataset consisted of images of the radar backscatter acquired by the Envisat Advanced SAR (ASAR) in Wide Swath Mode (WSM, 150 m spatial resolution) between 2005 and 2010. Extended time series of WSM data to 2012, Image Mode Medium resolution (IMM) and Global Monitoring Mode (GMM) data have been used to fill gaps. Using as input the temporal variability (TV) of the backscatter and the minimum backscatter (MB), a SAR-based indicator of water bodies (SAR-WBI) has been generated for all continents with a previously validated thresholding algorithm and local refinements. The accuracy of the SAR-WBI is 80%; a threshold of 50% has been used for the land/water fraction in the case of mixed pixels. Correction of inconsistencies with respect to auxiliary datasets, completion of gaps and aggregation to 300 m were applied to obtain the final global water body map, referred to as the Climate Change Initiative Land Cover Water Body (CCI-LC WB) Product.
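
    To illustrate the TV/MB idea in the simplest possible terms: open water tends to show both low minimum backscatter and low temporal variability, so a pixel can be flagged when both fall below thresholds. The sketch below is a toy stand-in; the threshold values are arbitrary placeholders, not those of the validated algorithm.

      import numpy as np

      def water_mask(stack_db, mb_thresh=-16.0, tv_thresh=1.5):
          """stack_db: (n_times, rows, cols) backscatter in dB."""
          mb = stack_db.min(axis=0)       # minimum backscatter (MB) per pixel
          tv = stack_db.std(axis=0)       # temporal variability (TV) per pixel
          return (mb < mb_thresh) & (tv < tv_thresh)

      rng = np.random.default_rng(2)
      scene = rng.normal(-8, 2.0, size=(24, 50, 50))          # "land" pixels
      scene[:, 20:30, 20:30] = rng.normal(-20, 0.5, (24, 10, 10))  # a dark, quiet "lake"
      print(water_mask(scene).sum(), "pixels flagged as water")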

  18. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    NASA Technical Reports Server (NTRS)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  19. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    PubMed Central

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health—the human microbiome. The limited number of existing studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography "Everything is everywhere, but the environment selects", first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore neutral-prone, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome to the niche regime. PMID:27527985

  20. Biofuel Enduse Datasets from the Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about]

    Holdings include datasets, models, and maps. This is a very new resource, but the collections will grow due to both DOE contributions and individuals' data uploads. Currently the Biofuel Enduse collection includes 133 items. Most of these are categorized as literature, but 36 are listed as datasets and ten as models.

  1. Using Benford's law to investigate Natural Hazard dataset homogeneity.

    PubMed

    Joannes-Boyau, Renaud; Bodin, Thomas; Scheffers, Anja; Sambridge, Malcolm; May, Simon Matthias

    2015-01-01

    Working with a large temporal dataset spanning several decades often represents a challenging task, especially when the record is heterogeneous and incomplete. The use of statistical laws could potentially overcome these problems. Here we apply Benford's Law (also called the "First-Digit Law") to the traveled distances of tropical cyclones since 1842. The record of tropical cyclones has been extensively impacted by improvements in detection capabilities over the past decades. We have found that, while the first-digit distribution for the entire record follows Benford's Law prediction, specific changes such as satellite detection have had serious impacts on the dataset. The least-square misfit measure is used as a proxy to observe temporal variations, allowing us to assess data quality and homogeneity over the entire record, and at the same time over specific periods. Such information is crucial when running climatic models and Benford's Law could potentially be used to overcome and correct for data heterogeneity and/or to select the most appropriate part of the record for detailed studies. PMID:26156060
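
    As a minimal sketch of the check described above: compare the observed first-digit distribution of a set of values against Benford's prediction P(d) = log10(1 + 1/d), using a least-squares misfit as the goodness-of-fit proxy. The synthetic samples below are illustrative only, not cyclone data.

      import numpy as np

      def first_digits(values):
          """Leading decimal digit of each positive value."""
          v = np.asarray(values, dtype=float)
          v = v[v > 0]
          return (v / 10.0 ** np.floor(np.log10(v))).astype(int)

      def benford_misfit(values):
          digits = first_digits(values)
          observed = np.bincount(digits, minlength=10)[1:10] / digits.size
          expected = np.log10(1.0 + 1.0 / np.arange(1, 10))
          return np.sum((observed - expected) ** 2)   # least-square misfit

      # Lognormal samples span several orders of magnitude and follow
      # Benford's Law closely; narrow uniform samples do not.
      rng = np.random.default_rng(3)
      print(benford_misfit(rng.lognormal(mean=5, sigma=3, size=10000)))  # small
      print(benford_misfit(rng.uniform(1, 9, size=10000)))               # large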

  2. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    PubMed

    Li, Lianwei; Ma, Zhanshan Sam

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health - the human microbiome. The limited number of existing studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography "Everything is everywhere, but the environment selects", first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore neutral-prone, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome to the niche regime. PMID:27527985

  3. Development of a Watershed Boundary Dataset for Mississippi

    USGS Publications Warehouse

    Van Wilson, K.; Clair, Michael G.; Turnipseed, D. Phil; Rebich, Richard A.

    2009-01-01

    The U.S. Geological Survey, in cooperation with the Mississippi Department of Environmental Quality, U.S. Department of Agriculture-Natural Resources Conservation Service, Mississippi Department of Transportation, U.S. Department of Agriculture-Forest Service, and the Mississippi Automated Resource Information System, developed a 1:24,000-scale Watershed Boundary Dataset for Mississippi including watershed and subwatershed boundaries, codes, names, and drainage areas. The Watershed Boundary Dataset for Mississippi provides a standard geographical framework for water-resources and selected land-resources planning. The original 8-digit subbasins (hydrologic unit codes) were further subdivided into 10-digit watersheds and 12-digit subwatersheds - the exceptions are the Lower Mississippi River Alluvial Plain (known locally as the Delta) and the Mississippi River inside levees, which were only subdivided into 10-digit watersheds. Also, large water bodies in the Mississippi Sound along the coast were not delineated as small as a typical 12-digit subwatershed. All of the data - including watershed and subwatershed boundaries, hydrologic unit codes and names, and drainage-area data - are stored in a Geographic Information System database.

  4. High-throughput concentration-response analysis for omics datasets.

    PubMed

    Smetanová, Soňa; Riedl, Janet; Zitzkat, Dimitar; Altenburger, Rolf; Busch, Wibke

    2015-09-01

    Omics-based methods are increasingly used in current ecotoxicology. Therefore, a large number of observations for various toxic substances and organisms are available and may be used for identifying modes of action, adverse outcome pathways, or novel biomarkers. For these purposes, good statistical analysis of toxicogenomic data is vital. In contrast to established ecotoxicological techniques, concentration-response modeling is rarely used for large datasets. Instead, statistical hypothesis testing is prevalent, which provides only a limited scope for inference. The present study therefore applied automated concentration-response modeling for 3 different ecotoxicotranscriptomic and ecotoxicometabolomic datasets. The modeling process was performed by simultaneously applying 9 different regression models, representing distinct mechanistic, toxicological, and statistical ideas that result in different curve shapes. The best-fitting models were selected by using Akaike's information criterion. The linear and exponential models represented the best data description for more than 50% of responses. Models generating U-shaped curves were frequently selected for transcriptomic signals (30%), and sigmoid models were identified as best fit for many metabolomic signals (21%). Thus, selecting the models from an array of different types seems appropriate, because concentration-response functions may vary because of the observed response type, and they also depend on the compound, the organism, and the investigated concentration and exposure duration range. The application of concentration-response models can help to further tap the potential of omics data and is a necessary step for quantitative mixture effect assessment at the molecular response level.
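
    The model-selection loop can be sketched as follows: fit several candidate regression models to one response signal and keep the one with the lowest AIC. The three toy models and starting values below are illustrative stand-ins for the nine models used in the study.

      import numpy as np
      from scipy.optimize import curve_fit

      def linear(c, a, b):      return a + b * c
      def exponential(c, a, b): return a * np.exp(b * c)
      def sigmoid(c, top, ec50, slope):
          return top / (1.0 + (ec50 / np.maximum(c, 1e-12)) ** slope)

      def aic(y, yhat, k):
          n = len(y)
          rss = np.sum((y - yhat) ** 2)
          return n * np.log(rss / n) + 2 * k

      def best_model(conc, resp):
          candidates = {"linear": (linear, (0.0, 1.0)),
                        "exponential": (exponential, (1.0, 0.1)),
                        "sigmoid": (sigmoid, (1.0, np.median(conc), 1.0))}
          scores = {}
          for name, (f, p0) in candidates.items():
              try:
                  popt, _ = curve_fit(f, conc, resp, p0=p0, maxfev=10000)
                  scores[name] = aic(resp, f(conc, *popt), len(popt))
              except RuntimeError:
                  continue                      # model failed to converge
          return min(scores, key=scores.get), scores

      conc = np.logspace(-2, 2, 12)
      resp = sigmoid(conc, 1.0, 3.0, 1.5)
      resp += np.random.default_rng(4).normal(0, 0.03, 12)
      print(best_model(conc, resp))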

  5. Comparison of LDA and SPRT on Clinical Dataset Classifications

    PubMed Central

    Lee, Chih; Nkounkou, Brittany; Huang, Chun-Hsi

    2011-01-01

    In this work, we investigate the well-known classification algorithm LDA as well as its close relative SPRT. SPRT affords many theoretical advantages over LDA. It allows specification of desired classification error rates α and β and is expected to be faster in predicting the class label of a new instance. However, SPRT is not as widely used as LDA in the pattern recognition and machine learning community. For this reason, we investigate LDA, SPRT and a modified SPRT (MSPRT) empirically using clinical datasets from Parkinson’s disease, colon cancer, and breast cancer. We assume the same normality assumption as LDA and propose variants of the two SPRT algorithms based on the order in which the components of an instance are sampled. Leave-one-out cross-validation is used to assess and compare the performance of the methods. The results indicate that two variants, SPRT-ordered and MSPRT-ordered, are superior to LDA in terms of prediction accuracy. Moreover, on average SPRT-ordered and MSPRT-ordered examine less components than LDA before arriving at a decision. These advantages imply that SPRT-ordered and MSPRT-ordered are the preferred algorithms over LDA when the normality assumption can be justified for a dataset. PMID:21949476
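
    A minimal sketch of the sequential idea under the same normality assumption: components of an instance are examined one at a time, and sampling stops as soon as the cumulative log-likelihood ratio crosses a Wald threshold set by the desired error rates alpha and beta. The class models below are synthetic; this is not the paper's MSPRT variant.

      import math
      import numpy as np
      from scipy.stats import norm

      def sprt(x, mu0, mu1, sigma, alpha=0.05, beta=0.05):
          upper = math.log((1 - beta) / alpha)   # decide class 1 above this
          lower = math.log(beta / (1 - alpha))   # decide class 0 below this
          llr = 0.0
          for i, xi in enumerate(x, start=1):
              llr += (norm.logpdf(xi, mu1[i-1], sigma[i-1])
                      - norm.logpdf(xi, mu0[i-1], sigma[i-1]))
              if llr >= upper:
                  return 1, i                    # class, components examined
              if llr <= lower:
                  return 0, i
          return int(llr > 0), len(x)            # fall back if undecided

      rng = np.random.default_rng(5)
      mu0, mu1, sigma = np.zeros(20), np.full(20, 0.8), np.ones(20)
      x = rng.normal(mu1, sigma)                 # a true class-1 instance
      print(sprt(x, mu0, mu1, sigma))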

  6. Comparison of LDA and SPRT on Clinical Dataset Classifications.

    PubMed

    Lee, Chih; Nkounkou, Brittany; Huang, Chun-Hsi

    2011-04-19

    In this work, we investigate the well-known classification algorithm LDA as well as its close relative SPRT. SPRT affords many theoretical advantages over LDA. It allows specification of desired classification error rates α and β and is expected to be faster in predicting the class label of a new instance. However, SPRT is not as widely used as LDA in the pattern recognition and machine learning community. For this reason, we investigate LDA, SPRT and a modified SPRT (MSPRT) empirically using clinical datasets from Parkinson's disease, colon cancer, and breast cancer. We assume the same normality assumption as LDA and propose variants of the two SPRT algorithms based on the order in which the components of an instance are sampled. Leave-one-out cross-validation is used to assess and compare the performance of the methods. The results indicate that two variants, SPRT-ordered and MSPRT-ordered, are superior to LDA in terms of prediction accuracy. Moreover, on average SPRT-ordered and MSPRT-ordered examine less components than LDA before arriving at a decision. These advantages imply that SPRT-ordered and MSPRT-ordered are the preferred algorithms over LDA when the normality assumption can be justified for a dataset.

  7. Exploitation of a large COSMO-SkyMed interferometric dataset

    NASA Astrophysics Data System (ADS)

    Nutricato, Raffaele; Nitti, Davide O.; Bovenga, Fabio; Refice, Alberto; Chiaradia, Maria T.

    2014-10-01

    In this work we explored a dataset comprising more than 100 images acquired by the COSMO-SkyMed (CSK) constellation over the Port-au-Prince (Haiti) metropolitan and surrounding areas, which were severely hit by the January 12th, 2010 earthquake. The images were acquired along ascending passes by all four sensors of the constellation at a mean rate of 1 acquisition/week. This consistent CSK dataset was fully exploited by using the Persistent Scatterer Interferometry algorithm SPINUA with the aim of: i) providing a displacement map of the area; ii) assessing the use of CSK and PSI for ground elevation measurements; iii) exploring the CSK satellite orbital tube in terms of both precision and size. In particular, significant subsidence phenomena were detected affecting river deltas and coastal areas of the Port-au-Prince and Carrefour region, as well as very slow slope movements and local ground instabilities. Ground elevation was also measured on PS targets with a resolution of 3 m. The density of these measurable targets depends on the ground coverage, and reaches values higher than 4000 PS/km2 over urban areas, while it drops over vegetated areas or along slopes affected by layover and shadow. Height values were compared with LIDAR data at 1 m resolution collected soon after the 2010 earthquake. Furthermore, by using geocoding procedures and the precise LIDAR data as reference, the orbital errors affecting CSK records were investigated. The results are in line with other recent studies.

  8. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    PubMed

    Li, Lianwei; Ma, Zhanshan Sam

    2016-08-16

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health - the human microbiome. The limited number of existing studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography "Everything is everywhere, but the environment selects", first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore neutral-prone, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome to the niche regime.

  9. Datasets for radiation network algorithm development and testing

    SciTech Connect

    Rao, Nageswara S; Sen, Satyabrata; Berry, M. L..; Wu, Qishi; Grieme, M.; Brooks, Richard R; Cordone, G.

    2016-01-01

    The Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) program supported the development of networks of commercial-off-the-shelf (COTS) radiation counters for detecting, localizing, and identifying low-level radiation sources. Under this program, a series of indoor and outdoor tests were conducted with multiple source strengths and types, different background profiles, and various types of source and detector movements. Following the tests, network algorithms were replayed in various reconstructed scenarios using sub-networks. These measurements and algorithm traces together provide a rich collection of highly valuable datasets for testing current and next-generation radiation network algorithms, including those (to be) developed by broader R&D communities such as distributed detection, information fusion, and sensor networks. From this multi-terabyte IRSS database, we distilled and packaged the first batch of canonical datasets for public release. They include measurements from ten indoor and two outdoor tests, which represent increasingly challenging baseline scenarios for robustly testing radiation network algorithms.

  10. Sodankylä ionospheric tomography dataset 2003-2014

    NASA Astrophysics Data System (ADS)

    Norberg, J.; Roininen, L.; Kero, A.; Raita, T.; Ulich, T.; Markkanen, M.; Juusola, L.; Kauristie, K.

    2015-12-01

    Sodankylä Geophysical Observatory has been operating a tomographic receiver network and collecting the produced data since 2003. The collected dataset consists of phase difference curves measured from Russian COSMOS dual-frequency (150/400 MHz) low-Earth-orbit satellite signals, and of tomographic electron density reconstructions obtained from these measurements. In this study, vertical total electron content (VTEC) values are integrated from the reconstructed electron densities in a qualitative and quantitative analysis validating the long-term performance of the tomographic system. During the observation period, 2003-2014, there were three to five operational stations in the Fenno-Scandinavian sector. Altogether the analysis covers around 66 000 overflights, but to ensure the quality of the reconstructions, the examination is limited to descending (north-to-south) overflights with maximum elevation over 60°. These constraints limit the number of overflights to around 10 000. Based on this dataset, one solar cycle of ionospheric vertical total electron content estimates is constructed. The measurements are compared against the International Reference Ionosphere (IRI-2012) model, the F10.7 solar flux index and sunspot number data. Qualitatively the tomographic VTEC estimates correspond to the reference data very well, but the IRI-2012 model values are on average 40% higher than the tomographic results.
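
    The VTEC integration step itself is simple: integrate a reconstructed electron-density profile Ne(h) over altitude and express the result in TEC units (1 TECU = 1e16 electrons/m^2). The sketch below uses a synthetic Chapman-like profile, not the tomographic reconstructions themselves.

      import numpy as np

      def vtec_tecu(heights_km, ne_m3):
          """Vertical TEC (TECU) from an electron-density profile."""
          h_m = heights_km * 1e3
          # trapezoidal integration of Ne over altitude
          integral = np.sum(0.5 * (ne_m3[1:] + ne_m3[:-1]) * np.diff(h_m))
          return integral / 1e16          # 1 TECU = 1e16 electrons/m^2

      h = np.linspace(80, 1000, 200)                       # altitude grid, km
      ne = 4e11 * np.exp(-0.5 * ((h - 300) / 80.0) ** 2)   # synthetic layer
      print(round(vtec_tecu(h, ne), 1), "TECU")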

  11. Collaboration tools and techniques for large model datasets

    USGS Publications Warehouse

    Signell, R.P.; Carniel, S.; Chiggiato, J.; Janekovic, I.; Pullen, J.; Sherwood, C.R.

    2008-01-01

    In MREA and many other marine applications, it is common to have multiple models running on different grids, run by different institutions. Techniques and tools are described for low-bandwidth delivery of data from large multidimensional datasets, such as those from meteorological and oceanographic models, directly into generic analysis and visualization tools. Output is stored using the NetCDF CF Metadata Conventions, and then delivered to collaborators over the web via OPeNDAP. OPeNDAP datasets served by different institutions are then organized via THREDDS catalogs. Tools and procedures are then used which enable scientists to explore data on the original model grids using tools they are familiar with. The approach is also low-bandwidth, enabling users to extract just the data they require, an important feature for access from ships or remote areas. The entire implementation is simple enough to be handled by modelers working with their webmasters; no advanced programming support is necessary. © 2007 Elsevier B.V. All rights reserved.
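
    The access pattern can be sketched in a few lines: open a remote dataset over OPeNDAP and pull only the subset needed, so nothing but the requested slab crosses the network. The URL, variable name and coordinate names below are hypothetical placeholders (and 1-D lat/lon coordinates are assumed), not any particular institution's catalog.

      import xarray as xr

      URL = "http://example.org/thredds/dodsC/ocean_model/output.nc"  # hypothetical

      ds = xr.open_dataset(URL)                  # lazy: reads metadata only
      sst = ds["temp"].isel(time=-1, s_rho=-1)   # latest time, surface layer
      patch = sst.sel(lon=slice(12.0, 14.0), lat=slice(44.0, 46.0))
      print(patch.mean().values)                 # only this subset is downloaded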

  12. A new compression format for fiber tracking datasets.

    PubMed

    Presseau, Caroline; Jodoin, Pierre-Marc; Houde, Jean-Christophe; Descoteaux, Maxime

    2015-04-01

    A single diffusion MRI streamline fiber tracking dataset may contain hundreds of thousands, and often millions, of streamlines and can take up to several gigabytes of memory. This amount of data is not only heavy to compute, but also difficult to visualize and hard to store on disk (especially when dealing with a collection of brains). These problems call for a fiber-specific compression format that simplifies its manipulation. As of today, no fiber compression format has yet been adopted, and the need for one is becoming an issue for future connectomics research. In this work, we propose a new compression format, .zfib, for streamline tractography datasets reconstructed from diffusion magnetic resonance imaging (dMRI). Tracts contain a large amount of redundant information and are relatively smooth; hence, they are highly compressible. The proposed method is a processing pipeline containing a linearization, a quantization and an encoding step. Our pipeline is tested and validated under a wide range of DTI and HARDI tractography configurations (step size, streamline number, deterministic and probabilistic tracking) and compression options. Similar to JPEG, the user has one parameter to select: a worst-case maximum tolerance error in millimeters (mm). Overall, we find a compression factor of more than 96% for a maximum error of 0.1 mm without any perceptual change or change of diffusion statistics (mean fractional anisotropy and mean diffusivity) along bundles. This opens new opportunities for connectomics and tractometry applications. PMID:25592997
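
    The first two lossy stages can be sketched as follows: a greedy stand-in for linearization (drop points that deviate from a straight segment by less than the tolerance in mm), then quantization of the surviving coordinates onto a grid finer than the same tolerance. The encoding stage is omitted, and this is not the published .zfib algorithm.

      import numpy as np

      def linearize(points, tol_mm):
          """Greedy simplification: keep a point only if it strays from the
          chord between the last kept point and the endpoint."""
          kept = [points[0]]
          for p in points[1:-1]:
              a, b = kept[-1], points[-1]
              t = np.clip(np.dot(p - a, b - a) / np.dot(b - a, b - a), 0, 1)
              if np.linalg.norm(p - (a + t * (b - a))) > tol_mm:
                  kept.append(p)
          kept.append(points[-1])
          return np.array(kept)

      def quantize(points, step_mm):
          return np.round(points / step_mm).astype(np.int32)  # grid indices

      streamline = np.cumsum(
          np.random.default_rng(6).normal(0, 0.2, (1000, 3)), axis=0)
      slim = linearize(streamline, tol_mm=0.1)
      print(len(streamline), "->", len(slim), "points;",
            quantize(slim, step_mm=0.05).dtype)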

  13. Periodicity detection method for small-sample time series datasets.

    PubMed

    Tominaga, Daisuke

    2010-01-01

    Time series of gene expression often exhibit periodic behavior under the influence of multiple signal pathways, and are represented by a model that incorporates multiple harmonics and noise. Most of these data, which are observed using DNA microarrays, consist of few sampling points in time, but most periodicity detection methods require a relatively large number of sampling points. We have previously developed a detection algorithm based on the discrete Fourier transform and Akaike's information criterion. Here we demonstrate the performance of the algorithm for small-sample time series data through a comparison with conventional and newly proposed periodicity detection methods based on a statistical analysis of the power of harmonics. We show that this method has higher sensitivity for data consisting of multiple harmonics, and is more robust against noise than other methods. Although "combinatorial explosion" occurs for large datasets, the computational time is not a problem for small-sample datasets. The MATLAB/GNU Octave script of the algorithm is available on the author's web site: http://www.cbrc.jp/%7Etominaga/piccolo/. PMID:21151841
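
    The DFT-plus-AIC combination can be sketched as below: candidate models keep the k largest harmonics, and AIC (2 parameters per harmonic versus residual power) decides whether any periodic component is justified at all. This is a simplified stand-in for the published algorithm, not a port of it.

      import numpy as np

      def detect_periodicity(y, max_harmonics=3):
          n = len(y)
          power = np.abs(np.fft.rfft(y - y.mean())) ** 2
          order = np.argsort(power[1:])[::-1] + 1   # harmonics by power, no DC
          best_k, best_aic = 0, n * np.log(power.sum() / n)  # k = 0: noise only
          for k in range(1, max_harmonics + 1):
              rss = power.sum() - power[order[:k]].sum()
              aic = n * np.log(max(rss, 1e-12) / n) + 2 * (2 * k)
              if aic < best_aic:
                  best_k, best_aic = k, aic
          return best_k                              # 0 means "not periodic"

      t = np.arange(12)                              # a small-sample series
      rng = np.random.default_rng(7)
      print(detect_periodicity(np.sin(2 * np.pi * t / 6)
                               + 0.2 * rng.normal(size=12)))  # periodic
      print(detect_periodicity(rng.normal(size=12)))          # noise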

  14. Systematic analysis of a novel human renal glomerulus-enriched gene expression dataset.

    PubMed

    Lindenmeyer, Maja T; Eichinger, Felix; Sen, Kontheari; Anders, Hans-Joachim; Edenhofer, Ilka; Mattinzoli, Deborah; Kretzler, Matthias; Rastaldi, Maria P; Cohen, Clemens D

    2010-01-01

    Glomerular diseases account for the majority of cases with chronic renal failure. Several genes have been identified with key relevance for glomerular function. Quite a few of these genes show a specific or preferential mRNA expression in the renal glomerulus. To identify additional candidate genes involved in glomerular function in humans, we generated a human renal glomerulus-enriched gene expression dataset (REGGED) by comparing gene expression profiles from human glomeruli and tubulointerstitium obtained from six transplant living donors using Affymetrix HG-U133A arrays. This analysis resulted in 677 genes with prominent overrepresentation in the glomerulus. Genes with 'a priori' known prominent glomerular expression served for validation and were all found in the novel dataset (e.g. CDKN1, DAG1, DDN, EHD3, MYH9, NES, NPHS1, NPHS2, PDPN, PLA2R1, PLCE1, PODXL, PTPRO, SYNPO, TCF21, TJP1, WT1). The mRNA expression of several novel glomerulus-enriched genes in REGGED was validated by qRT-PCR. Gene ontology and pathway analysis identified biological processes previously not reported to be of relevance in glomeruli of healthy human adult kidneys, including among others axon guidance. This finding was further validated by assessing the expression of the axon guidance molecules neuritin (NRN1) and the roundabout receptors ROBO1 and -2. In diabetic nephropathy, a prevalent glomerulopathy, differential regulation of glomerular ROBO2 mRNA was found. In summary, novel transcripts with predominant expression in the human glomerulus could be identified using a comparative strategy on microdissected nephrons. A systematic analysis of this glomerulus-specific gene expression dataset allows the detection of target molecules and biological processes involved in glomerular biology and renal disease. PMID:20634963

  15. A High-Resolution Merged Wind Dataset for DYNAMO: Progress and Future Plans

    NASA Technical Reports Server (NTRS)

    Lang, Timothy J.; Mecikalski, John; Li, Xuanli; Chronis, Themis; Castillo, Tyler; Hoover, Kacie; Brewer, Alan; Churnside, James; McCarty, Brandi; Hein, Paul; Rutledge, Steve; Dolan, Brenda; Matthews, Alyssa; Thompson, Elizabeth

    2015-01-01

    In order to support research on optimal data assimilation methods for the Cyclone Global Navigation Satellite System (CYGNSS), launching in 2016, work has been ongoing to produce a high-resolution merged wind dataset for the Dynamics of the Madden-Julian Oscillation (DYNAMO) field campaign, which took place during late 2011/early 2012. The winds are produced by assimilating DYNAMO observations into the Weather Research and Forecasting (WRF) three-dimensional variational (3DVAR) system. Data sources from the DYNAMO campaign include the upper-air sounding network, radial velocities from the radar network, vector winds from the Advanced Scatterometer (ASCAT) and Oceansat-2 Scatterometer (OSCAT) satellite instruments, the NOAA High Resolution Doppler Lidar (HRDL), and several others. In order to prepare them for 3DVAR, significant additional quality-control work is being done on the currently available TOGA and SMART-R radar datasets, including automatically dealiasing radial velocities and correcting for intermittent TOGA antenna azimuth angle errors. The assimilated winds are being made available as model output fields from WRF on two separate grids with different horizontal resolutions - a 3-km grid focusing on the main DYNAMO quadrilateral (i.e., Gan Island, the R/V Revelle, the R/V Mirai, and Diego Garcia), and a 1-km grid focusing on the Revelle. The wind dataset is focused on three separate, approximately 2-week periods during the Madden-Julian Oscillation (MJO) onsets that occurred in October, November, and December 2011. Work is ongoing to convert the 10-m surface winds from these model fields to simulated CYGNSS observations using the CYGNSS End-To-End Simulator (E2ES), and these simulated satellite observations are being compared to radar observations of DYNAMO precipitation systems to document the anticipated ability of CYGNSS to provide information on the relationships between surface winds and oceanic precipitation at the mesoscale level. This research will

  16. Intercomparison of an improved 20th Century reanalysis version 2c dataset spanning 1850 to 2012

    NASA Astrophysics Data System (ADS)

    Compo, G. P.; Whitaker, J. S.; Sardeshmukh, P. D.; Giese, B. S.; Brohan, P.

    2014-12-01

    The historical reanalysis dataset generated by NOAA ESRL and the University of Colorado CIRES, the Twentieth Century Reanalysis version 2 (20CRv2), is a comprehensive global atmospheric circulation dataset spanning 1871-2012, assimilating only surface pressure and using monthly Hadley Centre SST and sea ice distributions (HadISST1.1) as boundary conditions. It has been made possible through collaboration with GCOS, WCRP, and the ACRE initiative. It is chiefly motivated by the need to provide an observational validation dataset, with quantified uncertainties, for assessments of climate model simulations of the 20th century, with emphasis on the statistics of daily weather. It uses an Ensemble Kalman Filter (EnKF) data assimilation method together with an NCEP global numerical weather prediction (NWP) land/atmosphere model that provides background "first guess" fields. This yields a global analysis every 6 hours as the most likely state of the atmosphere, together with the uncertainty of that analysis. Improvements in the new version ("2c") include an extension back to 1850 and the specification of new boundary conditions. These come from new fields of monthly COBE-SST2 sea ice concentrations and an ensemble of daily Simple Ocean Data Assimilation with Sparse Input (SODAsi.2c) sea surface temperatures. SODAsi.2c was itself forced with 20CR, allowing these boundary conditions to be more consistent with the atmospheric reanalysis. Millions of additional pressure observations contained in the new International Surface Pressure Databank version 3 are also included. These improvements result in 20CR version "2c" having comparable or better analyses, as suggested by improved 24-hour forecast skill, more realistic uncertainty in near-surface air temperature, and a reduction in spurious centennial trends in the tropical and polar regions. An intercomparison with the ERA-Interim, MERRA, and JRA-55 reanalyses, which assimilate all available upper-air and satellite observations, will also be presented.

  17. Remote web-based 3D visualization of hydrological forecasting datasets.

    NASA Astrophysics Data System (ADS)

    van Meersbergen, Maarten; Drost, Niels; Blower, Jon; Griffiths, Guy; Hut, Rolf; van de Giesen, Nick

    2015-04-01

    As the possibilities for larger and more detailed simulations of geoscientific data expand, the need for smart solutions in data visualization grows as well. Large volumes of data should be quickly accessible from anywhere in the world without the need to transfer the simulation results. We aim to provide tools for both the processing and the handling of these large datasets. As an example, the eWaterCycle project (www.ewatercycle.org) aims to provide a running 14-day ensemble forecast to predict water-related stress around the globe. The large volumes of simulation results with uncertainty data that are generated through ensemble hydrological predictions provide a challenge for existing visualization solutions. One possible solution to this challenge lies in the use of web-enabled technology for visualization and analysis of these datasets. Web-based visualization provides an additional benefit in that it eliminates the need for any software installation and configuration and allows for the easy communication of research results between collaborating research parties. Providing interactive tools for the exploration of these datasets will not only help in the analysis of the data by researchers, it can also aid in the dissemination of the research results to the general public. In Vienna, we will present a working open-source solution for remote visualization of large volumes of global geospatial data based on the proven open-source 3D web visualization software package Cesium (cesiumjs.org), the ncWMS software package provided by the Reading e-Science Centre, and the WebGL and NetCDF standards.

  18. Multitask Coupled Logistic Regression and its Fast Implementation for Large Multitask Datasets.

    PubMed

    Gu, Xin; Chung, Fu-Lai; Ishibuchi, Hisao; Wang, Shitong

    2015-09-01

    When facing multitask-learning problems, it is desirable that the learning method can find the correct input-output features, share the commonality among multiple domains, and also scale up for large multitask datasets. We introduce the multitask coupled logistic regression (LR) framework, called the LR-based multitask classification learning algorithm (MTC-LR), which is a new method for generating a classifier for each task that is capable of sharing the commonality among multitask domains. The basic idea of MTC-LR is to use individual LR-based classifiers, each one appropriate for its task domain, but, in contrast to other support vector machine (SVM)-based proposals, to learn all the parameter vectors of all individual classifiers jointly with the conjugate gradient method, in a global way and without the use of the kernel trick, so that the framework is easily extended into a scaled version. We theoretically show that the addition of a new term in the cost function of the set of LRs (one that penalizes the diversity among the multiple tasks) produces a coupling of the tasks that allows MTC-LR to improve learning performance in an LR way. This finding allows us to integrate it easily with a state-of-the-art fast LR algorithm, the dual coordinate descent method (CDdual), to develop a fast version, MTC-LR-CDdual, for large multitask datasets. The proposed algorithm MTC-LR-CDdual is also theoretically analyzed. Our experimental results on artificial and real datasets indicate the effectiveness of MTC-LR-CDdual in classification accuracy, speed, and robustness. PMID:25423663
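
    The coupling idea can be sketched as below: per-task logistic losses plus a penalty on the diversity of the task weight vectors (their spread around the mean weight vector), minimized with a conjugate-gradient method. This toy stand-in illustrates the coupled cost function only, not the published MTC-LR or its CDdual variant.

      import numpy as np
      from scipy.optimize import minimize

      def mtc_lr_cost(w_flat, tasks, lam):
          """tasks: list of (X, y) with y in {0, 1}; lam couples the tasks."""
          d = tasks[0][0].shape[1]
          W = w_flat.reshape(len(tasks), d)           # one weight row per task
          loss = 0.0
          for (X, y), w in zip(tasks, W):
              z = X @ w
              loss += np.mean(np.logaddexp(0, -z) + (1 - y) * z)  # logistic NLL
          diversity = np.sum((W - W.mean(axis=0)) ** 2)           # coupling term
          return loss + lam * diversity

      rng = np.random.default_rng(8)
      w_true = np.array([1.0, -2.0])
      tasks = []
      for _ in range(3):                              # three related tasks
          X = rng.normal(size=(100, 2))
          y = (X @ (w_true + rng.normal(0, 0.1, 2)) > 0).astype(float)
          tasks.append((X, y))

      res = minimize(mtc_lr_cost, np.zeros(6), args=(tasks, 0.1), method="CG")
      print(res.x.reshape(3, 2))                      # per-task weight vectors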

  19. Developing a Resource for Implementing ArcSWAT Using Global Datasets

    NASA Astrophysics Data System (ADS)

    Taggart, M.; Caraballo Álvarez, I. O.; Mueller, C.; Palacios, S. L.; Schmidt, C.; Milesi, C.; Palmer-Moloney, L. J.

    2015-12-01

    This project developed a comprehensive user manual outlining methods for adapting and implementing global datasets for use within ArcSWAT for international and worldwide applications. The Soil and Water Assessment Tool (SWAT) is a hydrologic model that simulates a number of hydrologic variables, including runoff and the chemical makeup of water at a given location on the Earth's surface, using Digital Elevation Models (DEM), land cover, soil, and weather data. However, the application of ArcSWAT for projects outside of the United States is challenging, as there is no standard framework for inputting global datasets into ArcSWAT. This project aims to remove this obstacle by outlining methods for adapting and implementing these global datasets via the user manual. The manual takes the user through the processes of data conditioning while providing solutions and suggestions for common errors. The efficacy of the manual was explored using examples from watersheds located in Puerto Rico, Mexico, and Western Africa. Each run explored the various options for setting up an ArcSWAT project as well as a range of satellite data products and soil databases. Future work will incorporate in-situ data for validation and calibration of the model and outline additional resources to assist future users in efficiently implementing the model for worldwide applications. The capacity to manage and monitor freshwater availability is of critical importance in both developed and developing countries. As populations grow and climate changes, both the quality and quantity of freshwater are affected, resulting in negative impacts on the health of the surrounding population. The use of hydrologic models such as ArcSWAT can help stakeholders and decision makers understand the future impacts of these changes, enabling informed and substantiated decisions.

  20. Investigating Martian and Venusian hyperspectral datasets through Positive Source Separation

    NASA Astrophysics Data System (ADS)

    Tréguier, E.; Schmidt, F.; Schmidt, A.; Moussaoui, S.; Dobigeon, N.; Erard, S.; Cardesín, A.; Pinet, P.; Martin, P.

    2010-12-01

    Spectro-imagers with improved spectral/spatial resolution have mapped planetary bodies, providing high-dimensional hyperspectral datasets that contain abundant data about the surface and/or atmosphere. The spatial extent of a pixel is usually large enough to contain a mixture of various surface/atmospheric constituents which contribute to a single pixel spectrum. Unsupervised spectral unmixing [1] aims at identifying the spectral signatures of materials present in the image and at estimating their abundances in each pixel. Bayesian Positive Source Separation (BPSS) [2] is an interesting way to deal with this unmixing challenge under linearity constraints. Notably, it ensures the non-negativity of both the unmixed component spectra and their abundances. Such a constraint is crucial to the physical interpretability of the results. A sum-to-one constraint [3] can also be imposed on the estimated abundances: its relevance depends on the nature of the dataset under consideration. Despite undeniable advantages, the use of such algorithms has so far been hampered by excessive computational resource requirements; until now it had not been possible to process a whole hyperspectral image of a size typically encountered in Earth and Planetary Sciences. Two kinds of implementation strategies were adopted to overcome this computational issue [4]. Firstly, several technical optimizations made it possible to run the BPSS algorithms on a complete image for the first time. Secondly, a pixel selection method was investigated: performed as a preprocessing step, it aims at extracting a few especially relevant pixels among all the image pixels. Then, the algorithm can be launched on this selection, with significantly lower computation overhead. In order to better understand the behavior of the method, tests on synthetic datasets generated by linear mixing of known mineral endmembers were performed. They help to assess the potential loss of quality induced by the pixel selection, depending
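    For intuition about the underlying model: each pixel spectrum is treated as a non-negative (and optionally sum-to-one) linear combination of endmember spectra. BPSS estimates endmembers and abundances jointly by Bayesian sampling; the sketch below illustrates only the constrained abundance step, with synthetic endmembers assumed known.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    n_bands, n_endmembers, n_pixels = 50, 3, 200

    # Synthetic endmember spectra and sum-to-one abundance maps.
    S = np.abs(rng.standard_normal((n_bands, n_endmembers)))
    A_true = rng.dirichlet(np.ones(n_endmembers), size=n_pixels)
    X = A_true @ S.T + 0.01 * rng.standard_normal((n_pixels, n_bands))

    # Non-negative least squares per pixel enforces abundance positivity.
    A_est = np.array([nnls(S, x)[0] for x in X])
    A_est /= A_est.sum(axis=1, keepdims=True)   # optional sum-to-one renormalization
    print(np.abs(A_est - A_true).mean())        # small recovery error
    ```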

  1. Applicability of AgMERRA Forcing Dataset to Fill Gaps in Historical in-situ Meteorological Data

    NASA Astrophysics Data System (ADS)

    Bannayan, M.; Lashkari, A.; Zare, H.; Asadi, S.; Salehnia, N.

    2015-12-01

    Integrated assessment studies of food production systems use crop models to simulate the effects of climate and socio-economic changes on food security. Climate forcing data are one of the key inputs of crop models. This study evaluated the performance of the AgMERRA climate forcing dataset in filling gaps in historical in-situ meteorological data for different climatic regions of Iran. The AgMERRA dataset was intercompared with in-situ observational data for daily maximum and minimum temperature and precipitation during the 1980-2010 period via the root mean square error (RMSE), mean absolute error (MAE) and mean bias error (MBE) for 17 stations in four climatic regions: humid and moderate; cold; dry and arid; and hot and humid. Moreover, the probability distribution function and cumulative distribution function were compared between model and observed data. The measures of agreement between AgMERRA data and observed data demonstrated that the errors in the model data are small for all stations. Except at stations located in cold regions, the model data showed under-prediction of daily maximum temperature and precipitation; however, this was not significant. In addition, the probability distribution function and cumulative distribution function showed the same trend for all stations between model and observed data. Therefore, the AgMERRA dataset is reliable enough to fill gaps in historical observations in different climatic regions of Iran, and it could be applied as a basis for future climate scenarios.
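    For reference, the three agreement measures used here are simple functions of the paired model-minus-observation differences; a negative MBE corresponds to the under-prediction noted above. A minimal sketch:

    ```python
    import numpy as np

    def agreement_measures(model, obs):
        """RMSE, MAE and MBE between gridded-forcing values and paired
        station observations (e.g. daily maximum temperature)."""
        d = np.asarray(model) - np.asarray(obs)
        rmse = np.sqrt(np.mean(d ** 2))
        mae = np.mean(np.abs(d))
        mbe = np.mean(d)   # negative values indicate under-prediction
        return rmse, mae, mbe

    # Toy example with three paired daily values at one station.
    print(agreement_measures([31.2, 29.8, 30.5], [31.0, 30.4, 30.9]))
    ```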

  2. Synthesizing Global and Local Datasets to Estimate Jurisdictional Forest Carbon Fluxes in Berau, Indonesia

    PubMed Central

    Griscom, Bronson W.; Ellis, Peter W.; Baccini, Alessandro; Marthinus, Delon; Evans, Jeffrey S.; Ruslandi

    2016-01-01

    Background Forest conservation efforts are increasingly being implemented at the scale of sub-national jurisdictions in order to mitigate global climate change and provide other ecosystem services. We see an urgent need for robust estimates of historic forest carbon emissions at this scale, as the basis for credible measures of climate and other benefits achieved. Despite the arrival of a new generation of global datasets on forest area change and biomass, confusion remains about how to produce credible jurisdictional estimates of forest emissions. We demonstrate a method for estimating the relevant historic forest carbon fluxes within the Regency of Berau in eastern Borneo, Indonesia. Our method integrates best available global and local datasets, and includes a comprehensive analysis of uncertainty at the regency scale. Principal Findings and Significance We find that Berau generated 8.91 ± 1.99 million tonnes of net CO2 emissions per year during 2000–2010. Berau is an early frontier landscape where gross emissions are 12 times higher than gross sequestration. Yet most (85%) of Berau’s original forests are still standing. The majority of net emissions were due to conversion of native forests to unspecified agriculture (43% of total), oil palm (28%), and fiber plantations (9%). Most of the remainder was due to legal commercial selective logging (17%). Our overall uncertainty estimate offers an independent basis for assessing three other estimates for Berau. Two other estimates were above the upper end of our uncertainty range. We emphasize the importance of including an uncertainty range for all parameters of the emissions equation to generate a comprehensive uncertainty estimate, which has not been done before. We believe comprehensive estimates of carbon flux uncertainty are increasingly important as national and international institutions are challenged with comparing alternative estimates and identifying a credible range of historic emissions values
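    Carrying an uncertainty range through every parameter of the emissions equation amounts to Monte Carlo error propagation. The sketch below uses a deliberately simplified one-term emissions equation with made-up parameter values, purely to illustrate the mechanics rather than the study's actual model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000  # Monte Carlo draws

    # Hypothetical parameters (area converted x carbon density x CO2/C ratio),
    # each with an assumed uncertainty; the study's equation has more terms.
    area = rng.normal(20_000, 2_000, n)          # ha/yr converted
    carbon = rng.normal(180.0, 25.0, n)          # tC/ha in native forest
    co2_per_c = 44.0 / 12.0                      # stoichiometric constant

    emissions = area * carbon * co2_per_c / 1e6  # MtCO2/yr
    lo, mid, hi = np.percentile(emissions, [2.5, 50, 97.5])
    print(f"{mid:.2f} MtCO2/yr (95% CI {lo:.2f}-{hi:.2f})")
    ```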

  3. Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset

    PubMed Central

    Li, Ying; Wang, Nan; Perkins, Edward J.; Zhang, Chaoyang; Gong, Ping

    2010-01-01

    Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes. PMID:21060837
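    The filter-then-classify structure of such a pipeline is straightforward to reproduce with standard tools. Below is a generic scikit-learn sketch, with synthetic data standing in for the 248-array expression matrix and a univariate F-test filter plus linear SVM standing in for the authors' specific algorithms:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Stand-in for the 248-sample, three-class expression matrix.
    X, y = make_classification(n_samples=248, n_features=1000, n_informative=40,
                               n_classes=3, random_state=0)

    # Univariate filtering (k=58 mirrors the refined gene subset size above),
    # followed by a multiclass SVM.
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=58),
                        SVC(kernel="linear"))
    print(cross_val_score(clf, X, y, cv=5).mean())
    ```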

  4. Restoration and Recalibration of the Viking MAWD Datasets

    NASA Astrophysics Data System (ADS)

    Nuno, R. G.; Paige, D. A.; Sullivan, M.

    2014-12-01

    High-resolution HIRISE images of transient albedo dark features, called Recurring Slope Lineae (RSL), have been interpreted to be evidence for current hydrological activity [1]. If there are surface sources of water, then localized plumes of atmospheric water may be observable from orbit. The Viking MAWD column water vapor data are uniquely valuable for this purpose because they cover the full range of Martian local times, and include data sampled at high spatial resolution [2]. They also are accompanied by simultaneously acquired surface and atmospheric temperatures acquired by the Viking Infrared Thermal Mapper (IRTM) instruments. We searched the raster-averaged Viking Orbiter 1 and 2 MAWD column water vapor dataset for regions of localized elevated column water vapor abundances and found mid-latitude regions with transient water observations [3]. The raster-averaged Viking Orbiter 1 and 2 MAWD column water vapor data available in the Planetary Data System (PDS) were calculated from radiance measurements using seasonally and topographically varying surface pressures which, at the time, had high uncertainties [4]. Due to recent interest in transient hydrological activity on Mars [2], we decoded the non-raster-averaged Viking MAWD dataset, which is sampled at 15 times higher spatial resolution than the data that are currently available from PDS. This new dataset is being used to recalculate column water vapor abundances using current topographical data, as well as dust and pressure measurements from the Mars Global Circulation Model. References: [1] McEwen, A. S., et al. (2011). Seasonal flows on warm Martian slopes. Science (New York, N.Y.), 333(6043), 740-3. [2] Farmer, C. B., & Laporte, D. D. (1972). The Detection and Mapping of Water Vapor in the Martian Atmosphere. Icarus. [3] Nuno, R. G., et al. (2013). Searching for Localized Water Vapor Sources on Mars Utilizing Viking MAWD Data. 44th Lunar and Planetary Science Conference. [4] Farmer, C. B., et al. (1977

  5. Interoperable Geoprocessing for Rapid Prototyping of Landuse/Landcover, Topographical and Meteorological Datasets for Hydrological Simulation

    NASA Astrophysics Data System (ADS)

    Alarcon, V. J.; O'Hara, C. G.; Viger, R.; Shrestha, B.; Mali, P.; Toll, D. L.; Engman, T.

    2007-12-01

    Geoprocessing of landuse/landcover, topographical, and meteorological datasets is ubiquitous in the initial set-up of environmental model applications. Geoprocessing provides parameterized geographic/meteorological information to models, organized per geographic sub-region or in the form of time-series. Environmental models use the summarized information for simulation of past, present or future events of interest. Traditionally, geoprocessing is performed following protocols and methodologies built into the environmental models. Each geoprocessing routine is tailored by the model's developers and is not re-usable or transferable to other models. Furthermore, metadata documenting the geoprocessing steps are usually not detailed. This paper proposes the use of the Geospatial Object Library for Environmental Modeling (GEOLEM) as an alternative or complementary tool for calculating land use, topographical, and meteorological parameters and time-series. GEOLEM has the capability of providing re-usable and transferable geoprocessed information to environmental models. Although this research focuses on the calculation of the geographical parameters and time-series needed by the Hydrological Simulation Program Fortran (HSPF), potential uses of GEOLEM in other environmental modeling frameworks are also addressed. The project area is located in the Saint Louis Bay watershed in the Mississippi Gulf coast. Interferometric Synthetic Aperture Radar (IFSAR) Digital Surface Model (5-m horizontal, 0.01-m vertical resolution), NASA's Shuttle Radar Topography Mission (SRTM) DTED Level 2 (30-m horizontal, 0.01-m vertical), National Elevation Data (NED) (30-m horizontal, 1-m vertical), and USGS DEM (300-m horizontal, 1-m vertical) topographical datasets were used in this research. Additionally, three landuse/landcover datasets were included: Geographic Retrieval and Analysis System (GIRAS), National Land Cover Dataset (NLCD), and NASA's Moderate Resolution Imaging

  6. Comparison of Two U.S. Power-Plant Carbon Dioxide Emissions Datasets

    NASA Astrophysics Data System (ADS)

    Ackerman, K. V.; Sundquist, E. T.

    2006-12-01

    U.S. electric generating facilities account for 8-9 percent of global fossil-fuel CO2 emissions. Because estimates of fossil-fuel consumption and CO2 emissions are recorded at each power-plant point source, U.S. power-plant CO2 emissions may be the most thoroughly monitored globally significant source of fossil-fuel CO2 emissions. We examined two datasets for the years 1998-2000: (1) the Department of Energy/Energy Information Administration (EIA) dataset of emissions calculated from fuel data contained in the EIA electricity database files, and (2) eGRID (Emissions and Generation Resource Integrated Database), a publicly available database generated by the Environmental Protection Agency. We compared the eGRID and EIA estimates of CO2 emissions for electricity generation at power plants within the conterminous U.S. at two levels: (1) estimates for individual power-plant emissions, which allowed analysis of differences due to plant listings, calculation methods, and measurement methods; and (2) estimated conterminous U.S. totals for power-plant emissions, which allowed analysis of the aggregated effects of these individual plant differences, and assessment of the aggregated differences in the context of previously published uncertainty estimates. Comparison of data for individual plants, after removing outliers, shows the average difference (absolute value) between eGRID and EIA estimates for individual plants to be approximately 12 percent, relative to the means of the paired estimates. Systematic differences are apparent in the eGRID and EIA reporting of emissions from combined heat and power plants. Additional differences between the eGRID and EIA datasets can be attributed to the fact that most of the emissions from the largest plants are derived from a Continuous Emissions Monitoring (CEM) system in eGRID and are calculated using fuel consumption data in the EIA dataset. This results in a conterminous U.S. total calculated by eGRID that is 3.4 to 5.8 percent

  7. Visual integration of multi-disciplinary datasets for the geophysical analysis of tectonic processes

    NASA Astrophysics Data System (ADS)

    Jacobs, A. M.; Dingler, J. A.; Brothers, D.; Kent, G. M.

    2006-12-01

    Within the scientific community, there is a growing emphasis on interdisciplinary analyses to gain a more complete understanding of how entire earth systems function. Challenges of this approach include integrating the numerous, and often disparate, datasets, while also presenting the integrated data in a manner comprehensible to a wide range of scientists. Three- and four-dimensional visualization is quickly becoming the primary tool for addressing these challenges. We frequently utilize the modular methodology of the IVS Fledermaus visualization software package to enhance our ability to better understand various geophysical datasets and the tectonic processes occurring within their respective systems. A main benefit of this software is that it allows us to generate individual visual objects from geo-referenced datasets and then combine them to form interactive, multi-dimensional visual scenes. Additionally, this visualization process is advantageous to interdisciplinary analyses because: 1) the visual objects are portable across scenes, 2) they can be easily exchanged between scientists to build new user-specific scenes, and 3) both the objects and scenes can be viewed using the full software package or the free viewer, iView3D, on any modern computer operating system (e.g., Mac OS X, Windows, Linux). Here we present examples of Fledermaus and how we have used visualization to better "see" oceanic, coastal, and continental tectonic environments. In one visualization, bathymetric, petrologic and hydrothermal vent information from a spreading system in the Lau back-arc basin is integrated with multichannel seismic (MCS) data to ascertain where subduction zone influences begin strongly shaping the character of the spreading ridge. In visualizations of coastal environments, we combine high-resolution seismic CHIRP data with bathymetry, side-scan and MCS data, Landsat images, geological maps, and earthquake locations to look at slope stability in the Santa Barbara

  8. New Atmospheric and Oceanic Angular Momentum Datasets for Predictions of Earth Rotation/Polar Motion

    NASA Astrophysics Data System (ADS)

    Salstein, D. A.; Stamatakos, N.

    2014-12-01

    We are reviewing the state of the art in available datasets for both atmospheric angular momentum (AAM) and oceanic angular momentum (OAM) for the purposes of analysis and prediction of both polar motion and length-of-day series. Both analyses and forecasts of these quantities have been used separately and in combination to aid in short- and medium-range predictions of Earth rotation parameters. The AAM and OAM combination, with the possible addition of hydrospheric angular momentum, can form a proxy index for the Earth rotation parameters themselves due to the conservation of angular momentum in the Earth system. Such a combination of the angular momentum of the geophysical fluids has helped in forecasts for periods up to about 10 days, due to the dynamic models, and, together with extended statistical predictions, for Earth rotation parameters out as far as 90 days, according to Dill et al. (2013). We assess other dataset combinations that can be used in such analysis and prediction efforts for the Earth rotation parameters, and demonstrate the corresponding skill levels in doing so.

  9. A multi-dataset data-collection strategy produces better diffraction data.

    PubMed

    Liu, Zhi Jie; Chen, Lirong; Wu, Dong; Ding, Wei; Zhang, Hua; Zhou, Weihong; Fu, Zheng Qing; Wang, Bi Cheng

    2011-11-01

    A multi-dataset (MDS) data-collection strategy is proposed and analyzed for macromolecular crystal diffraction data acquisition. The theoretical analysis indicated that the MDS strategy can reduce the standard deviation (background noise) of diffraction data compared with the commonly used single-dataset strategy for a fixed X-ray dose. In order to validate the hypothesis experimentally, a data-quality evaluation process, termed a readiness test of the X-ray data-collection system, was developed. The anomalous signals of sulfur atoms in zinc-free insulin crystals were used as the probe to differentiate the quality of data collected using different data-collection strategies. The data-collection results using home-laboratory-based rotating-anode X-ray and synchrotron X-ray systems indicate that the diffraction data collected with the MDS strategy contain more accurate anomalous signals from sulfur atoms than the data collected with a regular data-collection strategy. In addition, the MDS strategy offered more advantages with respect to radiation-damage-sensitive crystals and better usage of rotating-anode as well as synchrotron X-rays.

  10. Wide-Area Mapping of Forest with National Airborne Laser Scanning and Field Inventory Datasets

    NASA Astrophysics Data System (ADS)

    Monnet, J.-M.; Ginzler, C.; Clivaz, J.-C.

    2016-06-01

    Airborne laser scanning (ALS) remote sensing data are now available for entire countries such as Switzerland. Methods for the estimation of forest parameters from ALS have been intensively investigated in recent years. However, the implementation of a forest mapping workflow based on available data at a regional level remains challenging. A case study was implemented in the Canton of Valais (Switzerland). The national ALS dataset and field data of the Swiss National Forest Inventory were used to calibrate estimation models for mean and maximum height, basal area, stem density, mean diameter and stem volume. When stratification was performed based on ALS acquisition settings and geographical criteria, satisfactory prediction models were obtained for volume (R2 = 0.61 with a root mean square error of 47%) and basal area (0.51 and 45%, respectively), while height variables had errors lower than 19%. This case study shows that the use of nationwide ALS and field datasets for forest resources mapping is cost-efficient, but additional investigations are required to handle the limitations of the input data and optimize the accuracy.

  11. Intercomparison and suitability of five Greenland topographic datasets for the purpose of hydrologic runoff modeling

    NASA Astrophysics Data System (ADS)

    Pitcher, L. H.; Smith, L. C.; Rennermalm, A. K.; Chu, V. W.; Gleason, C. J.; Yang, K.; Finnegan, D. C.; LeWinter, A. L.; Moller, D.; Moustafa, S.

    2012-12-01

    Rapid melting of the Greenland Ice Sheet (GrIS) and subsequent sea level rise has underscored the need for accurate modeling of hydrologic processes. Researchers rely on the accuracy of topography datasets for this purpose, especially in remote areas like Greenland where in situ validation data are difficult to acquire. A number of new remotely-sensed Digital Elevation Models (DEMs) have recently become available for Greenland, but a comparative study of their respective quality and suitability for hydrologic modeling has not been undertaken. We examine five such remotely-sensed DEMs acquired for proglacial and supraglacial ablation zones of Greenland, namely (1) WorldView stereo DEMs, (2) NASA GLISTIN-A experimental radar, (3) NASA/IceBridge Airborne Topographic Mapper (ATM), (4) Greenland Ice Mapping Project (GIMP) DEM, and (5) ASTER DEM. The quality, strengths and weaknesses of these DEMs for GrIS hydrologic modeling are assessed through intercomparison and through in situ terrestrial lidar scanning data with precise RTK GPS control. Additionally, gridded bedrock (i.e. NASA/IceBridge Multichannel Coherent Radar Depth Sounder (MCoRDS); Bamber DEMs) and surface topography datasets are combined to create a hydraulic potentiometric surface for hydrologic modeling. Finally, the suitability of these combined topographic products for hydrologic modeling, characterization of GrIS meltwater runoff, and estimating sub- and/or englacial pathways is explored.
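    Regarding the hydraulic potentiometric surface: a standard way to combine bedrock and surface topography for subglacial water routing is the Shreve hydraulic potential, in which water pressure is taken as a fraction of the ice overburden. The abstract does not state which formulation was used, so the following sketch simply illustrates that common assumption:

    ```python
    import numpy as np

    RHO_ICE, RHO_WATER, G = 917.0, 1000.0, 9.81

    def hydraulic_potential(surface, bedrock, k=1.0):
        """Shreve-type hydraulic potential (Pa) from surface and bedrock DEMs,
        assuming water pressure is a fraction k of the ice overburden."""
        ice_thickness = np.maximum(surface - bedrock, 0.0)
        return RHO_WATER * G * bedrock + k * RHO_ICE * G * ice_thickness

    # Water is routed down the gradient of this potential; toy 2x2 DEMs:
    surface = np.array([[1500.0, 1480.0], [1470.0, 1450.0]])
    bedrock = np.array([[ 300.0,  320.0], [ 310.0,  290.0]])
    print(hydraulic_potential(surface, bedrock))
    ```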

  12. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide

    PubMed Central

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-01-01

    Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species’ evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals (“MammalDIET”). Diet information was digitized from two global and cladewide data sources and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family) and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continentwide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external
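    The hierarchical extrapolation step, falling back first to genus-level and then to family-level information, can be illustrated on a toy trait table. A minimal pandas sketch with hypothetical column names and a modal-value fill rule (the actual procedure is more elaborate):

    ```python
    import pandas as pd

    # Toy trait table; None marks species with no literature diet record.
    df = pd.DataFrame({
        "family":  ["Felidae", "Felidae", "Felidae", "Sciuridae"],
        "genus":   ["Panthera", "Panthera", "Felis", "Sciurus"],
        "species": ["P. leo", "P. onca", "F. catus", "S. vulgaris"],
        "diet":    ["Vertebrate", None, None, None],
    })

    def fill_hierarchically(df, trait="diet"):
        """Fill missing trait values from progressively coarser taxonomic
        levels (genus, then family), mirroring the extrapolation order."""
        out = df.copy()
        for level in ["genus", "family"]:
            mode = (out.dropna(subset=[trait])
                       .groupby(level)[trait]
                       .agg(lambda s: s.mode().iloc[0]))
            missing = out[trait].isna()
            out.loc[missing, trait] = out.loc[missing, level].map(mode)
        return out

    # P. onca inherits the Panthera value; F. catus falls back to Felidae;
    # S. vulgaris stays missing because no relative carries a record.
    print(fill_hierarchically(df))
    ```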

  13. Revisiting Frazier's subdeltas: enhancing datasets with dimensionality, better to understand geologic systems

    USGS Publications Warehouse

    Flocks, James

    2006-01-01

    Scientific knowledge from the past century is commonly represented by two-dimensional figures and graphs, as presented in manuscripts and maps. Using today's computer technology, this information can be extracted and projected into three- and four-dimensional perspectives. Computer models can be applied to datasets to provide additional insight into complex spatial and temporal systems. This process can be demonstrated by applying digitizing and modeling techniques to valuable information within widely used publications. The seminal paper by D. Frazier, published in 1967, identified 16 separate delta lobes formed by the Mississippi River during the past 6,000 yrs. The paper includes stratigraphic descriptions through geologic cross-sections, and provides distribution and chronologies of the delta lobes. The data from Frazier's publication are extensively referenced in the literature. Additional information can be extracted from the data through computer modeling. Digitizing and geo-rectifying Frazier's geologic cross-sections produce a three-dimensional perspective of the delta lobes. Adding the chronological data included in the report provides the fourth dimension of the delta cycles, which can be visualized through computer-generated animation. Supplemental information can be added to the model, such as post-abandonment subsidence of the delta-lobe surface. Analyzing the regional, net surface-elevation balance between delta progradations and land subsidence is computationally intensive. By visualizing this process during the past 4,500 yrs through multi-dimensional animation, the importance of sediment compaction in influencing both the shape and direction of subsequent delta progradations becomes apparent. Visualization enhances a classic dataset and can be further refined using additional data, as well as providing a guide for identifying future areas of study.

  14. Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges

    NASA Astrophysics Data System (ADS)

    Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.

    2013-12-01

    Developing data search tools is a very common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. There are many aspects which determine the type of search tool a project needs to provide to its user community. These include: the number of datasets, the amount and consistency of discovery metadata, ancillary information such as the availability of quality information and provenance, and the availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at the Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via various data centers such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information and Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse and NASA's Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving data discovery within these centers. The DOE ARM program recently developed a data discovery tool which allows users to search and discover over 4000 observational datasets. These datasets are key to research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to other broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier. The ARM

  15. Web-based Data Information and Sharing System Using Mars Remotely Sensed Datasets

    NASA Astrophysics Data System (ADS)

    Necsoiu, M.; Dinwiddie, C. L.; Colton, S.; Coleman, N. M.

    2004-05-01

    to 2S) outflow channels. For Walla Walla Vallis (name provisionally approved by the International Astronomical Union), a small outflow channel, the integrated datasets helped resolve the locations of reaches that were indistinct in visible light images. For Ravi Vallis, the composite data system enhanced our understanding of how some chaotic terrain forms. As presented by Coleman, N.M. (2004 Lunar and Planetary Science Conference, Abstract #1299), thinning of the cryosphere by deep fluvial incision spawned secondary breakouts of groundwater, forming new chaos zones. The system's flexible design allows for incorporation of additional remote sensing datasets, such as those provided by the MOC, TES, and MARSIS instruments. In summary, our integrated data-access system will make the wealth of new Martian data more readily available to planetary researchers, enabling scientists to focus more time on analyses or algorithm development rather than on finding data and converting formats. Disclaimer: An employee of the U.S. Nuclear Regulatory Commission (NRC) made contributions to this work on his own time apart from regular duties. NRC has neither approved nor disapproved the technical context of this abstract.

  16. Dataset used to improve liquid water absorption models in the microwave

    SciTech Connect

    Turner, David

    2015-12-14

    Two datasets, one a compilation of laboratory data and one a compilation from three field sites, are provided here. These datasets provide measurements of the real and imaginary refractive indices and absorption as a function of cloud temperature. These datasets were used in the development of the new liquid water absorption model that was published in Turner et al. 2015.

  17. MATCH: Metadata Access Tool for Climate and Health Datasets

    DOE Data Explorer

    MATCH is a searchable clearinghouse of publicly available Federal metadata (i.e. data about data) and links to datasets. Most metadata on MATCH pertain to geospatial data sets ranging from local to global scales. The goals of MATCH are to: 1) Provide an easily accessible clearinghouse of relevant Federal metadata on climate and health that will increase efficiency in solving research problems; 2) Promote application of research and information to understand, mitigate, and adapt to the health effects of climate change; 3) Facilitate multidirectional communication among interested stakeholders to inform and shape Federal research directions; 4) Encourage collaboration among traditional and non-traditional partners in development of new initiatives to address emerging climate and health issues. [copied from http://match.globalchange.gov/geoportal/catalog/content/about.page]

  18. [Parallel virtual reality visualization of extremely large medical datasets].

    PubMed

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the intranets and common-configuration computers of hospitals. Several kernel techniques are introduced, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can play the role of a good assistant in making clinical diagnoses. PMID:20481303
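    Maximum Intensity Projection parallelizes naturally, since the volume can be partitioned and each worker can project its own sub-volume independently. The paper's cluster design is not reproduced here, but a minimal process-parallel sketch conveys the decomposition:

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def mip_chunk(volume_chunk):
        """Maximum Intensity Projection of one sub-volume along the view axis."""
        return volume_chunk.max(axis=0)

    def parallel_mip(volume, n_workers=4):
        """MIP of a (z, y, x) volume, with slabs farmed out to worker
        processes much as a cluster would split the load."""
        chunks = np.array_split(volume, n_workers, axis=1)  # split along y
        with ProcessPoolExecutor(n_workers) as pool:
            parts = list(pool.map(mip_chunk, chunks))
        return np.concatenate(parts, axis=0)

    if __name__ == "__main__":
        vol = np.random.default_rng(0).random((128, 256, 256))
        image = parallel_mip(vol)
        # The parallel result matches the single-process projection.
        assert np.allclose(image, vol.max(axis=0))
    ```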

  19. Feedstock Production Datasets from the Bioenergy Knowledge Discovery Framework

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about] Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and data uploads from individuals.

  20. Biofuel Production Datasets from DOE's Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about]

    Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and data uploads from individuals.

  1. Biofuel Distribution Datasets from the Bioenergy Knowledge Discovery Framework

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. [copied from https://www.bioenergykdf.net/content/about] Holdings include datasets, models, and maps and the collections are growing due to both DOE contributions and individuals' data uploads.

  2. Feedstock Logistics Datasets from DOE's Bioenergy Knowledge Discovery Framework (KDF)

    DOE Data Explorer

    The Bioenergy Knowledge Discovery Framework invites users to discover the power of bioenergy through an interface that provides extensive access to research data and literature, GIS mapping tools, and collaborative networks. The Bioenergy KDF supports efforts to develop a robust and sustainable bioenergy industry. The KDF facilitates informed decision making by providing a means to synthesize, analyze, and visualize vast amounts of information in a relevant and succinct manner. It harnesses Web 2.0 and social networking technologies to build a collective knowledge system that can better examine the economic and environmental impacts of development options for biomass feedstock production, biorefineries, and related infrastructure. Holdings include datasets, models, and maps. [from https://www.bioenergykdf.net/content/about]

  3. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGES

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

  4. Approximate Nearest Neighbor Search for a Dataset of Normalized Vectors

    NASA Astrophysics Data System (ADS)

    Terasawa, Kengo; Tanaka, Yuzuru

    This paper describes a novel algorithm for approximate nearest neighbor searching. For solving this problem, especially in high-dimensional spaces, one of the best-known algorithms is Locality-Sensitive Hashing (LSH). This paper presents a variant of the LSH algorithm that outperforms previously proposed methods when the dataset consists of vectors normalized to unit length, which is often the case in pattern recognition. The LSH scheme is based on a family of hash functions that preserves the locality of points. This paper points out that for this special-case problem we can design efficient hash functions that map a point on the hypersphere into the closest vertex of a randomly rotated regular polytope. The computational analysis confirmed that the proposed method could improve the exponent ρ, the main indicator of the performance of the LSH algorithm. The practical experiments also supported the efficiency of our algorithm both in time and in space.
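    The geometric idea, randomly rotate the sphere and snap each unit vector to the nearest polytope vertex, is easiest to sketch for the cross-polytope with vertices ±e_i. The Python below is an illustrative simplification of such a hash family, not the paper's exact construction:

    ```python
    import numpy as np

    def make_polytope_hash(dim, seed=0):
        """Return a hash mapping unit vectors to the nearest vertex (+-e_i)
        of a randomly rotated cross-polytope."""
        rng = np.random.default_rng(seed)
        # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
        q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

        def h(x):
            y = q @ x                      # rotate the query point
            i = int(np.argmax(np.abs(y)))  # closest coordinate axis
            return 2 * i + (1 if y[i] >= 0 else 0)  # encode the signed vertex

        return h

    # Nearby unit vectors tend to collide under the same hash value:
    x = np.array([1.0, 0.0, 0.0])
    x_near = x + 0.05 * np.random.default_rng(1).standard_normal(3)
    x_near /= np.linalg.norm(x_near)
    h = make_polytope_hash(3)
    print(h(x), h(x_near))  # likely equal
    ```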

  5. Incorporating the TRMM Dataset into the GPM Mission Data Suite

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz; Ji, Yimin; Chou, Joyce; Kelley, Owen; Kwiatkowski, John; Stout, John

    2016-01-01

    In June 2015 the TRMM satellite came to its end. The 17-plus years of mission data that it provided have proven a valuable asset to a variety of science communities. This 17-plus-year dataset does not, however, stagnate with the end of the mission itself. NASA/JAXA intend to integrate the TRMM dataset into the data suite of the GPM mission. This will ensure the creation of a consistent, intercalibrated, accurate dataset within GPM that extends back to November of 1998. This paper describes the plans for incorporating the TRMM 17-plus-year data into the GPM data suite. These plans call for using GPM algorithms for both radiometer and radar to reprocess TRMM data, as well as intercalibrating partner radiometers using GPM intercalibration techniques. This reprocessing will mean changes in content, logical format and physical format, as well as improved geolocation, sensor corrections and retrieval techniques.

  6. Computational models in the age of large datasets.

    PubMed

    O'Leary, Timothy; Sutton, Alexander C; Marder, Eve

    2015-06-01

    Technological advances in experimental neuroscience are generating vast quantities of data, from the dynamics of single molecules to the structure and activity patterns of large networks of neurons. How do we make sense of these voluminous, complex, disparate and often incomplete data? How do we find general principles in the morass of detail? Computational models are invaluable and necessary in this task and yield insights that cannot otherwise be obtained. However, building and interpreting good computational models is a substantial challenge, especially so in the era of large datasets. Fitting detailed models to experimental data is difficult and often requires onerous assumptions, while more loosely constrained conceptual models that explore broad hypotheses and principles can yield more useful insights.

  7. ConStrains identifies microbial strains in metagenomic datasets.

    PubMed

    Luo, Chengwei; Knight, Rob; Siljander, Heli; Knip, Mikael; Xavier, Ramnik J; Gevers, Dirk

    2015-10-01

    An important fraction of microbial diversity is harbored in strain individuality, so identification of conspecific bacterial strains is imperative for improved understanding of microbial community functions. Limitations in bioinformatics and sequencing technologies have to date precluded strain identification owing to difficulties in phasing short reads to faithfully recover the original strain-level genotypes, which have highly similar sequences. We present ConStrains, an open-source algorithm that identifies conspecific strains from metagenomic sequence data and reconstructs the phylogeny of these strains in microbial communities. The algorithm uses single-nucleotide polymorphism (SNP) patterns in a set of universal genes to infer within-species structures that represent strains. Applying ConStrains to simulated and host-derived datasets provides insights into microbial community dynamics.

  8. Recovering complete and draft population genomes from metagenome datasets.

    PubMed

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

  9. Orthology detection combining clustering and synteny for very large datasets.

    PubMed

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large datasets because more exact approaches incur prohibitively high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis, Proteinortho, making the software applicable to very large datasets.

  10. Hypersonic Turbulent Boundary-Layer and Free Shear Database Datasets

    NASA Technical Reports Server (NTRS)

    Settles, Gary S.; Dodson, Lori J.

    1993-01-01

    A critical assessment and compilation of data are presented on attached hypersonic turbulent boundary layers in pressure gradients and compressible turbulent mixing layers. Extensive searches were conducted to identify candidate experiments, which were subjected to a rigorous set of acceptance criteria. Accepted datasets are both tabulated and provided in machine-readable form. The purpose of this database effort is to make existing high-quality data available in detailed form for the turbulence-modeling and computational fluid dynamics communities. While significant recent data were found on the subject of compressible turbulent mixing, the available boundary-layer/pressure-gradient experiments are all older ones, and no acceptable data were found at hypersonic Mach numbers.

  11. Parallel acoustic wave propagation and generation of a seismic dataset

    SciTech Connect

    Oldfield, R.; Dyke, J.V.; Semeraro, B.D.

    1995-12-01

    The ultimate goal of this work is to construct a large seismic dataset that will be used to calibrate industrial seismic analysis codes. Seismic analysis is used in oil and gas exploration to deduce subterranean geological formations based on the reflection of acoustic waves from a source to an array of receivers placed on or near the surface. This work deals with the generation of a test set of acoustic data based on a known representative geological formation. Industrial users of the data will calibrate their codes by comparing their predicted geology to the known geology used to generate the test data. This is a cooperative effort involving the Los Alamos, Sandia, Oak Ridge and Lawrence Livermore national labs as well as the Institut Francais du Petrole and the Society of Exploration Geophysicists.

  12. A Discretized Method for Deriving Vortex Impulse from Volumetric Datasets

    NASA Astrophysics Data System (ADS)

    Buckman, Noam; Mendelson, Leah; Techet, Alexandra

    2015-11-01

    Many biological and mechanical systems transfer momentum through a fluid by creating vortical structures. To study this mechanism, we derive a method for extracting impulse and its time derivative from flow fields observed in experiments and simulations. We begin by discretizing a thin-cored vortex filament, and extend the model to account for finite vortex core thickness and asymmetric distributions of vorticity. By solely using velocity fields to extract vortex cores and calculate circulation, this method is applicable to 3D PIV datasets, even with low-spatial-resolution flow fields and measurement noise. To assess the performance of this analysis method, we simulate vortex rings and arbitrary vortex structures using the OpenFOAM computational fluid dynamics software and validate the method against the simulated wake momentum. We further examine a piston-vortex experiment, using 3D synthetic aperture particle image velocimetry (SAPIV) to capture velocity fields. Strengths, limitations, and improvements to the framework are discussed.
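    The quantity being extracted is the hydrodynamic impulse, I = (ρ/2) ∫ x × ω dV. As a point of reference (a direct grid-based evaluation, not the authors' discretized filament model), one can estimate it from a gridded velocity field as follows:

    ```python
    import numpy as np

    def hydrodynamic_impulse(u, v, w, x, y, z, rho=1000.0):
        """Estimate I = (rho/2) * integral(x cross omega) dV from velocities
        sampled on a regular grid (u, v, w indexed [ix, iy, iz]; x, y, z are
        1D coordinate arrays). A straightforward reference implementation."""
        dx, dy, dz = x[1] - x[0], y[1] - y[0], z[1] - z[0]
        # Vorticity components via central differences.
        wx = np.gradient(w, dy, axis=1) - np.gradient(v, dz, axis=2)
        wy = np.gradient(u, dz, axis=2) - np.gradient(w, dx, axis=0)
        wz = np.gradient(v, dx, axis=0) - np.gradient(u, dy, axis=1)
        X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
        # Integrand: position cross vorticity.
        cx = Y * wz - Z * wy
        cy = Z * wx - X * wz
        cz = X * wy - Y * wx
        dV = dx * dy * dz
        return 0.5 * rho * dV * np.array([cx.sum(), cy.sum(), cz.sum()])
    ```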

  13. Ontology-based aggregation of biological pathway datasets.

    PubMed

    Jiang, Keyuan; Nash, Christopher

    2005-01-01

    The massive accumulation of biological data in the past decades has generated a significant amount of biological knowledge, represented in one form as biological pathways. The existence of over 150 pathway databases reflects the diversity of the biological data and the heterogeneity of data models, storage formats and access methods. To address an intriguing biological question, it is not uncommon for a biologist to query more than one pathway database to acquire a more complete picture of the current understanding of biology. To facilitate life scientists' searches of biological pathway data, we designed a biological pathway aggregator which aggregates various pathway datasets via the BioPAX ontology, a community-developed ontology based upon the concept of the Semantic Web for integrating and exchanging biological pathway data. Our aggregator is composed of modules that retrieve the data from various sources, transform the raw data to BioPAX format, persist the converted data in a persistent data store, and enable queries by other applications.

  14. Worldwide dataset of glacier thickness observations compiled by literature review

    NASA Astrophysics Data System (ADS)

    Naegeli, Kathrin; Gärtner-Roer, Isabelle; Hagg, Wilfried; Huss, Matthias; Machguth, Horst; Zemp, Michael

    2013-04-01

    The volume of glaciers and ice caps is still poorly known, although it is expected to contribute significantly to changes in the hydrological cycle and global sea level rise over the next decades. Studies presenting worldwide estimations are mostly based on modelling and scaling approaches and are usually calibrated with only a few measurements. Direct investigations of glacier thickness, a crucial parameter for ice volume calculations, are rather sparse but nevertheless available from all around the globe. This study presents a worldwide compilation of glacier thickness observation data. A literature review revealed mean and/or maximum thickness values from 442 glaciers and ice caps, and elevation band information and point measurements for 10 and 14 glaciers, respectively. The result is a dataset containing glaciers and ice caps with areas ranging from smaller than 0.1 km2 (e.g. Pizolgletscher, Switzerland) to larger than 10'000 km2 (e.g. Agassiz Ice Cap, Canada), mean ice thicknesses between 4 m (Blaueis, Germany) and 550 m (Aletschgletscher, Switzerland), and 64 values for ice masses with entries from different years. Thickness values are derived from various observation methods and cover a survey period between 1923 and 2011. A major advantage of the database is the included metadata, giving information about specific fields, such as the mean thickness value of Aletschgletscher, which is only valid for the investigation area Konkordiaplatz and not over the entire glacier. The relatively small collection of records in the two more detailed database levels reflects the poor availability of such data. For modelling purposes, where ice thicknesses are implemented to derive ice volumes, this database provides essential information about glacier and ice cap characteristics and enables the comparison between various approaches. However, the dataset offers a great variety of locations, thicknesses and surface areas of glaciers and ice caps and can therefore help to compare

  15. Cirrus Mammatus Properties Derived from an Extended Remote Sensing Dataset.

    NASA Astrophysics Data System (ADS)

    Wang, Likun; Sassen, Kenneth

    2006-02-01

    The first quantitative and statistical evaluation of cirrus mammatus clouds based on wavelet analysis of remote sensing data is made by analyzing the University of Utah Facility for Atmospheric Remote Sensing (FARS) 10-yr high-cloud dataset. First, a case study of cirrus mammata combining a high-resolution lidar system and a W-band Doppler radar is presented, yielding an assessment of the thermodynamic environment and dynamic mechanisms. Then, 25 cirrus mammatus cases selected from the FARS lidar dataset are used to disclose their characteristic environmental conditions, and vertical and length scales. The results show that cirrus mammata occur in the transition zone from moist (cloudy) to dry air layers with weak wind shear, which suggests that cloud-induced thermal structures play a key role in their formation. Their maximum vertical and horizontal length scales vary from 0.3 to 1.1 km and 0.5 to 8.0 km, respectively. It is also found that small-scale structures develop between the large-scale protuberances. The spectral slopes of the lidar-returned power and mean radar Doppler velocity data extracted from the cirrus cloud-base region further indicate the presence of developed three-dimensional, locally isotropic, homogeneous turbulence generated by buoyancy. Finally, comparisons of anvil and cirrus mammata are made. Although both are generated in a similar environment, cirrus mammata generally do not form fallout fronts like their anvil counterparts, and so do not have their smooth and beautiful outlines.


  16. Independently Controlled Wing Stroke Patterns in the Fruit Fly Drosophila melanogaster

    PubMed Central

    Chakraborty, Soma; Bartussek, Jan; Fry, Steven N.; Zapotocky, Martin

    2015-01-01

    Flies achieve supreme flight maneuverability through a small set of minuscule steering muscles attached to the wing base. The fast flight maneuvers arise from precisely timed activation of the steering muscles and the resulting subtle modulation of the wing stroke. In addition, slower modulation of wing kinematics arises from changes in the activity of indirect flight muscles in the thorax. We investigated whether these modulations can be described as a superposition of a limited number of elementary deformations of the wing stroke that are under independent physiological control. Using a high-speed computer vision system, we recorded the wing motion of tethered flying fruit flies for up to 12 000 consecutive wing strokes at a sampling rate of 6250 Hz. We then decomposed the joint motion pattern of both wings into components that had the minimal mutual information (a measure of statistical dependence). In 100 flight segments measured from 10 individual flies, we identified 7 distinct types of frequently occurring least-dependent components, each defining a kinematic pattern (a specific deformation of the wing stroke and the sequence of its activation from cycle to cycle). Two of these stroke deformations can be associated with the control of yaw torque and total flight force, respectively. A third deformation involves a change in the downstroke-to-upstroke duration ratio, which is expected to alter the pitch torque. A fourth kinematic pattern consists of an alteration of stroke amplitude with a period of 2 wingbeat cycles, extending for dozens of cycles. Our analysis indicates that these four elementary kinematic patterns can be activated mutually independently, and occur both in isolation and in linear superposition. The results strengthen the available evidence for independent control of yaw torque, pitch torque, and total flight force. Our computational method facilitates systematic identification of novel patterns in large kinematic datasets. PMID:25710715
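    Decomposing joint kinematics into least-dependent components is the problem that independent component analysis (ICA) addresses, with mutual information minimization as its defining objective. As a rough illustration of the approach, using synthetic data and scikit-learn's FastICA rather than the authors' own estimator:

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)

    # Stand-in for per-cycle wing kinematics: four independent source
    # signals, linearly mixed into four observed kinematic features.
    sources = rng.laplace(size=(5000, 4))
    mixing = rng.standard_normal((4, 4))
    kinematics = sources @ mixing.T

    ica = FastICA(n_components=4, random_state=0)
    components = ica.fit_transform(kinematics)   # least-dependent components
    print(components.shape, ica.mixing_.shape)
    ```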

  17. Independent Learning Models: A Comparison.

    ERIC Educational Resources Information Center

    Wickett, R. E. Y.

    Five models of independent learning are suitable for use in adult education programs. The common factor is a facilitator who works in some way with the student in the learning process. The models display different characteristics, including the extent of independence with respect to content and/or process. Nondirective tutorial instruction and learning…

  18. Alaska national hydrography dataset positional accuracy assessment study

    USGS Publications Warehouse

    Arundel, Samantha; Yamamoto, Kristina H.; Constance, Eric; Mantey, Kim; Vinyard-Houx, Jeremy

    2013-01-01

    Initial visual assessments showed a wide range in the quality of fit between features in the NHD and these new image sources. No statistical analysis has been performed to actually quantify accuracy. Determining absolute accuracy is cost prohibitive (independent, well-defined test points must be collected), but quantitative analysis of relative positional error is feasible.

  19. Improved 3D density modelling of the Central Andes from combining terrestrial datasets with satellite based datasets

    NASA Astrophysics Data System (ADS)

    Schaller, Theresa; Sobiesiak, Monika; Götze, Hans-Jürgen; Ebbing, Jörg

    2015-04-01

    As horizontal gravity gradients are proxies for large stresses, the uniquely high gravity gradients of the South American continental margin seem to be indicative of the frequently occurring large earthquakes at this plate boundary. It has been observed that these earthquakes can repeatedly break the same segment but can also combine to form M>9 earthquakes at the end of longer seismic cycles. A large seismic gap left behind by the 1877 M~9 earthquake existed in the northernmost part of Chile. This gap has been partially ruptured by the Mw 7.7 2007 Tocopilla earthquake and the Mw 8.2 2014 Pisagua earthquake. The nature of this seismological segmentation and the distribution of energy release in an earthquake are subjects of ongoing research. It can be assumed that both features are related to thickness variations of high-density bodies located in the continental crust of the coastal area. These batholiths produce a clear maximum in the gravity signal. Those maxima also show a good spatial correlation with seismic asperity structures and seismological segment boundaries. Understanding of the tectonic situation can be improved through 3D forward density modelling of the gravity field. Problems arise in areas with few ground measurements; especially in the high Andes, severe gaps exist due to the inaccessibility of some regions. The transition zone between onshore and offshore data also presents significant problems, particularly since this is the area that is most interesting in terms of seismic hazard. We modelled the continental and oceanic crust and upper mantle using different gravity datasets. The first one includes terrestrial data measured at a station spacing of 5 km or less along all passable roads, combined with satellite altimetry data offshore. The second dataset is the newly released EIGEN-6C4, which combines the latest satellite data with ground measurements. The spherical harmonics maximum degree of EIGEN-6C4 is 2190, which corresponds to a
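
    As a rough rule of thumb (not stated in the record), a spherical-harmonic expansion of maximum degree $\ell_{\max}$ resolves half-wavelengths of about

      \[
        \frac{\lambda_{\min}}{2} \approx \frac{\pi R_E}{\ell_{\max}}
          = \frac{\pi \times 6371\,\mathrm{km}}{2190} \approx 9\,\mathrm{km},
      \]

    so degree 2190 corresponds to a spatial resolution on the order of 9 km.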

  20. Displaying Planetary and Geophysical Datasets on NOAAs Science On a Sphere (TM)

    NASA Astrophysics Data System (ADS)

    Albers, S. C.; MacDonald, A. E.; Himes, D.

    2005-12-01

    NOAA's Science On a Sphere(TM) (SOS) was developed to educate current and future generations about the changing Earth and its processes. This system presents NOAA's global science through a 3D representation of our planet, as if the viewer were looking at the Earth from outer space. In our presentation, we will describe the preparation of various global datasets for display on Science On a Sphere(TM), a 1.7-m diameter spherical projection system developed and patented at the Forecast Systems Laboratory (FSL) in Boulder, Colorado. Four projectors cast rotating images onto a spherical projection screen to create the effect of the Earth, a planet, or a satellite floating in space. A static dataset can be prepared for display using popular image formats such as JPEG, usually sized at 1024x2048 or 2048x4096 pixels. A set of static images in a directory will comprise a movie. Imagery and data for SOS are obtained from a variety of government organizations and are sometimes post-processed by various individuals, including the authors. Some datasets are already available in the required cylindrical projection. Readily available planetary maps can often be improved in coverage and/or appearance by reprojecting and combining additional images and mosaics obtained by various spacecraft, such as Voyager, Galileo, and Cassini. A map of Mercury was produced by blending some Mariner 10 photo-mosaics with a USGS shaded-relief map. An improved high-resolution map of Venus was produced by combining several Magellan mosaics, supplied by The Planetary Society, along with other spacecraft data. We now have a full set of Jupiter's Galilean satellite imagery that we can display on Science On a Sphere(TM). Photo-mosaics of several Saturnian satellites were updated by reprojecting and overlaying recently taken Cassini flyby images. Maps of imagery from five Uranian satellites were added, as well as one for Neptune. More image processing was needed to add a high-resolution Voyager mosaic to a pre-existing map
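
    Preparing a static dataset of the kind described above amounts to enforcing the 2:1 equirectangular (cylindrical) aspect ratio before display. A minimal sketch, with a hypothetical input file name:

      # Sketch: enforce the 2:1 width-to-height aspect cited above and resize.
      from PIL import Image

      TARGET = (4096, 2048)                      # width x height, one of the sizes cited

      img = Image.open("global_mosaic.jpg")      # placeholder file name
      w, h = img.size
      if w != 2 * h:
          raise ValueError(f"expected 2:1 equirectangular aspect, got {w}x{h}")
      img.resize(TARGET, Image.LANCZOS).save("global_mosaic_sos.jpg", quality=95)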

  1. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Fletcher, James C. (Inventor); Pratt, J. Richard (Inventor); St.clair, Terry L. (Inventor); Stoakley, Diane M. (Inventor); Burks, Harold D. (Inventor)

    1992-01-01

    A process for preparing polyimides having enhanced melt flow properties is described. The process consists of heating a mixture of a high molecular weight poly-(amic acid) or polyimide with a low molecular weight amic acid or imide additive in the range of 0.05 to 15 percent by weight of additive. The polyimide powders so obtained show improved processability, as evidenced by lower melt viscosity by capillary rheometry. Likewise, films prepared from mixtures of polymers with additives show improved processability with earlier onset of stretching by TMA.

  2. Polyimide processing additives

    NASA Technical Reports Server (NTRS)

    Pratt, J. Richard (Inventor); St.clair, Terry L. (Inventor); Stoakley, Diane M. (Inventor); Burks, Harold D. (Inventor)

    1993-01-01

    A process for preparing polyimides having enhanced melt flow properties is described. The process consists of heating a mixture of a high molecular weight poly-(amic acid) or polyimide with a low molecular weight amic acid or imide additive in the range of 0.05 to 15 percent by weight of the additive. The polyimide powders so obtained show improved processability, as evidenced by lower melt viscosity by capillary rheometry. Likewise, films prepared from mixtures of polymers with additives show improved processability with earlier onset of stretching by TMA.

  3. Atlas Toolkit: Fast registration of 3D morphological datasets in the absence of landmarks

    PubMed Central

    Grocott, Timothy; Thomas, Paul; Münsterberg, Andrea E.

    2016-01-01

    Image registration is a gateway technology for Developmental Systems Biology, enabling computational analysis of related datasets within a shared coordinate system. Many registration tools rely on landmarks to ensure that datasets are correctly aligned, yet suitable landmarks are not present in many datasets. Atlas Toolkit is a Fiji/ImageJ plugin collection offering elastic group-wise registration of 3D morphological datasets, guided by segmentation of the morphology of interest. We demonstrate the method by combinatorial mapping of cell signalling events in the developing eyes of chick embryos, and use the integrated datasets to predictively enumerate Gene Regulatory Network states. PMID:26864723
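
    Atlas Toolkit performs elastic group-wise registration; as a far simpler illustration of landmark-free alignment into a shared coordinate system, the sketch below estimates a rigid translation between two images by phase correlation. This is a generic technique, not the plugin's method, and the test images are synthetic.

      # Sketch: landmark-free translation estimate via phase correlation.
      import numpy as np
      from scipy.ndimage import shift as nd_shift

      def phase_correlation_shift(fixed, moving):
          # Estimate the integer (dy, dx) displacement of `moving` relative to `fixed`.
          F = np.fft.fft2(fixed)
          M = np.fft.fft2(moving)
          cross = M * np.conj(F)
          corr = np.abs(np.fft.ifft2(cross / (np.abs(cross) + 1e-12)))
          dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
          # Shifts beyond half the image size wrap around to negative offsets.
          if dy > fixed.shape[0] // 2: dy -= fixed.shape[0]
          if dx > fixed.shape[1] // 2: dx -= fixed.shape[1]
          return dy, dx

      fixed = np.zeros((64, 64))
      fixed[20:30, 20:30] = 1.0
      moving = nd_shift(fixed, (5, -3))              # synthetic misaligned copy
      print(phase_correlation_shift(fixed, moving))  # expected: (5, -3)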

  4. Additional Types of Neuropathy

    MedlinePlus

    Charcot's Joint, also called neuropathic arthropathy, ... can stop bone destruction and aid healing. Cranial neuropathy affects the 12 pairs of nerves ...

  5. Food Additives and Hyperkinesis

    ERIC Educational Resources Information Center

    Wender, Ester H.

    1977-01-01

    The hypothesis that food additives are causally associated with hyperkinesis and learning disabilities in children is reviewed, and available data are summarized. Available from: American Medical Association 535 North Dearborn Street Chicago, Illinois 60610. (JG)

  6. Smog control fuel additives

    SciTech Connect

    Lundby, W.

    1993-06-29

    A method is described of controlling, reducing, or eliminating ozone and related smog resulting from photochemical reactions between ozone and automotive or industrial gases, comprising the addition of iodine or compounds of iodine to hydrocarbon-based fuels prior to or during combustion, in an amount of about 1 part iodine per 240 to 10,000,000 parts fuel by weight, to be accomplished by: (a) the addition of these inhibitors during or after the refining or manufacturing process of liquid fuels; (b) the production of these inhibitors for addition into fuel tanks, such as automotive or industrial tanks; or (c) the addition of these inhibitors into combustion chambers of equipment utilizing solid fuels, for the purpose of reducing ozone.

  7. Analysis of cornea curvature using radial basis functions - Part II: Fitting to data-set.

    PubMed

    Griffiths, G W; Płociniczak, Ł; Schiesser, W E

    2016-10-01

    In Part I we discussed the solution of corneal curvature using a 2D meshless method based on radial basis functions (RBFs). In Part II we use these methods to fit a full nonlinear thin membrane model to a measured dataset in order to generate a topological mathematical description of the cornea. In addition, we show how these results can lead to estimates of the corneal radius of curvature and certain physical properties of the cornea, namely the tension and elasticity coefficient. Again, all calculations and graphics generation were performed in the R programming environment. The model describes corneal topology extremely well, and the estimated properties fall well within the expected range of values. The method is straightforward to implement and offers scope for further analysis using more detailed 3D models that include corneal thickness. PMID:27570056
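
    A minimal sketch of RBF surface fitting in the spirit of the paper, using Python's SciPy rather than the authors' R environment; the dome-shaped test surface, noise level, and kernel choice are invented for illustration.

      # Sketch: fit an RBF surface to scattered "elevation" measurements.
      import numpy as np
      from scipy.interpolate import RBFInterpolator

      rng = np.random.default_rng(2)
      pts = rng.uniform(-1.0, 1.0, size=(500, 2))           # (x, y) measurement sites
      z = 1.0 - 0.5 * (pts[:, 0] ** 2 + pts[:, 1] ** 2)     # idealized dome-like surface
      z += rng.normal(scale=0.005, size=z.shape)            # measurement noise

      surface = RBFInterpolator(pts, z, kernel="thin_plate_spline", smoothing=1e-3)
      grid = np.mgrid[-1:1:50j, -1:1:50j].reshape(2, -1).T  # evaluation grid
      z_fit = surface(grid)                                 # fitted topography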

  8. Extensive dataset of boar seminal plasma proteome displaying putative reproductive functions of identified proteins.

    PubMed

    Perez-Patiño, Cristina; Barranco, Isabel; Parrilla, Inmaculada; Martinez, Emilio A; Rodriguez-Martinez, Heriberto; Roca, Jordi

    2016-09-01

    A complete proteomic profile of seminal plasma (SP) remains challenging, particularly in the pig. The data report on the analysis of boar SP-proteins using a combination of SEC, 1-D SDS-PAGE, and NanoLC-ESI-MS/MS from 33 pooled SP-samples (11 boars, 3 ejaculates/boar). A complete dataset of the 536 SP-proteins identified and validated with confidence ≥95% (Unused Score >1.3) and a false discovery rate (FDR) ≤1% is provided. In addition, the relative abundance of 432 of them is also shown. Gene Ontology annotation of the complete SP-proteome is complemented by an extensive description of the putative reproductive roles of the identified SP-proteins, providing a valuable resource for a better understanding of the role of SP in reproductive success. This data article refers to the article entitled "Characterization of the porcine seminal plasma proteome comparing ejaculate portions" (Perez-Patiño et al., 2016) [1]. PMID:27583342

  9. Compiling a Comprehensive EVA Training Dataset for NASA Astronauts

    NASA Technical Reports Server (NTRS)

    Laughlin, M. S.; Murray, J. D.; Lee, L. R.; Wear, M. L.; Van Baalen, M.

    2016-01-01

    Training for a spacewalk or extravehicular activity (EVA) is considered a hazardous duty for NASA astronauts. This places astronauts at risk for decompression sickness as well as various musculoskeletal disorders from working in the spacesuit. As a result, the operational and research communities have over the years requested access to EVA training data to supplement their studies. The purpose of this paper is to document the comprehensive EVA training dataset that was compiled from multiple sources by the Lifetime Surveillance of Astronaut Health (LSAH) epidemiologists to investigate musculoskeletal injuries. The EVA training dataset does not contain any medical data; rather, it only documents when EVA training was performed, by whom, and other details about the session. The first activities practicing EVA maneuvers in water were performed at the Neutral Buoyancy Simulator (NBS) at the Marshall Spaceflight Center in Huntsville, Alabama. This facility opened in 1967 and was used for EVA training until the early Space Shuttle program days. Although several photographs show astronauts performing EVA training in the NBS, records detailing who performed the training and the frequency of training are unavailable. Paper training records were stored within the NBS after it was designated as a National Historic Landmark in 1985 and closed in 1997, but significant resources would be needed to identify and secure these records, and at this time LSAH has not pursued acquisition of these early training records. Training in the NBS decreased when the Johnson Space Center in Houston, Texas, opened the Weightless Environment Training Facility (WETF) in 1980. Early training records from the WETF consist of 11 hand-written dive logbooks compiled by individual workers that were digitized at the request of LSAH. The WETF was integral in the training for Space Shuttle EVAs until its closure in 1998. The Neutral Buoyancy Laboratory (NBL) at the Sonny Carter Training Facility near JSC

  10. A comparison of absolute performance of different correlative and mechanistic species distribution models in an independent area.

    PubMed

    Shabani, Farzin; Kumar, Lalit; Ahmadi, Mohsen

    2016-08-01

    Aim: To investigate the comparative abilities of six different bioclimatic models in an independent area, utilizing the distributions of eight different species available at a global scale and in Australia. Location: Global scale and Australia. We tested a variety of bioclimatic models for eight different plant species employing five discriminatory correlative species distribution models (SDMs), including Generalized Linear Model (GLM), MaxEnt, Random Forest (RF), Boosted Regression Tree (BRT), and Bioclim, together with CLIMEX (CL) as a mechanistic niche model. These models were fitted using a training dataset of available global data, but with the exclusion of Australian locations. The capabilities of these techniques in projecting suitable climate, based on independent records for these species in Australia, were compared. Thus, Australia was not used to calibrate the models and therefore serves as an independent area with respect to geographic locations. To assess and compare performance, we utilized the area under the receiver operating characteristic (ROC) curve (AUC), the true skill statistic (TSS), and fractional predicted areas for all SDMs. In addition, we assessed the agreement between the outputs of the six different bioclimatic models for all eight species in Australia. The modeling method impacted potential distribution predictions under current climate. However, the utilization of sensitivity and the fractional predicted areas showed that GLM, MaxEnt, Bioclim, and CL had the highest sensitivity for Australian climate conditions. Bioclim calculated the highest fractional predicted area of an independent area, while RF and BRT were poor. For many applications, it is difficult to decide which bioclimatic model to use. This research shows that variable results are obtained using different SDMs in an independent area. This research also shows that the SDMs produce different results for different species; for example, Bioclim may not be good for one species but works better
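
    The two headline measures named above, AUC and the true skill statistic (TSS = sensitivity + specificity - 1), can be computed as in the sketch below; the labels, scores, and 0.5 threshold are synthetic stand-ins, not the study's data.

      # Sketch: AUC and TSS for a hypothetical SDM on independent test records.
      import numpy as np
      from sklearn.metrics import roc_auc_score, confusion_matrix

      rng = np.random.default_rng(3)
      y_true = rng.integers(0, 2, size=200)                   # presence/absence records
      y_score = np.clip(y_true * 0.6 + rng.uniform(size=200) * 0.5, 0, 1)

      auc = roc_auc_score(y_true, y_score)
      tn, fp, fn, tp = confusion_matrix(y_true, y_score > 0.5).ravel()
      sensitivity, specificity = tp / (tp + fn), tn / (tn + fp)
      tss = sensitivity + specificity - 1                     # true skill statistic
      print(f"AUC = {auc:.2f}, TSS = {tss:.2f}")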

  12. Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

    PubMed Central

    Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

    2011-01-01

    Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation, whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic of acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus

  13. Aster Global dem Version 3, and New Aster Water Body Dataset

    NASA Astrophysics Data System (ADS)

    Abrams, M.

    2016-06-01

    In 2016, the US/Japan ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) project released Version 3 of the Global DEM (GDEM). This 30 m DEM covers the earth's surface from 82°N to 82°S, and improves on two earlier versions by correcting some artefacts and filling in areas of missing DEMs through the acquisition of additional data. The GDEM was produced by stereocorrelation of 2 million ASTER scenes, processed on a pixel-by-pixel basis: cloud screening; stacking data from overlapping scenes; removing outlier values; and averaging elevation values. As previously, the GDEM is packaged in ~23,000 1 x 1 degree tiles. Each tile has a DEM file and a NUM file reporting the number of scenes used for each pixel and identifying the source of fill-in data (where persistent clouds prevented computation of an elevation value). An additional dataset was concurrently produced and released: the ASTER Water Body Dataset (AWBD). This is a 30 m raster product which encodes every pixel as either lake, river, or ocean, thus providing a global inland and shoreline water-body mask. Water was identified through spectral analysis algorithms and manual editing. This product was evaluated against the Shuttle Water Body Dataset (SWBD) and the Landsat-based Global Inland Water (GIW) product. The SWBD only covers the earth between about 60 degrees north and south, so it is not a global product. The GIW only delineates inland water bodies and does not deal with ocean coastlines. All products are at 30 m postings.
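
    The pixel-by-pixel steps listed above (cloud screening, stacking, outlier removal, averaging) can be sketched as follows; the stack size, NaN cloud mask, and MAD-based outlier rule are illustrative assumptions, not the ASTER production algorithm.

      # Sketch: per-pixel stacking of overlapping scene DEMs (synthetic data).
      import numpy as np

      rng = np.random.default_rng(4)
      stack = rng.normal(1500.0, 5.0, size=(12, 64, 64))   # 12 overlapping scene DEMs
      stack[rng.random(stack.shape) < 0.1] = np.nan        # cloud-screened pixels -> NaN
      stack[0, 10, 10] = 4000.0                            # one gross outlier

      med = np.nanmedian(stack, axis=0)
      mad = np.nanmedian(np.abs(stack - med), axis=0) + 1e-6
      stack[np.abs(stack - med) > 5 * mad] = np.nan        # remove outlier elevations
      gdem = np.nanmean(stack, axis=0)                     # averaged elevation per pixel
      num = np.sum(~np.isnan(stack), axis=0)               # scenes used per pixel (cf. NUM file)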

  14. Multielement geochemical dataset of surficial materials for the northern Great Basin

    USGS Publications Warehouse

    Coombs, Mary Jane; Kotlyar, Boris B.; Ludington, Steve; Folger, Helen W.; Mossotti, Victor G.

    2002-01-01

    This report presents geochemical data generated during mineral and environmental assessments for the Bureau of Land Management in northern Nevada, northeastern California, southeastern Oregon, and southwestern Idaho, along with metadata and map representations of selected elements. The dataset presented here is a compilation of chemical analyses of over 10,200 stream-sediment and soil samples originally collected during the National Uranium Resource Evaluation (NURE) Hydrogeochemical and Stream Sediment Reconnaissance (HSSR) program of the Department of Energy and its predecessors and reanalyzed to support a series of mineral-resource assessments by the U.S. Geological Survey (USGS). The dataset also includes the analyses of additional samples collected by the USGS in 1992. The sample sites are in southeastern Oregon, southwestern Idaho, northeastern California, and, primarily, in northern Nevada. These samples were collected from 1977 to 1983, before the development of most of the present-day large-scale mining infrastructure in northern Nevada. As such, these data may serve as an important baseline for current and future geoenvironmental studies. Largely because of the very diverse analytical methods used by the NURE HSSR program, the original NURE analyses in this area yielded little useful geochemical information. The Humboldt, Malheur-Jordan-Andrews, and Winnemucca-Surprise studies were designed to provide useful geochemical data via improved analytical methods (lower detection levels and higher precision) and, in the Malheur-Jordan-Andrews and Winnemucca-Surprise areas, to collect additional stream-sediment samples to increase sampling coverage. The data are provided in *.xls (Microsoft Excel) and *.csv (comma-separated-value) format. We also present 35 elements graphically, interpolated ("gridded") in a geographic information system (GIS) and overlain by major geologic trends, so that users may view the variation in elemental concentrations over the

  15. Biographical factors of occupational independence.

    PubMed

    Müller, G F

    2001-10-01

    The present study examined biographical factors of occupational independence, including any kind of nonemployed profession. Participants were 59 occupationally independent and 58 employed persons of different ages (M = 36.3 yr.), sexes, and professions. They were interviewed on variables such as family influence, educational background, occupational role models, and critical events for choosing a particular type of occupational career. The results show that occupationally independent people reported stronger family ties, experienced fewer restrictions of formal education, and remembered fewer negative role models than the employed people. Implications of these results are discussed. PMID:11783553

  16. Approximately Independent Features of Languages

    NASA Astrophysics Data System (ADS)

    Holman, Eric W.

    To facilitate the testing of models for the evolution of languages, the present paper offers a set of linguistic features that are approximately independent of each other. To find these features, the adjusted Rand index (R′) is used to estimate the degree of pairwise relationship among 130 linguistic features in a large published database. Many of the R′ values prove to be near zero, as predicted for independent features, and a subset of 47 features is found with an average R′ of -0.0001. These 47 features are recommended for use in statistical tests that require independent units of analysis.
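
    The pairwise measure used above is the adjusted Rand index between two categorical features, each assigning languages to value classes. A minimal sketch with invented feature vectors:

      # Sketch: adjusted Rand index between two categorical linguistic features.
      from sklearn.metrics import adjusted_rand_score

      feature_a = [0, 0, 1, 1, 2, 2, 2, 0]   # e.g., word-order type per language
      feature_b = [1, 1, 0, 0, 2, 2, 1, 1]   # e.g., case-marking type per language

      # Values near 0 indicate approximately independent features; values near 1
      # indicate strongly related ones.
      print(adjusted_rand_score(feature_a, feature_b))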

  17. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, Stefan K.

    1998-01-01

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei.

  18. Sequence independent amplification of DNA

    DOEpatents

    Bohlander, S.K.

    1998-03-24

    The present invention is a rapid sequence-independent amplification procedure (SIA). Even minute amounts of DNA from various sources can be amplified independent of any sequence requirements of the DNA or any a priori knowledge of any sequence characteristics of the DNA to be amplified. This method allows, for example, the sequence independent amplification of microdissected chromosomal material and the reliable construction of high quality fluorescent in situ hybridization (FISH) probes from YACs or from other sources. These probes can be used to localize YACs on metaphase chromosomes but also--with high efficiency--in interphase nuclei. 25 figs.

  19. Independent component representations for face recognition

    NASA Astrophysics Data System (ADS)

    Stewart Bartlett, Marian; Lades, Martin H.; Sejnowski, Terrence J.

    1998-07-01

    In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels. A number of face recognition algorithms employ principal component analysis (PCA), which is based on the second-order statistics of the image set and does not address high-order statistical dependencies such as the relationships among three or more pixels. Independent component analysis (ICA) is a generalization of PCA which separates the high-order moments of the input in addition to the second-order moments. ICA was performed on a set of face images by an unsupervised learning algorithm derived from the principle of optimal information transfer through sigmoidal neurons. The algorithm maximizes the mutual information between the input and the output, which produces statistically independent outputs under certain conditions. ICA was performed on the face images under two different architectures. The first architecture provided a statistically independent basis set for the face images that can be viewed as a set of independent facial features. The second architecture provided a factorial code, in which the probability of any combination of features can be obtained from the product of their individual probabilities. Both ICA representations were superior to representations based on principal component analysis for recognizing faces across sessions and changes in expression.
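
    A minimal sketch in the spirit of the first architecture described above: treat each face image as an observation and extract a basis of statistically independent images. FastICA stands in for the information-maximization algorithm used in the paper, and the image matrix is random placeholder data.

      # Sketch: ICA basis images from a matrix of flattened face images.
      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(5)
      X = rng.standard_normal((100, 32 * 32))          # 100 flattened 32x32 "face" images

      ica = FastICA(n_components=16, random_state=0)
      codes = ica.fit_transform(X)                        # per-image coefficients
      basis_images = ica.components_.reshape(16, 32, 32)  # independent "feature" images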

  20. Chandra Independently Determines Hubble Constant

    NASA Astrophysics Data System (ADS)

    2006-08-01

    A critically important number that specifies the expansion rate of the Universe, the so-called Hubble constant, has been independently determined using NASA's Chandra X-ray Observatory. This new value matches recent measurements using other methods and extends their validity to greater distances, thus allowing astronomers to probe earlier epochs in the evolution of the Universe. "The reason this result is so significant is that we need the Hubble constant to tell us the size of the Universe, its age, and how much matter it contains," said Max Bonamente from the University of Alabama in Huntsville and NASA's Marshall Space Flight Center (MSFC) in Huntsville, Ala., lead author on the paper describing the results. "Astronomers absolutely need to trust this number because we use it for countless calculations." [Illustration: Sunyaev-Zeldovich Effect] The Hubble constant is calculated by measuring the speed at which objects are moving away from us and dividing by their distance. Most of the previous attempts to determine the Hubble constant have involved using a multi-step, or distance ladder, approach in which the distance to nearby galaxies is used as the basis for determining greater distances. The most common approach has been to use a well-studied type of pulsating star known as a Cepheid variable, in conjunction with more distant supernovae, to trace distances across the Universe. Scientists using this method and observations from the Hubble Space Telescope were able to measure the Hubble constant to within 10%. However, only independent checks would give them the confidence they desired, considering that much of our understanding of the Universe hangs in the balance. [Chandra X-ray Image of MACS J1149.5+223] By combining X-ray data from Chandra with radio observations of galaxy clusters, the team determined the distances to 38 galaxy clusters ranging from 1.4 billion to 9.3 billion
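
    As a worked illustration of the velocity-to-distance division described in the record (the numbers are chosen for round arithmetic, not taken from the paper):

      \[
        H_0 = \frac{v}{d}
            = \frac{7700\ \mathrm{km\,s^{-1}}}{100\ \mathrm{Mpc}}
            = 77\ \mathrm{km\,s^{-1}\,Mpc^{-1}}.
      \]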