Yu, Yao; Hu, Hao; Bohlender, Ryan J; Hu, Fulan; Chen, Jiun-Sheng; Holt, Carson; Fowler, Jerry; Guthery, Stephen L; Scheet, Paul; Hildebrandt, Michelle A T; Yandell, Mark; Huff, Chad D
2018-04-06
High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduce strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
CrossCheck: an open-source web tool for high-throughput screen data analysis.
Najafov, Jamil; Najafov, Ayaz
2017-07-19
Modern high-throughput screening methods allow researchers to generate large datasets that potentially contain important biological information. However, oftentimes, picking relevant hits from such screens and generating testable hypotheses requires training in bioinformatics and the skills to efficiently perform database mining. There are currently no tools available to general public that allow users to cross-reference their screen datasets with published screen datasets. To this end, we developed CrossCheck, an online platform for high-throughput screen data analysis. CrossCheck is a centralized database that allows effortless comparison of the user-entered list of gene symbols with 16,231 published datasets. These datasets include published data from genome-wide RNAi and CRISPR screens, interactome proteomics and phosphoproteomics screens, cancer mutation databases, low-throughput studies of major cell signaling mediators, such as kinases, E3 ubiquitin ligases and phosphatases, and gene ontological information. Moreover, CrossCheck includes a novel database of predicted protein kinase substrates, which was developed using proteome-wide consensus motif searches. CrossCheck dramatically simplifies high-throughput screen data analysis and enables researchers to dig deep into the published literature and streamline data-driven hypothesis generation. CrossCheck is freely accessible as a web-based application at http://proteinguru.com/crosscheck.
Exudate-based diabetic macular edema detection in fundus images using publicly available datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul
2011-01-01
Diabetic macular edema (DME) is a common vision threatening complication of diabetic retinopathy. In a large scale screening environment DME can be assessed by detecting exudates (a type of bright lesions) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME.more » This and other two publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis on an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on an 2.6 GHz platform with an unoptimized Matlab implementation.« less
EverVIEW: a visualization platform for hydrologic and Earth science gridded data
Romañach, Stephanie S.; McKelvy, James M.; Suir, Kevin J.; Conzelmann, Craig
2015-01-01
The EverVIEW Data Viewer is a cross-platform desktop application that combines and builds upon multiple open source libraries to help users to explore spatially-explicit gridded data stored in Network Common Data Form (NetCDF). Datasets are displayed across multiple side-by-side geographic or tabular displays, showing colorized overlays on an Earth globe or grid cell values, respectively. Time-series datasets can be animated to see how water surface elevation changes through time or how habitat suitability for a particular species might change over time under a given scenario. Initially targeted toward Florida's Everglades restoration planning, EverVIEW has been flexible enough to address the varied needs of large-scale planning beyond Florida, and is currently being used in biological planning efforts nationally and internationally.
Cross-platform normalization of microarray and RNA-seq data for machine learning applications
Thompson, Jeffrey A.; Tan, Jie
2016-01-01
Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study
2010-01-01
Background Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for gene functional study and new-drug target identification. The key mechanism in this technique is to design efficient siRNAs that incorporated into the RNA-induced silencing complexes (RISC) to bind and interact with the mRNA targets to repress their translations to proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, few joint analysis of different RNAi experiments conducted under different experimental scenarios has been done in research so far, while the joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms for different datasets and experimental conditions can often provide new clues on the design of potent siRNAs. Results An elegant multi-task learning paradigm for cross-platform siRNA efficacy prediction is proposed. Experimental studies were performed on a large dataset of siRNA sequences which encompass several RNAi experiments recently conducted by different research groups. By using our multi-task learning method, the synergy among different experiments is exploited and an efficient multi-task predictor for siRNA efficacy prediction is obtained. The 19 most popular biological features for siRNA according to their jointly importance in multi-task learning were ranked. Furthermore, the hypothesis is validated out that the siRNA binding efficacy on different messenger RNAs(mRNAs) have different conditional distribution, thus the multi-task learning can be conducted by viewing tasks at an "mRNA"-level rather than at the "experiment"-level. Such distribution diversity derived from siRNAs bound to different mRNAs help indicate that the properties of target mRNA have important implications on the siRNA binding efficacy. Conclusions The knowledge gained from our study provides useful insights on how to analyze various cross-platform RNAi data for uncovering of their complex mechanism. PMID:20380733
Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study.
Liu, Qi; Xu, Qian; Zheng, Vincent W; Xue, Hong; Cao, Zhiwei; Yang, Qiang
2010-04-10
Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for gene functional study and new-drug target identification. The key mechanism in this technique is to design efficient siRNAs that incorporated into the RNA-induced silencing complexes (RISC) to bind and interact with the mRNA targets to repress their translations to proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, few joint analysis of different RNAi experiments conducted under different experimental scenarios has been done in research so far, while the joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms for different datasets and experimental conditions can often provide new clues on the design of potent siRNAs. An elegant multi-task learning paradigm for cross-platform siRNA efficacy prediction is proposed. Experimental studies were performed on a large dataset of siRNA sequences which encompass several RNAi experiments recently conducted by different research groups. By using our multi-task learning method, the synergy among different experiments is exploited and an efficient multi-task predictor for siRNA efficacy prediction is obtained. The 19 most popular biological features for siRNA according to their jointly importance in multi-task learning were ranked. Furthermore, the hypothesis is validated out that the siRNA binding efficacy on different messenger RNAs(mRNAs) have different conditional distribution, thus the multi-task learning can be conducted by viewing tasks at an "mRNA"-level rather than at the "experiment"-level. Such distribution diversity derived from siRNAs bound to different mRNAs help indicate that the properties of target mRNA have important implications on the siRNA binding efficacy. The knowledge gained from our study provides useful insights on how to analyze various cross-platform RNAi data for uncovering of their complex mechanism.
Strength Performance Assessment in a Simulated Men’s Gymnastics Still Rings Cross
Dunlavy, Jennifer K.; Sands, William A.; McNeal, Jeni R.; Stone, Michael H.; Smith, Sarah L.; Jemni, Monem; Haff, G. Gregory
2007-01-01
Athletes in sports such as the gymnastics who perform the still rings cross position are disadvantaged due to a lack of objective and convenient measurement methods. The gymnastics “cross ”is a held isometric strength position considered fundamental to all still rings athletes. The purpose of this investigation was to determine if two small force platforms (FPs) placed on supports to simulate a cross position could demonstrate the fidelity necessary to differentiate between athletes who could perform a cross from those who could not. Ten gymnasts (5 USA Gymnastics, Senior National Team, and 5 Age Group Level Gymnasts) agreed to participate. The five Senior National Team athletes were grouped as cross Performers; the Age Group Gymnasts could not successfully perform the cross position and were grouped as cross Non- Performers. The two small FPs were first tested for reliability and validity and were then used to obtain a force-time record of a simulated cross position. The simulated cross test consisted of standing between two small force platforms placed on top of large solid gymnastics spotting blocks. The gymnasts attempted to perform a cross position by placing their hands at the center of the FPs and pressing downward with sufficient force that they could remove the support of their feet from the floor. Force-time curves (100 Hz) were obtained and analyzed for the sum of peak and mean arm ground reaction forces. The summed arm forces, mean and peak, were compared to body weight to determine how close the gymnasts came to achieving forces equal to body weight and thus the ability to perform the cross. The mean and peak summed arm forces were able to statistically differentiate between athletes who could perform the cross from those who could not (p < 0.05). The force-time curves and small FPs showed sufficient fidelity to differentiate between Performer and Non- Performer groups. This experiment showed that small and inexpensive force platforms may serve as useful adjuncts to athlete performance measurement such as the gymnastics still rings cross. Key pointsStrength-related skills are difficult to assess in some sports and thus require special means.Small force platforms have sufficient fidelity to assess the differences between gymnasts who can perform a still rings cross from those who cannot.Strength assessment via small force platforms may serve as a means of assessing skill readiness, strength symmetry, and progress in learning a still rings cross. PMID:24149230
Dataset from chemical gas sensor array in turbulent wind tunnel.
Fonollosa, Jordi; Rodríguez-Luján, Irene; Trincavelli, Marco; Huerta, Ramón
2015-06-01
The dataset includes the acquired time series of a chemical detection platform exposed to different gas conditions in a turbulent wind tunnel. The chemo-sensory elements were sampling directly the environment. In contrast to traditional approaches that include measurement chambers, open sampling systems are sensitive to dispersion mechanisms of gaseous chemical analytes, namely diffusion, turbulence, and advection, making the identification and monitoring of chemical substances more challenging. The sensing platform included 72 metal-oxide gas sensors that were positioned at 6 different locations of the wind tunnel. At each location, 10 distinct chemical gases were released in the wind tunnel, the sensors were evaluated at 5 different operating temperatures, and 3 different wind speeds were generated in the wind tunnel to induce different levels of turbulence. Moreover, each configuration was repeated 20 times, yielding a dataset of 18,000 measurements. The dataset was collected over a period of 16 months. The data is related to "On the performance of gas sensor arrays in open sampling systems using Inhibitory Support Vector Machines", by Vergara et al.[1]. The dataset can be accessed publicly at the UCI repository upon citation of [1]: http://archive.ics.uci.edu/ml/datasets/Gas+sensor+arrays+in+open+sampling+settings.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.
Scharfe, Michael; Pielot, Rainer; Schreiber, Falk
2010-01-11
Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
jsNMR: an embedded platform-independent NMR spectrum viewer.
Vosegaard, Thomas
2015-04-01
jsNMR is a lightweight NMR spectrum viewer written in JavaScript/HyperText Markup Language (HTML), which provides a cross-platform spectrum visualizer that runs on all computer architectures including mobile devices. Experimental (and simulated) datasets are easily opened in jsNMR by (i) drag and drop on a jsNMR browser window, (ii) by preparing a jsNMR file from the jsNMR web site, or (iii) by mailing the raw data to the jsNMR web portal. jsNMR embeds the original data in the HTML file, so a jsNMR file is a self-transforming dataset that may be exported to various formats, e.g. comma-separated values. The main applications of jsNMR are to provide easy access to NMR data without the need for dedicated software installed and to provide the possibility to visualize NMR spectra on web sites. Copyright © 2015 John Wiley & Sons, Ltd.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets
2010-01-01
Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
NASA Astrophysics Data System (ADS)
Li, J.; Zhang, T.; Huang, Q.; Liu, Q.
2014-12-01
Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
Fusing Bluetooth Beacon Data with Wi-Fi Radiomaps for Improved Indoor Localization
Kanaris, Loizos; Kokkinis, Akis; Liotta, Antonio; Stavrou, Stavros
2017-01-01
Indoor user localization and tracking are instrumental to a broad range of services and applications in the Internet of Things (IoT) and particularly in Body Sensor Networks (BSN) and Ambient Assisted Living (AAL) scenarios. Due to the widespread availability of IEEE 802.11, many localization platforms have been proposed, based on the Wi-Fi Received Signal Strength (RSS) indicator, using algorithms such as K-Nearest Neighbour (KNN), Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE). In this paper, we introduce a hybrid method that combines the simplicity (and low cost) of Bluetooth Low Energy (BLE) and the popular 802.11 infrastructure, to improve the accuracy of indoor localization platforms. Building on KNN, we propose a new positioning algorithm (dubbed i-KNN) which is able to filter the initial fingerprint dataset (i.e., the radiomap), after considering the proximity of RSS fingerprints with respect to the BLE devices. In this way, i-KNN provides an optimised small subset of possible user locations, based on which it finally estimates the user position. The proposed methodology achieves fast positioning estimation due to the utilization of a fragment of the initial fingerprint dataset, while at the same time improves positioning accuracy by minimizing any calculation errors. PMID:28394268
Fusing Bluetooth Beacon Data with Wi-Fi Radiomaps for Improved Indoor Localization.
Kanaris, Loizos; Kokkinis, Akis; Liotta, Antonio; Stavrou, Stavros
2017-04-10
Indoor user localization and tracking are instrumental to a broad range of services and applications in the Internet of Things (IoT) and particularly in Body Sensor Networks (BSN) and Ambient Assisted Living (AAL) scenarios. Due to the widespread availability of IEEE 802.11, many localization platforms have been proposed, based on the Wi-Fi Received Signal Strength (RSS) indicator, using algorithms such as K -Nearest Neighbour (KNN), Maximum A Posteriori (MAP) and Minimum Mean Square Error (MMSE). In this paper, we introduce a hybrid method that combines the simplicity (and low cost) of Bluetooth Low Energy (BLE) and the popular 802.11 infrastructure, to improve the accuracy of indoor localization platforms. Building on KNN, we propose a new positioning algorithm (dubbed i-KNN) which is able to filter the initial fingerprint dataset (i.e., the radiomap), after considering the proximity of RSS fingerprints with respect to the BLE devices. In this way, i-KNN provides an optimised small subset of possible user locations, based on which it finally estimates the user position. The proposed methodology achieves fast positioning estimation due to the utilization of a fragment of the initial fingerprint dataset, while at the same time improves positioning accuracy by minimizing any calculation errors.
NASA Astrophysics Data System (ADS)
Jiang, C.; Ryu, Y.; Fang, H.
2016-12-01
Proper usage of global satellite LAI products requires comprehensive evaluation. To address this issue, the Committee on Earth Observation Satellites (CEOS) Land Product Validation (LPV) subgroup proposed a four-stage validation hierarchy. During the past decade, great efforts have been made following this validation framework, mainly focused on absolute magnitude, seasonal trajectory, and spatial pattern of those global satellite LAI products. However, interannual variability and trends of global satellite LAI products have been investigated marginally. Targeting on this gap, we made an intercomparison between seven global satellite LAI datasets, including four short-term ones: MODIS C5, MODIS C6, GEOV1, MERIS, and three long-term products ones: LAI3g, GLASS, and GLOBMAP. We calculated global annual LAI time series for each dataset, among which we found substantial differences. During the overlapped period (2003 - 2011), MODIS C5, GLASS and GLOBMAP have positive correlation (r > 0.6) between each other, while MODIS C6, GEOV1, MERIS, and LAI3g are highly consistent (r > 0.7) in interannual variations. However, the previous three datasets show negative trends, all of which use MODIS C5 reflectance data, whereas the latter four show positive trends, using MODIS C6, SPOT/VGT, ENVISAT/MERIS, and NOAA/AVHRR, respectively. During the pre-MODIS era (1982 - 1999), the three AVHRR-based datasets (LAI3g, GLASS and GLOBMAP) agree well (r > 0.7), yet all of them show oscillation related with NOAA platform changes. In addition, both GLASS and GLOBMAP show clear cut-points around 2000 when they move from AVHRR to MODIS. Such inconsistency is also visible for GEOV1, which uses SPOT-4 and SPOT-5 before and after 2002. We further investigate the map-to-map deviations among these products. This study highlights that continuous sensor calibration and cross calibration are essential to obtain reliable global LAI time series.
Integrative Exploratory Analysis of Two or More Genomic Datasets.
Meng, Chen; Culhane, Aedin
2016-01-01
Exploratory analysis is an essential step in the analysis of high throughput data. Multivariate approaches such as correspondence analysis (CA), principal component analysis, and multidimensional scaling are widely used in the exploratory analysis of single dataset. Modern biological studies often assay multiple types of biological molecules (e.g., mRNA, protein, phosphoproteins) on a same set of biological samples, thereby creating multiple different types of omics data or multiassay data. Integrative exploratory analysis of these multiple omics data is required to leverage the potential of multiple omics studies. In this chapter, we describe the application of co-inertia analysis (CIA; for analyzing two datasets) and multiple co-inertia analysis (MCIA; for three or more datasets) to address this problem. These methods are powerful yet simple multivariate approaches that represent samples using a lower number of variables, allowing a more easily identification of the correlated structure in and between multiple high dimensional datasets. Graphical representations can be employed to this purpose. In addition, the methods simultaneously project samples and variables (genes, proteins) onto the same lower dimensional space, so the most variant variables from each dataset can be selected and associated with samples, which can be further used to facilitate biological interpretation and pathway analysis. We applied CIA to explore the concordance between mRNA and protein expression in a panel of 60 tumor cell lines from the National Cancer Institute. In the same 60 cell lines, we used MCIA to perform a cross-platform comparison of mRNA gene expression profiles obtained on four different microarray platforms. Last, as an example of integrative analysis of multiassay or multi-omics data we analyzed transcriptomic, proteomic, and phosphoproteomic data from pluripotent (iPS) and embryonic stem (ES) cell lines.
Karapetyan, Karen; Batchelor, Colin; Sharpe, David; Tkachenko, Valery; Williams, Antony J
2015-01-01
There are presently hundreds of online databases hosting millions of chemical compounds and associated data. As a result of the number of cheminformatics software tools that can be used to produce the data, subtle differences between the various cheminformatics platforms, as well as the naivety of the software users, there are a myriad of issues that can exist with chemical structure representations online. In order to help facilitate validation and standardization of chemical structure datasets from various sources we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets. The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or potentially requiring manual review. Each identified issue is assigned one of three levels of severity - Information, Warning, and Error - in order to conveniently inform the user of the need to browse and review subsets of their data. The validation process includes validation of atoms and bonds (e.g., making aware of query atoms and bonds), valences, and stereo. The standard form of submission of collections of data, the SDF file, allows the user to map the data fields to predefined CVSP fields for the purpose of cross-validating associated SMILES and InChIs with the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project. In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and ChEMBL 17 data set. CVSP web site is located at http://cvsp.chemspider.com/. A platform for the validation and standardization of chemical structure representations of various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files to produce more homogeneous compound representations for exchange and interchange between online databases. While the CVSP platform is designed with flexibility inherent to the rules that can be used for processing the data we have produced a recommended rule set based on our own experiences with the large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.
Consistency of Global Modis Aerosol Optical Depths over Ocean on Terra and Aqua Ceres SSF Datasets
NASA Technical Reports Server (NTRS)
Ignatov, Alexander; Minnis, Patrick; Miller, Walter F.; Wielicki, Bruce A.; Remer, Lorraine
2006-01-01
Aerosol retrievals over ocean from the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard Terra and Aqua platforms are available from the Clouds and the Earth's Radiant Energy System (CERES) Single Scanner Footprint (SSF) datasets generated at NASA Langley Research Center (LaRC). Two aerosol products are reported side-by-side. The primary M product is generated by sub-setting and remapping the multi-spectral (0.47-2.1 micrometer) MODIS produced oceanic aerosol (MOD04/MYD04 for Terra/Aqua) onto CERES footprints. M*D04 processing uses cloud screening and aerosol algorithms developed by the MODIS science team. The secondary AVHRR-like A product is generated in only two MODIS bands 1 and 6 (on Aqua, bands 1 and 7). The A processing uses the CERES cloud screening algorithm, and NOAA/NESDIS glint identification, and single-channel aerosol retrieval algorithms. The M and A products have been documented elsewhere and preliminarily compared using 2 weeks of global Terra CERES SSF Edition 1A data in which the M product was based on MOD04 collection 3. In this study, the comparisons between the M and A aerosol optical depths (AOD) in MODIS band 1 (0.64 micrometers), tau(sub 1M) and tau(sub 1A) are re-examined using 9 days of global CERES SSF Terra Edition 2A and Aqua Edition 1B data from 13 - 21 October 2002, and extended to include cross-platform comparisons. The M and A products on the new CERES SSF release are generated using the same aerosol algorithms as before, but with different preprocessing and sampling procedures, lending themselves to a simple sensitivity check to non-aerosol factors. Both tau(sub 1M) and tau(sub 1A) generally compare well across platforms. However, the M product shows some differences, which increase with ambient cloud amount and towards the solar side of the orbit. Three types of comparisons conducted in this study - cross-platform, cross-product, and cross-release confirm the previously made observation that the major area for improvement in the current aerosol processing lies in a more formalized and standardized sampling (and most importantly, cloud screening) whereas optimization of the aerosol algorithm is deemed to be an important yet less critical element.
Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas Paul
2011-01-01
Diabetic macular edema (DME) is a common vision threatening complication of diabetic retinopathy. In a large scale screening environment DME can be assessed by detecting exudates (a type of bright lesions) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and other two publiclymore » available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis on an average of 4.4 seconds per image on an 2.6 GHz platform with an unoptimised Matlab implementation.« less
ProteoWizard: open source software for rapid proteomics tools development.
Kessner, Darren; Chambers, Matt; Burke, Robert; Agus, David; Mallick, Parag
2008-11-01
The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard proteomics and LCMS dataset computations. The library contains readers and writers of the mzML data format, which has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library. Cross-platform software that compiles using native compilers (i.e. GCC on Linux, MSVC on Windows and XCode on OSX) is available for download free of charge, at http://proteowizard.sourceforge.net. This website also provides code examples, and documentation. It is our hope the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged.
a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data
NASA Astrophysics Data System (ADS)
Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.
2017-09-01
Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.
Accounting for one-channel depletion improves missing value imputation in 2-dye microarray data.
Ritz, Cecilia; Edén, Patrik
2008-01-19
For 2-dye microarray platforms, some missing values may arise from an un-measurably low RNA expression in one channel only. Information of such "one-channel depletion" is so far not included in algorithms for imputation of missing values. Calculating the mean deviation between imputed values and duplicate controls in five datasets, we show that KNN-based imputation gives a systematic bias of the imputed expression values of one-channel depleted spots. Evaluating the correction of this bias by cross-validation showed that the mean square deviation between imputed values and duplicates were reduced up to 51%, depending on dataset. By including more information in the imputation step, we more accurately estimate missing expression values.
OpenSHS: Open Smart Home Simulator.
Alshammari, Nasser; Alshammari, Talal; Sedky, Mohamed; Champion, Justin; Bauer, Carolin
2017-05-02
This paper develops a new hybrid, open-source, cross-platform 3D smart home simulator, OpenSHS, for dataset generation. OpenSHS offers an opportunity for researchers in the field of the Internet of Things (IoT) and machine learning to test and evaluate their models. Following a hybrid approach, OpenSHS combines advantages from both interactive and model-based approaches. This approach reduces the time and efforts required to generate simulated smart home datasets. We have designed a replication algorithm for extending and expanding a dataset. A small sample dataset produced, by OpenSHS, can be extended without affecting the logical order of the events. The replication provides a solution for generating large representative smart home datasets. We have built an extensible library of smart devices that facilitates the simulation of current and future smart home environments. Our tool divides the dataset generation process into three distinct phases: first design: the researcher designs the initial virtual environment by building the home, importing smart devices and creating contexts; second, simulation: the participant simulates his/her context-specific events; and third, aggregation: the researcher applies the replication algorithm to generate the final dataset. We conducted a study to assess the ease of use of our tool on the System Usability Scale (SUS).
OpenSHS: Open Smart Home Simulator
Alshammari, Nasser; Alshammari, Talal; Sedky, Mohamed; Champion, Justin; Bauer, Carolin
2017-01-01
This paper develops a new hybrid, open-source, cross-platform 3D smart home simulator, OpenSHS, for dataset generation. OpenSHS offers an opportunity for researchers in the field of the Internet of Things (IoT) and machine learning to test and evaluate their models. Following a hybrid approach, OpenSHS combines advantages from both interactive and model-based approaches. This approach reduces the time and efforts required to generate simulated smart home datasets. We have designed a replication algorithm for extending and expanding a dataset. A small sample dataset produced, by OpenSHS, can be extended without affecting the logical order of the events. The replication provides a solution for generating large representative smart home datasets. We have built an extensible library of smart devices that facilitates the simulation of current and future smart home environments. Our tool divides the dataset generation process into three distinct phases: first design: the researcher designs the initial virtual environment by building the home, importing smart devices and creating contexts; second, simulation: the participant simulates his/her context-specific events; and third, aggregation: the researcher applies the replication algorithm to generate the final dataset. We conducted a study to assess the ease of use of our tool on the System Usability Scale (SUS). PMID:28468330
MetaboLights: An Open-Access Database Repository for Metabolomics Data.
Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph
2016-03-24
MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standard Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools. Copyright © 2016 John Wiley & Sons, Inc.
A Research on E - learning Resources Construction Based on Semantic Web
NASA Astrophysics Data System (ADS)
Rui, Liu; Maode, Deng
Traditional e-learning platforms have the flaws that it's usually difficult to query or positioning, and realize the cross platform sharing and interoperability. In the paper, the semantic web and metadata standard is discussed, and a kind of e - learning system framework based on semantic web is put forward to try to solve the flaws of traditional elearning platforms.
Zink, Jean-Vincent; Souteyrand, Philippe; Guis, Sandrine; Chagnaud, Christophe; Fur, Yann Le; Militianu, Daniela; Mattei, Jean-Pierre; Rozenbaum, Michael; Rosner, Itzhak; Guye, Maxime; Bernard, Monique; Bendahan, David
2015-01-01
AIM: To quantify the wrist cartilage cross-sectional area in humans from a 3D magnetic resonance imaging (MRI) dataset and to assess the corresponding reproducibility. METHODS: The study was conducted in 14 healthy volunteers (6 females and 8 males) between 30 and 58 years old and devoid of articular pain. Subjects were asked to lie down in the supine position with the right hand positioned above the pelvic region on top of a home-built rigid platform attached to the scanner bed. The wrist was wrapped with a flexible surface coil. MRI investigations were performed at 3T (Verio-Siemens) using volume interpolated breath hold examination (VIBE) and dual echo steady state (DESS) MRI sequences. Cartilage cross sectional area (CSA) was measured on a slice of interest selected from a 3D dataset of the entire carpus and metacarpal-phalangeal areas on the basis of anatomical criteria using conventional image processing radiology software. Cartilage cross-sectional areas between opposite bones in the carpal region were manually selected and quantified using a thresholding method. RESULTS: Cartilage CSA measurements performed on a selected predefined slice were 292.4 ± 39 mm2 using the VIBE sequence and slightly lower, 270.4 ± 50.6 mm2, with the DESS sequence. The inter (14.1%) and intra (2.4%) subject variability was similar for both MRI methods. The coefficients of variation computed for the repeated measurements were also comparable for the VIBE (2.4%) and the DESS (4.8%) sequences. The carpus length averaged over the group was 37.5 ± 2.8 mm with a 7.45% between-subjects coefficient of variation. Of note, wrist cartilage CSA measured with either the VIBE or the DESS sequences was linearly related to the carpal bone length. The variability between subjects was significantly reduced to 8.4% when the CSA was normalized with respect to the carpal bone length. CONCLUSION: The ratio between wrist cartilage CSA and carpal bone length is a highly reproducible standardized measurement which normalizes the natural diversity between individuals. PMID:26396941
DCO-VIVO: A Collaborative Data Platform for the Deep Carbon Science Communities
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.; West, P.; Erickson, J. S.; Ma, X.; Fox, P. A.
2014-12-01
Deep Carbon Observatory (DCO) is a decade-long scientific endeavor to understand carbon in the complex deep Earth system. Thousands of DCO scientists from institutions across the globe are organized into communities representing four domains of exploration: Extreme Physics and Chemistry, Reservoirs and Fluxes, Deep Energy, and Deep Life. Cross-community and cross-disciplinary collaboration is one of the most distinctive features in DCO's flexible research framework. VIVO is an open-source Semantic Web platform that facilitates cross-institutional researcher and research discovery. it includes a number of standard ontologies that interconnect people, organizations, publications, activities, locations, and other entities of research interest to enable browsing, searching, visualizing, and generating Linked Open (research) Data. The DCO-VIVO solution expedites research collaboration between DCO scientists and communities. Based on DCO's specific requirements, the DCO Data Science team developed a series of extensions to the VIVO platform including extending the VIVO information model, extended query over the semantic information within VIVO, integration with other open source collaborative environments and data management systems, using single sign-on, assigning of unique Handles to DCO objects, and publication and dataset ingesting extensions using existing publication systems. We present here the iterative development of these requirements that are now in daily use by the DCO community of scientists for research reporting, information sharing, and resource discovery in support of research activities and program management.
Portability and Cross-Platform Performance of an MPI-Based Parallel Polygon Renderer
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1999-01-01
Visualizing the results of computations performed on large-scale parallel computers is a challenging problem, due to the size of the datasets involved. One approach is to perform the visualization and graphics operations in place, exploiting the available parallelism to obtain the necessary rendering performance. Over the past several years, we have been developing algorithms and software to support visualization applications on NASA's parallel supercomputers. Our results have been incorporated into a parallel polygon rendering system called PGL. PGL was initially developed on tightly-coupled distributed-memory message-passing systems, including Intel's iPSC/860 and Paragon, and IBM's SP2. Over the past year, we have ported it to a variety of additional platforms, including the HP Exemplar, SGI Origin2OOO, Cray T3E, and clusters of Sun workstations. In implementing PGL, we have had two primary goals: cross-platform portability and high performance. Portability is important because (1) our manpower resources are limited, making it difficult to develop and maintain multiple versions of the code, and (2) NASA's complement of parallel computing platforms is diverse and subject to frequent change. Performance is important in delivering adequate rendering rates for complex scenes and ensuring that parallel computing resources are used effectively. Unfortunately, these two goals are often at odds. In this paper we report on our experiences with portability and performance of the PGL polygon renderer across a range of parallel computing platforms.
NASA Astrophysics Data System (ADS)
Patton, E. W.; West, P.; Greer, R.; Jin, B.
2011-12-01
Following on work presented at the 2010 AGU Fall Meeting, we present a number of real-world collections of semantically-enabled scientific metadata ingested into the Tetherless World RDF2HTML system as structured data and presented and edited using that system. Two separate datasets from two different domains (oceanography and solar sciences) are made available using existing web standards and services, e.g. encoded using ontologies represented with the Web Ontology Language (OWL) and stored in a SPARQL endpoint for querying. These datasets are deployed for use in three different web environments, i.e. Drupal, MediaWiki, and a custom web portal written in Java, to highlight the cross-platform nature of the data presentation. Stylesheets used to transform concepts in each domain as well as shared terms into HTML will be presented to show the power of using common ontologies to publish data and support reuse of existing terminologies. In addition, a single domain dataset is shared between two separate portal instances to demonstrate the ability for this system to offer distributed access and modification of content across the Internet. Lastly, we will highlight challenges that arose in the software engineering process, outline the design choices we made in solving those issues, and discuss how future improvements to this and other systems will enable the evolution of distributed, decentralized collaborations for scientific data sharing across multiple research groups.
Geolokit: An interactive tool for visualising and exploring geoscientific data in Google Earth
NASA Astrophysics Data System (ADS)
Triantafyllou, Antoine; Watlet, Arnaud; Bastin, Christophe
2017-10-01
Virtual globes have been developed to showcase different types of data combining a digital elevation model and basemaps of high resolution satellite imagery. Hence, they became a standard to share spatial data and information, although they suffer from a lack of toolboxes dedicated to the formatting of large geoscientific dataset. From this perspective, we developed Geolokit: a free and lightweight software that allows geoscientists - and every scientist working with spatial data - to import their data (e.g., sample collections, structural geology, cross-sections, field pictures, georeferenced maps), to handle and to transcribe them to Keyhole Markup Language (KML) files. KML files are then automatically opened in the Google Earth virtual globe and the spatial data accessed and shared. Geolokit comes with a large number of dedicated tools that can process and display: (i) multi-points data, (ii) scattered data interpolations, (iii) structural geology features in 2D and 3D, (iv) rose diagrams, stereonets and dip-plunge polar histograms, (v) cross-sections and oriented rasters, (vi) georeferenced field pictures, (vii) georeferenced maps and projected gridding. Therefore, together with Geolokit, Google Earth becomes not only a powerful georeferenced data viewer but also a stand-alone work platform. The toolbox (available online at http://www.geolokit.org) is written in Python, a high-level, cross-platform programming language and is accessible through a graphical user interface, designed to run in parallel with Google Earth, through a workflow that requires no additional third party software. Geolokit features are demonstrated in this paper using typical datasets gathered from two case studies illustrating its applicability at multiple scales of investigation: a petro-structural investigation of the Ile d'Yeu orthogneissic unit (Western France) and data collection of the Mariana oceanic subduction zone (Western Pacific).
Large-scale machine learning and evaluation platform for real-time traffic surveillance
NASA Astrophysics Data System (ADS)
Eichel, Justin A.; Mishra, Akshaya; Miller, Nicholas; Jankovic, Nicholas; Thomas, Mohan A.; Abbott, Tyler; Swanson, Douglas; Keller, Joel
2016-09-01
In traffic engineering, vehicle detectors are trained on limited datasets, resulting in poor accuracy when deployed in real-world surveillance applications. Annotating large-scale high-quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud-based positive and negative mining process and a large-scale learning and evaluation system for the application of automatic traffic measurements and classification. The proposed positive and negative mining process addresses the quality of crowd sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on 1,000,000 Haar-like features extracted from 70,000 annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least 95% for 1/2 and about 78% for 19/20 of the time when tested on ˜7,500,000 video frames. At the end of 2016, the dataset is expected to have over 1 billion annotated video frames.
Sonification Prototype for Space Physics
NASA Astrophysics Data System (ADS)
Candey, R. M.; Schertenleib, A. M.; Diaz Merced, W. L.
2005-12-01
As an alternative and adjunct to visual displays, auditory exploration of data via sonification (data controlled sound) and audification (audible playback of data samples) is promising for complex or rapidly/temporally changing visualizations, for data exploration of large datasets (particularly multi-dimensional datasets), and for exploring datasets in frequency rather than spatial dimensions (see also International Conferences on Auditory Display
Cross-Platform Toxicogenomics for the Prediction of Non-Genotoxic Hepatocarcinogenesis in Rat
Metzger, Ute; Templin, Markus F.; Plummer, Simon; Ellinger-Ziegelbauer, Heidrun; Zell, Andreas
2014-01-01
In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of a carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens. PMID:24830643
Computation Methods for NASA Data-streams for Agricultural Efficiency Applications
NASA Astrophysics Data System (ADS)
Shrestha, B.; O'Hara, C. G.; Mali, P.
2007-12-01
Temporal Map Algebra (TMA) is a novel technique for analyzing time-series of satellite imageries using simple algebraic operators that treats time-series imageries as a three-dimensional dataset, where two dimensions encode planimetric position on earth surface and the third dimension encodes time. Spatio-temporal analytical processing methods such as TMA that utilize moderate spatial resolution satellite imagery having high temporal resolution to create multi-temporal composites are data intensive as well as computationally intensive. TMA analysis for multi-temporal composites provides dramatically enhanced usefulness that will yield previously unavailable capabilities to user communities, if deployment is coupled with significant High Performance Computing (HPC) capabilities; and interfaces are designed to deliver the full potential for these new technological developments. In this research, cross-platform data fusion and adaptive filtering using TMA was employed to create highly useful daily datasets and cloud-free high-temporal resolution vegetation index (VI) composites with enhanced information content for vegetation and bio-productivity monitoring, surveillance, and modeling. Fusion of Normalized Difference Vegetation Index (NDVI) data created from Aqua and Terra Moderate Resolution Imaging Spectroradiometer (MODIS) surface-reflectance data (MOD09) enables the creation of daily composites which are of immense value to a broad spectrum of global and national applications. Additionally these products are highly desired by many natural resources agencies like USDA/FAS/PECAD. Utilizing data streams collected by similar sensors on different platforms that transit the same areas at slightly different times of the day offers the opportunity to develop fused data products that have enhanced cloud-free and reduced noise characteristics. Establishing a Fusion Quality Confidence Code (FQCC) provides a metadata product that quantifies the method of fusion for a given pixel and enables a relative quality and confidence factor to be established for a given daily pixel value. When coupled with metadata that quantify the source sensor, day and time of acquisition, and the fusion method of each pixel to create the daily product; a wealth of information is available to assist in deriving new data and information products. These newly developed abilities to create highly useful daily data sets imply that temporal composites for a geographic area of interest may be created for user-defined temporal intervals that emphasize a user designated day of interest. At GeoResources Institute, Mississippi State University, solutions have been developed to create custom composites and cross-platform satellite data fusion using TMA which are useful for National Aeronautics and Space Administration (NASA) Rapid Prototyping Capability (RPC) and Integrated System Solutions (ISS) experiments for agricultural applications.
LinkedEarth and 21st century paleoclimatology: reducing data friction through standard development
NASA Astrophysics Data System (ADS)
Khider, D.; Emile-Geay, J.; McKay, N.; Garijo, D.; Ratnakar, V.; Gil, Y.; Zhu, F.
2017-12-01
Paleoclimate observations are crucial to assessing current climate change in the context of past variations. However, these observations usually come in non-standard formats, forcing paleogeoscientists to spend a significant fraction of their time searching and accessing the data they need, in the form they need it. In the 21st century, we should do much better. The EarthCube-supported LinkedEarth project is manifesting a better future by creating an online platform that (1) enables the curation of a publicly-accessible database by paleoclimate experts themselves, and (2) fosters the development of community standards. In 2016, a workshop on paleoclimate data standards served as a focal point to initiate this process. Workshop participants identified the necessity to distinguish a set of essential, recommended, and desired properties for each dataset. A consensus emerged that these levels are archive-specific, as what is needed to intelligently re-use marine-annually resolved records could be quite different than what is needed to intelligently re-use an ice core records, for instance. It was therefore decided that archive-centric working groups (WGs) would be best positioned to elaborate and discuss the components of a data standard for their specific sub-field of paleoclimatology. It is also critical to ensure interoperability between standards to enable multi-proxy investigations; to that end, longitudinal WGs were created, and the LinkedEarth leadership regularly monitors WG activity to ensure cross-pollination and consistency. These WGs carried out their discussions on the LinkedEarth online platform, providing the foundation for a preliminary standard that could be voted on by the rest of the community. In this presentation, I will showcase this preliminary paleoclimate data standard and dwell on community engagement through the use of online polls on the LinkedEarth platform, Twitter, and email-distributed online surveys. Finally, I will demonstrate how these standards have enabled cutting-edge data-analytic tools to be built in R and Python and applied to a wider array of datasets than ever possible before.
GeNets: a unified web platform for network-based genomic analyses.
Li, Taibo; Kim, April; Rosenbluh, Joseph; Horn, Heiko; Greenfeld, Liraz; An, David; Zimmer, Andrew; Liberzon, Arthur; Bistline, Jon; Natoli, Ted; Li, Yang; Tsherniak, Aviad; Narayan, Rajiv; Subramanian, Aravind; Liefeld, Ted; Wong, Bang; Thompson, Dawn; Calvo, Sarah; Carr, Steve; Boehm, Jesse; Jaffe, Jake; Mesirov, Jill; Hacohen, Nir; Regev, Aviv; Lage, Kasper
2018-06-18
Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.
Zheng, Guoyan; Zhang, Xuan; Steppacher, Simon D; Murphy, Stephen B; Siebenrock, Klaus A; Tannast, Moritz
2009-09-01
The widely used procedure of evaluation of cup orientation following total hip arthroplasty using single standard anteroposterior (AP) radiograph is known inaccurate, largely due to the wide variability in individual pelvic orientation relative to X-ray plate. 2D-3D image registration methods have been introduced for an accurate determination of the post-operative cup alignment with respect to an anatomical reference extracted from the CT data. Although encouraging results have been reported, their extensive usage in clinical routine is still limited. This may be explained by their requirement of a CAD model of the prosthesis, which is often difficult to be organized from the manufacturer due to the proprietary issue, and by their requirement of either multiple radiographs or a radiograph-specific calibration, both of which are not available for most retrospective studies. To address these issues, we developed and validated an object-oriented cross-platform program called "HipMatch" where a hybrid 2D-3D registration scheme combining an iterative landmark-to-ray registration with a 2D-3D intensity-based registration was implemented to estimate a rigid transformation between a pre-operative CT volume and the post-operative X-ray radiograph for a precise estimation of cup alignment. No CAD model of the prosthesis is required. Quantitative and qualitative results evaluated on cadaveric and clinical datasets are given, which indicate the robustness and the accuracy of the program. HipMatch is written in object-oriented programming language C++ using cross-platform software Qt (TrollTech, Oslo, Norway), VTK, and Coin3D and is transportable to any platform.
A Hadoop-based Molecular Docking System
NASA Astrophysics Data System (ADS)
Dong, Yueli; Guo, Quan; Sun, Bin
2017-10-01
Molecular docking always faces the challenge of managing tens of TB datasets. It is necessary to improve the efficiency of the storage and docking. We proposed the molecular docking platform based on Hadoop for virtual screening, it provides the preprocessing of ligand datasets and the analysis function of the docking results. A molecular cloud database that supports mass data management is constructed. Through this platform, the docking time is reduced, the data storage is efficient, and the management of the ligand datasets is convenient.
Cross-platform method for identifying candidate network biomarkers for prostate cancer.
Jin, G; Zhou, X; Cui, K; Zhang, X-S; Chen, L; Wong, S T C
2009-11-01
Discovering biomarkers using mass spectrometry (MS) and microarray expression profiles is a promising strategy in molecular diagnosis. Here, the authors proposed a new pipeline for biomarker discovery that integrates disease information for proteins and genes, expression profiles in both genomic and proteomic levels, and protein-protein interactions (PPIs) to discover high confidence network biomarkers. Using this pipeline, a total of 474 molecules (genes and proteins) related to prostate cancer were identified and a prostate-cancer-related network (PCRN) was derived from the integrative information. Thus, a set of candidate network biomarkers were identified from multiple expression profiles composed by eight microarray datasets and one proteomics dataset. The network biomarkers with PPIs can accurately distinguish the prostate patients from the normal ones, which potentially provide more reliable hits of biomarker candidates than conventional biomarker discovery methods.
NASA Astrophysics Data System (ADS)
Gliss, Jonas; Stebel, Kerstin; Kylling, Arve; Solvejg Dinger, Anna; Sihler, Holger; Sudbø, Aasmund
2017-04-01
UV SO2 cameras have become a common method for monitoring SO2 emission rates from volcanoes. Scattered solar UV radiation is measured in two wavelength windows, typically around 310 nm and 330 nm (distinct / weak SO2 absorption) using interference filters. The data analysis comprises the retrieval of plume background intensities (to calculate plume optical densities), the camera calibration (to convert optical densities into SO2 column densities) and the retrieval of gas velocities within the plume as well as the retrieval of plume distances. SO2 emission rates are then typically retrieved along a projected plume cross section, for instance a straight line perpendicular to the plume propagation direction. Today, for most of the required analysis steps, several alternatives exist due to ongoing developments and improvements related to the measurement technique. We present piscope, a cross platform, open source software toolbox for the analysis of UV SO2 camera data. The code is written in the Python programming language and emerged from the idea of a common analysis platform incorporating a selection of the most prevalent methods found in literature. piscope includes several routines for plume background retrievals, routines for cell and DOAS based camera calibration including two individual methods to identify the DOAS field of view (shape and position) within the camera images. Gas velocities can be retrieved either based on an optical flow analysis or using signal cross correlation. A correction for signal dilution (due to atmospheric scattering) can be performed based on topographic features in the images. The latter requires distance retrievals to the topographic features used for the correction. These distances can be retrieved automatically on a pixel base using intersections of individual pixel viewing directions with the local topography. The main features of piscope are presented based on dataset recorded at Mt. Etna, Italy in September 2015.
Llamas, César; González, Manuel A; Hernández, Carmen; Vegas, Jesús
2016-10-01
Nearly every practical improvement in modeling human motion is well founded in a properly designed collection of data or datasets. These datasets must be made publicly available for the community could validate and accept them. It is reasonable to concede that a collective, guided enterprise could serve to devise solid and substantial datasets, as a result of a collaborative effort, in the same sense as the open software community does. In this way datasets could be complemented, extended and expanded in size with, for example, more individuals, samples and human actions. For this to be possible some commitments must be made by the collaborators, being one of them sharing the same data acquisition platform. In this paper, we offer an affordable open source hardware and software platform based on inertial wearable sensors in a way that several groups could cooperate in the construction of datasets through common software suitable for collaboration. Some experimental results about the throughput of the overall system are reported showing the feasibility of acquiring data from up to 6 sensors with a sampling frequency no less than 118Hz. Also, a proof-of-concept dataset is provided comprising sampled data from 12 subjects suitable for gait analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Li, Kai; Rollins, Jason; Yan, Erjia
2018-01-01
Clarivate Analytics's Web of Science (WoS) is the world's leading scientific citation search and analytical information platform. It is used as both a research tool supporting a broad array of scientific tasks across diverse knowledge domains as well as a dataset for large-scale data-intensive studies. WoS has been used in thousands of published academic studies over the past 20 years. It is also the most enduring commercial legacy of Eugene Garfield. Despite the central position WoS holds in contemporary research, the quantitative impact of WoS has not been previously examined by rigorous scientific studies. To better understand how this key piece of Eugene Garfield's heritage has contributed to science, we investigated the ways in which WoS (and associated products and features) is mentioned in a sample of 19,478 English-language research and review papers published between 1997 and 2017, as indexed in WoS databases. We offered descriptive analyses of the distribution of the papers across countries, institutions and knowledge domains. We also used natural language processingtechniques to identify the verbs and nouns in the abstracts of these papers that are grammatically connected to WoS-related phrases. This is the first study to empirically investigate the documentation of the use of the WoS platform in published academic papers in both scientometric and linguistic terms.
Goldsztein, Guillermo H.
2016-01-01
Consider a person standing on a platform that oscillates laterally, i.e. to the right and left of the person. Assume the platform satisfies Hooke’s law. As the platform moves, the person reacts and moves its body attempting to keep its balance. We develop a simple model to study this phenomenon and show that the person, while attempting to keep its balance, may do positive work on the platform and increase the amplitude of its oscillations. The studies in this article are motivated by the oscillations in pedestrian bridges that are sometimes observed when large crowds cross them. PMID:27304857
Goldsztein, Guillermo H
2016-01-01
Consider a person standing on a platform that oscillates laterally, i.e. to the right and left of the person. Assume the platform satisfies Hooke's law. As the platform moves, the person reacts and moves its body attempting to keep its balance. We develop a simple model to study this phenomenon and show that the person, while attempting to keep its balance, may do positive work on the platform and increase the amplitude of its oscillations. The studies in this article are motivated by the oscillations in pedestrian bridges that are sometimes observed when large crowds cross them.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hazelaar, Colien, E-mail: c.hazelaar@vumc.nl; Dahele, Max; Mostafavi, Hassan
Purpose: Spine stereotactic body radiation therapy (SBRT) requires highly accurate positioning. We report our experience with markerless template matching and triangulation of kilovoltage images routinely acquired during spine SBRT, to determine spine position. Methods and Materials: Kilovoltage images, continuously acquired at 7, 11 or 15 frames/s during volumetric modulated spine SBRT of 18 patients, consisting of 93 fluoroscopy datasets (1 dataset/arc), were analyzed off-line. Four patients were immobilized in a head/neck mask, 14 had no immobilization. Two-dimensional (2D) templates were created for each gantry angle from planning computed tomography data and registered to prefiltered kilovoltage images to determine 2D shiftsmore » between actual and planned spine position. Registrations were considered valid if the normalized cross correlation score was ≥0.15. Multiple registrations were triangulated to determine 3D position. For each spine position dataset, average positional offset and standard deviation were calculated. To verify the accuracy and precision of the technique, mean positional offset and standard deviation for twenty stationary phantom datasets with different baseline shifts were measured. Results: For the phantom, average standard deviations were 0.18 mm for left-right (LR), 0.17 mm for superior-inferior (SI), and 0.23 mm for the anterior-posterior (AP) direction. Maximum difference in average detected and applied shift was 0.09 mm. For the 93 clinical datasets, the percentage of valid matched frames was, on average, 90.7% (range: 49.9-96.1%) per dataset. Average standard deviations for all datasets were 0.28, 0.19, and 0.28 mm for LR, SI, and AP, respectively. Spine position offsets were, on average, −0.05 (range: −1.58 to 2.18), −0.04 (range: −3.56 to 0.82), and −0.03 mm (range: −1.16 to 1.51), respectively. Average positional deviation was <1 mm in all directions in 92% of the arcs. Conclusions: Template matching and triangulation using kilovoltage images acquired during irradiation allows spine position detection with submillimeter accuracy at subsecond intervals. Although the majority of patients were not immobilized, most vertebrae were stable at the sub-mm level during spine SBRT delivery.« less
bioWeb3D: an online webGL 3D data visualisation tool.
Pettit, Jean-Baptiste; Marioni, John C
2013-06-07
Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for non trained researchers; this is especially true for three dimensional (3D) data representation. Whilst existing software can provide all necessary functionalities to represent and manipulate biological 3D datasets, very few are easily accessible (browser based), cross platform and accessible to non-expert users. An online HTML5/WebGL based 3D visualisation tool has been developed to allow biologists to quickly and easily view interactive and customizable three dimensional representations of their data along with multiple layers of information. Using the WebGL library Three.js written in Javascript, bioWeb3D allows the simultaneous visualisation of multiple large datasets inputted via a simple JSON, XML or CSV file, which can be read and analysed locally thanks to HTML5 capabilities. Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets.
Glycan array data management at Consortium for Functional Glycomics.
Venkataraman, Maha; Sasisekharan, Ram; Raman, Rahul
2015-01-01
Glycomics or the study of structure-function relationships of complex glycans has reshaped post-genomics biology. Glycans mediate fundamental biological functions via their specific interactions with a variety of proteins. Recognizing the importance of glycomics, large-scale research initiatives such as the Consortium for Functional Glycomics (CFG) were established to address these challenges. Over the past decade, the Consortium for Functional Glycomics (CFG) has generated novel reagents and technologies for glycomics analyses, which in turn have led to generation of diverse datasets. These datasets have contributed to understanding glycan diversity and structure-function relationships at molecular (glycan-protein interactions), cellular (gene expression and glycan analysis), and whole organism (mouse phenotyping) levels. Among these analyses and datasets, screening of glycan-protein interactions on glycan array platforms has gained much prominence and has contributed to cross-disciplinary realization of the importance of glycomics in areas such as immunology, infectious diseases, cancer biomarkers, etc. This manuscript outlines methodologies for capturing data from glycan array experiments and online tools to access and visualize glycan array data implemented at the CFG.
Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach.
Liang, Muxuan; Li, Zhizhong; Chen, Ting; Zeng, Jianyang
2015-01-01
Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are particularly designed to exploit both deep intrinsic statistical properties of each input modality and complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. These results indicate that our multimodal DBN based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized cancer therapy.
IM-TORNADO: a tool for comparison of 16S reads from paired-end libraries.
Jeraldo, Patricio; Kalari, Krishna; Chen, Xianfeng; Bhavsar, Jaysheel; Mangalam, Ashutosh; White, Bryan; Nelson, Heidi; Kocher, Jean-Pierre; Chia, Nicholas
2014-01-01
16S rDNA hypervariable tag sequencing has become the de facto method for accessing microbial diversity. Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for this application. However, when the two reads do not overlap, existing computational pipelines analyze data from read separately and underutilize the information contained in the paired-end reads. We created a workflow known as Illinois Mayo Taxon Organization from RNA Dataset Operations (IM-TORNADO) for processing non-overlapping reads while retaining maximal information content. Using synthetic mock datasets, we show that the use of both reads produced answers with greater correlation to those from full length 16S rDNA when looking at taxonomy, phylogeny, and beta-diversity. IM-TORNADO is freely available at http://sourceforge.net/projects/imtornado and produces BIOM format output for cross compatibility with other pipelines such as QIIME, mothur, and phyloseq.
Parallel Index and Query for Large Scale Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chou, Jerry; Wu, Kesheng; Ruebel, Oliver
2011-07-18
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for process- ing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that address these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process mas- sive datasets on modern supercomputing platforms. We apply FastQuery to processing ofmore » a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for inter- esting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.« less
Predicting protein-binding regions in RNA using nucleotide profiles and compositions.
Choi, Daesik; Park, Byungkyu; Chae, Hanju; Lee, Wook; Han, Kyungsook
2017-03-14
Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remains high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing of our model and other state-of-the-art methods on a same dataset showed that our model is better than the others. Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. But, a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding .
Consistency of two global MODIS aerosol products over ocean on Terra and Aqua CERES SSF datasets
NASA Astrophysics Data System (ADS)
Ignatov, Alexander; Minnis, Patrick; Wielicki, Bruce; Loeb, Norman G.; Remer, Lorraine A.; Kaufman, Yoram J.; Miller, Walter F.; Sun-Mack, Sunny; Laszlo, Istvan; Geier, Erika B.
2004-12-01
MODIS aerosol retrievals over ocean from Terra and Aqua platforms are available from the Clouds and the Earth's Radiant Energy System (CERES) Single Scanner Footprint (SSF) datasets generated at NASA Langley Research Center (LaRC). Two aerosol products are reported side by side. The primary M product is generated by subsetting and remapping the multi-spectral (0.44 - 2.1 μm) MOD04 aerosols onto CERES footprints. MOD04 processing uses cloud screening and aerosol algorithms developed by the MODIS science team. The secondary (AVHRR-like) A product is generated in only two MODIS bands: 1 and 6 on Terra, and ` and 7 on Aqua. The A processing uses NASA/LaRC cloud-screening and NOAA/NESDIS single channel aerosol algorthm. The M and A products have been documented elsewhere and preliminarily compared using two weeks of global Terra CERES SSF (Edition 1A) data in December 2000 and June 2001. In this study, the M and A aerosol optical depths (AOD) in MODIS band 1 and (0.64 μm), τ1M and τ1A, are further checked for cross-platform consistency using 9 days of global Terra CERES SSF (Edition 2A) and Aqua CERES SSF (Edition 1A) data from 13 - 21 October 2002.
Kaushik, Abhinav; Ali, Shakir; Gupta, Dinesh
2017-01-01
Gene connection rewiring is an essential feature of gene network dynamics. Apart from its normal functional role, it may also lead to dysregulated functional states by disturbing pathway homeostasis. Very few computational tools measure rewiring within gene co-expression and its corresponding regulatory networks in order to identify and prioritize altered pathways which may or may not be differentially regulated. We have developed Altered Pathway Analyzer (APA), a microarray dataset analysis tool for identification and prioritization of altered pathways, including those which are differentially regulated by TFs, by quantifying rewired sub-network topology. Moreover, APA also helps in re-prioritization of APA shortlisted altered pathways enriched with context-specific genes. We performed APA analysis of simulated datasets and p53 status NCI-60 cell line microarray data to demonstrate potential of APA for identification of several case-specific altered pathways. APA analysis reveals several altered pathways not detected by other tools evaluated by us. APA analysis of unrelated prostate cancer datasets identifies sample-specific as well as conserved altered biological processes, mainly associated with lipid metabolism, cellular differentiation and proliferation. APA is designed as a cross platform tool which may be transparently customized to perform pathway analysis in different gene expression datasets. APA is freely available at http://bioinfo.icgeb.res.in/APA. PMID:28084397
Collaboration-Centred Cities through Urban Apps Based on Open and User-Generated Data
Aguilera, Unai; López-de-Ipiña, Diego; Pérez, Jorge
2016-01-01
This paper describes the IES Cities platform conceived to streamline the development of urban apps that combine heterogeneous datasets provided by diverse entities, namely, government, citizens, sensor infrastructure and other information data sources. This work pursues the challenge of achieving effective citizen collaboration by empowering them to prosume urban data across time. Particularly, this paper focuses on the query mapper; a key component of the IES Cities platform devised to democratize the development of open data-based mobile urban apps. This component allows developers not only to use available data, but also to contribute to existing datasets with the execution of SQL sentences. In addition, the component allows developers to create ad hoc storages for their applications, publishable as new datasets accessible by other consumers. As multiple users could be contributing and using a dataset, our solution also provides a data level permission mechanism to control how the platform manages the access to its datasets. We have evaluated the advantages brought forward by IES Cities from the developers’ perspective by describing an exemplary urban app created on top of it. In addition, we include an evaluation of the main functionalities of the query mapper. PMID:27376300
Collaboration-Centred Cities through Urban Apps Based on Open and User-Generated Data.
Aguilera, Unai; López-de-Ipiña, Diego; Pérez, Jorge
2016-07-01
This paper describes the IES Cities platform conceived to streamline the development of urban apps that combine heterogeneous datasets provided by diverse entities, namely, government, citizens, sensor infrastructure and other information data sources. This work pursues the challenge of achieving effective citizen collaboration by empowering them to prosume urban data across time. Particularly, this paper focuses on the query mapper; a key component of the IES Cities platform devised to democratize the development of open data-based mobile urban apps. This component allows developers not only to use available data, but also to contribute to existing datasets with the execution of SQL sentences. In addition, the component allows developers to create ad hoc storages for their applications, publishable as new datasets accessible by other consumers. As multiple users could be contributing and using a dataset, our solution also provides a data level permission mechanism to control how the platform manages the access to its datasets. We have evaluated the advantages brought forward by IES Cities from the developers' perspective by describing an exemplary urban app created on top of it. In addition, we include an evaluation of the main functionalities of the query mapper.
ToxCast Profiling in a Human Stem Cell Assay for ...
Standard practice for assessing disruptions in embryogenesis involves testing pregnant animals of two species, typically rats and rabbits, exposed during major organogenesis and evaluated just prior to term. Under this design the major manifestations of developmental toxicity are observed as one or more apical endpoints including intrauterine death, fetal growth retardation, structural malformations and variations. Alternative approaches to traditional developmental toxicity testing have been proposed in the form of in vitro data (e.g., embryonic stem cells, zebrafish embryos, HTS assays) and in silico models (e.g., computational toxicology). To increase the diversity of assays used to assess developmental toxicity in EPA’s ToxCast program, we tested the chemicals in Stemina’s metabolomics-based platform that utilizes the commecrially available H9 human embryonic stem cell line. The devTOXqP dataset for ToxCast of high-quality based on replicate samples and model performance (82% balanced accuracy, 0.71 sensitivity and 1.00 specificity). To date, 136 ToxCast chemicals (12.8% of 1065 tested) were positive in this platform; 48 triggered the biomarker signal without any change in hESC viability and 88 triggered activity concurrent with effects on cell viability. Work is in progress to complete the STM dataset entry into the TCPL, compare data with results from zFish and mESC platforms, profile bioactivity (ToxCastDB), endpoints (ToxRefDB), chemotypes (DSSTox)
Updates to FuncLab, a Matlab based GUI for handling receiver functions
NASA Astrophysics Data System (ADS)
Porritt, Robert W.; Miller, Meghan S.
2018-02-01
Receiver functions are a versatile tool commonly used in seismic imaging. Depending on how they are processed, they can be used to image discontinuity structure within the crust or mantle or they can be inverted for seismic velocity either directly or jointly with complementary datasets. However, modern studies generally require large datasets which can be challenging to handle; therefore, FuncLab was originally written as an interactive Matlab GUI to assist in handling these large datasets. This software uses a project database to allow interactive trace editing, data visualization, H-κ stacking for crustal thickness and Vp/Vs ratio, and common conversion point stacking while minimizing computational costs. Since its initial release, significant advances have been made in the implementation of web services and changes in the underlying Matlab platform have necessitated a significant revision to the software. Here, we present revisions to the software, including new features such as data downloading via irisFetch.m, receiver function calculations via processRFmatlab, on-the-fly cross-section tools, interface picking, and more. In the descriptions of the tools, we present its application to a test dataset in Michigan, Wisconsin, and neighboring areas following the passage of USArray Transportable Array. The software is made available online at https://robporritt.wordpress.com/software.
Using Third Party Data to Update a Reference Dataset in a Quality Evaluation Service
NASA Astrophysics Data System (ADS)
Xavier, E. M. A.; Ariza-López, F. J.; Ureña-Cámara, M. A.
2016-06-01
Nowadays it is easy to find many data sources for various regions around the globe. In this 'data overload' scenario there are few, if any, information available about the quality of these data sources. In order to easily provide these data quality information we presented the architecture of a web service for the automation of quality control of spatial datasets running over a Web Processing Service (WPS). For quality procedures that require an external reference dataset, like positional accuracy or completeness, the architecture permits using a reference dataset. However, this reference dataset is not ageless, since it suffers the natural time degradation inherent to geospatial features. In order to mitigate this problem we propose the Time Degradation & Updating Module which intends to apply assessed data as a tool to maintain the reference database updated. The main idea is to utilize datasets sent to the quality evaluation service as a source of 'candidate data elements' for the updating of the reference database. After the evaluation, if some elements of a candidate dataset reach a determined quality level, they can be used as input data to improve the current reference database. In this work we present the first design of the Time Degradation & Updating Module. We believe that the outcomes can be applied in the search of a full-automatic on-line quality evaluation platform.
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2016-12-01
Apoptosis, or programed cell death, plays a central role in the development and homeostasis of an organism. Obtaining information on subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. The prediction of subcellular localization of an apoptosis protein is still a challenging task, and existing methods mainly based on protein primary sequences. In this paper, we introduce a new position-specific scoring matrix (PSSM)-based method by using detrended cross-correlation (DCCA) coefficient of non-overlapping windows. Then a 190-dimensional (190D) feature vector is constructed on two widely used datasets: CL317 and ZD98, and support vector machine is adopted as classifier. To evaluate the proposed method, objective and rigorous jackknife cross-validation tests are performed on the two datasets. The results show that our approach offers a novel and reliable PSSM-based tool for prediction of apoptosis protein subcellular localization. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Michaelis, A.; Ganguly, S.; Nemani, R. R.; Votava, P.; Wang, W.; Lee, T. J.; Dungan, J. L.
2014-12-01
Sharing community-valued codes, intermediary datasets and results from individual efforts with others that are not in a direct funded collaboration can be a challenge. Cross organization collaboration is often impeded due to infrastructure security constraints, rigid financial controls, bureaucracy, and workforce nationalities, etc., which can force groups to work in a segmented fashion and/or through awkward and suboptimal web services. We show how a focused community may come together, share modeling and analysis codes, computing configurations, scientific results, knowledge and expertise on a public cloud platform; diverse groups of researchers working together at "arms length". Through the OpenNEX experimental workshop, users can view short technical "how-to" videos and explore encapsulated working environment. Workshop participants can easily instantiate Amazon Machine Images (AMI) or launch full cluster and data processing configurations within minutes. Enabling users to instantiate computing environments from configuration templates on large public cloud infrastructures, such as Amazon Web Services, may provide a mechanism for groups to easily use each others work and collaborate indirectly. Moreover, using the public cloud for this workshop allowed a single group to host a large read only data archive, making datasets of interest to the community widely available on the public cloud, enabling other groups to directly connect to the data and reduce the costs of the collaborative work by freeing other individual groups from redundantly retrieving, integrating or financing the storage of the datasets of interest.
NASA Technical Reports Server (NTRS)
Grotzinger, John P.
2003-01-01
Work has been completed on the digital mapping of a terminal Proterozoic reef complex in Namibia. This complex formed an isolated carbonate platform developed downdip on a carbonate ramp of the Nama Group. The stratigraphic evolution of the platform was digitally reconstructed from an extensive dataset that was compiled by using digital surveying technologies. The platform comprises three accommodation cycles in which each subsequent cycle experienced progressively greater influence of a long-term accommodation increase. Aggradation and progradation during the first cycle resulted in a flat, uniform, sheet-like platform. The coarsening and shallowing-upward sequence representing the first cycle is dominated by columnar stromatolitic thrombolites and massive dolostones with interbedded mudstone-grainstone at the base of the sequence grading into cross-bedded dolostones. The second cycle features aggradation, formation of a distinct margin containing thrombolite mounds and domes, and the development of a bucket geometry. Columnar stromatolitic thrombolites dominate the platform interior. The final stage of platform development shows a deepening trend with initial aggradation and formation of well-bedded, thin deposits in the interior and mound development at the margins. While the interior drowned, the platform margin kept up with rising sea level and a complex pinnacle reef formed containing fused and coalesced thrombolite mounds flanked by bioclastic grainstones (containing Cloudina and Namacalathus fossils) and collapse breccias. A set of isolated large thrombolite mounds flanked by shales indicate the final stage of the carbonate platform. During a progressive increase in accommodation, a flat-topped isolated carbonate platform becomes aerially less extensive by either backstepping or formation of smaller pinnacles or a combination of both. The overall geometric evolution of the studied platform from flat-topped to bucket with elevated margins is recorded in many Proterozoic and Phanerozoic isolated carbonate platforms with similar dimensions. The terminal Proterozoic, microbial-dominated, isolated carbonate platform of this study clearly illustrates that the answer to accommodation changes was already familiar among carbonate platforms before the dawn of metazoan-dominated platforms.
Data integration: Combined imaging and electrophysiology data in the cloud.
Kini, Lohith G; Davis, Kathryn A; Wagenaar, Joost B
2016-01-01
There has been an increasing effort to correlate electrophysiology data with imaging in patients with refractory epilepsy over recent years. IEEG.org provides a free-access, rapidly growing archive of imaging data combined with electrophysiology data and patient metadata. It currently contains over 1200 human and animal datasets, with multiple data modalities associated with each dataset (neuroimaging, EEG, EKG, de-identified clinical and experimental data, etc.). The platform is developed around the concept that scientific data sharing requires a flexible platform that allows sharing of data from multiple file formats. IEEG.org provides high- and low-level access to the data in addition to providing an environment in which domain experts can find, visualize, and analyze data in an intuitive manner. Here, we present a summary of the current infrastructure of the platform, available datasets and goals for the near future. Copyright © 2015 Elsevier Inc. All rights reserved.
Data integration: Combined Imaging and Electrophysiology data in the cloud
Kini, Lohith G.; Davis, Kathryn A.; Wagenaar, Joost B.
2015-01-01
There has been an increasing effort to correlate electrophysiology data with imaging in patients with refractory epilepsy over recent years. IEEG.org provides a free-access, rapidly growing archive of imaging data combined with electrophysiology data and patient metadata. It currently contains over 1200 human and animal datasets, with multiple data modalities associated with each dataset (neuroimaging, EEG, EKG, de-identified clinical and experimental data, etc.). The platform is developed around the concept that scientific data sharing requires a flexible platform that allows sharing of data from multiple file-formats. IEEG.org provides high and low-level access to the data in addition to providing an environment in which domain experts can find, visualize, and analyze data in an intuitive manner. Here, we present a summary of the current infrastructure of the platform, available datasets and goals for the near future. PMID:26044858
Using Browser Notebooks to Analyse Big Atmospheric Data-sets in the Cloud
NASA Astrophysics Data System (ADS)
Robinson, N.; Tomlinson, J.; Arribas, A.; Prudden, R.
2016-12-01
We are presenting an account of our experience building an ecosystem for the analysis of big atmospheric data-sets. By using modern technologies we have developed a prototype platform which is scaleable and capable of analysing very large atmospheric datasets. We tested different big-data ecosystems such as Hadoop MapReduce, Spark and Dask, in order to find the one which was best suited for analysis of multidimensional binary data such as NetCDF. We make extensive use of infrastructure-as-code and containerisation to provide a platform which is reusable, and which can scale to accommodate changes in demand. We make this platform readily accessible using browser based notebooks. As a result, analysts with minimal technology experience can, in tens of lines of Python, make interactive data-visualisation web pages, which can analyse very large amounts of data using cutting edge big-data technology
Fort Bliss Geothermal Area Data: Temperature profile, logs, schematic model and cross section
Adam Brandt
2015-11-15
This dataset contains a variety of data about the Fort Bliss geothermal area, part of the southern portion of the Tularosa Basin, New Mexico. The dataset contains schematic models for the McGregor Geothermal System, a shallow temperature survey of the Fort Bliss geothermal area. The dataset also contains Century OH logs, a full temperature profile, and complete logs from well RMI 56-5, including resistivity and porosity data, drill logs with drill rate, depth, lithology, mineralogy, fractures, temperature, pit total, gases, and descriptions among other measurements as well as CDL, CNL, DIL, GR Caliper and Temperature files. A shallow (2 meter depth) temperature survey of the Fort Bliss geothermal area with 63 data points is also included. Two cross sections through the Fort Bliss area, also included, show well position and depth. The surface map included shows faults and well spatial distribution. Inferred and observed fault distributions from gravity surveys around the Fort Bliss geothermal area.
NASA Astrophysics Data System (ADS)
Piermattei, Livia; Bozzi, Carlo Alberto; Mancini, Adriano; Tassetti, Anna Nora; Karel, Wilfried; Pfeifer, Norbert
2017-04-01
Unmanned aerial vehicles (UAVs) in combination with consumer grade cameras have become standard tools for photogrammetric applications and surveying. The recent generation of multispectral, cost-efficient and lightweight cameras has fostered a breakthrough in the practical application of UAVs for precision agriculture. For this application, multispectral cameras typically use Green, Red, Red-Edge (RE) and Near Infrared (NIR) wavebands to capture both visible and invisible images of crops and vegetation. These bands are very effective for deriving characteristics like soil productivity, plant health and overall growth. However, the quality of results is affected by the sensor architecture, the spatial and spectral resolutions, the pattern of image collection, and the processing of the multispectral images. In particular, collecting data with multiple sensors requires an accurate spatial co-registration of the various UAV image datasets. Multispectral processed data in precision agriculture are mainly presented as orthorectified mosaics used to export information maps and vegetation indices. This work aims to investigate the acquisition parameters and processing approaches of this new type of image data in order to generate orthoimages using different sensors and UAV platforms. Within our experimental area we placed a grid of artificial targets, whose position was determined with differential global positioning system (dGPS) measurements. Targets were used as ground control points to georeference the images and as checkpoints to verify the accuracy of the georeferenced mosaics. The primary aim is to present a method for the spatial co-registration of visible, Red-Edge, and NIR image sets. To demonstrate the applicability and accuracy of our methodology, multi-sensor datasets were collected over the same area and approximately at the same time using the fixed-wing UAV senseFly "eBee". The images were acquired with the camera Canon S110 RGB, the multispectral cameras Canon S110 NIR and S110 RE and with the multi-camera system Parrot Sequoia, which is composed of single-band cameras (Green, Red, Red Edge, NIR and RGB). Imagery from each sensor was georeferenced and mosaicked with the commercial software Agisoft PhotoScan Pro and different approaches for image orientation were compared. To assess the overall spatial accuracy of each dataset the root mean square error was computed between check point coordinates measured with dGPS and coordinates retrieved from georeferenced image mosaics. Additionally, image datasets from different UAV platforms (i.e. DJI Phantom 4Pro, DJI Phantom 3 professional, and DJI Inspire 1 Pro) were acquired over the same area and the spatial accuracy of the orthoimages was evaluated.
ASSISTments Dataset from Multiple Randomized Controlled Experiments
ERIC Educational Resources Information Center
Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil
2016-01-01
In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…
Datasets, Technologies and Products from the NASA/NOAA Electronic Theater 2002
NASA Technical Reports Server (NTRS)
Hasler, A. Fritz; Starr, David (Technical Monitor)
2001-01-01
An in depth look at the Earth Science datasets used in the Etheater Visualizations will be presented. This will include the satellite orbits, platforms, scan patterns, the size, temporal and spatial resolution, and compositing techniques used to obtain the datasets as well as the spectral bands utilized.
OpenStereo: Open Source, Cross-Platform Software for Structural Geology Analysis
NASA Astrophysics Data System (ADS)
Grohmann, C. H.; Campanha, G. A.
2010-12-01
Free and open source software (FOSS) are increasingly seen as synonyms of innovation and progress. Freedom to run, copy, distribute, study, change and improve the software (through access to the source code) assure a high level of positive feedback between users and developers, which results in stable, secure and constantly updated systems. Several software packages for structural geology analysis are available to the user, with commercial licenses or that can be downloaded at no cost from the Internet. Some provide basic tools of stereographic projections such as plotting poles, great circles, density contouring, eigenvector analysis, data rotation etc, while others perform more specific tasks, such as paleostress or geotechnical/rock stability analysis. This variety also means a wide range of data formating for input, Graphical User Interface (GUI) design and graphic export format. The majority of packages is built for MS-Windows and even though there are packages for the UNIX-based MacOS, there aren't native packages for *nix (UNIX, Linux, BSD etc) Operating Systems (OS), forcing the users to run these programs with emulators or virtual machines. Those limitations lead us to develop OpenStereo, an open source, cross-platform software for stereographic projections and structural geology. The software is written in Python, a high-level, cross-platform programming language and the GUI is designed with wxPython, which provide a consistent look regardless the OS. Numeric operations (like matrix and linear algebra) are performed with the Numpy module and all graphic capabilities are provided by the Matplolib library, including on-screen plotting and graphic exporting to common desktop formats (emf, eps, ps, pdf, png, svg). Data input is done with simple ASCII text files, with values of dip direction and dip/plunge separated by spaces, tabs or commas. The user can open multiple file at the same time (or the same file more than once), and overlay different elements of each dataset (poles, great circles etc). The GUI shows the opened files in a tree structure, similar to “layers” of many illustration software, where the vertical order of the files in the tree reflects the drawing order of the selected elements. At this stage, the software performs plotting operations of poles to planes, lineations, great circles, density contours and rose diagrams. A set of statistics is calculated for each file and its eigenvalues and eigenvectors are used to suggest if the data is clustered about a mean value or distributed along a girdle. Modified Flinn, Triangular and histograms plots are also available. Next step of development will focus on tools as merging and rotation of datasets, possibility to save 'projects' and paleostress analysis. In its current state, OpenStereo requires Python, wxPython, Numpy and Matplotlib installed in the system. We recommend installing PythonXY or the Enthought Python Distribution on MS-Windows and MacOS machines, since all dependencies are provided. Most Linux distributions provide an easy way to install all dependencies through software repositories. OpenStereo is released under the GNU General Public License. Programmers willing to contribute are encouraged to contact the authors directly. FAPESP Grant #09/17675-5
Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila
2018-01-01
Abstract The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline PMID:29688374
Operational use of open satellite data for marine water quality monitoring
NASA Astrophysics Data System (ADS)
Symeonidis, Panagiotis; Vakkas, Theodoros
2017-09-01
The purpose of this study was to develop an operational platform for marine water quality monitoring using near real time satellite data. The developed platform utilizes free and open satellite data available from different data sources like COPERNICUS, the European Earth Observation Initiative, or NASA, from different satellites and instruments. The quality of the marine environment is operationally evaluated using parameters like chlorophyll-a concentration, water color and Sea Surface Temperature (SST). For each parameter, there are more than one dataset available, from different data sources or satellites, to allow users to select the most appropriate dataset for their area or time of interest. The above datasets are automatically downloaded from the data provider's services and ingested to the central, spatial engine. The spatial data platform uses the Postgresql database with the PostGIS extension for spatial data storage and Geoserver for the provision of the spatial data services. The system provides daily, 10 days and monthly maps and time series of the above parameters. The information is provided using a web client which is based on the GET SDI PORTAL, an easy to use and feature rich geospatial visualization and analysis platform. The users can examine the temporal variation of the parameters using a simple time animation tool. In addition, with just one click on the map, the system provides an interactive time series chart for any of the parameters of the available datasets. The platform can be offered as Software as a Service (SaaS) to any area in the Mediterranean region.
NASA Astrophysics Data System (ADS)
Palma, J. L.; Belo-Pereira, M.; Leo, L. S.; Fernando, J.; Wildmann, N.; Gerz, T.; Rodrigues, C. V.; Lopes, A. S.; Lopes, J. C.
2017-12-01
Perdigão is the largest of a series of wind-mapping studies embedded in the on-going NEWA (New European Wind Atlas) Project. The intensive observational period of the Perdigão field experiment resulted in an unprecedented volume of data, covering several wind conditions through 46 consecutive days between May and June 2017. For researchers looking into specific events, it is time consuming to scrutinise the datasets looking for appropriate conditions. Such task becomes harder if the parameters of interest were not measured directly, instead requiring their computation from the raw datasets. This work will present the e-Science platform developed by University of Porto for the Perdigao dataset. The platform will assist scientists of Perdigao and the larger scientific community in extrapolating the datasets associated to specific flow regimes of interest as well as automatically performing post-processing/filtering operations internally in the platform. We will illustrate the flow regime categories identified in Perdigao based on several parameters such as weather type classification, cloud characteristics, as well as stability regime indicators (Brunt-Väisälä frequency, Scorer parameter, potential temperature inversion heights, dimensionless Richardson and Froude numbers) and wind regime indicators. Examples of some of the post-processing techniques available in the e-Science platform, such as the Savitzky-Golay low-pass filtering technique, will be also presented.
Development of a Platform to Enable Fully Automated Cross-Titration Experiments.
Cassaday, Jason; Finley, Michael; Squadroni, Brian; Jezequel-Sur, Sylvie; Rauch, Albert; Gajera, Bharti; Uebele, Victor; Hermes, Jeffrey; Zuck, Paul
2017-04-01
In the triage of hits from a high-throughput screening campaign or during the optimization of a lead compound, it is relatively routine to test compounds at multiple concentrations to determine potency and maximal effect. Additional follow-up experiments, such as agonist shift, can be quite valuable in ascertaining compound mechanism of action (MOA). However, these experiments require cross-titration of a test compound with the activating ligand of the receptor requiring 100-200 data points, severely limiting the number tested in MOA assays in a screening triage. We describe a process to enhance the throughput of such cross-titration experiments through the integration of Hewlett Packard's D300 digital dispenser onto one of our robotics platforms to enable on-the-fly cross-titration of compounds in a 1536-well plate format. The process handles all the compound management and data tracking, as well as the biological assay. The process relies heavily on in-house-built software and hardware, and uses our proprietary control software for the platform. Using this system, we were able to automate the cross-titration of compounds for both positive and negative allosteric modulators of two different G protein-coupled receptors (GPCRs) using two distinct assay detection formats, IP1 and Ca 2+ detection, on nearly 100 compounds for each target.
bioWeb3D: an online webGL 3D data visualisation tool
2013-01-01
Background Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for non trained researchers; this is especially true for three dimensional (3D) data representation. Whilst existing software can provide all necessary functionalities to represent and manipulate biological 3D datasets, very few are easily accessible (browser based), cross platform and accessible to non-expert users. Results An online HTML5/WebGL based 3D visualisation tool has been developed to allow biologists to quickly and easily view interactive and customizable three dimensional representations of their data along with multiple layers of information. Using the WebGL library Three.js written in Javascript, bioWeb3D allows the simultaneous visualisation of multiple large datasets inputted via a simple JSON, XML or CSV file, which can be read and analysed locally thanks to HTML5 capabilities. Conclusions Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets. PMID:23758781
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.
Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A
2017-05-15
Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs mapped GUIDE-Seq read count as well as cleavage score from a user specified off-target cleavage score prediction algorithm permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html .
Interactive client side data visualization with d3.js
NASA Astrophysics Data System (ADS)
Rodzianko, A.; Versteeg, R.; Johnson, D. V.; Soltanian, M. R.; Versteeg, O. J.; Girouard, M.
2015-12-01
Geoscience data associated with near surface research and operational sites is increasingly voluminous and heterogeneous (both in terms of providers and data types - e.g. geochemical, hydrological, geophysical, modeling data, of varying spatiotemporal characteristics). Such data allows scientists to investigate fundamental hydrological and geochemical processes relevant to agriculture, water resources and climate change. For scientists to easily share, model and interpret such data requires novel tools with capabilities for interactive data visualization. Under sponsorship of the US Department of Energy, Subsurface Insights is developing the Predictive Assimilative Framework (PAF): a cloud based subsurface monitoring platform which can manage, process and visualize large heterogeneous datasets. Over the last year we transitioned our visualization method from a server side approach (in which images and animations were generated using Jfreechart and Visit) to a client side one that utilizes the D3 Javascript library. Datasets are retrieved using web service calls to the server, returned as JSON objects and visualized within the browser. Users can interactively explore primary and secondary datasets from various field locations. Our current capabilities include interactive data contouring and heterogeneous time series data visualization. While this approach is very powerful and not necessarily unique, special attention needs to be paid to latency and responsiveness issues as well as to issues as cross browser code compatibility so that users have an identical, fluid and frustration-free experience across different computational platforms. We gratefully acknowledge support from the US Department of Energy under SBIR Award DOE DE-SC0009732, the use of data from the Lawrence Berkeley National Laboratory (LBNL) Sustainable Systems SFA Rifle field site and collaboration with LBNL SFA scientists.
An Evaluation of Database Solutions to Spatial Object Association
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, V S; Kurc, T; Saltz, J
2008-06-24
Object association is a common problem encountered in many applications. Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two datasets based on their positions in a common spatial coordinate system--one of the datasets may correspond to a catalog of objects observed over time in a multi-dimensional domain; the other dataset may consist of objects observed in a snapshot of the domain at a time point. The use of database management systems to the solve the object association problem provides portability across different platforms and also greater flexibility. Increasingmore » dataset sizes in today's applications, however, have made object association a data/compute-intensive problem that requires targeted optimizations for efficient execution. In this work, we investigate how database-based crossmatch algorithms can be deployed on different database system architectures and evaluate the deployments to understand the impact of architectural choices on crossmatch performance and associated trade-offs. We investigate the execution of two crossmatch algorithms on (1) a parallel database system with active disk style processing capabilities, (2) a high-throughput network database (MySQL Cluster), and (3) shared-nothing databases with replication. We have conducted our study in the context of a large-scale astronomy application with real use-case scenarios.« less
Release of (and lessons learned from mining) a pioneering large toxicogenomics database.
Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R
2015-07-01
We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds were also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline to Entrez gene annotations, to classify probes into different confidence classes. High confidence unambiguously annotated probes were used to create gene-level data which served as starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R-package containing the gene-level data and show how it can be used for expression-based similarity searches. Comparing the same biological samples run on the Affymetrix and the Codelink platform, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated in toxicogenomics pipelines of users.
HIVprotI: an integrated web based platform for prediction and design of HIV proteins inhibitors.
Qureshi, Abid; Rajput, Akanksha; Kaur, Gazaldeep; Kumar, Manoj
2018-03-09
A number of anti-retroviral drugs are being used for treating Human Immunodeficiency Virus (HIV) infection. Due to emergence of drug resistant strains, there is a constant quest to discover more effective anti-HIV compounds. In this endeavor, computational tools have proven useful in accelerating drug discovery. Although methods were published to design a class of compounds against a specific HIV protein, but an integrated web server for the same is lacking. Therefore, we have developed support vector machine based regression models using experimentally validated data from ChEMBL repository. Quantitative structure activity relationship based features were selected for predicting inhibition activity of a compound against HIV proteins namely protease (PR), reverse transcriptase (RT) and integrase (IN). The models presented a maximum Pearson correlation coefficient of 0.78, 0.76, 0.74 and 0.76, 0.68, 0.72 during tenfold cross-validation on IC 50 and percent inhibition datasets of PR, RT, IN respectively. These models performed equally well on the independent datasets. Chemical space mapping, applicability domain analyses and other statistical tests further support robustness of the predictive models. Currently, we have identified a number of chemical descriptors that are imperative in predicting the compound inhibition potential. HIVprotI platform ( http://bioinfo.imtech.res.in/manojk/hivproti ) would be useful in virtual screening of inhibitors as well as designing of new molecules against the important HIV proteins for therapeutics development.
DOE Office of Scientific and Technical Information (OSTI.GOV)
VanderNoot, Victoria A.; Haroldsen, Brent L.; Renzi, Ronald F.
2010-03-01
In a multiyear research agreement with Tenix Investments Pty. Ltd., Sandia has been developing field deployable technologies for detection of biotoxins in water supply systems. The unattended water sensor or UWS employs microfluidic chip based gel electrophoresis for monitoring biological analytes in a small integrated sensor platform. This instrument collects, prepares, and analyzes water samples in an automated manner. Sample analysis is done using the {mu}ChemLab{trademark} analysis module. This report uses analysis results of two datasets collected using the UWS to estimate performance of the device. The first dataset is made up of samples containing ricin at varying concentrations andmore » is used for assessing instrument response and detection probability. The second dataset is comprised of analyses of water samples collected at a water utility which are used to assess the false positive probability. The analyses of the two sets are used to estimate the Receiver Operating Characteristic or ROC curves for the device at one set of operational and detection algorithm parameters. For these parameters and based on a statistical estimate, the ricin probability of detection is about 0.9 at a concentration of 5 nM for a false positive probability of 1 x 10{sup -6}.« less
IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries
Jeraldo, Patricio; Kalari, Krishna; Chen, Xianfeng; Bhavsar, Jaysheel; Mangalam, Ashutosh; White, Bryan; Nelson, Heidi; Kocher, Jean-Pierre; Chia, Nicholas
2014-01-01
Motivation 16S rDNA hypervariable tag sequencing has become the de facto method for accessing microbial diversity. Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for this application. However, when the two reads do not overlap, existing computational pipelines analyze data from read separately and underutilize the information contained in the paired-end reads. Results We created a workflow known as Illinois Mayo Taxon Organization from RNA Dataset Operations (IM-TORNADO) for processing non-overlapping reads while retaining maximal information content. Using synthetic mock datasets, we show that the use of both reads produced answers with greater correlation to those from full length 16S rDNA when looking at taxonomy, phylogeny, and beta-diversity. Availability and Implementation IM-TORNADO is freely available at http://sourceforge.net/projects/imtornado and produces BIOM format output for cross compatibility with other pipelines such as QIIME, mothur, and phyloseq. PMID:25506826
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won
2014-08-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and an ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choi, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B.; Gupta, Neha; Kohane, Isaac S.; Green, Robert C.; Kong, Sek Won
2014-01-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates. PMID:24829188
Brown, Alan P; Drew, Philip; Knight, Brian; Marc, Philippe; Troth, Sean; Wuersch, Kuno; Zandee, Joyce
2016-12-01
Histopathology data comprise a critical component of pharmaceutical toxicology studies and are typically presented as finding incidence counts and severity scores per organ, and tabulated on multiple pages which can be challenging for review and aggregation of results. However, the SEND (Standard for Exchange of Nonclinical Data) standard provides a means for collecting and managing histopathology data in a uniform fashion which can allow informatics systems to archive, display and analyze data in novel ways. Various software applications have become available to convert histopathology data into graphical displays for analyses. A subgroup of the FDA-PhUSE Nonclinical Working Group conducted intra-industry surveys regarding the use of graphical displays of histopathology data. Visual cues, use-cases, the value of cross-domain and cross-study visualizations, and limitations were topics for discussion in the context of the surveys. The subgroup came to the following conclusions. Graphical displays appear advantageous as a communication tool to both pathologists and non-pathologists, and provide an efficient means for communicating pathology findings to project teams. Graphics can support hypothesis-generation which could include cross-domain interactive visualizations and/-or aggregating large datasets from multiple studies to observe and/or display patterns and trends. Incorporation of the SEND standard will provide a platform by which visualization tools will be able to aggregate, select and display information from complex and disparate datasets. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Changyong, Dou; Huadong, Guo; Chunming, Han; Ming, Liu
2014-03-01
With more and more Earth observation data available to the community, how to manage and sharing these valuable remote sensing datasets is becoming an urgent issue to be solved. The web based Geographical Information Systems (GIS) technology provides a convenient way for the users in different locations to share and make use of the same dataset. In order to efficiently use the airborne Synthetic Aperture Radar (SAR) remote sensing data acquired in the Airborne Remote Sensing Center of the Institute of Remote Sensing and Digital Earth (RADI), Chinese Academy of Sciences (CAS), a Web-GIS based platform for airborne SAR data management, distribution and sharing was designed and developed. The major features of the system include map based navigation search interface, full resolution imagery shown overlaid the map, and all the software adopted in the platform are Open Source Software (OSS). The functions of the platform include browsing the imagery on the map navigation based interface, ordering and downloading data online, image dataset and user management, etc. At present, the system is under testing in RADI and will come to regular operation soon.
Cross-Dependency Inference in Multi-Layered Networks: A Collaborative Filtering Perspective.
Chen, Chen; Tong, Hanghang; Xie, Lei; Ying, Lei; He, Qing
2017-08-01
The increasingly connected world has catalyzed the fusion of networks from different domains, which facilitates the emergence of a new network model-multi-layered networks. Examples of such kind of network systems include critical infrastructure networks, biological systems, organization-level collaborations, cross-platform e-commerce, and so forth. One crucial structure that distances multi-layered network from other network models is its cross-layer dependency, which describes the associations between the nodes from different layers. Needless to say, the cross-layer dependency in the network plays an essential role in many data mining applications like system robustness analysis and complex network control. However, it remains a daunting task to know the exact dependency relationships due to noise, limited accessibility, and so forth. In this article, we tackle the cross-layer dependency inference problem by modeling it as a collective collaborative filtering problem. Based on this idea, we propose an effective algorithm Fascinate that can reveal unobserved dependencies with linear complexity. Moreover, we derive Fascinate-ZERO, an online variant of Fascinate that can respond to a newly added node timely by checking its neighborhood dependencies. We perform extensive evaluations on real datasets to substantiate the superiority of our proposed approaches.
Driving forces of researchers mobility
NASA Astrophysics Data System (ADS)
Gargiulo, Floriana; Carletti, Timoteo
2014-05-01
Starting from the dataset of the publication corpus of the APS during the period 1955-2009, we reconstruct the individual researchers trajectories, namely the list of the consecutive affiliations for each scholar. Crossing this information with different geographic datasets we embed these trajectories in a spatial framework. Using methods from network theory and complex systems analysis we characterise these patterns in terms of topological network properties and we analyse the dependence of an academic path across different dimensions: the distance between two subsequent positions, the relative importance of the institutions (in terms of number of publications) and some socio-cultural traits. We show that distance is not always a good predictor for the next affiliation while other factors like ``the previous steps'' of the career of the researchers (in particular the first position) or the linguistic and historical similarity between two countries can have an important impact. Finally we show that the dataset exhibit a memory effect, hence the fate of a career strongly depends from the first two affiliations.
DART, a platform for the creation and registration of cone beam digital tomosynthesis datasets.
Sarkar, Vikren; Shi, Chengyu; Papanikolaou, Niko
2011-04-01
Digital tomosynthesis is an imaging modality that allows for tomographic reconstructions using only a fraction of the images needed for CT reconstruction. Since it offers the advantages of tomographic images with a smaller imaging dose delivered to the patient, the technique offers much promise for use in patient positioning prior to radiation delivery. This paper describes a software environment developed to help in the creation of digital tomosynthesis image sets from digital portal images using three different reconstruction algorithms. The software then allows for use of the tomograms for patient positioning or for dose recalculation if shifts are not applied, possibly as part of an adaptive radiotherapy regimen.
EFEHR - the European Facilities for Earthquake Hazard and Risk: beyond the web-platform
NASA Astrophysics Data System (ADS)
Danciu, Laurentiu; Wiemer, Stefan; Haslinger, Florian; Kastli, Philipp; Giardini, Domenico
2017-04-01
European Facilities for Earthquake Hazard and Risk (EEFEHR) represents the sustainable community resource for seismic hazard and risk in Europe. The EFEHR web platform is the main gateway to access data, models and tools as well as provide expertise relevant for assessment of seismic hazard and risk. The main services (databases and web-platform) are hosted at ETH Zurich and operated by the Swiss Seismological Service (Schweizerischer Erdbebendienst SED). EFEHR web-portal (www.efehr.org) collects and displays (i) harmonized datasets necessary for hazard and risk modeling, e.g. seismic catalogues, fault compilations, site amplifications, vulnerabilities, inventories; (ii) extensive seismic hazard products, namely hazard curves, uniform hazard spectra and maps for national and regional assessments. (ii) standardized configuration files for re-computing the regional seismic hazard models; (iv) relevant documentation of harmonized datasets, models and web-services. Today, EFEHR distributes full output of the 2013 European Seismic Hazard Model, ESHM13, as developed within the SHARE project (http://www.share-eu.org/); the latest results of the 2014 Earthquake Model of the Middle East (EMME14), derived within the EMME Project (www.emme-gem.org); the 2001 Global Seismic Hazard Assessment Project (GSHAP) results and the 2015 updates of the Swiss Seismic Hazard. New datasets related to either seismic hazard or risk will be incorporated as they become available. We present the currents status of the EFEHR platform, with focus on the challenges, summaries of the up-to-date datasets, user experience and feedback, as well as the roadmap to future technological innovation beyond the web-platform development. We also show the new services foreseen to fully integrate with the seismological core services of European Plate Observing System (EPOS).
Hoeck, W G
1994-06-01
InfoTrac TFD provides a graphical user interface (GUI) for viewing and manipulating datasets in the Transcription Factor Database, TFD. The interface was developed in Filemaker Pro 2.0 by Claris Corporation, which provides cross platform compatibility between Apple Macintosh computers running System 7.0 and higher and IBM-compatibles running Microsoft Windows 3.0 and higher. TFD ASCII-tables were formatted to fit data into several custom data tables using Add/Strip, a shareware utility and Filemaker Pro's lookup feature. The lookup feature was also put to use to allow TFD data tables to become linked within a flat-file database management system. The 'Navigator', consisting of several pop-up menus listing transcription factor abbreviations, facilitates the search for transcription factor entries. Data are presented onscreen in several layouts, that can be further customized by the user. InfoTrac TFD makes the transcription factor database accessible to a much wider community of scientists by making it available on two popular microcomputer platforms.
Efficient visualization of high-throughput targeted proteomics experiments: TAPIR.
Röst, Hannes L; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars
2015-07-15
Targeted mass spectrometry comprises a set of powerful methods to obtain accurate and consistent protein quantification in complex samples. To fully exploit these techniques, a cross-platform and open-source software stack based on standardized data exchange formats is required. We present TAPIR, a fast and efficient Python visualization software for chromatograms and peaks identified in targeted proteomics experiments. The input formats are open, community-driven standardized data formats (mzML for raw data storage and TraML encoding the hierarchical relationships between transitions, peptides and proteins). TAPIR is scalable to proteome-wide targeted proteomics studies (as enabled by SWATH-MS), allowing researchers to visualize high-throughput datasets. The framework integrates well with existing automated analysis pipelines and can be extended beyond targeted proteomics to other types of analyses. TAPIR is available for all computing platforms under the 3-clause BSD license at https://github.com/msproteomicstools/msproteomicstools. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Clustering and Network Analysis of Reverse Phase Protein Array Data.
Byron, Adam
2017-01-01
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.
Expósito, Roberto R; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan
2017-09-01
This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool. Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es . rreye@udc.es. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Cross-Cultural Detection of Depression from Nonverbal Behaviour.
Alghowinem, Sharifa; Goecke, Roland; Cohn, Jeffrey F; Wagner, Michael; Parker, Gordon; Breakspear, Michael
2015-05-01
Millions of people worldwide suffer from depression. Do commonalities exist in their nonverbal behavior that would enable cross-culturally viable screening and assessment of severity? We investigated the generalisability of an approach to detect depression severity cross-culturally using video-recorded clinical interviews from Australia, the USA and Germany. The material varied in type of interview, subtypes of depression and inclusion healthy control subjects, cultural background, and recording environment. The analysis focussed on temporal features of participants' eye gaze and head pose. Several approaches to training and testing within and between datasets were evaluated. The strongest results were found for training across all datasets and testing across datasets using leave-one-subject-out cross-validation. In contrast, generalisability was attenuated when training on only one or two of the three datasets and testing on subjects from the dataset(s) not used in training. These findings highlight the importance of using training data exhibiting the expected range of variability.
Wolff, Alexander; Bayerlová, Michaela; Gaedcke, Jochen; Kube, Dieter; Beißbarth, Tim
2018-01-01
Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances. Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff)) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. Tophat's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had only an overall mapping rate of 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67-0.69) than for the cell line dataset (ρ = 0.87-0.88). An exception were the correlation results of Cufflinks, which were substantially lower (ρ = 0.21-0.29 and 0.34-0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results. In conclusion the combination of STAR aligner with HTSeq-Count followed by STAR aligner with RSEM and Sailfish generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.
The vast datasets generated by next generation gene sequencing and expression profiling have transformed biological and translational research. However, technologies to produce large-scale functional genomics datasets, such as high-throughput detection of protein-protein interactions (PPIs), are still in early development. While a number of powerful technologies have been employed to detect PPIs, a singular PPI biosensor platform featured with both high sensitivity and robustness in a mammalian cell environment remains to be established.
Marcelino, Isabel; Lopes, David; Reis, Michael; Silva, Fernando; Laza, Rosalía; Pereira, António
2015-01-01
World's aging population is rising and the elderly are increasingly isolated socially and geographically. As a consequence, in many situations, they need assistance that is not granted in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect the elderly's behavior pattern deviations that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the alert system reliability and minimize eventual false positive alert situations. These variables comprehend information provided by body area network (BAN), by environment sensors, and also by the elderly's interaction in a service provider platform, called eServices--Elderly Support Service Platform. eServices is a scalable platform aggregating a service ecosystem developed specially for elderly people. This pattern recognition will further activate the adequate response. With the system evolution, it will learn to predict potential danger situations for a specified user, acting preventively and ensuring the elderly's safety and well-being. As the eServices platform is still in development, synthetic data, based on real data sample and empiric knowledge, is being used to populate the initial dataset. The presented work is a proof of concept of knowledge extraction using the eServices platform information. Regardless of not using real data, this work proves to be an asset, achieving a good performance in preventing alert situations.
Marcelino, Isabel; Laza, Rosalía
2015-01-01
World's aging population is rising and the elderly are increasingly isolated socially and geographically. As a consequence, in many situations, they need assistance that is not granted in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect the elderly's behavior pattern deviations that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the alert system reliability and minimize eventual false positive alert situations. These variables comprehend information provided by body area network (BAN), by environment sensors, and also by the elderly's interaction in a service provider platform, called eServices—Elderly Support Service Platform. eServices is a scalable platform aggregating a service ecosystem developed specially for elderly people. This pattern recognition will further activate the adequate response. With the system evolution, it will learn to predict potential danger situations for a specified user, acting preventively and ensuring the elderly's safety and well-being. As the eServices platform is still in development, synthetic data, based on real data sample and empiric knowledge, is being used to populate the initial dataset. The presented work is a proof of concept of knowledge extraction using the eServices platform information. Regardless of not using real data, this work proves to be an asset, achieving a good performance in preventing alert situations. PMID:25874219
GRDC. A Collaborative Framework for Radiological Background and Contextual Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brian J. Quiter; Ramakrishnan, Lavanya; Mark S. Bandstra
The Radiation Mobile Analysis Platform (RadMAP) is unique in its capability to collect both high quality radiological data from both gamma-ray detectors and fast neutron detectors and a broad array of contextual data that includes positioning and stance data, high-resolution 3D radiological data from weather sensors, LiDAR, and visual and hyperspectral cameras. The datasets obtained from RadMAP are both voluminous and complex and require analyses from highly diverse communities within both the national laboratory and academic communities. Maintaining a high level of transparency will enable analysis products to further enrich the RadMAP dataset. It is in this spirit of openmore » and collaborative data that the RadMAP team proposed to collect, calibrate, and make available online data from the RadMAP system. The Berkeley Data Cloud (BDC) is a cloud-based data management framework that enables web-based data browsing visualization, and connects curated datasets to custom workflows such that analysis products can be managed and disseminated while maintaining user access rights. BDC enables cloud-based analyses of large datasets in a manner that simulates real-time data collection, such that BDC can be used to test algorithm performance on real and source-injected datasets. Using the BDC framework, a subset of the RadMAP datasets have been disseminated via the Gamma Ray Data Cloud (GRDC) that is hosted through the National Energy Research Science Computing (NERSC) Center, enabling data access to over 40 users at 10 institutions.« less
KOLAM: a cross-platform architecture for scalable visualization and tracking in wide-area imagery
NASA Astrophysics Data System (ADS)
Fraser, Joshua; Haridas, Anoop; Seetharaman, Guna; Rao, Raghuveer M.; Palaniappan, Kannappan
2013-05-01
KOLAM is an open, cross-platform, interoperable, scalable and extensible framework supporting a novel multi- scale spatiotemporal dual-cache data structure for big data visualization and visual analytics. This paper focuses on the use of KOLAM for target tracking in high-resolution, high throughput wide format video also known as wide-area motion imagery (WAMI). It was originally developed for the interactive visualization of extremely large geospatial imagery of high spatial and spectral resolution. KOLAM is platform, operating system and (graphics) hardware independent, and supports embedded datasets scalable from hundreds of gigabytes to feasibly petabytes in size on clusters, workstations, desktops and mobile computers. In addition to rapid roam, zoom and hyper- jump spatial operations, a large number of simultaneously viewable embedded pyramid layers (also referred to as multiscale or sparse imagery), interactive colormap and histogram enhancement, spherical projection and terrain maps are supported. The KOLAM software architecture was extended to support airborne wide-area motion imagery by organizing spatiotemporal tiles in very large format video frames using a temporal cache of tiled pyramid cached data structures. The current version supports WAMI animation, fast intelligent inspection, trajectory visualization and target tracking (digital tagging); the latter by interfacing with external automatic tracking software. One of the critical needs for working with WAMI is a supervised tracking and visualization tool that allows analysts to digitally tag multiple targets, quickly review and correct tracking results and apply geospatial visual analytic tools on the generated trajectories. One-click manual tracking combined with multiple automated tracking algorithms are available to assist the analyst and increase human effectiveness.
Cross-species transferability and mapping of genomic and cDNA SSRs in pines
D. Chagne; P. Chaumeil; A. Ramboer; C. Collada; A. Guevara; M. T. Cervera; G. G. Vendramin; V. Garcia; J-M. Frigerio; Craig Echt; T. Richardson; Christophe Plomion
2004-01-01
Two unigene datasets of Pinus taeda and Pinus pinaster were screened to detect di-, tri and tetranucleotide repeated motifs using the SSRIT script. A total of 419 simple sequence repeats (SSRs) were identified, from which only 12.8% overlapped between the two sets. The position of the SSRs within the coding sequence were predicted...
Observations of Stratiform Lightning Flashes and Their Microphysical and Kinematic Environments
NASA Technical Reports Server (NTRS)
Lang, Timothy J.; Williams, Earle
2016-01-01
During the Midlatitude Continental Convective Clouds Experiment (MC3E), combined observations of clouds and precipitation were made from airborne and ground-based in situ and remote sensing platforms. These observations were coordinated for multiple mesoscale convective systems (MCSs) that passed over the MC3E domain in northern Oklahoma. Notably, during a storm on 20 May 2011 in situ and remote sensing airborne observations were made near the times and locations of stratiform positive cloud-to-ground (+CG) lightning flashes. These +CGs resulted from extremely large stratiform lightning flashes that were hundreds of km in length and lasted several seconds. This dataset provides an unprecedented look at kinematic and microphysical environments in the vicinity of large, powerful, and long-lived stratiform lightning flashes. We will use this dataset to understand the influence of low liquid water contents (LWCs) in the electrical charging of MCS stratiform regions.
Observations of Stratiform Lightning Flashes and Their Microphysical and Kinematic Environments
NASA Technical Reports Server (NTRS)
Lang, Timothy J.; Williams, Earle
2017-01-01
During the Midlatitude Continental Convective Clouds Experiment (MC3E), combined observations of clouds and precipitation were made from airborne and ground-based in situ and remote sensing platforms. These observations were coordinated for multiple mesoscale convective systems (MCSs) that passed over the MC3E domain in northern Oklahoma. Notably, during a storm on 20 May 2011 in situ and remote sensing airborne observations were made near the times and locations of stratiform positive cloud-to-ground (+CG) lightning flashes. These +CGs resulted from extremely large stratiform lightning flashes that were hundreds of km in length and lasted several seconds. This dataset provides an unprecedented look at kinematic and microphysical environments in the vicinity of large, powerful, and long-lived stratiform lightning flashes. We will use this dataset to understand the influence of low liquid water contents (LWCs) in the electrical charging of MCS stratiform regions.
Modular Track System For Positioning Mobile Robots
NASA Technical Reports Server (NTRS)
Miller, Jeff
1995-01-01
Conceptual system for positioning mobile robotic manipulators on large main structure includes modular tracks and ancillary structures assembled easily along with main structure. System, called "tracked robotic location system" (TROLS), originally intended for application to platforms in outer space, but TROLS concept might also prove useful on Earth; for example, to position robots in factories and warehouses. T-cross-section rail keeps mobile robot on track. Bar codes mark locations along track. Each robot equipped with bar-code-recognizing circuitry so it quickly finds way to assigned location.
A fully automated non-external marker 4D-CT sorting algorithm using a serial cine scanning protocol.
Carnes, Greg; Gaede, Stewart; Yu, Edward; Van Dyk, Jake; Battista, Jerry; Lee, Ting-Yim
2009-04-07
Current 4D-CT methods require external marker data to retrospectively sort image data and generate CT volumes. In this work we develop an automated 4D-CT sorting algorithm that performs without the aid of data collected from an external respiratory surrogate. The sorting algorithm requires an overlapping cine scan protocol. The overlapping protocol provides a spatial link between couch positions. Beginning with a starting scan position, images from the adjacent scan position (which spatial match the starting scan position) are selected by maximizing the normalized cross correlation (NCC) of the images at the overlapping slice position. The process was continued by 'daisy chaining' all couch positions using the selected images until an entire 3D volume was produced. The algorithm produced 16 phase volumes to complete a 4D-CT dataset. Additional 4D-CT datasets were also produced using external marker amplitude and phase angle sorting methods. The image quality of the volumes produced by the different methods was quantified by calculating the mean difference of the sorted overlapping slices from adjacent couch positions. The NCC sorted images showed a significant decrease in the mean difference (p < 0.01) for the five patients.
Cignetti, Fabien; Zedka, Milan; Vaugoyeau, Marianne; Assaiante, Christine
2013-01-01
Although there is suggestive evidence that a link exists between independent walking and the ability to establish anticipatory strategy to stabilize posture, the extent to which this skill facilitates the development of anticipatory postural control remains largely unknown. Here, we examined the role of independent walking on the infants' ability to anticipate predictable external perturbations. Non-walking infants, walking infants and adults were sitting on a platform that produced continuous rotation in the frontal plane. Surface electromyography (EMG) of neck and lower back muscles and the positions of markers located on the platform, the upper body and the head were recorded. Results from cross-correlation analysis between rectified and filtered EMGs and platform movement indicated that although muscle activation already occurred before platform movement in non-walking infants, only walking infants demonstrated an adult-like ability for anticipation. Moreover, results from further cross-correlation analysis between segmental angular displacement and platform movement together with measures of balance control at the end-points of rotation of the platform evidenced two sorts of behaviour. The adults behaved as a non-rigid non-inverted pendulum, rather stabilizing head in space, while both the walking and non-walking infants followed the platform, behaving as a rigid inverted pendulum. These results suggest that the acquisition of independent walking plays a role in the development of anticipatory postural control, likely improving the internal model for the sensorimotor control of posture. However, despite such improvement, integrating the dynamics of an external object, here the platform, within the model to maintain balance still remains challenging in infants.
Skounakis, Emmanouil; Farmaki, Christina; Sakkalis, Vangelis; Roniotis, Alexandros; Banitsas, Konstantinos; Graf, Norbert; Marias, Konstantinos
2010-01-01
This paper presents a novel, open access interactive platform for 3D medical image analysis, simulation and visualization, focusing in oncology images. The platform was developed through constant interaction and feedback from expert clinicians integrating a thorough analysis of their requirements while having an ultimate goal of assisting in accurately delineating tumors. It allows clinicians not only to work with a large number of 3D tomographic datasets but also to efficiently annotate multiple regions of interest in the same session. Manual and semi-automatic segmentation techniques combined with integrated correction tools assist in the quick and refined delineation of tumors while different users can add different components related to oncology such as tumor growth and simulation algorithms for improving therapy planning. The platform has been tested by different users and over large number of heterogeneous tomographic datasets to ensure stability, usability, extensibility and robustness with promising results. the platform, a manual and tutorial videos are available at: http://biomodeling.ics.forth.gr. it is free to use under the GNU General Public License.
Optimisation of multiplet identifier processing on a PLAYSTATION® 3
NASA Astrophysics Data System (ADS)
Hattori, Masami; Mizuno, Takashi
2010-02-01
To enable high-performance computing (HPC) for applications with large datasets using a Sony® PLAYSTATION® 3 (PS3™) video game console, we configured a hybrid system consisting of a Windows® PC and a PS3™. To validate this system, we implemented the real-time multiplet identifier (RTMI) application, which identifies multiplets of microearthquakes in terms of the similarity of their waveforms. The cross-correlation computation, which is a core algorithm of the RTMI application, was optimised for the PS3™ platform, while the rest of the computation, including data input and output remained on the PC. With this configuration, the core part of the algorithm ran 69 times faster than the original program, accelerating total computation speed more than five times. As a result, the system processed up to 2100 total microseismic events, whereas the original implementation had a limit of 400 events. These results indicate that this system enables high-performance computing for large datasets using the PS3™, as long as data transfer time is negligible compared with computation time.
BiodMHC: an online server for the prediction of MHC class II-peptide binding affinity.
Wang, Lian; Pan, Danling; Hu, Xihao; Xiao, Jinyu; Gao, Yangyang; Zhang, Huifang; Zhang, Yan; Liu, Juan; Zhu, Shanfeng
2009-05-01
Effective identification of major histocompatibility complex (MHC) molecules restricted peptides is a critical step in discovering immune epitopes. Although many online servers have been built to predict class II MHC-peptide binding affinity, they have been trained on different datasets, and thus fail in providing a unified comparison of various methods. In this paper, we present our implementation of seven popular predictive methods, namely SMM-align, ARB, SVR-pairwise, Gibbs sampler, ProPred, LP-top2, and MHCPred, on a single web server named BiodMHC (http://biod.whu.edu.cn/BiodMHC/index.html, the software is available upon request). Using a standard measure of AUC (Area Under the receiver operating characteristic Curves), we compare these methods by means of not only cross validation but also prediction on independent test datasets. We find that SMM-align, ProPred, SVR-pairwise, ARB, and Gibbs sampler are the five best-performing methods. For the binding affinity prediction of class II MHC-peptide, BiodMHC provides a convenient online platform for researchers to obtain binding information simultaneously using various methods.
Automating an integrated spatial data-mining model for landfill site selection
NASA Astrophysics Data System (ADS)
Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Aziz, Hamidi Abdul
2017-10-01
An integrated programming environment represents a robust approach to building a valid model for landfill site selection. One of the main challenges in the integrated model is the complicated processing and modelling due to the programming stages and several limitations. An automation process helps avoid the limitations and improve the interoperability between integrated programming environments. This work targets the automation of a spatial data-mining model for landfill site selection by integrating between spatial programming environment (Python-ArcGIS) and non-spatial environment (MATLAB). The model was constructed using neural networks and is divided into nine stages distributed between Matlab and Python-ArcGIS. A case study was taken from the north part of Peninsular Malaysia. 22 criteria were selected to utilise as input data and to build the training and testing datasets. The outcomes show a high-performance accuracy percentage of 98.2% in the testing dataset using 10-fold cross validation. The automated spatial data mining model provides a solid platform for decision makers to performing landfill site selection and planning operations on a regional scale.
Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies
Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...
2015-04-14
During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequencemore » datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.« less
Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H.; Lau, Ching C.; Behl, Sanjiv; Man, Tsz-Kwong
2007-01-01
With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License. PMID:19936083
Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H; Lau, Ching C; Behl, Sanjiv; Man, Tsz-Kwong
2007-10-06
With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.
Establishing a process for conducting cross-jurisdictional record linkage in Australia.
Moore, Hannah C; Guiver, Tenniel; Woollacott, Anthony; de Klerk, Nicholas; Gidding, Heather F
2016-04-01
To describe the realities of conducting a cross-jurisdictional data linkage project involving state and Australian Government-based data collections to inform future national data linkage programs of work. We outline the processes involved in conducting a Proof of Concept data linkage project including the implementation of national data integration principles, data custodian and ethical approval requirements, and establishment of data flows. The approval process involved nine approval and regulatory bodies and took more than two years. Data will be linked across 12 datasets involving three data linkage centres. A framework was established to allow data to flow between these centres while maintaining the separation principle that serves to protect the privacy of the individual. This will be the first project to link child immunisation records from an Australian Government dataset to other administrative health datasets for a population cohort covering 2 million births in two Australian states. Although the project experienced some delays, positive outcomes were realised, primarily the development of strong collaborations across key stakeholder groups including community engagement. We have identified several recommendations and enhancements to this now established framework to further streamline the process for data linkage studies involving Australian Government data. © 2015 Public Health Association of Australia.
CROPPER: a metagene creator resource for cross-platform and cross-species compendium studies.
Paananen, Jussi; Storvik, Markus; Wong, Garry
2006-09-22
Current genomic research methods provide researchers with enormous amounts of data. Combining data from different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from different heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that could facilitate the combining of data from heterogeneous sources, and thus allow researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies. We have developed a web-based software resource, called CROPPER that uses the latest genomic information concerning different data identifiers and orthologous genes from the Ensembl database. CROPPER can be used to combine genomic data from different heterogeneous sources, allowing researchers to perform cross-platform/cross-species compendium studies without the need for complex computational tools or the requirement of setting up one's own in-house database. We also present an example of a simple cross-platform/cross-species compendium study based on publicly available Parkinson's disease data derived from different sources. CROPPER is a user-friendly and freely available web-based software resource that can be successfully used for cross-species/cross-platform compendium studies.
Multi-GNSS PPP-RTK: From Large- to Small-Scale Networks
Nadarajah, Nandakumaran; Wang, Kan; Choudhury, Mazher
2018-01-01
Precise point positioning (PPP) and its integer ambiguity resolution-enabled variant, PPP-RTK (real-time kinematic), can benefit enormously from the integration of multiple global navigation satellite systems (GNSS). In such a multi-GNSS landscape, the positioning convergence time is expected to be reduced considerably as compared to the one obtained by a single-GNSS setup. It is therefore the goal of the present contribution to provide numerical insights into the role taken by the multi-GNSS integration in delivering fast and high-precision positioning solutions (sub-decimeter and centimeter levels) using PPP-RTK. To that end, we employ the Curtin PPP-RTK platform and process data-sets of GPS, BeiDou Navigation Satellite System (BDS) and Galileo in stand-alone and combined forms. The data-sets are collected by various receiver types, ranging from high-end multi-frequency geodetic receivers to low-cost single-frequency mass-market receivers. The corresponding stations form a large-scale (Australia-wide) network as well as a small-scale network with inter-station distances less than 30 km. In case of the Australia-wide GPS-only ambiguity-float setup, 90% of the horizontal positioning errors (kinematic mode) are shown to become less than five centimeters after 103 min. The stated required time is reduced to 66 min for the corresponding GPS + BDS + Galieo setup. The time is further reduced to 15 min by applying single-receiver ambiguity resolution. The outcomes are supported by the positioning results of the small-scale network. PMID:29614040
Multi-GNSS PPP-RTK: From Large- to Small-Scale Networks.
Nadarajah, Nandakumaran; Khodabandeh, Amir; Wang, Kan; Choudhury, Mazher; Teunissen, Peter J G
2018-04-03
Precise point positioning (PPP) and its integer ambiguity resolution-enabled variant, PPP-RTK (real-time kinematic), can benefit enormously from the integration of multiple global navigation satellite systems (GNSS). In such a multi-GNSS landscape, the positioning convergence time is expected to be reduced considerably as compared to the one obtained by a single-GNSS setup. It is therefore the goal of the present contribution to provide numerical insights into the role taken by the multi-GNSS integration in delivering fast and high-precision positioning solutions (sub-decimeter and centimeter levels) using PPP-RTK. To that end, we employ the Curtin PPP-RTK platform and process data-sets of GPS, BeiDou Navigation Satellite System (BDS) and Galileo in stand-alone and combined forms. The data-sets are collected by various receiver types, ranging from high-end multi-frequency geodetic receivers to low-cost single-frequency mass-market receivers. The corresponding stations form a large-scale (Australia-wide) network as well as a small-scale network with inter-station distances less than 30 km. In case of the Australia-wide GPS-only ambiguity-float setup, 90% of the horizontal positioning errors (kinematic mode) are shown to become less than five centimeters after 103 min. The stated required time is reduced to 66 min for the corresponding GPS + BDS + Galieo setup. The time is further reduced to 15 min by applying single-receiver ambiguity resolution. The outcomes are supported by the positioning results of the small-scale network.
A reference human genome dataset of the BGISEQ-500 sequencer.
Huang, Jie; Liang, Xinming; Xuan, Yuankai; Geng, Chunyu; Li, Yuxiang; Lu, Haorong; Qu, Shoufang; Mei, Xianglin; Chen, Hongbo; Yu, Ting; Sun, Nan; Rao, Junhua; Wang, Jiahao; Zhang, Wenwei; Chen, Ying; Liao, Sha; Jiang, Hui; Liu, Xin; Yang, Zhaopeng; Mu, Feng; Gao, Shangxian
2017-05-01
BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%) better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). But for insertions and deletions (indels), we found lower accuracy for BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform. © The Authors 2017. Published by Oxford University Press.
Learning to recognize rat social behavior: Novel dataset and cross-dataset application.
Lorbach, Malte; Kyriakou, Elisavet I; Poppe, Ronald; van Dam, Elsbeth A; Noldus, Lucas P J J; Veltkamp, Remco C
2018-04-15
Social behavior is an important aspect of rodent models. Automated measuring tools that make use of video analysis and machine learning are an increasingly attractive alternative to manual annotation. Because machine learning-based methods need to be trained, it is important that they are validated using data from different experiment settings. To develop and validate automated measuring tools, there is a need for annotated rodent interaction datasets. Currently, the availability of such datasets is limited to two mouse datasets. We introduce the first, publicly available rat social interaction dataset, RatSI. We demonstrate the practical value of the novel dataset by using it as the training set for a rat interaction recognition method. We show that behavior variations induced by the experiment setting can lead to reduced performance, which illustrates the importance of cross-dataset validation. Consequently, we add a simple adaptation step to our method and improve the recognition performance. Most existing methods are trained and evaluated in one experimental setting, which limits the predictive power of the evaluation to that particular setting. We demonstrate that cross-dataset experiments provide more insight in the performance of classifiers. With our novel, public dataset we encourage the development and validation of automated recognition methods. We are convinced that cross-dataset validation enhances our understanding of rodent interactions and facilitates the development of more sophisticated recognition methods. Combining them with adaptation techniques may enable us to apply automated recognition methods to a variety of animals and experiment settings. Copyright © 2017 Elsevier B.V. All rights reserved.
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
Thorsen, Jonathan; Brejnrod, Asker; Mortensen, Martin; Rasmussen, Morten A; Stokholm, Jakob; Al-Soud, Waleed Abu; Sørensen, Søren; Bisgaard, Hans; Waage, Johannes
2016-11-25
There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources. Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power. Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
BEASTling: A software tool for linguistic phylogenetics using BEAST 2
Forkel, Robert; Kaiping, Gereon A.; Atkinson, Quentin D.
2017-01-01
We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts. PMID:28796784
BEASTling: A software tool for linguistic phylogenetics using BEAST 2.
Maurits, Luke; Forkel, Robert; Kaiping, Gereon A; Atkinson, Quentin D
2017-01-01
We present a new open source software tool called BEASTling, designed to simplify the preparation of Bayesian phylogenetic analyses of linguistic data using the BEAST 2 platform. BEASTling transforms comparatively short and human-readable configuration files into the XML files used by BEAST to specify analyses. By taking advantage of Creative Commons-licensed data from the Glottolog language catalog, BEASTling allows the user to conveniently filter datasets using names for recognised language families, to impose monophyly constraints so that inferred language trees are backward compatible with Glottolog classifications, or to assign geographic location data to languages for phylogeographic analyses. Support for the emerging cross-linguistic linked data format (CLDF) permits easy incorporation of data published in cross-linguistic linked databases into analyses. BEASTling is intended to make the power of Bayesian analysis more accessible to historical linguists without strong programming backgrounds, in the hopes of encouraging communication and collaboration between those developing computational models of language evolution (who are typically not linguists) and relevant domain experts.
NASA Technical Reports Server (NTRS)
Mast, F. W.; Newby, N. J.; Young, L. R.
2002-01-01
The effects of cross-coupled stimuli on the semicircular canals are shown to be influenced by the position of the subject's head with respect to gravity and the axis of rotation, but not by the subject's head position relative to the trunk. Seventeen healthy subjects made head yaw movements out of the horizontal plane while lying on a horizontal platform (MIT short radius centrifuge) rotating at 23 rpm about an earth-vertical axis. The subjects reported the magnitude and duration of the illusory pitch or roll sensations elicited by the cross-coupled rotational stimuli acting on the semicircular canals. The results suggest an influence of head position relative to gravity. The magnitude estimation is higher and the sensation decays more slowly when the head's final position is toward nose-up (gravity in the subject's head x-z-plane) compared to when the head is turned toward the side (gravity in the subject's head y-z-plane). The results are discussed with respect to artificial gravity in space and the possible role of pre-adaptation to cross-coupled angular accelerations on earth.
Exploring Plant Co-Expression and Gene-Gene Interactions with CORNET 3.0.
Van Bel, Michiel; Coppens, Frederik
2017-01-01
Selecting and filtering a reference expression and interaction dataset when studying specific pathways and regulatory interactions can be a very time-consuming and error-prone task. In order to reduce the duplicated efforts required to amass such datasets, we have created the CORNET (CORrelation NETworks) platform which allows for easy access to a wide variety of data types: coexpression data, protein-protein interactions, regulatory interactions, and functional annotations. The CORNET platform outputs its results in either text format or through the Cytoscape framework, which is automatically launched by the CORNET website.CORNET 3.0 is the third iteration of the web platform designed for the user exploration of the coexpression space of plant genomes, with a focus on the model species Arabidopsis thaliana. Here we describe the platform: the tools, data, and best practices when using the platform. We indicate how the platform can be used to infer networks from a set of input genes, such as upregulated genes from an expression experiment. By exploring the network, new target and regulator genes can be discovered, allowing for follow-up experiments and more in-depth study. We also indicate how to avoid common pitfalls when evaluating the networks and how to avoid over interpretation of the results.All CORNET versions are available at http://bioinformatics.psb.ugent.be/cornet/ .
McAllister, Shane C; Schleiss, Mark R; Arbefeville, Sophie; Steiner, Marie E; Hanson, Ryan S; Pollock, Catherine; Ferrieri, Patricia
2015-01-01
Enterovirus D68 (EV-D68) is an emerging virus known to cause sporadic disease and occasional epidemics of severe lower respiratory tract infection. However, the true prevalence of infection with EV-D68 is unknown, due in part to the lack of a rapid and specific nucleic acid amplification test as well as the infrequency with which respiratory samples are analyzed by enterovirus surveillance programs. During the 2014 EV-D68 epidemic in the United States, we noted an increased frequency of "low-positive" results for human rhinovirus (HRV) detected in respiratory tract samples using the GenMark Diagnostics eSensor respiratory viral panel, a multiplex PCR assay able to detect 14 known respiratory viruses but not enteroviruses. We simultaneously noted markedly increased admissions to our Pediatric Intensive Care Unit for severe lower respiratory tract infections in patients both with and without a history of reactive airway disease. Accordingly, we hypothesized that these "low-positive" RVP results were due to EV-D68 rather than rhinovirus infection. Sequencing of the picornavirus 5' untranslated region (5'-UTR) of 49 samples positive for HRV by the GenMark RVP revealed that 33 (67.3%) were in fact EV-D68. Notably, the mean intensity of the HRV RVP result was significantly lower in the sequence-identified EV-D68 samples (20.3 nA) compared to HRV (129.7 nA). Using a cut-off of 40 nA for the differentiation of EV-D68 from HRV resulted in 94% sensitivity and 88% specificity. The robust diagnostic characteristics of our data suggest that the cross-reactivity of EV-D68 and HRV on the GenMark Diagnostics eSensor RVP platform may be an important factor to consider in making accurate molecular diagnosis of EV-D68 at institutions utilizing this system or other molecular respiratory platforms that may also cross-react.
PR-PR: Cross-Platform Laboratory Automation System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Linshiz, G; Stawski, N; Goyal, G
To enable protocol standardization, sharing, and efficient implementation across laboratory automation platforms, we have further developed the PR-PR open-source high-level biology-friendly robot programming language as a cross-platform laboratory automation system. Beyond liquid-handling robotics, PR-PR now supports microfluidic and microscopy platforms, as well as protocol translation into human languages, such as English. While the same set of basic PR-PR commands and features are available for each supported platform, the underlying optimization and translation modules vary from platform to platform. Here, we describe these further developments to PR-PR, and demonstrate the experimental implementation and validation of PR-PR protocols for combinatorial modified Goldenmore » Gate DNA assembly across liquid-handling robotic, microfluidic, and manual platforms. To further test PR-PR cross-platform performance, we then implement and assess PR-PR protocols for Kunkel DNA mutagenesis and hierarchical Gibson DNA assembly for microfluidic and manual platforms.« less
PR-PR: cross-platform laboratory automation system.
Linshiz, Gregory; Stawski, Nina; Goyal, Garima; Bi, Changhao; Poust, Sean; Sharma, Monica; Mutalik, Vivek; Keasling, Jay D; Hillson, Nathan J
2014-08-15
To enable protocol standardization, sharing, and efficient implementation across laboratory automation platforms, we have further developed the PR-PR open-source high-level biology-friendly robot programming language as a cross-platform laboratory automation system. Beyond liquid-handling robotics, PR-PR now supports microfluidic and microscopy platforms, as well as protocol translation into human languages, such as English. While the same set of basic PR-PR commands and features are available for each supported platform, the underlying optimization and translation modules vary from platform to platform. Here, we describe these further developments to PR-PR, and demonstrate the experimental implementation and validation of PR-PR protocols for combinatorial modified Golden Gate DNA assembly across liquid-handling robotic, microfluidic, and manual platforms. To further test PR-PR cross-platform performance, we then implement and assess PR-PR protocols for Kunkel DNA mutagenesis and hierarchical Gibson DNA assembly for microfluidic and manual platforms.
Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J.
2017-01-01
Abstract Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. PMID:28472340
Data Type Registry - Cross Road Between Catalogs, Data And Semantics
NASA Astrophysics Data System (ADS)
Richard, S. M.; Zaslavsky, I.; Bristol, S.
2017-12-01
As more data become accessible online, the opportunity is increasing to improve search for information within datasets and for automating some levels of data integration. A prerequisite for these advances is indexing the kinds of information that are present in datasets and providing machine actionable descriptions of data structures. We are exploring approaches to enabling these capabilities in the EarthCube DigitalCrust and Data Discovery Hub Building Block projects, building on the Data type registry (DTR) workgroup activity in the Research Data Alliance. We are prototyping a registry implementation using the CNRI Cordra platform and API to enable 'deep registration' of datasets for building hydrogeologic models of the Earth's Crust, and executing complex science scenarios for river chemistry and coral bleaching data. These use cases require the ability to respond to queries such as: What are properties of Entity X; What entities include property Y (or L, M, N…), and What DataTypes are about Entity X and include property Y. Development of the registry to enable these capabilities requires more in-depth metadata than is commonly available, so we are also exploring approaches to analyzing simple tabular data to automate recognition of entities and properties, and assist users with establishing semantic mappings to data integration vocabularies. This poster will review the current capabilities and implementation of a data type registry.
When drug discovery meets web search: Learning to Rank for ligand-based virtual screening.
Zhang, Wei; Ji, Lijuan; Chen, Yanan; Tang, Kailin; Wang, Haiping; Zhu, Ruixin; Jia, Wei; Cao, Zhiwei; Liu, Qi
2015-01-01
The rapid increase in the emergence of novel chemical substances presents a substantial demands for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank in web search was presented in drug virtual screening, which has the following unique capabilities of 1). Applicable of identifying compounds on novel targets when there is not enough training data available for these targets, and 2). Integration of heterogeneous data when compound affinities are measured in different platforms. A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated based on two public datasets collected from Binding Database and the newly-published Community Structure-Activity Resource benchmark dataset. The results have demonstrated that Learning to rank is an efficient computational strategy for drug virtual screening, particularly due to its novel use in cross-target virtual screening and heterogeneous data integration. To the best of our knowledge, we have introduced here the first application of Learning to Rank in virtual screening. The experiment workflow and algorithm assessment designed in this study will provide a standard protocol for other similar studies. All the datasets as well as the implementations of Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html. Graphical AbstractThe analogy between web search and ligand-based drug discovery.
Building a better search engine for earth science data
NASA Astrophysics Data System (ADS)
Armstrong, E. M.; Yang, C. P.; Moroni, D. F.; McGibbney, L. J.; Jiang, Y.; Huang, T.; Greguska, F. R., III; Li, Y.; Finch, C. J.
2017-12-01
Free text data searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth sciences data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC) the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms like Elasticsearch. Regardless, the default implementations of these search engines leveraging factors such as dataset popularity, term frequency and inverse document term frequency do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group that has assigned several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height and gravity that comprise a total count of over 500 public availability datasets. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery) funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations to related datasets and related user queries. This presentation will report on use cases that drove the architecture and development, and the success metrics and improvements on search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search interfaces.
NASA Astrophysics Data System (ADS)
Shute, J.; Carriere, L.; Duffy, D.; Hoy, E.; Peters, J.; Shen, Y.; Kirschbaum, D.
2017-12-01
The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center is building and maintaining an Enterprise GIS capability for its stakeholders, to include NASA scientists, industry partners, and the public. This platform is powered by three GIS subsystems operating in a highly-available, virtualized environment: 1) the Spatial Analytics Platform is the primary NCCS GIS and provides users discoverability of the vast DigitalGlobe/NGA raster assets within the NCCS environment; 2) the Disaster Mapping Platform provides mapping and analytics services to NASA's Disaster Response Group; and 3) the internal (Advanced Data Analytics Platform/ADAPT) enterprise GIS provides users with the full suite of Esri and open source GIS software applications and services. All systems benefit from NCCS's cutting edge infrastructure, to include an InfiniBand network for high speed data transfers; a mixed/heterogeneous environment featuring seamless sharing of information between Linux and Windows subsystems; and in-depth system monitoring and warning systems. Due to its co-location with the NCCS Discover High Performance Computing (HPC) environment and the Advanced Data Analytics Platform (ADAPT), the GIS platform has direct access to several large NCCS datasets including DigitalGlobe/NGA, Landsat, MERRA, and MERRA2. Additionally, the NCCS ArcGIS Desktop Windows virtual machines utilize existing NetCDF and OPeNDAP assets for visualization, modelling, and analysis - thus eliminating the need for data duplication. With the advent of this platform, Earth scientists have full access to vast data repositories and the industry-leading tools required for successful management and analysis of these multi-petabyte, global datasets. The full system architecture and integration with scientific datasets will be presented. Additionally, key applications and scientific analyses will be explained, to include the NASA Global Landslide Catalog (GLC) Reporter crowdsourcing application, the NASA GLC Viewer discovery and analysis tool, the DigitalGlobe/NGA Data Discovery Tool, the NASA Disaster Response Group Mapping Platform (https://maps.disasters.nasa.gov), and support for NASA's Arctic - Boreal Vulnerability Experiment (ABoVE).
IN and CCN Measurements on RV Polarstern and Cape Verde
NASA Astrophysics Data System (ADS)
Welti, André; Herenz, Paul; Henning, Silvia; Stratmann, Frank
2016-04-01
Two field campaigns, one situated on RV Polarstern (Oct. - Dec. 2015) and one on the Cape Verde islands (Jan. - Feb. 2016) measuring ice nuclei (IN) and cloud condensation nuclei (CCN) concentrations as a function of supersaturation and temperature are presented. The Polarstern cruise from Bremerhaven to Cape Town yields a cross section of IN and CCN concentrations from 54°N to 35°S and passes the Cape Verde Islands at 15°N. Measurements were conducted using the commercial CCNC and SPIN instruments from DMT. During both campaigns, a comprehensive set of aerosol characterization data including size distribution, optical properties and chemical information were measured in parallel. The ship based measurements provide a measure of variability in IN/CCN concentration with geographic position. As an example a clear influence on IN and CCN number concentration of the Saharan desert dust outflow between the Canary Islands and Cape Verde or the continental aerosol from Europe and South Africa was observed. The measurements on Cape Verde provide information on the temporal variability at a fixed position varying between clean marine and dust influenced conditions. Both datasets are related to auxiliary data of aerosol size distribution and chemical composition. The datasets are used to distinguish the influence of local sources and background concentration of IN/CCN. By combining of the geographically fix measurements with the geographical cross section, typical ranges of IN and CCN concentration are derived. The datasets will be part of the BACCHUS database thereby providing valuable input for future climate modeling activities.
TDat: An Efficient Platform for Processing Petabyte-Scale Whole-Brain Volumetric Images.
Li, Yuxin; Gong, Hui; Yang, Xiaoquan; Yuan, Jing; Jiang, Tao; Li, Xiangning; Sun, Qingtao; Zhu, Dan; Wang, Zhenyu; Luo, Qingming; Li, Anan
2017-01-01
Three-dimensional imaging of whole mammalian brains at single-neuron resolution has generated terabyte (TB)- and even petabyte (PB)-sized datasets. Due to their size, processing these massive image datasets can be hindered by the computer hardware and software typically found in biological laboratories. To fill this gap, we have developed an efficient platform named TDat, which adopts a novel data reformatting strategy by reading cuboid data and employing parallel computing. In data reformatting, TDat is more efficient than any other software. In data accessing, we adopted parallelization to fully explore the capability for data transmission in computers. We applied TDat in large-volume data rigid registration and neuron tracing in whole-brain data with single-neuron resolution, which has never been demonstrated in other studies. We also showed its compatibility with various computing platforms, image processing software and imaging systems.
NASA Astrophysics Data System (ADS)
Saxena, Nishank; Hofmann, Ronny; Alpak, Faruk O.; Berg, Steffen; Dietderich, Jesse; Agarwal, Umang; Tandon, Kunj; Hunter, Sander; Freeman, Justin; Wilson, Ove Bjorn
2017-11-01
We generate a novel reference dataset to quantify the impact of numerical solvers, boundary conditions, and simulation platforms. We consider a variety of microstructures ranging from idealized pipes to digital rocks. Pore throats of the digital rocks considered are large enough to be well resolved with state-of-the-art micro-computerized tomography technology. Permeability is computed using multiple numerical engines, 12 in total, including, Lattice-Boltzmann, computational fluid dynamics, voxel based, fast semi-analytical, and known empirical models. Thus, we provide a measure of uncertainty associated with flow computations of digital media. Moreover, the reference and standards dataset generated is the first of its kind and can be used to test and improve new fluid flow algorithms. We find that there is an overall good agreement between solvers for idealized cross-section shape pipes. As expected, the disagreement increases with increase in complexity of the pore space. Numerical solutions for pipes with sinusoidal variation of cross section show larger variability compared to pipes of constant cross-section shapes. We notice relatively larger variability in computed permeability of digital rocks with coefficient of variation (of up to 25%) in computed values between various solvers. Still, these differences are small given other subsurface uncertainties. The observed differences between solvers can be attributed to several causes including, differences in boundary conditions, numerical convergence criteria, and parameterization of fundamental physics equations. Solvers that perform additional meshing of irregular pore shapes require an additional step in practical workflows which involves skill and can introduce further uncertainty. Computation times for digital rocks vary from minutes to several days depending on the algorithm and available computational resources. We find that more stringent convergence criteria can improve solver accuracy but at the expense of longer computation time.
Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R.; Stewart, Walter F.; Malin, Bradley; Sun, Jimeng
2014-01-01
Objective Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: 1) cohort construction, 2) feature construction, 3) cross-validation, 4) feature selection, and 5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. Methods To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which 1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, 2) schedules the tasks in a topological ordering of the graph, and 3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. Results We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel compared to 9 days if running sequentially. Conclusion This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed-up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. PMID:24370496
Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng
2014-04-01
Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3h in parallel compared to 9days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed-up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. Copyright © 2013 Elsevier Inc. All rights reserved.
IMAGE EXPLORER: Astronomical Image Analysis on an HTML5-based Web Application
NASA Astrophysics Data System (ADS)
Gopu, A.; Hayashi, S.; Young, M. D.
2014-05-01
Large datasets produced by recent astronomical imagers cause the traditional paradigm for basic visual analysis - typically downloading one's entire image dataset and using desktop clients like DS9, Aladin, etc. - to not scale, despite advances in desktop computing power and storage. This paper describes Image Explorer, a web framework that offers several of the basic visualization and analysis functionality commonly provided by tools like DS9, on any HTML5 capable web browser on various platforms. It uses a combination of the modern HTML5 canvas, JavaScript, and several layers of lossless PNG tiles producted from the FITS image data. Astronomers are able to rapidly and simultaneously open up several images on their web-browser, adjust the intensity min/max cutoff or its scaling function, and zoom level, apply color-maps, view position and FITS header information, execute typically used data reduction codes on the corresponding FITS data using the FRIAA framework, and overlay tiles for source catalog objects, etc.
BrainBrowser: distributed, web-based neurological data visualization.
Sherif, Tarek; Kassis, Nicolas; Rousseau, Marc-Étienne; Adalat, Reza; Evans, Alan C
2014-01-01
Recent years have seen massive, distributed datasets become the norm in neuroimaging research, and the methodologies used to analyze them have, in response, become more collaborative and exploratory. Tools and infrastructure are continuously being developed and deployed to facilitate research in this context: grid computation platforms to process the data, distributed data stores to house and share them, high-speed networks to move them around and collaborative, often web-based, platforms to provide access to and sometimes manage the entire system. BrainBrowser is a lightweight, high-performance JavaScript visualization library built to provide easy-to-use, powerful, on-demand visualization of remote datasets in this new research environment. BrainBrowser leverages modern web technologies, such as WebGL, HTML5 and Web Workers, to visualize 3D surface and volumetric neuroimaging data in any modern web browser without requiring any browser plugins. It is thus trivial to integrate BrainBrowser into any web-based platform. BrainBrowser is simple enough to produce a basic web-based visualization in a few lines of code, while at the same time being robust enough to create full-featured visualization applications. BrainBrowser can dynamically load the data required for a given visualization, so no network bandwidth needs to be waisted on data that will not be used. BrainBrowser's integration into the standardized web platform also allows users to consider using 3D data visualization in novel ways, such as for data distribution, data sharing and dynamic online publications. BrainBrowser is already being used in two major online platforms, CBRAIN and LORIS, and has been used to make the 1TB MACACC dataset openly accessible.
BrainBrowser: distributed, web-based neurological data visualization
Sherif, Tarek; Kassis, Nicolas; Rousseau, Marc-Étienne; Adalat, Reza; Evans, Alan C.
2015-01-01
Recent years have seen massive, distributed datasets become the norm in neuroimaging research, and the methodologies used to analyze them have, in response, become more collaborative and exploratory. Tools and infrastructure are continuously being developed and deployed to facilitate research in this context: grid computation platforms to process the data, distributed data stores to house and share them, high-speed networks to move them around and collaborative, often web-based, platforms to provide access to and sometimes manage the entire system. BrainBrowser is a lightweight, high-performance JavaScript visualization library built to provide easy-to-use, powerful, on-demand visualization of remote datasets in this new research environment. BrainBrowser leverages modern web technologies, such as WebGL, HTML5 and Web Workers, to visualize 3D surface and volumetric neuroimaging data in any modern web browser without requiring any browser plugins. It is thus trivial to integrate BrainBrowser into any web-based platform. BrainBrowser is simple enough to produce a basic web-based visualization in a few lines of code, while at the same time being robust enough to create full-featured visualization applications. BrainBrowser can dynamically load the data required for a given visualization, so no network bandwidth needs to be waisted on data that will not be used. BrainBrowser's integration into the standardized web platform also allows users to consider using 3D data visualization in novel ways, such as for data distribution, data sharing and dynamic online publications. BrainBrowser is already being used in two major online platforms, CBRAIN and LORIS, and has been used to make the 1TB MACACC dataset openly accessible. PMID:25628562
MetaCoMET: a web platform for discovery and visualization of the core microbiome
USDA-ARS?s Scientific Manuscript database
A key component of the analysis of microbiome datasets is the identification of OTUs shared between multiple experimental conditions, commonly referred to as the core microbiome. Results: We present a web platform named MetaCoMET that enables the discovery and visualization of the core microbiome an...
OpenNEX, a private-public partnership in support of the national climate assessment
NASA Astrophysics Data System (ADS)
Nemani, R. R.; Wang, W.; Michaelis, A.; Votava, P.; Ganguly, S.
2016-12-01
The NASA Earth Exchange (NEX) is a collaborative computing platform that has been developed with the objective of bringing scientists together with the software tools, massive global datasets, and supercomputing resources necessary to accelerate research in Earth systems science and global change. NEX is funded as an enabling tool for sustaining the national climate assessment. Over the past five years, researchers have used the NEX platform and produced a number of data sets highly relevant to the National Climate Assessment. These include high-resolution climate projections using different downscaling techniques and trends in historical climate from satellite data. To enable a broader community in exploiting the above datasets, the NEX team partnered with public cloud providers to create the OpenNEX platform. OpenNEX provides ready access to NEX data holdings on a number of public cloud platforms along with pertinent analysis tools and workflows in the form of Machine Images and Docker Containers, lectures and tutorials by experts. We will showcase some of the applications of OpenNEX data and tools by the community on Amazon Web Services, Google Cloud and the NEX Sandbox.
Verification of target motion effects on SAR imagery using the Gotcha GMTI challenge dataset
NASA Astrophysics Data System (ADS)
Hack, Dan E.; Saville, Michael A.
2010-04-01
This paper investigates the relationship between a ground moving target's kinematic state and its SAR image. While effects such as cross-range offset, defocus, and smearing appear well understood, their derivations in the literature typically employ simplifications of the radar/target geometry and assume point scattering targets. This study adopts a geometrical model for understanding target motion effects in SAR imagery, termed the target migration path, and focuses on experimental verification of predicted motion effects using both simulated and empirical datasets based on the Gotcha GMTI challenge dataset. Specifically, moving target imagery is generated from three data sources: first, simulated phase history for a moving point target; second, simulated phase history for a moving vehicle derived from a simulated Mazda MPV X-band signature; and third, empirical phase history from the Gotcha GMTI challenge dataset. Both simulated target trajectories match the truth GPS target position history from the Gotcha GMTI challenge dataset, allowing direct comparison between all three imagery sets and the predicted target migration path. This paper concludes with a discussion of the parallels between the target migration path and the measurement model within a Kalman filtering framework, followed by conclusions.
NASA Astrophysics Data System (ADS)
Shrestha, S. R.; Collow, T. W.; Rose, B.
2016-12-01
Scientific datasets are generated from various sources and platforms but they are typically produced either by earth observation systems or by modelling systems. These are widely used for monitoring, simulating, or analyzing measurements that are associated with physical, chemical, and biological phenomena over the ocean, atmosphere, or land. A significant subset of scientific datasets stores values directly as rasters or in a form that can be rasterized. This is where a value exists at every cell in a regular grid spanning the spatial extent of the dataset. Government agencies like NOAA, NASA, EPA, USGS produces large volumes of near real-time, forecast, and historical data that drives climatological and meteorological studies, and underpins operations ranging from weather prediction to sea ice loss. Modern science is computationally intensive because of the availability of an enormous amount of scientific data, the adoption of data-driven analysis, and the need to share these dataset and research results with the public. ArcGIS as a platform is sophisticated and capable of handling such complex domain. We'll discuss constructs and capabilities applicable to multidimensional gridded data that can be conceptualized as a multivariate space-time cube. Building on the concept of a two-dimensional raster, a typical multidimensional raster dataset could contain several "slices" within the same spatial extent. We will share a case from the NOAA Climate Forecast Systems Reanalysis (CFSR) multidimensional data as an example of how large collections of rasters can be efficiently organized and managed through a data model within a geodatabase called "Mosaic dataset" and dynamically transformed and analyzed using raster functions. A raster function is a lightweight, raster-valued transformation defined over a mixed set of raster and scalar input. That means, just like any tool, you can provide a raster function with input parameters. It enables dynamic processing of only the data that's being displayed on the screen or requested by an application. We will present the dynamic processing and analysis of CFSR data using the chains of raster function and share it as dynamic multidimensional image service. This workflow and capabilities can be easily applied to any scientific data formats that are supported in mosaic dataset.
Coletta, Alain; Molter, Colin; Duqué, Robin; Steenhoff, David; Taminau, Jonatan; de Schaetzen, Virginie; Meganck, Stijn; Lazar, Cosmin; Venet, David; Detours, Vincent; Nowé, Ann; Bersini, Hugues; Weiss Solís, David Y
2012-11-18
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
Cadastral Database Positional Accuracy Improvement
NASA Astrophysics Data System (ADS)
Hashim, N. M.; Omar, A. H.; Ramli, S. N. M.; Omar, K. M.; Din, N.
2017-10-01
Positional Accuracy Improvement (PAI) is the refining process of the geometry feature in a geospatial dataset to improve its actual position. This actual position relates to the absolute position in specific coordinate system and the relation to the neighborhood features. With the growth of spatial based technology especially Geographical Information System (GIS) and Global Navigation Satellite System (GNSS), the PAI campaign is inevitable especially to the legacy cadastral database. Integration of legacy dataset and higher accuracy dataset like GNSS observation is a potential solution for improving the legacy dataset. However, by merely integrating both datasets will lead to a distortion of the relative geometry. The improved dataset should be further treated to minimize inherent errors and fitting to the new accurate dataset. The main focus of this study is to describe a method of angular based Least Square Adjustment (LSA) for PAI process of legacy dataset. The existing high accuracy dataset known as National Digital Cadastral Database (NDCDB) is then used as bench mark to validate the results. It was found that the propose technique is highly possible for positional accuracy improvement of legacy spatial datasets.
Student Activity and Profile Datasets from an Online Video-Based Collaborative Learning Experience
ERIC Educational Resources Information Center
Martín, Estefanía; Gértrudix, Manuel; Urquiza-Fuentes, Jaime; Haya, Pablo A.
2015-01-01
This paper describes two datasets extracted from a video-based educational experience using a social and collaborative platform. The length of the trial was 3 months. It involved 111 students from two different courses. Twenty-nine came from Computer Engineering (CE) course and 82 from Media and Communication (M&C) course. They were organised…
Stabilisation problem in biaxial platform
NASA Astrophysics Data System (ADS)
Lindner, Tymoteusz; Rybarczyk, Dominik; Wyrwał, Daniel
2016-12-01
The article describes investigation of rolling ball stabilization problem on a biaxial platform. The aim of the control system proposed here is to stabilize ball moving on a plane in equilibrium point. The authors proposed a control algorithm based on cascade PID and they compared it with another control method. The article shows the results of the accuracy of ball stabilization and influence of applied filter on the signal waveform. The application used to detect the ball position measured by digital camera has been written using a cross platform .Net wrapper to the OpenCV image processing library - EmguCV. The authors used the bipolar stepper motor with dedicated electronic controller. The data between the computer and the designed controller are sent with use of the RS232 standard. The control stand is based on ATmega series microcontroller.
DIVE: A Graph-based Visual Analytics Framework for Big Data
Rysavy, Steven J.; Bromley, Dennis; Daggett, Valerie
2014-01-01
The need for data-centric scientific tools is growing; domains like biology, chemistry, and physics are increasingly adopting computational approaches. As a result, scientists must now deal with the challenges of big data. To address these challenges, we built a visual analytics platform named DIVE: Data Intensive Visualization Engine. DIVE is a data-agnostic, ontologically-expressive software framework capable of streaming large datasets at interactive speeds. Here we present the technical details of the DIVE platform, multiple usage examples, and a case study from the Dynameomics molecular dynamics project. We specifically highlight our novel contributions to structured data model manipulation and high-throughput streaming of large, structured datasets. PMID:24808197
Liao, Qiuyan; Cowling, Benjamin J; Lam, Wendy Wing Tak; Fielding, Richard
2011-06-01
Understanding population responses to influenza helps optimize public health interventions. Relevant theoretical frameworks remain nascent. To model associations between trust in information, perceived hygiene effectiveness, knowledge about the causes of influenza, perceived susceptibility and worry, and personal hygiene practices (PHPs) associated with influenza. Cross-sectional household telephone surveys on avian influenza A/H5N1 (2006) and pandemic influenza A/H1N1 (2009) gathered comparable data on trust in formal and informal sources of influenza information, influenza-related knowledge, perceived hygiene effectiveness, worry, perceived susceptibility, and PHPs. Exploratory factor analysis confirmed domain content while confirmatory factor analysis was used to evaluate the extracted factors. The hypothesized model, compiled from different theoretical frameworks, was optimized with structural equation modelling using the A/H5N1 data. The optimized model was then tested against the A/H1N1 dataset. The model was robust across datasets though corresponding path weights differed. Trust in formal information was positively associated with perceived hygiene effectiveness which was positively associated with PHPs in both datasets. Trust in formal information was positively associated with influenza worry in A/H5N1 data, and with knowledge of influenza cause in A/H1N1 data, both variables being positively associated with PHPs. Trust in informal information was positively associated with influenza worry in both datasets. Independent of information trust, perceived influenza susceptibility associated with influenza worry. Worry associated with PHPs in A/H5N1 data only. Knowledge of influenza cause and perceived PHP effectiveness were associated with PHPs. Improving trust in formal information should increase PHPs. Worry was significantly associated with PHPs in A/H5N1.
Samanipour, Saer; Baz-Lomba, Jose A; Alygizakis, Nikiforos A; Reid, Malcolm J; Thomaidis, Nikolaos S; Thomas, Kevin V
2017-06-09
LC-HR-QTOF-MS recently has become a commonly used approach for the analysis of complex samples. However, identification of small organic molecules in complex samples with the highest level of confidence is a challenging task. Here we report on the implementation of a two stage algorithm for LC-HR-QTOF-MS datasets. We compared the performances of the two stage algorithm, implemented via NIVA_MZ_Analyzer™, with two commonly used approaches (i.e. feature detection and XIC peak picking, implemented via UNIFI by Waters and TASQ by Bruker, respectively) for the suspect analysis of four influent wastewater samples. We first evaluated the cross platform compatibility of LC-HR-QTOF-MS datasets generated via instruments from two different manufacturers (i.e. Waters and Bruker). Our data showed that with an appropriate spectral weighting function the spectra recorded by the two tested instruments are comparable for our analytes. As a consequence, we were able to perform full spectral comparison between the data generated via the two studied instruments. Four extracts of wastewater influent were analyzed for 89 analytes, thus 356 detection cases. The analytes were divided into 158 detection cases of artificial suspect analytes (i.e. verified by target analysis) and 198 true suspects. The two stage algorithm resulted in a zero rate of false positive detection, based on the artificial suspect analytes while producing a rate of false negative detection of 0.12. For the conventional approaches, the rates of false positive detection varied between 0.06 for UNIFI and 0.15 for TASQ. The rates of false negative detection for these methods ranged between 0.07 for TASQ and 0.09 for UNIFI. The effect of background signal complexity on the two stage algorithm was evaluated through the generation of a synthetic signal. We further discuss the boundaries of applicability of the two stage algorithm. The importance of background knowledge and experience in evaluating the reliability of results during the suspect screening was evaluated. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, Meng-Wei; Chang, Hao-Jung; Lee, Shu-sheng; Lee, Chih-Kung
2016-03-01
Tuberculosis is a highly contagious disease such that global latent patient can be as high as one third of the world population. Currently, latent tuberculosis was diagnosed by stimulating the T cells to produce the biomarker of tuberculosis, i.e., interferon-γ. In this paper, we developed a paraboloidal mirror enabled surface plasmon resonance (SPR) interferometer that has the potential to also integrate ellipsometry to analyze the antibody and antigen reactions. To examine the feasibility of developing a platform for cross calibrating the performance and detection limit of various bio-detection techniques, electrochemical impedance spectroscopy (EIS) method was also implemented onto a biochip that can be incorporated into this newly developed platform. The microfluidic channel of the biochip was functionalized by coating the interferon-γ antibody so as to enhance the detection specificity. To facilitate the processing steps needed for using the biochip to detect various antigen of vastly different concentrations, a kinetic mount was also developed to guarantee the biochip re-positioning accuracy whenever the biochip was removed and placed back for another round of detection. With EIS being utilized, SPR was also adopted to observe the real-time signals on the computer in order to analyze the success of each biochip processing steps such as functionalization, wash, etc. Finally, the EIS results and the optical signals obtained from the newly developed optical detection platform was cross-calibrated. Preliminary experimental results demonstrate the accuracy and performance of SPR and EIS measurement done at the newly integrated platform.
Shen, Yi; Wang, Zhanwei; Loo, Lenora WM; Ni, Yan; Jia, Wei; Fei, Peiwen; Risch, Harvey A.; Katsaros, Dionyssios; Yu, Herbert
2015-01-01
Long non-coding RNAs (lncRNAs) are a class of newly recognized DNA transcripts that have diverse biological activities. Dysregulation of lncRNAs may be involved in many pathogenic processes including cancer. Recently, we found an intergenic lncRNA, LINC00472, whose expression was correlated with breast cancer progression and patient survival. Our findings were consistent across multiple clinical datasets and supported by results from in vitro experiments. To evaluate further the role of LINC00472 in breast cancer, we used various online databases to investigate possible mechanisms that might affect LINC00472 expression in breast cancer. We also analyzed associations of LINC00472 with estrogen receptor, tumor grade, and molecular subtypes in additional online datasets generated by microarray platforms different from the one we investigated previously. We found that LINC00472 expression in breast cancer was regulated more possibly by promoter methylation than by the alteration of gene copy number. Analysis of additional datasets confirmed our previous findings of high expression of LINC00472 associated with ER-positive and low-grade tumors and favorable molecular subtypes. Finally, in nine datasets, we examined the association of LINC00472 expression with disease-free survival in patients with grade 2 tumors. Meta-analysis of the datasets showed that LINC00472 expression in breast tumors predicted the recurrence of breast cancer in patients with grade 2 tumors. In summary, our analyses confirm that LINC00472 is functionally a tumor suppressor, and that assessing its expression in breast tumors may have clinical implications in breast cancer management. PMID:26564482
Shen, Yi; Wang, Zhanwei; Loo, Lenora W M; Ni, Yan; Jia, Wei; Fei, Peiwen; Risch, Harvey A; Katsaros, Dionyssios; Yu, Herbert
2015-12-01
Long non-coding RNAs (lncRNAs) are a class of newly recognized DNA transcripts that have diverse biological activities. Dysregulation of lncRNAs may be involved in many pathogenic processes including cancer. Recently, we found an intergenic lncRNA, LINC00472, whose expression was correlated with breast cancer progression and patient survival. Our findings were consistent across multiple clinical datasets and supported by results from in vitro experiments. To evaluate further the role of LINC00472 in breast cancer, we used various online databases to investigate possible mechanisms that might affect LINC00472 expression in breast cancer. We also analyzed associations of LINC00472 with estrogen receptor, tumor grade, and molecular subtypes in additional online datasets generated by microarray platforms different from the one we investigated previously. We found that LINC00472 expression in breast cancer was regulated more possibly by promoter methylation than by the alteration of gene copy number. Analysis of additional datasets confirmed our previous findings of high expression of LINC00472 associated with ER-positive and low-grade tumors and favorable molecular subtypes. Finally, in nine datasets, we examined the association of LINC00472 expression with disease-free survival in patients with grade 2 tumors. Meta-analysis of the datasets showed that LINC00472 expression in breast tumors predicted the recurrence of breast cancer in patients with grade 2 tumors. In summary, our analyses confirm that LINC00472 is functionally a tumor suppressor, and that assessing its expression in breast tumors may have clinical implications in breast cancer management.
NASA Astrophysics Data System (ADS)
Kruithof, Maarten C.; Bouma, Henri; Fischer, Noëlle M.; Schutte, Klamer
2016-10-01
Object recognition is important to understand the content of video and allow flexible querying in a large number of cameras, especially for security applications. Recent benchmarks show that deep convolutional neural networks are excellent approaches for object recognition. This paper describes an approach of domain transfer, where features learned from a large annotated dataset are transferred to a target domain where less annotated examples are available as is typical for the security and defense domain. Many of these networks trained on natural images appear to learn features similar to Gabor filters and color blobs in the first layer. These first-layer features appear to be generic for many datasets and tasks while the last layer is specific. In this paper, we study the effect of copying all layers and fine-tuning a variable number. We performed an experiment with a Caffe-based network on 1000 ImageNet classes that are randomly divided in two equal subgroups for the transfer from one to the other. We copy all layers and vary the number of layers that is fine-tuned and the size of the target dataset. We performed additional experiments with the Keras platform on CIFAR-10 dataset to validate general applicability. We show with both platforms and both datasets that the accuracy on the target dataset improves when more target data is used. When the target dataset is large, it is beneficial to freeze only a few layers. For a large target dataset, the network without transfer learning performs better than the transfer network, especially if many layers are frozen. When the target dataset is small, it is beneficial to transfer (and freeze) many layers. For a small target dataset, the transfer network boosts generalization and it performs much better than the network without transfer learning. Learning time can be reduced by freezing many layers in a network.
Cross-platform learning: on the nature of children's learning from multiple media platforms.
Fisch, Shalom M
2013-01-01
It is increasingly common for an educational media project to span several media platforms (e.g., TV, Web, hands-on materials), assuming that the benefits of learning from multiple media extend beyond those gained from one medium alone. Yet research typically has investigated learning from a single medium in isolation. This paper reviews several recent studies to explore cross-platform learning (i.e., learning from combined use of multiple media platforms) and how such learning compares to learning from one medium. The paper discusses unique benefits of cross-platform learning, a theoretical mechanism to explain how these benefits might arise, and questions for future research in this emerging field. Copyright © 2013 Wiley Periodicals, Inc., A Wiley Company.
Learning by Doing: How to Develop a Cross-Platform Web App
ERIC Educational Resources Information Center
Huynh, Minh; Ghimire, Prashant
2015-01-01
As mobile devices become prevalent, there is always a need for apps. How hard is it to develop an app, especially a cross-platform app? The paper shares an experience in a project that involved the development of a student services web app that can be run on cross-platform mobile devices. The paper first describes the background of the project,…
Using Kokkos for Performant Cross-Platform Acceleration of Liquid Rocket Simulations
2017-05-08
NUMBER (Include area code) 08 May 2017 Briefing Charts 05 April 2017 - 08 May 2017 Using Kokkos for Performant Cross-Platform Acceleration of Liquid ...ERC Incorporated RQRC AFRL-West Using Kokkos for Performant Cross-Platform Acceleration of Liquid Rocket Simulations 2DISTRIBUTION A: Approved for... Liquid Rocket Combustion Simulation SPACE simulation of rotating detonation engine (courtesy of Dr. Christopher Lietz) 3DISTRIBUTION A: Approved
ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data.
Carter, Kim W; Francis, Richard W; Carter, K W; Francis, R W; Bresnahan, M; Gissler, M; Grønborg, T K; Gross, R; Gunnes, N; Hammond, G; Hornig, M; Hultman, C M; Huttunen, J; Langridge, A; Leonard, H; Newman, S; Parner, E T; Petersson, G; Reichenberg, A; Sandin, S; Schendel, D E; Schalkwyk, L; Sourander, A; Steadman, C; Stoltenberg, C; Suominen, A; Surén, P; Susser, E; Sylvester Vethanayagam, A; Yusof, Z
2016-04-01
Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations. Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage. Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory. ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/]. © The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association.
Lemieux, Sebastien; Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J; Mader, Sylvie; Sauvageau, Guy
2017-07-27
Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L
2014-01-01
Background As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Methods Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852
NASA Astrophysics Data System (ADS)
Dawes, N.; Salehi, A.; Clifton, A.; Bavay, M.; Aberer, K.; Parlange, M. B.; Lehning, M.
2010-12-01
It has long been known that environmental processes are cross-disciplinary, but data has continued to be acquired and held for a single purpose. Swiss Experiment is a rapidly evolving cross-disciplinary, distributed sensor data infrastructure, where tools for the environmental science community stem directly from computer science research. The platform uses the bleeding edge of computer science to acquire, store and distribute data and metadata from all environmental science disciplines at a variety of temporal and spatial resolutions. SwissEx is simultaneously developing new technologies to allow low cost, high spatial and temporal resolution measurements such that small areas can be intensely monitored. This data is then combined with existing widespread, low density measurements in the cross-disciplinary platform to provide well documented datasets, which are of use to multiple research disciplines. We present a flexible, generic infrastructure at an advanced stage of development. The infrastructure makes the most of Web 2.0 technologies for a collaborative working environment and as a user interface for a metadata database. This environment is already closely integrated with GSN, an open-source database middleware developed under Swiss Experiment for acquisition and storage of generic time-series data (2D and 3D). GSN can be queried directly by common data processing packages and makes data available in real-time to models and 3rd party software interfaces via its web service interface. It also provides real-time push or pull data exchange between instances, a user management system which leaves data owners in charge of their data, advanced real-time processing and much more. The SwissEx interface is increasingly gaining users and supporting environmental science in Switzerland. It is also an integral part of environmental education projects ClimAtscope and O3E, where the technologies can provide rapid feedback of results for children of all ages and where the data from their own stations can be compared to national data networks.
SATORI: a system for ontology-guided visual exploration of biomedical data repositories.
Lekschas, Fritz; Gehlenborg, Nils
2018-04-01
The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. We developed SATORI-an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. nils@hms.harvard.edu. Supplementary data are available at Bioinformatics online.
PIVOT: platform for interactive analysis and visualization of transcriptomics data.
Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong
2018-01-05
Many R packages have been developed for transcriptome analysis but their use often requires familiarity with R and integrating results of different packages requires scripts to wrangle the datatypes. Furthermore, exploratory data analyses often generate multiple derived datasets such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, such that all analysis can be saved, shared and reproduced. PIVOT will allow researchers with broad background to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.
Zhou, Zhen; Wang, Jian-Bao; Zang, Yu-Feng; Pan, Gang
2018-01-01
Classification approaches have been increasingly applied to differentiate patients and normal controls using resting-state functional magnetic resonance imaging data (RS-fMRI). Although most previous classification studies have reported promising accuracy within individual datasets, achieving high levels of accuracy with multiple datasets remains challenging for two main reasons: high dimensionality, and high variability across subjects. We used two independent RS-fMRI datasets (n = 31, 46, respectively) both with eyes closed (EC) and eyes open (EO) conditions. For each dataset, we first reduced the number of features to a small number of brain regions with paired t-tests, using the amplitude of low frequency fluctuation (ALFF) as a metric. Second, we employed a new method for feature extraction, named the PAIR method, examining EC and EO as paired conditions rather than independent conditions. Specifically, for each dataset, we obtained EC minus EO (EC—EO) maps of ALFF from half of subjects (n = 15 for dataset-1, n = 23 for dataset-2) and obtained EO—EC maps from the other half (n = 16 for dataset-1, n = 23 for dataset-2). A support vector machine (SVM) method was used for classification of EC RS-fMRI mapping and EO mapping. The mean classification accuracy of the PAIR method was 91.40% for dataset-1, and 92.75% for dataset-2 in the conventional frequency band of 0.01–0.08 Hz. For cross-dataset validation, we applied the classifier from dataset-1 directly to dataset-2, and vice versa. The mean accuracy of cross-dataset validation was 94.93% for dataset-1 to dataset-2 and 90.32% for dataset-2 to dataset-1 in the 0.01–0.08 Hz range. For the UNPAIR method, classification accuracy was substantially lower (mean 69.89% for dataset-1 and 82.97% for dataset-2), and was much lower for cross-dataset validation (64.69% for dataset-1 to dataset-2 and 64.98% for dataset-2 to dataset-1) in the 0.01–0.08 Hz range. In conclusion, for within-group design studies (e.g., paired conditions or follow-up studies), we recommend the PAIR method for feature extraction. In addition, dimensionality reduction with strong prior knowledge of specific brain regions should also be considered for feature selection in neuroimaging studies. PMID:29375288
LSD: Large Survey Database framework
NASA Astrophysics Data System (ADS)
Juric, Mario
2012-09-01
The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than >10^2 nodes, and can be made to function in "shared nothing" architectures.
Project FLOSSIE: Marine Data Stewardship at the Waterline
NASA Astrophysics Data System (ADS)
Bouchard, R. H.; Jensen, R. E.; Riley, R. E.
2016-02-01
There are more than 10 million wave records from platforms of the National Data Buoy Center (NDBC) that are archived by National Oceanic and Atmospheric Administration (NOAA). A considerable number of these were measured from the 61 NOMAD (Navy Oceanographic Meteorological Automatic Device) hulls that NDBC has used to make wave measurements since October 1979. Many of these measurements were made before the era of modern marine data stewardship. These long records lend themselves to investigations of climate trends and variability either directly by the measurements themselves, or indirectly by validating long-term numerical wave models or remote sensing applications. However studies (e.g., Gemmrich et al. 2011) indicate that discontinuities and increased variability of the measurements can arise from changing wave systems and platforms. The value of these records is undermined by the lack of understanding or documentation of technology changes - a critical component of data stewardship. To support its mission of long-term understanding of coastal waves and wave models, the U.S. Army Corps of Engineers, Coastal Hydraulics Laboratory (CHL) sponsored the FLOSSIE Project to gage the effects of technology changes on the long-term wave measurements from NOMAD hulls. On behalf of CHL, NDBC engineering and operations integrated old, new, and leading edge technologies on one NOMAD hull. The hull was successfully deployed in July 2015 at the Wave Evaluation and Testing area off of Monterey Bay, CA. The area hosts an NDBC 3-m hull with cross-generational-technologies and a reference standard in a Datawell Waverider buoy. Thus cross-generational and cross-platform inter-comparisons can be performed simultaneously to an accepted standard. The analysis goes beyond the bulk wave parameters. The analysis will examine the energy and directional distributions over the frequency range of wind-generated waves. The project is named in honor of the pioneering World War II Naval meteorologist, Commander Florence (Flossie) Van Straten (1913 - 1992), USNR, who coined the acronym for NOMAD. This paper will discuss the goals of the project, present preliminary data results and application to the long-term measurements, and outline the plans incorporating Best Practices of Marine Data Stewardship for the resulting datasets.
NASA Astrophysics Data System (ADS)
Prodanovic, M.; Esteva, M.; Hanlon, M.; Nanda, G.; Agarwal, P.
2015-12-01
Recent advances in imaging have provided a wealth of 3D datasets that reveal pore space microstructure (nm to cm length scale) and allow investigation of nonlinear flow and mechanical phenomena from first principles using numerical approaches. This framework has popularly been called "digital rock physics". Researchers, however, have trouble storing and sharing the datasets both due to their size and the lack of standardized image types and associated metadata for volumetric datasets. This impedes scientific cross-validation of the numerical approaches that characterize large scale porous media properties, as well as development of multiscale approaches required for correct upscaling. A single research group typically specializes in an imaging modality and/or related modeling on a single length scale, and lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal, that (1) organizes images and related experimental measurements of different porous materials, (2) improves access to them for a wider community of geosciences or engineering researchers not necessarily trained in computer science or data analysis. Once widely accepter, the repository will jumpstart productivity and enable scientific inquiry and engineering decisions founded on a data-driven basis. This is the first repository of its kind. We show initial results on incorporating essential software tools and pipelines that make it easier for researchers to store and reuse data, and for educators to quickly visualize and illustrate concepts to a wide audience. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative.
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie
2018-01-01
Abstract Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. PMID:29106630
Age differences in the Big Five across the life span: evidence from two national samples.
Donnellan, M Brent; Lucas, Richard E
2008-09-01
Cross-sectional age differences in the Big Five personality traits were investigated using 2 large datasets from Great Britain and Germany: the British Household Panel Study (BHPS; N > or = 14,039) and the German Socio-Economic Panel Study (GSEOP; N > or = 20,852). Participants, who ranged in age from 16 to the mid-80s, completed a 15-item version of the Big Five Inventory (e.g., John & Srivastava, 1999) in either 2005 or 2006. The observed age trends were generally consistent across both datasets. Extraversion and Openness were negatively associated with age, whereas Agreeableness was positively associated with age. Average levels of Conscientiousness were highest for participants in middle age. The only exception was Neuroticism, which was slightly negatively associated with age in the BHPS and slightly positively associated with age in the GSEOP. Neither gender nor education level were consistent moderators of age differences in the Big Five. (c) 2008 APA, all rights reserved
CrossLink: a novel method for cross-condition classification of cancer subtypes.
Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei
2016-08-22
We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 %, highest among other methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
MMX-I: data-processing software for multimodal X-ray imaging and tomography.
Bergamaschi, Antoine; Medjoubi, Kadda; Messaoudi, Cédric; Marco, Sergio; Somogyi, Andrea
2016-05-01
A new multi-platform freeware has been developed for the processing and reconstruction of scanning multi-technique X-ray imaging and tomography datasets. The software platform aims to treat different scanning imaging techniques: X-ray fluorescence, phase, absorption and dark field and any of their combinations, thus providing an easy-to-use data processing tool for the X-ray imaging user community. A dedicated data input stream copes with the input and management of large datasets (several hundred GB) collected during a typical multi-technique fast scan at the Nanoscopium beamline and even on a standard PC. To the authors' knowledge, this is the first software tool that aims at treating all of the modalities of scanning multi-technique imaging and tomography experiments.
Characterisation of mental health conditions in social media using Informed Deep Learning.
Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J P; Dobson, Richard J B; Dutta, Rina
2017-03-22
The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balenced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.
Characterisation of mental health conditions in social media using Informed Deep Learning
NASA Astrophysics Data System (ADS)
Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J. P.; Dobson, Richard J. B.; Dutta, Rina
2017-03-01
The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients’ own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of ‘in the moment’ daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balenced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.
PRGdb: a bioinformatics platform for plant resistance gene analysis
Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella
2010-01-01
PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694
GenomicTools: a computational platform for developing high-throughput analytics in genomics.
Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo
2012-01-15
Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data
Hartler, Jürgen; Thallinger, Gerhard G; Stocker, Gernot; Sturn, Alexander; Burkard, Thomas R; Körner, Erik; Rader, Robert; Schmidt, Andreas; Mechtler, Karl; Trajanoski, Zlatko
2007-01-01
Background The advancements of proteomics technologies have led to a rapid increase in the number, size and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches. Results We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation, 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios algorithm (ASAPRatio). The system provides customizable data retrieval and visualization tools, as well as export to PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available at Conclusion Given the unique features and the flexibility due to the use of standard software technology, our platform represents significant advance and could be of great interest to the proteomics community. PMID:17567892
GC31G-1182: Opennex, a Private-Public Partnership in Support of the National Climate Assessment
NASA Technical Reports Server (NTRS)
Nemani, Ramakrishna R.; Wang, Weile; Michaelis, Andrew; Votava, Petr; Ganguly, Sangram
2016-01-01
The NASA Earth Exchange (NEX) is a collaborative computing platform that has been developed with the objective of bringing scientists together with the software tools, massive global datasets, and supercomputing resources necessary to accelerate research in Earth systems science and global change. NEX is funded as an enabling tool for sustaining the national climate assessment. Over the past five years, researchers have used the NEX platform and produced a number of data sets highly relevant to the National Climate Assessment. These include high-resolution climate projections using different downscaling techniques and trends in historical climate from satellite data. To enable a broader community in exploiting the above datasets, the NEX team partnered with public cloud providers to create the OpenNEX platform. OpenNEX provides ready access to NEX data holdings on a number of public cloud platforms along with pertinent analysis tools and workflows in the form of Machine Images and Docker Containers, lectures and tutorials by experts. We will showcase some of the applications of OpenNEX data and tools by the community on Amazon Web Services, Google Cloud and the NEX Sandbox.
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma’ayan, Avi
2018-01-01
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated ‘canned’ analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools. PMID:29485625
Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses.
Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V; Ma'ayan, Avi
2018-02-27
Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.
NASA Astrophysics Data System (ADS)
Jain, Sankalp; Kotsampasakou, Eleni; Ecker, Gerhard F.
2018-05-01
Cheminformatics datasets used in classification problems, especially those related to biological or physicochemical properties, are often imbalanced. This presents a major challenge in development of in silico prediction models, as the traditional machine learning algorithms are known to work best on balanced datasets. The class imbalance introduces a bias in the performance of these algorithms due to their preference towards the majority class. Here, we present a comparison of the performance of seven different meta-classifiers for their ability to handle imbalanced datasets, whereby Random Forest is used as base-classifier. Four different datasets that are directly (cholestasis) or indirectly (via inhibition of organic anion transporting polypeptide 1B1 and 1B3) related to liver toxicity were chosen for this purpose. The imbalance ratio in these datasets ranges between 4:1 and 20:1 for negative and positive classes, respectively. Three different sets of molecular descriptors for model development were used, and their performance was assessed in 10-fold cross-validation and on an independent validation set. Stratified bagging, MetaCost and CostSensitiveClassifier were found to be the best performing among all the methods. While MetaCost and CostSensitiveClassifier provided better sensitivity values, Stratified Bagging resulted in high balanced accuracies.
NASA Astrophysics Data System (ADS)
Jain, Sankalp; Kotsampasakou, Eleni; Ecker, Gerhard F.
2018-04-01
Cheminformatics datasets used in classification problems, especially those related to biological or physicochemical properties, are often imbalanced. This presents a major challenge in development of in silico prediction models, as the traditional machine learning algorithms are known to work best on balanced datasets. The class imbalance introduces a bias in the performance of these algorithms due to their preference towards the majority class. Here, we present a comparison of the performance of seven different meta-classifiers for their ability to handle imbalanced datasets, whereby Random Forest is used as base-classifier. Four different datasets that are directly (cholestasis) or indirectly (via inhibition of organic anion transporting polypeptide 1B1 and 1B3) related to liver toxicity were chosen for this purpose. The imbalance ratio in these datasets ranges between 4:1 and 20:1 for negative and positive classes, respectively. Three different sets of molecular descriptors for model development were used, and their performance was assessed in 10-fold cross-validation and on an independent validation set. Stratified bagging, MetaCost and CostSensitiveClassifier were found to be the best performing among all the methods. While MetaCost and CostSensitiveClassifier provided better sensitivity values, Stratified Bagging resulted in high balanced accuracies.
Taft, L M; Evans, R S; Shyu, C R; Egger, M J; Chawla, N; Mitchell, J A; Thornton, S N; Bray, B; Varner, M
2009-04-01
The IOM report, Preventing Medication Errors, emphasizes the overall lack of knowledge of the incidence of adverse drug events (ADE). Operating rooms, emergency departments and intensive care units are known to have a higher incidence of ADE. Labor and delivery (L&D) is an emergency care unit that could have an increased risk of ADE, where reported rates remain low and under-reporting is suspected. Risk factor identification with electronic pattern recognition techniques could improve ADE detection rates. The objective of the present study is to apply Synthetic Minority Over Sampling Technique (SMOTE) as an enhanced sampling method in a sparse dataset to generate prediction models to identify ADE in women admitted for labor and delivery based on patient risk factors and comorbidities. By creating synthetic cases with the SMOTE algorithm and using a 10-fold cross-validation technique, we demonstrated improved performance of the Naïve Bayes and the decision tree algorithms. The true positive rate (TPR) of 0.32 in the raw dataset increased to 0.67 in the 800% over-sampled dataset. Enhanced performance from classification algorithms can be attained with the use of synthetic minority class oversampling techniques in sparse clinical datasets. Predictive models created in this manner can be used to develop evidence based ADE monitoring systems.
Jong, Victor L; Novianti, Putri W; Roes, Kit C B; Eijkemans, Marinus J C
2014-12-01
The literature shows that classifiers perform differently across datasets and that correlations within datasets affect the performance of classifiers. The question that arises is whether the correlation structure within datasets differ significantly across diseases. In this study, we evaluated the homogeneity of correlation structures within and between datasets of six etiological disease categories; inflammatory, immune, infectious, degenerative, hereditary and acute myeloid leukemia (AML). We also assessed the effect of filtering; detection call and variance filtering on correlation structures. We downloaded microarray datasets from ArrayExpress for experiments meeting predefined criteria and ended up with 12 datasets for non-cancerous diseases and six for AML. The datasets were preprocessed by a common procedure incorporating platform-specific recommendations and the two filtering methods mentioned above. Homogeneity of correlation matrices between and within datasets of etiological diseases was assessed using the Box's M statistic on permuted samples. We found that correlation structures significantly differ between datasets of the same and/or different etiological disease categories and that variance filtering eliminates more uncorrelated probesets than detection call filtering and thus renders the data highly correlated.
NASA Astrophysics Data System (ADS)
Delgado, Juan A.; Altuve, Miguel; Nabhan Homsi, Masun
2015-12-01
This paper introduces a robust method based on the Support Vector Machine (SVM) algorithm to detect the presence of Fetal QRS (fQRS) complexes in electrocardiogram (ECG) recordings provided by the PhysioNet/CinC challenge 2013. ECG signals are first segmented into contiguous frames of 250 ms duration and then labeled in six classes. Fetal segments are tagged according to the position of fQRS complex within each one. Next, segment features extraction and dimensionality reduction are obtained by applying principal component analysis on Haar-wavelet transform. After that, two sub-datasets are generated to separate representative segments from atypical ones. Imbalanced class problem is dealt by applying sampling without replacement on each sub-dataset. Finally, two SVMs are trained and cross-validated using the two balanced sub-datasets separately. Experimental results show that the proposed approach achieves high performance rates in fetal heartbeats detection that reach up to 90.95% of accuracy, 92.16% of sensitivity, 88.51% of specificity, 94.13% of positive predictive value and 84.96% of negative predictive value. A comparative study is also carried out to show the performance of other two machine learning algorithms for fQRS complex estimation, which are K-nearest neighborhood and Bayesian network.
Collegial Activity Learning between Heterogeneous Sensors.
Feuz, Kyle D; Cook, Diane J
2017-11-01
Activity recognition algorithms have matured and become more ubiquitous in recent years. However, these algorithms are typically customized for a particular sensor platform. In this paper we introduce PECO, a Personalized activity ECOsystem, that transfers learned activity information seamlessly between sensor platforms in real time so that any available sensor can continue to track activities without requiring its own extensive labeled training data. We introduce a multi-view transfer learning algorithm that facilitates this information handoff between sensor platforms and provide theoretical performance bounds for the algorithm. In addition, we empirically evaluate PECO using datasets that utilize heterogeneous sensor platforms to perform activity recognition. These results indicate that not only can activity recognition algorithms transfer important information to new sensor platforms, but any number of platforms can work together as colleagues to boost performance.
Richard, Arianne C; Lyons, Paul A; Peters, James E; Biasci, Daniele; Flint, Shaun M; Lee, James C; McKinney, Eoin F; Siegel, Richard M; Smith, Kenneth G C
2014-08-04
Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study. Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this "gold-standard" comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues. Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.
The truth about mouse, human, worms and yeast
2004-01-01
Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years. PMID:15601543
The truth about mouse, human, worms and yeast.
Nelson, David R; Nebert, Daniel W
2004-01-01
Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years.
Multi-source Geospatial Data Analysis with Google Earth Engine
NASA Astrophysics Data System (ADS)
Erickson, T.
2014-12-01
The Google Earth Engine platform is a cloud computing environment for data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog is a multi-petabyte archive of georeferenced datasets that include images from Earth observing satellite and airborne sensors (examples: USGS Landsat, NASA MODIS, USDA NAIP), weather and climate datasets, and digital elevation models. Earth Engine supports both a just-in-time computation model that enables real-time preview and debugging during algorithm development for open-ended data exploration, and a batch computation mode for applying algorithms over large spatial and temporal extents. The platform automatically handles many traditionally-onerous data management tasks, such as data format conversion, reprojection, and resampling, which facilitates writing algorithms that combine data from multiple sensors and/or models. Although the primary use of Earth Engine, to date, has been the analysis of large Earth observing satellite datasets, the computational platform is generally applicable to a wide variety of use cases that require large-scale geospatial data analyses. This presentation will focus on how Earth Engine facilitates the analysis of geospatial data streams that originate from multiple separate sources (and often communities) and how it enables collaboration during algorithm development and data exploration. The talk will highlight current projects/analyses that are enabled by this functionality.https://earthengine.google.org
NASA Astrophysics Data System (ADS)
Valentine, G. A.
2012-12-01
VHub (short for VolcanoHub, and accessible at vhub.org) is an online platform for collaboration in research and training related to volcanoes, the hazards they pose, and risk mitigation. The underlying concept is to provide a mechanism that enables workers to share information with colleagues around the globe; VHub and similar hub technologies could prove very powerful in collaborating and communicating about circum-Pacific volcanic hazards. Collaboration occurs around several different points: (1) modeling and simulation; (2) data sharing; (3) education and training; (4) volcano observatories; and (5) project-specific groups. VHub promotes modeling and simulation in two ways: (1) some models can be implemented on VHub for online execution. This eliminates the need to download and compile a code on a local computer. VHub can provide a central "warehouse" for such models that should result in broader dissemination. VHub also provides a platform that supports the more complex CFD models by enabling the sharing of code development and problem-solving knowledge, benchmarking datasets, and the development of validation exercises. VHub also provides a platform for sharing of data and datasets. The VHub development team is implementing the iRODS data sharing middleware (see irods.org). iRODS allows a researcher to access data that are located at participating data sources around the world (a "cloud" of data) as if the data were housed in a single virtual database. Education and training is another important use of the VHub platform. Audio-video recordings of seminars, PowerPoint slide sets, and educational simulations are all items that can be placed onto VHub for use by the community or by selected collaborators. An important point is that the "manager" of a given educational resource (or any other resource, such as a dataset or a model) can control the privacy of that resource, ranging from private (only accessible by, and known to, specific collaborators) to completely public. Materials for use in the classroom can be shared via VHub. VHub is a very useful platform for project-specific collaborations. With a group site on VHub where collaborators share documents, datasets, maps, and have ongoing discussions using the discussion board function. VHub is funded by the U.S. National Science Foundation, and is participating in development of larger earth-science cyberinfrastructure initiatives (EarthCube), as well as supporting efforts such as the Global Volcano Model.
NASA Astrophysics Data System (ADS)
Ali, E. S. M.; Spencer, B.; McEwen, M. R.; Rogers, D. W. O.
2015-02-01
In this study, a quantitative estimate is derived for the uncertainty in the XCOM photon mass attenuation coefficients in the energy range of interest to external beam radiation therapy—i.e. 100 keV (orthovoltage) to 25 MeV—using direct comparisons of experimental data against Monte Carlo models and theoretical XCOM data. Two independent datasets are used. The first dataset is from our recent transmission measurements and the corresponding EGSnrc calculations (Ali et al 2012 Med. Phys. 39 5990-6003) for 10-30 MV photon beams from the research linac at the National Research Council Canada. The attenuators are graphite and lead, with a total of 140 data points and an experimental uncertainty of ˜0.5% (k = 1). An optimum energy-independent cross section scaling factor that minimizes the discrepancies between measurements and calculations is used to deduce cross section uncertainty. The second dataset is from the aggregate of cross section measurements in the literature for graphite and lead (49 experiments, 288 data points). The dataset is compared to the sum of the XCOM data plus the IAEA photonuclear data. Again, an optimum energy-independent cross section scaling factor is used to deduce the cross section uncertainty. Using the average result from the two datasets, the energy-independent cross section uncertainty estimate is 0.5% (68% confidence) and 0.7% (95% confidence). The potential for energy-dependent errors is discussed. Photon cross section uncertainty is shown to be smaller than the current qualitative ‘envelope of uncertainty’ of the order of 1-2%, as given by Hubbell (1999 Phys. Med. Biol 44 R1-22).
Irigoyen, Antonio; Jimenez-Luna, Cristina; Benavides, Manuel; Caba, Octavio; Gallego, Javier; Ortuño, Francisco Manuel; Guillen-Ponce, Carmen; Rojas, Ignacio; Aranda, Enrique; Torres, Carolina; Prados, Jose
2018-01-01
Applying differentially expressed genes (DEGs) to identify feasible biomarkers in diseases can be a hard task when working with heterogeneous datasets. Expression data are strongly influenced by technology, sample preparation processes, and/or labeling methods. The proliferation of different microarray platforms for measuring gene expression increases the need to develop models able to compare their results, especially when different technologies can lead to signal values that vary greatly. Integrative meta-analysis can significantly improve the reliability and robustness of DEG detection. The objective of this work was to develop an integrative approach for identifying potential cancer biomarkers by integrating gene expression data from two different platforms. Pancreatic ductal adenocarcinoma (PDAC), where there is an urgent need to find new biomarkers due its late diagnosis, is an ideal candidate for testing this technology. Expression data from two different datasets, namely Affymetrix and Illumina (18 and 36 PDAC patients, respectively), as well as from 18 healthy controls, was used for this study. A meta-analysis based on an empirical Bayesian methodology (ComBat) was then proposed to integrate these datasets. DEGs were finally identified from the integrated data by using the statistical programming language R. After our integrative meta-analysis, 5 genes were commonly identified within the individual analyses of the independent datasets. Also, 28 novel genes that were not reported by the individual analyses ('gained' genes) were also discovered. Several of these gained genes have been already related to other gastroenterological tumors. The proposed integrative meta-analysis has revealed novel DEGs that may play an important role in PDAC and could be potential biomarkers for diagnosing the disease.
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.
Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L
2014-01-01
As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P
2012-03-15
Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
NASA Astrophysics Data System (ADS)
Cavallo, Eugenio; Biddoccu, Marcella; Bagagiolo, Giorgia; De Marziis, Massimo; Gaia Forni, Emanuela; Alemanno, Laura; Ferraris, Stefano; Canone, Davide; Previati, Maurizio; Turconi, Laura; Arattano, Massimo; Coviello, Velio
2016-04-01
Environmental sensor monitoring is continuously developing, both in terms of quantity (i.e. measurement sites), and quality (i.e. technological innovation). Environmental monitoring is carried out by either public or private entities for their own specific purposes, such as scientific research, civil protection, support to industrial and agricultural activities, services for citizens, security, education, and information. However, the acquired dataset could be cross-appealing, hence, being interesting for purposes that diverted from their main intended use. The CIRCE project (Cooperative Internet-of-Data Rural-alpine Community Environment) aimed to gather, manage, use and distribute data obtained from sensors and from people, in a multipurpose approach. The CIRCE project was selected within a call for tender launched by Piedmont Region (in collaboration with CSI Piemonte) in order to improve the digital ecosystem represented by YUCCA, an open source platform oriented to the acquisition, sharing and reuse of data resulting both from real-time and on-demand applications. The partnership of the CIRCE project was made by scientific research bodies (IMAMOTER-CNR, IRPI-CNR, DIST) together with SMEs involved in environmental monitoring and ICT sectors (namely: 3a srl, EnviCons srl, Impresa Verde Cuneo srl, and NetValue srl). Within the project a shared network of agro-meteo-hydrological sensors has been created. Then a platform and its interface for collection, management and distribution of data has been developed. The CIRCE network is currently constituted by a total amount of 171 sensors remotely connected and originally belonging to different networks. They are settled-up in order to monitor and investigate agro-meteo-hydrological processes in different rural and mountain areas of Piedmont Region (NW-Italy), including some very sensitive locations, but difficult to access. Each sensor network differs from each other, in terms of purpose of monitoring, monitored parameters, instrumentation, system architecture, data acquisition and communication processes. In addition to real-time data, the CIRCE database includes many historical datasets, which were uniformed to the adopted database architecture. Such datasets were collected before the implementation of the project both from the connected sensors, and from sensors no longer active. In order to attempt to reduce the gap between the research community and end users, specific APP for smartphones and tablets were created. Such tools facilitate the access and the enrichment of the CIRCE database both for the hydrological section (APP IDRO) than for the agro-meteorological section (APP AGRO). Non-specialists may participate in enrichment of the sensor punctual data with sending qualitative and quantitative information about the observed processes (e.g. watercourse levels, erosion processes, presence of pathogens, damage pictures, etc.). The territorial investigation and the data acquisition also involved groups of citizens (namely farmers, technician and volunteers), that were engaged in creating and testing the informatics tools, according with the "Living Lab" approach. Finally, the CIRCE platform was interfaced with the YUCCA platform, allowing an open access to the CIRCE dataset and its integration in the SmartDataNet system of the Regione Piemonte public administration. The CIRCE project was funded by EU FESR, by Italian Government and Regione Piemonte within the programme Regione Piemonte POR/FESR 2007-2013.
MMX-I: data-processing software for multimodal X-ray imaging and tomography
Bergamaschi, Antoine; Medjoubi, Kadda; Messaoudi, Cédric; Marco, Sergio; Somogyi, Andrea
2016-01-01
A new multi-platform freeware has been developed for the processing and reconstruction of scanning multi-technique X-ray imaging and tomography datasets. The software platform aims to treat different scanning imaging techniques: X-ray fluorescence, phase, absorption and dark field and any of their combinations, thus providing an easy-to-use data processing tool for the X-ray imaging user community. A dedicated data input stream copes with the input and management of large datasets (several hundred GB) collected during a typical multi-technique fast scan at the Nanoscopium beamline and even on a standard PC. To the authors’ knowledge, this is the first software tool that aims at treating all of the modalities of scanning multi-technique imaging and tomography experiments. PMID:27140159
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong
2018-01-04
Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Digest of celestial X-ray missions and experiments
NASA Technical Reports Server (NTRS)
Locke, M. C.
1982-01-01
Information on instruments, the platforms that carried them, and the data they gathered is presented. Instrument selection was confined to detectors operating in the 0.20 to 300 keV range. Included are brief descriptions of the spacecraft, experiment packages and missions. Cross-referenced indexes are provided for types of instruments, energy ranges, time spans covered, positional catalogs and observational catalogs. Data sets from these experiments (NSSDC) are described.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
SEEG initiative estimates of Brazilian greenhouse gas emissions from 1970 to 2015.
de Azevedo, Tasso Rezende; Costa Junior, Ciniro; Brandão Junior, Amintas; Cremer, Marcelo Dos Santos; Piatto, Marina; Tsai, David Shiling; Barreto, Paulo; Martins, Heron; Sales, Márcio; Galuchi, Tharic; Rodrigues, Alessandro; Morgado, Renato; Ferreira, André Luis; Barcellos E Silva, Felipe; Viscondi, Gabriel de Freitas; Dos Santos, Karoline Costal; Cunha, Kamyla Borges da; Manetti, Andrea; Coluna, Iris Moura Esteves; Albuquerque, Igor Reis de; Junior, Shigueo Watanabe; Leite, Clauber; Kishinami, Roberto
2018-05-29
This work presents the SEEG platform, a 46-year long dataset of greenhouse gas emissions (GHG) in Brazil (1970-2015) providing more than 2 million data records for the Agriculture, Energy, Industry, Waste and Land Use Change Sectors at national and subnational levels. The SEEG dataset was developed by the Climate Observatory, a Brazilian civil society initiative, based on the IPCC guidelines and Brazilian National Inventories embedded with country specific emission factors and processes, raw data from multiple official and non-official sources, and organized together with social and economic indicators. Once completed, the SEEG dataset was converted into a spreadsheet format and shared via web-platform that, by means of simple queries, allows users to search data by emission sources and country and state activities. Because of its effectiveness in producing and making available data on a consistent and accessible basis, SEEG may significantly increase the capacity of civil society, scientists and stakeholders to understand and anticipate trends related to GHG emissions as well as its implications to public policies in Brazil.
Targeted metabolomics and medication classification data from participants in the ADNI1 cohort.
St John-Williams, Lisa; Blach, Colette; Toledo, Jon B; Rotroff, Daniel M; Kim, Sungeun; Klavins, Kristaps; Baillie, Rebecca; Han, Xianlin; Mahmoudiandehkordi, Siamak; Jack, John; Massaro, Tyler J; Lucas, Joseph E; Louie, Gregory; Motsinger-Reif, Alison A; Risacher, Shannon L; Saykin, Andrew J; Kastenmüller, Gabi; Arnold, Matthias; Koal, Therese; Moseley, M Arthur; Mangravite, Lara M; Peters, Mette A; Tenenbaum, Jessica D; Thompson, J Will; Kaddurah-Daouk, Rima
2017-10-17
Alzheimer's disease (AD) is the most common neurodegenerative disease presenting major health and economic challenges that continue to grow. Mechanisms of disease are poorly understood but significant data point to metabolic defects that might contribute to disease pathogenesis. The Alzheimer Disease Metabolomics Consortium (ADMC) in partnership with Alzheimer Disease Neuroimaging Initiative (ADNI) is creating a comprehensive biochemical database for AD. Using targeted and non- targeted metabolomics and lipidomics platforms we are mapping metabolic pathway and network failures across the trajectory of disease. In this report we present quantitative metabolomics data generated on serum from 199 control, 356 mild cognitive impairment and 175 AD subjects enrolled in ADNI1 using AbsoluteIDQ-p180 platform, along with the pipeline for data preprocessing and medication classification for confound correction. The dataset presented here is the first of eight metabolomics datasets being generated for broad biochemical investigation of the AD metabolome. We expect that these collective metabolomics datasets will provide valuable resources for researchers to identify novel molecular mechanisms contributing to AD pathogenesis and disease phenotypes.
Integrated genome browser: visual analytics platform for genomics.
Freese, Nowlan H; Norris, David C; Loraine, Ann E
2016-07-15
Genome browsers that support fast navigation through vast datasets and provide interactive visual analytics functions can help scientists achieve deeper insight into biological systems. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Here we describe multiple updates to IGB, including all-new capabilities to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB's ability to consume data from diverse sources, including Galaxy, Distributed Annotation and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data. IGB is open source and is freely available from http://bioviz.org/igb aloraine@uncc.edu. © The Author 2016. Published by Oxford University Press.
Targeted metabolomics and medication classification data from participants in the ADNI1 cohort
St John-Williams, Lisa; Blach, Colette; Toledo, Jon B.; Rotroff, Daniel M.; Kim, Sungeun; Klavins, Kristaps; Baillie, Rebecca; Han, Xianlin; Mahmoudiandehkordi, Siamak; Jack, John; Massaro, Tyler J.; Lucas, Joseph E.; Louie, Gregory; Motsinger-Reif, Alison A.; Risacher, Shannon L.; Saykin, Andrew J.; Kastenmüller, Gabi; Arnold, Matthias; Koal, Therese; Moseley, M. Arthur; Mangravite, Lara M.; Peters, Mette A.; Tenenbaum, Jessica D.; Thompson, J. Will; Kaddurah-Daouk, Rima
2017-01-01
Alzheimer’s disease (AD) is the most common neurodegenerative disease presenting major health and economic challenges that continue to grow. Mechanisms of disease are poorly understood but significant data point to metabolic defects that might contribute to disease pathogenesis. The Alzheimer Disease Metabolomics Consortium (ADMC) in partnership with Alzheimer Disease Neuroimaging Initiative (ADNI) is creating a comprehensive biochemical database for AD. Using targeted and non- targeted metabolomics and lipidomics platforms we are mapping metabolic pathway and network failures across the trajectory of disease. In this report we present quantitative metabolomics data generated on serum from 199 control, 356 mild cognitive impairment and 175 AD subjects enrolled in ADNI1 using AbsoluteIDQ-p180 platform, along with the pipeline for data preprocessing and medication classification for confound correction. The dataset presented here is the first of eight metabolomics datasets being generated for broad biochemical investigation of the AD metabolome. We expect that these collective metabolomics datasets will provide valuable resources for researchers to identify novel molecular mechanisms contributing to AD pathogenesis and disease phenotypes. PMID:29039849
SEEG initiative estimates of Brazilian greenhouse gas emissions from 1970 to 2015
de Azevedo, Tasso Rezende; Costa Junior, Ciniro; Brandão Junior, Amintas; Cremer, Marcelo dos Santos; Piatto, Marina; Tsai, David Shiling; Barreto, Paulo; Martins, Heron; Sales, Márcio; Galuchi, Tharic; Rodrigues, Alessandro; Morgado, Renato; Ferreira, André Luis; Barcellos e Silva, Felipe; Viscondi, Gabriel de Freitas; dos Santos, Karoline Costal; Cunha, Kamyla Borges da; Manetti, Andrea; Coluna, Iris Moura Esteves; Albuquerque, Igor Reis de; Junior, Shigueo Watanabe; Leite, Clauber; Kishinami, Roberto
2018-01-01
This work presents the SEEG platform, a 46-year long dataset of greenhouse gas emissions (GHG) in Brazil (1970–2015) providing more than 2 million data records for the Agriculture, Energy, Industry, Waste and Land Use Change Sectors at national and subnational levels. The SEEG dataset was developed by the Climate Observatory, a Brazilian civil society initiative, based on the IPCC guidelines and Brazilian National Inventories embedded with country specific emission factors and processes, raw data from multiple official and non-official sources, and organized together with social and economic indicators. Once completed, the SEEG dataset was converted into a spreadsheet format and shared via web-platform that, by means of simple queries, allows users to search data by emission sources and country and state activities. Because of its effectiveness in producing and making available data on a consistent and accessible basis, SEEG may significantly increase the capacity of civil society, scientists and stakeholders to understand and anticipate trends related to GHG emissions as well as its implications to public policies in Brazil. PMID:29809176
An, Ji-Yong; You, Zhu-Hong; Meng, Fan-Rong; Xu, Shu-Juan; Wang, Yin
2016-05-18
Protein-Protein Interactions (PPIs) play essential roles in most cellular processes. Knowledge of PPIs is becoming increasingly more important, which has prompted the development of technologies that are capable of discovering large-scale PPIs. Although many high-throughput biological technologies have been proposed to detect PPIs, there are unavoidable shortcomings, including cost, time intensity, and inherently high false positive and false negative rates. For the sake of these reasons, in silico methods are attracting much attention due to their good performances in predicting PPIs. In this paper, we propose a novel computational method known as RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the AB feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We performed five-fold cross-validation experiments on yeast and Helicobacter pylori datasets, and achieved very high accuracies of 92.98% and 95.58% respectively, which is significantly better than previous works. In addition, we also obtained good prediction accuracies of 88.31%, 89.46%, 91.08%, 91.55%, and 94.81% on other five independent datasets C. elegans, M. musculus, H. sapiens, H. pylori, and E. coli for cross-species prediction. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-AB method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool. To facilitate extensive studies for future proteomics research, we developed a freely available web server called RVMAB-PPI in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/ppi_ab/.
Pratte, Gabrielle; Hurtubise, Karen; Rivard, Lisa; Berbari, Jade; Camden, Chantal
2018-01-01
Web platforms are increasingly used to support virtual interactions between members of communities of practice (CoP). However, little is known about how to develop these platforms to support the implementation of best practices for health care professionals. The aim of this article is to explore pediatric physiotherapists' (PTs) perspectives regarding the utility and usability of the characteristic of a web platform developed to support virtual communities of practice (vCoP). This study adopted an explanatory sequential mixed methods design. A web platform supporting the interactions of vCoP members was developed for PTs working with children with developmental coordination disorder. Specific strategies and features were created to support the effectiveness of the platform across three domains: social, information-quality, and system-quality factors. Quantitative data were collected from a cross-sectional survey (n = 41) after 5 months of access to the web platform. Descriptive statistics were calculated. Qualitative data were also collected from semistructured interviews (n = 9), which were coded, interpreted, and analyzed by using Boucher's Web Ergonomics Conceptual Framework. The utility of web platform characteristics targeting the three key domain factors were generally perceived positively by PTs. However, web platform usability issues were noted by PTs, including problems with navigation and information retrieval. Web platform aiming to support vCoP should be carefully developed to target potential users' needs. Whenever possible, users should co-construct the web platform with vCoP developers. Moreover, each of the developed characteristics (eg, newsletter, search function) should be evaluated in terms of utility and usability for the users.
Comparisons of Supergranule Properties from SDO/HMI with Other Datasets
NASA Technical Reports Server (NTRS)
Pesnell, William Dean; Williams, Peter E.
2010-01-01
While supergranules, a component of solar convection, have been well studied through the use of Dopplergrams, other datasets also exhibit these features. Quiet Sun magnetograms show local magnetic field elements distributed around the boundaries of supergranule cells, notably clustering at the common apex points of adjacent cells, while more solid cellular features are seen near active regions. Ca II K images are notable for exhibiting the chromospheric network representing a cellular distribution of local magnetic field lines across the solar disk that coincides with supergranulation boundaries. Measurements at 304 A further above the solar surface also show a similar pattern to the chromospheric network, but the boundaries are more nebulous in nature. While previous observations of these different solar features were obtained with a variety of instruments, SDO provides a single platform, from which the relevant data products at a high cadence and high-definition image quality are delivered. The images may also be cross-referenced due to their coincidental time of observation. We present images of these different solar features from HMI & AIA and use them to make composite images of supergranules at different atmospheric layers in which they manifest. We also compare each data product to equivalent data from previous observations, for example HMI magnetograms with those from MDI.
Explosion Monitoring with Machine Learning: A LSTM Approach to Seismic Event Discrimination
NASA Astrophysics Data System (ADS)
Magana-Zook, S. A.; Ruppert, S. D.
2017-12-01
The streams of seismic data that analysts look at to discriminate natural from man- made events will soon grow from gigabytes of data per day to exponentially larger rates. This is an interesting problem as the requirement for real-time answers to questions of non-proliferation will remain the same, and the analyst pool cannot grow as fast as the data volume and velocity will. Machine learning is a tool that can solve the problem of seismic explosion monitoring at scale. Using machine learning, and Long Short-term Memory (LSTM) models in particular, analysts can become more efficient by focusing their attention on signals of interest. From a global dataset of earthquake and explosion events, a model was trained to recognize the different classes of events, given their spectrograms. Optimal recurrent node count and training iterations were found, and cross validation was performed to evaluate model performance. A 10-fold mean accuracy of 96.92% was achieved on a balanced dataset of 30,002 instances. Given that the model is 446.52 MB it can be used to simultaneously characterize all incoming signals by researchers looking at events in isolation on desktop machines, as well as at scale on all of the nodes of a real-time streaming platform. LLNL-ABS-735911
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades.
Orchard, Garrick; Jayawant, Ajinkya; Cohen, Gregory K; Thakor, Nitish
2015-01-01
Creating datasets for Neuromorphic Vision is a challenging task. A lack of available recordings from Neuromorphic Vision sensors means that data must typically be recorded specifically for dataset creation rather than collecting and labeling existing data. The task is further complicated by a desire to simultaneously provide traditional frame-based recordings to allow for direct comparison with traditional Computer Vision algorithms. Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. Moving the sensor rather than the scene or image is a more biologically realistic approach to sensing and eliminates timing artifacts introduced by monitor updates when simulating motion on a computer monitor. We present conversion of two popular image datasets (MNIST and Caltech101) which have played important roles in the development of Computer Vision, and we provide performance metrics on these datasets using spike-based recognition algorithms. This work contributes datasets for future use in the field, as well as results from spike-based algorithms against which future works can compare. Furthermore, by converting datasets already popular in Computer Vision, we enable more direct comparison with frame-based approaches.
Segmentation and Visual Analysis of Whole-Body Mouse Skeleton microSPECT
Khmelinskii, Artem; Groen, Harald C.; Baiker, Martin; de Jong, Marion; Lelieveldt, Boudewijn P. F.
2012-01-01
Whole-body SPECT small animal imaging is used to study cancer, and plays an important role in the development of new drugs. Comparing and exploring whole-body datasets can be a difficult and time-consuming task due to the inherent heterogeneity of the data (high volume/throughput, multi-modality, postural and positioning variability). The goal of this study was to provide a method to align and compare side-by-side multiple whole-body skeleton SPECT datasets in a common reference, thus eliminating acquisition variability that exists between the subjects in cross-sectional and multi-modal studies. Six whole-body SPECT/CT datasets of BALB/c mice injected with bone targeting tracers 99mTc-methylene diphosphonate (99mTc-MDP) and 99mTc-hydroxymethane diphosphonate (99mTc-HDP) were used to evaluate the proposed method. An articulated version of the MOBY whole-body mouse atlas was used as a common reference. Its individual bones were registered one-by-one to the skeleton extracted from the acquired SPECT data following an anatomical hierarchical tree. Sequential registration was used while constraining the local degrees of freedom (DoFs) of each bone in accordance to the type of joint and its range of motion. The Articulated Planar Reformation (APR) algorithm was applied to the segmented data for side-by-side change visualization and comparison of data. To quantitatively evaluate the proposed algorithm, bone segmentations of extracted skeletons from the correspondent CT datasets were used. Euclidean point to surface distances between each dataset and the MOBY atlas were calculated. The obtained results indicate that after registration, the mean Euclidean distance decreased from 11.5±12.1 to 2.6±2.1 voxels. The proposed approach yielded satisfactory segmentation results with minimal user intervention. It proved to be robust for “incomplete” data (large chunks of skeleton missing) and for an intuitive exploration and comparison of multi-modal SPECT/CT cross-sectional mouse data. PMID:23152834
Earth resources instrumentation for the Space Station Polar Platform
NASA Technical Reports Server (NTRS)
Donohoe, Martin J.; Vane, Deborah
1986-01-01
The spacecraft and payloads of the Space Station Polar Platform program are described in a brief overview. Present plans call for one platform in a descending morning-equator-crossing orbit at 824 km and two or three platforms in ascending afternoon-crossing orbits at 542-824 km. The components of the NASA Earth Observing System (EOS) and NOAA payloads are listed in tables and briefly characterized, and data-distribution requirements and the mission development schedule are discussed. A drawing of the platform, a graph showing the spectral coverage of the EOS instruments, and a glossary of acronyms are provided.
Hair segmentation using adaptive threshold from edge and branch length measures.
Lee, Ian; Du, Xian; Anthony, Brian
2017-10-01
Non-invasive imaging techniques allow the monitoring of skin structure and diagnosis of skin diseases in clinical applications. However, hair in skin images hampers the imaging and classification of the skin structure of interest. Although many hair segmentation methods have been proposed for digital hair removal, a major challenge in hair segmentation remains in detecting hairs that are thin, overlapping, of similar contrast or color to underlying skin, or overlaid on highly-textured skin structure. To solve the problem, we present an automatic hair segmentation method that uses edge density (ED) and mean branch length (MBL) to measure hair. First, hair is detected by the integration of top-hat transform and modified second-order Gaussian filter. Second, we employ a robust adaptive threshold of ED and MBL to generate a hair mask. Third, the hair mask is refined by k-NN classification of hair and skin pixels. The proposed algorithm was tested using two datasets of healthy skin images and lesion images respectively. These datasets were taken from different imaging platforms in various illumination levels and varying skin colors. We compared the hair detection and segmentation results from our algorithm and six other hair segmentation methods of state of the art. Our method exhibits high value of sensitivity: 75% and specificity: 95%, which indicates significantly higher accuracy and better balance between true positive and false positive detection than the other methods. Published by Elsevier Ltd.
Targeted exploration and analysis of large cross-platform human transcriptomic compendia
Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.
2016-01-01
We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801
Pfiffner, P; Stadler, B M; Rasi, C; Scala, E; Mari, A
2012-02-01
Using an in silico allergen clustering method, we have recently shown that allergen extracts are highly cross-reactive. Here we used serological data from a multi-array IgE test based on recombinant or highly purified natural allergens to evaluate whether co-reactions are true cross-reactions or co-sensitizations by allergens with the same motifs. The serum database consisted of 3142 samples, each tested against 103 highly purified natural or recombinant allergens. Cross-reactivity was predicted by an iterative motif-finding algorithm through sequence motifs identified in 2708 known allergens. Allergen proteins containing the same motifs cross-reacted as predicted. However, proteins with identical motifs revealed a hierarchy in the degree of cross-reaction: The more frequent an allergen was positive in the allergic population, the less frequently it was cross-reacting and vice versa. Co-sensitization was analyzed by splitting the dataset into patient groups that were most likely sensitized through geographical occurrence of allergens. Interestingly, most co-reactions are cross-reactions but not co-sensitizations. The observed hierarchy of cross-reactivity may play an important role for the future management of allergic diseases. © 2011 John Wiley & Sons A/S.
Holloway, Andrew J; Oshlack, Alicia; Diyagama, Dileepa S; Bowtell, David DL; Smyth, Gordon K
2006-01-01
Background Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. Results A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. Conclusion The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome. PMID:17118209
Reverse phase protein arrays in signaling pathways: a data integration perspective
Creighton, Chad J; Huang, Shixia
2015-01-01
The reverse phase protein array (RPPA) data platform provides expression data for a prespecified set of proteins, across a set of tissue or cell line samples. Being able to measure either total proteins or posttranslationally modified proteins, even ones present at lower abundances, RPPA represents an excellent way to capture the state of key signaling transduction pathways in normal or diseased cells. RPPA data can be combined with those of other molecular profiling platforms, in order to obtain a more complete molecular picture of the cell. This review offers perspective on the use of RPPA as a component of integrative molecular analysis, using recent case examples from The Cancer Genome Altas consortium, showing how RPPA may provide additional insight into cancer besides what other data platforms may provide. There also exists a clear need for effective visualization approaches to RPPA-based proteomic results; this was highlighted by the recent challenge, put forth by the HPN-DREAM consortium, to develop visualization methods for a highly complex RPPA dataset involving many cancer cell lines, stimuli, and inhibitors applied over time course. In this review, we put forth a number of general guidelines for effective visualization of complex molecular datasets, namely, showing the data, ordering data elements deliberately, enabling generalization, focusing on relevant specifics, and putting things into context. We give examples of how these principles can be utilized in visualizing the intrinsic subtypes of breast cancer and in meaningfully displaying the entire HPN-DREAM RPPA dataset within a single page. PMID:26185419
Quantification of the thorax-to-abdomen breathing ratio for breathing motion modeling.
White, Benjamin M; Zhao, Tianyu; Lamb, James; Bradley, Jeffrey D; Low, Daniel A
2013-06-01
The purpose of this study was to develop a methodology to quantitatively measure the thorax-to-abdomen breathing ratio from a 4DCT dataset for breathing motion modeling and breathing motion studies. The thorax-to-abdomen breathing ratio was quantified by measuring the rate of cross-sectional volume increase throughout the thorax and abdomen as a function of tidal volume. Twenty-six 16-slice 4DCT patient datasets were acquired during quiet respiration using a protocol that acquired 25 ciné scans at each couch position. Fifteen datasets included data from the neck through the pelvis. Tidal volume, measured using a spirometer and abdominal pneumatic bellows, was used as breathing-cycle surrogates. The cross-sectional volume encompassed by the skin contour when compared for each CT slice against the tidal volume exhibited a nearly linear relationship. A robust iteratively reweighted least squares regression analysis was used to determine η(i), defined as the amount of cross-sectional volume expansion at each slice i per unit tidal volume. The sum Ση(i) throughout all slices was predicted to be the ratio of the geometric expansion of the lung and the tidal volume; 1.11. The Xiphoid process was selected as the boundary between the thorax and abdomen. The Xiphoid process slice was identified in a scan acquired at mid-inhalation. The imaging protocol had not originally been designed for purposes of measuring the thorax-to-abdomen breathing ratio so the scans did not extend to the anatomy with η(i) = 0. Extrapolation of η(i)-η(i) = 0 was used to include the entire breathing volume. The thorax and abdomen regions were individually analyzed to determine the thorax-to-abdomen breathing ratios. There were 11 image datasets that had been scanned only through the thorax. For these cases, the abdomen breathing component was equal to 1.11 - Ση(i) where the sum was taken throughout the thorax. The average Ση(i) for thorax and abdomen image datasets was found to be 1.20 ± 0.17, close to the expected value of 1.11. The thorax-to-abdomen breathing ratio was 0.32 ± 0.24. The average Ση(i) was 0.26 ± 0.14 in the thorax and 0.93 ± 0.22 in the abdomen. In the scan datasets that encompassed only the thorax, the average Ση(i) was 0.21 ± 0.11. A method to quantify the relationship between abdomen and thoracic breathing was developed and characterized.
Neuroimaging Data Sharing on the Neuroinformatics Database Platform
Book, Gregory A; Stevens, Michael; Assaf, Michal; Glahn, David; Pearlson, Godfrey D
2015-01-01
We describe the Neuroinformatics Database (NiDB), an open-source database platform for archiving, analysis, and sharing of neuroimaging data. Data from the multi-site projects Autism Brain Imaging Data Exchange (ABIDE), Bipolar-Schizophrenia Network on Intermediate Phenotypes parts one and two (B-SNIP1, B-SNIP2), and Monetary Incentive Delay task (MID) are available for download from the public instance of NiDB, with more projects sharing data as it becomes available. As demonstrated by making several large datasets available, NiDB is an extensible platform appropriately suited to archive and distribute shared neuroimaging data. PMID:25888923
NASA Astrophysics Data System (ADS)
Ajemba, Peter O.; Durdle, Nelson G.; Hill, Doug L.; Raso, V. J.
2006-02-01
The influence of posture and re-positioning (sway and breathing) on the accuracy of a torso imaging system for assessing scoliosis was evaluated. The system comprised of a rotating positioning platform and one or two laser digitizers. It required four partial-scans taken at 90 ° intervals over 10 seconds to generate two complete torso scans. Its accuracy was previously determined to be 1.1+/-0.9mm. Ten evenly spaced cross-sections obtained from forty scans of five volunteers in four postures (free-standing, holding side supports, holding front supports and with their hands on their shoulders) were used to assess the variability due to posture. Twenty cross-sections from twenty scans of two volunteers holding side supports were used to assess the variability due to positioning. The variability due to posture was less than 4mm at each cross-section for all volunteers. Variability due to sway ranged from 0-3.5mm while that due to breathing ranged from 0-3mm for both volunteers. Holding side supports was the best posture. Taking the four shots within 10 seconds was optimal. As major torso features that are indicative of scoliosis are larger than 4mm in size, the system could be used in obtaining complete torso images used in assessing and managing scoliosis.
Liu, Guang-Hui; Shen, Hong-Bin; Yu, Dong-Jun
2016-04-01
Accurately predicting protein-protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure: First, a machine-learning-based data-cleaning procedure is applied to remove those marginal targets, which may potentially have a negative effect on training a model with a clear classification boundary, from the majority samples to relieve the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is further used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods.
Browser App Approach: Can It Be an Answer to the Challenges in Cross-Platform App Development?
ERIC Educational Resources Information Center
Huynh, Minh; Ghimire, Prashant
2017-01-01
Aim/Purpose: As smartphones proliferate, many different platforms begin to emerge. The challenge to developers as well as IS [Information Systems] educators and students is how to learn the skills to design and develop apps to run on cross-platforms. Background: For developers, the purpose of this paper is to describe an alternative to the complex…
Characterisation of mental health conditions in social media using Informed Deep Learning
Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J. P.; Dobson, Richard J. B.; Dutta, Rina
2017-01-01
The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients’ own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of ‘in the moment’ daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balenced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions. PMID:28327593
2017-04-01
ADVANCED VISUALIZATION AND INTERACTIVE DISPLAY RAPID INNOVATION AND DISCOVERY EVALUATION RESEARCH (VISRIDER) PROGRAM TASK 6: POINT CLOUD...To) OCT 2013 – SEP 2014 4. TITLE AND SUBTITLE ADVANCED VISUALIZATION AND INTERACTIVE DISPLAY RAPID INNOVATION AND DISCOVERY EVALUATION RESEARCH...various point cloud visualization techniques for viewing large scale LiDAR datasets. Evaluate their potential use for thick client desktop platforms
Ground-based remote sensing of tropospheric water vapour isotopologues within the project MUSICA
NASA Astrophysics Data System (ADS)
Schneider, M.; Barthlott, S.; Hase, F.; González, Y.; Yoshimura, K.; García, O. E.; Sepúlveda, E.; Gomez-Pelaez, A.; Gisi, M.; Kohlhepp, R.; Dohe, S.; Blumenstock, T.; Wiegele, A.; Christner, E.; Strong, K.; Weaver, D.; Palm, M.; Deutscher, N. M.; Warneke, T.; Notholt, J.; Lejeune, B.; Demoulin, P.; Jones, N.; Griffith, D. W. T.; Smale, D.; Robinson, J.
2012-12-01
Within the project MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water), long-term tropospheric water vapour isotopologue data records are provided for ten globally distributed ground-based mid-infrared remote sensing stations of the NDACC (Network for the Detection of Atmospheric Composition Change). We present a new method allowing for an extensive and straightforward characterisation of the complex nature of such isotopologue remote sensing datasets. We demonstrate that the MUSICA humidity profiles are representative for most of the troposphere with a vertical resolution ranging from about 2 km (in the lower troposphere) to 8 km (in the upper troposphere) and with an estimated precision of better than 10%. We find that the sensitivity with respect to the isotopologue composition is limited to the lower and middle troposphere, whereby we estimate a precision of about 30‰ for the ratio between the two isotopologues HD16O and H216O. The measurement noise, the applied atmospheric temperature profiles, the uncertainty in the spectral baseline, and the cross-dependence on humidity are the leading error sources. We introduce an a posteriori correction method of the cross-dependence on humidity, and we recommend applying it to isotopologue ratio remote sensing datasets in general. In addition, we present mid-infrared CO2 retrievals and use them for demonstrating the MUSICA network-wide data consistency. In order to indicate the potential of long-term isotopologue remote sensing data if provided with a well-documented quality, we present a climatology and compare it to simulations of an isotope incorporated AGCM (Atmospheric General Circulation Model). We identify differences in the multi-year mean and seasonal cycles that significantly exceed the estimated errors, thereby indicating deficits in the modeled atmospheric water cycle.
Telu, Kelly H.; Yan, Xinjian; Wallace, William E.; Stein, Stephen E.; Simón-Manso, Yamil
2016-01-01
RATIONALE The metabolite profiling of a NIST plasma Standard Reference Material (SRM 1950) on different LC-MS platforms showed significant differences. Although these findings suggest caution when interpreting metabolomics results, the degree of overlap of both profiles allowed us to use tandem mass spectral libraries of recurrent spectra to evaluate to what extent these results are transferable across platforms and to develop cross-platform chemical signatures. METHODS Non-targeted global metabolite profiles of SRM 1950 were obtained on different LC-MS platforms using reversed phase chromatography and different chromatographic scales (nano, conventional and UHPLC). The data processing and the metabolite differential analysis were carried out using publically available (XCMS), proprietary (Mass Profiler Professional) and in-house software (NIST pipeline). RESULTS Repeatability and intermediate precision showed that the non-targeted SRM 1950 profiling was highly reproducible when working on the same platform (RSD < 2%); however, substantial differences were found in the LC-MS patterns originating on different platforms or even using different chromatographic scales (conventional HPLC, UHPLC and nanoLC) on the same platform. A substantial degree of overlap (common molecular features) was also found. A procedure to generate consistent chemical signatures using tandem mass spectral libraries of recurrent spectra is proposed. CONLUSIONS Different platforms rendered significantly different metabolite profiles, but the results were highly reproducible when working within one platform. Tandem mass spectral libraries of recurrent spectra are proposed to evaluate the degree of transferability of chemical signatures generated on different platforms. Chemical signatures based on our procedure are most likely cross-platform transferable. PMID:26842580
Time series smoother for effect detection.
You, Cheng; Lin, Dennis K J; Young, S Stanley
2018-01-01
In environmental epidemiology, it is often encountered that multiple time series data with a long-term trend, including seasonality, cannot be fully adjusted by the observed covariates. The long-term trend is difficult to separate from abnormal short-term signals of interest. This paper addresses how to estimate the long-term trend in order to recover short-term signals. Our case study demonstrates that the current spline smoothing methods can result in significant positive and negative cross-correlations from the same dataset, depending on how the smoothing parameters are chosen. To circumvent this dilemma, three classes of time series smoothers are proposed to detrend time series data. These smoothers do not require fine tuning of parameters and can be applied to recover short-term signals. The properties of these smoothers are shown with both a case study using a factorial design and a simulation study using datasets generated from the original dataset. General guidelines are provided on how to discover short-term signals from time series with a long-term trend. The benefit of this research is that a problem is identified and characteristics of possible solutions are determined.
Time series smoother for effect detection
Lin, Dennis K. J.; Young, S. Stanley
2018-01-01
In environmental epidemiology, it is often encountered that multiple time series data with a long-term trend, including seasonality, cannot be fully adjusted by the observed covariates. The long-term trend is difficult to separate from abnormal short-term signals of interest. This paper addresses how to estimate the long-term trend in order to recover short-term signals. Our case study demonstrates that the current spline smoothing methods can result in significant positive and negative cross-correlations from the same dataset, depending on how the smoothing parameters are chosen. To circumvent this dilemma, three classes of time series smoothers are proposed to detrend time series data. These smoothers do not require fine tuning of parameters and can be applied to recover short-term signals. The properties of these smoothers are shown with both a case study using a factorial design and a simulation study using datasets generated from the original dataset. General guidelines are provided on how to discover short-term signals from time series with a long-term trend. The benefit of this research is that a problem is identified and characteristics of possible solutions are determined. PMID:29684033
Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
Online collaboration and model sharing in volcanology via VHub.org
NASA Astrophysics Data System (ADS)
Valentine, G.; Patra, A. K.; Bajo, J. V.; Bursik, M. I.; Calder, E.; Carn, S. A.; Charbonnier, S. J.; Connor, C.; Connor, L.; Courtland, L. M.; Gallo, S.; Jones, M.; Palma Lizana, J. L.; Moore-Russo, D.; Renschler, C. S.; Rose, W. I.
2013-12-01
VHub (short for VolcanoHub, and accessible at vhub.org) is an online platform for barrier free access to high end modeling and simulation and collaboration in research and training related to volcanoes, the hazards they pose, and risk mitigation. The underlying concept is to provide a platform, building upon the successful HUBzero software infrastructure (hubzero.org), that enables workers to collaborate online and to easily share information, modeling and analysis tools, and educational materials with colleagues around the globe. Collaboration occurs around several different points: (1) modeling and simulation; (2) data sharing; (3) education and training; (4) volcano observatories; and (5) project-specific groups. VHub promotes modeling and simulation in two ways: (1) some models can be implemented on VHub for online execution. VHub can provide a central warehouse for such models that should result in broader dissemination. VHub also provides a platform that supports the more complex CFD models by enabling the sharing of code development and problem-solving knowledge, benchmarking datasets, and the development of validation exercises. VHub also provides a platform for sharing of data and datasets. The VHub development team is implementing the iRODS data sharing middleware (see irods.org). iRODS allows a researcher to access data that are located at participating data sources around the world (a cloud of data) as if the data were housed in a single virtual database. Projects associated with VHub are also going to introduce the use of data driven workflow tools to support the use of multistage analysis processes where computing and data are integrated for model validation, hazard analysis etc. Audio-video recordings of seminars, PowerPoint slide sets, and educational simulations are all items that can be placed onto VHub for use by the community or by selected collaborators. An important point is that the manager of a given educational resource (or any other resource, such as a dataset or a model) can control the privacy of that resource, ranging from private (only accessible by, and known to, specific collaborators) to completely public. VHub is a very useful platform for project-specific collaborations. With a group site on VHub collaborators share documents, datasets, maps, and have ongoing discussions using the discussion board function. VHub is funded by the U.S. National Science Foundation, and is participating in development of larger earth-science cyberinfrastructure initiatives (EarthCube), as well as supporting efforts such as the Global Volcano Model. Emerging VHub-facilitated efforts include model benchmarking, collaborative code development, and growth in online modeling tools.
Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline
2014-01-01
Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel.
Navigation-supported diagnosis of the substantia nigra by matching midbrain sonography and MRI
NASA Astrophysics Data System (ADS)
Salah, Zein; Weise, David; Preim, Bernhard; Classen, Joseph; Rose, Georg
2012-03-01
Transcranial sonography (TCS) is a well-established neuroimaging technique that allows for visualizing several brainstem structures, including the substantia nigra, and helps for the diagnosis and differential diagnosis of various movement disorders, especially in Parkinsonian syndromes. However, proximate brainstem anatomy can hardly be recognized due to the limited image quality of B-scans. In this paper, a visualization system for the diagnosis of the substantia nigra is presented, which utilizes neuronavigated TCS to reconstruct tomographical slices from registered MRI datasets and visualizes them simultaneously with corresponding TCS planes in realtime. To generate MRI tomographical slices, the tracking data of the calibrated ultrasound probe are passed to an optimized slicing algorithm, which computes cross sections at arbitrary positions and orientations from the registered MRI dataset. The extracted MRI cross sections are finally fused with the region of interest from the ultrasound image. The system allows for the computation and visualization of slices at a near real-time rate. Primary tests of the system show an added value to the pure sonographic imaging. The system also allows for reconstructing volumetric (3D) ultrasonic data of the region of interest, and thus contributes to enhancing the diagnostic yield of midbrain sonography.
Zou, Lingyun; Wang, Zhengzhi; Huang, Jiaomin
2007-12-01
Subcellular location is one of the key biological characteristics of proteins. Position-specific profiles (PSP) have been introduced as important characteristics of proteins in this article. In this study, to obtain position-specific profiles, the Position Specific Iterative-Basic Local Alignment Search Tool (PSI-BLAST) has been used to search for protein sequences in a database. Position-specific scoring matrices are extracted from the profiles as one class of characteristics. Four-part amino acid compositions and 1st-7th order dipeptide compositions have also been calculated as the other two classes of characteristics. Therefore, twelve characteristic vectors are extracted from each of the protein sequences. Next, the characteristic vectors are weighed by a simple weighing function and inputted into a BP neural network predictor named PSP-Weighted Neural Network (PSP-WNN). The Levenberg-Marquardt algorithm is employed to adjust the weight matrices and thresholds during the network training instead of the error back propagation algorithm. With a jackknife test on the RH2427 dataset, PSP-WNN has achieved a higher overall prediction accuracy of 88.4% rather than the prediction results by the general BP neural network, Markov model, and fuzzy k-nearest neighbors algorithm on this dataset. In addition, the prediction performance of PSP-WNN has been evaluated with a five-fold cross validation test on the PK7579 dataset and the prediction results have been consistently better than those of the previous method on the basis of several support vector machines, using compositions of both amino acids and amino acid pairs. These results indicate that PSP-WNN is a powerful tool for subcellular localization prediction. At the end of the article, influences on prediction accuracy using different weighting proportions among three characteristic vector categories have been discussed. An appropriate proportion is considered by increasing the prediction accuracy.
Comparison of Sea-Air CO2 Flux Estimates Using Satellite-Based Versus Mooring Wind Speed Data
NASA Astrophysics Data System (ADS)
Sutton, A. J.; Sabine, C. L.; Feely, R. A.; Wanninkhof, R. H.
2016-12-01
The global ocean is a major sink of anthropogenic CO2, absorbing approximately 27% of CO2 emissions since the beginning of the industrial revolution. Any variation or change in the ocean CO2 sink has implications for future climate. Observations of sea-air CO2 flux have relied primarily on ship-based underway measurements of partial pressure of CO2 (pCO2) combined with satellite, model, or multi-platform wind products. Direct measurements of ΔpCO2 (seawater - air pCO2) and wind speed from moored platforms now allow for high-resolution CO2 flux time series. Here we present a comparison of CO2 flux calculated from moored ΔpCO2 measured on four moorings in different biomes of the Pacific Ocean in combination with: 1) Cross-Calibrated Multi-Platform (CCMP) winds or 2) wind speed measurements made on ocean reference moorings excluded from the CCMP dataset. Preliminary results show using CCMP winds overestimates CO2 flux on average by 5% at the Kuroshio Extension Observatory, Ocean Station Papa, WHOI Hawaii Ocean Timeseries Station, and Stratus. In general, CO2 flux seasonality follows patterns of seawater pCO2 and SST with periods of CO2 outgassing during summer and CO2 uptake during winter at these locations. Any offsets or seasonal biases in CCMP winds could impact global ocean sink estimates using this data product. Here we present patterns and trends between the two CO2 flux estimates and discuss the potential implications for tracking variability and change in global ocean CO2 uptake.
Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis.
Weber, Nick; Liou, David; Dommer, Jennifer; MacMenamin, Philip; Quiñones, Mariam; Misner, Ian; Oler, Andrew J; Wan, Joe; Kim, Lewis; Coakley McCarthy, Meghan; Ezeji, Samuel; Noble, Karlynn; Hurt, Darrell E
2018-04-15
Widespread interest in the study of the microbiome has resulted in data proliferation and the development of powerful computational tools. However, many scientific researchers lack the time, training, or infrastructure to work with large datasets or to install and use command line tools. The National Institute of Allergy and Infectious Diseases (NIAID) has created Nephele, a cloud-based microbiome data analysis platform with standardized pipelines and a simple web interface for transforming raw data into biological insights. Nephele integrates common microbiome analysis tools as well as valuable reference datasets like the healthy human subjects cohort of the Human Microbiome Project (HMP). Nephele is built on the Amazon Web Services cloud, which provides centralized and automated storage and compute capacity, thereby reducing the burden on researchers and their institutions. https://nephele.niaid.nih.gov and https://github.com/niaid/Nephele. darrell.hurt@nih.gov.
Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis
Weber, Nick; Liou, David; Dommer, Jennifer; MacMenamin, Philip; Quiñones, Mariam; Misner, Ian; Oler, Andrew J; Wan, Joe; Kim, Lewis; Coakley McCarthy, Meghan; Ezeji, Samuel; Noble, Karlynn; Hurt, Darrell E
2018-01-01
Abstract Motivation Widespread interest in the study of the microbiome has resulted in data proliferation and the development of powerful computational tools. However, many scientific researchers lack the time, training, or infrastructure to work with large datasets or to install and use command line tools. Results The National Institute of Allergy and Infectious Diseases (NIAID) has created Nephele, a cloud-based microbiome data analysis platform with standardized pipelines and a simple web interface for transforming raw data into biological insights. Nephele integrates common microbiome analysis tools as well as valuable reference datasets like the healthy human subjects cohort of the Human Microbiome Project (HMP). Nephele is built on the Amazon Web Services cloud, which provides centralized and automated storage and compute capacity, thereby reducing the burden on researchers and their institutions. Availability and implementation https://nephele.niaid.nih.gov and https://github.com/niaid/Nephele Contact darrell.hurt@nih.gov PMID:29028892
Husen, Peter; Tarasov, Kirill; Katafiasz, Maciej; Sokol, Elena; Vogt, Johannes; Baumgart, Jan; Nitsch, Robert; Ekroos, Kim; Ejsing, Christer S
2013-01-01
Global lipidomics analysis across large sample sizes produces high-content datasets that require dedicated software tools supporting lipid identification and quantification, efficient data management and lipidome visualization. Here we present a novel software-based platform for streamlined data processing, management and visualization of shotgun lipidomics data acquired using high-resolution Orbitrap mass spectrometry. The platform features the ALEX framework designed for automated identification and export of lipid species intensity directly from proprietary mass spectral data files, and an auxiliary workflow using database exploration tools for integration of sample information, computation of lipid abundance and lipidome visualization. A key feature of the platform is the organization of lipidomics data in "database table format" which provides the user with an unsurpassed flexibility for rapid lipidome navigation using selected features within the dataset. To demonstrate the efficacy of the platform, we present a comparative neurolipidomics study of cerebellum, hippocampus and somatosensory barrel cortex (S1BF) from wild-type and knockout mice devoid of the putative lipid phosphate phosphatase PRG-1 (plasticity related gene-1). The presented framework is generic, extendable to processing and integration of other lipidomic data structures, can be interfaced with post-processing protocols supporting statistical testing and multivariate analysis, and can serve as an avenue for disseminating lipidomics data within the scientific community. The ALEX software is available at www.msLipidomics.info.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thakur, Gautam S; Bhaduri, Budhendra L; Piburn, Jesse O
Geospatial intelligence has traditionally relied on the use of archived and unvarying data for planning and exploration purposes. In consequence, the tools and methods that are architected to provide insight and generate projections only rely on such datasets. Albeit, if this approach has proven effective in several cases, such as land use identification and route mapping, it has severely restricted the ability of researchers to inculcate current information in their work. This approach is inadequate in scenarios requiring real-time information to act and to adjust in ever changing dynamic environments, such as evacuation and rescue missions. In this work, wemore » propose PlanetSense, a platform for geospatial intelligence that is built to harness the existing power of archived data and add to that, the dynamics of real-time streams, seamlessly integrated with sophisticated data mining algorithms and analytics tools for generating operational intelligence on the fly. The platform has four main components i) GeoData Cloud a data architecture for storing and managing disparate datasets; ii) Mechanism to harvest real-time streaming data; iii) Data analytics framework; iv) Presentation and visualization through web interface and RESTful services. Using two case studies, we underpin the necessity of our platform in modeling ambient population and building occupancy at scale.« less
Telu, Kelly H; Yan, Xinjian; Wallace, William E; Stein, Stephen E; Simón-Manso, Yamil
2016-03-15
The metabolite profiling of a NIST plasma Standard Reference Material (SRM 1950) on different liquid chromatography/mass spectrometry (LC/MS) platforms showed significant differences. Although these findings suggest caution when interpreting metabolomics results, the degree of overlap of both profiles allowed us to use tandem mass spectral libraries of recurrent spectra to evaluate to what extent these results are transferable across platforms and to develop cross-platform chemical signatures. Non-targeted global metabolite profiles of SRM 1950 were obtained on different LC/MS platforms using reversed-phase chromatography and different chromatographic scales (conventional HPLC, UHPLC and nanoLC). The data processing and the metabolite differential analysis were carried out using publically available (XCMS), proprietary (Mass Profiler Professional) and in-house software (NIST pipeline). Repeatability and intermediate precision showed that the non-targeted SRM 1950 profiling was highly reproducible when working on the same platform (relative standard deviation (RSD) <2%); however, substantial differences were found in the LC/MS patterns originating on different platforms or even using different chromatographic scales (conventional HPLC, UHPLC and nanoLC) on the same platform. A substantial degree of overlap (common molecular features) was also found. A procedure to generate consistent chemical signatures using tandem mass spectral libraries of recurrent spectra is proposed. Different platforms rendered significantly different metabolite profiles, but the results were highly reproducible when working within one platform. Tandem mass spectral libraries of recurrent spectra are proposed to evaluate the degree of transferability of chemical signatures generated on different platforms. Chemical signatures based on our procedure are most likely cross-platform transferable. Published in 2016. This article is a U.S. Government work and is in the public domain in the USA.
NASA Astrophysics Data System (ADS)
Hudspeth, W. B.; Barrett, H.; Diller, S.; Valentin, G.
2016-12-01
Energize is New Mexico's Experimental Program to Stimulate Competitive Research (NM EPSCoR), funded by the NSF with a focus on building capacity to conduct scientific research. Energize New Mexico leverages the work of faculty and students from NM universities and colleges to provide the tools necessary to a quantitative, science-driven discussion of the state's water policy options and to realize New Mexico's potential for sustainable energy development. This presentation discusses the architectural details of NM EPSCoR's collaborative data management system, GSToRE, and how New Mexico researchers use it to share and analyze diverse research data, with the goal of attaining sustainable energy development in the state.The Earth Data Analysis Center (EDAC) at The University of New Mexico leads the development of computational interoperability capacity that allows the wide use and sharing of energy-related data among NM EPSCoR researchers. Data from a variety of research disciplines is stored and maintained in EDAC's Geographic Storage, Transformation and Retrieval Engine (GSToRE), a distributed platform for large-scale vector and raster data discovery, subsetting, and delivery via Web services that are based on Open Geospatial Consortium (OGC) and REST Web-service standards. Researchers upload and register scientific datasets using a front-end client that collects the critical metadata. In addition, researchers have the option to register their datasets with DataONE, a national, community-driven project that provides access to data across multiple member repositories. The GSToRE platform maintains a searchable, core collection of metadata elements that can be used to deliver metadata in multiple formats, including ISO 19115-2/19139 and FGDC CSDGM. Stored metadata elements also permit the platform to automate the registration of Energize datasets into DataONE, once the datasets are approved for release to the public.
ES-doc-errata: an issue tracker platform for CMIP6
NASA Astrophysics Data System (ADS)
Ben Nasser, Atef; Levavasseur, Guillaume; Greenslade, Mark; Denvil, Sébastien
2017-04-01
In the context of overseeing the quality of data, and as a result of the inherent complexity of projects such as CMIP5/6, it is a mandatory task to keep track of the status of datasets and the version evolution they sustain in their life-cycle. The ESdoc-errata project aims to keep track of the issues affecting specific versions of datasets/files. It enables users to resolve the history tree of each dataset/file enabling a better choice of the data used in their work based on the data status. The ES-doc-errata project has been designed and built on top of the Parent-IDentifiers handle service that will be deployed in the next iteration of the CMIP project, by ensuring maximum usability of ESGF ecosystem and encapsulated in the ES-doc structure. Consuming PIDs from handle service is guided by a specifically built algorithm that extracts meta-data regarding the issues that may or may not affect the quality of datasets/files and cause newer version to be published replacing older deprecated versions. This algorithm is able to deduce the nature of the flaws to the file granularity, that is of high value to the end-user. This new platform has been designed keeping in mind usability by end-users specialized in the data publishing process or other scientists requiring feedback on reliability of data required for their work. To this end, a specific set of rules and a code of conduct has been defined. A validation process ensures the quality of this newly introduced errata meta-data , an authentication safe-guard was implemented to prevent tampering with the archived data, and a wide variety of tools were put at users disposal to interact safely with the platform including a command-line client and a dedicated front-end.
Spatio-Temporal Gap Analysis of OBIS-SEAMAP Project Data: Assessment and Way Forward
Kot, Connie Y.; Fujioka, Ei; Hazen, Lucie J.; Best, Benjamin D.; Read, Andrew J.; Halpin, Patrick N.
2010-01-01
The OBIS-SEAMAP project has acquired and served high-quality marine mammal, seabird, and sea turtle data to the public since its inception in 2002. As data accumulated, spatial and temporal biases resulted and a comprehensive gap analysis was needed in order to assess coverage to direct data acquisition for the OBIS-SEAMAP project and for taxa researchers should true gaps in knowledge exist. All datasets published on OBIS-SEAMAP up to February 2009 were summarized spatially and temporally. Seabirds comprised the greatest number of records, compared to the other two taxa, and most records were from shipboard surveys, compared to the other three platforms. Many of the point observations and polyline tracklines were located in northern and central Atlantic and the northeastern and central-eastern Pacific. The Southern Hemisphere generally had the lowest representation of data, with the least number of records in the southern Atlantic and western Pacific regions. Temporally, records of observations for all taxa were the lowest in fall although the number of animals sighted was lowest in the winter. Oceanographic coverage of observations varied by platform for each taxa, which showed that using two or more platforms represented habitat ranges better than using only one alone. Accessible and published datasets not already incorporated do exist within spatial and temporal gaps identified. Other related open-source data portals also contain data that fill gaps, emphasizing the importance of dedicated data exchange. Temporal and spatial gaps were mostly a result of data acquisition effort, development of regional partnerships and collaborations, and ease of field data collection. Future directions should include fostering partnerships with researchers in the Southern Hemisphere while targeting datasets containing species with limited representation. These results can facilitate prioritizing datasets needed to be represented and for planning research for true gaps in space and time. PMID:20886047
Spatio-temporal gap analysis of OBIS-SEAMAP project data: assessment and way forward.
Kot, Connie Y; Fujioka, Ei; Hazen, Lucie J; Best, Benjamin D; Read, Andrew J; Halpin, Patrick N
2010-09-24
The OBIS-SEAMAP project has acquired and served high-quality marine mammal, seabird, and sea turtle data to the public since its inception in 2002. As data accumulated, spatial and temporal biases resulted and a comprehensive gap analysis was needed in order to assess coverage to direct data acquisition for the OBIS-SEAMAP project and for taxa researchers should true gaps in knowledge exist. All datasets published on OBIS-SEAMAP up to February 2009 were summarized spatially and temporally. Seabirds comprised the greatest number of records, compared to the other two taxa, and most records were from shipboard surveys, compared to the other three platforms. Many of the point observations and polyline tracklines were located in northern and central Atlantic and the northeastern and central-eastern Pacific. The Southern Hemisphere generally had the lowest representation of data, with the least number of records in the southern Atlantic and western Pacific regions. Temporally, records of observations for all taxa were the lowest in fall although the number of animals sighted was lowest in the winter. Oceanographic coverage of observations varied by platform for each taxa, which showed that using two or more platforms represented habitat ranges better than using only one alone. Accessible and published datasets not already incorporated do exist within spatial and temporal gaps identified. Other related open-source data portals also contain data that fill gaps, emphasizing the importance of dedicated data exchange. Temporal and spatial gaps were mostly a result of data acquisition effort, development of regional partnerships and collaborations, and ease of field data collection. Future directions should include fostering partnerships with researchers in the Southern Hemisphere while targeting datasets containing species with limited representation. These results can facilitate prioritizing datasets needed to be represented and for planning research for true gaps in space and time.
Future Directions for Astronomical Image Display
NASA Technical Reports Server (NTRS)
Mandel, Eric
2000-01-01
In the "Future Directions for Astronomical Image Displav" project, the Smithsonian Astrophysical Observatory (SAO) and the National Optical Astronomy Observatories (NOAO) evolved our existing image display program into fully extensible. cross-platform image display software. We also devised messaging software to support integration of image display into astronomical analysis systems. Finally, we migrated our software from reliance on Unix and the X Window System to a platform-independent architecture that utilizes the cross-platform Tcl/Tk technology.
Statistical link between external climate forcings and modes of ocean variability
NASA Astrophysics Data System (ADS)
Malik, Abdul; Brönnimann, Stefan; Perona, Paolo
2017-07-01
In this study we investigate statistical link between external climate forcings and modes of ocean variability on inter-annual (3-year) to centennial (100-year) timescales using de-trended semi-partial-cross-correlation analysis technique. To investigate this link we employ observations (AD 1854-1999), climate proxies (AD 1600-1999), and coupled Atmosphere-Ocean-Chemistry Climate Model simulations with SOCOL-MPIOM (AD 1600-1999). We find robust statistical evidence that Atlantic multi-decadal oscillation (AMO) has intrinsic positive correlation with solar activity in all datasets employed. The strength of the relationship between AMO and solar activity is modulated by volcanic eruptions and complex interaction among modes of ocean variability. The observational dataset reveals that El Niño southern oscillation (ENSO) has statistically significant negative intrinsic correlation with solar activity on decadal to multi-decadal timescales (16-27-year) whereas there is no evidence of a link on a typical ENSO timescale (2-7-year). In the observational dataset, the volcanic eruptions do not have a link with AMO on a typical AMO timescale (55-80-year) however the long-term datasets (proxies and SOCOL-MPIOM output) show that volcanic eruptions have intrinsic negative correlation with AMO on inter-annual to multi-decadal timescales. The Pacific decadal oscillation has no link with solar activity, however, it has positive intrinsic correlation with volcanic eruptions on multi-decadal timescales (47-54-year) in reconstruction and decadal to multi-decadal timescales (16-32-year) in climate model simulations. We also find evidence of a link between volcanic eruptions and ENSO, however, the sign of relationship is not consistent between observations/proxies and climate model simulations.
Architectural Implications for Spatial Object Association Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, V S; Kurc, T; Saltz, J
2009-01-29
Spatial object association, also referred to as cross-match of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server R, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation providesmore » insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST).« less
Real-time FPGA-based radar imaging for smart mobility systems
NASA Astrophysics Data System (ADS)
Saponara, Sergio; Neri, Bruno
2016-04-01
The paper presents an X-band FMCW (Frequency Modulated Continuous Wave) Radar Imaging system, called X-FRI, for surveillance in smart mobility applications. X-FRI allows for detecting the presence of targets (e.g. obstacles in a railway crossing or urban road crossing, or ships in a small harbor), as well as their speed and their position. With respect to alternative solutions based on LIDAR or camera systems, X-FRI operates in real-time also in bad lighting and weather conditions, night and day. The radio-frequency transceiver is realized through COTS (Commercial Off The Shelf) components on a single-board. An FPGA-based baseband platform allows for real-time Radar image processing.
Utility of AIRS Retrievals for Climate Studies
NASA Technical Reports Server (NTRS)
Molnar, Guyla I.; Susskind, Joel
2007-01-01
Satellites provide an ideal platform to study the Earth-atmosphere system on practically all spatial and temporal scales. Thus, one may expect that their rapidly growing datasets could provide crucial insights not only for short-term weather processes/predictions but into ongoing and future climate change processes as well. Though Earth-observing satellites have been around for decades, extracting climatically reliable information from their widely varying datasets faces rather formidable challenges. AIRS/AMSU is a state of the art infrared/microwave sounding system that was launched on the EOS Aqua platform on May 4, 2002, and has been providing operational quality measurements since September 2002. In addition to temperature and atmospheric constituent profiles, outgoing longwave radiation and basic cloud parameters are also derived from the AIRS/AMSU observations. However, so far the AIRS products have not been rigorously evaluated and/or validated on a large scale. Here we present preliminary assessments of monthly and 8-day mean AIRS "Version 4.0" retrieved products (available to the public through the DAAC at NASA/GSFC) to assess their utility for climate studies. First we present "consistency checks" by evaluating the time series of means, and "anomalies" (relative to the first 4 full years' worth of AIRS "climate statistics") of several climatically important retrieved parameters. Finally, we also present preliminary results regarding interrelationships of some of these geophysical variables, to assess to what extent they are consistent with the known physics of climate variability/change. In particular, we find at least one observed relationship which contradicts current general circulation climate (GCM) model results: the global water vapor climate feedback which is expected to be strongly positive is deduced to be slightly negative (shades of the "Lindzen effect"?). Though the current AIRS climatology covers only -4.5 years, it will hopefully extend much further into the future.
NASA Astrophysics Data System (ADS)
Yang, D.; Fu, C. S.; Binford, M. W.
2017-12-01
The southeastern United States has high landscape heterogeneity, withheavily managed forestlands, highly developed agriculture lands, and multiple metropolitan areas. Human activities are transforming and altering land patterns and structures in both negative and positive manners. A land-use map for at the greater scale is a heavy computation task but is critical to most landowners, researchers, and decision makers, enabling them to make informed decisions for varying objectives. There are two major difficulties in generating the classification maps at the regional scale: the necessity of large training point sets and the expensive computation cost-in terms of both money and time-in classifier modeling. Volunteered Geographic Information (VGI) opens a new era in mapping and visualizing our world, where the platform is open for collecting valuable georeferenced information by volunteer citizens, and the data is freely available to the public. As one of the most well-known VGI initiatives, OpenStreetMap (OSM) contributes not only road network distribution, but also the potential for using this data to justify land cover and land use classifications. Google Earth Engine (GEE) is a platform designed for cloud-based mapping with a robust and fast computing power. Most large scale and national mapping approaches confuse "land cover" and "land-use", or build up the land-use database based on modeled land cover datasets. Unlike most other large-scale approaches, we distinguish and differentiate land-use from land cover. By focusing our prime objective of mapping land-use and management practices, a robust regional land-use mapping approach is developed by incorporating the OpenstreepMap dataset into Earth observation remote sensing imageries instead of the often-used land cover base maps.
A training course on tropical cyclones over the eastern Pacific Ocean
NASA Astrophysics Data System (ADS)
Farfan, L. M.; Pozo, D.; Raga, G.; Romero, R.; Zavala, J.
2008-05-01
As part of a research project funded by the Inter-American Institute for Global Change Research (IAI), we are performing a short course based on the current understanding of tropical cyclones in the eastern Pacific basin. In particular, we are focused in discussing the formation and intensification off the Mexican coast. Our main goal is to train students from higher-education institutions from selected countries in Latin America. Our approach includes the review of climatological features derived from the best-track dataset issued by the National Hurricane Center. Using this dataset, we built a climatology of relevant positions and storm tracks for the base period 1970-2006. Additionally, we designed hands-on sessions in which students analyze satellite imagery from several platforms (GOES, QuikSCATT and TRMM) along with mesoscale model simulations from the WRF model. Case studies that resulted in landfall over northwestern Mexico are used; this includes Hurricanes John, Lane and Paul all of which developed during the season of 2006. So far, the course has been taught in the Atmospheric Sciences Department at the University of Buenos Aires, Argentina, and in La Paz, Mexico, with students from Mexico, Chile, Brazil, Costa Rica and Cuba.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kang, Shujiang; Kline, Keith L; Nair, S. Surendran
A global energy crop productivity model that provides geospatially explicit quantitative details on biomass potential and factors affecting sustainability would be useful, but does not exist now. This study describes a modeling platform capable of meeting many challenges associated with global-scale agro-ecosystem modeling. We designed an analytical framework for bioenergy crops consisting of six major components: (i) standardized natural resources datasets, (ii) global field-trial data and crop management practices, (iii) simulation units and management scenarios, (iv) model calibration and validation, (v) high-performance computing (HPC) simulation, and (vi) simulation output processing and analysis. The HPC-Environmental Policy Integrated Climate (HPC-EPIC) model simulatedmore » a perennial bioenergy crop, switchgrass (Panicum virgatum L.), estimating feedstock production potentials and effects across the globe. This modeling platform can assess soil C sequestration, net greenhouse gas (GHG) emissions, nonpoint source pollution (e.g., nutrient and pesticide loss), and energy exchange with the atmosphere. It can be expanded to include additional bioenergy crops (e.g., miscanthus, energy cane, and agave) and food crops under different management scenarios. The platform and switchgrass field-trial dataset are available to support global analysis of biomass feedstock production potential and corresponding metrics of sustainability.« less
Dreyer, Florian S; Cantone, Martina; Eberhardt, Martin; Jaitly, Tanushree; Walter, Lisa; Wittmann, Jürgen; Gupta, Shailendra K; Khan, Faiz M; Wolkenhauer, Olaf; Pützer, Brigitte M; Jäck, Hans-Martin; Heinzerling, Lucie; Vera, Julio
2018-06-01
Cellular phenotypes are established and controlled by complex and precisely orchestrated molecular networks. In cancer, mutations and dysregulations of multiple molecular factors perturb the regulation of these networks and lead to malignant transformation. High-throughput technologies are a valuable source of information to establish the complex molecular relationships behind the emergence of malignancy, but full exploitation of this massive amount of data requires bioinformatics tools that rely on network-based analyses. In this report we present the Virtual Melanoma Cell, an online tool developed to facilitate the mining and interpretation of high-throughput data on melanoma by biomedical researches. The platform is based on a comprehensive, manually generated and expert-validated regulatory map composed of signaling pathways important in malignant melanoma. The Virtual Melanoma Cell is a tool designed to accept, visualize and analyze user-generated datasets. It is available at: https://www.vcells.net/melanoma. To illustrate the utilization of the web platform and the regulatory map, we have analyzed a large publicly available dataset accounting for anti-PD1 immunotherapy treatment of malignant melanoma patients. Copyright © 2018 Elsevier B.V. All rights reserved.
An incremental anomaly detection model for virtual machines.
Zhang, Hancui; Chen, Shuyu; Liu, Jun; Zhou, Zhen; Wu, Tianshu
2017-01-01
Self-Organizing Map (SOM) algorithm as an unsupervised learning method has been applied in anomaly detection due to its capabilities of self-organizing and automatic anomaly prediction. However, because of the algorithm is initialized in random, it takes a long time to train a detection model. Besides, the Cloud platforms with large scale virtual machines are prone to performance anomalies due to their high dynamic and resource sharing characters, which makes the algorithm present a low accuracy and a low scalability. To address these problems, an Improved Incremental Self-Organizing Map (IISOM) model is proposed for anomaly detection of virtual machines. In this model, a heuristic-based initialization algorithm and a Weighted Euclidean Distance (WED) algorithm are introduced into SOM to speed up the training process and improve model quality. Meanwhile, a neighborhood-based searching algorithm is presented to accelerate the detection time by taking into account the large scale and high dynamic features of virtual machines on cloud platform. To demonstrate the effectiveness, experiments on a common benchmark KDD Cup dataset and a real dataset have been performed. Results suggest that IISOM has advantages in accuracy and convergence velocity of anomaly detection for virtual machines on cloud platform.
An incremental anomaly detection model for virtual machines
Zhang, Hancui; Chen, Shuyu; Liu, Jun; Zhou, Zhen; Wu, Tianshu
2017-01-01
Self-Organizing Map (SOM) algorithm as an unsupervised learning method has been applied in anomaly detection due to its capabilities of self-organizing and automatic anomaly prediction. However, because of the algorithm is initialized in random, it takes a long time to train a detection model. Besides, the Cloud platforms with large scale virtual machines are prone to performance anomalies due to their high dynamic and resource sharing characters, which makes the algorithm present a low accuracy and a low scalability. To address these problems, an Improved Incremental Self-Organizing Map (IISOM) model is proposed for anomaly detection of virtual machines. In this model, a heuristic-based initialization algorithm and a Weighted Euclidean Distance (WED) algorithm are introduced into SOM to speed up the training process and improve model quality. Meanwhile, a neighborhood-based searching algorithm is presented to accelerate the detection time by taking into account the large scale and high dynamic features of virtual machines on cloud platform. To demonstrate the effectiveness, experiments on a common benchmark KDD Cup dataset and a real dataset have been performed. Results suggest that IISOM has advantages in accuracy and convergence velocity of anomaly detection for virtual machines on cloud platform. PMID:29117245
Evaluation of Smartphone Inertial Sensor Performance for Cross-Platform Mobile Applications
Kos, Anton; Tomažič, Sašo; Umek, Anton
2016-01-01
Smartphone sensors are being increasingly used in mobile applications. The performance of sensors varies considerably among different smartphone models and the development of a cross-platform mobile application might be a very complex and demanding task. A publicly accessible resource containing real-life-situation smartphone sensor parameters could be of great help for cross-platform developers. To address this issue we have designed and implemented a pilot participatory sensing application for measuring, gathering, and analyzing smartphone sensor parameters. We start with smartphone accelerometer and gyroscope bias and noise parameters. The application database presently includes sensor parameters of more than 60 different smartphone models of different platforms. It is a modest, but important start, offering information on several statistical parameters of the measured smartphone sensors and insights into their performance. The next step, a large-scale cloud-based version of the application, is already planned. The large database of smartphone sensor parameters may prove particularly useful for cross-platform developers. It may also be interesting for individual participants who would be able to check-up and compare their smartphone sensors against a large number of similar or identical models. PMID:27049391
NASA Astrophysics Data System (ADS)
Zlatanovic, Nikola; Milovanovic, Irina; Cotric, Jelena
2014-05-01
Drainage basins are for the most part ungauged or poorly gauged not only in Serbia but in most parts of the world, usually due to insufficient funds, but also the decommission of river gauges in upland catchments to focus on downstream areas which are more populated. Very often, design discharges are needed for these streams or rivers where no streamflow data is available, for various applications. Examples include river training works for flood protection measures or erosion control, design of culverts, water supply facilities, small hydropower plants etc. The estimation of discharges in ungauged basins is most often performed using rainfall-runoff models, whose parameters heavily rely on geomorphometric attributes of the basin (e.g. catchment area, elevation, slopes of channels and hillslopes etc.). The calculation of these, as well as other paramaters, is most often done in GIS (Geographic Information System) software environments. This study deals with the application of freely available and open source software and datasets for automating rainfall-runoff analysis of ungauged basins using methodologies currently in use hydrological practice. The R programming language was used for scripting and automating the hydrological calculations, coupled with SAGA GIS (System for Automated Geoscientivic Analysis) for geocomputing functions and terrain analysis. Datasets used in the analyses include the freely available SRTM (Shuttle Radar Topography Mission) terrain data, CORINE (Coordination of Information on the Environment) Land Cover data, as well as soil maps and rainfall data. The choice of free and open source software and datasets makes the project ideal for academic and research purposes and cross-platform projects. The geomorphometric module was tested on more than 100 catchments throughout Serbia and compared to manually calculated values (using topographic maps). The discharge estimation module was tested on 21 catchments where data were available and compared to results obtained by frequency analysis of annual maximum discharge. The geomorphometric module of the calculation system showed excellent results, saving a great deal of time that would otherwise have been spent on manual processing of geospatial data. This type of automated analysis presented in this study will enable a much quicker hydrologic analysis on multiple watersheds, providing the platform for further research into spatial variability of runoff.
Analyzing engagement in a web-based intervention platform through visualizing log-data.
Morrison, Cecily; Doherty, Gavin
2014-11-13
Engagement has emerged as a significant cross-cutting concern within the development of Web-based interventions. There have been calls to institute a more rigorous approach to the design of Web-based interventions, to increase both the quantity and quality of engagement. One approach would be to use log-data to better understand the process of engagement and patterns of use. However, an important challenge lies in organizing log-data for productive analysis. Our aim was to conduct an initial exploration of the use of visualizations of log-data to enhance understanding of engagement with Web-based interventions. We applied exploratory sequential data analysis to highlight sequential aspects of the log data, such as time or module number, to provide insights into engagement. After applying a number of processing steps, a range of visualizations were generated from the log-data. We then examined the usefulness of these visualizations for understanding the engagement of individual users and the engagement of cohorts of users. The visualizations created are illustrated with two datasets drawn from studies using the SilverCloud Platform: (1) a small, detailed dataset with interviews (n=19) and (2) a large dataset (n=326) with 44,838 logged events. We present four exploratory visualizations of user engagement with a Web-based intervention, including Navigation Graph, Stripe Graph, Start-Finish Graph, and Next Action Heat Map. The first represents individual usage and the last three, specific aspects of cohort usage. We provide examples of each with a discussion of salient features. Log-data analysis through data visualization is an alternative way of exploring user engagement with Web-based interventions, which can yield different insights than more commonly used summative measures. We describe how understanding the process of engagement through visualizations can support the development and evaluation of Web-based interventions. Specifically, we show how visualizations can (1) allow inspection of content or feature usage in a temporal relationship to the overall program at different levels of granularity, (2) detect different patterns of use to consider personalization in the design process, (3) detect usability issues, (4) enable exploratory analysis to support the design of statistical queries to summarize the data, (5) provide new opportunities for real-time evaluation, and (6) examine assumptions about interactivity that underlie many summative measures in this field.
Analyzing Engagement in a Web-Based Intervention Platform Through Visualizing Log-Data
2014-01-01
Background Engagement has emerged as a significant cross-cutting concern within the development of Web-based interventions. There have been calls to institute a more rigorous approach to the design of Web-based interventions, to increase both the quantity and quality of engagement. One approach would be to use log-data to better understand the process of engagement and patterns of use. However, an important challenge lies in organizing log-data for productive analysis. Objective Our aim was to conduct an initial exploration of the use of visualizations of log-data to enhance understanding of engagement with Web-based interventions. Methods We applied exploratory sequential data analysis to highlight sequential aspects of the log data, such as time or module number, to provide insights into engagement. After applying a number of processing steps, a range of visualizations were generated from the log-data. We then examined the usefulness of these visualizations for understanding the engagement of individual users and the engagement of cohorts of users. The visualizations created are illustrated with two datasets drawn from studies using the SilverCloud Platform: (1) a small, detailed dataset with interviews (n=19) and (2) a large dataset (n=326) with 44,838 logged events. Results We present four exploratory visualizations of user engagement with a Web-based intervention, including Navigation Graph, Stripe Graph, Start–Finish Graph, and Next Action Heat Map. The first represents individual usage and the last three, specific aspects of cohort usage. We provide examples of each with a discussion of salient features. Conclusions Log-data analysis through data visualization is an alternative way of exploring user engagement with Web-based interventions, which can yield different insights than more commonly used summative measures. We describe how understanding the process of engagement through visualizations can support the development and evaluation of Web-based interventions. Specifically, we show how visualizations can (1) allow inspection of content or feature usage in a temporal relationship to the overall program at different levels of granularity, (2) detect different patterns of use to consider personalization in the design process, (3) detect usability issues, (4) enable exploratory analysis to support the design of statistical queries to summarize the data, (5) provide new opportunities for real-time evaluation, and (6) examine assumptions about interactivity that underlie many summative measures in this field. PMID:25406097
Research on cross - Project software defect prediction based on transfer learning
NASA Astrophysics Data System (ADS)
Chen, Ya; Ding, Xiaoming
2018-04-01
According to the two challenges in the prediction of cross-project software defects, the distribution differences between the source project and the target project dataset and the class imbalance in the dataset, proposing a cross-project software defect prediction method based on transfer learning, named NTrA. Firstly, solving the source project data's class imbalance based on the Augmented Neighborhood Cleaning Algorithm. Secondly, the data gravity method is used to give different weights on the basis of the attribute similarity of source project and target project data. Finally, a defect prediction model is constructed by using Trad boost algorithm. Experiments were conducted using data, come from NASA and SOFTLAB respectively, from a published PROMISE dataset. The results show that the method has achieved good values of recall and F-measure, and achieved good prediction results.
NASA Astrophysics Data System (ADS)
Gordov, Evgeny; Lykosov, Vasily; Krupchatnikov, Vladimir; Okladnikov, Igor; Titov, Alexander; Shulgina, Tamara
2013-04-01
Analysis of growing volume of related to climate change data from sensors and model outputs requires collaborative multidisciplinary efforts of researchers. To do it timely and in reliable way one needs in modern information-computational infrastructure supporting integrated studies in the field of environmental sciences. Recently developed experimental software and hardware platform Climate (http://climate.scert.ru/) provides required environment for regional climate change related investigations. The platform combines modern web 2.0 approach, GIS-functionality and capabilities to run climate and meteorological models, process large geophysical datasets and support relevant analysis. It also supports joint software development by distributed research groups, and organization of thematic education for students and post-graduate students. In particular, platform software developed includes dedicated modules for numerical processing of regional and global modeling results for consequent analysis and visualization. Also run of integrated into the platform WRF and «Planet Simulator» models, modeling results data preprocessing and visualization is provided. All functions of the platform are accessible by a user through a web-portal using common graphical web-browser in the form of an interactive graphical user interface which provides, particularly, capabilities of selection of geographical region of interest (pan and zoom), data layers manipulation (order, enable/disable, features extraction) and visualization of results. Platform developed provides users with capabilities of heterogeneous geophysical data analysis, including high-resolution data, and discovering of tendencies in climatic and ecosystem changes in the framework of different multidisciplinary researches. Using it even unskilled user without specific knowledge can perform reliable computational processing and visualization of large meteorological, climatic and satellite monitoring datasets through unified graphical web-interface. Partial support of RF Ministry of Education and Science grant 8345, SB RAS Program VIII.80.2 and Projects 69, 131, 140 and APN CBA2012-16NSY project is acknowledged.
Omicseq: a web-based search engine for exploring omics datasets
Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng
2017-01-01
Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
NASA Astrophysics Data System (ADS)
Shi, Chunhua; Huang, Ying; Guo, Dong; Zhou, Shunwu; Hu, Kaixi; Liu, Yu
2018-05-01
The South Asian High (SAH) has an important influence on atmospheric circulation and the Asian climate in summer. However, current comparative analyses of the SAH are mostly between reanalysis datasets and there is a lack of sounding data. We therefore compared the climatology, trends and abrupt changes in the SAH in the Japanese 55-year Reanalysis (JRA-55) dataset, the National Centers for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR) dataset, the European Center for Medium-Range Weather Forecasts Reanalysis Interim (ERA-interim) dataset and radiosonde data from China using linear analysis and a sliding t-test. The trends in geopotential height in the control area of the SAH were positive in the JRA-55, NCEP-CFSR and ERA-interim datasets, but negative in the radiosonde data in the time period 1979-2014. The negative trends for the SAH were significant at the 90% confidence level in the radiosonde data from May to September. The positive trends in the NCEP-CFSR dataset were significant at the 90% confidence level in May, July, August and September, but the positive trends in the JRA-55 and ERA-Interim were only significant at the 90% confidence level in September. The reasons for the differences in the trends of the SAH between the radiosonde data and the three reanalysis datasets in the time period 1979-2014 were updates to the sounding systems, changes in instrumentation and improvements in the radiation correction method for calculations around the year 2000. We therefore analyzed the trends in the two time periods of 1979-2000 and 2001-2014 separately. From 1979 to 2000, the negative SAH trends in the radiosonde data mainly agreed with the negative trends in the NCEP-CFSR dataset, but were in contrast with the positive trends in the JRA-55 and ERA-Interim datasets. In 2001-2014, however, the trends in the SAH were positive in all four datasets and most of the trends in the radiosonde and NCEP-CFSR datasets were significant. It is therefore better to use the NCEP-CFSR dataset than the JRA-55 and ERA-Interim datasets when discussing trends in the SAH.
The first MICCAI challenge on PET tumor segmentation.
Hatt, Mathieu; Laurent, Baptiste; Ouahabi, Anouar; Fayad, Hadi; Tan, Shan; Li, Laquan; Lu, Wei; Jaouen, Vincent; Tauber, Clovis; Czakon, Jakub; Drapejkowski, Filip; Dyrka, Witold; Camarasu-Pop, Sorina; Cervenansky, Frédéric; Girard, Pascal; Glatard, Tristan; Kain, Michael; Yao, Yao; Barillot, Christian; Kirov, Assen; Visvikis, Dimitris
2018-02-01
Automatic functional volume segmentation in PET images is a challenge that has been addressed using a large array of methods. A major limitation for the field has been the lack of a benchmark dataset that would allow direct comparison of the results in the various publications. In the present work, we describe a comparison of recent methods on a large dataset following recommendations by the American Association of Physicists in Medicine (AAPM) task group (TG) 211, which was carried out within a MICCAI (Medical Image Computing and Computer Assisted Intervention) challenge. Organization and funding was provided by France Life Imaging (FLI). A dataset of 176 images combining simulated, phantom and clinical images was assembled. A website allowed the participants to register and download training data (n = 19). Challengers then submitted encapsulated pipelines on an online platform that autonomously ran the algorithms on the testing data (n = 157) and evaluated the results. The methods were ranked according to the arithmetic mean of sensitivity and positive predictive value. Sixteen teams registered but only four provided manuscripts and pipeline(s) for a total of 10 methods. In addition, results using two thresholds and the Fuzzy Locally Adaptive Bayesian (FLAB) were generated. All competing methods except one performed with median accuracy above 0.8. The method with the highest score was the convolutional neural network-based segmentation, which significantly outperformed 9 out of 12 of the other methods, but not the improved K-Means, Gaussian Model Mixture and Fuzzy C-Means methods. The most rigorous comparative study of PET segmentation algorithms to date was carried out using a dataset that is the largest used in such studies so far. The hierarchy amongst the methods in terms of accuracy did not depend strongly on the subset of datasets or the metrics (or combination of metrics). All the methods submitted by the challengers except one demonstrated good performance with median accuracy scores above 0.8. Copyright © 2017 Elsevier B.V. All rights reserved.
Li, Pin-Lan; Zhang, Yang
2013-01-01
Recent studies have demonstrated that cross talk between ceramide and redox signaling modulates various cell activities and functions and contributes to the development of cardiovascular diseases and renal dysfunctions. Ceramide triggers the generation of reactive oxygen species (ROS) and increases oxidative stress in many mammalian cells and animal models. On the other hand, inhibition of ROS-generating enzymes or treatment of antioxidants impairs sphingomyelinase activation and ceramide production. As a mechanism, ceramide-enriched signaling platforms, special cell membrane rafts (MR) (formerly lipid rafts), provide an important microenvironment to mediate the cross talk of ceramide and redox signaling to exert a corresponding regulatory role on cell and organ functions. In this regard, activation of acid sphingomyelinase and generation of ceramide mediate the formation of ceramide-enriched membrane platforms, where transmembrane signals are transmitted or amplified through recruitment, clustering, assembling, or integration of various signaling molecules. A typical such signaling platform is MR redox signaling platform that is centered on ceramide production and aggregation leading to recruitment and assembling of NADPH oxidase to form an active complex in the cell plasma membrane. This redox signaling platform not only conducts redox signaling or regulation but also facilitates a feedforward amplification of both ceramide and redox signaling. In addition to this membrane MR redox signaling platform, the cross talk between ceramide and redox signaling may occur in other cell compartments. This book chapter focuses on the molecular mechanisms, spatial-temporal regulations, and implications of this cross talk between ceramide and redox signaling, which may provide novel insights into the understanding of both ceramide and redox signaling pathways.
Regenerator cross arm seal assembly
Jackman, Anthony V.
1988-01-01
A seal assembly for disposition between a cross arm on a gas turbine engine block and a regenerator disc, the seal assembly including a platform coextensive with the cross arm, a seal and wear layer sealingly and slidingly engaging the regenerator disc, a porous and compliant support layer between the platform and the seal and wear layer porous enough to permit flow of cooling air therethrough and compliant to accommodate relative thermal growth and distortion, a dike between the seal and wear layer and the platform for preventing cross flow through the support layer between engine exhaust and pressurized air passages, and air diversion passages for directing unregenerated pressurized air through the support layer to cool the seal and wear layer and then back into the flow of regenerated pressurized air.
QU at TREC-2015: Building Real-Time Systems for Tweet Filtering and Question Answering
2015-11-20
from Yahoo ! An- swers. We adopted a very simple approach that searched an archived Yahoo ! Answers QA dataset for similar questions to the asked ones and...users to post and answer questions. Yahoo ! An- swers1 is by far one of the largest sQA platforms. Questions and answers on such platforms share some...multiple domains [5]. However, the existence of large social question answering websites, such as Yahoo ! Answers specifically, makes the development of
NASA Astrophysics Data System (ADS)
Li, Shuiqing; Guan, Shoude; Hou, Yijun; Liu, Yahao; Bi, Fan
2018-05-01
A long-term trend of significant wave height (SWH) in China's coastal seas was examined based on three datasets derived from satellite measurements and numerical hindcasts. One set of altimeter data were obtained from the GlobWave, while the other two datasets of numerical hindcasts were obtained from the third-generation wind wave model, WAVEWATCH III, forced by wind fields from the Cross-Calibrated Multi-Platform (CCMP) and NCEP's Climate Forecast System Reanalysis (CFSR). The mean and extreme wave trends were estimated for the period 1992-2010 with respect to the annual mean and the 99th-percentile values of SWH, respectively. The altimeter wave trend estimates feature considerable uncertainties owing to the sparse sampling rate. Furthermore, the extreme wave trend tends to be overestimated because of the increasing sampling rate over time. Numerical wave trends strongly depend on the quality of the wind fields, as the CCMP waves significantly overestimate the wave trend, whereas the CFSR waves tend to underestimate the trend. Corresponding adjustments were applied which effectively improved the trend estimates from the altimeter and numerical data. The adjusted results show generally increasing mean wave trends, while the extreme wave trends are more spatially-varied, from decreasing trends prevailing in the South China Sea to significant increasing trends mainly in the East China Sea.
Cserhati, Matyas F; Pandey, Sanjit; Beaudoin, James J; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S
2015-01-01
We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33,017,407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. © The Author(s) 2015. Published by Oxford University Press.
Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.
2015-01-01
We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types, and 38 technology types, and reaches a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431
Vision-based gait impairment analysis for aided diagnosis.
Ortells, Javier; Herrero-Ezquerro, María Trinidad; Mollineda, Ramón A
2018-02-12
Gait is a firsthand reflection of health condition. This belief has inspired recent research efforts to automate the analysis of pathological gait, in order to assist physicians in decision-making. However, most of these efforts rely on gait descriptions which are difficult to understand by humans, or on sensing technologies hardly available in ambulatory services. This paper proposes a number of semantic and normalized gait features computed from a single video acquired by a low-cost sensor. Far from being conventional spatio-temporal descriptors, features are aimed at quantifying gait impairment, such as gait asymmetry from several perspectives or falling risk. They were designed to be invariant to frame rate and image size, allowing cross-platform comparisons. Experiments were formulated in terms of two databases. A well-known general-purpose gait dataset is used to establish normal references for features, while a new database, introduced in this work, provides samples under eight different walking styles: one normal and seven impaired patterns. A number of statistical studies were carried out to prove the sensitivity of features at measuring the expected pathologies, providing enough evidence about their accuracy. Graphical Abstract Graphical abstract reflecting main contributions of the manuscript: at the top, a robust, semantic and easy-to-interpret feature set to describe impaired gait patterns; at the bottom, a new dataset consisting of video-recordings of a number of volunteers simulating different patterns of pathological gait, where features were statistically assessed.
Feng, Yinling; Wang, Xuefeng
2017-03-01
In order to investigate commonly disturbed genes and pathways in various brain regions of patients with Parkinson's disease (PD), microarray datasets from previous studies were collected and systematically analyzed. Different normalization methods were applied to microarray datasets from different platforms. A strategy combining gene co‑expression networks and clinical information was adopted, using weighted gene co‑expression network analysis (WGCNA) to screen for commonly disturbed genes in different brain regions of patients with PD. Functional enrichment analysis of commonly disturbed genes was performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Co‑pathway relationships were identified with Pearson's correlation coefficient tests and a hypergeometric distribution‑based test. Common genes in pathway pairs were selected out and regarded as risk genes. A total of 17 microarray datasets from 7 platforms were retained for further analysis. Five gene coexpression modules were identified, containing 9,745, 736, 233, 101 and 93 genes, respectively. One module was significantly correlated with PD samples and thus the 736 genes it contained were considered to be candidate PD‑associated genes. Functional enrichment analysis demonstrated that these genes were implicated in oxidative phosphorylation and PD. A total of 44 pathway pairs and 52 risk genes were revealed, and a risk gene pathway relationship network was constructed. Eight modules were identified and were revealed to be associated with PD, cancers and metabolism. A number of disturbed pathways and risk genes were unveiled in PD, and these findings may help advance understanding of PD pathogenesis.
Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
Cross platform development using Delphi and Kylix
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDonald, J.L.; Nishimura, H.; Timossi, C.
2002-10-08
A cross platform component for EPICS Simple Channel Access (SCA) has been developed for the use with Delphi on Windows and Kylix on Linux. An EPICS controls GUI application developed on Windows runs on Linux by simply rebuilding it, and vice versa. This paper describes the technical details of the component.
Wood, Jeffrey J.; Lynne, Sarah D.; Langer, David A.; Wood, Patricia A.; Clark, Shaunna L.; Eddy, J. Mark; Ialongo, Nicholas
2011-01-01
This study tests a model of reciprocal influences between absenteeism and youth psychopathology using three longitudinal datasets (Ns= 20745, 2311, and 671). Participants in 1st through 12th grades were interviewed annually or bi-annually. Measures of psychopathology include self-, parent-, and teacher-report questionnaires. Structural cross-lagged regression models were tested. In a nationally representative dataset (Add Health), middle school students with relatively greater absenteeism at study year 1 tended towards increased depression and conduct problems in study year 2, over and above the effects of autoregressive associations and demographic covariates. The opposite direction of effects was found for both middle and high school students. Analyses with two regionally representative datasets were also partially supportive. Longitudinal links were more evident in adolescence than in childhood. PMID:22188462
Gregori, Josep; Villarreal, Laura; Sánchez, Alex; Baselga, José; Villanueva, Josep
2013-12-16
The microarray community has shown that the low reproducibility observed in gene expression-based biomarker discovery studies is partially due to relying solely on p-values to get the lists of differentially expressed genes. Their conclusions recommended complementing the p-value cutoff with the use of effect-size criteria. The aim of this work was to evaluate the influence of such an effect-size filter on spectral counting-based comparative proteomic analysis. The results proved that the filter increased the number of true positives and decreased the number of false positives and the false discovery rate of the dataset. These results were confirmed by simulation experiments where the effect size filter was used to evaluate systematically variable fractions of differentially expressed proteins. Our results suggest that relaxing the p-value cut-off followed by a post-test filter based on effect size and signal level thresholds can increase the reproducibility of statistical results obtained in comparative proteomic analysis. Based on our work, we recommend using a filter consisting of a minimum absolute log2 fold change of 0.8 and a minimum signal of 2-4 SpC on the most abundant condition for the general practice of comparative proteomics. The implementation of feature filtering approaches could improve proteomic biomarker discovery initiatives by increasing the reproducibility of the results obtained among independent laboratories and MS platforms. Quality control analysis of microarray-based gene expression studies pointed out that the low reproducibility observed in the lists of differentially expressed genes could be partially attributed to the fact that these lists are generated relying solely on p-values. Our study has established that the implementation of an effect size post-test filter improves the statistical results of spectral count-based quantitative proteomics. The results proved that the filter increased the number of true positives whereas decreased the false positives and the false discovery rate of the datasets. The results presented here prove that a post-test filter applying a reasonable effect size and signal level thresholds helps to increase the reproducibility of statistical results in comparative proteomic analysis. Furthermore, the implementation of feature filtering approaches could improve proteomic biomarker discovery initiatives by increasing the reproducibility of results obtained among independent laboratories and MS platforms. This article is part of a Special Issue entitled: Standardization and Quality Control in Proteomics. Copyright © 2013 Elsevier B.V. All rights reserved.
Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences
2014-01-01
Background The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is in-part caused by differential error-structure between datasets, and their incomplete removal by pre-processing algorithms. Methods To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients). Results We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures. Conclusions Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers. PMID:24902696
The BioMedical Evidence Graph (BMEG) | Informatics Technology for Cancer Research (ITCR)
The BMEG is a Cancer Data integration Platform that utilizes methods collected from DREAM challenges and applied to large datasets, such as the TCGA, and makes them avalible for analysis using a high performance graph database
NASA Astrophysics Data System (ADS)
Gross, M. B.; Mayernik, M. S.; Rowan, L. R.; Khan, H.; Boler, F. M.; Maull, K. E.; Stott, D.; Williams, S.; Corson-Rikert, J.; Johns, E. M.; Daniels, M. D.; Krafft, D. B.
2015-12-01
UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, an EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to address connectivity gaps across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page will show, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can also be queried using SPARQL, a query language for semantic data. EarthCollab will also extend the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. Additional extensions, including enhanced geospatial capabilities, will be developed following task-centered usability testing.
AutoDockFR: Advances in Protein-Ligand Docking with Explicitly Specified Binding Site Flexibility
Ravindranath, Pradeep Anand; Forli, Stefano; Goodsell, David S.; Olson, Arthur J.; Sanner, Michel F.
2015-01-01
Automated docking of drug-like molecules into receptors is an essential tool in structure-based drug design. While modeling receptor flexibility is important for correctly predicting ligand binding, it still remains challenging. This work focuses on an approach in which receptor flexibility is modeled by explicitly specifying a set of receptor side-chains a-priori. The challenges of this approach include the: 1) exponential growth of the search space, demanding more efficient search methods; and 2) increased number of false positives, calling for scoring functions tailored for flexible receptor docking. We present AutoDockFR–AutoDock for Flexible Receptors (ADFR), a new docking engine based on the AutoDock4 scoring function, which addresses the aforementioned challenges with a new Genetic Algorithm (GA) and customized scoring function. We validate ADFR using the Astex Diverse Set, demonstrating an increase in efficiency and reliability of its GA over the one implemented in AutoDock4. We demonstrate greatly increased success rates when cross-docking ligands into apo receptors that require side-chain conformational changes for ligand binding. These cross-docking experiments are based on two datasets: 1) SEQ17 –a receptor diversity set containing 17 pairs of apo-holo structures; and 2) CDK2 –a ligand diversity set composed of one CDK2 apo structure and 52 known bound inhibitors. We show that, when cross-docking ligands into the apo conformation of the receptors with up to 14 flexible side-chains, ADFR reports more correctly cross-docked ligands than AutoDock Vina on both datasets with solutions found for 70.6% vs. 35.3% systems on SEQ17, and 76.9% vs. 61.5% on CDK2. ADFR also outperforms AutoDock Vina in number of top ranking solutions on both datasets. Furthermore, we show that correctly docked CDK2 complexes re-create on average 79.8% of all pairwise atomic interactions between the ligand and moving receptor atoms in the holo complexes. Finally, we show that down-weighting the receptor internal energy improves the ranking of correctly docked poses and that runtime for AutoDockFR scales linearly when side-chain flexibility is added. PMID:26629955
2012-05-16
Regional Command RCP Route Clearance Platoon RSOI Reception, Staging, Onward Movement, Integration SBCT Stryker Brigade Combat Team TOE Table of...Point (ASPs), and field hospital platforms; prepare river crossing sites; and support port repair due to Hydraulic Excavator (HYEX), provides force...platforms, FARPS, supply routes, roads, control points, fire bases, tank ditches, ASPs, and field hospital platforms; prepare river crossing sites; and
Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant
2015-01-01
Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
NASA Astrophysics Data System (ADS)
Malik, Abdul; Brönnimann, Stefan
2017-09-01
The Modes of Ocean Variability (MOV) namely Atlantic Multidecadal Oscillation (AMO), Pacific Decadal Oscillation (PDO), and El Niño Southern Oscillation (ENSO) can have significant impacts on Indian Summer Monsoon Rainfall (ISMR) on different timescales. The timescales at which these MOV interacts with ISMR and the factors which may perturb their relationship with ISMR need to be investigated. We employ De-trended Cross-Correlation Analysis (DCCA), and De-trended Partial-Cross-Correlation Analysis (DPCCA) to study the timescales of interaction of ISMR with AMO, PDO, and ENSO using observational dataset (AD 1854-1999), and atmosphere-ocean-chemistry climate model simulations with SOCOL-MPIOM (AD 1600-1999). Further, this study uses De-trended Semi-Partial Cross-Correlation Analysis (DSPCCA) to address the relation between solar variability and the ISMR. We find statistically significant evidence of intrinsic correlations of ISMR with AMO, PDO, and ENSO on different timescales, consistent between model simulations and observations. However, the model fails to capture modulation in intrinsic relationship between ISRM and MOV due to external signals. Our analysis indicates that AMO is a potential source of non-stationary relationship between ISMR and ENSO. Furthermore, the pattern of correlation between ISMR and Total Solar Irradiance (TSI) is inconsistent between observations and model simulations. The observational dataset indicates statistically insignificant negative intrinsic correlation between ISMR and TSI on decadal-to-centennial timescales. This statistically insignificant negative intrinsic correlation is transformed to statistically significant positive extrinsic by AMO on 61-86-year timescale. We propose a new mechanism for Sun-monsoon connection which operates through AMO by changes in summer (June-September; JJAS) meridional gradient of tropospheric temperatures (ΔTTJJAS). There is a negative (positive) intrinsic correlation between ΔTTJJAS (AMO) and TSI. The negative intrinsic correlation between ΔTTJJAS and TSI indicates that high (low) solar activity weakens (strengthens) the meridional gradient of tropospheric temperature during the summer monsoon season and subsequently the weak (strong) ΔTTJJAS decreases (increases) the ISMR. However, the presence of AMO transforms the negative intrinsic relation between ΔTTJJAS and TSI into positive extrinsic and strengthens the ISMR. We conclude that the positive relation between ISMR and solar activity, as found by other authors, is mainly due to the effect of AMO on ISMR.
NASA Astrophysics Data System (ADS)
Malik, Abdul; Brönnimann, Stefan
2018-06-01
The Modes of Ocean Variability (MOV) namely Atlantic Multidecadal Oscillation (AMO), Pacific Decadal Oscillation (PDO), and El Niño Southern Oscillation (ENSO) can have significant impacts on Indian Summer Monsoon Rainfall (ISMR) on different timescales. The timescales at which these MOV interacts with ISMR and the factors which may perturb their relationship with ISMR need to be investigated. We employ De-trended Cross-Correlation Analysis (DCCA), and De-trended Partial-Cross-Correlation Analysis (DPCCA) to study the timescales of interaction of ISMR with AMO, PDO, and ENSO using observational dataset (AD 1854-1999), and atmosphere-ocean-chemistry climate model simulations with SOCOL-MPIOM (AD 1600-1999). Further, this study uses De-trended Semi-Partial Cross-Correlation Analysis (DSPCCA) to address the relation between solar variability and the ISMR. We find statistically significant evidence of intrinsic correlations of ISMR with AMO, PDO, and ENSO on different timescales, consistent between model simulations and observations. However, the model fails to capture modulation in intrinsic relationship between ISRM and MOV due to external signals. Our analysis indicates that AMO is a potential source of non-stationary relationship between ISMR and ENSO. Furthermore, the pattern of correlation between ISMR and Total Solar Irradiance (TSI) is inconsistent between observations and model simulations. The observational dataset indicates statistically insignificant negative intrinsic correlation between ISMR and TSI on decadal-to-centennial timescales. This statistically insignificant negative intrinsic correlation is transformed to statistically significant positive extrinsic by AMO on 61-86-year timescale. We propose a new mechanism for Sun-monsoon connection which operates through AMO by changes in summer (June-September; JJAS) meridional gradient of tropospheric temperatures (ΔTTJJAS). There is a negative (positive) intrinsic correlation between ΔTTJJAS (AMO) and TSI. The negative intrinsic correlation between ΔTTJJAS and TSI indicates that high (low) solar activity weakens (strengthens) the meridional gradient of tropospheric temperature during the summer monsoon season and subsequently the weak (strong) ΔTTJJAS decreases (increases) the ISMR. However, the presence of AMO transforms the negative intrinsic relation between ΔTTJJAS and TSI into positive extrinsic and strengthens the ISMR. We conclude that the positive relation between ISMR and solar activity, as found by other authors, is mainly due to the effect of AMO on ISMR.
Golbamaki, Azadi; Benfenati, Emilio; Golbamaki, Nazanin; Manganaro, Alberto; Merdivan, Erinc; Roncaglioni, Alessandra; Gini, Giuseppina
2016-04-02
In this study, new molecular fragments associated with genotoxic and nongenotoxic carcinogens are introduced to estimate the carcinogenic potential of compounds. Two rule-based carcinogenesis models were developed with the aid of SARpy: model R (from rodents' experimental data) and model E (from human carcinogenicity data). Structural alert extraction method of SARpy uses a completely automated and unbiased manner with statistical significance. The carcinogenicity models developed in this study are collections of carcinogenic potential fragments that were extracted from two carcinogenicity databases: the ANTARES carcinogenicity dataset with information from bioassay on rats and the combination of ISSCAN and CGX datasets, which take into accounts human-based assessment. The performance of these two models was evaluated in terms of cross-validation and external validation using a 258 compound case study dataset. Combining R and H predictions and scoring a positive or negative result when both models are concordant on a prediction, increased accuracy to 72% and specificity to 79% on the external test set. The carcinogenic fragments present in the two models were compared and analyzed from the point of view of chemical class. The results of this study show that the developed rule sets will be a useful tool to identify some new structural alerts of carcinogenicity and provide effective information on the molecular structures of carcinogenic chemicals.
Media-Education Convergence: Applying Transmedia Storytelling Edutainment in E-Learning Environments
ERIC Educational Resources Information Center
Kalogeras, Stavroula
2013-01-01
In the era of media convergence, transmedia (cross-media/cross-platform/multi-platform) narratives are catering to users who are willing to immerse themselves in their favorite entertainment content. The inherent interactivity of the Internet and the emotional engagement of story can lead to innovative pedagogies in media rich environments. This…
Wacker, Soren; Noskov, Sergei Yu
2018-05-01
Drug-induced abnormal heart rhythm known as Torsades de Pointes (TdP) is a potential lethal ventricular tachycardia found in many patients. Even newly released anti-arrhythmic drugs, like ivabradine with HCN channel as a primary target, block the hERG potassium current in overlapping concentration interval. Promiscuous drug block to hERG channel may potentially lead to perturbation of the action potential duration (APD) and TdP, especially when with combined with polypharmacy and/or electrolyte disturbances. The example of novel anti-arrhythmic ivabradine illustrates clinically important and ongoing deficit in drug design and warrants for better screening methods. There is an urgent need to develop new approaches for rapid and accurate assessment of how drugs with complex interactions and multiple subcellular targets can predispose or protect from drug-induced TdP. One of the unexpected outcomes of compulsory hERG screening implemented in USA and European Union resulted in large datasets of IC 50 values for various molecules entering the market. The abundant data allows now to construct predictive machine-learning (ML) models. Novel ML algorithms and techniques promise better accuracy in determining IC 50 values of hERG blockade that is comparable or surpassing that of the earlier QSAR or molecular modeling technique. To test the performance of modern ML techniques, we have developed a computational platform integrating various workflows for quantitative structure activity relationship (QSAR) models using data from the ChEMBL database. To establish predictive powers of ML-based algorithms we computed IC 50 values for large dataset of molecules and compared it to automated patch clamp system for a large dataset of hERG blocking and non-blocking drugs, an industry gold standard in studies of cardiotoxicity. The optimal protocol with high sensitivity and predictive power is based on the novel eXtreme gradient boosting (XGBoost) algorithm. The ML-platform with XGBoost displays excellent performance with a coefficient of determination of up to R 2 ~0.8 for pIC 50 values in evaluation datasets, surpassing other metrics and approaches available in literature. Ultimately, the ML-based platform developed in our work is a scalable framework with automation potential to interact with other developing technologies in cardiotoxicity field, including high-throughput electrophysiology measurements delivering large datasets of profiled drugs, rapid synthesis and drug development via progress in synthetic biology.
Li, Pin-Lan; Zhang, Yang
2013-01-01
Recent studies have demonstrated that cross talk between ceramide and redox signaling modulates various cell activities and functions and contributes to the development of cardiovascular diseases and renal dysfunctions. Ceramide triggers the generation of reactive oxygen species (ROS) and increases oxidative stress in many mammalian cells and animal models. On the other hand, inhibition of ROS-generating enzymes or treatment of antioxidants impairs sphingomyelinase activation and ceramide production. As a mechanism, ceramide-enriched signaling platforms, special cell membrane rafts (MR) (formerly lipid rafts), provide an important microenvironment to mediate the cross talk of ceramide and redox signaling to exert a corresponding regulatory role on cell and organ functions. In this regard, activation of acid sphingomyelinase and generation of ceramide mediate the formation of ceramide-enriched membrane platforms, where trans-membrane signals are transmitted or amplified through recruitment, clustering, assembling, or integration of various signaling molecules. A typical such signaling platform is MR redox signaling platform that is centered on ceramide production and aggregation leading to recruitment and assembling of NADPH oxidase to form an active complex in the cell plasma membrane. This redox signaling platform not only conducts redox signaling or regulation but also facilitates a feedforward amplification of both ceramide and redox signaling. In addition to this membrane MR redox signaling platform, the cross talk between ceramide and redox signaling may occur in other cell compartments. This book chapter focuses on the molecular mechanisms, spatial–temporal regulations, and implications of this cross talk between ceramide and redox signaling, which may provide novel insights into the understanding of both ceramide and redox signaling pathways. PMID:23563657
Optimized hardware framework of MLP with random hidden layers for classification applications
NASA Astrophysics Data System (ADS)
Zyarah, Abdullah M.; Ramesh, Abhishek; Merkel, Cory; Kudithipudi, Dhireesha
2016-05-01
Multilayer Perceptron Networks with random hidden layers are very efficient at automatic feature extraction and offer significant performance improvements in the training process. They essentially employ large collection of fixed, random features, and are expedient for form-factor constrained embedded platforms. In this work, a reconfigurable and scalable architecture is proposed for the MLPs with random hidden layers with a customized building block based on CORDIC algorithm. The proposed architecture also exploits fixed point operations for area efficiency. The design is validated for classification on two different datasets. An accuracy of ~ 90% for MNIST dataset and 75% for gender classification on LFW dataset was observed. The hardware has 299 speed-up over the corresponding software realization.
The PO.DAAC Portal and its use of the Drupal Framework
NASA Astrophysics Data System (ADS)
Alarcon, C.; Huang, T.; Bingham, A.; Cosic, S.
2011-12-01
The Physical Oceanography Distributed Active Archive Center portal (http://podaac.jpl.nasa.gov) is the primary interface for discovering and accessing oceanographic datasets collected from the vantage point of space. In addition, it provides information about NASA's satellite missions and operational activities at the data center. Recently the portal underwent a major redesign and deployment utilizing the Drupal framework. The Drupal framework was chosen as the platform for the portal due to its flexibility, open source community, and modular infrastructure. The portal features efficient content addition and management, mailing lists, forums, role based access control, and a faceted dataset browse capability. The dataset browsing was built as a custom Drupal module and integrates with a SOLR search engine.
Mining the archives: a cross-platform analysis of gene ...
Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc
NASA Astrophysics Data System (ADS)
Quach, N.; Huang, T.; Boening, C.; Gill, K. M.
2016-12-01
Research related to sea level rise crosses multiple disciplines from sea ice to land hydrology. The NASA Sea Level Change Portal (SLCP) is a one-stop source for current sea level change information and data, including interactive tools for accessing and viewing regional data, a virtual dashboard of sea level indicators, and ongoing updates through a suite of editorial products that include content articles, graphics, videos, and animations. The architecture behind the SLCP makes it possible to integrate web content and data relevant to sea level change that are archived across various data centers as well as new data generated by sea level change principal investigators. The Extensible Data Gateway Environment (EDGE) is incorporated into the SLCP architecture to provide a unified platform for web content and science data discovery. EDGE is a data integration platform designed to facilitate high-performance geospatial data discovery and access with the ability to support multi-metadata standard specifications. EDGE has the capability to retrieve data from one or more sources and package the resulting sets into a single response to the requestor. With this unified endpoint, the Data Analysis Tool that is available on the SLCP can retrieve dataset and granule level metadata as well as perform geospatial search on the data. This talk focuses on the architecture that makes it possible to seamlessly integrate and enable discovery of disparate data relevant to sea level rise.
Assembly Platform For Use In Outer Space
NASA Technical Reports Server (NTRS)
Rao, Niranjan S.; Buddington, Patricia A.
1995-01-01
Report describes conceptual platform or framework for use in assembling other structures and spacecraft in outer space. Consists of three fixed structural beams comprising central beam and two cross beams. Robotic manipulators spaced apart on platform to provide telerobotic operation of platform by either space-station or ground crews. Platform and attached vehicles function synergistically to achieve maximum performance for intended purposes.
GEC Plasma Data Exchange Project
NASA Astrophysics Data System (ADS)
Pitchford, L. C.
2013-08-01
In 2010 the Gaseous Electronics Conference (GEC), a major international conference for the low temperature plasma science (LTPS) community, initiated the Plasma Data Exchange Project (PDEP). The PDEP is an informal, community-based project that aims to address, at least in part, the well-recognized needs for the community to organize the means of collecting, evaluating and sharing data both for modelling and for interpretation of experiments. The emphasis to date in the PDEP has been on data related to the electron and ion components of these plasmas rather than on the plasma chemistry. At the heart of the PDEP is the open-access website, LXCat [1], developed by researchers at LAPLACE (Laboratoire Plasma et Conversion d'Energie, Toulouse, France). LXCat is a platform for archiving and manipulating collections of data related to electron scattering and transport in cold, neutral gases, organized in databases set-up by individual members or institutions of the LTPS community. At present, 15 databases of electron scattering data, contributed by groups around the world, can be accessed on LXCat. These databases include complete sets of electron cross sections, over an energy range from thermal to nominally 1 keV, for almost 40 ground-state neutral species and partial sets of data for about 30 other neutral, excited and ionized species. 'Complete' implies that all the major electron momentum and energy loss processes are well described in the dataset. Such 'complete' datasets can be used as input to a Boltzmann calculation of the electron energy distribution function (generally non-Maxwellian), and electron transport and rate coefficients can be obtained in pure gases or mixtures by averaging over the distribution function. Online tools enable importing and exporting data, plotting and comparing different sets of data. An online version of the Boltzmann equation solver BOLSIG+ [2] is also available on the LXCat site. Other members of the community have contributed their collections of transport and rate coefficient data, and comparisons of calculated and measured data can also be made online through the LXCat site. A large body of data for ion scattering and transport is available on the sister site, ICECat [3], which is now being merged into a new and improved LXCat platform [4] under development. The GEC hosted workshops on the PDEP in 2011 and 2012, with the third in the series being planned for October 2013. The purpose of these workshops has been to report progress towards the evaluation of data available on LXCat or elsewhere. The focus of the 2011 workshop was electron scattering and transport in noble gases, and the articles in this cluster issue were originally reported at that occasion. The 2012 workshop focused on electron transport in simple molecular gases, and plans are to publish documentation and evaluations of datasets for H2, N2 and O2, as reported at the 2012 GEC. The focus topic for the 2013 workshop is electron scattering in H2O and other complex molecules. The first three papers (paper I on Ar, by Pitchford et al [5], paper II on He and Ne, by Alves et al [6], and paper III on Kr and Xe, by Bordage et al [7]) in this cluster issue aim to provide documentation of the datasets available on LXCat for noble gases. Paper IV by Klaus Bartschat [8] gives an overview of theoretical methods for calculations of electron-atom scattering cross sections. This is important because, in some cases, theory has now advanced to the point of being able to provide complete sets of electron scattering cross-sections in noble gases to the accuracy required for use in plasma modelling. The discussion provided in the four papers in this cluster issue is intended to help users decide which datasets best fit their needs. We urge the users of these data to include complete and proper references in all publications. Open-access data should not become anonymous data! Finally, it is with sadness that we acknowledge the passing of our colleague Art Phelps in December 2012. Art was a key person on the GEC Plasma Data Exchange Project and a major contributor to the LXCat site, having made available his compilations of electron and ion scattering cross sections as well as many of his unpublished notes and documents on related issues. Throughout his long and remarkably productive career, Art Phelps held the bar high for the entire GEC community. We can only aspire to his example. We dedicate this cluster issue to him with the knowledge that he could have improved on these papers but with the hope that he would have been satisfied with their final versions.
Martin, Erika G; Law, Jennie; Ran, Weijia; Helbig, Natalie; Birkhead, Guthrie S
Government datasets are newly available on open data platforms that are publicly accessible, available in nonproprietary formats, free of charge, and with unlimited use and distribution rights. They provide opportunities for health research, but their quality and usability are unknown. To describe available open health data, identify whether data are presented in a way that is aligned with best practices and usable for researchers, and examine differences across platforms. Two reviewers systematically reviewed a random sample of data offerings on NYC OpenData (New York City, all offerings, n = 37), Health Data NY (New York State, 25% sample, n = 71), and HealthData.gov (US Department of Health and Human Services, 5% sample, n = 75), using a standard coding guide. Three open health data platforms at the federal, New York State, and New York City levels. Data characteristics from the coding guide were aggregated into summary indices for intrinsic data quality, contextual data quality, adherence to the Dublin Core metadata standards, and the 5-star open data deployment scheme. One quarter of the offerings were structured datasets; other presentation styles included charts (14.7%), documents describing data (12.0%), maps (10.9%), and query tools (7.7%). Health Data NY had higher intrinsic data quality (P < .001), contextual data quality (P < .001), and Dublin Core metadata standards adherence (P < .001). All met basic "web availability" open data standards; fewer met higher standards of "hyperlinked to other data." Although all platforms need improvement, they already provide readily available data for health research. Sustained effort on improving open data websites and metadata is necessary for ensuring researchers use these data, thereby increasing their research value.
OpenAQ: A Platform to Aggregate and Freely Share Global Air Quality Data
NASA Astrophysics Data System (ADS)
Hasenkopf, C. A.; Flasher, J. C.; Veerman, O.; DeWitt, H. L.
2015-12-01
Thousands of ground-based air quality monitors around the world publicly publish real-time air quality data; however, researchers and the public do not have access to this information in the ways most useful to them. Often, air quality data are posted on obscure websites showing only current values, are programmatically inaccessible, and/or are in inconsistent data formats across sites. Yet, historical and programmatic access to such a global dataset would be transformative to several scientific fields, from epidemiology to low-cost sensor technologies to estimates of ground-level aerosol by satellite retrievals. To increase accessibility and standardize this disparate dataset, we have built OpenAQ, an innovative, open platform created by a group of scientists and open data programmers. The source code for the platform is viewable at github.com/openaq. Currently, we are aggregating, storing, and making publicly available real-time air quality data (PM2.5, PM10, SO2, NO2, and O3) via an Application Program Interface (API). We will present the OpenAQ platform, which currently has the following specific capabilities: A continuous ingest mechanism for some of the most polluted cities, generalizable to more sources An API providing data-querying, including ability to filter by location, measurement type and value and date, as well as custom sort options A generalized, chart-based visualization tool to explore data accessible via the API At this stage, we are seeking wider participation and input from multiple research communities in expanding our data retrieval sites, standardizing our protocols, receiving feedback on quality issues, and creating tools that can be built on top of this open platform.
GWATCH: a web platform for automated gene association discovery analysis.
Svitin, Anton; Malov, Sergey; Cherkasov, Nikolay; Geerts, Paul; Rotkevich, Mikhail; Dobrynin, Pavel; Shevchenko, Andrey; Guan, Li; Troyer, Jennifer; Hendrickson, Sher; Dilks, Holli Hutcheson; Oleksyk, Taras K; Donfield, Sharyne; Gomperts, Edward; Jabs, Douglas A; Sezgin, Efe; Van Natta, Mark; Harrigan, P Richard; Brumme, Zabrina L; O'Brien, Stephen J
2014-01-01
As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations. Here we present a dynamic web-based platform - GWATCH - that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis. GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH.
Kinematics of an in-parallel actuated manipulator based on the Stewart platform mechanism
NASA Technical Reports Server (NTRS)
Williams, Robert L., II
1992-01-01
This paper presents kinematic equations and solutions for an in-parallel actuated robotic mechanism based on Stewart's platform. These equations are required for inverse position and resolved rate (inverse velocity) platform control. NASA LaRC has a Vehicle Emulator System (VES) platform designed by MIT which is based on Stewart's platform. The inverse position solution is straight-forward and computationally inexpensive. Given the desired position and orientation of the moving platform with respect to the base, the lengths of the prismatic leg actuators are calculated. The forward position solution is more complicated and theoretically has 16 solutions. The position and orientation of the moving platform with respect to the base is calculated given the leg actuator lengths. Two methods are pursued in this paper to solve this problem. The resolved rate (inverse velocity) solution is derived. Given the desired Cartesian velocity of the end-effector, the required leg actuator rates are calculated. The Newton-Raphson Jacobian matrix resulting from the second forward position kinematics solution is a modified inverse Jacobian matrix. Examples and simulations are given for the VES.
Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline
2014-01-01
Background: Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. Methods: We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. Results: The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Conclusion: Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel. PMID:24795875
A cross-country Exchange Market Pressure (EMP) dataset.
Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay
2017-06-01
The data presented in this article are related to the research article titled - "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset for Exchange Market Pressure values (EMP) for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ 's. Using the standard errors of estimates of ρ 's, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.
GenomeHubs: simple containerized setup of a custom Ensembl database and web server for any species
Kumar, Sujai; Stevens, Lewis; Blaxter, Mark
2017-01-01
Abstract As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to International Nucleotide Sequence Database Collaboration. Database URL: http://GenomeHubs.org PMID:28605774
Integrity, standards, and QC-related issues with big data in pre-clinical drug discovery.
Brothers, John F; Ung, Matthew; Escalante-Chong, Renan; Ross, Jermaine; Zhang, Jenny; Cha, Yoonjeong; Lysaght, Andrew; Funt, Jason; Kusko, Rebecca
2018-06-01
The tremendous expansion of data analytics and public and private big datasets presents an important opportunity for pre-clinical drug discovery and development. In the field of life sciences, the growth of genetic, genomic, transcriptomic and proteomic data is partly driven by a rapid decline in experimental costs as biotechnology improves throughput, scalability, and speed. Yet far too many researchers tend to underestimate the challenges and consequences involving data integrity and quality standards. Given the effect of data integrity on scientific interpretation, these issues have significant implications during preclinical drug development. We describe standardized approaches for maximizing the utility of publicly available or privately generated biological data and address some of the common pitfalls. We also discuss the increasing interest to integrate and interpret cross-platform data. Principles outlined here should serve as a useful broad guide for existing analytical practices and pipelines and as a tool for developing additional insights into therapeutics using big data. Copyright © 2018 Elsevier Inc. All rights reserved.
A phase space model of Fourier ptychographic microscopy
Horstmeyer, Roarke; Yang, Changhuei
2014-01-01
A new computational imaging technique, termed Fourier ptychographic microscopy (FPM), uses a sequence of low-resolution images captured under varied illumination to iteratively converge upon a high-resolution complex sample estimate. Here, we propose a mathematical model of FPM that explicitly connects its operation to conventional ptychography, a common procedure applied to electron and X-ray diffractive imaging. Our mathematical framework demonstrates that under ideal illumination conditions, conventional ptychography and FPM both produce datasets that are mathematically linked by a linear transformation. We hope this finding encourages the future cross-pollination of ideas between two otherwise unconnected experimental imaging procedures. In addition, the coherence state of the illumination source used by each imaging platform is critical to successful operation, yet currently not well understood. We apply our mathematical framework to demonstrate that partial coherence uniquely alters both conventional ptychography’s and FPM’s captured data, but up to a certain threshold can still lead to accurate resolution-enhanced imaging through appropriate computational post-processing. We verify this theoretical finding through simulation and experiment. PMID:24514995
NASA Astrophysics Data System (ADS)
Wang, Tusheng; Yang, Yuanyuan; Zhang, Jianguo
2013-03-01
In order to enable multiple disciplines of medical researchers, clinical physicians and biomedical engineers working together in a secured, efficient, and transparent cooperative environment, we had designed an e-Science platform for biomedical imaging research and application cross multiple academic institutions and hospitals in Shanghai by using grid-based or cloud-based distributed architecture and presented this work in SPIE Medical Imaging conference held in San Diego in 2012. However, when the platform integrates more and more nodes over different networks, the first challenge is that how to monitor and maintain all the hosts and services operating cross multiple academic institutions and hospitals in the e-Science platform, such as DICOM and Web based image communication services, messaging services and XDS ITI transaction services. In this presentation, we presented a system design and implementation of intelligent monitoring and management which can collect system resource status of every node in real time, alert when node or service failure occurs, and can finally improve the robustness, reliability and service continuity of this e-Science platform.
Roy, Anuradha; Fuller, Clifton D; Rosenthal, David I; Thomas, Charles R
2015-08-28
Comparison of imaging measurement devices in the absence of a gold-standard comparator remains a vexing problem; especially in scenarios where multiple, non-paired, replicated measurements occur, as in image-guided radiotherapy (IGRT). As the number of commercially available IGRT presents a challenge to determine whether different IGRT methods may be used interchangeably, an unmet need conceptually parsimonious and statistically robust method to evaluate the agreement between two methods with replicated observations. Consequently, we sought to determine, using an previously reported head and neck positional verification dataset, the feasibility and utility of a Comparison of Measurement Methods with the Mixed Effects Procedure Accounting for Replicated Evaluations (COM3PARE), a unified conceptual schema and analytic algorithm based upon Roy's linear mixed effects (LME) model with Kronecker product covariance structure in a doubly multivariate set-up, for IGRT method comparison. An anonymized dataset consisting of 100 paired coordinate (X/ measurements from a sequential series of head and neck cancer patients imaged near-simultaneously with cone beam CT (CBCT) and kilovoltage X-ray (KVX) imaging was used for model implementation. Software-suggested CBCT and KVX shifts for the lateral (X), vertical (Y) and longitudinal (Z) dimensions were evaluated for bias, inter-method (between-subject variation), intra-method (within-subject variation), and overall agreement using with a script implementing COM3PARE with the MIXED procedure of the statistical software package SAS (SAS Institute, Cary, NC, USA). COM3PARE showed statistically significant bias agreement and difference in inter-method between CBCT and KVX was observed in the Z-axis (both p - value<0.01). Intra-method and overall agreement differences were noted as statistically significant for both the X- and Z-axes (all p - value<0.01). Using pre-specified criteria, based on intra-method agreement, CBCT was deemed preferable for X-axis positional verification, with KVX preferred for superoinferior alignment. The COM3PARE methodology was validated as feasible and useful in this pilot head and neck cancer positional verification dataset. COM3PARE represents a flexible and robust standardized analytic methodology for IGRT comparison. The implemented SAS script is included to encourage other groups to implement COM3PARE in other anatomic sites or IGRT platforms.
Ground robotic measurement of aeolian processes
NASA Astrophysics Data System (ADS)
Qian, Feifei; Jerolmack, Douglas; Lancaster, Nicholas; Nikolich, George; Reverdy, Paul; Roberts, Sonia; Shipley, Thomas; Van Pelt, R. Scott; Zobeck, Ted M.; Koditschek, Daniel E.
2017-08-01
Models of aeolian processes rely on accurate measurements of the rates of sediment transport by wind, and careful evaluation of the environmental controls of these processes. Existing field approaches typically require intensive, event-based experiments involving dense arrays of instruments. These devices are often cumbersome and logistically difficult to set up and maintain, especially near steep or vegetated dune surfaces. Significant advances in instrumentation are needed to provide the datasets that are required to validate and improve mechanistic models of aeolian sediment transport. Recent advances in robotics show great promise for assisting and amplifying scientists' efforts to increase the spatial and temporal resolution of many environmental measurements governing sediment transport. The emergence of cheap, agile, human-scale robotic platforms endowed with increasingly sophisticated sensor and motor suites opens up the prospect of deploying programmable, reactive sensor payloads across complex terrain in the service of aeolian science. This paper surveys the need and assesses the opportunities and challenges for amassing novel, highly resolved spatiotemporal datasets for aeolian research using partially-automated ground mobility. We review the limitations of existing measurement approaches for aeolian processes, and discuss how they may be transformed by ground-based robotic platforms, using examples from our initial field experiments. We then review how the need to traverse challenging aeolian terrains and simultaneously make high-resolution measurements of critical variables requires enhanced robotic capability. Finally, we conclude with a look to the future, in which robotic platforms may operate with increasing autonomy in harsh conditions. Besides expanding the completeness of terrestrial datasets, bringing ground-based robots to the aeolian research community may lead to unexpected discoveries that generate new hypotheses to expand the science itself.
Technical note: RabbitCT--an open platform for benchmarking 3D cone-beam reconstruction algorithms.
Rohkohl, C; Keck, B; Hofmann, H G; Hornegger, J
2009-09-01
Fast 3D cone beam reconstruction is mandatory for many clinical workflows. For that reason, researchers and industry work hard on hardware-optimized 3D reconstruction. Backprojection is a major component of many reconstruction algorithms that require a projection of each voxel onto the projection data, including data interpolation, before updating the voxel value. This step is the bottleneck of most reconstruction algorithms and the focus of optimization in recent publications. A crucial limitation, however, of these publications is that the presented results are not comparable to each other. This is mainly due to variations in data acquisitions, preprocessing, and chosen geometries and the lack of a common publicly available test dataset. The authors provide such a standardized dataset that allows for substantial comparison of hardware accelerated backprojection methods. They developed an open platform RabbitCT (www.rabbitCT.com) for worldwide comparison in backprojection performance and ranking on different architectures using a specific high resolution C-arm CT dataset of a rabbit. This includes a sophisticated benchmark interface, a prototype implementation in C++, and image quality measures. At the time of writing, six backprojection implementations are already listed on the website. Optimizations include multithreading using Intel threading building blocks and OpenMP, vectorization using SSE, and computation on the GPU using CUDA 2.0. There is a need for objectively comparing backprojection implementations for reconstruction algorithms. RabbitCT aims to provide a solution to this problem by offering an open platform with fair chances for all participants. The authors are looking forward to a growing community and await feedback regarding future evaluations of novel software- and hardware-based acceleration schemes.
NASA Astrophysics Data System (ADS)
Smith, M. J.; Vardaro, M.; Crowley, M. F.; Glenn, S. M.; Schofield, O.; Belabbassi, L.; Garzio, L. M.; Knuth, F.; Fram, J. P.; Kerfoot, J.
2016-02-01
The Ocean Observatories Initiative (OOI), funded by the National Science Foundation, provides users with access to long-term datasets from a variety of oceanographic sensors. The Endurance Array in the Pacific Ocean consists of two separate lines off the coasts of Oregon and Washington. The Oregon line consists of 7 moorings, two cabled benthic experiment packages and 6 underwater gliders. The Washington line comprises 6 moorings and 6 gliders. Each mooring is outfitted with a variety of instrument packages. The raw data from these instruments are sent to shore via satellite communication and in some cases, via fiber optic cable. Raw data is then sent to the cyberinfrastructure (CI) group at Rutgers where it is aggregated, parsed into thousands of different data streams, and integrated into a software package called uFrame. The OOI CI delivers the data to the general public via a web interface that outputs data into commonly used scientific data file formats such as JSON, netCDF, and CSV. The Rutgers data management team has developed a series of command-line Python tools that streamline data acquisition in order to facilitate the QA/QC review process. The first step in the process is querying the uFrame database for a list of all available platforms. From this list, a user can choose a specific platform and automatically download all available datasets from the specified platform. The downloaded dataset is plotted using a generalized Python netcdf plotting routine that utilizes a data visualization toolbox called matplotlib. This routine loads each netCDF file separately and outputs plots by each available parameter. These Python tools have been uploaded to a Github repository that is openly available to help facilitate OOI data access and visualization.
Omicseq: a web-based search engine for exploring omics datasets.
Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S
2017-07-03
The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
NASA Astrophysics Data System (ADS)
Hazelaar, Colien; Dahele, Max; Mostafavi, Hassan; van der Weide, Lineke; Slotman, Ben; Verbakel, Wilko
2018-06-01
Lung tumors treated in breath-hold are subject to inter- and intra-breath-hold variations, which makes tumor position monitoring during each breath-hold important. A markerless technique is desirable, but limited tumor visibility on kV images makes this challenging. We evaluated if template matching + triangulation of kV projection images acquired during breath-hold stereotactic treatments could determine 3D tumor position. Band-pass filtering and/or digital tomosynthesis (DTS) were used as image pre-filtering/enhancement techniques. On-board kV images continuously acquired during volumetric modulated arc irradiation of (i) a 3D-printed anthropomorphic thorax phantom with three lung tumors (n = 6 stationary datasets, n = 2 gradually moving), and (ii) four patients (13 datasets) were analyzed. 2D reference templates (filtered DRRs) were created from planning CT data. Normalized cross-correlation was used for 2D matching between templates and pre-filtered/enhanced kV images. For 3D verification, each registration was triangulated with multiple previous registrations. Generally applicable image processing/algorithm settings for lung tumors in breath-hold were identified. For the stationary phantom, the interquartile range of the 3D position vector was on average 0.25 mm for 12° DTS + band-pass filtering (average detected positions in 2D = 99.7%, 3D = 96.1%, and 3D excluding first 12° due to triangulation angle = 99.9%) compared to 0.81 mm for band-pass filtering only (55.8/52.9/55.0%). For the moving phantom, RMS errors for the lateral/longitudinal/vertical direction after 12° DTS + band-pass filtering were 1.5/0.4/1.1 mm and 2.2/0.3/3.2 mm. For the clinical data, 2D position was determined for at least 93% of each dataset and 3D position excluding first 12° for at least 82% of each dataset using 12° DTS + band-pass filtering. Template matching + triangulation using DTS + band-pass filtered images could accurately determine the position of stationary lung tumors. However, triangulation was less accurate/reliable for targets with continuous, gradual displacement in the lateral and vertical directions. This technique is therefore currently most suited to detect/monitor offsets occurring between initial setup and the start of treatment, inter-breath-hold variations, and tumors with predominantly longitudinal motion.
Two-dimensional turbulence cross-correlation functions in the edge of NSTX
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zweben, S. J.; Stotler, D. P.; Scotti, F.
The 2D radial vs. poloidal cross-correlation functions of edge plasma turbulence were measured near the outer midplane using a gas puff imaging (GPI) diagnostic on NSTX. These correlation functions were evaluated at radii r = 0 cm, ±3 cm, and ±6 cm from the separatrix and poloidal locations p = 0 cm and ±7.5 cm from the GPI poloidal center line for 20 different shots. The ellipticity ε and tilt angle φ of the positive cross-correlation regions and the minimum negative cross-correlation “cmin” and total negative over positive values “neg/pos” were evaluated for each of these cases. The average resultsmore » over this dataset were ε = 2.2 ± 0.9, φ = 87° ± 34° (i.e., poloidally oriented), cmin =-0.30 ± 0.15, and neg/pos = 0.25 ± 0.24. Thus, there was a significant variation in these correlation results within this database, with dependences on the location within the image, the magnetic geometry, and the plasma parameters. In conclusion, possible causes for this variation are discussed, including the misalignment of the GPI view with the local B field line, the magnetic shear of field lines at the edge, the poloidal flow shear of the turbulence, blob-hole correlations, and the neutral density 'shadowing' effect in GPI.« less
Two-dimensional turbulence cross-correlation functions in the edge of NSTX
Zweben, S. J.; Stotler, D. P.; Scotti, F.; ...
2017-09-26
The 2D radial vs. poloidal cross-correlation functions of edge plasma turbulence were measured near the outer midplane using a gas puff imaging (GPI) diagnostic on NSTX. These correlation functions were evaluated at radii r = 0 cm, ±3 cm, and ±6 cm from the separatrix and poloidal locations p = 0 cm and ±7.5 cm from the GPI poloidal center line for 20 different shots. The ellipticity ε and tilt angle φ of the positive cross-correlation regions and the minimum negative cross-correlation “cmin” and total negative over positive values “neg/pos” were evaluated for each of these cases. The average resultsmore » over this dataset were ε = 2.2 ± 0.9, φ = 87° ± 34° (i.e., poloidally oriented), cmin =-0.30 ± 0.15, and neg/pos = 0.25 ± 0.24. Thus, there was a significant variation in these correlation results within this database, with dependences on the location within the image, the magnetic geometry, and the plasma parameters. In conclusion, possible causes for this variation are discussed, including the misalignment of the GPI view with the local B field line, the magnetic shear of field lines at the edge, the poloidal flow shear of the turbulence, blob-hole correlations, and the neutral density 'shadowing' effect in GPI.« less
Human Papillomavirus Genotyping Using an Automated Film-Based Chip Array
Erali, Maria; Pattison, David C.; Wittwer, Carl T.; Petti, Cathy A.
2009-01-01
The INFINITI HPV-QUAD assay is a commercially available genotyping platform for human papillomavirus (HPV) that uses multiplex PCR, followed by automated processing for primer extension, hybridization, and detection. The analytical performance of the HPV-QUAD assay was evaluated using liquid cervical cytology specimens, and the results were compared with those results obtained using the digene High-Risk HPV hc2 Test (HC2). The specimen types included Surepath and PreservCyt transport media, as well as residual SurePath and HC2 transport media from the HC2 assay. The overall concordance of positive and negative results following the resolution of indeterminate and intermediate results was 83% among the 197 specimens tested. HC2 positive (+) and HPV-QUAD negative (−) results were noted in 24 specimens that were shown by real-time PCR and sequence analysis to contain no HPV, HPV types that were cross-reactive in the HC2 assay, or low virus levels. Conversely, HC2 (−) and HPV-QUAD (+) results were noted in four specimens and were subsequently attributed to cross-contamination. The most common HPV types to be identified in this study were HPV16, HPV18, HPV52/58, and HPV39/56. We show that the HPV-QUAD assay is a user friendly, automated system for the identification of distinct HPV genotypes. Based on its analytical performance, future studies with this platform are warranted to assess its clinical utility for HPV detection and genotyping. PMID:19644025
Human papillomavirus genotyping using an automated film-based chip array.
Erali, Maria; Pattison, David C; Wittwer, Carl T; Petti, Cathy A
2009-09-01
The INFINITI HPV-QUAD assay is a commercially available genotyping platform for human papillomavirus (HPV) that uses multiplex PCR, followed by automated processing for primer extension, hybridization, and detection. The analytical performance of the HPV-QUAD assay was evaluated using liquid cervical cytology specimens, and the results were compared with those results obtained using the digene High-Risk HPV hc2 Test (HC2). The specimen types included Surepath and PreservCyt transport media, as well as residual SurePath and HC2 transport media from the HC2 assay. The overall concordance of positive and negative results following the resolution of indeterminate and intermediate results was 83% among the 197 specimens tested. HC2 positive (+) and HPV-QUAD negative (-) results were noted in 24 specimens that were shown by real-time PCR and sequence analysis to contain no HPV, HPV types that were cross-reactive in the HC2 assay, or low virus levels. Conversely, HC2 (-) and HPV-QUAD (+) results were noted in four specimens and were subsequently attributed to cross-contamination. The most common HPV types to be identified in this study were HPV16, HPV18, HPV52/58, and HPV39/56. We show that the HPV-QUAD assay is a user friendly, automated system for the identification of distinct HPV genotypes. Based on its analytical performance, future studies with this platform are warranted to assess its clinical utility for HPV detection and genotyping.
NASA Astrophysics Data System (ADS)
Qin, M.; Wan, X.; Shao, Y. Y.; Li, S. Y.
2018-04-01
Vision-based navigation has become an attractive solution for autonomous navigation for planetary exploration. This paper presents our work of designing and building an autonomous vision-based GPS-denied unmanned vehicle and developing an ARFM (Adaptive Robust Feature Matching) based VO (Visual Odometry) software for its autonomous navigation. The hardware system is mainly composed of binocular stereo camera, a pan-and tilt, a master machine, a tracked chassis. And the ARFM-based VO software system contains four modules: camera calibration, ARFM-based 3D reconstruction, position and attitude calculation, BA (Bundle Adjustment) modules. Two VO experiments were carried out using both outdoor images from open dataset and indoor images captured by our vehicle, the results demonstrate that our vision-based unmanned vehicle is able to achieve autonomous localization and has the potential for future planetary exploration.
Enhanced zinc consumption causes memory deficits and increased brain levels of zinc
Flinn, J.M.; Hunter, D.; Linkous, D.H.; Lanzirotti, A.; Smith, L.N.; Brightwell, J.; Jones, B.F.
2005-01-01
Zinc deficiency has been shown to impair cognitive functioning, but little work has been done on the effects of elevated zinc. This research examined the effect on memory of raising Sprague-Dawley rats on enhanced levels of zinc (10 ppm ZnCO3; 0.153 mM) in the drinking water for periods of 3 or 9 months, both pre- and postnatally. Controls were raised on lab water. Memory was tested in a series of Morris Water Maze (MWM) experiments, and zinc-treated rats were found to have impairments in both reference and working memory. They were significantly slower to find a stationary platform and showed greater thigmotaxicity, a measure of anxiety. On a working memory task, where the platform was moved each day, zinc-treated animals had longer latencies over both trials and days, swam further from the platform, and showed greater thigmotaxicity. On trials using an Atlantis platform, which remained in one place but was lowered on probe trials, the zinc-treated animals had significantly fewer platform crossings, spent less time in the target quadrant, and did not swim as close to the platform position. They had significantly greater latency on nonprobe trials. Microprobe synchrotron X-ray fluorescence (??SXRF) confirmed that brain zinc levels were increased by adding ZnCO 3 to the drinking water. These data show that long-term dietary administration of zinc can lead to impairments in cognitive function. ?? 2004 Elsevier Inc. All rights reserved.
He, Jiankang; Du, Yanan; Guo, Yuqi; Hancock, Matthew J.; Wang, Ben; Shin, Hyeongho; Wu, Jinhui; Li, Dichen; Khademhosseini, Ali
2010-01-01
Combinatorial material synthesis is a powerful approach for creating composite material libraries for the high-throughput screening of cell–material interactions. Although current combinatorial screening platforms have been tremendously successful in identifying target (termed “hit”) materials from composite material libraries, new material synthesis approaches are needed to further optimize the concentrations and blending ratios of the component materials. Here we employed a microfluidic platform to rapidly synthesize composite materials containing cross-gradients of gelatin and chitosan for investigating cell–biomaterial interactions. The microfluidic synthesis of the cross-gradient was optimized experimentally and theoretically to produce quantitatively controllable variations in the concentrations and blending ratios of the two components. The anisotropic chemical compositions of the gelatin/chitosan cross-gradients were characterized by Fourier transform infrared spectrometry and X-ray photoelectron spectrometry. The three-dimensional (3D) porous gelatin/chitosan cross-gradient materials were shown to regulate the cellular morphology and proliferation of smooth muscle cells (SMCs) in a gradient-dependent manner. We envision that our microfluidic cross-gradient platform may accelerate the material development processes involved in a wide range of biomedical applications. PMID:20721897
ESPAS: the European e-science platform to access near-Earth space data (Invited)
NASA Astrophysics Data System (ADS)
Belehaki, A.; Hapgood, M. A.; Ritschel, B.; Manola, N.
2013-12-01
The aim of ESPAS platform is to integrate heterogeneous data from the earth's thermosphere, ionosphere, plasmasphere and magnetosphere. ESPAS supports the systematic exploration of multipoint measurements from the near-Earth space through homogenised access to multi-instrument data. It provides access to more than 40 datasets: Cluster, EISCAT, GIRO, DIAS, SWACI, CHAMP, SuperDARN, FPI, magnetometers INGV, SGO, DTU, IMAGE, TGO, IMAGE/RPI, ACE, SOHO, PROBA2, NOAA/POES, etc. The concept of extensibility to new data sets is an important element in the ESPAS architecture. Within the first year of the project, the main components of the system have been developed, namely, the data model, the XML schemas for metadata exchange format, the ontology, the wrapper installed at the data nodes so that the main platform harvest the metadata, the main platform built on the D-NET framework and the GUI with its designed workflows. The first working prototype supports the search for datasets among a selected number of databases (i.e., EDAM, DIAS, Cluster, SWACI data). The next immediate step would be the implementation of search for characteristics within the datasets. For the second release we are planning to deploy tools for conjunctions between ground-space and space-space and for coincidences. For the final phase of the project the ESPAS infrastructure will be extensively tested through the application of several use cases, designed to serve the needs of the wide interdisciplinary users and producers communities, such as the ionospheric, thermospheric, magnetospheric, space weather and space climate communities, the geophysics community, the space communications engineering, HF users, satellite operators, navigation and surveillance systems, and space agencies. The final ESPAS platform is expected to be delivered in 2015. The abstract is submitted on behalf of the ESPAS-FP7EU team (http://www.espas-fp7.eu): Mike Hapgood, Anna Belehaki, Spiros Ventouras, Natalia Manola, Antonis Lebesis, Bruno Zolesi, Tatjana Gerzen, Ingemar Häggström, Anna Charisi, Ivan Galkin, Jurgen Watermann, Matthew Angling, Timo Asikainen, Alan Aylward, Henrike Barkmann, Peter Bergqvist, Andrew Bushell, Fabien Darrouzet, Dimitris Dialetis, Carl-Fredrik Enell, Daniel Heynderickx, Norbert Jakowski, Magnar Johnsen, Jean Lilensten, Ian McCrea, Kalevi Mursula, Bogdan Nicula, Michael Pezzopane, Viviane Pierrard, Bodo Reinisch, Bernd Ritschel, Luca Spogli, Iwona Stanislawska, Claudia Stolle, Eija Tanskanen, Ioanna Tsagouri, Esa Turunen, Thomas Ulich, Matthew Wild, Tim Yeoman
A comparative analysis of dynamic grids vs. virtual grids using the A3pviGrid framework.
Shankaranarayanan, Avinas; Amaldas, Christine
2010-11-01
With the proliferation of Quad/Multi-core micro-processors in mainstream platforms such as desktops and workstations; a large number of unused CPU cycles can be utilized for running virtual machines (VMs) as dynamic nodes in distributed environments. Grid services and its service oriented business broker now termed cloud computing could deploy image based virtualization platforms enabling agent based resource management and dynamic fault management. In this paper we present an efficient way of utilizing heterogeneous virtual machines on idle desktops as an environment for consumption of high performance grid services. Spurious and exponential increases in the size of the datasets are constant concerns in medical and pharmaceutical industries due to the constant discovery and publication of large sequence databases. Traditional algorithms are not modeled at handing large data sizes under sudden and dynamic changes in the execution environment as previously discussed. This research was undertaken to compare our previous results with running the same test dataset with that of a virtual Grid platform using virtual machines (Virtualization). The implemented architecture, A3pviGrid utilizes game theoretic optimization and agent based team formation (Coalition) algorithms to improve upon scalability with respect to team formation. Due to the dynamic nature of distributed systems (as discussed in our previous work) all interactions were made local within a team transparently. This paper is a proof of concept of an experimental mini-Grid test-bed compared to running the platform on local virtual machines on a local test cluster. This was done to give every agent its own execution platform enabling anonymity and better control of the dynamic environmental parameters. We also analyze performance and scalability of Blast in a multiple virtual node setup and present our findings. This paper is an extension of our previous research on improving the BLAST application framework using dynamic Grids on virtualization platforms such as the virtual box.
Cross-Dataset Analysis and Visualization Driven by Expressive Web Services
NASA Astrophysics Data System (ADS)
Alexandru Dumitru, Mircea; Catalin Merticariu, Vlad
2015-04-01
The deluge of data that is hitting us every day from satellite and airborne sensors is changing the workflow of environmental data analysts and modelers. Web geo-services play now a fundamental role, and are no longer needed to preliminary download and store the data, but rather they interact in real-time with GIS applications. Due to the very large amount of data that is curated and made available by web services, it is crucial to deploy smart solutions for optimizing network bandwidth, reducing duplication of data and moving the processing closer to the data. In this context we have created a visualization application for analysis and cross-comparison of aerosol optical thickness datasets. The application aims to help researchers identify and visualize discrepancies between datasets coming from various sources, having different spatial and time resolutions. It also acts as a proof of concept for integration of OGC Web Services under a user-friendly interface that provides beautiful visualizations of the explored data. The tool was built on top of the World Wind engine, a Java based virtual globe built by NASA and the open source community. For data retrieval and processing we exploited the OGC Web Coverage Service potential: the most exciting aspect being its processing extension, a.k.a. the OGC Web Coverage Processing Service (WCPS) standard. A WCPS-compliant service allows a client to execute a processing query on any coverage offered by the server. By exploiting a full grammar, several different kinds of information can be retrieved from one or more datasets together: scalar condensers, cross-sectional profiles, comparison maps and plots, etc. This combination of technology made the application versatile and portable. As the processing is done on the server-side, we ensured that the minimal amount of data is transferred and that the processing is done on a fully-capable server, leaving the client hardware resources to be used for rendering the visualization. The application offers a set of features to visualize and cross-compare the datasets. Users can select a region of interest in space and time on which an aerosol map layer is plotted. Hovmoeller time-latitude and time-longitude profiles can be displayed by selecting orthogonal cross-sections on the globe. Statistics about the selected dataset are also displayed in different text and plot formats. The datasets can also be cross-compared either by using the delta map tool or the merged map tool. For more advanced users, a WCPS query console is also offered allowing users to process their data with ad-hoc queries and then choose how to display the results. Overall, the user has a rich set of tools that can be used to visualize and cross-compare the aerosol datasets. With our application we have shown how the NASA WorldWind framework can be used to display results processed efficiently - and entirely - on the server side using the expressiveness of the OGC WCPS web-service. The application serves not only as a proof of concept of a new paradigm in working with large geospatial data but also as an useful tool for environmental data analysts.
Software Tools for Development on the Peregrine System | High-Performance
Computing | NREL Software Tools for Development on the Peregrine System Software Tools for and manage software at the source code level. Cross-Platform Make and SCons The "Cross-Platform Make" (CMake) package is from Kitware, and SCons is a modern software build tool based on Python
Frickmann, Hagen; Hinz, Rebecca; Rojak, Sandra; Bonow, Insa; Ruben, Stefanie; Wegner, Christine; Zielke, Iris; Hagen, Ralf Matthias; Tannich, Egbert
2018-05-12
We assessed a commercial loop-mediated amplification (LAMP) platform for its reliability as a screening tool for malaria parasite detection. A total of 1000 blood samples from patients with suspected or confirmed malaria submitted to the German National Reference Center for Tropical Pathogens were subjected to LAMP using the Meridian illumigene Malaria platform. Results were compared with microscopy from thick and thin blood films in all cases. In case of discordant results between LAMP and microscopy (n = 60), confirmation testing was performed with real-time PCR. Persistence of circulating parasite DNA was analyzed by serial assessments of blood samples following malaria treatment. Out of 1000 blood samples analyzed, 238 were positive for malaria parasites according to microscopy (n = 181/1000) or PCR (additional n = 57/60). LAMP demonstrated sensitivity of 98.7% (235/238), specificity of 99.6% (759/762), positive predictive value (PPV) of 98.7% (235/238) and negative predictive value (NPV) of 99.6% (759/762), respectively. For first slides of patients with malaria and for follow-up slides, sensitivity values were 99.1% (106/107) and 98.5% (129/131), respectively. The performance of the Meridian illumigene Malaria platform is suitable for initial screening of patients suspected of clinical malaria. Copyright © 2018 Elsevier Ltd. All rights reserved.
Earth Science Data for a Mobile Age
NASA Astrophysics Data System (ADS)
Oostra, D.; Chambers, L. H.; Lewis, P. M.; Baize, R.; Oots, P.; Rogerson, T.; Crecelius, S.; Coleman, T.
2012-12-01
Earth science data access needs to be interoperable and automatic. Recently, increasingly savvy data users combined with more complex web and mobile applications have placed increasing demands on how this Earth science data is being delivered to educators and students. The MY NASA DATA (MND) and S'COOL projects are developing a strategy to interact with the education community in the age of mobile devices and platforms. How can we provide data and meaningful scientific experiences to educational users through mobile technologies? This initiative will seek out existing technologies and stakeholders within the Earth Science community to identify datasets that are relevant and appropriate for mobile application development and use by the educational community. Targeting efforts within the educational community will give the project a better understanding of the previous attempts at data/mobile application use in the classroom and its problems. In addition, we will query developers and data providers on what successes and failures they've experienced in trying to provide data for applications designed on mobile platforms. This feedback will be implemented in new websites, applications and lessons that will provide authentic scientific experiences for students and end users. We want to create tools that help sort through the vast amounts of NASA data, and deliver it to users automatically. NASA provides millions of gigabytes of data that is publicly available through a large number of services spread across the World Wide Web. Accessing and navigating this data can be time consuming and problematic with variety of file types and methods for accessing this data. The MND project, through its' Live Access Server system, provides selected datasets that are relevant and targets National Standards of Learning for educators to easily integrate into existing curricula. In the future, we want to provide desired data to users with automatic updates, anticipate future data queries/needs and generate new data combinations--targeting users with a web 3.0 methodology. We will examine applications that give users direct access to data in near real-time and find solutions for the educational community. MND and S'COOL will identify trends in the mobile and web application sectors to provide the greatest effect upon relevant audiences within the science and educational communities. Greater access is the goal, with an acute focus on educating our future explorers and scientists with tools and data that will provide the most efficacy, use, and enriching science experiences. Current trends point to cross-platform web applications as being the most effective and efficient means of delivering content, data, and ideas to end users. Universal availability of key datasets on any device will encourage users to continue to use data and attract potential data users and providers. Projected Outcomes Initially, the outcome for this work is to increase the effectiveness of the MND and S'COOL projects by learning more about our users needs and anticipating how data will be used in the future. Through our work we will increase exposure and ease of access to NASA datasets relevant to our communities. Our goal is to focus on our participants mobile usage in the classroom, thereby gaining a greater understanding on how data is being used to teach students about the Earth and begin to develop better tools and technologies.
Nind, Thomas; Galloway, James; McAllister, Gordon; Scobbie, Donald; Bonney, Wilfred; Hall, Christopher; Tramma, Leandro; Reel, Parminder; Groves, Martin; Appleby, Philip; Doney, Alex; Guthrie, Bruce; Jefferson, Emily
2018-05-22
The Health Informatics Centre (HIC) at the University of Dundee provides a service to securely host clinical datasets and extract relevant data for anonymised cohorts to researchers to enable them to answer key research questions. As is common in research using routine healthcare data, the service was historically delivered using ad-hoc processes resulting in the slow provision of data whose provenance was often hidden to the researchers using it. This paper describes the development and evaluation of the Research Data Management Platform (RDMP): an open source tool to load, manage, clean, and curate longitudinal healthcare data for research and provide reproducible and updateable datasets for defined cohorts to researchers. Between 2013 and 2017, RDMP tool implementation tripled the productivity of Data Analysts producing data releases for researchers from 7.1 to 25.3 per month; and reduced the error rate from 12.7% to 3.1%. The effort on data management reduced from a mean of 24.6 to 3.0 hours per data release. The waiting time for researchers to receive data after agreeing a specification reduced from approximately 6 months to less than one week. The software is scalable and currently manages 163 datasets. 1,321 data extracts for research have been produced with the largest extract linking data from 70 different datasets. The tools and processes that encompass the RDMP not only fulfil the research data management requirements of researchers but also support the seamless collaboration of data cleaning, data transformation, data summarisation and data quality assessment activities by different research groups.
Parasol: An Architecture for Cross-Cloud Federated Graph Querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa
2014-06-22
Large scale data fusion of multiple datasets can often provide in- sights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To ad- dress these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments onmore » a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.« less
NASA Astrophysics Data System (ADS)
Minnett, R.; Jarboe, N.; Koppers, A. A.; Tauxe, L.; Constable, C.
2013-12-01
EarthRef.org is a geoscience umbrella website for several databases and data and model repository portals. These portals, unified in the mandate to preserve their respective data and promote scientific collaboration in their fields, are also disparate in their schemata. The Magnetics Information Consortium (http://earthref.org/MagIC/) is a grass-roots cyberinfrastructure effort envisioned by the paleo- and rock magnetic scientific community to archive their wealth of peer-reviewed raw data and interpretations from studies on natural and synthetic samples and relies on a partially strict subsumptive hierarchical data model. The Geochemical Earth Reference Model (http://earthref.org/GERM/) portal focuses on the chemical characterization of the Earth and relies on two data schemata: a repository of peer-reviewed reservoir geochemistry, and a database of partition coefficients for rocks, minerals, and elements. The Seamount Biogeosciences Network (http://earthref.org/SBN/) encourages the collaboration between the diverse disciplines involved in seamount research and includes the Seamount Catalog (http://earthref.org/SC/) of bathymetry and morphology. All of these portals also depend on the EarthRef Reference Database (http://earthref.org/ERR/) for publication reference metadata and the EarthRef Digital Archive (http://earthref.org/ERDA/), a generic repository of data objects and their metadata. The development of the new MagIC Search Interface (http://earthref.org/MagIC/search/) centers on a reusable platform designed to be flexible enough for largely heterogeneous datasets and to scale up to datasets with tens of millions of records. The HTML5 web application and Oracle 11g database residing at the San Diego Supercomputer Center (SDSC) support the online contribution and editing of complex datasets in a spreadsheet environment and the browsing and filtering of these contributions in the context of thousands of other datasets. EarthRef.org is in the process of implementing this platform across all of its data portals in spite of the wide variety of data schemata and is dedicated to serving the geoscience community with as little effort from the end-users as possible.
Zhang, Yiye; Padman, Rema
2017-01-01
Patients with multiple chronic conditions (MCC) pose an increasingly complex health management challenge worldwide, particularly due to the significant gap in our understanding of how to provide coordinated care. Drawing on our prior research on learning data-driven clinical pathways from actual practice data, this paper describes a prototype, interactive platform for visualizing the pathways of MCC to support shared decision making. Created using Python web framework, JavaScript library and our clinical pathway learning algorithm, the visualization platform allows clinicians and patients to learn the dominant patterns of co-progression of multiple clinical events from their own data, and interactively explore and interpret the pathways. We demonstrate functionalities of the platform using a cluster of 36 patients, identified from a dataset of 1,084 patients, who are diagnosed with at least chronic kidney disease, hypertension, and diabetes. Future evaluation studies will explore the use of this platform to better understand and manage MCC.
Berthon, Beatrice; Spezi, Emiliano; Galavis, Paulina; Shepherd, Tony; Apte, Aditya; Hatt, Mathieu; Fayad, Hadi; De Bernardi, Elisabetta; Soffientini, Chiara D; Ross Schmidtlein, C; El Naqa, Issam; Jeraj, Robert; Lu, Wei; Das, Shiva; Zaidi, Habib; Mawlawi, Osama R; Visvikis, Dimitris; Lee, John A; Kirov, Assen S
2017-08-01
The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed in a way that it could be extendable by inclusion of bespoke segmentation methods, while maintaining its main purpose of being a standard testing platform for newly developed PET-AS methods. An example of implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation template, and CT scans was built into the PETASset application database. Specific metrics such as Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S), were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built. The variation of the metric agreement values with the reference contours across the PET-AS methods evaluated for demonstration were between 0.51 and 0.83, 0.44 and 0.86, and 0.61 and 1.00 for DSC, PPV, and the S metric, respectively. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state-of-the art. PETASset provides a platform that allows standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and contribute with more evaluation datasets. © 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
The biodigital human: a web-based 3D platform for medical visualization and education.
Qualter, John; Sculli, Frank; Oliker, Aaron; Napier, Zachary; Lee, Sabrina; Garcia, Julio; Frenkel, Sally; Harnik, Victoria; Triola, Marc
2012-01-01
NYU School of Medicine's Division of Educational Informatics in collaboration with BioDigital Systems LLC (New York, NY) has created a virtual human body dataset that is being used for visualization, education and training and is accessible over modern web browsers.
Net Venn - An integrated network analysis web platform for gene lists
USDA-ARS?s Scientific Manuscript database
Many lists containing biological identifiers such as gene lists have been generated in various genomics projects. Identifying the overlap among gene lists can enable us to understand the similarities and differences between the datasets. Here, we present an interactome network-based web application...
Task and Participant Scheduling of Trading Platforms in Vehicular Participatory Sensing Networks
Shi, Heyuan; Song, Xiaoyu; Gu, Ming; Sun, Jiaguang
2016-01-01
The vehicular participatory sensing network (VPSN) is now becoming more and more prevalent, and additionally has shown its great potential in various applications. A general VPSN consists of many tasks from task, publishers, trading platforms and a crowd of participants. Some literature treats publishers and the trading platform as a whole, which is impractical since they are two independent economic entities with respective purposes. For a trading platform in markets, its purpose is to maximize the profit by selecting tasks and recruiting participants who satisfy the requirements of accepted tasks, rather than to improve the quality of each task. This scheduling problem for a trading platform consists of two parts: which tasks should be selected and which participants to be recruited? In this paper, we investigate the scheduling problem in vehicular participatory sensing with the predictable mobility of each vehicle. A genetic-based trading scheduling algorithm (GTSA) is proposed to solve the scheduling problem. Experiments with a realistic dataset of taxi trajectories demonstrate that GTSA algorithm is efficient for trading platforms to gain considerable profit in VPSN. PMID:27916807
Task and Participant Scheduling of Trading Platforms in Vehicular Participatory Sensing Networks.
Shi, Heyuan; Song, Xiaoyu; Gu, Ming; Sun, Jiaguang
2016-11-28
The vehicular participatory sensing network (VPSN) is now becoming more and more prevalent, and additionally has shown its great potential in various applications. A general VPSN consists of many tasks from task, publishers, trading platforms and a crowd of participants. Some literature treats publishers and the trading platform as a whole, which is impractical since they are two independent economic entities with respective purposes. For a trading platform in markets, its purpose is to maximize the profit by selecting tasks and recruiting participants who satisfy the requirements of accepted tasks, rather than to improve the quality of each task. This scheduling problem for a trading platform consists of two parts: which tasks should be selected and which participants to be recruited? In this paper, we investigate the scheduling problem in vehicular participatory sensing with the predictable mobility of each vehicle. A genetic-based trading scheduling algorithm (GTSA) is proposed to solve the scheduling problem. Experiments with a realistic dataset of taxi trajectories demonstrate that GTSA algorithm is efficient for trading platforms to gain considerable profit in VPSN.
A robust prognostic signature for hormone-positive node-negative breast cancer.
Griffith, Obi L; Pepin, François; Enache, Oana M; Heiser, Laura M; Collisson, Eric A; Spellman, Paul T; Gray, Joe W
2013-01-01
Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment.
A robust prognostic signature for hormone-positive node-negative breast cancer
2013-01-01
Background Systemic chemotherapy in the adjuvant setting can cure breast cancer in some patients that would otherwise recur with incurable, metastatic disease. However, since only a fraction of patients would have recurrence after surgery alone, the challenge is to stratify high-risk patients (who stand to benefit from systemic chemotherapy) from low-risk patients (who can safely be spared treatment related toxicities and costs). Methods We focus here on risk stratification in node-negative, ER-positive, HER2-negative breast cancer. We use a large database of publicly available microarray datasets to build a random forests classifier and develop a robust multi-gene mRNA transcription-based predictor of relapse free survival at 10 years, which we call the Random Forests Relapse Score (RFRS). Performance was assessed by internal cross-validation, multiple independent data sets, and comparison to existing algorithms using receiver-operating characteristic and Kaplan-Meier survival analysis. Internal redundancy of features was determined using k-means clustering to define optimal signatures with smaller numbers of primary genes, each with multiple alternates. Results Internal OOB cross-validation for the initial (full-gene-set) model on training data reported an ROC AUC of 0.704, which was comparable to or better than those reported previously or obtained by applying existing methods to our dataset. Three risk groups with probability cutoffs for low, intermediate, and high-risk were defined. Survival analysis determined a highly significant difference in relapse rate between these risk groups. Validation of the models against independent test datasets showed highly similar results. Smaller 17-gene and 8-gene optimized models were also developed with minimal reduction in performance. Furthermore, the signature was shown to be almost equally effective on both hormone-treated and untreated patients. Conclusions RFRS allows flexibility in both the number and identity of genes utilized from thousands to as few as 17 or eight genes, each with multiple alternatives. The RFRS reports a probability score strongly correlated with risk of relapse. This score could therefore be used to assign systemic chemotherapy specifically to those high-risk patients most likely to benefit from further treatment. PMID:24112773
Impact of missing data on the efficiency of homogenisation: experiments with ACMANTv3
NASA Astrophysics Data System (ADS)
Domonkos, Peter; Coll, John
2018-04-01
The impact of missing data on the efficiency of homogenisation with ACMANTv3 is examined with simulated monthly surface air temperature test datasets. The homogeneous database is derived from an earlier benchmarking of daily temperature data in the USA, and then outliers and inhomogeneities (IHs) are randomly inserted into the time series. Three inhomogeneous datasets are generated and used, one with relatively few and small IHs, another one with IHs of medium frequency and size, and a third one with large and frequent IHs. All of the inserted IHs are changes to the means. Most of the IHs are single sudden shifts or pair of shifts resulting in platform-shaped biases. Each test dataset consists of 158 time series of 100 years length, and their mean spatial correlation is 0.68-0.88. For examining the impacts of missing data, seven experiments are performed, in which 18 series are left complete, while variable quantities (10-70%) of the data of the other 140 series are removed. The results show that data gaps have a greater impact on the monthly root mean squared error (RMSE) than the annual RMSE and trend bias. When data with a large ratio of gaps is homogenised, the reduction of the upper 5% of the monthly RMSE is the least successful, but even there, the efficiency remains positive. In terms of reducing the annual RMSE and trend bias, the efficiency is 54-91%. The inclusion of short and incomplete series with sufficient spatial correlation in all cases improves the efficiency of homogenisation with ACMANTv3.
sbtools: A package connecting R to cloud-based data for collaborative online research
Winslow, Luke; Chamberlain, Scott; Appling, Alison P.; Read, Jordan S.
2016-01-01
The adoption of high-quality tools for collaboration and reproducible research such as R and Github is becoming more common in many research fields. While Github and other version management systems are excellent resources, they were originally designed to handle code and scale poorly to large text-based or binary datasets. A number of scientific data repositories are coming online and are often focused on dataset archival and publication. To handle collaborative workflows using large scientific datasets, there is increasing need to connect cloud-based online data storage to R. In this article, we describe how the new R package sbtools enables direct access to the advanced online data functionality provided by ScienceBase, the U.S. Geological Survey’s online scientific data storage platform.
Zheng, G; Tannast, M; Anderegg, C; Siebenrock, K A; Langlotz, F
2007-07-01
We developed an object-oriented cross-platform program to perform three-dimensional (3D) analysis of hip joint morphology using two-dimensional (2D) anteroposterior (AP) pelvic radiographs. Landmarks extracted from 2D AP pelvic radiographs and optionally an additional lateral pelvic X-ray were combined with a cone beam projection model to reconstruct 3D hip joints. Since individual pelvic orientation can vary considerably, a method for standardizing pelvic orientation was implemented to determine the absolute tilt/rotation. The evaluation of anatomically morphologic differences was achieved by reconstructing the projected acetabular rim and the measured hip parameters as if obtained in a standardized neutral orientation. The program had been successfully used to interactively objectify acetabular version in hips with femoro-acetabular impingement or developmental dysplasia. Hip(2)Norm is written in object-oriented programming language C++ using cross-platform software Qt (TrollTech, Oslo, Norway) for graphical user interface (GUI) and is transportable to any platform.
Daniel, Christel; Ouagne, David; Sadou, Eric; Forsberg, Kerstin; Gilchrist, Mark Mc; Zapletal, Eric; Paris, Nicolas; Hussain, Sajjad; Jaulent, Marie-Christine; MD, Dipka Kalra
2016-01-01
With the development of platforms enabling the use of routinely collected clinical data in the context of international clinical research, scalable solutions for cross border semantic interoperability need to be developed. Within the context of the IMI EHR4CR project, we first defined the requirements and evaluation criteria of the EHR4CR semantic interoperability platform and then developed the semantic resources and supportive services and tooling to assist hospital sites in standardizing their data for allowing the execution of the project use cases. The experience gained from the evaluation of the EHR4CR platform accessing to semantically equivalent data elements across 11 European participating EHR systems from 5 countries demonstrated how far the mediation model and mapping efforts met the expected requirements of the project. Developers of semantic interoperability platforms are beginning to address a core set of requirements in order to reach the goal of developing cross border semantic integration of data. PMID:27570649
A Set of Free Cross-Platform Authoring Programs for Flexible Web-Based CALL Exercises
ERIC Educational Resources Information Center
O'Brien, Myles
2012-01-01
The Mango Suite is a set of three freely downloadable cross-platform authoring programs for flexible network-based CALL exercises. They are Adobe Air applications, so they can be used on Windows, Macintosh, or Linux computers, provided the freely-available Adobe Air has been installed on the computer. The exercises which the programs generate are…
NASA Astrophysics Data System (ADS)
Rettmann, M. E.; Holmes, D. R., III; Gunawan, M. S.; Ge, X.; Karwoski, R. A.; Breen, J. F.; Packer, D. L.; Robb, R. A.
2012-03-01
Geometric analysis of the left atrium and pulmonary veins is important for studying reverse structural remodeling following cardiac ablation therapy. It has been shown that the left atrium decreases in volume and the pulmonary vein ostia decrease in diameter following ablation therapy. Most analysis techniques, however, require laborious manual tracing of image cross-sections. Pulmonary vein diameters are typically measured at the junction between the left atrium and pulmonary veins, called the pulmonary vein ostia, with manually drawn lines on volume renderings or on image cross-sections. In this work, we describe a technique for making semi-automatic measurements of the left atrium and pulmonary vein ostial diameters from high resolution CT scans and multi-phase datasets. The left atrium and pulmonary veins are segmented from a CT volume using a 3D volume approach and cut planes are interactively positioned to separate the pulmonary veins from the body of the left atrium. The cut plane is also used to compute the pulmonary vein ostial diameter. Validation experiments are presented which demonstrate the ability to repeatedly measure left atrial volume and pulmonary vein diameters from high resolution CT scans, as well as the feasibility of this approach for analyzing dynamic, multi-phase datasets. In the high resolution CT scans the left atrial volume measurements show high repeatability with approximately 4% intra-rater repeatability and 8% inter-rater repeatability. Intra- and inter-rater repeatability for pulmonary vein diameter measurements range from approximately 2 to 4 mm. For the multi-phase CT datasets, differences in left atrial volumes between a standard slice-by-slice approach and the proposed 3D volume approach are small, with percent differences on the order of 3% to 6%.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.
Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A
2018-04-24
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Boubela, Roland N.; Kalcher, Klaudius; Huf, Wolfgang; Našel, Christian; Moser, Ewald
2016-01-01
Technologies for scalable analysis of very large datasets have emerged in the domain of internet computing, but are still rarely used in neuroimaging despite the existence of data and research questions in need of efficient computation tools especially in fMRI. In this work, we present software tools for the application of Apache Spark and Graphics Processing Units (GPUs) to neuroimaging datasets, in particular providing distributed file input for 4D NIfTI fMRI datasets in Scala for use in an Apache Spark environment. Examples for using this Big Data platform in graph analysis of fMRI datasets are shown to illustrate how processing pipelines employing it can be developed. With more tools for the convenient integration of neuroimaging file formats and typical processing steps, big data technologies could find wider endorsement in the community, leading to a range of potentially useful applications especially in view of the current collaborative creation of a wealth of large data repositories including thousands of individual fMRI datasets. PMID:26778951
DATS, the data tag suite to enable discoverability of datasets.
Sansone, Susanna-Assunta; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Alter, George; Grethe, Jeffrey S; Xu, Hua; Fore, Ian M; Lyle, Jared; Gururaj, Anupama E; Chen, Xiaoling; Kim, Hyeon-Eui; Zong, Nansu; Li, Yueling; Liu, Ruiling; Ozyurt, I Burak; Ohno-Machado, Lucila
2017-06-06
Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola
2015-01-01
New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Begoli, Edmon; Dunning, Ted; Charlie, Frasure
We present a service platform for schema-leess exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in the common, heterogeneously structured healthcare data sets and supporting components (electronic health records, practice management systems, etc.) The motivating use cases described in the paper are clinical trials candidate discovery, and a treatment effectiveness analysis. Following the use cases, we discuss the key features and software architecture of the platform, the underlying core components (Apache Parquet,more » Drill, the web services server), and the runtime profiles and performance characteristics of the platform. We conclude by showing dramatic speedup with some approaches, and the performance tradeoffs and limitations of others.« less
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua
2018-03-01
More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
Apparatus for unloading nuclear fuel pellets from a sintering boat
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bucher, G.D.; Raymond, T.E.
1987-02-10
An apparatus is described for unloading nuclear fuel pellets from a loaded sintering boat having an open top, comprising: (a) means for receiving the boat in an upright position with the pellets contained therein, the boat receiving means including a platform for supporting the loaded boat in the upright position, the boat supporting platform having first and second portions; (b) means for clamping the boat including a pair of plates disposed at lateral sides of the boat and being movable in a first direction relative to one another for applying clamping forces to the boat on the platform and inmore » a second direction relative to one another for releasing the clamping forces from the boat. The pair of plates have inner surfaces facing toward one another, the first and second platform portions of the boat supporting platform being mounted to the plates on the respective facing surfaces thereof and disposed in a common plane. One of the plates and one of the platform portions mounted thereto are disposed in a stationary position and the other of the plates and the other of the platform portions mounted thereto are movable relative thereto in the first and second directions for applying and releasing clamping forces to and from the boat while the boat is supported in the upright position by the platform portions; (c) means for transferring the clamped boat from the upright position to an inverted position and then back to the upright position; and (d) means of receiving the pellets from the clamped boat as the boat is being transferred from the upright position to the inverted position.« less
A multilayer network dataset of interaction and influence spreading in a virtual world
NASA Astrophysics Data System (ADS)
Jankowski, Jarosław; Michalski, Radosław; Bródka, Piotr
2017-10-01
Presented data contains the record of five spreading campaigns that occurred in a virtual world platform. Users distributed avatars between each other during the campaigns. The processes varied in time and range and were either incentivized or not incentivized. Campaign data is accompanied by events. The data can be used to build a multilayer network to place the campaigns in a wider context. To the best of the authors' knowledge, the study is the first publicly available dataset containing a complete real multilayer social network together, along with five complete spreading processes in it.
3D/2D model-to-image registration by imitation learning for cardiac procedures.
Toth, Daniel; Miao, Shun; Kurzendorfer, Tanja; Rinaldi, Christopher A; Liao, Rui; Mansi, Tommaso; Rhode, Kawal; Mountney, Peter
2018-05-12
In cardiac interventions, such as cardiac resynchronization therapy (CRT), image guidance can be enhanced by involving preoperative models. Multimodality 3D/2D registration for image guidance, however, remains a significant research challenge for fundamentally different image data, i.e., MR to X-ray. Registration methods must account for differences in intensity, contrast levels, resolution, dimensionality, field of view. Furthermore, same anatomical structures may not be visible in both modalities. Current approaches have focused on developing modality-specific solutions for individual clinical use cases, by introducing constraints, or identifying cross-modality information manually. Machine learning approaches have the potential to create more general registration platforms. However, training image to image methods would require large multimodal datasets and ground truth for each target application. This paper proposes a model-to-image registration approach instead, because it is common in image-guided interventions to create anatomical models for diagnosis, planning or guidance prior to procedures. An imitation learning-based method, trained on 702 datasets, is used to register preoperative models to intraoperative X-ray images. Accuracy is demonstrated on cardiac models and artificial X-rays generated from CTs. The registration error was [Formula: see text] on 1000 test cases, superior to that of manual ([Formula: see text]) and gradient-based ([Formula: see text]) registration. High robustness is shown in 19 clinical CRT cases. Besides the proposed methods feasibility in a clinical environment, evaluation has shown good accuracy and high robustness indicating that it could be applied in image-guided interventions.
Architecture for the Interdisciplinary Earth Data Alliance
NASA Astrophysics Data System (ADS)
Richard, S. M.
2016-12-01
The Interdisciplinary Earth Data Alliance (IEDA) is leading an EarthCube (EC) Integrative Activity to develop a governance structure and technology framework that enables partner data systems to share technology, infrastructure, and practice for documenting, curating, and accessing heterogeneous geoscience data. The IEDA data facility provides capabilities in an extensible framework that enables domain-specific requirements for each partner system in the Alliance to be integrated into standardized cross-domain workflows. The shared technology infrastructure includes a data submission hub, a domain-agnostic file-based repository, an integrated Alliance catalog and a Data Browser for data discovery across all partner holdings, as well as services for registering identifiers for datasets (DOI) and samples (IGSN). The submission hub will be a platform that facilitates acquisition of cross-domain resource documentation and channels users into domain and resource-specific workflows tailored for each partner community. We are exploring an event-based message bus architecture with a standardized plug-in interface for adding capabilities. This architecture builds on the EC CINERGI metadata pipeline as well as the message-based architecture of the SEAD project. Plug-in components for file introspection to match entities to a data type registry (extending EC Digital Crust and Research Data Alliance work), extract standardized keywords (using CINERGI components), location, cruise, personnel and other metadata linkage information (building on GeoLink and existing IEDA partner components). The submission hub will feed submissions to appropriate partner repositories and service endpoints targeted by domain and resource type for distribution. The Alliance governance will adopt patterns (vocabularies, operations, resource types) for self-describing data services using standard HTTP protocol for simplified data access (building on EC GeoWS and other `RESTful' approaches). Exposure of resource descriptions (datasets and service distributions) for harvesting by commercial search engines as well as geoscience-data focused crawlers (like EC B-Cube crawler) will increase discoverability of IEDA resources with minimal effort by curators.
The HydroServer Platform for Sharing Hydrologic Data
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Horsburgh, J. S.; Schreuders, K.; Maidment, D. R.; Zaslavsky, I.; Valentine, D. W.
2010-12-01
The CUAHSI Hydrologic Information System (HIS) is an internet based system that supports sharing of hydrologic data. HIS consists of databases connected using the Internet through Web services, as well as software for data discovery, access, and publication. The HIS system architecture is comprised of servers for publishing and sharing data, a centralized catalog to support cross server data discovery and a desktop client to access and analyze data. This paper focuses on HydroServer, the component developed for sharing and publishing space-time hydrologic datasets. A HydroServer is a computer server that contains a collection of databases, web services, tools, and software applications that allow data producers to store, publish, and manage the data from an experimental watershed or project site. HydroServer is designed to permit publication of data as part of a distributed national/international system, while still locally managing access to the data. We describe the HydroServer architecture and software stack, including tools for managing and publishing time series data for fixed point monitoring sites as well as spatially distributed, GIS datasets that describe a particular study area, watershed, or region. HydroServer adopts a standards based approach to data publication, relying on accepted and emerging standards for data storage and transfer. CUAHSI developed HydroServer code is free with community code development managed through the codeplex open source code repository and development system. There is some reliance on widely used commercial software for general purpose and standard data publication capability. The sharing of data in a common format is one way to stimulate interdisciplinary research and collaboration. It is anticipated that the growing, distributed network of HydroServers will facilitate cross-site comparisons and large scale studies that synthesize information from diverse settings, making the network as a whole greater than the sum of its parts in advancing hydrologic research. Details of the CUAHSI HIS can be found at http://his.cuahsi.org, and HydroServer codeplex site http://hydroserver.codeplex.com.
NASA Astrophysics Data System (ADS)
Zednik, S.
2015-12-01
Recent data publication practices have made increasing amounts of diverse datasets available online for the general research community to explore and integrate. Even with the abundance of data online, relevant data discovery and successful integration is still highly dependent upon the data being published with well-formed and understandable metadata. Tagging a dataset with well-known or controlled community terms is a common mechanism to indicate the intended purpose, subject matter, or other relevant facts of a dataset, however controlled domain terminology can be difficult for cross-domain researchers to interpret and leverage. It is also a challenge for integration portals to successfully provide cross-domain search capabilities over data holdings described using many different controlled vocabularies. Mappings between controlled vocabularies can be challenging because communities frequently develop specialized terminologies and have highly specific and contextual usages of common words. Despite this specificity it is highly desirable to produce cross-domain mappings to support data integration. In this contribution we evaluate the applicability of several data analytic techniques for the purpose of generating mappings between hierarchies of controlled science terms. We hope our efforts initiate more discussion on the topic and encourage future mapping efforts.
NASA Astrophysics Data System (ADS)
Fricke, Katharina; Baschek, Björn; Jenal, Alexander; Kneer, Caspar; Weber, Immanuel; Bongartz, Jens; Wyrwa, Jens; Schöl, Andreas
2016-10-01
This study presents the results from a combined aerial survey performed with a hexacopter and a gyrocopter over a part of the Elbe estuary near Hamburg, Germany. The survey was conducted by the Federal Institute of Hydrology, Germany, and the Fraunhofer Application Center for Multimodal and Airborne Sensors as well as by a contracted engineering company with the aim to acquire spatial thermal infrared (TIR) data of the Hahnöfer Nebenelbe, a branch of the Elbe estuary. Additionally, RGB and NIR data was captured to facilitate the identification of water surfaces and intertidal mudflats. The temperature distribution of the Elbe estuary affects all biological processes and in consequence the oxygen content, which is a key parameter in water quality. The oxygen levels vary in space between the main fairway and side channels. So far, only point measurements are available for monitoring and calibration/validation of water quality models. To better represent this highly dynamic system with a high spatial and temporal variability, tidal streams, heating and cooling, diffusion and mixing processes, spatially distributed data from several points of time within the tidal cycle are necessary. The data acquisition took place during two tidal cycles over two subsequent days in the summer of 2015. While the piloted gyrocopter covered the whole Hahnöfer Nebenelbe seven times, the unmanned hexacopter covered a smaller section of the branch and tidal mudflats with a higher spatial and temporal resolution (16 coverages of the subarea). The gyrocopter data was acquired with a thermal imaging system and processed and georeferenced using the structure from motion algorithm with GPS information from the gyrocopter and optional ground control points. The hexacopter data was referenced based on ground control points and the GPS and position information of the acquisition system. Both datasets from the gyrocopter and the hexacopter are corrected for the effects of the atmosphere and emissivity of the water surface and compared to in situ measurements, taken during the data acquisition. Of particular interest is the effect of the observation angle on the brightness temperature acquired by the wide angle lenses on the platforms, which is up to 40° at the margins of the imagery. Here, both datasets show deviating temperatures, which are probably not due to actual temperature differences. We will discuss the position accuracy achieved over the water areas, the adaptation of atmospheric and emissivity correction to the observation angle and subsequent improvement of the temperature data. With two datasets of the same research area at different resolutions we will investigate the effects of the acquisition platforms, acquisition system and resolutions on the accuracy of the remotely sensed temperatures as well as their ability to represent temperature patterns of tidal currents and mixing processes.
Grégoire, Y; Germain, M; Delage, G
2018-05-01
Since 25 May 2010, all donors at our blood centre who tested false-positive for HIV, HBV, HCV or syphilis are eligible for re-entry after further testing. Donors who have a second false-positive screening test, either during qualification for or after re-entry, are deferred for life. This study reports on factors associated with the occurrence of such deferrals. Rates of second false-positive results were compared by year of deferral, transmissible disease marker, gender, age, donor status (new or repeat) and testing platform (same or different) both at qualification for re-entry and afterwards. Chi-square tests were used to compare proportions. Cox regression was used for multivariate analyses. Participation rates in the re-entry programme were 42·1%: 25·6% failed to qualify for re-entry [different platform: 2·7%; same platform: 42·9% (P < 0·0001)]. After re-entry, rates of deferral for second false-positive results were 8·4% after 3 years [different platform: 1·8%; same platform: 21·4% (P < 0·0001)]. Deferral rates were higher for HIV and HCV than for HBV at qualification when tested on the same platform. The risk, when analysed by multivariate analyses, of a second deferral for a false-positive result, both at qualification and 3 years after re-entry, was lower for donors deferred on a different platform; this risk was higher for HIV, HCV and syphilis than for HBV and for new donors if tested on the same platform. Re-entry is more often successful when donors are tested on a testing platform different from the one on which they obtained their first false-positive result. © 2018 International Society of Blood Transfusion.
Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Zhou, Yong; An, Ji-Yong
2017-07-05
Knowledge of drug-target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, there are unavoidable shortcomings; including the time-consuming and expensive nature of the experimental method to predict DTI. Therefore, it motivates us to develop an effective computational method to predict DTI based on protein sequence. In the paper, we proposed a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence) to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). In order to evaluate the prediction capacity of the PDTPS, the experiment was carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets by using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracy of 97.73%, 93.12%, 86.78%, and 87.78% on enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. Furthermore, in order to further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on enzyme and ion channel datasets, and other exiting methods on four datasets. The promising comparison results further demonstrate that the efficiency and robust of the proposed PDTPS method. This makes it a useful tool and suitable for predicting DTI, as well as other bioinformatics tasks.
NASA Astrophysics Data System (ADS)
Javernick, L.; Bertoldi, W.; Redolfi, M.
2017-12-01
Accessing or acquiring high quality, low-cost topographic data has never been easier due to recent developments of the photogrammetric techniques of Structure-from-Motion (SfM). Researchers can acquire the necessary SfM imagery with various platforms, with the ability to capture millimetre resolution and accuracy, or large-scale areas with the help of unmanned platforms. Such datasets in combination with numerical modelling have opened up new opportunities to study river environments physical and ecological relationships. While numerical models overall predictive accuracy is most influenced by topography, proper model calibration requires hydraulic data and morphological data; however, rich hydraulic and morphological datasets remain scarce. This lack in field and laboratory data has limited model advancement through the inability to properly calibrate, assess sensitivity, and validate the models performance. However, new time-lapse imagery techniques have shown success in identifying instantaneous sediment transport in flume experiments and their ability to improve hydraulic model calibration. With new capabilities to capture high resolution spatial and temporal datasets of flume experiments, there is a need to further assess model performance. To address this demand, this research used braided river flume experiments and captured time-lapse observed sediment transport and repeat SfM elevation surveys to provide unprecedented spatial and temporal datasets. Through newly created metrics that quantified observed and modeled activation, deactivation, and bank erosion rates, the numerical model Delft3d was calibrated. This increased temporal data of both high-resolution time series and long-term temporal coverage provided significantly improved calibration routines that refined calibration parameterization. Model results show that there is a trade-off between achieving quantitative statistical and qualitative morphological representations. Specifically, statistical agreement simulations suffered to represent braiding planforms (evolving toward meandering), and parameterization that ensured braided produced exaggerated activation and bank erosion rates. Marie Sklodowska-Curie Individual Fellowship: River-HMV, 656917
Image Harvest: an open-source platform for high-throughput plant image processing and analysis
Knecht, Avi C.; Campbell, Malachy T.; Caprez, Adam; Swanson, David R.; Walia, Harkamal
2016-01-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable for processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is an open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyzephenomics datasets. PMID:27141917
Two novel motion-based algorithms for surveillance video analysis on embedded platforms
NASA Astrophysics Data System (ADS)
Vijverberg, Julien A.; Loomans, Marijn J. H.; Koeleman, Cornelis J.; de With, Peter H. N.
2010-05-01
This paper proposes two novel motion-vector based techniques for target detection and target tracking in surveillance videos. The algorithms are designed to operate on a resource-constrained device, such as a surveillance camera, and to reuse the motion vectors generated by the video encoder. The first novel algorithm for target detection uses motion vectors to construct a consistent motion mask, which is combined with a simple background segmentation technique to obtain a segmentation mask. The second proposed algorithm aims at multi-target tracking and uses motion vectors to assign blocks to targets employing five features. The weights of these features are adapted based on the interaction between targets. These algorithms are combined in one complete analysis application. The performance of this application for target detection has been evaluated for the i-LIDS sterile zone dataset and achieves an F1-score of 0.40-0.69. The performance of the analysis algorithm for multi-target tracking has been evaluated using the CAVIAR dataset and achieves an MOTP of around 9.7 and MOTA of 0.17-0.25. On a selection of targets in videos from other datasets, the achieved MOTP and MOTA are 8.8-10.5 and 0.32-0.49 respectively. The execution time on a PC-based platform is 36 ms. This includes the 20 ms for generating motion vectors, which are also required by the video encoder.
Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data
2016-01-01
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition. PMID:27537694
Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.
Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc
2016-01-01
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition.
Bernacka-Wojcik, Iwona; Águas, Hugo; Carlos, Fabio Ferreira; Lopes, Paulo; Wojcik, Pawel Jerzy; Costa, Mafalda Nascimento; Veigas, Bruno; Igreja, Rui; Fortunato, Elvira; Baptista, Pedro Viana; Martins, Rodrigo
2015-06-01
The use of microfluidics platforms combined with the optimal optical properties of gold nanoparticles has found plenty of application in molecular biosensing. This paper describes a bio-microfluidic platform coupled to a non-cross-linking colorimetric gold nanoprobe assay to detect a single nucleotide polymorphism associated with increased risk of obesity fat-mass and obesity-associated (FTO) rs9939609 (Carlos et al., 2014). The system enabled significant discrimination between positive and negative assays using a target DNA concentration of 5 ng/µL below the limit of detection of the conventionally used microplate reader (i.e., 15 ng/µL) with 10 times lower solution volume (i.e., 3 µL). A set of optimization of our previously reported bio-microfluidic platform (Bernacka-Wojcik et al., 2013) resulted in a 160% improvement of colorimetric analysis results. Incorporation of planar microlenses increased 6 times signal-to-loss ratio reaching the output optical fiber improving by 34% the colorimetric analysis of gold nanoparticles, while the implementation of an optoelectronic acquisition system yielded increased accuracy and reduced noise. The microfluidic chip was also integrated with a miniature fiber spectrometer to analyze the assays' colorimetric changes and also the LEDs transmission spectra when illuminating through various solutions. Furthermore, by coupling an optical microscope to a digital camera with a long exposure time (30 s), we could visualise the different scatter intensities of gold nanoparticles within channels following salt addition. These intensities correlate well to the expected difference in aggregation between FTO positive (none to small aggregates) and negative samples (large aggregates). © 2015 Wiley Periodicals, Inc.
A global dataset of crowdsourced land cover and land use reference data.
Fritz, Steffen; See, Linda; Perger, Christoph; McCallum, Ian; Schill, Christian; Schepaschenko, Dmitry; Duerauer, Martina; Karner, Mathias; Dresel, Christopher; Laso-Bayas, Juan-Carlos; Lesiv, Myroslava; Moorthy, Inian; Salk, Carl F; Danylo, Olha; Sturn, Tobias; Albrecht, Franziska; You, Liangzhi; Kraxner, Florian; Obersteiner, Michael
2017-06-13
Global land cover is an essential climate variable and a key biophysical driver for earth system models. While remote sensing technology, particularly satellites, have played a key role in providing land cover datasets, large discrepancies have been noted among the available products. Global land use is typically more difficult to map and in many cases cannot be remotely sensed. In-situ or ground-based data and high resolution imagery are thus an important requirement for producing accurate land cover and land use datasets and this is precisely what is lacking. Here we describe the global land cover and land use reference data derived from the Geo-Wiki crowdsourcing platform via four campaigns. These global datasets provide information on human impact, land cover disagreement, wilderness and land cover and land use. Hence, they are relevant for the scientific community that requires reference data for global satellite-derived products, as well as those interested in monitoring global terrestrial ecosystems in general.
A global dataset of crowdsourced land cover and land use reference data
Fritz, Steffen; See, Linda; Perger, Christoph; McCallum, Ian; Schill, Christian; Schepaschenko, Dmitry; Duerauer, Martina; Karner, Mathias; Dresel, Christopher; Laso-Bayas, Juan-Carlos; Lesiv, Myroslava; Moorthy, Inian; Salk, Carl F.; Danylo, Olha; Sturn, Tobias; Albrecht, Franziska; You, Liangzhi; Kraxner, Florian; Obersteiner, Michael
2017-01-01
Global land cover is an essential climate variable and a key biophysical driver for earth system models. While remote sensing technology, particularly satellites, have played a key role in providing land cover datasets, large discrepancies have been noted among the available products. Global land use is typically more difficult to map and in many cases cannot be remotely sensed. In-situ or ground-based data and high resolution imagery are thus an important requirement for producing accurate land cover and land use datasets and this is precisely what is lacking. Here we describe the global land cover and land use reference data derived from the Geo-Wiki crowdsourcing platform via four campaigns. These global datasets provide information on human impact, land cover disagreement, wilderness and land cover and land use. Hence, they are relevant for the scientific community that requires reference data for global satellite-derived products, as well as those interested in monitoring global terrestrial ecosystems in general. PMID:28608851
NASA Astrophysics Data System (ADS)
Klos, Anna; Pottiaux, Eric; Van Malderen, Roeland; Bock, Olivier; Bogusz, Janusz
2017-04-01
A synthetic benchmark dataset of Integrated Water Vapour (IWV) was created within the activity of "Data homogenisation" of sub-working group WG3 of COST ES1206 Action. The benchmark dataset was created basing on the analysis of IWV differences retrieved by Global Positioning System (GPS) International GNSS Service (IGS) stations using European Centre for Medium-Range Weather Forecats (ECMWF) reanalysis data (ERA-Interim). Having analysed a set of 120 series of IWV differences (ERAI-GPS) derived for IGS stations, we delivered parameters of a number of gaps and breaks for every certain station. Moreover, we estimated values of trends, significant seasonalities and character of residuals when deterministic model was removed. We tested five different noise models and found that a combination of white and autoregressive processes of first order describes the stochastic part with a good accuracy. Basing on this analysis, we performed Monte Carlo simulations of 25 years long data with two different types of noise: white as well as combination of white and autoregressive processes. We also added few strictly defined offsets, creating three variants of synthetic dataset: easy, less-complicated and fully-complicated. The 'Easy' dataset included seasonal signals (annual, semi-annual, 3 and 4 months if present for a particular station), offsets and white noise. The 'Less-complicated' dataset included above-mentioned, as well as the combination of white and first order autoregressive processes (AR(1)+WH). The 'Fully-complicated' dataset included, beyond above, a trend and gaps. In this research, we show the impact of manual homogenisation on the estimates of trend and its error. We also cross-compare the results for three above-mentioned datasets, as the synthetized noise type might have a significant influence on manual homogenisation. Therefore, it might mostly affect the values of trend and their uncertainties when inappropriately handled. In a future, the synthetic dataset we present is going to be used as a benchmark to test various statistical tools in terms of homogenisation task.
Zhao, Xiaowei; Ning, Qiao; Chai, Haiting; Ma, Zhiqiang
2015-06-07
As a widespread type of protein post-translational modifications (PTMs), succinylation plays an important role in regulating protein conformation, function and physicochemical properties. Compared with the labor-intensive and time-consuming experimental approaches, computational predictions of succinylation sites are much desirable due to their convenient and fast speed. Currently, numerous computational models have been developed to identify PTMs sites through various types of two-class machine learning algorithms. These methods require both positive and negative samples for training. However, designation of the negative samples of PTMs was difficult and if it is not properly done can affect the performance of computational models dramatically. So that in this work, we implemented the first application of positive samples only learning (PSoL) algorithm to succinylation sites prediction problem, which was a special class of semi-supervised machine learning that used positive samples and unlabeled samples to train the model. Meanwhile, we proposed a novel succinylation sites computational predictor called SucPred (succinylation site predictor) by using multiple feature encoding schemes. Promising results were obtained by the SucPred predictor with an accuracy of 88.65% using 5-fold cross validation on the training dataset and an accuracy of 84.40% on the independent testing dataset, which demonstrated that the positive samples only learning algorithm presented here was particularly useful for identification of protein succinylation sites. Besides, the positive samples only learning algorithm can be applied to build predictors for other types of PTMs sites with ease. A web server for predicting succinylation sites was developed and was freely accessible at http://59.73.198.144:8088/SucPred/. Copyright © 2015 Elsevier Ltd. All rights reserved.
Satellite remote sensing of fine particulate air pollutants over Indian mega cities
NASA Astrophysics Data System (ADS)
Sreekanth, V.; Mahesh, B.; Niranjan, K.
2017-11-01
In the backdrop of the need for high spatio-temporal resolution data on PM2.5 mass concentrations for health and epidemiological studies over India, empirical relations between Aerosol Optical Depth (AOD) and PM2.5 mass concentrations are established over five Indian mega cities. These relations are sought to predict the surface PM2.5 mass concentrations from high resolution columnar AOD datasets. Current study utilizes multi-city public domain PM2.5 data (from US Consulate and Embassy's air monitoring program) and MODIS AOD, spanning for almost four years. PM2.5 is found to be positively correlated with AOD. Station-wise linear regression analysis has shown spatially varying regression coefficients. Similar analysis has been repeated by eliminating data from the elevated aerosol prone seasons, which has improved the correlation coefficient. The impact of the day to day variability in the local meteorological conditions on the AOD-PM2.5 relationship has been explored by performing a multiple regression analysis. A cross-validation approach for the multiple regression analysis considering three years of data as training dataset and one-year data as validation dataset yielded an R value of ∼0.63. The study was concluded by discussing the factors which can improve the relationship.
Thress, Kenneth S; Brant, Roz; Carr, T Hedley; Dearden, Simon; Jenkins, Suzanne; Brown, Helen; Hammett, Tracey; Cantarini, Mireille; Barrett, J Carl
2015-12-01
To assess the ability of different technology platforms to detect epidermal growth factor receptor (EGFR) mutations, including T790M, from circulating tumor DNA (ctDNA) in advanced non-small cell lung cancer (NSCLC) patients. A comparison of multiple platforms for detecting EGFR mutations in plasma ctDNA was undertaken. Plasma samples were collected from patients entering the ongoing AURA trial (NCT01802632), investigating the safety, tolerability, and efficacy of AZD9291 in patients with EGFR-sensitizing mutation-positive NSCLC. Plasma was collected prior to AZD9291 dosing but following clinical progression on a previous EGFR-tyrosine kinase inhibitor (TKI). Extracted ctDNA was analyzed using two non-digital platforms (cobas(®) EGFR Mutation Test and therascreen™ EGFR amplification refractory mutation system assay) and two digital platforms (Droplet Digital™ PCR and BEAMing digital PCR [dPCR]). Preliminary assessment (38 samples) was conducted using all four platforms. For EGFR-TKI-sensitizing mutations, high sensitivity (78-100%) and specificity (93-100%) were observed using tissue as a non-reference standard. For the T790M mutation, the digital platforms outperformed the non-digital platforms. Subsequent assessment using 72 additional baseline plasma samples was conducted using the cobas(®) EGFR Mutation Test and BEAMing dPCR. The two platforms demonstrated high sensitivity (82-87%) and specificity (97%) for EGFR-sensitizing mutations. For the T790M mutation, the sensitivity and specificity were 73% and 67%, respectively, with the cobas(®) EGFR Mutation Test, and 81% and 58%, respectively, with BEAMing dPCR. Concordance between the platforms was >90%, showing that multiple platforms are capable of sensitive and specific detection of EGFR-TKI-sensitizing mutations from NSCLC patient plasma. The cobas(®) EGFR Mutation Test and BEAMing dPCR demonstrate a high sensitivity for T790M mutation detection. Genomic heterogeneity of T790M-mediated resistance may explain the reduced specificity observed with plasma-based detection of T790M mutations versus tissue. These data support the use of both platforms in the AZD9291 clinical development program. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omenn, Gilbert; States, David J.; Adamski, Marcin
2005-08-13
HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anticoagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics. med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS-MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS-MS datasetsmore » had 15 710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein to DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.« less
Merolli, Mark; Gray, Kathleen; Martin-Sanchez, Fernando; Lopez-Campos, Guillermo
2015-01-22
Patient-reported outcomes (PROs) from social media use in chronic disease management continue to emerge. While many published articles suggest the potential for social media is positive, there is a lack of robust examination into mediating mechanisms that might help explain social media's therapeutic value. This study presents findings from a global online survey of people with chronic pain (PWCP) to better understand how they use social media as part of self-management. Our aim is to improve understanding of the various health outcomes reported by PWCP by paying close attention to therapeutic affordances of social media. We wish to examine if demographics of participants underpin health outcomes and whether the concept of therapeutic affordances explains links between social media use and PROs. The goal is for this to help tailor future recommendations for use of social media to meet individuals' health needs and improve clinical practice of social media use. A total of 231 PWCP took part in a global online survey investigating PROs from social media use. Recruited through various chronic disease entities and social networks, participants provided information on demographics, health/pain status, social media use, therapeutic affordances, and PROs from use. Quantitative analysis was performed on the data using descriptive statistics, cross-tabulation, and cluster analysis. The total dataset represented 218 completed surveys. The majority of participants were university educated (67.0%, 146/218) and female (83.9%, 183/218). More than half (58.7%, 128/218) were married/partnered and not working for pay (75.9%, 88/116 of these due to ill health). Fibromyalgia (46.6%, 55/118) and arthritis (27.1%, 32/118) were the most commonly reported conditions causing pain. Participants showed a clear affinity for social network site use (90.0%, 189/210), followed by discussion forums and blogs. PROs were consistent, suggesting that social media positively impact psychological, social, and cognitive health. Analysis also highlighted two strong correlations linking platform used and health outcomes (particularly psychological, social, and cognitive) to (1) the narrative affordance of social media and (2) frequency of use of the platforms. Results did not uncover definitive demographics or characteristics of PWCP for which health outcomes are impacted. However, findings corroborate literature within this domain suggesting that there is a typical profile of people who use social media for health and that social media are more suited to particular health outcomes. Exploration of the relationship between social media's therapeutic affordances and health outcomes, in particular the narration affordance, warrants further attention by patients and clinicians.
High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms
Teodoro, George; Pan, Tony; Kurc, Tahsin M.; Kong, Jun; Cooper, Lee A. D.; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H.
2014-01-01
Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system. PMID:25419546
Crowd computing: using competitive dynamics to develop and refine highly predictive models.
Bentzien, Jörg; Muegge, Ingo; Hamner, Ben; Thompson, David C
2013-05-01
A recent application of a crowd computing platform to develop highly predictive in silico models for use in the drug discovery process is described. The platform, Kaggle™, exploits a competitive dynamic that results in model optimization as the competition unfolds. Here, this dynamic is described in detail and compared with more-conventional modeling strategies. The complete and full structure of the underlying dataset is disclosed and some thoughts as to the broader utility of such 'gamification' approaches to the field of modeling are offered. Copyright © 2013 Elsevier Ltd. All rights reserved.
Prediction of breast cancer risk with volatile biomarkers in breath.
Phillips, Michael; Cataneo, Renee N; Cruz-Ramos, Jose Alfonso; Huston, Jan; Ornelas, Omar; Pappas, Nadine; Pathak, Sonali
2018-03-23
Human breath contains volatile organic compounds (VOCs) that are biomarkers of breast cancer. We investigated the positive and negative predictive values (PPV and NPV) of breath VOC biomarkers as indicators of breast cancer risk. We employed ultra-clean breath collection balloons to collect breath samples from 54 women with biopsy-proven breast cancer and 124 cancer-free controls. Breath VOCs were analyzed with gas chromatography (GC) combined with either mass spectrometry (GC MS) or surface acoustic wave detection (GC SAW). Chromatograms were randomly assigned to a training set or a validation set. Monte Carlo analysis identified significant breath VOC biomarkers of breast cancer in the training set, and these biomarkers were incorporated into a multivariate algorithm to predict disease in the validation set. In the unsplit dataset, the predictive algorithms generated discriminant function (DF) values that varied with sensitivity, specificity, PPV and NPV. Using GC MS, test accuracy = 90% (area under curve of receiver operating characteristic in unsplit dataset) and cross-validated accuracy = 77%. Using GC SAW, test accuracy = 86% and cross-validated accuracy = 74%. With both assays, a low DF value was associated with a low risk of breast cancer (NPV > 99.9%). A high DF value was associated with a high risk of breast cancer and PPV rising to 100%. Analysis of breath VOC samples collected with ultra-clean balloons detected biomarkers that accurately predicted risk of breast cancer.
Large-Scale Astrophysical Visualization on Smartphones
NASA Astrophysics Data System (ADS)
Becciani, U.; Massimino, P.; Costa, A.; Gheller, C.; Grillo, A.; Krokos, M.; Petta, C.
2011-07-01
Nowadays digital sky surveys and long-duration, high-resolution numerical simulations using high performance computing and grid systems produce multidimensional astrophysical datasets in the order of several Petabytes. Sharing visualizations of such datasets within communities and collaborating research groups is of paramount importance for disseminating results and advancing astrophysical research. Moreover educational and public outreach programs can benefit greatly from novel ways of presenting these datasets by promoting understanding of complex astrophysical processes, e.g., formation of stars and galaxies. We have previously developed VisIVO Server, a grid-enabled platform for high-performance large-scale astrophysical visualization. This article reviews the latest developments on VisIVO Web, a custom designed web portal wrapped around VisIVO Server, then introduces VisIVO Smartphone, a gateway connecting VisIVO Web and data repositories for mobile astrophysical visualization. We discuss current work and summarize future developments.
Luegmair, Karolina; Zenzmaier, Christoph; Oblasser, Claudia; König-Bachmann, Martina
2018-04-01
to evaluate women's satisfaction with care at the birthplace in Austria and to provide reference data for cross-country comparisons within the international Babies Born Better project. a cross-sectional design was applied. The data were extracted from the Babies Born Better survey as a national sub-dataset that included all participants with Austria as the indicated country of residence. an online survey targeting women who had given birth within the last five years and distributed primarily via social media. In addition to sociodemographic and closed-ended questions regarding pregnancy and the childbirth environment, the women's childbirth experiences and satisfaction with the birthplace were obtained with three open-ended questions regarding (i) best experience of care, (ii) required changes in care and (iii) honest description of the experienced care. five hundred thirty-nine women who had given birth in Austria within the last five years. based on the concepts of public health, salutogenesis and self-efficacy, a deductive coding framework was developed and applied to analyse the qualitative data of the Babies Born Better survey. Regarding honest descriptions of the experienced care at the birthplace, 82% were positive, indicating that most of the respondents were mostly satisfied with the care experienced. More than 95% of the survey participants' positive experiences and more than 87% of their negative experiences with care could be assigned to the categories of the deductive coding framework. Whereas positive experiences mainly addressed care experienced at the individual level, negative experiences more frequently related to issues of the existing infrastructure, breastfeeding counselling or topics not covered by the coding framework. Evaluation of these unassigned responses revealed an emphasis on antenatal and puerperal care as well as insufficient reimbursements of expenses by health insurance funds and the desire for more midwifery-led care. although the participating women were mostly satisfied with perinatal care in Austria, it appears that shortcomings particularly exist in antenatal and puerperal care and counselling. the established coding framework that covered the vast majority of the women's responses to the open-ended questions might serve as a basis for cross-country comparisons within the international Babies Born Better project. Copyright © 2018. Published by Elsevier Ltd.
Machine Learning for Flood Prediction in Google Earth Engine
NASA Astrophysics Data System (ADS)
Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.
2015-12-01
With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.
Zhang, Mingjing; Wen, Ming; Zhang, Zhi-Min; Lu, Hongmei; Liang, Yizeng; Zhan, Dejian
2015-03-01
Retention time shift is one of the most challenging problems during the preprocessing of massive chromatographic datasets. Here, an improved version of the moving window fast Fourier transform cross-correlation algorithm is presented to perform nonlinear and robust alignment of chromatograms by analyzing the shifts matrix generated by moving window procedure. The shifts matrix in retention time can be estimated by fast Fourier transform cross-correlation with a moving window procedure. The refined shift of each scan point can be obtained by calculating the mode of corresponding column of the shifts matrix. This version is simple, but more effective and robust than the previously published moving window fast Fourier transform cross-correlation method. It can handle nonlinear retention time shift robustly if proper window size has been selected. The window size is the only one parameter needed to adjust and optimize. The properties of the proposed method are investigated by comparison with the previous moving window fast Fourier transform cross-correlation and recursive alignment by fast Fourier transform using chromatographic datasets. The pattern recognition results of a gas chromatography mass spectrometry dataset of metabolic syndrome can be improved significantly after preprocessing by this method. Furthermore, the proposed method is available as an open source package at https://github.com/zmzhang/MWFFT2. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gouret, Philippe; Vitiello, Vérane; Balandraud, Nathalie; Gilles, André; Pontarotti, Pierre; Danchin, Etienne GJ
2005-01-01
Background Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes which consists of detecting genes' position and structure, and inferring their function (as well as of other features of genomes). Structural and functional annotation both require the complex chaining of numerous different software, algorithms and methods under the supervision of a biologist. The automation of these pipelines is necessary to manage huge amounts of data released by sequencing projects. Several pipelines already automate some of these complex chaining but still necessitate an important contribution of biologists for supervising and controlling the results at various steps. Results Here we propose an innovative automated platform, FIGENIX, which includes an expert system capable to substitute to human expertise at several key steps. FIGENIX currently automates complex pipelines of structural and functional annotation under the supervision of the expert system (which allows for example to make key decisions, check intermediate results or refine the dataset). The quality of the results produced by FIGENIX is comparable to those obtained by expert biologists with a drastic gain in terms of time costs and avoidance of errors due to the human manipulation of data. Conclusion The core engine and expert system of the FIGENIX platform currently handle complex annotation processes of broad interest for the genomic community. They could be easily adapted to new, or more specialized pipelines, such as for example the annotation of miRNAs, the classification of complex multigenic families, annotation of regulatory elements and other genomic features of interest. PMID:16083500
NASA Astrophysics Data System (ADS)
Stanfield, R.; Dong, X.; Su, H.; Xi, B.; Jiang, J. H.
2016-12-01
In the past few years, studies have found a strong connection between atmospheric heat transport across the equator (AHTEQ) and the position of the ITCZ. This study investigates the seasonal, annual-mean and interannual variability of the ITCZ position and explores the relationships between the ITCZ position and inter-hemispheric energy transport in NASA NEWS products, multiple reanalyses datasets, and CMIP5 simulations. We find large discrepancies exist in the ITCZ-AHTEQ relationships in these datasets and model simulations. The components of energy fluxes are examined to identify the primary sources for the discrepancies among the datasets and models results.
Evaluation of Game Engines for Cross-Platform Development of Mobile Serious Games for Health.
Kleinschmidt, Carina; Haag, Martin
2016-01-01
Studies have shown that serious games for health can improve patient compliance and help to increase the quality of medical education. Due to a growing availability of mobile devices, especially the development of cross-platform mobile apps is helpful for improving healthcare. As the development can be highly time-consuming and expensive, an alternative development process is needed. Game engines are expected to simplify this process. Therefore, this article examines the question whether using game engines for cross-platform serious games for health can simplify the development compared to the development of a plain HTML5 app. At first, a systematic review of the literature was conducted in different databases (MEDLINE, ACM and IEEE). Afterwards three different game engines were chosen, evaluated in different categories and compared to the development of a HTML5 app. This was realized by implementing a prototypical application in the different engines and conducting a utility analysis. The evaluation shows that the Marmalade engine is the best choice for development in this scenario. Furthermore, it is obvious that the game engines have great benefits against plain HTML5 development as they provide components for graphics, physics, sounds, etc. The authors recommend to use the Marmalade Engine for a cross-platform mobile Serious Game for Health.
StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.
Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh
2016-01-01
The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mbah, Chamberlain, E-mail: chamberlain.mbah@ugent.be; Department of Mathematical Modeling, Statistics, and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent; Thierens, Hubert
Purpose: To identify the main causes underlying the failure of prediction models for radiation therapy toxicity to replicate. Methods and Materials: Data were used from two German cohorts, Individual Radiation Sensitivity (ISE) (n=418) and Mammary Carcinoma Risk Factor Investigation (MARIE) (n=409), of breast cancer patients with similar characteristics and radiation therapy treatments. The toxicity endpoint chosen was telangiectasia. The LASSO (least absolute shrinkage and selection operator) logistic regression method was used to build a predictive model for a dichotomized endpoint (Radiation Therapy Oncology Group/European Organization for the Research and Treatment of Cancer score 0, 1, or ≥2). Internal areas undermore » the receiver operating characteristic curve (inAUCs) were calculated by a naïve approach whereby the training data (ISE) were also used for calculating the AUC. Cross-validation was also applied to calculate the AUC within the same cohort, a second type of inAUC. Internal AUCs from cross-validation were calculated within ISE and MARIE separately. Models trained on one dataset (ISE) were applied to a test dataset (MARIE) and AUCs calculated (exAUCs). Results: Internal AUCs from the naïve approach were generally larger than inAUCs from cross-validation owing to overfitting the training data. Internal AUCs from cross-validation were also generally larger than the exAUCs, reflecting heterogeneity in the predictors between cohorts. The best models with largest inAUCs from cross-validation within both cohorts had a number of common predictors: hypertension, normalized total boost, and presence of estrogen receptors. Surprisingly, the effect (coefficient in the prediction model) of hypertension on telangiectasia incidence was positive in ISE and negative in MARIE. Other predictors were also not common between the 2 cohorts, illustrating that overcoming overfitting does not solve the problem of replication failure of prediction models completely. Conclusions: Overfitting and cohort heterogeneity are the 2 main causes of replication failure of prediction models across cohorts. Cross-validation and similar techniques (eg, bootstrapping) cope with overfitting, but the development of validated predictive models for radiation therapy toxicity requires strategies that deal with cohort heterogeneity.« less
Time Series Econometrics for the 21st Century
ERIC Educational Resources Information Center
Hansen, Bruce E.
2017-01-01
The field of econometrics largely started with time series analysis because many early datasets were time-series macroeconomic data. As the field developed, more cross-sectional and longitudinal datasets were collected, which today dominate the majority of academic empirical research. In nonacademic (private sector, central bank, and governmental)…
Chaudhuri, Rima; Sadrieh, Arash; Hoffman, Nolan J; Parker, Benjamin L; Humphrey, Sean J; Stöckli, Jacqueline; Hill, Adam P; James, David E; Yang, Jean Yee Hwa
2015-08-19
Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools to facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult since they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable the cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog ( www.phosphortholog.com ) that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species, and should be a valuable tool for the proteomics community. Here we describe PhosphOrtholog, a web-based application for mapping known and novel orthologous PTM sites from experimental data obtained from different species. PhosphOrtholog is the only generic and automated tool that enables cross-species comparison of large-scale PTM datasets without relying on existing PTM databases. This is achieved through pairwise sequence alignment of orthologous protein residues. To demonstrate its utility we apply it to two sets of human and rat muscle phosphoproteomes generated following insulin and exercise stimulation, respectively, and one publicly available mouse phosphoproteome following cellular stress revealing high mapping and coverage efficiency. Although coverage statistics are dataset dependent, PhosphOrtholog increased the number of cross-species mapped sites in all our example data sets by more than double when compared to those recovered using existing resources such as PhosphoSitePlus. PhosphOrtholog is the first tool that enables mapping of thousands of novel and known protein phosphorylation sites across species, accessible through an easy-to-use web interface. Identification of conserved PTMs across species from large-scale experimental data increases our knowledgebase of functional PTM sites. Moreover, PhosphOrtholog is generic being applicable to other PTM datasets such as acetylation, ubiquitination and methylation.
NASA Technical Reports Server (NTRS)
Bates, Lisa B.; Young, David T.
2012-01-01
This paper describes recent developmental testing to verify the integration of a developmental electromechanical actuator (EMA) with high rate lithium ion batteries and a cross platform extensible controller. Testing was performed at the Thrust Vector Control Research, Development and Qualification Laboratory at the NASA George C. Marshall Space Flight Center. Electric Thrust Vector Control (ETVC) systems like the EMA may significantly reduce recurring launch costs and complexity compared to heritage systems. Electric actuator mechanisms and control requirements across dissimilar platforms are also discussed with a focus on the similarities leveraged and differences overcome by the cross platform extensible common controller architecture.
An approach to drought data web-dissemination
NASA Astrophysics Data System (ADS)
Angeluccetti, Irene; Perez, Francesca; Balbo, Simone; Cámaro, Walther; Boccardo, Piero
2017-04-01
Drought data dissemination has always been a challenge for the scientific community. Firstly, a variety of widely known datasets is currently being used to describe different aspects of this same phenomenon. Secondly, new indexes are constantly being produced by scientists trying to better capture drought events. The present work aims at presenting how the drought monitoring communication issue was addressed by the ITHACA team. The ITHACA drought monitoring system makes use of two indicators: the Standardized Precipitation Index (SPI) and the Seasonal Small Integral Deviation (SSID). The first one is obtained considering the 3-months cumulating interval of the rainfall derived from the TRMM dataset; the second one is the percent deviation from the historical average value of the integral of the NDVI function describing the vegetation season. The SPI and the SSID are 30 and 5 km gridded respectively. The whole time-series of these two indicators (since year 2000 onwards), covering the whole Africa, are published by a WebGIS platform (http://drought.ithacaweb.org). On the one hand, although the SPI has been used for decades in different contexts and little explanation is due when presenting this indicator to an audience with a scientific background, the WebGIS platform shows a guide for its correct interpretation. On the other hand, being the SSID not commonly used in the field of vegetation analysis, the guide shown on the WebGIS platform is essential for the visitor to understand the data. Recently a new index has been created in order to synthesize, for a non-expert audience, the information provided by the indicators. It is aggregated per second order administrative levels and is calculated as follows: (i) a meteorological drought warning is issued when negative SPI and no vegetative season is detected (a blue palette is used); (ii) a warning value is assigned if SSID, SPI, or both, are negative (amber to brown palette is used) i.e., where the vegetative season is ongoing and the SSID is negative, a negative SPI value entails an agricultural drought warning, while a positive SPI implies a vegetation stress warning; (iv) a meteorological drought warning is issued when negative SPI during the vegetation season is detected but vegetation stress effects are not (i.e. positive SSID). The latest available Drought Warning Index is also published on the mentioned WebGIS platform. The index is stored in a database table: a single value is calculated for each administrative level. A table view on the database contains fields describing the geometry of the administrative level polygons and the respective index; this table view is published as a WMS service, by associating the symbology previously described. The WMS service is then captured in order to generate a live map with a series of basic WebGIS functionalities. The integrated index is undoubtedly useful for a non-expert user to understand immediately if a particular region is subject to a drought stress. However, the simplification introduces uncertainty as it implies several assumptions that couldn't be verified at a continental scale.
Ahern, Thomas P.; Beck, Andrew H.; Rosner, Bernard A.; Glass, Ben; Frieling, Gretchen; Collins, Laura C.; Tamimi, Rulla M.
2017-01-01
Background Computational pathology platforms incorporate digital microscopy with sophisticated image analysis to permit rapid, continuous measurement of protein expression. We compared two computational pathology platforms on their measurement of breast tumor estrogen receptor (ER) and progesterone receptor (PR) expression. Methods Breast tumor microarrays from the Nurses’ Health Study were stained for ER (n=592) and PR (n=187). One expert pathologist scored cases as positive if ≥1% of tumor nuclei exhibited stain. ER and PR were then measured with the Definiens Tissue Studio (automated) and Aperio Digital Pathology (user-supervised) platforms. Platform-specific measurements were compared using boxplots, scatter plots, and correlation statistics. Classification of ER and PR positivity by platform-specific measurements was evaluated with areas under receiver operating characteristic curves (AUC) from univariable logistic regression models, using expert pathologist classification as the standard. Results Both platforms showed considerable overlap in continuous measurements of ER and PR between positive and negative groups classified by expert pathologist. Platform-specific measurements were strongly and positively correlated with one another (rho≥0.77). The user-supervised Aperio workflow performed slightly better than the automated Definiens workflow at classifying ER positivity (AUCAperio=0.97; AUCDefiniens=0.90; difference=0.07, 95% CI: 0.05, 0.09) and PR positivity (AUCAperio=0.94; AUCDefiniens=0.87; difference=0.07, 95% CI: 0.03, 0.12). Conclusion Paired hormone receptor expression measurements from two different computational pathology platforms agreed well with one another. The user-supervised workflow yielded better classification accuracy than the automated workflow. Appropriately validated computational pathology algorithms enrich molecular epidemiology studies with continuous protein expression data and may accelerate tumor biomarker discovery. PMID:27729430
Human-Robot Emergency Response - Experimental Platform and Preliminary Dataset
2014-07-28
Proceedings of the IEEE International Conference on Robotics and Automation, Leuven, Belgium, May 16–21 1998, pp. 3715–3720. [13] itseez, “ Opencv ,” http...function and camshift function in OpenCV [13]. In each image obtained form cameras, we first calculate back projection of a histogram model of a human. In
NASA Astrophysics Data System (ADS)
Tisdale, M.
2016-12-01
NASA's Atmospheric Science Data Center (ASDC) is operationally using the Esri ArcGIS Platform to improve data discoverability, accessibility and interoperability to meet the diversifying government, private, public and academic communities' driven requirements. The ASDC is actively working to provide their mission essential datasets as ArcGIS Image Services, Open Geospatial Consortium (OGC) Web Mapping Services (WMS), OGC Web Coverage Services (WCS) and leveraging the ArcGIS multidimensional mosaic dataset structure. Science teams and ASDC are utilizing these services, developing applications using the Web AppBuilder for ArcGIS and ArcGIS API for Javascript, and evaluating restructuring their data production and access scripts within the ArcGIS Python Toolbox framework and Geoprocessing service environment. These capabilities yield a greater usage and exposure of ASDC data holdings and provide improved geospatial analytical tools for a mission critical understanding in the areas of the earth's radiation budget, clouds, aerosols, and tropospheric chemistry.
A real-time method to predict social media popularity
NASA Astrophysics Data System (ADS)
Chen, Xiao; Lu, Zhe-Ming
How to predict the future popularity of a message or video on online social media (OSM) has long been an attractive problem for researchers. Although many difficulties are still ahead, recent studies suggest that temporal and topological features of early adopters generally play a very important role. However, with the increase of the adopters, the feature space will grow explosively. How to select the most effective features is still an open issue. In this work, we investigate several feature extraction methods over the Twitter platform and find that most predictive power concentrates on the second half of the propagation period, and that not only a model trained on one platform generalizes well to others as previous works observed, but also a model trained on one dataset performs well on predicting the popularity for other datasets with different number of observed early adopters. According to these findings, at least for the best features by far, the data used to extract features can be halved without loss of evident accuracy and we provide a way to roughly predict the growth trend of a social-media item in real-time.
NASA Astrophysics Data System (ADS)
Tisdale, M.
2017-12-01
NASA's Atmospheric Science Data Center (ASDC) is operationally using the Esri ArcGIS Platform to improve data discoverability, accessibility and interoperability to meet the diversifying user requirements from government, private, public and academic communities. The ASDC is actively working to provide their mission essential datasets as ArcGIS Image Services, Open Geospatial Consortium (OGC) Web Mapping Services (WMS), and OGC Web Coverage Services (WCS) while leveraging the ArcGIS multidimensional mosaic dataset structure. Science teams at ASDC are utilizing these services through the development of applications using the Web AppBuilder for ArcGIS and the ArcGIS API for Javascript. These services provide greater exposure of ASDC data holdings to the GIS community and allow for broader sharing and distribution to various end users. These capabilities provide interactive visualization tools and improved geospatial analytical tools for a mission critical understanding in the areas of the earth's radiation budget, clouds, aerosols, and tropospheric chemistry. The presentation will cover how the ASDC is developing geospatial web services and applications to improve data discoverability, accessibility, and interoperability.
NASA Astrophysics Data System (ADS)
Nechad, Bouchra; Alvera-Azcaràte, Aida; Ruddick, Kevin; Greenwood, Naomi
2011-08-01
In situ measurements of total suspended matter (TSM) over the period 2003-2006, collected with two autonomous platforms from the Centre for Environment, Fisheries and Aquatic Sciences (Cefas) measuring the optical backscatter (OBS) in the southern North Sea, are used to assess the accuracy of TSM time series extracted from satellite data. Since there are gaps in the remote sensing (RS) data, due mainly to cloud cover, the Data Interpolating Empirical Orthogonal Functions (DINEOF) is used to fill in the TSM time series and build a continuous daily "recoloured" dataset. The RS datasets consist of TSM maps derived from MODIS imagery using the bio-optical model of Nechad et al. (Rem Sens Environ 114: 854-866, 2010). In this study, the DINEOF time series are compared to the in situ OBS measured in moderately to very turbid waters respectively in West Gabbard and Warp Anchorage, in the southern North Sea. The discrepancies between instantaneous RS, DINEOF-filled RS data and Cefas data are analysed in terms of TSM algorithm uncertainties, space-time variability and DINEOF reconstruction uncertainty.
A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gittens, Alex; Kottalam, Jey; Yang, Jiyan
We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with themore » fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.« less
CINERGI: Community Inventory of EarthCube Resources for Geoscience Interoperability
NASA Astrophysics Data System (ADS)
Zaslavsky, Ilya; Bermudez, Luis; Grethe, Jeffrey; Gupta, Amarnath; Hsu, Leslie; Lehnert, Kerstin; Malik, Tanu; Richard, Stephen; Valentine, David; Whitenack, Thomas
2014-05-01
Organizing geoscience data resources to support cross-disciplinary data discovery, interpretation, analysis and integration is challenging because of different information models, semantic frameworks, metadata profiles, catalogs, and services used in different geoscience domains, not to mention different research paradigms and methodologies. The central goal of CINERGI, a new project supported by the US National Science Foundation through its EarthCube Building Blocks program, is to create a methodology and assemble a large inventory of high-quality information resources capable of supporting data discovery needs of researchers in a wide range of geoscience domains. The key characteristics of the inventory are: 1) collaboration with and integration of metadata resources from a number of large data facilities; 2) reliance on international metadata and catalog service standards; 3) assessment of resource "interoperability-readiness"; 4) ability to cross-link and navigate data resources, projects, models, researcher directories, publications, usage information, etc.; 5) efficient inclusion of "long-tail" data, which are not appearing in existing domain repositories; 6) data registration at feature level where appropriate, in addition to common dataset-level registration, and 7) integration with parallel EarthCube efforts, in particular focused on EarthCube governance, information brokering, service-oriented architecture design and management of semantic information. We discuss challenges associated with accomplishing CINERGI goals, including defining the inventory scope; managing different granularity levels of resource registration; interaction with search systems of domain repositories; explicating domain semantics; metadata brokering, harvesting and pruning; managing provenance of the harvested metadata; and cross-linking resources based on the linked open data (LOD) approaches. At the higher level of the inventory, we register domain-wide resources such as domain catalogs, vocabularies, information models, data service specifications, identifier systems, and assess their conformance with international standards (such as those adopted by ISO and OGC, and used by INSPIRE) or de facto community standards using, in part, automatic validation techniques. The main level in CINERGI leverages a metadata aggregation platform (currently Geoportal Server) to organize harvested resources from multiple collections and contributed by community members during EarthCube end-user domain workshops or suggested online. The latter mechanism uses the SciCrunch toolkit originally developed within the Neuroscience Information Framework (NIF) project and now being extended to other communities. The inventory is designed to support requests such as "Find resources with theme X in geographic area S", "Find datasets with subject Y using query concept expansion", "Find geographic regions having data of type Z", "Find datasets that contain property P". With the added LOD support, additional types of requests, such as "Find example implementations of specification X", "Find researchers who have worked in Domain X, dataset Y, location L", "Find resources annotated by person X", will be supported. Project's website (http://workspace.earthcube.org/cinergi) provides access to the initial resource inventory, a gallery of EarthCube researchers, collections of geoscience models, metadata entry forms, and other software modules and inventories being integrated into the CINERGI system. Support from the US National Science Foundation under award NSF ICER-1343816 is gratefully acknowledged.
Platform Architecture for Decentralized Positioning Systems.
Kasmi, Zakaria; Norrdine, Abdelmoumen; Blankenbach, Jörg
2017-04-26
A platform architecture for positioning systems is essential for the realization of a flexible localization system, which interacts with other systems and supports various positioning technologies and algorithms. The decentralized processing of a position enables pushing the application-level knowledge into a mobile station and avoids the communication with a central unit such as a server or a base station. In addition, the calculation of the position on low-cost and resource-constrained devices presents a challenge due to the limited computing, storage capacity, as well as power supply. Therefore, we propose a platform architecture that enables the design of a system with the reusability of the components, extensibility (e.g., with other positioning technologies) and interoperability. Furthermore, the position is computed on a low-cost device such as a microcontroller, which simultaneously performs additional tasks such as data collecting or preprocessing based on an operating system. The platform architecture is designed, implemented and evaluated on the basis of two positioning systems: a field strength system and a time of arrival-based positioning system.
Platform Architecture for Decentralized Positioning Systems
Kasmi, Zakaria; Norrdine, Abdelmoumen; Blankenbach, Jörg
2017-01-01
A platform architecture for positioning systems is essential for the realization of a flexible localization system, which interacts with other systems and supports various positioning technologies and algorithms. The decentralized processing of a position enables pushing the application-level knowledge into a mobile station and avoids the communication with a central unit such as a server or a base station. In addition, the calculation of the position on low-cost and resource-constrained devices presents a challenge due to the limited computing, storage capacity, as well as power supply. Therefore, we propose a platform architecture that enables the design of a system with the reusability of the components, extensibility (e.g., with other positioning technologies) and interoperability. Furthermore, the position is computed on a low-cost device such as a microcontroller, which simultaneously performs additional tasks such as data collecting or preprocessing based on an operating system. The platform architecture is designed, implemented and evaluated on the basis of two positioning systems: a field strength system and a time of arrival-based positioning system. PMID:28445414
Near Real-time Scientific Data Analysis and Visualization with the ArcGIS Platform
NASA Astrophysics Data System (ADS)
Shrestha, S. R.; Viswambharan, V.; Doshi, A.
2017-12-01
Scientific multidimensional data are generated from a variety of sources and platforms. These datasets are mostly produced by earth observation and/or modeling systems. Agencies like NASA, NOAA, USGS, and ESA produce large volumes of near real-time observation, forecast, and historical data that drives fundamental research and its applications in larger aspects of humanity from basic decision making to disaster response. A common big data challenge for organizations working with multidimensional scientific data and imagery collections is the time and resources required to manage and process such large volumes and varieties of data. The challenge of adopting data driven real-time visualization and analysis, as well as the need to share these large datasets, workflows, and information products to wider and more diverse communities, brings an opportunity to use the ArcGIS platform to handle such demand. In recent years, a significant effort has put in expanding the capabilities of ArcGIS to support multidimensional scientific data across the platform. New capabilities in ArcGIS to support scientific data management, processing, and analysis as well as creating information products from large volumes of data using the image server technology are becoming widely used in earth science and across other domains. We will discuss and share the challenges associated with big data by the geospatial science community and how we have addressed these challenges in the ArcGIS platform. We will share few use cases, such as NOAA High Resolution Refresh Radar (HRRR) data, that demonstrate how we access large collections of near real-time data (that are stored on-premise or on the cloud), disseminate them dynamically, process and analyze them on-the-fly, and serve them to a variety of geospatial applications. We will also share how on-the-fly processing using raster functions capabilities, can be extended to create persisted data and information products using raster analytics capabilities that exploit distributed computing in an enterprise environment.
Hoo-Chang, Shin; Roth, Holger R.; Gao, Mingchen; Lu, Le; Xu, Ziyue; Nogues, Isabella; Yao, Jianhua; Mollura, Daniel
2016-01-01
Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets (i.e. ImageNet) and the revival of deep convolutional neural networks (CNN). CNNs enable learning data-driven, highly representative, layered hierarchical image features from sufficient training data. However, obtaining datasets as comprehensively annotated as ImageNet in the medical imaging domain remains a challenge. There are currently three major techniques that successfully employ CNNs to medical image classification: training the CNN from scratch, using off-the-shelf pre-trained CNN features, and conducting unsupervised CNN pre-training with supervised fine-tuning. Another effective method is transfer learning, i.e., fine-tuning CNN models (supervised) pre-trained from natural image dataset to medical image tasks (although domain transfer between two medical image datasets is also possible). In this paper, we exploit three important, but previously understudied factors of employing deep convolutional neural networks to computer-aided detection problems. We first explore and evaluate different CNN architectures. The studied models contain 5 thousand to 160 million parameters, and vary in numbers of layers. We then evaluate the influence of dataset scale and spatial image context on performance. Finally, we examine when and why transfer learning from pre-trained ImageNet (via fine-tuning) can be useful. We study two specific computeraided detection (CADe) problems, namely thoraco-abdominal lymph node (LN) detection and interstitial lung disease (ILD) classification. We achieve the state-of-the-art performance on the mediastinal LN detection, with 85% sensitivity at 3 false positive per patient, and report the first five-fold cross-validation classification results on predicting axial CT slices with ILD categories. Our extensive empirical evaluation, CNN model analysis and valuable insights can be extended to the design of high performance CAD systems for other medical imaging tasks. PMID:26886976
Ali, Abdirahman A; O'Neill, Christopher J; Thomson, Peter C; Kadarmideen, Haja N
2012-07-27
Infectious bovine keratoconjunctivitis (IBK) or 'pinkeye' is an economically important ocular disease that significantly impacts animal performance. Genetic parameters for IBK infection and its genetic and phenotypic correlations with cattle tick counts, number of helminth (unspecified species) eggs per gram of faeces and growth traits in Australian tropically adapted Bos taurus cattle were estimated. Animals were clinically examined for the presence of IBK infection before and after weaning when the calves were 3 to 6 months and 15 to 18 months old, respectively and were also recorded for tick counts, helminth eggs counts as an indicator of intestinal parasites and live weights at several ages including 18 months. Negative genetic correlations were estimated between IBK incidence and weight traits for animals in pre-weaning and post-weaning datasets. Genetic correlations among weight measurements were positive, with moderate to high values. Genetic correlations of IBK incidence with tick counts were positive for the pre-weaning and negative for the post-weaning datasets but negative with helminth eggs counts for the pre-weaning dataset and slightly positive for the post-weaning dataset. Genetic correlations between tick and helminth eggs counts were moderate and positive for both datasets. Phenotypic correlations of IBK incidence with helminth eggs per gram of faeces were moderate and positive for both datasets, but were close to zero for both datasets with tick counts. Our results suggest that genetic selection against IBK incidence in tropical cattle is feasible and that calves genetically prone to acquire IBK infection could also be genetically prone to have a slower growth. The positive genetic correlations among weight traits and between tick and helminth eggs counts suggest that they are controlled by common genes (with pleiotropic effects). Genetic correlations between IBK incidence and tick and helminth egg counts were moderate and opposite between pre-weaning and post-weaning datasets, suggesting that the environmental and (or) maternal effects differ between these two growth phases. This preliminary study provides estimated genetic parameters for IBK incidence, which could be used to design selection and breeding programs for tropical adaptation in beef cattle.
2012-01-01
Background Infectious bovine keratoconjunctivitis (IBK) or ‘pinkeye’ is an economically important ocular disease that significantly impacts animal performance. Genetic parameters for IBK infection and its genetic and phenotypic correlations with cattle tick counts, number of helminth (unspecified species) eggs per gram of faeces and growth traits in Australian tropically adapted Bos taurus cattle were estimated. Methods Animals were clinically examined for the presence of IBK infection before and after weaning when the calves were 3 to 6 months and 15 to 18 months old, respectively and were also recorded for tick counts, helminth eggs counts as an indicator of intestinal parasites and live weights at several ages including 18 months. Results Negative genetic correlations were estimated between IBK incidence and weight traits for animals in pre-weaning and post-weaning datasets. Genetic correlations among weight measurements were positive, with moderate to high values. Genetic correlations of IBK incidence with tick counts were positive for the pre-weaning and negative for the post-weaning datasets but negative with helminth eggs counts for the pre-weaning dataset and slightly positive for the post-weaning dataset. Genetic correlations between tick and helminth eggs counts were moderate and positive for both datasets. Phenotypic correlations of IBK incidence with helminth eggs per gram of faeces were moderate and positive for both datasets, but were close to zero for both datasets with tick counts. Conclusions Our results suggest that genetic selection against IBK incidence in tropical cattle is feasible and that calves genetically prone to acquire IBK infection could also be genetically prone to have a slower growth. The positive genetic correlations among weight traits and between tick and helminth eggs counts suggest that they are controlled by common genes (with pleiotropic effects). Genetic correlations between IBK incidence and tick and helminth egg counts were moderate and opposite between pre-weaning and post-weaning datasets, suggesting that the environmental and (or) maternal effects differ between these two growth phases. This preliminary study provides estimated genetic parameters for IBK incidence, which could be used to design selection and breeding programs for tropical adaptation in beef cattle. PMID:22839739
Validating silicon polytrodes with paired juxtacellular recordings: method and dataset
Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R.
2016-01-01
Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo “paired-recordings” such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. PMID:27306671
Bennie, Marion; Malcolm, William; Marwick, Charis A; Kavanagh, Kimberley; Sneddon, Jean; Nathwani, Dilip
2017-10-01
The better use of new and emerging data streams to understand the epidemiology of infectious disease and to inform and evaluate antimicrobial stewardship improvement programmes is paramount in the global fight against antimicrobial resistance. To create a national informatics platform that synergizes the wealth of disjointed, infection-related health data, building an intelligence capability that allows rapid enquiry, generation of new knowledge and feedback to clinicians and policy makers. A multi-stakeholder community, led by the Scottish Antimicrobial Prescribing Group, secured government funding to deliver a national programme of work centred on three key aspects: (i) technical platform development with record linkage capability across multiple datasets; (ii) a proportionate governance approach to enhance responsiveness; and (iii) generation of new evidence to guide clinical practice. The National Health Service Scotland Infection Intelligence Platform (IIP) is now hosted within the national health data repository to assure resilience and sustainability. New technical solutions include simplified 'data views' of complex, linked datasets and embedded statistical programs to enhance capability. These developments have enabled responsiveness, flexibility and robustness in conducting population-based studies including a focus on intended and unintended effects of antimicrobial stewardship interventions and quantification of infection risk factors and clinical outcomes. We have completed the build and test phase of IIP, overcoming the technical and governance challenges, and produced new capability in infection informatics, generating new evidence for improved clinical practice. This provides a foundation for expansion and opportunity for global collaborations. © The Author 2017. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Alexeev, V. A.; Gordov, E. P.
2016-12-01
Recently initiated collaborative research project is presented. Its main objective is to develop high spatial and temporal resolution datasets for studying the ongoing and future climate changes in Siberia, caused by global and regional processes in the atmosphere and the ocean. This goal will be achieved by using a set of regional and global climate models for the analysis of the mechanisms of climate change and quantitative assessment of changes in key climate variables, including analysis of extreme weather and climate events and their dynamics, evaluation of the frequency, amplitude and the risks caused by the extreme events in the region. The main practical application of the project is to provide experts, stakeholders and the public with quantitative information about the future climate change in Siberia obtained on the base of a computational web- geoinformation platform. The thematic platform will be developed in order to facilitate processing and analysis of high resolution georeferenced datasets that will be delivered and made available to scientific community, policymakes and other end users as a result of the project. Software packages will be developed to implement calculation of various climatological indicators in order to characterize and diagnose climate change and its dynamics, as well as to archive results in digital form of electronic maps (GIS layers). By achieving these goals the project will provide science based tools necessary for developing mitigation measures for adapting to climate change and reducing negative impact on the population and infrastructure of the region. Financial support of the computational web- geoinformation platform prototype development by the RF Ministry of Education and Science under Agreement 14.613.21.0037 (RFMEFI61315X0037) is acknowledged.
#Healthy Selfies: Exploration of Health Topics on Instagram.
Muralidhara, Sachin; Paul, Michael J
2018-06-29
Social media provides a complementary source of information for public health surveillance. The dominate data source for this type of monitoring is the microblogging platform Twitter, which is convenient due to the free availability of public data. Less is known about the utility of other social media platforms, despite their popularity. This work aims to characterize the health topics that are prominently discussed in the image-sharing platform Instagram, as a step toward understanding how this data might be used for public health research. The study uses a topic modeling approach to discover topics in a dataset of 96,426 Instagram posts containing hashtags related to health. We use a polylingual topic model, initially developed for datasets in different natural languages, to model different modalities of data: hashtags, caption words, and image tags automatically extracted using a computer vision tool. We identified 47 health-related topics in the data (kappa=.77), covering ten broad categories: acute illness, alternative medicine, chronic illness and pain, diet, exercise, health care & medicine, mental health, musculoskeletal health and dermatology, sleep, and substance use. The most prevalent topics were related to diet (8,293/96,426; 8.6% of posts) and exercise (7,328/96,426; 7.6% of posts). A large and diverse set of health topics are discussed in Instagram. The extracted image tags were generally too coarse and noisy to be used for identifying posts but were in some cases accurate for identifying images relevant to studying diet and substance use. Instagram shows potential as a source of public health information, though limitations in data collection and metadata availability may limit its use in comparison to platforms like Twitter. ©Sachin Muralidhara, Michael J. Paul. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 29.06.2018.
Web-based visual analysis for high-throughput genomics
2013-01-01
Background Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. Results We have created a platform simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Conclusions Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments. PMID:23758618
NASA Astrophysics Data System (ADS)
Evans, Ben; Wyborn, Lesley; Druken, Kelsey; Richards, Clare; Trenham, Claire; Wang, Jingbo; Rozas Larraondo, Pablo; Steer, Adam; Smillie, Jon
2017-04-01
The National Computational Infrastructure (NCI) facility hosts one of Australia's largest repositories (10+ PBytes) of research data collections spanning datasets from climate, coasts, oceans, and geophysics through to astronomy, bioinformatics, and the social sciences domains. The data are obtained from national and international sources, spanning a wide range of gridded and ungridded (i.e., line surveys, point clouds) data, and raster imagery, as well as diverse coordinate reference projections and resolutions. Rather than managing these data assets as a digital library, whereby users can discover and download files to personal servers (similar to borrowing 'books' from a 'library'), NCI has built an extensive and well-integrated research data platform, the National Environmental Research Data Interoperability Platform (NERDIP, http://nci.org.au/data-collections/nerdip/). The NERDIP architecture enables programmatic access to data via standards-compliant services for high performance data analysis, and provides a flexible cloud-based environment to facilitate the next generation of transdisciplinary scientific research across all data domains. To improve use of modern scalable data infrastructures that are focused on efficient data analysis, the data organisation needs to be carefully managed including performance evaluations of projections and coordinate systems, data encoding standards and formats. A complication is that we have often found multiple domain vocabularies and ontologies are associated with equivalent datasets. It is not practical for individual dataset managers to determine which standards are best to apply to their dataset as this could impact accessibility and interoperability. Instead, they need to work with data custodians across interrelated communities and, in partnership with the data repository, the international scientific community to determine the most useful approach. For the data repository, this approach is essential to enable different disciplines and research communities to invoke new forms of analysis and discovery in an increasingly complex data-rich environment. Driven by the heterogeneity of Earth and environmental datasets, NCI developed a Data Quality/Data Assurance Strategy to ensure consistency is maintained within and across all datasets, as well as functionality testing to ensure smooth interoperability between products, tools, and services. This is particularly so for collections that contain data generated from multiple data acquisition campaigns, often using instruments and models that have evolved over time. By implementing the NCI Data Quality Strategy we have seen progressive improvement in the integration and quality of the datasets across the different subject domains, and through this, the ease by which the users can access data from this major data infrastructure. By both adhering to international standards and also contributing to extensions of these standards, data from the NCI NERDIP platform can be federated with data from other globally distributed data repositories and infrastructures. The NCI approach builds on our experience working with the astronomy and climate science communities, which have been internationally coordinating such interoperability standards within their disciplines for some years. The results of our work so far demonstrate more could be done in the Earth science, solid earth and environmental communities, particularly through establishing better linkages between international/national community efforts such as EPOS, ENVRIplus, EarthCube, AuScope and the Research Data Alliance.
Anguita, Alberto; García-Remesal, Miguel; Graf, Norbert; Maojo, Victor
2016-04-01
Modern biomedical research relies on the semantic integration of heterogeneous data sources to find data correlations. Researchers access multiple datasets of disparate origin, and identify elements-e.g. genes, compounds, pathways-that lead to interesting correlations. Normally, they must refer to additional public databases in order to enrich the information about the identified entities-e.g. scientific literature, published clinical trial results, etc. While semantic integration techniques have traditionally focused on providing homogeneous access to private datasets-thus helping automate the first part of the research, and there exist different solutions for browsing public data, there is still a need for tools that facilitate merging public repositories with private datasets. This paper presents a framework that automatically locates public data of interest to the researcher and semantically integrates it with existing private datasets. The framework has been designed as an extension of traditional data integration systems, and has been validated with an existing data integration platform from a European research project by integrating a private biological dataset with data from the National Center for Biotechnology Information (NCBI). Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Wyborn, Lesley; Car, Nicholas; Evans, Benjamin; Klump, Jens
2016-04-01
Persistent identifiers in the form of a Digital Object Identifier (DOI) are becoming more mainstream, assigned at both the collection and dataset level. For static datasets, this is a relatively straight-forward matter. However, many new data collections are dynamic, with new data being appended, models and derivative products being revised with new data, or the data itself revised as processing methods are improved. Further, because data collections are becoming accessible as services, researchers can log in and dynamically create user-defined subsets for specific research projects: they also can easily mix and match data from multiple collections, each of which can have a complex history. Inevitably extracts from such dynamic data sets underpin scholarly publications, and this presents new challenges. The National Computational Infrastructure (NCI) has been experiencing and making progress towards addressing these issues. The NCI is large node of the Research Data Services initiative (RDS) of the Australian Government's research infrastructure, which currently makes available over 10 PBytes of priority research collections, ranging from geosciences, geophysics, environment, and climate, through to astronomy, bioinformatics, and social sciences. Data are replicated to, or are produced at, NCI and then processed there to higher-level data products or directly analysed. Individual datasets range from multi-petabyte computational models and large volume raster arrays, down to gigabyte size, ultra-high resolution datasets. To facilitate access, maximise reuse and enable integration across the disciplines, datasets have been organized on a platform called the National Environmental Research Data Interoperability Platform (NERDIP). Combined, the NERDIP data collections form a rich and diverse asset for researchers: their co-location and standardization optimises the value of existing data, and forms a new resource to underpin data-intensive Science. New publication procedures require that a persistent identifier (DOI) be provided for the dataset that underpins the publication. Being able to produce these for data extracts from the NCI data node using only DOIs is proving difficult: preserving a copy of each data extract is not possible due to data scale. A proposal is for researchers to use workflows that capture the provenance of each data extraction, including metadata (e.g., version of the dataset used, the query and time of extraction). In parallel, NCI is now working with the NERDIP dataset providers to ensure that the provenance of data publication is also captured in provenance systems including references to previous versions and a history of data appended or modified. This proposed solution would require an enhancement to new scholarly publication procedures whereby the reference to underlying dataset to a scholarly publication would be the persistent identifier of the provenance workflow that created the data extract. In turn, the provenance workflow would itself link to a series of persistent identifiers that, at a minimum, provide complete dataset production transparency and, if required, would facilitate reconstruction of the dataset. Such a solution will require strict adherence to design patterns for provenance representation to ensure that the provenance representation of the workflow does indeed contain information required to deliver dataset generation transparency and a pathway to reconstruction.
Axillary Lymph Node Evaluation Utilizing Convolutional Neural Networks Using MRI Dataset.
Ha, Richard; Chang, Peter; Karcich, Jenika; Mutasa, Simukayi; Fardanesh, Reza; Wynn, Ralph T; Liu, Michael Z; Jambawalikar, Sachin
2018-04-25
The aim of this study is to evaluate the role of convolutional neural network (CNN) in predicting axillary lymph node metastasis, using a breast MRI dataset. An institutional review board (IRB)-approved retrospective review of our database from 1/2013 to 6/2016 identified 275 axillary lymph nodes for this study. Biopsy-proven 133 metastatic axillary lymph nodes and 142 negative control lymph nodes were identified based on benign biopsies (100) and from healthy MRI screening patients (42) with at least 3 years of negative follow-up. For each breast MRI, axillary lymph node was identified on first T1 post contrast dynamic images and underwent 3D segmentation using an open source software platform 3D Slicer. A 32 × 32 patch was then extracted from the center slice of the segmented tumor data. A CNN was designed for lymph node prediction based on each of these cropped images. The CNN consisted of seven convolutional layers and max-pooling layers with 50% dropout applied in the linear layer. In addition, data augmentation and L2 regularization were performed to limit overfitting. Training was implemented using the Adam optimizer, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. Code for this study was written in Python using the TensorFlow module (1.0.0). Experiments and CNN training were done on a Linux workstation with NVIDIA GTX 1070 Pascal GPU. Two class axillary lymph node metastasis prediction models were evaluated. For each lymph node, a final softmax score threshold of 0.5 was used for classification. Based on this, CNN achieved a mean five-fold cross-validation accuracy of 84.3%. It is feasible for current deep CNN architectures to be trained to predict likelihood of axillary lymph node metastasis. Larger dataset will likely improve our prediction model and can potentially be a non-invasive alternative to core needle biopsy and even sentinel lymph node evaluation.
Authoring Tours of Geospatial Data With KML and Google Earth
NASA Astrophysics Data System (ADS)
Barcay, D. P.; Weiss-Malik, M.
2008-12-01
As virtual globes become widely adopted by the general public, the use of geospatial data has expanded greatly. With the popularization of Google Earth and other platforms, GIS systems have become virtual reality platforms. Using these platforms, a casual user can easily explore the world, browse massive data-sets, create powerful 3D visualizations, and share those visualizations with millions of people using the KML language. This technology has raised the bar for professionals and academics alike. It is now expected that studies and projects will be accompanied by compelling, high-quality visualizations. In this new landscape, a presentation of geospatial data can be the most effective form of advertisement for a project: engaging both the general public and the scientific community in a unified interactive experience. On the other hand, merely dumping a dataset into a virtual globe can be a disorienting, alienating experience for many users. To create an effective, far-reaching presentation, an author must take care to make their data approachable to a wide variety of users with varying knowledge of the subject matter, expertise in virtual globes, and attention spans. To that end, we present techniques for creating self-guided interactive tours of data represented in KML and visualized in Google Earth. Using these methods, we provide the ability to move the camera through the world while dynamically varying the content, style, and visibility of the displayed data. Such tours can automatically guide users through massive, complex datasets: engaging a broad user-base, and conveying subtle concepts that aren't immediately apparent when viewing the raw data. To the casual user these techniques result in an extremely compelling experience similar to watching video. Unlike video though, these techniques maintain the rich interactive environment provided by the virtual globe, allowing users to explore the data in detail and to add other data sources to the presentation.
Curriculum Mapping with Academic Analytics in Medical and Healthcare Education.
Komenda, Martin; Víta, Martin; Vaitsis, Christos; Schwarz, Daniel; Pokorná, Andrea; Zary, Nabil; Dušek, Ladislav
2015-01-01
No universal solution, based on an approved pedagogical approach, exists to parametrically describe, effectively manage, and clearly visualize a higher education institution's curriculum, including tools for unveiling relationships inside curricular datasets. We aim to solve the issue of medical curriculum mapping to improve understanding of the complex structure and content of medical education programs. Our effort is based on the long-term development and implementation of an original web-based platform, which supports an outcomes-based approach to medical and healthcare education and is suitable for repeated updates and adoption to curriculum innovations. We adopted data exploration and visualization approaches in the context of medical curriculum innovations in higher education institutions domain. We have developed a robust platform, covering detailed formal metadata specifications down to the level of learning units, interconnections, and learning outcomes, in accordance with Bloom's taxonomy and direct links to a particular biomedical nomenclature. Furthermore, we used selected modeling techniques and data mining methods to generate academic analytics reports from medical curriculum mapping datasets. We present a solution that allows users to effectively optimize a curriculum structure that is described with appropriate metadata, such as course attributes, learning units and outcomes, a standardized vocabulary nomenclature, and a tree structure of essential terms. We present a case study implementation that includes effective support for curriculum reengineering efforts of academics through a comprehensive overview of the General Medicine study program. Moreover, we introduce deep content analysis of a dataset that was captured with the use of the curriculum mapping platform; this may assist in detecting any potentially problematic areas, and hence it may help to construct a comprehensive overview for the subsequent global in-depth medical curriculum inspection. We have proposed, developed, and implemented an original framework for medical and healthcare curriculum innovations and harmonization, including: planning model, mapping model, and selected academic analytics extracted with the use of data mining.
Curriculum Mapping with Academic Analytics in Medical and Healthcare Education
Komenda, Martin; Víta, Martin; Vaitsis, Christos; Schwarz, Daniel; Pokorná, Andrea; Zary, Nabil; Dušek, Ladislav
2015-01-01
Background No universal solution, based on an approved pedagogical approach, exists to parametrically describe, effectively manage, and clearly visualize a higher education institution’s curriculum, including tools for unveiling relationships inside curricular datasets. Objective We aim to solve the issue of medical curriculum mapping to improve understanding of the complex structure and content of medical education programs. Our effort is based on the long-term development and implementation of an original web-based platform, which supports an outcomes-based approach to medical and healthcare education and is suitable for repeated updates and adoption to curriculum innovations. Methods We adopted data exploration and visualization approaches in the context of medical curriculum innovations in higher education institutions domain. We have developed a robust platform, covering detailed formal metadata specifications down to the level of learning units, interconnections, and learning outcomes, in accordance with Bloom’s taxonomy and direct links to a particular biomedical nomenclature. Furthermore, we used selected modeling techniques and data mining methods to generate academic analytics reports from medical curriculum mapping datasets. Results We present a solution that allows users to effectively optimize a curriculum structure that is described with appropriate metadata, such as course attributes, learning units and outcomes, a standardized vocabulary nomenclature, and a tree structure of essential terms. We present a case study implementation that includes effective support for curriculum reengineering efforts of academics through a comprehensive overview of the General Medicine study program. Moreover, we introduce deep content analysis of a dataset that was captured with the use of the curriculum mapping platform; this may assist in detecting any potentially problematic areas, and hence it may help to construct a comprehensive overview for the subsequent global in-depth medical curriculum inspection. Conclusions We have proposed, developed, and implemented an original framework for medical and healthcare curriculum innovations and harmonization, including: planning model, mapping model, and selected academic analytics extracted with the use of data mining. PMID:26624281
Low-loss compact multilayer silicon nitride platform for 3D photonic integrated circuits.
Shang, Kuanping; Pathak, Shibnath; Guan, Binbin; Liu, Guangyao; Yoo, S J B
2015-08-10
We design, fabricate, and demonstrate a silicon nitride (Si(3)N(4)) multilayer platform optimized for low-loss and compact multilayer photonic integrated circuits. The designed platform, with 200 nm thick waveguide core and 700 nm interlayer gap, is compatible for active thermal tuning and applicable to realizing compact photonic devices such as arrayed waveguide gratings (AWGs). We achieve ultra-low loss vertical couplers with 0.01 dB coupling loss, multilayer crossing loss of 0.167 dB at 90° crossing angle, 50 μm bending radius, 100 × 2 μm(2) footprint, lateral misalignment tolerance up to 400 nm, and less than -52 dB interlayer crosstalk at 1550 nm wavelength. Based on the designed platform, we demonstrate a 27 × 32 × 2 multilayer star coupler.
Xi-cam: a versatile interface for data visualization and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pandolfi, Ronald J.; Allan, Daniel B.; Arenholz, Elke
Xi-cam is an extensible platform for data management, analysis and visualization.Xi-camaims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core ofXi-camis an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. WithXi-cam's `advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data.Xi-cam's plugin-based architecture targetsmore » cross-facility and cross-technique collaborative development, in support of multi-modal analysis.Xi-camis open-source and cross-platform, and available for download on GitHub.« less
Xi-cam: a versatile interface for data visualization and analysis
Pandolfi, Ronald J.; Allan, Daniel B.; Arenholz, Elke; ...
2018-05-31
Xi-cam is an extensible platform for data management, analysis and visualization.Xi-camaims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core ofXi-camis an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. WithXi-cam's `advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data.Xi-cam's plugin-based architecture targetsmore » cross-facility and cross-technique collaborative development, in support of multi-modal analysis.Xi-camis open-source and cross-platform, and available for download on GitHub.« less
Hufnagel, P.; Glandorf, J.; Körting, G.; Jabs, W.; Schweiger-Hufnagel, U.; Hahner, S.; Lubeck, M.; Suckau, D.
2007-01-01
Analysis of complex proteomes often results in long protein lists, but falls short in measuring the validity of identification and quantification results on a greater number of proteins. Biological and technical replicates are mandatory, as is the combination of the MS data from various workflows (gels, 1D-LC, 2D-LC), instruments (TOF/TOF, trap, qTOF or FTMS), and search engines. We describe a database-driven study that combines two workflows, two mass spectrometers, and four search engines with protein identification following a decoy database strategy. The sample was a tryptically digested lysate (10,000 cells) of a human colorectal cancer cell line. Data from two LC-MALDI-TOF/TOF runs and a 2D-LC-ESI-trap run using capillary and nano-LC columns were submitted to the proteomics software platform ProteinScape. The combined MALDI data and the ESI data were searched using Mascot (Matrix Science), Phenyx (GeneBio), ProteinSolver (Bruker and Protagen), and Sequest (Thermo) against a decoy database generated from IPI-human in order to obtain one protein list across all workflows and search engines at a defined maximum false-positive rate of 5%. ProteinScape combined the data to one LC-MALDI and one LC-ESI dataset. The initial separate searches from the two combined datasets generated eight independent peptide lists. These were compiled into an integrated protein list using the ProteinExtractor algorithm. An initial evaluation of the generated data led to the identification of approximately 1200 proteins. Result integration on a peptide level allowed discrimination of protein isoforms that would not have been possible with a mere combination of protein lists.
Analysis of rainfall seasonality from observations and climate models
NASA Astrophysics Data System (ADS)
Pascale, Salvatore; Lucarini, Valerio; Feng, Xue; Porporato, Amilcare; Hasson, Shabeh ul
2015-06-01
Two new indicators of rainfall seasonality based on information entropy, the relative entropy (RE) and the dimensionless seasonality index (DSI), together with the mean annual rainfall, are evaluated on a global scale for recently updated precipitation gridded datasets and for historical simulations from coupled atmosphere-ocean general circulation models. The RE provides a measure of the number of wet months and, for precipitation regimes featuring a distinct wet and dry season, it is directly related to the duration of the wet season. The DSI combines the rainfall intensity with its degree of seasonality and it is an indicator of the extent of the global monsoon region. We show that the RE and the DSI are fairly independent of the time resolution of the precipitation data, thereby allowing objective metrics for model intercomparison and ranking. Regions with different precipitation regimes are classified and characterized in terms of RE and DSI. Comparison of different land observational datasets reveals substantial difference in their local representation of seasonality. It is shown that two-dimensional maps of RE provide an easy way to compare rainfall seasonality from various datasets and to determine areas of interest. Models participating to the Coupled Model Intercomparison Project platform, Phase 5, consistently overestimate the RE over tropical Latin America and underestimate it in West Africa, western Mexico and East Asia. It is demonstrated that positive RE biases in a general circulation model are associated with excessively peaked monthly precipitation fractions, too large during the wet months and too small in the months preceding and following the wet season; negative biases are instead due, in most cases, to an excess of rainfall during the premonsoonal months.
MicroRNA array normalization: an evaluation using a randomized dataset as the benchmark.
Qin, Li-Xuan; Zhou, Qin
2014-01-01
MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays.
MicroRNA Array Normalization: An Evaluation Using a Randomized Dataset as the Benchmark
Qin, Li-Xuan; Zhou, Qin
2014-01-01
MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456
Example MODIS Global Cloud Optical and Microphysical Properties: Comparisons between Terra and Aqua
NASA Technical Reports Server (NTRS)
Hubanks, P. A.; Platnick, S.; King, M. D.; Ackerman, S. A.; Frey, R. A.
2003-01-01
MODIS observations from the NASA EOS Terra spacecraft (launched in December 1999, 1030 local time equatorial crossing) have provided a unique data set of Earth observations. With the launch of the NASA Aqua spacecraft in May 2002 (1330 local time), two MODIS daytime (sunlit) and nighttime observations are now available in a 24 hour period, allowing for some measure of diurnal variability. We report on an initial analysis of several operational global (Level-3) cloud products from the two platforms. The MODIS atmosphere Level-3 products, which include clear-sky and aerosol products in addition to cloud products, are available as three separate files providing daily, eight-day, and monthly aggregations; each temporal aggregation is spatially aggregated to a 1 degree grid. The files contain approximately 600 statisitical datasets (from simple means and standard deviations to 1 - and 2-dimensional histograms). Operational cloud products include detection (cloud fraction), cloud-top properties, and daytimeonly cloud optical thickness and particle effective radius for both water and ice clouds. We will compare example global Terra and Aqua cloud fraction, optical thickness, and effective radius aggregations.
Applications of the LBA-ECO Metadata Warehouse
NASA Astrophysics Data System (ADS)
Wilcox, L.; Morrell, A.; Griffith, P. C.
2006-05-01
The LBA-ECO Project Office has developed a system to harvest and warehouse metadata resulting from the Large-Scale Biosphere Atmosphere Experiment in Amazonia. The harvested metadata is used to create dynamically generated reports, available at www.lbaeco.org, which facilitate access to LBA-ECO datasets. The reports are generated for specific controlled vocabulary terms (such as an investigation team or a geospatial region), and are cross-linked with one another via these terms. This approach creates a rich contextual framework enabling researchers to find datasets relevant to their research. It maximizes data discovery by association and provides a greater understanding of the scientific and social context of each dataset. For example, our website provides a profile (e.g. participants, abstract(s), study sites, and publications) for each LBA-ECO investigation. Linked from each profile is a list of associated registered dataset titles, each of which link to a dataset profile that describes the metadata in a user-friendly way. The dataset profiles are generated from the harvested metadata, and are cross-linked with associated reports via controlled vocabulary terms such as geospatial region. The region name appears on the dataset profile as a hyperlinked term. When researchers click on this link, they find a list of reports relevant to that region, including a list of dataset titles associated with that region. Each dataset title in this list is hyperlinked to its corresponding dataset profile. Moreover, each dataset profile contains hyperlinks to each associated data file at its home data repository and to publications that have used the dataset. We also use the harvested metadata in administrative applications to assist quality assurance efforts. These include processes to check for broken hyperlinks to data files, automated emails that inform our administrators when critical metadata fields are updated, dynamically generated reports of metadata records that link to datasets with questionable file formats, and dynamically generated region/site coordinate quality assurance reports. These applications are as important as those that facilitate access to information because they help ensure a high standard of quality for the information. This presentation will discuss reports currently in use, provide a technical overview of the system, and discuss plans to extend this system to harvest metadata resulting from the North American Carbon Program by drawing on datasets in many different formats, residing in many thematic data centers and also distributed among hundreds of investigators.
Neoproterozoic low- paleolatitude glacial successions on the Yangtze platform, South China
NASA Astrophysics Data System (ADS)
Dobrzinski, N.; Bahlburg, H.; Stauss, H.; Zhang, Q. R.
2003-04-01
Successions of glaciomarine sediments were deposited on the Yangtze platform (South China) during Neoproterozoic time (between c. 748 Ma and 599 Ma), although the platform was situated in low to intermediate paleolatitudes. Our study focuses on sedimentological and geochemical analyses and on paleoclimate interpretation of Sinian glacial successions on the Yangtze platform. This glacial succession comprises a lower glacial unit of diamictites (Dongshanfeng Fm.), followed by a unit of siliciclastic fine-grained and partly cross bedded sediments (Datangpo Fm.) and another unit of glacial diamictites (Nantuo Fm.). The upper diamictite unit is often covered by limestones (cap carbonates) and overlain by black shales and dolomites (Doushantuo Fm.). Geochemical proxies, e.g. the chemical index of alteration (CIA) and V/Cr, help to identify the environmental conditions, which are associated with climate changes. Finegrained siliciclastic sediments between two units of diamictite reflect interglacial conditions documented by sedimentological structures and our geochemical data (CIA values around 70). V/Cr ratios (begin{math}< 2) show oxic conditions during the time of deposition. Carbon isotope data of carbonate samples from the interglacial unit, the cap carbonate and the carbonates of the overlying Doushantuo Formation provide a temporal record of changes in the carbon isotopic composition of Neoproterozoic seawater. Interglacial carbonates display begin{math}δ13 C values between -2.6 and +1.1 per mill. begin{math}δ13C values between -4.8 and -1.9 per mill characterize the cap carbonate level. In the Doushantuo Formation, an evolution of the carbon isotopic composition from -3.3 to +6.5 per mill is discernible. The increase in begin{math}δ13C in the Doushantuo Formation could be due to an increase in the fractional burial of organic carbon. Recent geochemical work suggests that both continents and oceans were completely ice covered in Neoproterozoic time (the "Snowball-Earth" hypothesis). Results of the carbon isotop analysis are in agreement with similar datasets from other Neoproterozoic successions containing a glacial unit followed by carbonates, but the presence of an interglacial unit inspires doubt about the existence of an entirely frozen planet Earth.
Huang, Chao; Yang, Yang; Chen, Xuetong; Wang, Chao; Li, Yan; Zheng, Chunli; Wang, Yonghua
2017-01-01
Veterinary Herbal Medicine (VHM) is a comprehensive, current, and informative discipline on the utilization of herbs in veterinary practice. Driven by chemistry but progressively directed by pharmacology and the clinical sciences, drug research has contributed more to address the needs for innovative veterinary medicine for curing animal diseases. However, research into veterinary medicine of vegetal origin in the pharmaceutical industry has reduced, owing to questions such as the short of compatibility of traditional natural-product extract libraries with high-throughput screening. Here, we present a cross-species chemogenomic screening platform to dissect the genetic basis of multifactorial diseases and to determine the most suitable points of attack for future veterinary medicines, thereby increasing the number of treatment options. First, based on critically examined pharmacology and text mining, we build a cross-species drug-likeness evaluation approach to screen the lead compounds in veterinary medicines. Second, a specific cross-species target prediction model is developed to infer drug-target connections, with the purpose of understanding how drugs work on the specific targets. Third, we focus on exploring the multiple targets interference effects of veterinary medicines by heterogeneous network convergence and modularization analysis. Finally, we manually integrate a disease pathway to test whether the cross-species chemogenomic platform could uncover the active mechanism of veterinary medicine, which is exemplified by a specific network module. We believe the proposed cross-species chemogenomic platform allows for the systematization of current and traditional knowledge of veterinary medicine and, importantly, for the application of this emerging body of knowledge to the development of new drugs for animal diseases.
Huang, Chao; Yang, Yang; Chen, Xuetong; Wang, Chao; Li, Yan; Zheng, Chunli
2017-01-01
Veterinary Herbal Medicine (VHM) is a comprehensive, current, and informative discipline on the utilization of herbs in veterinary practice. Driven by chemistry but progressively directed by pharmacology and the clinical sciences, drug research has contributed more to address the needs for innovative veterinary medicine for curing animal diseases. However, research into veterinary medicine of vegetal origin in the pharmaceutical industry has reduced, owing to questions such as the short of compatibility of traditional natural-product extract libraries with high-throughput screening. Here, we present a cross-species chemogenomic screening platform to dissect the genetic basis of multifactorial diseases and to determine the most suitable points of attack for future veterinary medicines, thereby increasing the number of treatment options. First, based on critically examined pharmacology and text mining, we build a cross-species drug-likeness evaluation approach to screen the lead compounds in veterinary medicines. Second, a specific cross-species target prediction model is developed to infer drug-target connections, with the purpose of understanding how drugs work on the specific targets. Third, we focus on exploring the multiple targets interference effects of veterinary medicines by heterogeneous network convergence and modularization analysis. Finally, we manually integrate a disease pathway to test whether the cross-species chemogenomic platform could uncover the active mechanism of veterinary medicine, which is exemplified by a specific network module. We believe the proposed cross-species chemogenomic platform allows for the systematization of current and traditional knowledge of veterinary medicine and, importantly, for the application of this emerging body of knowledge to the development of new drugs for animal diseases. PMID:28915268
Luo, Jake; Apperson-Hansen, Carolyn; Pelfrey, Clara M; Zhang, Guo-Qiang
2014-11-30
Cross-institutional cross-disciplinary collaboration has become a trend as researchers move toward building more productive and innovative teams for scientific research. Research collaboration is significantly changing the organizational structure and strategies used in the clinical and translational science domain. However, due to the obstacles of diverse administrative structures, differences in area of expertise, and communication barriers, establishing and managing a cross-institutional research project is still a challenging task. We address these challenges by creating an integrated informatics platform to reduce the barriers to biomedical research collaboration. The Request Management System (RMS) is an informatics infrastructure designed to transform a patchwork of expertise and resources into an integrated support network. The RMS facilitates investigators' initiation of new collaborative projects and supports the management of the collaboration process. In RMS, experts and their knowledge areas are categorized and managed structurally to provide consistent service. A role-based collaborative workflow is tightly integrated with domain experts and services to streamline and monitor the life-cycle of a research project. The RMS has so far tracked over 1,500 investigators with over 4,800 tasks. The research network based on the data collected in RMS illustrated that the investigators' collaborative projects increased close to 3 times from 2009 to 2012. Our experience with RMS indicates that the platform reduces barriers for cross-institutional collaboration of biomedical research projects. Building a new generation of infrastructure to enhance cross-disciplinary and multi-institutional collaboration has become an important yet challenging task. In this paper, we share the experience of developing and utilizing a collaborative project management system. The results of this study demonstrate that a web-based integrated informatics platform can facilitate and increase research interactions among investigators.
Optimizing disk registration algorithms for nanobeam electron diffraction strain mapping
Pekin, Thomas C.; Gammer, Christoph; Ciston, Jim; ...
2017-01-28
Scanning nanobeam electron diffraction strain mapping is a technique by which the positions of diffracted disks sampled at the nanoscale over a crystalline sample can be used to reconstruct a strain map over a large area. However, it is important that the disk positions are measured accurately, as their positions relative to a reference are directly used to calculate strain. Here in this study, we compare several correlation methods using both simulated and experimental data in order to directly probe susceptibility to measurement error due to non-uniform diffracted disk illumination structure. We found that prefiltering the diffraction patterns with amore » Sobel filter before performing cross correlation or performing a square-root magnitude weighted phase correlation returned the best results when inner disk structure was present. Lastly, we have tested these methods both on simulated datasets, and experimental data from unstrained silicon as well as a twin grain boundary in 304 stainless steel.« less
Task decomposition for a multilimbed robot to work in reachable but unorientable space
NASA Technical Reports Server (NTRS)
Su, Chau; Zheng, Yuan F.
1991-01-01
Robot manipulators installed on legged mobile platforms are suggested for enlarging robot workspace. To plan the motion of such a system, the arm-platform motion coordination problem is raised, and a task decomposition is proposed to solve the problem. A given task described by the destination position and orientation of the end effector is decomposed into subtasks for arm manipulation and for platform configuration, respectively. The former is defined as the end-effector position and orientation with respect to the platform, and the latter as the platform position and orientation in the base coordinates. Three approaches are proposed for the task decomposition. The approaches are also evaluated in terms of the displacements, from which an optimal approach can be selected.
Cross-Platform Learning: On the Nature of Children's Learning from Multiple Media Platforms
ERIC Educational Resources Information Center
Fisch, Shalom M.
2013-01-01
It is increasingly common for an educational media project to span several media platforms (e.g., TV, Web, hands-on materials), assuming that the benefits of learning from multiple media extend beyond those gained from one medium alone. Yet research typically has investigated learning from a single medium in isolation. This paper reviews several…
Evaluation of nine popular de novo assemblers in microbial genome assembly.
Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher
2017-12-01
Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.
Image Harvest: an open-source platform for high-throughput plant image processing and analysis.
Knecht, Avi C; Campbell, Malachy T; Caprez, Adam; Swanson, David R; Walia, Harkamal
2016-05-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable for processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is an open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyzephenomics datasets. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Huang, Zhenzhen; Duan, Huilong; Li, Haomin
2015-01-01
Large-scale human cancer genomics projects, such as TCGA, generated large genomics data for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomics alterations that intervene the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which used statistics methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristic information from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function, cellular component and survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.
Bottini, Silvia; Hamouda-Tekaya, Nedra; Tanasa, Bogdan; Zaragosi, Laure-Emmanuelle; Grandjean, Valerie; Repetto, Emanuela; Trabucchi, Michele
2017-05-19
Experimental evidence indicates that about 60% of miRNA-binding activity does not follow the canonical rule about the seed matching between miRNA and target mRNAs, but rather a non-canonical miRNA targeting activity outside the seed or with a seed-like motifs. Here, we propose a new unbiased method to identify canonical and non-canonical miRNA-binding sites from peaks identified by Ago2 Cross-Linked ImmunoPrecipitation associated to high-throughput sequencing (CLIP-seq). Since the quality of peaks is of pivotal importance for the final output of the proposed method, we provide a comprehensive benchmarking of four peak detection programs, namely CIMS, PIPE-CLIP, Piranha and Pyicoclip, on four publicly available Ago2-HITS-CLIP datasets and one unpublished in-house Ago2-dataset in stem cells. We measured the sensitivity, the specificity and the position accuracy toward miRNA binding sites identification, and the agreement with TargetScan. Secondly, we developed a new pipeline, called miRBShunter, to identify canonical and non-canonical miRNA-binding sites based on de novo motif identification from Ago2 peaks and prediction of miRNA::RNA heteroduplexes. miRBShunter was tested and experimentally validated on the in-house Ago2-dataset and on an Ago2-PAR-CLIP dataset in human stem cells. Overall, we provide guidelines to choose a suitable peak detection program and a new method for miRNA-target identification. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Bottini, Silvia; Hamouda-Tekaya, Nedra; Tanasa, Bogdan; Zaragosi, Laure-Emmanuelle; Grandjean, Valerie; Repetto, Emanuela
2017-01-01
Abstract Experimental evidence indicates that about 60% of miRNA-binding activity does not follow the canonical rule about the seed matching between miRNA and target mRNAs, but rather a non-canonical miRNA targeting activity outside the seed or with a seed-like motifs. Here, we propose a new unbiased method to identify canonical and non-canonical miRNA-binding sites from peaks identified by Ago2 Cross-Linked ImmunoPrecipitation associated to high-throughput sequencing (CLIP-seq). Since the quality of peaks is of pivotal importance for the final output of the proposed method, we provide a comprehensive benchmarking of four peak detection programs, namely CIMS, PIPE-CLIP, Piranha and Pyicoclip, on four publicly available Ago2-HITS-CLIP datasets and one unpublished in-house Ago2-dataset in stem cells. We measured the sensitivity, the specificity and the position accuracy toward miRNA binding sites identification, and the agreement with TargetScan. Secondly, we developed a new pipeline, called miRBShunter, to identify canonical and non-canonical miRNA-binding sites based on de novo motif identification from Ago2 peaks and prediction of miRNA::RNA heteroduplexes. miRBShunter was tested and experimentally validated on the in-house Ago2-dataset and on an Ago2-PAR-CLIP dataset in human stem cells. Overall, we provide guidelines to choose a suitable peak detection program and a new method for miRNA-target identification. PMID:28108660
Prediction and Identification of Krüppel-Like Transcription Factors by Machine Learning Method.
Liao, Zhijun; Wang, Xinrui; Chen, Xingyong; Zou, Quan
2017-01-01
The Krüppel-like factors (KLFs) are a family of containing Zn finger(ZF) motif transcription factors with 18 members in human genome, among them, KLF18 is predicted by bioinformatics. KLFs possess various physiological function involving in a number of cancers and other diseases. Here we perform a binary-class classification of KLFs and non-KLFs by machine learning methods. The protein sequences of KLFs and non-KLFs were searched from UniProt and randomly separate them into training dataset(containing positive and negative sequences) and test dataset(containing only negative sequences), after extracting the 188-dimensional(188D) feature vectors we carry out category with four classifiers(GBDT, libSVM, RF, and k-NN). On the human KLFs, we further dig into the evolutionary relationship and motif distribution, and finally we analyze the conserved amino acid residue of three zinc fingers. The classifier model from training dataset were well constructed, and the highest specificity(Sp) was 99.83% from a library for support vector machine(libSVM) and all the correctly classified rates were over 70% for 10-fold cross-validation on test dataset. The 18 human KLFs can be further divided into 7 groups and the zinc finger domains were located at the carboxyl terminus, and many conserved amino acid residues including Cysteine and Histidine, and the span and interval between them were consistent in the three ZF domains. Two classification models for KLFs prediction have been built by novel machine learning methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
e-Science platform for translational biomedical imaging research: running, statistics, and analysis
NASA Astrophysics Data System (ADS)
Wang, Tusheng; Yang, Yuanyuan; Zhang, Kai; Wang, Mingqing; Zhao, Jun; Xu, Lisa; Zhang, Jianguo
2015-03-01
In order to enable multiple disciplines of medical researchers, clinical physicians and biomedical engineers working together in a secured, efficient, and transparent cooperative environment, we had designed an e-Science platform for biomedical imaging research and application cross multiple academic institutions and hospitals in Shanghai and presented this work in SPIE Medical Imaging conference held in San Diego in 2012. In past the two-years, we implemented a biomedical image chain including communication, storage, cooperation and computing based on this e-Science platform. In this presentation, we presented the operating status of this system in supporting biomedical imaging research, analyzed and discussed results of this system in supporting multi-disciplines collaboration cross-multiple institutions.
Cross-validation pitfalls when selecting and assessing regression and classification models.
Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon
2014-03-29
We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
Herrinton, Lisa J; Liu, Liyan; Altschuler, Andrea; Dell, Richard; Rabrenovich, Violeta; Compton-Phillips, Amy L
2015-01-01
The cost to build and to maintain traditional registries for many dire, complex, low-frequency conditions is prohibitive. The authors used accessible technology to develop a platform that would generate miniregistries (small, routinely updated datasets) for surveillance, to identify patients who were missing elected utilization and to influence clinicians to change practices to improve care. The platform, tested in 5 medical specialty departments, enabled the specialists to rapidly and effectively communicate clinical questions, knowledge of disease, clinical workflows, and improve opportunities. Each miniregistry required 1 to 2 hours of collaboration by a specialist. Turnaround was 1 to 14 days.
Validating silicon polytrodes with paired juxtacellular recordings: method and dataset.
Neto, Joana P; Lopes, Gonçalo; Frazão, João; Nogueira, Joana; Lacerda, Pedro; Baião, Pedro; Aarts, Arno; Andrei, Alexandru; Musa, Silke; Fortunato, Elvira; Barquinha, Pedro; Kampff, Adam R
2016-08-01
Cross-validating new methods for recording neural activity is necessary to accurately interpret and compare the signals they measure. Here we describe a procedure for precisely aligning two probes for in vivo "paired-recordings" such that the spiking activity of a single neuron is monitored with both a dense extracellular silicon polytrode and a juxtacellular micropipette. Our new method allows for efficient, reliable, and automated guidance of both probes to the same neural structure with micrometer resolution. We also describe a new dataset of paired-recordings, which is available online. We propose that our novel targeting system, and ever expanding cross-validation dataset, will be vital to the development of new algorithms for automatically detecting/sorting single-units, characterizing new electrode materials/designs, and resolving nagging questions regarding the origin and nature of extracellular neural signals. Copyright © 2016 the American Physiological Society.
Bias correction for selecting the minimal-error classifier from many machine learning models.
Ding, Ying; Tang, Shaowu; Liao, Serena G; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C
2014-11-15
Supervised machine learning is commonly applied in genomic research to construct a classifier from the training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists in general. It has been a common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30-60) risk reporting a falsely small cross-validation error rate that could not be validated later in independent cohorts. In this article, we illustrated the probabilistic framework of the problem and explored the statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and Tibshirani-Tibshirani procedure. All methods were compared in simulation datasets, five moderate size real datasets and two large breast cancer datasets. The result showed that IPL outperforms the other methods in bias correction with smaller variance, and it has an additional advantage to extrapolate error estimates for larger sample sizes, a practical feature to recommend whether more samples should be recruited to improve the classifier and accuracy. An R package 'MLbias' and all source files are publicly available. tsenglab.biostat.pitt.edu/software.htm. ctseng@pitt.edu Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.
Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio
2018-01-01
The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers
Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio
2018-01-01
The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556
Bayesian Network Webserver: a comprehensive tool for biological network modeling.
Ziebarth, Jesse D; Bhattacharya, Anindya; Cui, Yan
2013-11-01
The Bayesian Network Webserver (BNW) is a platform for comprehensive network modeling of systems genetics and other biological datasets. It allows users to quickly and seamlessly upload a dataset, learn the structure of the network model that best explains the data and use the model to understand relationships between network variables. Many datasets, including those used to create genetic network models, contain both discrete (e.g. genotype) and continuous (e.g. gene expression traits) variables, and BNW allows for modeling hybrid datasets. Users of BNW can incorporate prior knowledge during structure learning through an easy-to-use structural constraint interface. After structure learning, users are immediately presented with an interactive network model, which can be used to make testable hypotheses about network relationships. BNW, including a downloadable structure learning package, is available at http://compbio.uthsc.edu/BNW. (The BNW interface for adding structural constraints uses HTML5 features that are not supported by current version of Internet Explorer. We suggest using other browsers (e.g. Google Chrome or Mozilla Firefox) when accessing BNW). ycui2@uthsc.edu. Supplementary data are available at Bioinformatics online.
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and to provide easy reproducibility by making the datasets and computational methods easily available.
Milešević, Jelena; Samaniego, Lourdes; Kiely, Mairead; Glibetić, Maria; Roe, Mark; Finglas, Paul
2018-02-01
A review of national nutrition surveys from 2000 to date, demonstrated high prevalence of vitamin D intakes below the EFSA Adequate Intake (AI) (<15μg/d vitamin D) in adults across Europe. Dietary assessment and modelling are required to monitor efficacy and safety of ongoing strategic vitamin D fortification. To support these studies, a specialized vitamin D food composition dataset, based on EuroFIR standards, was compiled. The FoodEXplorer™ tool was used to retrieve well documented analytical data for vitamin D and arrange the data into two datasets - European (8 European countries, 981 data values) and US (1836 data values). Data were classified, using the LanguaL™, FoodEX2 and ODIN classification systems and ranked according to quality criteria. Significant differences in the content, quality of data values, missing data on vitamin D 2 and 25(OH)D 3 and documentation of analytical methods were observed. The dataset is available through the EuroFIR platform. Copyright © 2017 Elsevier Ltd. All rights reserved.
2014-01-01
Background Numerous inflammation-related pathways have been shown to play important roles in atherogenesis. Rapid and efficient assessment of the relative influence of each of those pathways is a challenge in the era of “omics” data generation. The aim of the present work was to develop a network model of inflammation-related molecular pathways underlying vascular disease to assess the degree of translatability of preclinical molecular data to the human clinical setting. Methods We constructed and evaluated the Vascular Inflammatory Processes Network (V-IPN), a model representing a collection of vascular processes modulated by inflammatory stimuli that lead to the development of atherosclerosis. Results Utilizing the V-IPN as a platform for biological discovery, we have identified key vascular processes and mechanisms captured by gene expression profiling data from four independent datasets from human endothelial cells (ECs) and human and murine intact vessels. Primary ECs in culture from multiple donors revealed a richer mapping of mechanisms identified by the V-IPN compared to an immortalized EC line. Furthermore, an evaluation of gene expression datasets from aortas of old ApoE-/- mice (78 weeks) and human coronary arteries with advanced atherosclerotic lesions identified significant commonalities in the two species, as well as several mechanisms specific to human arteries that are consistent with the development of unstable atherosclerotic plaques. Conclusions We have generated a new biological network model of atherogenic processes that demonstrates the power of network analysis to advance integrative, systems biology-based knowledge of cross-species translatability, plaque development and potential mechanisms leading to plaque instability. PMID:24965703
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming
2016-01-01
We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
The impact of presumed consent legislation on cadaveric organ donation: a cross-country study.
Abadie, Alberto; Gay, Sebastien
2006-07-01
In the U.S., Great Britain and in many other countries, the gap between the demand and the supply of human organs for transplantation is on the rise, despite the efforts of governments and health agencies to promote donor registration. In some countries of continental Europe, however, cadaveric organ procurement is based on the principle of presumed consent. Under presumed consent legislation, a deceased individual is classified as a potential donor in absence of explicit opposition to donation before death. This article analyzes the impact of presumed consent laws on donation rates. For this purpose, we construct a dataset on organ donation rates and potential factors affecting organ donation for 22 countries over a 10-year period. We find that while differences in other determinants of organ donation explain much of the variation in donation rates, after controlling for those determinants presumed consent legislation has a positive and sizeable effect on organ donation rates. We use the panel structure of our dataset to test and reject the hypothesis that unmeasured determinants of organ donation rates confound our empirical results.
Inner Core Tomography Under Africa
NASA Astrophysics Data System (ADS)
Irving, J. C. E.
2014-12-01
Hemispherical structure in the inner core has been observed using both normal mode and body wave data, but the more regional scale properties of the inner core are still the subject of ongoing debate. The nature of the vertical boundary regions between the eastern and western hemispheres will be an important constraint on dynamical processes at work in the inner core. With limited data available, earlier inner core studies defined each boundary using one line of longitude, but this may not be a sufficient description for what could be one of the inner core's most heterogeneous regions. Here, I present a large, hand-picked dataset of PKPbc-PKPdf differential travel times which sample the inner core under Africa, where the proposed position of one hemisphere boundary is located. The dataset contains polar, intermediate and equatorial rays through the inner core, and the presence of crossing raypaths makes regional-scale tomography of the inner core feasible. I invert the data to find regional variations in inner core anisotropy under different parts of Africa, and present both anisotropy and voigt isotropic velocity variations of this important portion of the inner core.
Glover, Jason; Man, Tsz-Kwong; Barkauskas, Donald A; Hall, David; Tello, Tanya; Sullivan, Mary Beth; Gorlick, Richard; Janeway, Katherine; Grier, Holcombe; Lau, Ching; Toretsky, Jeffrey A; Borinstein, Scott C; Khanna, Chand; Fan, Timothy M
2017-01-01
The prospective banking of osteosarcoma tissue samples to promote research endeavors has been realized through the establishment of a nationally centralized biospecimen repository, the Children's Oncology Group (COG) biospecimen bank located at the Biopathology Center (BPC)/Nationwide Children's Hospital in Columbus, Ohio. Although the physical inventory of osteosarcoma biospecimens is substantive (>15,000 sample specimens), the nature of these resources remains exhaustible. Despite judicious allocation of these high-value biospecimens for conducting sarcoma-related research, a deeper understanding of osteosarcoma biology, in particular metastases, remains unrealized. In addition the identification and development of novel diagnostics and effective therapeutics remain elusive. The QuadW-COG Childhood Sarcoma Biostatistics and Annotation Office (CSBAO) has developed the High Dimensional Data (HDD) platform to complement the existing physical inventory and to promote in silico hypothesis testing in sarcoma biology. The HDD is a relational biologic database derived from matched osteosarcoma biospecimens in which diverse experimental readouts have been generated and digitally deposited. As proof-of-concept, we demonstrate that the HDD platform can be utilized to address previously unrealized biologic questions though the systematic juxtaposition of diverse datasets derived from shared biospecimens. The continued population of the HDD platform with high-value, high-throughput and mineable datasets allows a shared and reusable resource for researchers, both experimentalists and bioinformatics investigators, to propose and answer questions in silico that advance our understanding of osteosarcoma biology.
VASIR: An Open-Source Research Platform for Advanced Iris Recognition Technologies.
Lee, Yooyoung; Micheals, Ross J; Filliben, James J; Phillips, P Jonathon
2013-01-01
The performance of iris recognition systems is frequently affected by input image quality, which in turn is vulnerable to less-than-optimal conditions due to illuminations, environments, and subject characteristics (e.g., distance, movement, face/body visibility, blinking, etc.). VASIR (Video-based Automatic System for Iris Recognition) is a state-of-the-art NIST-developed iris recognition software platform designed to systematically address these vulnerabilities. We developed VASIR as a research tool that will not only provide a reference (to assess the relative performance of alternative algorithms) for the biometrics community, but will also advance (via this new emerging iris recognition paradigm) NIST's measurement mission. VASIR is designed to accommodate both ideal (e.g., classical still images) and less-than-ideal images (e.g., face-visible videos). VASIR has three primary modules: 1) Image Acquisition 2) Video Processing, and 3) Iris Recognition. Each module consists of several sub-components that have been optimized by use of rigorous orthogonal experiment design and analysis techniques. We evaluated VASIR performance using the MBGC (Multiple Biometric Grand Challenge) NIR (Near-Infrared) face-visible video dataset and the ICE (Iris Challenge Evaluation) 2005 still-based dataset. The results showed that even though VASIR was primarily developed and optimized for the less-constrained video case, it still achieved high verification rates for the traditional still-image case. For this reason, VASIR may be used as an effective baseline for the biometrics community to evaluate their algorithm performance, and thus serves as a valuable research platform.
VASIR: An Open-Source Research Platform for Advanced Iris Recognition Technologies
Lee, Yooyoung; Micheals, Ross J; Filliben, James J; Phillips, P Jonathon
2013-01-01
The performance of iris recognition systems is frequently affected by input image quality, which in turn is vulnerable to less-than-optimal conditions due to illuminations, environments, and subject characteristics (e.g., distance, movement, face/body visibility, blinking, etc.). VASIR (Video-based Automatic System for Iris Recognition) is a state-of-the-art NIST-developed iris recognition software platform designed to systematically address these vulnerabilities. We developed VASIR as a research tool that will not only provide a reference (to assess the relative performance of alternative algorithms) for the biometrics community, but will also advance (via this new emerging iris recognition paradigm) NIST’s measurement mission. VASIR is designed to accommodate both ideal (e.g., classical still images) and less-than-ideal images (e.g., face-visible videos). VASIR has three primary modules: 1) Image Acquisition 2) Video Processing, and 3) Iris Recognition. Each module consists of several sub-components that have been optimized by use of rigorous orthogonal experiment design and analysis techniques. We evaluated VASIR performance using the MBGC (Multiple Biometric Grand Challenge) NIR (Near-Infrared) face-visible video dataset and the ICE (Iris Challenge Evaluation) 2005 still-based dataset. The results showed that even though VASIR was primarily developed and optimized for the less-constrained video case, it still achieved high verification rates for the traditional still-image case. For this reason, VASIR may be used as an effective baseline for the biometrics community to evaluate their algorithm performance, and thus serves as a valuable research platform. PMID:26401431
Inertial particle focusing in serpentine channels on a centrifugal platform
NASA Astrophysics Data System (ADS)
Shamloo, Amir; Mashhadian, Ali
2018-01-01
Inertial particle focusing as a powerful passive method is widely used in diagnostic test devices. It is common to use a curved channel in this approach to achieve particle focusing through balancing of the secondary flow drag force and the inertial lift force. Here, we present a focusing device on a disk based on the interaction of secondary flow drag force, inertial lift force, and centrifugal forces to focus particles. By choosing a channel whose cross section has a low aspect ratio, the mixing effect of the secondary flow becomes negligible. To calculate inertial lift force, which is exerted on the particle from the fluid, the interaction between the fluid and particle is investigated accurately through implementation of 3D Direct Numerical Solution (DNS) method. The particle focusing in three serpentine channels with different corner angles of 75°, 85°, and 90° is investigated for three polystyrene particles with diameters of 8 μm, 9.9 μm, and 13 μm. To show the simulation reliability, the results obtained from the simulations of two examples, namely, particle focusing and centrifugal platform, are verified against experimental counterparts. The effects of angular velocity of disk on the fluid velocity and on the focusing parameters are studied. Fluid velocity in a channel with corner angle of 75° is greater than two other channels. Furthermore, the particle equilibrium positions at the cross section of channel are obtained at the outlet. There are two equilibrium positions located at the centers of the long walls. Finally, the effect of particle density on the focusing length is investigated. A particle with a higher density and larger diameter is focused in a shorter length of the channel compared to its counterpart with a lower density and shorter diameter. The channel with a corner angle of 90° has better focusing efficiency compared to other channels. This design focuses particles without using any pump or sheath flow. Inertial particle focusing on centrifugal platform, which rarely has been studied, can be used for a wide range of diagnostic lab-on-a-disk device.
Ahern, Thomas P; Beck, Andrew H; Rosner, Bernard A; Glass, Ben; Frieling, Gretchen; Collins, Laura C; Tamimi, Rulla M
2017-05-01
Computational pathology platforms incorporate digital microscopy with sophisticated image analysis to permit rapid, continuous measurement of protein expression. We compared two computational pathology platforms on their measurement of breast tumour oestrogen receptor (ER) and progesterone receptor (PR) expression. Breast tumour microarrays from the Nurses' Health Study were stained for ER (n=592) and PR (n=187). One expert pathologist scored cases as positive if ≥1% of tumour nuclei exhibited stain. ER and PR were then measured with the Definiens Tissue Studio (automated) and Aperio Digital Pathology (user-supervised) platforms. Platform-specific measurements were compared using boxplots, scatter plots and correlation statistics. Classification of ER and PR positivity by platform-specific measurements was evaluated with areas under receiver operating characteristic curves (AUC) from univariable logistic regression models, using expert pathologist classification as the standard. Both platforms showed considerable overlap in continuous measurements of ER and PR between positive and negative groups classified by expert pathologist. Platform-specific measurements were strongly and positively correlated with one another (r≥0.77). The user-supervised Aperio workflow performed slightly better than the automated Definiens workflow at classifying ER positivity (AUC Aperio =0.97; AUC Definiens =0.90; difference=0.07, 95% CI 0.05 to 0.09) and PR positivity (AUC Aperio =0.94; AUC Definiens =0.87; difference=0.07, 95% CI 0.03 to 0.12). Paired hormone receptor expression measurements from two different computational pathology platforms agreed well with one another. The user-supervised workflow yielded better classification accuracy than the automated workflow. Appropriately validated computational pathology algorithms enrich molecular epidemiology studies with continuous protein expression data and may accelerate tumour biomarker discovery. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
A remark on copy number variation detection methods.
Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin
2018-01-01
Copy number variations (CNVs) are gain and loss of DNA sequence of a genome. High throughput platforms such as microarrays and next generation sequencing technologies (NGS) have been applied for genome wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis on copy number losses on 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from Hapmap Project and 1000 Genomes Project respectively. We show that the copy number losses reported from Hapmap Project and 1000 Genome Project only have < 30% overlap, while these reports are required to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental supporting by their corresponding projects, even though state-of-art calling methods were employed. On the other hand, copy number losses are found directly from HapMap microarray data by an accurate algorithm, i.e. CNVhac, almost all of which have lower read mapping depth in NGS data; furthermore, 88% of which can be supported by the sequences with breakpoint in NGS data. Our results suggest the ability of microarray calling CNVs and the possible introduction of false negatives from the unessential requirement of the additional cross-platform supporting. The inconsistency of CNV reports from Hapmap Project and 1000 Genomes Project might result from the inadequate information containing in microarray data, the inconsistent detection criteria, or the filtration effect of cross-platform supporting. The statistical test on CNVs called from CNVhac show that the microarray data can offer reliable CNV reports, and majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform supporting, so additional experimental information should be applied in need instead of necessarily.
A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth
2005-03-15
The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scalemore » long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front end interfaces provides tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.« less
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu
2016-10-01
Predicting protein-protein interactions (PPIs) is a challenging task and essential to construct the protein interaction networks, which is important for facilitating our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to predict PPIs, there are unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPIs detection from protein sequences. The major improvement includes (1) Protein sequences are represented using the Bi-gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) For reducing the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of BiGP vector; (3) The powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments executed on yeast and Helicobacter pylori datasets, which achieved very high accuracies of 94.57 and 90.57%, respectively. Experimental results are significantly better than previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on imbalance yeast dataset, which is higher than that of balance yeast dataset. The promising experimental results show the efficiency and robust of the proposed method, which can be an automatic decision support tool for future proteomics research. For facilitating extensive studies for future proteomics research, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/BiGP/. © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
NASA Astrophysics Data System (ADS)
Willmes, M.; McMorrow, L.; Kinsley, L.; Armstrong, R.; Aubert, M.; Eggins, S.; Falguères, C.; Maureille, B.; Moffat, I.; Grün, R.
2013-11-01
Strontium isotope ratios (87Sr/86Sr) are a key geochemical tracer used in a wide range of fields including archaeology, ecology, food and forensic sciences. These applications are based on the principle that the Sr isotopic ratios of natural materials reflect the sources of strontium available during their formation. A major constraint for current studies is the lack of robust reference maps to evaluate the source of strontium isotope ratios measured in the samples. Here we provide a new dataset of bioavailable Sr isotope ratios for the major geologic units of France, based on plant and soil samples (Pangaea data repository doi:10.1594/PANGAEA.819142). The IRHUM (Isotopic Reconstruction of Human Migration) database is a web platform to access, explore and map our dataset. The database provides the spatial context and metadata for each sample, allowing the user to evaluate the suitability of the sample for their specific study. In addition, it allows users to upload and share their own datasets and data products, which will enhance collaboration across the different research fields. This article describes the sampling and analytical methods used to generate the dataset and how to use and access of the dataset through the IRHUM database. Any interpretation of the isotope dataset is outside the scope of this publication.
Convergence of Educational Attainment Levels in the OECD: More Data, More Problems?
ERIC Educational Resources Information Center
Crespo Cuaresma, Jesus
2006-01-01
This note shows that the dynamics of the dispersion of educational attainment across OECD countries in the period 1960-1990 differ enormously depending on the dataset used, as do the results of the test of significance in the change of the cross-country standard deviation of schooling years between subperiods. The three datasets studied (the…
Strength and Power Qualities Are Highly Associated With Punching Impact in Elite Amateur Boxers.
Loturco, Irineu; Nakamura, Fabio Y; Artioli, Guilherme G; Kobal, Ronaldo; Kitamura, Katia; Cal Abad, Cesar C; Cruz, Igor F; Romano, Felipe; Pereira, Lucas A; Franchini, Emerson
2016-01-01
This study investigated the relationship between punching impact and selected strength and power variables in 15 amateur boxers from the Brazilian National Team (9 men and 6 women). Punching impact was assessed in the following conditions: 3 jabs starting from the standardized position, 3 crosses starting from the standardized position, 3 jabs starting from a self-selected position, and 3 crosses starting from a self-selected position. For punching tests, a force platform (1.02 × 0.76 m) covered by a body shield was mounted on the wall at a height of 1 m, perpendicular to the floor. The selected strength and power variables were vertical jump height (in squat jump and countermovement jump), mean propulsive power in the jump squat, bench press (BP), and bench throw, maximum isometric force in squat and BP, and rate of force development in the squat and BP. Sex and position main effects were observed, with higher impact for males compared with females (p ≤ 0.05) and the self-selected distance resulting in higher impact in the jab technique compared with the fixed distance (p ≤ 0.05). Finally, the correlations between strength/power variables and punching impact indices ranged between 0.67 and 0.85. Because of the strong associations between punching impact and strength/power variables (e.g., lower limb muscle power), this study provides important information for coaches to specifically design better training strategies to improve punching impact.
A wavelet analysis of co-movements in Asian gold markets
NASA Astrophysics Data System (ADS)
Das, Debojyoti; Kannadhasan, M.; Al-Yahyaee, Khamis Hamed; Yoon, Seong-Min
2018-02-01
This study assesses the cross-country co-movements of gold spot returns among the major gold consuming countries in Asia using wavelet-based analysis for a dataset spanning over 26 years. Wavelet-based analysis is used since it allows measuring co-movements in a time-frequency space. The results suggest intense and positive co-movements in Asia after the Asian financial crisis of 1997 at all frequencies. In addition, the Asian gold spot markets depict a state of impending perfect market integration. Finally, Thailand emerges as the potential market leader in all wavelet scales except one, which is led by India. The study has important implications for international diversification of a single-asset (gold) portfolio.
Navigation and Positioning System Using High Altitude Platforms Systems (HAPS)
NASA Astrophysics Data System (ADS)
Tsujii, Toshiaki; Harigae, Masatoshi; Harada, Masashi
Recently, some countries have begun conducting feasibility studies and R&D projects on High Altitude Platform Systems (HAPS). Japan has been investigating the use of an airship system that will function as a stratospheric platform for applications such as environmental monitoring, communications and broadcasting. If pseudolites were mounted on the airships, their GPS-like signals would be stable augmentations that would improve the accuracy, availability, and integrity of GPS-based positioning systems. Also, the sufficient number of HAPS can function as a positioning system independent of GPS. In this paper, a system design of the HAPS-based positioning system and its positioning error analyses are described.
Ling, Zhi-Qiang; Wang, Yi; Mukaisho, Kenichi; Hattori, Takanori; Tatsuta, Takeshi; Ge, Ming-Hua; Jin, Li; Mao, Wei-Min; Sugihara, Hiroyuki
2010-06-01
Tests of differentially expressed genes (DEGs) from microarray experiments are based on the null hypothesis that genes that are irrelevant to the phenotype/stimulus are expressed equally in the target and control samples. However, this strict hypothesis is not always true, as there can be several transcriptomic background differences between target and control samples, including different cell/tissue types, different cell cycle stages and different biological donors. These differences lead to increased false positives, which have little biological/medical significance. In this article, we propose a statistical framework to identify DEGs between target and control samples from expression microarray data allowing transcriptomic background differences between these samples by introducing a modified null hypothesis that the gene expression background difference is normally distributed. We use an iterative procedure to perform robust estimation of the null hypothesis and identify DEGs as outliers. We evaluated our method using our own triplicate microarray experiment, followed by validations with reverse transcription-polymerase chain reaction (RT-PCR) and on the MicroArray Quality Control dataset. The evaluations suggest that our technique (i) results in less false positive and false negative results, as measured by the degree of agreement with RT-PCR of the same samples, (ii) can be applied to different microarray platforms and results in better reproducibility as measured by the degree of DEG identification concordance both intra- and inter-platforms and (iii) can be applied efficiently with only a few microarray replicates. Based on these evaluations, we propose that this method not only identifies more reliable and biologically/medically significant DEG, but also reduces the power-cost tradeoff problem in the microarray field. Source code and binaries freely available for download at http://comonca.org.cn/fdca/resources/softwares/deg.zip.
Slope climbing challenges, fear of heights, anxiety and time of the day.
Ennaceur, A; Hussain, M D; Abuhamdah, R M; Mostafa, R M; Chazot, P L
2017-01-01
When exposed to an unfamiliar open space, animals experience fear and attempt to find an escape route. Anxiety emerges when animals are confronted with a challenging obstacle to this fear motivated escape. High anxiety animals do not take risks; they avoid the challenge. The present experiments investigated this risk avoidant behavior in mice. In experiment 1, BALB/c, C57BL/6J and CD-1 mice were exposed to a large platform with downward inclined steep slopes attached on two opposite sides. The platform was elevated 75 and 100cm from the ground, in a standard (SPDS) and in a raised (RPDS) configuration, respectively. In experiment 2, the platform was elevated 75cm from the ground. Mice had to climb onto a stand at the top of upward inclined slopes (SPUS). In experiment 3, BALB/c mice were exposed to SPDS with steep or shallow slopes either in early morning or in late afternoon. In all 3 test configurations, mice spent more time in the areas adjacent to the slopes than in the areas adjacent to void, however only C57BL/6J and CD-1 crossed onto the slopes in SPDS, and crossed onto the stands in SPUS whereas BALB/c remained on the platform in SPDS and explored the slopes in SPUS. Elevation of the platform from the ground reduced the crossings onto the slopes in C57BL/6J and CD-1, and no differences were observed between BALB/c and C57BL/6J. BALB/c mice demonstrated no difference in anxiety when tested early morning or late afternoon; they crossed onto shallow slopes and avoided the steep one. Copyright © 2016 Elsevier B.V. All rights reserved.
Strategy and Structure for Online News Production - Case Studies of CNN and NRK
NASA Astrophysics Data System (ADS)
Krumsvik, Arne H.
This cross-national comparative case study of online news production analyzes the strategies of Cable News Network (CNN) and the Norwegian Broadcasting Corporation (NRK), aiming at understanding of the implications of organizational strategy on the role of journalists, explains why traditional media organizations have a tendency to develop a multi-platform approach (distributing content on several platforms, such as television, online, mobile) rather than developing the cross-media (with interplay between media types) or multimedia approach anticipated by both scholars and practitioners.
Agile data management for curation of genomes to watershed datasets
NASA Astrophysics Data System (ADS)
Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.
2015-12-01
A software platform is being developed for data management and assimilation [DMA] as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities and the development is based on agile development techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data, laboratory analytical results of water and sediments samples collected into a database, (c) providing automated QA/QC analysis of data and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last objective is driven by data management needs. The project needs and priorities are reassessed regularly with the users. After each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for on-going scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrates diverse datasets from two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature - i.e. finding the data needed for analyses through an intuitive interface. Once the data is found, the user can immediately plot and download data through the portal. The resulting product has an interface that is more intuitive and presents the highest priority datasets that are needed by the users. Our agile approach has enabled us to build a system that is keeping pace with the science needs while utilizing limited resources.
Fedko, Iryna O; Hottenga, Jouke-Jan; Medina-Gomez, Carolina; Pappa, Irene; van Beijsterveldt, Catharina E M; Ehli, Erik A; Davies, Gareth E; Rivadeneira, Fernando; Tiemeier, Henning; Swertz, Morris A; Middeldorp, Christel M; Bartels, Meike; Boomsma, Dorret I
2015-09-01
Combining genotype data across cohorts increases power to estimate the heritability due to common single nucleotide polymorphisms (SNPs), based on analyzing a Genetic Relationship Matrix (GRM). However, the combination of SNP data across multiple cohorts may lead to stratification, when for example, different genotyping platforms are used. In the current study, we address issues of combining SNP data from different cohorts, the Netherlands Twin Register (NTR) and the Generation R (GENR) study. Both cohorts include children of Northern European Dutch background (N = 3102 + 2826, respectively) who were genotyped on different platforms. We explore imputation and phasing as a tool and compare three GRM-building strategies, when data from two cohorts are (1) just combined, (2) pre-combined and cross-platform imputed and (3) cross-platform imputed and post-combined. We test these three strategies with data on childhood height for unrelated individuals (N = 3124, average age 6.7 years) to explore their effect on SNP-heritability estimates and compare results to those obtained from the independent studies. All combination strategies result in SNP-heritability estimates with a standard error smaller than those of the independent studies. We did not observe significant difference in estimates of SNP-heritability based on various cross-platform imputed GRMs. SNP-heritability of childhood height was on average estimated as 0.50 (SE = 0.10). Introducing cohort as a covariate resulted in ≈2 % drop. Principal components (PCs) adjustment resulted in SNP-heritability estimates of about 0.39 (SE = 0.11). Strikingly, we did not find significant difference between cross-platform imputed and combined GRMs. All estimates were significant regardless the use of PCs adjustment. Based on these analyses we conclude that imputation with a reference set helps to increase power to estimate SNP-heritability by combining cohorts of the same ethnicity genotyped on different platforms. However, important factors should be taken into account such as remaining cohort stratification after imputation and/or phenotypic heterogeneity between and within cohorts. Whether one should use imputation, or just combine the genotype data, depends on the number of overlapping SNPs in relation to the total number of genotyped SNPs for both cohorts, and their ability to tag all the genetic variance related to the specific trait of interest.
Long-term electrical resistivity monitoring of recharge-induced contaminant plume behavior.
Gasperikova, Erika; Hubbard, Susan S; Watson, David B; Baker, Gregory S; Peterson, John E; Kowalsky, Michael B; Smith, Meagan; Brooks, Scott
2012-11-01
Geophysical measurements, and electrical resistivity tomography (ERT) data in particular, are sensitive to properties that are related (directly or indirectly) to hydrological processes. The challenge is in extracting information from geophysical data at a relevant scale that can be used to gain insight about subsurface behavior and to parameterize or validate flow and transport models. Here, we consider the use of ERT data for examining the impact of recharge on subsurface contamination at the S-3 ponds of the Oak Ridge Integrated Field Research Challenge (IFRC) site in Tennessee. A large dataset of time-lapse cross-well and surface ERT data, collected at the site over a period of 12 months, is used to study time variations in resistivity due to changes in total dissolved solids (primarily nitrate). The electrical resistivity distributions recovered from cross-well and surface ERT data agrees well, and both of these datasets can be used to interpret spatiotemporal variations in subsurface nitrate concentrations due to rainfall, although the sensitivity of the electrical resistivity response to dilution varies with nitrate concentration. Using the time-lapse surface ERT data interpreted in terms of nitrate concentrations, we find that the subsurface nitrate concentration at this site varies as a function of spatial position, episodic heavy rainstorms (versus seasonal and annual fluctuations), and antecedent rainfall history. These results suggest that the surface ERT monitoring approach is potentially useful for examining subsurface plume responses to recharge over field-relevant scales. Published by Elsevier B.V.
Enhancing Geoscience Research Discovery Through the Semantic Web
NASA Astrophysics Data System (ADS)
Rowan, Linda R.; Gross, M. Benjamin; Mayernik, Matthew; Khan, Huda; Boler, Frances; Maull, Keith; Stott, Don; Williams, Steve; Corson-Rikert, Jon; Johns, Erica M.; Daniels, Michael; Krafft, Dean B.; Meertens, Charles
2016-04-01
UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, a U.S. National Science Foundation EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to enhance connectivity across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Much of the VIVO ontology was built for the life sciences, so we have added some components of existing geoscience-based ontologies and a few terms from a local ontology that we created. The UNAVCO VIVO instance, connect.unavco.org, utilizes persistent identifiers whenever possible; for example using ORCIDs for people, publication DOIs, data DOIs and unique NSF grant numbers. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page shows, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can be queried using SPARQL, a query language for semantic data. EarthCollab is extending the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. About half of UNAVCO's membership is international and we hope to connect our data to institutions in other countries with a similar approach. Additional extensions, including enhanced geospatial capabilities, will be developed based on task-centered usability testing.
High-frequency fluctuations of surface temperatures in an urban environment
NASA Astrophysics Data System (ADS)
Christen, Andreas; Meier, Fred; Scherer, Dieter
2012-04-01
This study presents an attempt to resolve fluctuations in surface temperatures at scales of a few seconds to several minutes using time-sequential thermography (TST) from a ground-based platform. A scheme is presented to decompose a TST dataset into fluctuating, high-frequency, and long-term mean parts. To demonstrate the scheme's application, a set of four TST runs (day/night, leaves-on/leaves-off) recorded from a 125-m-high platform above a complex urban environment in Berlin, Germany is used. Fluctuations in surface temperatures of different urban facets are measured and related to surface properties (material and form) and possible error sources. A number of relationships were found: (1) Surfaces with surface temperatures that were significantly different from air temperature experienced the highest fluctuations. (2) With increasing surface temperature above (below) air temperature, surface temperature fluctuations experienced a stronger negative (positive) skewness. (3) Surface materials with lower thermal admittance (lawns, leaves) showed higher fluctuations than surfaces with high thermal admittance (walls, roads). (4) Surface temperatures of emerged leaves fluctuate more compared to trees in a leaves-off situation. (5) In many cases, observed fluctuations were coherent across several neighboring pixels. The evidence from (1) to (5) suggests that atmospheric turbulence is a significant contributor to fluctuations. The study underlines the potential of using high-frequency thermal remote sensing in energy balance and turbulence studies at complex land-atmosphere interfaces.
Chan, George C. Y. [Bloomington, IN; Hieftje, Gary M [Bloomington, IN
2010-08-03
A method for detecting and correcting inaccurate results in inductively coupled plasma-atomic emission spectrometry (ICP-AES). ICP-AES analysis is performed across a plurality of selected locations in the plasma on an unknown sample, collecting the light intensity at one or more selected wavelengths of one or more sought-for analytes, creating a first dataset. The first dataset is then calibrated with a calibration dataset creating a calibrated first dataset curve. If the calibrated first dataset curve has a variability along the location within the plasma for a selected wavelength, errors are present. Plasma-related errors are then corrected by diluting the unknown sample and performing the same ICP-AES analysis on the diluted unknown sample creating a calibrated second dataset curve (accounting for the dilution) for the one or more sought-for analytes. The cross-over point of the calibrated dataset curves yields the corrected value (free from plasma related errors) for each sought-for analyte.
Autonomous robotic platforms for locating radio sources buried under rubble
NASA Astrophysics Data System (ADS)
Tasu, A. S.; Anchidin, L.; Tamas, R.; Paun, M.; Danisor, A.; Petrescu, T.
2016-12-01
This paper deals with the use of autonomous robotic platforms able to locate radio signal sources such as mobile phones, buried under collapsed buildings as a result of earthquakes, natural disasters, terrorism, war, etc. This technique relies on averaging position data resulting from a propagation model implemented on the platform and the data acquired by robotic platforms at the disaster site. That allows us to calculate the approximate position of radio sources buried under the rubble. Based on measurements, a radio map of the disaster site is made, very useful for locating victims and for guiding specific rubble lifting machinery, by assuming that there is a victim next to a mobile device detected by the robotic platform; by knowing the approximate position, the lifting machinery does not risk to further hurt the victims. Moreover, by knowing the positions of the victims, the reaction time is decreased, and the chances of survival for the victims buried under the rubble, are obviously increased.
Classification based upon gene expression data: bias and precision of error rates.
Wood, Ian A; Visscher, Peter M; Mengersen, Kerrie L
2007-06-01
Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus where possible the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
Two-UAV Intersection Localization System Based on the Airborne Optoelectronic Platform
Bai, Guanbing; Liu, Jinghong; Song, Yueming; Zuo, Yujia
2017-01-01
To address the limitation of the existing UAV (unmanned aerial vehicles) photoelectric localization method used for moving objects, this paper proposes an improved two-UAV intersection localization system based on airborne optoelectronic platforms by using the crossed-angle localization method of photoelectric theodolites for reference. This paper introduces the makeup and operating principle of intersection localization system, creates auxiliary coordinate systems, transforms the LOS (line of sight, from the UAV to the target) vectors into homogeneous coordinates, and establishes a two-UAV intersection localization model. In this paper, the influence of the positional relationship between UAVs and the target on localization accuracy has been studied in detail to obtain an ideal measuring position and the optimal localization position where the optimal intersection angle is 72.6318°. The result shows that, given the optimal position, the localization root mean square error (RMS) will be 25.0235 m when the target is 5 km away from UAV baselines. Finally, the influence of modified adaptive Kalman filtering on localization results is analyzed, and an appropriate filtering model is established to reduce the localization RMS error to 15.7983 m. Finally, An outfield experiment was carried out and obtained the optimal results: σB=1.63×10−4 (°), σL=1.35×10−4 (°), σH=15.8 (m), σsum=27.6 (m), where σB represents the longitude error, σL represents the latitude error, σH represents the altitude error, and σsum represents the error radius. PMID:28067814
Two-UAV Intersection Localization System Based on the Airborne Optoelectronic Platform.
Bai, Guanbing; Liu, Jinghong; Song, Yueming; Zuo, Yujia
2017-01-06
To address the limitation of the existing UAV (unmanned aerial vehicles) photoelectric localization method used for moving objects, this paper proposes an improved two-UAV intersection localization system based on airborne optoelectronic platforms by using the crossed-angle localization method of photoelectric theodolites for reference. This paper introduces the makeup and operating principle of intersection localization system, creates auxiliary coordinate systems, transforms the LOS (line of sight, from the UAV to the target) vectors into homogeneous coordinates, and establishes a two-UAV intersection localization model. In this paper, the influence of the positional relationship between UAVs and the target on localization accuracy has been studied in detail to obtain an ideal measuring position and the optimal localization position where the optimal intersection angle is 72.6318°. The result shows that, given the optimal position, the localization root mean square error (RMS) will be 25.0235 m when the target is 5 km away from UAV baselines. Finally, the influence of modified adaptive Kalman filtering on localization results is analyzed, and an appropriate filtering model is established to reduce the localization RMS error to 15.7983 m. Finally, An outfield experiment was carried out and obtained the optimal results: σ B = 1.63 × 10 - 4 ( ° ) , σ L = 1.35 × 10 - 4 ( ° ) , σ H = 15.8 ( m ) , σ s u m = 27.6 ( m ) , where σ B represents the longitude error, σ L represents the latitude error, σ H represents the altitude error, and σ s u m represents the error radius.
A Novel Performance Evaluation Methodology for Single-Target Trackers.
Kristan, Matej; Matas, Jiri; Leonardis, Ales; Vojir, Tomas; Pflugfelder, Roman; Fernandez, Gustavo; Nebehay, Georg; Porikli, Fatih; Cehovin, Luka
2016-11-01
This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully-annotated dataset with per-frame annotations with several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most sophistically constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly-challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison a new performance visualization technique is proposed.
Robinson, Nathaniel; Allred, Brady; Jones, Matthew; ...
2017-08-21
Satellite derived vegetation indices (VIs) are broadly used in ecological research, ecosystem modeling, and land surface monitoring. The Normalized Difference Vegetation Index (NDVI), perhaps the most utilized VI, has countless applications across ecology, forestry, agriculture, wildlife, biodiversity, and other disciplines. Calculating satellite derived NDVI is not always straight-forward, however, as satellite remote sensing datasets are inherently noisy due to cloud and atmospheric contamination, data processing failures, and instrument malfunction. Readily available NDVI products that account for these complexities are generally at coarse resolution; high resolution NDVI datasets are not conveniently accessible and developing them often presents numerous technical and methodologicalmore » challenges. Here, we address this deficiency by producing a Landsat derived, high resolution (30 m), long-term (30+ years) NDVI dataset for the conterminous United States. We use Google Earth Engine, a planetary-scale cloud-based geospatial analysis platform, for processing the Landsat data and distributing the final dataset. We use a climatology driven approach to fill missing data and validate the dataset with established remote sensing products at multiple scales. We provide access to the composites through a simple web application, allowing users to customize key parameters appropriate for their application, question, and region of interest.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robinson, Nathaniel; Allred, Brady; Jones, Matthew
Satellite derived vegetation indices (VIs) are broadly used in ecological research, ecosystem modeling, and land surface monitoring. The Normalized Difference Vegetation Index (NDVI), perhaps the most utilized VI, has countless applications across ecology, forestry, agriculture, wildlife, biodiversity, and other disciplines. Calculating satellite derived NDVI is not always straight-forward, however, as satellite remote sensing datasets are inherently noisy due to cloud and atmospheric contamination, data processing failures, and instrument malfunction. Readily available NDVI products that account for these complexities are generally at coarse resolution; high resolution NDVI datasets are not conveniently accessible and developing them often presents numerous technical and methodologicalmore » challenges. Here, we address this deficiency by producing a Landsat derived, high resolution (30 m), long-term (30+ years) NDVI dataset for the conterminous United States. We use Google Earth Engine, a planetary-scale cloud-based geospatial analysis platform, for processing the Landsat data and distributing the final dataset. We use a climatology driven approach to fill missing data and validate the dataset with established remote sensing products at multiple scales. We provide access to the composites through a simple web application, allowing users to customize key parameters appropriate for their application, question, and region of interest.« less
Development of image processing method to detect noise in geostationary imagery
NASA Astrophysics Data System (ADS)
Khlopenkov, Konstantin V.; Doelling, David R.
2016-10-01
The Clouds and the Earth's Radiant Energy System (CERES) has incorporated imagery from 16 individual geostationary (GEO) satellites across five contiguous domains since March 2000. In order to derive broadband fluxes uniform across satellite platforms it is important to ensure a good quality of the input raw count data. GEO data obtained by older GOES imagers (such as MTSAT-1, Meteosat-5, Meteosat-7, GMS-5, and GOES-9) are known to frequently contain various types of noise caused by transmission errors, sync errors, stray light contamination, and others. This work presents an image processing methodology designed to detect most kinds of noise and corrupt data in all bands of raw imagery from modern and historic GEO satellites. The algorithm is based on a set of different approaches to detect abnormal image patterns, including inter-line and inter-pixel differences within a scanline, correlation between scanlines, analysis of spatial variance, and also a 2D Fourier analysis of the image spatial frequencies. In spite of computational complexity, the described method is highly optimized for performance to facilitate volume processing of multi-year data and runs in fully automated mode. Reliability of this noise detection technique has been assessed by human supervision for each GEO dataset obtained during selected time periods in 2005 and 2006. This assessment has demonstrated the overall detection accuracy of over 99.5% and the false alarm rate of under 0.3%. The described noise detection routine is currently used in volume processing of historical GEO imagery for subsequent production of global gridded data products and for cross-platform calibration.
JS-MS: a cross-platform, modular javascript viewer for mass spectrometry signals.
Rosen, Jebediah; Handy, Kyle; Gillan, André; Smith, Rob
2017-11-06
Despite the ubiquity of mass spectrometry (MS), data processing tools can be surprisingly limited. To date, there is no stand-alone, cross-platform 3-D visualizer for MS data. Available visualization toolkits require large libraries with multiple dependencies and are not well suited for custom MS data processing modules, such as MS storage systems or data processing algorithms. We present JS-MS, a 3-D, modular JavaScript client application for viewing MS data. JS-MS provides several advantages over existing MS viewers, such as a dependency-free, browser-based, one click, cross-platform install and better navigation interfaces. The client includes a modular Java backend with a novel streaming.mzML parser to demonstrate the API-based serving of MS data to the viewer. JS-MS enables custom MS data processing and evaluation by providing fast, 3-D visualization using improved navigation without dependencies. JS-MS is publicly available with a GPLv2 license at github.com/optimusmoose/jsms.
The scheme and research of TV series multidimensional comprehensive evaluation on cross-platform
NASA Astrophysics Data System (ADS)
Chai, Jianping; Bai, Xuesong; Zhou, Hongjun; Yin, Fulian
2016-10-01
As for shortcomings of the comprehensive evaluation system on traditional TV programs such as single data source, ignorance of new media as well as the high time cost and difficulty of making surveys, a new evaluation of TV series is proposed in this paper, which has a perspective in cross-platform multidimensional evaluation after broadcasting. This scheme considers the data directly collected from cable television and the Internet as research objects. It's based on TOPSIS principle, after preprocessing and calculation of the data, they become primary indicators that reflect different profiles of the viewing of TV series. Then after the process of reasonable empowerment and summation by the six methods(PCA, AHP, etc.), the primary indicators form the composite indices on different channels or websites. The scheme avoids the inefficiency and difficulty of survey and marking; At the same time, it not only reflects different dimensions of viewing, but also combines TV media and new media, completing the multidimensional comprehensive evaluation of TV series on cross-platform.
NASA Astrophysics Data System (ADS)
Jones, Bob; Casu, Francesco
2013-04-01
The feasibility of using commercial cloud services for scientific research is of great interest to research organisations such as CERN, ESA and EMBL, to the suppliers of cloud-based services and to the national and European funding agencies. Through the Helix Nebula - the Science Cloud [1] initiative and with the support of the European Commission, these stakeholders are driving a two year pilot-phase during which procurement processes and governance issues for a framework of public/private partnership will be appraised. Three initial flagship use cases from high energy physics, molecular biology and earth-observation are being used to validate the approach, enable a cost-benefit analysis to be undertaken and prepare the next stage of the Science Cloud Strategic Plan [2] to be developed and approved. The power of Helix Nebula lies in a shared set of services for initially 3 very different sciences each supporting a global community and thus building a common e-Science platform. Of particular relevance is the ESA sponsored flagship application SuperSites Exploitation Platform (SSEP [3]) that offers the global geo-hazard community a common platform for the correlation and processing of observation data for supersites monitoring. The US-NSF Earth Cube [4] and Ocean Observatory Initiative [5] (OOI) are taking a similar approach for data intensive science. The work of Helix Nebula and its recent architecture model [6] has shown that is it technically feasible to allow publicly funded infrastructures, such as EGI [7] and GEANT [8], to interoperate with commercial cloud services. Such hybrid systems are in the interest of the existing users of publicly funded infrastructures and funding agencies because they will provide "freedom of choice" over the type of computing resources to be consumed and the manner in which they can be obtained. But to offer such freedom-of choice across a spectrum of suppliers, various issues such as intellectual property, legal responsibility, service quality agreements and related issues need to be addressed. Investigating these issues is one of the goals of the Helix Nebula initiative. The next generation of researchers will put aside the historical categorisation of research as a neatly defined set of disciplines and integrate the data from different sources and instruments into complex models that are as applicable to earth observation or biomedicine as they are to high-energy physics. This aggregation of datasets and development of new models will accelerate scientific development but will only be possible if the issues of data intensive science described above are addressed. The culture of science has the possibility to develop with the availability of Helix Nebula as a "Science Cloud" because: • Large scale datasets from many disciplines will be accessible • Scientists and others will be able to develop and contribute open source tools to expand the set of services available • Collaboration of scientists will take place around the on-demand availability of data, tools and services • Cross-domain research will advance at a faster pace due to the availability of a common platform. References: 1 http://www.helix-nebula.eu/ 2 http://cdsweb.cern.ch/record/1374172/files/CERN-OPEN-2011-036.pdf 3 http://www.helix-nebula.eu/index.php/helix-nebula-use-cases/uc3.html 4 http://www.nsf.gov/geo/earthcube/ 5 http://www.oceanobservatories.org/ 6 http://cdsweb.cern.ch/record/1478364/files/HelixNebula-NOTE-2012-001.pdf 7 http://www.nsf.gov/geo/earthcube/ 8 http://www.geant.net/
ERIC Educational Resources Information Center
Khalil, Mohammad; Ebner, Martin
2017-01-01
Massive Open Online Courses (MOOCs) are remote courses that excel in their students' heterogeneity and quantity. Due to the peculiarity of being massiveness, the large datasets generated by MOOC platforms require advanced tools and techniques to reveal hidden patterns for purposes of enhancing learning and educational behaviors. This publication…
Approaching the exa-scale: a real-world evaluation of rendering extremely large data sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patchett, John M; Ahrens, James P; Lo, Li - Ta
2010-10-15
Extremely large scale analysis is becoming increasingly important as supercomputers and their simulations move from petascale to exascale. The lack of dedicated hardware acceleration for rendering on today's supercomputing platforms motivates our detailed evaluation of the possibility of interactive rendering on the supercomputer. In order to facilitate our understanding of rendering on the supercomputing platform, we focus on scalability of rendering algorithms and architecture envisioned for exascale datasets. To understand tradeoffs for dealing with extremely large datasets, we compare three different rendering algorithms for large polygonal data: software based ray tracing, software based rasterization and hardware accelerated rasterization. We presentmore » a case study of strong and weak scaling of rendering extremely large data on both GPU and CPU based parallel supercomputers using Para View, a parallel visualization tool. Wc use three different data sets: two synthetic and one from a scientific application. At an extreme scale, algorithmic rendering choices make a difference and should be considered while approaching exascale computing, visualization, and analysis. We find software based ray-tracing offers a viable approach for scalable rendering of the projected future massive data sizes.« less
Lagkouvardos, Ilias; Joseph, Divya; Kapfhammer, Martin; Giritli, Sabahattin; Horn, Matthias; Haller, Dirk; Clavel, Thomas
2016-09-23
The SRA (Sequence Read Archive) serves as primary depository for massive amounts of Next Generation Sequencing data, and currently host over 100,000 16S rRNA gene amplicon-based microbial profiles from various host habitats and environments. This number is increasing rapidly and there is a dire need for approaches to utilize this pool of knowledge. Here we created IMNGS (Integrated Microbial Next Generation Sequencing), an innovative platform that uniformly and systematically screens for and processes all prokaryotic 16S rRNA gene amplicon datasets available in SRA and uses them to build sample-specific sequence databases and OTU-based profiles. Via a web interface, this integrative sequence resource can easily be queried by users. We show examples of how the approach allows testing the ecological importance of specific microorganisms in different hosts or ecosystems, and performing targeted diversity studies for selected taxonomic groups. The platform also offers a complete workflow for de novo analysis of users' own raw 16S rRNA gene amplicon datasets for the sake of comparison with existing data. IMNGS can be accessed at www.imngs.org.
Web-GIS platform for monitoring and forecasting of regional climate and ecological changes
NASA Astrophysics Data System (ADS)
Gordov, E. P.; Krupchatnikov, V. N.; Lykosov, V. N.; Okladnikov, I.; Titov, A. G.; Shulgina, T. M.
2012-12-01
Growing volume of environmental data from sensors and model outputs makes development of based on modern information-telecommunication technologies software infrastructure for information support of integrated scientific researches in the field of Earth sciences urgent and important task (Gordov et al, 2012, van der Wel, 2005). It should be considered that original heterogeneity of datasets obtained from different sources and institutions not only hampers interchange of data and analysis results but also complicates their intercomparison leading to a decrease in reliability of analysis results. However, modern geophysical data processing techniques allow combining of different technological solutions for organizing such information resources. Nowadays it becomes a generally accepted opinion that information-computational infrastructure should rely on a potential of combined usage of web- and GIS-technologies for creating applied information-computational web-systems (Titov et al, 2009, Gordov et al. 2010, Gordov, Okladnikov and Titov, 2011). Using these approaches for development of internet-accessible thematic information-computational systems, and arranging of data and knowledge interchange between them is a very promising way of creation of distributed information-computation environment for supporting of multidiscipline regional and global research in the field of Earth sciences including analysis of climate changes and their impact on spatial-temporal vegetation distribution and state. Experimental software and hardware platform providing operation of a web-oriented production and research center for regional climate change investigations which combines modern web 2.0 approach, GIS-functionality and capabilities of running climate and meteorological models, large geophysical datasets processing, visualization, joint software development by distributed research groups, scientific analysis and organization of students and post-graduate students education is presented. Platform software developed (Shulgina et al, 2012, Okladnikov et al, 2012) includes dedicated modules for numerical processing of regional and global modeling results for consequent analysis and visualization. Also data preprocessing, run and visualization of modeling results of models WRF and «Planet Simulator» integrated into the platform is provided. All functions of the center are accessible by a user through a web-portal using common graphical web-browser in the form of an interactive graphical user interface which provides, particularly, capabilities of visualization of processing results, selection of geographical region of interest (pan and zoom) and data layers manipulation (order, enable/disable, features extraction). Platform developed provides users with capabilities of heterogeneous geophysical data analysis, including high-resolution data, and discovering of tendencies in climatic and ecosystem changes in the framework of different multidisciplinary researches (Shulgina et al, 2011). Using it even unskilled user without specific knowledge can perform computational processing and visualization of large meteorological, climatological and satellite monitoring datasets through unified graphical web-interface.
Sensor deployment on unmanned ground vehicles
NASA Astrophysics Data System (ADS)
Gerhart, Grant R.; Witus, Gary
2007-10-01
TARDEC has been developing payloads for small robots as part of its unmanned ground vehicle (UGV) development programs. These platforms typically weigh less than 100 lbs and are used for various physical security and force protection applications. This paper will address a number of technical issues including platform mobility, payload positioning, sensor configuration and operational tradeoffs. TARDEC has developed a number of robots with different mobility mechanisms including track, wheel and hybrid track/wheel running gear configurations. An extensive discussion will focus upon omni-directional vehicle (ODV) platforms with enhanced intrinsic mobility for positioning sensor payloads. This paper also discusses tradeoffs between intrinsic platform mobility and articulated arm complexity for end point positioning of modular sensor packages.
Hierarchical Recognition Scheme for Human Facial Expression Recognition Systems
Siddiqi, Muhammad Hameed; Lee, Sungyoung; Lee, Young-Koo; Khan, Adil Mehmood; Truc, Phan Tran Ho
2013-01-01
Over the last decade, human facial expressions recognition (FER) has emerged as an important research area. Several factors make FER a challenging research problem. These include varying light conditions in training and test images; need for automatic and accurate face detection before feature extraction; and high similarity among different expressions that makes it difficult to distinguish these expressions with a high accuracy. This work implements a hierarchical linear discriminant analysis-based facial expressions recognition (HL-FER) system to tackle these problems. Unlike the previous systems, the HL-FER uses a pre-processing step to eliminate light effects, incorporates a new automatic face detection scheme, employs methods to extract both global and local features, and utilizes a HL-FER to overcome the problem of high similarity among different expressions. Unlike most of the previous works that were evaluated using a single dataset, the performance of the HL-FER is assessed using three publicly available datasets under three different experimental settings: n-fold cross validation based on subjects for each dataset separately; n-fold cross validation rule based on datasets; and, finally, a last set of experiments to assess the effectiveness of each module of the HL-FER separately. Weighted average recognition accuracy of 98.7% across three different datasets, using three classifiers, indicates the success of employing the HL-FER for human FER. PMID:24316568
Registration of High Angular Resolution Diffusion MRI Images Using 4th Order Tensors⋆
Barmpoutis, Angelos; Vemuri, Baba C.; Forder, John R.
2009-01-01
Registration of Diffusion Weighted (DW)-MRI datasets has been commonly achieved to date in literature by using either scalar or 2nd-order tensorial information. However, scalar or 2nd-order tensors fail to capture complex local tissue structures, such as fiber crossings, and therefore, datasets containing fiber-crossings cannot be registered accurately by using these techniques. In this paper we present a novel method for non-rigidly registering DW-MRI datasets that are represented by a field of 4th-order tensors. We use the Hellinger distance between the normalized 4th-order tensors represented as distributions, in order to achieve this registration. Hellinger distance is easy to compute, is scale and rotation invariant and hence allows for comparison of the true shape of distributions. Furthermore, we propose a novel 4th-order tensor re-transformation operator, which plays an essential role in the registration procedure and shows significantly better performance compared to the re-orientation operator used in literature for DTI registration. We validate and compare our technique with other existing scalar image and DTI registration methods using simulated diffusion MR data and real HARDI datasets. PMID:18051145
The characteristics and dynamics of wave-driven flow across a platform coral reef in the Red Sea
NASA Astrophysics Data System (ADS)
Lentz, S. J.; Churchill, J. H.; Davis, K. A.; Farrar, J. T.; Pineda, J.; Starczak, V.
2016-02-01
Current dynamics across a platform reef in the Red Sea near Jeddah, Saudi Arabia, are examined using 18 months of current profile, pressure, surface wave, and wind observations. The platform reef is 700 m long, 200 m across with spatial and temporal variations in water depth over the reef ranging from 0.6 to 1.6 m. Surface waves breaking at the seaward edge of the reef cause a 2-10 cm setup of sea level that drives cross-reef currents of 5-20 cm s-1. Bottom stress is a significant component of the wave setup balance in the surf zone. Over the reef flat, where waves are not breaking, the cross-reef pressure gradient associated with wave setup is balanced by bottom stress. The quadratic drag coefficient for the depth-average flow decreases with increasing water depth from Cda = 0.17 in 0.4 m of water to Cda = 0.03 in 1.2 m of water. The observed dependence of the drag coefficient on water depth is consistent with open-channel flow theory and a hydrodynamic roughness of zo = 0.06 m. A simple one-dimensional model driven by incident surface waves and wind stress accurately reproduces the observed depth-averaged cross-reef currents and a portion of the weaker along-reef currents over the focus reef and two other Red Sea platform reefs. The model indicates the cross-reef current is wave forced and the along-reef current is partially wind forced.
Efficient and Scalable Cross-Matching of (Very) Large Catalogs
NASA Astrophysics Data System (ADS)
Pineau, F.-X.; Boch, T.; Derriere, S.
2011-07-01
Whether it be for building multi-wavelength datasets from independent surveys, studying changes in objects luminosities, or detecting moving objects (stellar proper motions, asteroids), cross-catalog matching is a technique widely used in astronomy. The need for efficient, reliable and scalable cross-catalog matching is becoming even more pressing with forthcoming projects which will produce huge catalogs in which astronomers will dig for rare objects, perform statistical analysis and classification, or real-time transients detection. We have developed a formalism and the corresponding technical framework to address the challenge of fast cross-catalog matching. Our formalism supports more than simple nearest-neighbor search, and handles elliptical positional errors. Scalability is improved by partitioning the sky using the HEALPix scheme, and processing independently each sky cell. The use of multi-threaded two-dimensional kd-trees adapted to managing equatorial coordinates enables efficient neighbor search. The whole process can run on a single computer, but could also use clusters of machines to cross-match future very large surveys such as GAIA or LSST in reasonable times. We already achieve performances where the 2MASS (˜470M sources) and SDSS DR7 (˜350M sources) can be matched on a single machine in less than 10 minutes. We aim at providing astronomers with a catalog cross-matching service, available on-line and leveraging on the catalogs present in the VizieR database. This service will allow users both to access pre-computed cross-matches across some very large catalogs, and to run customized cross-matching operations. It will also support VO protocols for synchronous or asynchronous queries.
Remote visualization and scale analysis of large turbulence datatsets
NASA Astrophysics Data System (ADS)
Livescu, D.; Pulido, J.; Burns, R.; Canada, C.; Ahrens, J.; Hamann, B.
2015-12-01
Accurate simulations of turbulent flows require solving all the dynamically relevant scales of motions. This technique, called Direct Numerical Simulation, has been successfully applied to a variety of simple flows; however, the large-scale flows encountered in Geophysical Fluid Dynamics (GFD) would require meshes outside the range of the most powerful supercomputers for the foreseeable future. Nevertheless, the current generation of petascale computers has enabled unprecedented simulations of many types of turbulent flows which focus on various GFD aspects, from the idealized configurations extensively studied in the past to more complex flows closer to the practical applications. The pace at which such simulations are performed only continues to increase; however, the simulations themselves are restricted to a small number of groups with access to large computational platforms. Yet the petabytes of turbulence data offer almost limitless information on many different aspects of the flow, from the hierarchy of turbulence moments, spectra and correlations, to structure-functions, geometrical properties, etc. The ability to share such datasets with other groups can significantly reduce the time to analyze the data, help the creative process and increase the pace of discovery. Using the largest DOE supercomputing platforms, we have performed some of the biggest turbulence simulations to date, in various configurations, addressing specific aspects of turbulence production and mixing mechanisms. Until recently, the visualization and analysis of such datasets was restricted by access to large supercomputers. The public Johns Hopkins Turbulence database simplifies the access to multi-Terabyte turbulence datasets and facilitates turbulence analysis through the use of commodity hardware. First, one of our datasets, which is part of the database, will be described and then a framework that adds high-speed visualization and wavelet support for multi-resolution analysis of turbulence will be highlighted. The addition of wavelet support reduces the latency and bandwidth requirements for visualization, allowing for many concurrent users, and enables new types of analyses, including scale decomposition and coherent feature extraction.
Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E
2017-01-01
Abstract The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. PMID:29069412
NASA Astrophysics Data System (ADS)
Dehn, Angelika; Brizzi, G.; Barrot, G.; Bovensmann, H.; Canela, M.; Fehr, T.; Laur, H.; Lichtenberg, G.; Niro, F.; Perron, G.; Raspollini, P.; Saavedra de Miguel, L.; Scarpino, G.; Vogel, P.
The atmospheric chemistry instruments on board the ENVISAT platform (GOMOS, MIPAS and SCIAMACHY) provide a unique dataset of geophysical parameters (e.g.: trace gases, clouds, and aerosol) that allows a comprehensive characterization of the atmosphere's chemical and climatological processes [1]. These instruments started to provide significant science data shortly after the launch of the ENVISAT satellite (March 2002). At the time of writing this paper, these instruments and the whole payload modules are fully working and are well beyond the expected lifetime of 5 years. In addition the orbit control strategy of the platform will be modified starting from 2010, in order to extend the mission lifetime up to 2013 [2]. This means that if no instrument problems will appear, the ENVISAT atmospheric sensors will provide at the end of their life, three separated, but complementary datasets of the most important atmospheric state parameters, spanning a time interval of about 11 years. This represents an extraordinary source of information for the scientific user community, both for the completeness and quality of the data and for the extent of the dataset. The aim of this paper is to present the actual status of the ESA operational atmospheric chemistry dataset provided by the three ENVISAT atmospheric chemistry instruments and the future evolution. The processing and reprocessing status will be described in details for each instrument. The outcomes of the geophysical validation and the planned validation activities will be discussed. Finally the data availability and the source of information will be specified. [1] H. Nett, J. Frerick, T. Paulsen, and G. Levrini, "The atmospheric instruments and their applications: GOMOS, MIPAS and SCIAMACHY", ESA Bulletin (ISSN 0376-4265), No. 106, p. 77 -87 (2001) [2] J. Frerick, B. Duesmann, and M. Canela, "2010 and beyond -The ENVISAT mission extension", Proc. `Envisat Symposium 2007', Montreux, Switzerland, 23-27 April 2007 (ESA SP-636, July 2007)
NASA Astrophysics Data System (ADS)
Niro, F.
2009-04-01
The atmospheric chemistry instruments on board the ENVISAT platform (GOMOS, MIPAS and SCIAMACHY) provide a unique dataset of geophysical parameters (e.g.: trace gases, clouds, and aerosol) that allows a comprehensive characterization of the atmosphere's chemical and climatological processes [1]. These instruments started to provide significant science data shortly after the launch of the ENVISAT satellite (March 2002). At the time of writing this paper, these instruments and the whole payload modules are fully working and are well beyond the expected lifetime of 5 years. In addition the orbit control strategy of the platform will be modified starting from 2010, in order to extend the mission lifetime up to 2013 [2]. This means that if no instrument problems will appear, the ENVISAT atmospheric sensors will provide at the end of their life, three separated, but complementary datasets of the most important atmospheric state parameters, spanning a time interval of about 11 years. This represents an extraordinary source of information for the scientific user community, both for the completeness and quality of the data and for the extent of the dataset. The aim of this paper is to present the actual status of the ESA operational atmospheric chemistry dataset provided by the three ENVISAT atmospheric chemistry instruments and the future evolution. The processing and reprocessing status will be described in details for each instrument. The outcomes of the geophysical validation and the planned validation activities will be discussed. Finally the data availability and the source of information will be specified. [1] H. Nett, J. Frerick, T. Paulsen, and G. Levrini, "The atmospheric instruments and their applications: GOMOS, MIPAS and SCIAMACHY", ESA Bulletin (ISSN 0376-4265), No. 106, p. 77 - 87 (2001) [2] J. Frerick, B. Duesmann, and M. Canela, "2010 and beyond - The ENVISAT mission extension", Proc. ‘Envisat Symposium 2007', Montreux, Switzerland, 23-27 April 2007 (ESA SP-636, July 2007)
De Anda, Valerie; Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E; Contreras-Moreira, Bruno; Souza, Valeria
2017-11-01
The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. © The Author 2017. Published by Oxford University Press.
McArt, Darragh G.; Dunne, Philip D.; Blayney, Jaine K.; Salto-Tellez, Manuel; Van Schaeybroeck, Sandra; Hamilton, Peter W.; Zhang, Shu-Dong
2013-01-01
The advent of next generation sequencing technologies (NGS) has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next generation sequencing is still exceeding that of microarray approaches, the rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes and drugs by differential gene expression initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles have been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by utilizing the microarray based reference profiles and the construction of a differentially expressed gene signature from a NGS dataset. This would allow for the establishment of connections between the NGS gene signature and those microarray reference profiles, alleviating the associated incurring cost of re-creating drug profiles with NGS technology. We examined the connectivity mapping approach on a publicly available NGS dataset with androgen stimulation of LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells and to elucidate their potential in a laboratory setting. In addition, we also analyzed an independent microarray dataset of similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen dependent manner. Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping. PMID:23840550
Temporal and Cross Correlations in Business News
NASA Astrophysics Data System (ADS)
Mizuno, T.; Takei, K.; Ohnishi, T.; Watanabe, T.
We empirically investigate temporal and cross correlations inthe frequency of news reports on companies, using a dataset of more than 100 million news articles reported in English by around 500 press agencies worldwide for the period 2003--2009. Our first finding is that the frequency of news reports on a company does not follow a Poisson process, but instead exhibits long memory with a positive autocorrelation for longer than one year. The second finding is that there exist significant correlations in the frequency of news across companies. Specifically, on a daily time scale or longer the frequency of news is governed by external dynamics, while on a time scale of minutes it is governed by internal dynamics. These two findings indicate that the frequency of news reports on companies has statistical properties similar to trading volume or price volatility in stock markets, suggesting that the flow of information through company news plays an important role in price dynamics in stock markets.
The magnetic low of central Europe: analysis and interpretation by a multi scale approach.
NASA Astrophysics Data System (ADS)
Milano, Maurizio; Fedi, Maurizio
2016-04-01
The objective of this work is an interpretation of the European magnetic low (EML) which is the main magnetic anomaly characterizing the magnetic field of central Europe at high-altitude, extending from the eastern France to Poland and placed above the main geological boundary of Europe, the Trans European Suture Zone (TESZ), that separates the western and thinner Paleozoic platform from the eastern and thicker Precambrian platform. In particular, the EML has a relative magnetic high north-east of it, showing a reverse dipolar behavior that many authors tried to interpret in past also by high-altitude satellite exploration. We used an aeromagnetic dataset and employed a level-to-level upward continuation from 1 km up to 200 km, following a multiscale approach thanks to which the anomalies generated by sources placed at different depths can be discriminated. Low-altitude magnetic maps show a complex pattern of high-frequency anomalies up to an altitude of 50 km; then, increasing the altitude up to 200 km, the field simplifies gradually. In order to interpret the anomalies we generated the maps of the total gradient (|T|) of the field at each upward continued altitude, thanks to its property in localizing in a very simple way the edges of the sources and their horizontal position without specifying a priori information about source parameters. From the total gradient maps at low altitude we obtained information about the position of shallow and localized sources producing patterns of small anomalies. In central Europe, most of them have a reverse dipolar behavior, being related probably to metasedimentary rocks in the upper crust containing pyrrhotite and a strong remament component. At higher altitude the total gradient maps has been useful to give a more complex explanation of the EML taking in consideration the results obtained in previous studies. The maps at 150-200 km show that the maximum amplitude of |T| is exactly localized along the TESZ in the NW-SE direction. So, a simple contact model was performed in order to demonstrate that the main source that generates the EML is the complex fault system of the TESZ. However, the |T| maxima are positioned not only along the suture zone, but also in Central Europe, showing that the contributions to the EML derive also from sources placed in the Paleozoic platform with a reverse dipolar aspect. From these results it appears that the contributions responsible for the nature of this anomaly are to be reconnected first to the presence of the TESZ, which puts in contact two different platforms with different thicknesses, and also to the presence of bodies with a strong remanent component, which characterize part of the Central European crust.
Smart Water: Energy-Water Optimization in Drinking Water Systems
This project aims to develop and commercialize a Smart Water Platform – Sensor-based Data-driven Energy-Water Optimization technology in drinking water systems. The key technological advances rely on cross-platform data acquisition and management system, model-based real-time sys...