Sample records for stringent benchmark dataset

  1. Ensemble Linear Neighborhood Propagation for Predicting Subchloroplast Localization of Multi-Location Proteins.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2016-12-02

    In the postgenomic era, the number of unreviewed protein sequences far exceeds, and grows much faster than, the number of reviewed ones. However, existing methods for protein subchloroplast localization often ignore the information in these unlabeled proteins. This paper proposes a multi-label predictor based on ensemble linear neighborhood propagation (LNP), namely LNP-Chlo, which leverages hybrid sequence-based feature information from both labeled and unlabeled proteins to predict the localization of both single- and multi-label chloroplast proteins. Experimental results on a stringent benchmark dataset and a novel independent dataset suggest that LNP-Chlo performs at least 6% (absolute) better than state-of-the-art predictors. This paper also demonstrates that ensemble LNP significantly outperforms LNP based on individual features. For readers' convenience, the online Web server LNP-Chlo is freely available at http://bioinfo.eie.polyu.edu.hk/LNPChloServer/.
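
    The propagation step at the core of LNP-style semi-supervised predictors can be sketched in a few lines. The toy weight matrix, labels, and parameter values below are invented for illustration and are not taken from LNP-Chlo:

```python
# Minimal sketch of neighborhood-based label propagation (illustrative,
# not the LNP-Chlo implementation). W[i][j] is the reconstruction weight
# of neighbor j for protein i (rows sum to 1); Y holds the initial labels
# (one-hot rows for labeled proteins, all-zero rows for unlabeled ones).

def propagate(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha * W F + (1 - alpha) * Y until (nearly) stable."""
    F = [row[:] for row in Y]
    n, m = len(W), len(Y[0])
    for _ in range(iters):
        F = [[alpha * sum(W[i][j] * F[j][k] for j in range(n))
              + (1 - alpha) * Y[i][k] for k in range(m)]
             for i in range(n)]
    return F

# Protein 0 is labeled class A, protein 2 class B, protein 1 is unlabeled.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
Y = [[1, 0], [0, 0], [0, 1]]
F = propagate(W, Y)
# By symmetry the unlabeled protein ends up with equal scores for A and B.
```

Because alpha < 1 and the rows of W sum to 1, the iteration converges; labeled nodes keep pulling toward their known labels while unlabeled nodes inherit a blend from their neighborhood.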

  2. The ISPRS Benchmark on Indoor Modelling

    NASA Astrophysics Data System (ADS)

    Khoshelham, K.; Díaz Vilariño, L.; Peter, M.; Kang, Z.; Acharya, D.

    2017-09-01

    Automated generation of 3D indoor models from point cloud data has been a topic of intensive research in recent years. While results on various datasets have been reported in the literature, a comparison of the performance of different methods has not been possible due to the lack of benchmark datasets and a common evaluation framework. The ISPRS benchmark on indoor modelling aims to address this issue by providing a public benchmark dataset and an evaluation framework for performance comparison of indoor modelling methods. In this paper, we present the benchmark dataset comprising several point clouds of indoor environments captured by different sensors. We also discuss the evaluation and comparison of indoor modelling methods based on manually created reference models and appropriate quality evaluation criteria. The benchmark dataset is available for download at: http://www2.isprs.org/commissions/comm4/wg5/benchmark-on-indoor-modelling.html.

  3. Benchmark Dataset for Whole Genome Sequence Compression.

    PubMed

    C L, Biji; S Nair, Achuthsankar

    2017-01-01

    The research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset constructed using a multistage sampling procedure. Taking the genome sequences of organisms available in the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of running three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only through comparison on the scientifically compiled benchmark dataset. The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
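
    A compression tool's score on such a benchmark sequence is simply compressed size over original size. The sketch below uses the general-purpose zlib compressor from the Python standard library as a stand-in for a DNA-specific tool, on an invented toy sequence:

```python
import zlib

# Hedged sketch: score a compressor on a benchmark sequence as the ratio
# of compressed bytes to original bytes. zlib is a stand-in here; the
# tools compared in the paper are DNA-specific compressors.

def compression_ratio(seq: bytes, level: int = 9) -> float:
    """Return compressed size / original size (lower is better)."""
    return len(zlib.compress(seq, level)) / len(seq)

repetitive = b"ACGT" * 250          # 1,000 bases, highly repetitive
ratio = compression_ratio(repetitive)  # far below 1 for repetitive DNA
```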

  4. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

    PubMed Central

    Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila

    2017-01-01

    The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data PMID:29220453
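
    A Cranfield-style benchmark of this kind pairs a system's ranked result list with relevance judgments (qrels). The sketch below shows precision at k on invented identifiers; it is illustrative only, not the bioCADDIE data or its metric set:

```python
# Toy sketch of Cranfield-style evaluation: relevance judgments (qrels)
# map dataset identifiers to graded relevance; precision@k scores the
# top of a ranked result list. All identifiers here are invented.

def precision_at_k(ranked_ids, qrels, k):
    """Fraction of the top-k retrieved items judged relevant (> 0)."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if qrels.get(doc_id, 0) > 0) / k

qrels = {"DS-001": 2, "DS-007": 1, "DS-042": 0}    # graded judgments
ranked = ["DS-001", "DS-042", "DS-007", "DS-099"]  # one system's output
p2 = precision_at_k(ranked, qrels, 2)  # → 0.5
```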

  5. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

    PubMed Central

    Rand, Hugh; Shumway, Martin; Trees, Eija K.; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E.; Defibaugh-Chavez, Stephanie; Carleton, Heather A.; Klimke, William A.; Katz, Lee S.

    2017-01-01

    Background: As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods: We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and “known” phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results: Our “outbreak” benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the “known tree” can be accurately called the “true tree”. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets.
Discussion: These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. PMID:29372115
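
    The variant-matrix summary step these pipelines share can be sketched as follows; the toy alignment is invented, and the function is a simplification of real SNP calling (no reference genome, no quality filtering):

```python
# Sketch: reduce aligned genomes to a SNP matrix, the summary most
# phylogenomic pipelines feed into standard phylogenetic analysis.
# The toy alignment below is invented for illustration.

def snp_matrix(alignment):
    """Keep only the alignment columns where the sequences differ."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    variant_cols = [i for i in range(len(seqs[0]))
                    if len({s[i] for s in seqs}) > 1]
    snps = {n: "".join(alignment[n][i] for i in variant_cols)
            for n in names}
    return snps, variant_cols

aln = {"isolateA": "ACGTACGT",
       "isolateB": "ACGTACGA",
       "isolateC": "ACCTACGA"}
snps, cols = snp_matrix(aln)
# cols == [2, 7]; isolateA/B/C reduce to "GT", "GA", "CA" respectively.
```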

  6. Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.

    PubMed

    Timme, Ruth E; Rand, Hugh; Shumway, Martin; Trees, Eija K; Simmons, Mustafa; Agarwala, Richa; Davis, Steven; Tillman, Glenn E; Defibaugh-Chavez, Stephanie; Carleton, Heather A; Klimke, William A; Katz, Lee S

    2017-01-01

    As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly-accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets.
These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines.

  7. 75 FR 51982 - Fisheries of the Gulf of Mexico; Southeast Data, Assessment, and Review (SEDAR) Update; Greater...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-24

    ... evaluates potential datasets and recommends which datasets are appropriate for assessment analyses. The... points to datasets incorporated in the original SEDAR benchmark assessment and run the benchmark... Webinar II November 22, 2010; 10 a.m. - 1 p.m.; SEDAR Update Assessment Webinar III Using updated datasets...

  8. Role of the standard deviation in the estimation of benchmark doses with continuous data.

    PubMed

    Gaylor, David W; Slikker, William

    2004-12-01

    For continuous data, risk is defined here as the proportion of animals with values above a large percentile (e.g., the 99th) or below a small percentile (e.g., the 1st) of the distribution of values among control animals. It is known that reducing the standard deviation of measurements through improved experimental techniques results in less stringent (higher) lower confidence limits on the benchmark dose that is estimated to produce a specified risk of animals with abnormal levels for a biological effect. Thus, a somewhat larger (less stringent) lower confidence limit is obtained that may be used as a point of departure for low-dose risk assessment. It is shown in this article that it is important for the benchmark dose to be based primarily on the standard deviation among animals, s(a), apart from the standard deviation of measurement errors, s(m), within animals. If the benchmark dose is incorrectly based on the overall standard deviation among average values for animals, which includes measurement error variation, the benchmark dose will be overestimated and the risk will be underestimated. The bias increases as s(m) increases relative to s(a). The bias is relatively small if s(m) is less than one-third of s(a), a condition achieved in most experimental designs.
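
    The rule of thumb in the last sentence follows from a simple variance decomposition: the observed standard deviation among animal means combines s(a) and s(m) in quadrature. A worked sketch with illustrative numbers:

```python
import math

# Worked sketch of the variance decomposition behind the article's rule
# of thumb: the observed SD among animal values combines the true
# between-animal SD s_a with the measurement-error SD s_m in quadrature.
# The numeric values are illustrative, not the article's data.

def observed_sd(s_a: float, s_m: float) -> float:
    """SD observed when measurement error is not separated out."""
    return math.sqrt(s_a**2 + s_m**2)

s_a = 1.0
inflation_third = observed_sd(s_a, s_a / 3) / s_a  # about 5% inflation
inflation_equal = observed_sd(s_a, s_a) / s_a      # about 41% inflation
```

With s(m) = s(a)/3 the observed SD is only about 5% larger than s(a), which is why the bias is small in that regime; at s(m) = s(a) the inflation reaches roughly 41%.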

  9. Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

    PubMed

    Liu, Guang-Hui; Shen, Hong-Bin; Yu, Dong-Jun

    2016-04-01

    Accurately predicting protein-protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure. First, a machine-learning-based data-cleaning procedure is applied to the majority samples to remove marginal targets, which may have a negative effect on training a model with a clear classification boundary, thereby relieving the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPI predictors and should supplement existing PPI prediction methods.
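
    The data-cleaning step can be sketched as follows. The preliminary scoring function, the threshold, and the samples are invented stand-ins, not the paper's learned model:

```python
# Sketch of the data-cleaning idea: before training, drop majority-class
# (non-site) samples whose preliminary classifier score is marginal,
# i.e., near the decision boundary. The score function and threshold
# below are hypothetical stand-ins for the paper's learned model.

def clean_majority(samples, score, threshold=0.4):
    """Keep all minority (label 1) samples; keep a majority (label 0)
    sample only when the preliminary score is confidently negative."""
    return [(x, y) for x, y in samples if y == 1 or score(x) < threshold]

samples = [([0.9], 1), ([0.1], 0), ([0.5], 0), ([0.2], 0)]
cleaned = clean_majority(samples, lambda x: x[0])
# The marginal negative ([0.5], 0) is removed; imbalance drops from 3:1 to 2:1.
```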

  10. PMLB: a large benchmark suite for machine learning evaluation and comparison.

    PubMed

    Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H

    2017-01-01

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

  11. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    PubMed Central

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this end, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds, which has changed considerably over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoy selection in benchmarking databases, as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
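
    Property-matched decoy selection, one of the customized strategies this review covers, can be sketched as a nearest-neighbour match on a physicochemical property. The compound names and molecular weights below are invented for illustration:

```python
# Sketch of property-matched decoy selection: for each active compound,
# pick the candidate decoy closest in a physicochemical property (here
# molecular weight), so actives and decoys are not trivially separable.
# Compound names and weights are invented.

def pick_decoys(actives, candidates, k=1):
    """Greedily assign the k nearest unused candidates to each active."""
    decoys = []
    pool = dict(candidates)  # name -> molecular weight
    for name, mw in actives.items():
        ranked = sorted(pool, key=lambda c: abs(pool[c] - mw))
        for c in ranked[:k]:
            decoys.append(c)
            del pool[c]
    return decoys

actives = {"act1": 310.0, "act2": 452.5}
candidates = {"dec_a": 305.0, "dec_b": 520.0, "dec_c": 455.0}
chosen = pick_decoys(actives, candidates)  # → ["dec_a", "dec_c"]
```

Real benchmarking databases match on several properties at once and additionally require the decoys to be topologically dissimilar from the actives; this sketch shows only the matching idea.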

  12. Generation of openEHR Test Datasets for Benchmarking.

    PubMed

    El Helou, Samar; Karvonen, Tuukka; Yamamoto, Goshiro; Kume, Naoto; Kobayashi, Shinji; Kondo, Eiji; Hiragi, Shusuke; Okamoto, Kazuya; Tamura, Hiroshi; Kuroda, Tomohiro

    2017-01-01

    openEHR is a widely used EHR specification. Given its technology-independent nature, different approaches for implementing openEHR data repositories exist. Public openEHR datasets are needed to conduct benchmark analyses over different implementations. To address their current unavailability, we propose a method for generating openEHR test datasets that can be publicly shared and used.

  13. Benchmark dataset for undirected and Mixed Capacitated Arc Routing Problems under Time restrictions with Intermediate Facilities.

    PubMed

    Willemse, Elias J; Joubert, Johan W

    2016-09-01

    In this article we present benchmark datasets for the Mixed Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities (MCARPTIF). The problem is a generalisation of the Capacitated Arc Routing Problem (CARP), and closely represents waste collection routing. Four different test sets are presented, each consisting of multiple instance files, which can be used to benchmark different solution approaches for the MCARPTIF. An in-depth description of the datasets can be found in "Constructive heuristics for the Mixed Capacitated Arc Routing Problem under Time Restrictions with Intermediate Facilities" (Willemse and Joubert, 2016) [2] and "Splitting procedures for the Mixed Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities" (Willemse and Joubert, in press) [4]. The datasets are publicly available from "Library of benchmark test sets for variants of the Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities" (Willemse and Joubert, 2016) [3].

  14. MoleculeNet: a benchmark for molecular machine learning

    PubMed Central

    Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl

    2017-01-01

    Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. PMID:29629118

  15. A Benchmark Dataset for SSVEP-Based Brain-Computer Interfaces.

    PubMed

    Wang, Yijun; Chen, Xiaogang; Gao, Xiaorong; Gao, Shangkai

    2017-10-01

    This paper presents a benchmark steady-state visual evoked potential (SSVEP) dataset acquired with a 40-target brain-computer interface (BCI) speller. The dataset consists of 64-channel electroencephalogram (EEG) data from 35 healthy subjects (8 experienced and 27 naïve) recorded while they performed a cue-guided target-selection task. The virtual keyboard of the speller was composed of 40 visual flickers, which were coded using a joint frequency and phase modulation (JFPM) approach. The stimulation frequencies ranged from 8 Hz to 15.8 Hz with an interval of 0.2 Hz, and two adjacent frequencies were separated by a fixed phase difference. For each subject, the data include six blocks of 40 trials corresponding to all 40 flickers, indicated by a visual cue in random order. The stimulation duration in each trial was five seconds. The dataset can be used as a benchmark to compare methods for stimulus coding and target identification in SSVEP-based BCIs. Through offline simulation, the dataset can be used to design new system diagrams and evaluate their BCI performance without collecting any new data. The dataset also provides high-quality data for computational modeling of SSVEPs. The dataset is freely available from http://bci.med.tsinghua.edu.cn/download.html.
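
    The frequency grid described above can be reconstructed directly. The phase step below is a hypothetical placeholder, since the abstract's exact phase value did not survive extraction:

```python
# Reconstructing the speller's stimulus grid from the abstract: 40
# flicker frequencies from 8.0 to 15.8 Hz in 0.2 Hz steps. PHASE_STEP
# is an assumed placeholder (in units of pi radians), not the value
# reported in the paper.

PHASE_STEP = 0.5  # hypothetical, units of pi radians

freqs = [round(8.0 + 0.2 * i, 1) for i in range(40)]
phases = [round((i * PHASE_STEP) % 2.0, 2) for i in range(40)]
```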

  16. Summer 2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mendoza, Paul Michael

    2016-08-31

    The project aims to automate MCNP criticality benchmark execution; create a dataset containing static benchmark information; combine MCNP output with benchmark information; and fit and visually represent the data.

  17. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

    PubMed

    Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

    The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating insert throughput and query latency performance of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.

  18. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers

    PubMed Central

    Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

    The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating insert throughput and query latency performance of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556

  19. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design.

    PubMed

    Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja

    2015-01-01

    The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.

  20. A benchmark for vehicle detection on wide area motion imagery

    NASA Astrophysics Data System (ADS)

    Catrambone, Joseph; Amzovski, Ismail; Liang, Pengpeng; Blasch, Erik; Sheaff, Carolyn; Wang, Zhonghai; Chen, Genshe; Ling, Haibin

    2015-05-01

    Wide area motion imagery (WAMI) has been attracting an increased amount of research attention due to its large spatial and temporal coverage. An important application is moving target analysis, where vehicle detection is often one of the first steps before advanced activity analysis. While many vehicle detection algorithms exist, a thorough evaluation of them on WAMI data remains a challenge, mainly due to the lack of an appropriate benchmark dataset. In this paper, we address this need by presenting a new benchmark for vehicle detection in wide area motion imagery. The WAMI benchmark is based on the recently available Wright-Patterson Air Force Base (WPAFB09) dataset and the Temple Resolved Uncertainty Target History (TRUTH) associated target annotation. Trajectory annotations were provided in the original release of the WPAFB09 dataset, but detailed vehicle annotations were not. In addition, annotations of static vehicles, e.g., in parking lots, are also not identified in the original release. Addressing these issues, we re-annotated the whole dataset with detailed information for each vehicle, including not only a target's location, but also its pose and size. The annotated WAMI dataset should be useful to the community as a common benchmark for comparing WAMI detection, tracking, and identification methods.
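
    A per-vehicle annotation record carrying the added fields (location, pose, size, and a moving/parked flag) might look like the sketch below; the field names and units are assumptions for illustration, not the released annotation schema:

```python
from dataclasses import dataclass

# Hypothetical per-vehicle annotation record covering the fields the
# re-annotation adds: image location, pose, size, and whether the
# vehicle is moving or parked. Field names and units are assumed.

@dataclass
class VehicleAnnotation:
    frame: int
    x: float            # image location, pixels
    y: float
    heading_deg: float  # pose
    length_px: float    # size
    width_px: float
    moving: bool        # False distinguishes parked vehicles

ann = VehicleAnnotation(frame=0, x=512.0, y=384.0, heading_deg=90.0,
                        length_px=18.0, width_px=8.0, moving=False)
```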

  1. Identifying key genes in glaucoma based on a benchmarked dataset and the gene regulatory network.

    PubMed

    Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui

    2017-10-01

    The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and a gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially-expressed genes (DEGs) between patients with glaucoma and normal controls were identified utilizing the Linear Models for Microarray Data (Limma) package based on the benchmarked dataset. A total of 5 GRN inference methods, including Zscore, GeneNet, the context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3), were evaluated using receiver operating characteristic (ROC) and precision-recall (PR) curves. The inference method with the best performance was selected to construct the GRN. Subsequently, topological centrality analysis (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by performing reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. The ROC and PR curves of the 5 methods were analyzed and it was determined that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6, EPHA2 and GSTT1, and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers for use in the diagnosis of glaucoma and aid in identifying the molecular mechanism of this disease.
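
    The ROC comparison of inference methods reduces to ranking candidate edges by confidence and scoring that ranking against the gold-standard network. A sketch with invented edge scores (not the study's data):

```python
# Sketch of how GRN inference methods are compared with ROC curves:
# rank predicted edges by confidence and compute AUC against a gold
# standard. Edge scores and labels below are invented; ties between
# scores are not handled in this simplified rank-sum formulation.

def roc_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formulation."""
    pairs = sorted(zip(scores, labels))
    rank_sum = sum(rank for rank, (_, lab) in enumerate(pairs, 1) if lab)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

scores = [0.9, 0.8, 0.4, 0.2]   # edge confidences from one method
labels = [1, 1, 0, 0]           # 1 = edge present in the gold standard
auc = roc_auc(scores, labels)   # → 1.0 (perfect ranking)
```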

  2. Benchmarking neuromorphic vision: lessons learnt from computer vision

    PubMed Central

    Tan, Cheston; Lallee, Stephane; Orchard, Garrick

    2015-01-01

    Neuromorphic Vision sensors have improved greatly since the first silicon retina was presented almost three decades ago. They have recently matured to the point where they are commercially available and can be operated by laymen. However, despite improved availability of sensors, there remains a lack of good datasets, while algorithms for processing spike-based visual data are still in their infancy. On the other hand, frame-based computer vision algorithms are far more mature, thanks in part to widely accepted datasets which allow direct comparison between algorithms and encourage competition. We are presented with a unique opportunity to shape the development of Neuromorphic Vision benchmarks and challenges by leveraging what has been learnt from the use of datasets in frame-based computer vision. Taking advantage of this opportunity, in this paper we review the role that benchmarks and challenges have played in the advancement of frame-based computer vision, and suggest guidelines for the creation of Neuromorphic Vision benchmarks and challenges. We also discuss the unique challenges faced when benchmarking Neuromorphic Vision algorithms, particularly when attempting to provide direct comparison with frame-based computer vision. PMID:26528120

  3. Principals' Leadership Styles and Student Achievement

    ERIC Educational Resources Information Center

    Harnish, David Alan

    2012-01-01

    Many schools struggle to meet No Child Left Behind's stringent adequate yearly progress standards, although the benchmark has stimulated national creativity and reform. The purpose of this study was to explore teacher perceptions of principals' leadership styles, curriculum reform, and student achievement to ascertain possible factors to improve…

  4. MicroRNA array normalization: an evaluation using a randomized dataset as the benchmark.

    PubMed

    Qin, Li-Xuan; Zhou, Qin

    2014-01-01

    MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays.
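    For context, the kind of normalization being assessed can be illustrated with a minimal quantile normalization, a method commonly applied to arrays (a sketch that ignores ties, not the paper's code):

```python
def quantile_normalize(arrays):
    """Force each array (one list of intensities per sample) to share the
    same distribution: the mean of sorted values across samples, assigned
    back to each array by rank."""
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    # reference distribution: mean of the i-th smallest value across samples
    ref = [sum(col[i] for col in sorted_cols) / len(arrays) for i in range(n)]
    out = []
    for a in arrays:
        ranks = sorted(range(n), key=lambda i: a[i])  # indices, ascending value
        norm = [0.0] * n
        for rank, idx in enumerate(ranks):
            norm[idx] = ref[rank]
        out.append(norm)
    return out

s1 = [5.0, 2.0, 3.0]
s2 = [4.0, 1.0, 4.5]
n1, n2 = quantile_normalize([s1, s2])
print(sorted(n1) == sorted(n2))  # True: identical distributions after normalization
```

    The paper's point is that, for microRNA arrays, forcing a common distribution in this way is not sufficient on its own: a batch adjustment step before normalization was needed to reduce false discoveries.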

  5. MicroRNA Array Normalization: An Evaluation Using a Randomized Dataset as the Benchmark

    PubMed Central

    Qin, Li-Xuan; Zhou, Qin

    2014-01-01

    MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456

  6. Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics.

    PubMed

    Mahmood, Khalid; Jung, Chol-Hee; Philip, Gayle; Georgeson, Peter; Chung, Jessica; Pope, Bernard J; Park, Daniel J

    2017-05-16

    Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.
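    The area-under-the-ROC-curve metric reported above can be computed directly as a rank statistic (a generic sketch with made-up tool scores and assay labels, not the study's data):

```python
def auroc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive outscores a randomly chosen negative
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Tool scores (higher = predicted deleterious) vs assay-determined labels.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(round(auroc(scores, labels), 3))  # 0.778
```

    An AUC near 0.5, as seen for some tools on the assay-based datasets, means the ranking is barely better than random.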

  7. Note: The performance of new density functionals for a recent blind test of non-covalent interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mardirossian, Narbe; Head-Gordon, Martin

    Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). The dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.

  8. Note: The performance of new density functionals for a recent blind test of non-covalent interactions

    DOE PAGES

    Mardirossian, Narbe; Head-Gordon, Martin

    2016-11-09

    Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). The dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.

  9. Revamping Teacher Evaluation

    ERIC Educational Resources Information Center

    Zatynski, Mandy

    2012-01-01

    In the past two years, as concerns over teacher quality have swelled, teacher evaluation has emerged as a crucial tool for principals and other administrators to improve instructor performance. More states are seeking federal waivers to the stringent benchmarks of No Child Left Behind; others are vying for Race to the Top funds. Both require…

  10. Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

    NASA Astrophysics Data System (ADS)

    Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

    2016-04-01

    Accurate analysis of meteorological and pyranometric data for long-term analysis is the basis of decision-making for banks and investors regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Year (TMY) datasets. The most widely used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978), considering a specific weighted combination of different meteorological variables, notably global, diffuse horizontal and direct normal irradiances, air temperature, wind speed and relative humidity. In 2012, a new approach was proposed in the framework of the European FP7 project ENDORSE. It introduced the concept of a "driver", defined by the user as an explicit function of the relevant pyranometric and meteorological variables, to improve the representativeness of the TMY datasets with respect to the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time series of high-quality meteorological and pyranometric ground measurements, three types of TMY datasets were generated by the following methods: the Sandia method, a simplified driver with DNI as the only representative variable, and a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and to wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference.
    The results of this benchmarking clearly show that the Sandia method is not suitable for CPV systems. For these systems, the dedicated drivers (DNI only, or the more precise one) yield more representative TMY datasets from a limited long-term meteorological dataset.
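    For illustration, Sandia-style TMY month selection relies on the Finkelstein-Schafer (FS) statistic, which compares a candidate month's empirical CDF with the long-term CDF; a driver-based variant applies the same idea to the driver time series. A minimal sketch with invented daily DNI values (not the study's data or driver definitions):

```python
def empirical_cdf(sample, x):
    """Fraction of sample values <= x."""
    return sum(v <= x for v in sample) / len(sample)

def fs_statistic(candidate_month, long_term):
    """Finkelstein-Schafer statistic: mean absolute difference between the
    candidate month's CDF and the long-term CDF, evaluated at the
    candidate's own daily values (smaller = more typical)."""
    return sum(abs(empirical_cdf(candidate_month, x) - empirical_cdf(long_term, x))
               for x in candidate_month) / len(candidate_month)

# Toy daily DNI values (kWh/m^2): pick the June whose distribution is
# closest to the multi-year June record.
long_term = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9]
junes = {2003: [5, 6, 6, 7, 8], 2004: [2, 3, 3, 4, 9]}
best = min(junes, key=lambda y: fs_statistic(junes[y], long_term))
print(best)  # 2003
```

    The full Sandia method weights FS statistics over several variables, which is precisely what the study found unsuitable for CPV systems compared with a DNI-centred driver.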

  11. Performance evaluation of tile-based Fisher Ratio analysis using a benchmark yeast metabolome dataset.

    PubMed

    Watson, Nathanial E; Parsons, Brendon A; Synovec, Robert E

    2016-08-12

    The performance of tile-based Fisher ratio (F-ratio) data analysis, recently developed for discovery-based studies using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), is evaluated with a metabolomics dataset that had previously been analyzed in great detail using a brute-force approach. The previously analyzed data (referred to herein as the benchmark dataset) were intracellular extracts from Saccharomyces cerevisiae (yeast), either metabolizing glucose (repressed) or ethanol (derepressed), which define the two classes in the discovery-based analysis to find metabolites that are statistically different in concentration between the two classes. Beneficially, this previously analyzed dataset provides a concrete means to validate the tile-based F-ratio software. Herein, we demonstrate and validate the significant benefits of applying tile-based F-ratio analysis. The yeast metabolomics data are analyzed more rapidly, in about one week versus one year for the prior studies with this dataset. Furthermore, a null distribution analysis is implemented to statistically determine an adequate F-ratio threshold, whereby the variables with F-ratio values below the threshold can be ignored as not class-distinguishing, which provides the analyst with confidence when analyzing the hit table. Forty-six of the fifty-four benchmarked changing metabolites were discovered by the new methodology, while all but one of the nineteen previously identified false-positive metabolites were consistently excluded. Copyright © 2016 Elsevier B.V. All rights reserved.
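    The per-variable F-ratio and the permutation-based null threshold described above can be sketched as follows (an illustrative simplification with invented signal values; the actual software operates on GC×GC-TOFMS tiles):

```python
import random

def f_ratio(class_a, class_b):
    """Fisher ratio for one variable: between-class variance over pooled
    within-class variance (large = class-distinguishing)."""
    ma = sum(class_a) / len(class_a)
    mb = sum(class_b) / len(class_b)
    grand = (sum(class_a) + sum(class_b)) / (len(class_a) + len(class_b))
    between = len(class_a) * (ma - grand) ** 2 + len(class_b) * (mb - grand) ** 2
    within = sum((x - ma) ** 2 for x in class_a) + sum((x - mb) ** 2 for x in class_b)
    return between / within if within else float("inf")

def null_threshold(class_a, class_b, n_perm=200, quantile=0.95, seed=0):
    """Null F-ratio distribution from class-label permutations; variables
    with F-ratios below the returned quantile can be ignored."""
    rng = random.Random(seed)
    pooled = list(class_a) + list(class_b)
    nulls = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        nulls.append(f_ratio(pooled[:len(class_a)], pooled[len(class_a):]))
    nulls.sort()
    return nulls[int(quantile * (n_perm - 1))]

repressed   = [10.1, 9.8, 10.3, 10.0]   # metabolite signal, glucose class
derepressed = [15.2, 14.9, 15.5, 15.1]  # same metabolite, ethanol class
fr = f_ratio(repressed, derepressed)
thr = null_threshold(repressed, derepressed)
print(fr > thr)  # keep only metabolites that clear the permutation threshold
```

    In the tile-based workflow this thresholding is what lets the analyst trust the hit table without inspecting every variable by hand.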

  12. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    DTIC Science & Technology

    2014-12-23

    publications for benchmarking prognostics algorithms. The turbofan degradation datasets have received over seven thousand unique downloads in the last five...approaches that researchers have taken to implement prognostics using these turbofan datasets. Some unique characteristics of these datasets are also...Description of the five turbofan degradation datasets available from NASA repository. Datasets #Fault Modes #Conditions #Train Units #Test Units

  13. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    PubMed

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing and speech recognition, and are increasingly being used in clinical healthcare applications. However, few works have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks such as mortality prediction, length-of-stay prediction, and ICD-9 code group prediction using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data are used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

    PubMed Central

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

    2018-01-01

    In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888

  15. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
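    The supervised selection of test and train sets by known subtype can be sketched as follows (a hypothetical record layout, not the benchmark collection's actual format):

```python
def supervised_split(records, holdout_subtypes):
    """Supervised cross-validation split: entire subtypes (e.g. SCOP
    families) go to the test set, so test proteins are only distantly
    related to anything seen in training."""
    train = [r for r in records if r["subtype"] not in holdout_subtypes]
    test = [r for r in records if r["subtype"] in holdout_subtypes]
    return train, test

proteins = [
    {"id": "p1", "cls": "globin", "subtype": "alpha"},
    {"id": "p2", "cls": "globin", "subtype": "alpha"},
    {"id": "p3", "cls": "globin", "subtype": "beta"},
    {"id": "p4", "cls": "kinase", "subtype": "ser/thr"},
    {"id": "p5", "cls": "kinase", "subtype": "tyr"},
]
train, test = supervised_split(proteins, {"beta", "tyr"})
print([r["id"] for r in test])  # ['p3', 'p5']
```

    Unlike random k-fold splits, which scatter close homologs across train and test, this split forces the classifier to generalize to unseen subtypes, which is why it yields the lower, more realistic performance estimates reported above.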

  16. 75 FR 53951 - Fisheries of the Gulf of Mexico; Southeast Data, Assessment, and Review (SEDAR) Update; Greater...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-02

    ... is a data report which compiles and evaluates potential datasets and recommends which datasets are... add additional data points to datasets incorporated in the original SEDAR benchmark assessment and run... Conference Call Using updated datasets adopted during the Data Webinar, participants will employ assessment...

  17. Dataset-Driven Research to Support Learning and Knowledge Analytics

    ERIC Educational Resources Information Center

    Verbert, Katrien; Manouselis, Nikos; Drachsler, Hendrik; Duval, Erik

    2012-01-01

    In various research areas, the availability of open datasets is considered as key for research and application purposes. These datasets are used as benchmarks to develop new algorithms and to compare them to other algorithms in given settings. Finding such available datasets for experimentation can be a challenging task in technology enhanced…

  18. A benchmark for comparison of cell tracking algorithms

    PubMed Central

    Maška, Martin; Ulman, Vladimír; Svoboda, David; Matula, Pavel; Matula, Petr; Ederra, Cristina; Urbiola, Ainhoa; España, Tomás; Venkatesan, Subramanian; Balak, Deepak M.W.; Karas, Pavel; Bolcková, Tereza; Štreitová, Markéta; Carthel, Craig; Coraluppi, Stefano; Harder, Nathalie; Rohr, Karl; Magnusson, Klas E. G.; Jaldén, Joakim; Blau, Helen M.; Dzyubachyk, Oleh; Křížek, Pavel; Hagen, Guy M.; Pastor-Escuredo, David; Jimenez-Carretero, Daniel; Ledesma-Carbayo, Maria J.; Muñoz-Barrutia, Arrate; Meijering, Erik; Kozubek, Michal; Ortiz-de-Solorzano, Carlos

    2014-01-01

    Motivation: Automatic tracking of cells in multidimensional time-lapse fluorescence microscopy is an important task in many biomedical applications. A novel framework for objective evaluation of cell tracking algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2013 Cell Tracking Challenge. In this article, we present the logistics, datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. Results: The main contributions of the challenge include the creation of a comprehensive video dataset repository and the definition of objective measures for comparison and ranking of the algorithms. With this benchmark, six algorithms covering a variety of segmentation and tracking paradigms have been compared and ranked based on their performance on both synthetic and real datasets. Given the diversity of the datasets, we do not declare a single winner of the challenge. Instead, we present and discuss the results for each individual dataset separately. Availability and implementation: The challenge Web site (http://www.codesolorzano.com/celltrackingchallenge) provides access to the training and competition datasets, along with the ground truth of the training videos. It also provides access to Windows and Linux executable files of the evaluation software and most of the algorithms that competed in the challenge. Contact: codesolorzano@unav.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24526711

  19. Antibody-protein interactions: benchmark datasets and prediction tools evaluation

    PubMed Central

    Ponomarenko, Julia V; Bourne, Philip E

    2007-01-01

    Background The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Building on these experimental data, computational methods have been developed for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web servers developed for the prediction of antibody and protein binding sites were evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for the ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. 
Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770

  20. Food Recognition: A New Dataset, Experiments, and Results.

    PubMed

    Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo

    2017-05-01

    We propose a new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways. Each tray contains multiple instances of food classes. The dataset contains 1027 canteen trays for a total of 3616 food instances belonging to 73 food classes. The food on the tray images has been manually segmented using carefully drawn polygonal boundaries. We have benchmarked the dataset by designing an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts for each region the corresponding food class. We have experimented with three different classification strategies and several visual descriptors. We achieve about 79% food and tray recognition accuracy using convolutional-neural-network-based features. The dataset, as well as the benchmark framework, is available to the research community.

  1. MIPS bacterial genomes functional annotation benchmark dataset.

    PubMed

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality benchmark data as well as tedious preparatory work to generate the sequence parameters required as input for machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab
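    Reading such precalculated parameters from XML is straightforward with standard tools; the element names below are purely illustrative and not the actual MIPS-BFAB schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout -- illustrative element names only.
record_xml = """
<protein id="b0002">
  <funcat>01.01.03</funcat>
  <interpro><domain>IPR001048</domain><domain>IPR005882</domain></interpro>
  <similarity target="b0003" score="182.5"/>
</protein>
"""

root = ET.fromstring(record_xml)
features = {
    "funcat": root.findtext("funcat"),                       # manual annotation label
    "domains": [d.text for d in root.iter("domain")],        # InterPro composition
    "best_hit_score": float(root.find("similarity").get("score")),
}
print(features["domains"])  # ['IPR001048', 'IPR005882']
```

    Feature dictionaries assembled this way can be fed directly into a machine learning method, which is the intended use of the precalculated parameters.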

  2. A benchmark for statistical microarray data analysis that preserves actual biological and technical variance.

    PubMed

    De Hertogh, Benoît; De Meulder, Bertrand; Berger, Fabrice; Pierre, Michael; Bareke, Eric; Gaigneaux, Anthoula; Depiereux, Eric

    2010-01-11

    Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a new method using biologically relevant data to evaluate the performance of statistical methods. Our novel method ranks the probesets from a dataset composed of publicly available biological microarray data and extracts subset matrices with precise information/noise ratios. Our method can be used to determine the capability of different methods to better estimate variance for a given number of replicates. The mean-variance and mean-fold-change relationships of the matrices revealed a closer approximation of biological reality. Performance analysis refined the results from benchmarks published previously. We show that the Shrinkage t test (close to Limma) was the best of the methods tested, except when two replicates were examined, where the Regularized t test and the Window t test performed slightly better. The R scripts used for the analysis are available at http://urbm-cluster.urbm.fundp.ac.be/~bdemeulder/.

  3. Assessment of composite motif discovery methods.

    PubMed

    Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn

    2008-02-26

    Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery - discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. 
    The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges for most module discovery methods.
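    For context, the position weight matrices supplied to the module discovery programs are typically used to score sequence windows by log-odds; a minimal single-motif scan (an invented TATA-like matrix, not taken from the benchmark) looks like this:

```python
import math

def pwm_scan(seq, pwm, background=0.25):
    """Slide a position weight matrix along a sequence and return the
    log-odds score of every window (higher = better motif match)."""
    width = len(pwm)
    scores = []
    for i in range(len(seq) - width + 1):
        s = sum(math.log2(pwm[j].get(seq[i + j], 1e-6) / background)
                for j in range(width))
        scores.append((i, s))
    return scores

# Toy matrix for a TATA-like factor: one dict of base frequencies per position.
pwm = [{"T": 0.9, "A": 0.1 / 3, "C": 0.1 / 3, "G": 0.1 / 3},
       {"A": 0.9, "T": 0.1 / 3, "C": 0.1 / 3, "G": 0.1 / 3},
       {"T": 0.9, "A": 0.1 / 3, "C": 0.1 / 3, "G": 0.1 / 3},
       {"A": 0.9, "T": 0.1 / 3, "C": 0.1 / 3, "G": 0.1 / 3}]
hits = pwm_scan("GGTATAGG", pwm)
best_pos = max(hits, key=lambda h: h[1])[0]
print(best_pos)  # 2: the TATA window
```

    Composite motif discovery goes a step further, looking for clusters of high-scoring windows from several such matrices within a candidate regulatory module; the decoy matrices mixed in above test how robustly programs handle matrices with no true hits.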

  4. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    PubMed

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that have broad overall performance.

  5. Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection.

    PubMed

    Li, Zhucui; Lu, Yan; Guo, Yufeng; Cao, Haijie; Wang, Qinhong; Shui, Wenqing

    2018-10-31

    Data analysis represents a key challenge for untargeted metabolomics studies, as it commonly requires extensive processing of thousands of metabolite peaks in raw high-resolution MS data. Although a number of software packages have been developed to facilitate untargeted data processing, they have not been comprehensively scrutinized for their capabilities in feature detection, quantification and marker selection using a well-defined benchmark sample set. In this study, we acquired a benchmark dataset from standard mixtures consisting of 1100 compounds with specified concentration ratios, including 130 compounds with significant variation of concentrations. The five software packages evaluated here (MS-Dial, MZmine 2, XCMS, MarkerView, and Compound Discoverer) showed similar performance in detecting true features derived from compounds in the mixtures. However, significant differences between the packages were observed in relative quantification of true features in the benchmark dataset. MZmine 2 outperformed the other software in terms of quantification accuracy, and it reported the most true discriminating markers together with the fewest false markers. Furthermore, we assessed the selection of discriminating markers by different software using both the benchmark dataset and a real-case metabolomics dataset, and propose the combined usage of two software packages to increase confidence in biomarker identification. Our findings from this comprehensive evaluation of untargeted metabolomics software should help guide future improvements of these widely used bioinformatics tools and enable users to properly interpret their metabolomics results.

  6. Design of an Evolutionary Approach for Intrusion Detection

    PubMed Central

    2013-01-01

    A novel evolutionary approach is proposed for effective intrusion detection based on benchmark datasets. The proposed approach generates a pool of noninferior individual solutions and ensembles thereof, and the generated ensembles can be used to detect intrusions accurately. For the intrusion detection problem, the approach considers multiple conflicting objectives simultaneously, such as the detection rate of each attack class, error rate, accuracy, and diversity, and produces individual and ensemble solutions with optimized trade-offs among these objectives. A three-phase approach is proposed. In the first phase, solutions are generated using a simple chromosome design and a Pareto front of noninferior individual solutions is approximated. In the second phase, the solution set is further refined to determine effective ensemble solutions that account for solution interaction; in this phase, an improved Pareto front of ensemble solutions over that of individual solutions is approximated. The ensemble solutions on the improved Pareto front yielded improved detection results on the benchmark datasets. In the third phase, a combination method such as majority voting is used to fuse the predictions of the individual solutions into the prediction of the ensemble solution. Benchmark datasets, namely the KDD Cup 1999 and ISCX 2012 datasets, are used to demonstrate and validate the performance of the proposed approach for intrusion detection. The proposed approach can discover individual solutions and ensembles thereof with good support and detection rates from benchmark datasets (in comparison with well-known ensemble methods like bagging and boosting). In addition, the proposed approach is a generalized classification approach applicable to any field with multiple conflicting objectives whose datasets can be represented as labelled instances in terms of features. PMID:24376390
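The majority-voting fusion used in the third phase can be sketched in a few lines. This is an illustrative sketch only: the helper name `majority_vote` and the attack-class labels are hypothetical, not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse the labels predicted by individual solutions for one instance
    by taking the most frequent label (ties broken by first occurrence)."""
    return Counter(predictions).most_common(1)[0][0]

# Each inner list holds the labels assigned to one test instance by the
# ensemble's individual solutions (labels here are illustrative).
ensemble_preds = [
    ["dos", "dos", "normal"],
    ["probe", "probe", "probe"],
    ["normal", "dos", "normal"],
]
fused = [majority_vote(p) for p in ensemble_preds]  # ensemble prediction per instance
```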

  7. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine.

    PubMed

    Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun

    2016-10-01

    As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is of significant importance for both basic biomedical research and practical drug development. In this study, we designed a computational method, called TargetM6A, to rapidly and accurately target m6A sites solely from primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that TargetM6A outperformed the existing methods and remarkably improved prediction performance, with MCC = 0.526 and AUC = 0.818. We also provide a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
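A simplified sketch of the position-specific nucleotide propensity (PSNP) idea: each sequence position is scored by the difference in nucleotide frequency between positive and negative training sequences, and a sequence is encoded by its per-position propensity values. The exact normalization used by TargetM6A may differ, and the toy sequences below are purely illustrative.

```python
import numpy as np

NUCS = "ACGU"

def psnp(pos_seqs, neg_seqs):
    """Build a 4 x L propensity table: per-position nucleotide frequency
    in positive sequences minus that in negative sequences."""
    L = len(pos_seqs[0])

    def freq(seqs):
        f = np.zeros((4, L))
        for s in seqs:
            for j, c in enumerate(s):
                f[NUCS.index(c), j] += 1
        return f / len(seqs)

    return freq(pos_seqs) - freq(neg_seqs)

def encode(seq, table):
    """Encode a sequence as the propensity of its nucleotide at each position."""
    return np.array([table[NUCS.index(c), j] for j, c in enumerate(seq)])

pos = ["GACU", "GACA"]   # toy positive (m6A-site) sequences
neg = ["AACU", "CGUA"]   # toy negative sequences
table = psnp(pos, neg)
x = encode("GACU", table)  # feature vector fed to the SVM
```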

  8. Validation of Short-Term Noise Assessment Procedures: FY16 Summary of Procedures, Progress, and Preliminary Results

    DTIC Science & Technology

    Validation project. This report describes the procedure used to generate the noise models' output dataset, and then it compares that dataset to the...benchmark, the Engineer Research and Development Center's Long-Range Sound Propagation dataset. It was found that the models consistently underpredict the

  9. Systematic chemical-genetic and chemical-chemical interaction datasets for prediction of compound synergism

    PubMed Central

    Wildenhain, Jan; Spitzer, Michaela; Dolma, Sonam; Jarvik, Nick; White, Rachel; Roy, Marcia; Griffiths, Emma; Bellows, David S.; Wright, Gerard D.; Tyers, Mike

    2016-01-01

    The network structure of biological systems suggests that effective therapeutic intervention may require combinations of agents that act synergistically. However, a dearth of systematic chemical combination datasets has limited the development of predictive algorithms for chemical synergism. Here, we report two large datasets of linked chemical-genetic and chemical-chemical interactions in the budding yeast Saccharomyces cerevisiae. We screened 5,518 unique compounds against 242 diverse yeast gene deletion strains to generate an extended chemical-genetic matrix (CGM) of 492,126 chemical-gene interaction measurements. This CGM dataset contained 1,434 genotype-specific inhibitors, termed cryptagens. We selected 128 structurally diverse cryptagens and tested all pairwise combinations to generate a benchmark dataset of 8,128 pairwise chemical-chemical interaction tests for synergy prediction, termed the cryptagen matrix (CM). An accompanying database resource called ChemGRID was developed to enable analysis, visualisation and downloads of all data. The CGM and CM datasets will facilitate the benchmarking of computational approaches for synergy prediction, as well as chemical structure-activity relationship models for anti-fungal drug discovery. PMID:27874849

  10. Designing Solar Data Archives: Practical Considerations

    NASA Astrophysics Data System (ADS)

    Messerotti, M.

    The variety of new solar observatories in space and on the ground poses the stringent problem of efficiently storing and archiving huge datasets. We briefly address some typical architectures and consider the key point of data access and distribution through networking.

  11. Deriving empirical benchmarks from existing monitoring datasets for rangeland adaptive management

    USDA-ARS?s Scientific Manuscript database

    Under adaptive management, goals and decisions for managing rangeland resources are shaped by requirements like the Bureau of Land Management’s (BLM’s) Land Health Standards, which specify desired conditions. Without formalized, quantitative benchmarks for triggering management actions, adaptive man...

  12. APPLICATION OF BENCHMARK DOSE METHODOLOGY TO DATA FROM PRENATAL DEVELOPMENTAL TOXICITY STUDIES

    EPA Science Inventory

    The benchmark dose (BMD) concept was applied to 246 conventional developmental toxicity datasets from government, industry and commercial laboratories. Five modeling approaches were used, two generic and three specific to developmental toxicity (DT models). BMDs for both quantal ...

  13. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    NASA Technical Reports Server (NTRS)

    Ramasso, Emannuel; Saxena, Abhinav

    2014-01-01

    Benchmarking of prognostic algorithms has been challenging due to the limited availability of common datasets suitable for prognostics. In an attempt to alleviate this problem, several benchmarking datasets have been collected by NASA's Prognostics Center of Excellence and made available to the Prognostics and Health Management (PHM) community to allow evaluation and comparison of prognostics algorithms. Among those datasets are five C-MAPSS datasets that have been extremely popular due to unique characteristics that make them suitable for prognostics. The C-MAPSS datasets pose several challenges that have been tackled by different methods in the PHM literature. In particular, management of the high variability due to sensor noise, the effects of operating conditions, and the presence of multiple simultaneous fault modes are some factors that have a great impact on the generalization capabilities of prognostics algorithms. More than 70 publications have used the C-MAPSS datasets to develop data-driven prognostic algorithms. The C-MAPSS datasets are also shown to be well suited for the development of new machine learning and pattern recognition tools for several key preprocessing steps such as feature extraction and selection, failure mode assessment, operating condition assessment, health status estimation, uncertainty management, and prognostics performance evaluation. This paper summarizes a comprehensive literature review of publications using the C-MAPSS datasets and provides guidelines and references for further usage of these datasets in a manner that allows clear and consistent comparison between different approaches.

  14. CIFAR10-DVS: An Event-Stream Dataset for Object Classification

    PubMed Central

    Li, Hongmin; Liu, Hanchao; Ji, Xiangyang; Li, Guoqi; Shi, Luping

    2017-01-01

    Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, since they must be recorded using neuromorphic cameras, and only a limited number of event-stream datasets are currently available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty in 10 different classes, named “CIFAR10-DVS.” The conversion was implemented by a repeated closed-loop smooth (RCLS) movement of the frame-based images. Unlike conversion by moving the camera, moving the images is more realistic with respect to practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time, which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark in event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm development in event-driven pattern recognition and object classification. PMID:28611582

  15. CIFAR10-DVS: An Event-Stream Dataset for Object Classification.

    PubMed

    Li, Hongmin; Liu, Hanchao; Ji, Xiangyang; Li, Guoqi; Shi, Luping

    2017-01-01

    Neuromorphic vision research requires high-quality and appropriately challenging event-stream datasets to support continuous improvement of algorithms and methods. However, creating event-stream datasets is a time-consuming task, since they must be recorded using neuromorphic cameras, and only a limited number of event-stream datasets are currently available. In this work, by utilizing the popular computer vision dataset CIFAR-10, we converted 10,000 frame-based images into 10,000 event streams using a dynamic vision sensor (DVS), providing an event-stream dataset of intermediate difficulty in 10 different classes, named "CIFAR10-DVS." The conversion was implemented by a repeated closed-loop smooth (RCLS) movement of the frame-based images. Unlike conversion by moving the camera, moving the images is more realistic with respect to practical applications. The repeated closed-loop image movement generates rich local intensity changes in continuous time, which are quantized by each pixel of the DVS camera to generate events. Furthermore, a performance benchmark in event-driven object classification is provided based on state-of-the-art classification algorithms. This work provides a large event-stream dataset and an initial benchmark for comparison, which may boost algorithm development in event-driven pattern recognition and object classification.

  16. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets

    EPA Science Inventory

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at whic...

  17. 78 FR 27957 - Fisheries of the South Atlantic, Southeast Data, Assessment, and Review (SEDAR); Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-05-13

    ..., describes the fisheries, evaluates the status of the stock, estimates biological benchmarks, projects future.... Participants will evaluate and recommend datasets appropriate for assessment analysis, employ assessment models to evaluate stock status, estimate population benchmarks and management criteria, and project future...

  18. HEp-2 cell image classification method based on very deep convolutional networks with small datasets

    NASA Astrophysics Data System (ADS)

    Lu, Mengchi; Gao, Long; Guo, Xifeng; Liu, Qiang; Yin, Jianping

    2017-07-01

    Classification of staining patterns in Human Epithelial-2 (HEp-2) cell images has been widely used to identify autoimmune diseases via the anti-nuclear antibody (ANA) test in the Indirect Immunofluorescence (IIF) protocol. Because manual testing is time-consuming, subjective and labor-intensive, image-based Computer Aided Diagnosis (CAD) systems for HEp-2 cell classification are being developed. However, recently proposed methods mostly rely on manual feature extraction and achieve low accuracy. Besides, the available benchmark datasets are small in scale, which is not well suited to deep learning methods; this issue directly affects the accuracy of cell classification even after data augmentation. To address these issues, this paper presents a high-accuracy automatic HEp-2 cell classification method for small datasets, utilizing very deep convolutional networks (VGGNet). Specifically, the proposed method consists of three main phases, namely image preprocessing, feature extraction and classification. Moreover, an improved VGGNet is presented to address the challenges of small-scale datasets. Experimental results on two benchmark datasets demonstrate that the proposed method achieves superior accuracy compared with existing methods.

  19. Evolutionary Wavelet Neural Network ensembles for breast cancer and Parkinson's disease prediction.

    PubMed

    Khan, Maryam Mahsal; Mendes, Alexandre; Chalup, Stephan K

    2018-01-01

    Wavelet Neural Networks are a combination of neural networks and wavelets and have mostly been used in the area of time-series prediction and control. Recently, Evolutionary Wavelet Neural Networks have been employed to develop cancer prediction models. The present study proposes the use of ensembles of Evolutionary Wavelet Neural Networks. The search for a high-quality ensemble is directed by a fitness function that incorporates the accuracy of the classifiers both independently and as part of the ensemble itself. The ensemble approach is tested on three publicly available biomedical benchmark datasets, one on breast cancer and two on Parkinson's disease, using a 10-fold cross-validation strategy. Our experimental results show that, for the first dataset, the performance was similar to previous studies reported in the literature. On the second dataset, the Evolutionary Wavelet Neural Network ensembles performed better than all previous methods. The third dataset is relatively new, and this study is the first to report benchmark results.

  20. Evolutionary Wavelet Neural Network ensembles for breast cancer and Parkinson’s disease prediction

    PubMed Central

    Mendes, Alexandre; Chalup, Stephan K.

    2018-01-01

    Wavelet Neural Networks are a combination of neural networks and wavelets and have mostly been used in the area of time-series prediction and control. Recently, Evolutionary Wavelet Neural Networks have been employed to develop cancer prediction models. The present study proposes the use of ensembles of Evolutionary Wavelet Neural Networks. The search for a high-quality ensemble is directed by a fitness function that incorporates the accuracy of the classifiers both independently and as part of the ensemble itself. The ensemble approach is tested on three publicly available biomedical benchmark datasets, one on breast cancer and two on Parkinson’s disease, using a 10-fold cross-validation strategy. Our experimental results show that, for the first dataset, the performance was similar to previous studies reported in the literature. On the second dataset, the Evolutionary Wavelet Neural Network ensembles performed better than all previous methods. The third dataset is relatively new, and this study is the first to report benchmark results. PMID:29420578

  1. A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video

    DTIC Science & Technology

    2011-06-01

    orders of magnitude larger than existing datasets such as CAVIAR [7]. TRECVID 2008 airport dataset [16] contains 100 hours of video, but it provides only...entire human figure (e.g., above shoulder), amounting to 500% human to video 2Some statistics are approximate, obtained from the CAVIAR 1st scene and...and diversity in both collection sites and viewpoints. In comparison to surveillance datasets such as CAVIAR [7] and TRECVID [16] shown in Fig. 3

  2. Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors

    NASA Astrophysics Data System (ADS)

    Pernot, Pascal; Savin, Andreas

    2018-06-01

    Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end users of the expected amplitude of prediction errors attached to these methods. We show that, because the distributions of model errors are neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate the use of more informative statistics based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of error one can expect with a chosen high confidence level. These statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error of all benchmarking statistics depends on the size of the reference dataset, and systematic publication of these standard errors would be very helpful for assessing the statistical reliability of benchmarking conclusions.
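The two ECDF-based statistics advocated above can be sketched directly from a sample of benchmark errors. This is a minimal illustration: the function name, the sample errors, and the 0.5 threshold are assumed values, not from the paper.

```python
import numpy as np

def ecdf_stats(errors, threshold, confidence=0.95):
    """Summarize a benchmark's absolute errors via the empirical CDF.

    Returns:
      p_below - fraction of calculations with absolute error below `threshold`
      q_conf  - error amplitude not exceeded at the given confidence level
    """
    abs_err = np.abs(np.asarray(errors, dtype=float))
    p_below = np.mean(abs_err < threshold)
    q_conf = np.quantile(abs_err, confidence)  # empirical quantile of |error|
    return p_below, q_conf

# Toy signed errors from a hypothetical benchmark (e.g., in kcal/mol)
errors = [-0.8, 0.1, 0.3, -0.2, 1.5, 0.05, -0.6, 0.4, 0.9, -0.1]
p, q = ecdf_stats(errors, threshold=0.5)
```

Note that the mean signed error of this toy sample is close to zero even though large errors occur, which is exactly why the ECDF-based view is more informative for end users.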

  3. References and benchmarks for pore-scale flow simulated using micro-CT images of porous media and digital rocks

    NASA Astrophysics Data System (ADS)

    Saxena, Nishank; Hofmann, Ronny; Alpak, Faruk O.; Berg, Steffen; Dietderich, Jesse; Agarwal, Umang; Tandon, Kunj; Hunter, Sander; Freeman, Justin; Wilson, Ove Bjorn

    2017-11-01

    We generate a novel reference dataset to quantify the impact of numerical solvers, boundary conditions, and simulation platforms. We consider a variety of microstructures ranging from idealized pipes to digital rocks. The pore throats of the digital rocks considered are large enough to be well resolved with state-of-the-art micro-computed tomography technology. Permeability is computed using multiple numerical engines, 12 in total, including Lattice-Boltzmann, computational fluid dynamics, voxel-based, and fast semi-analytical solvers, as well as known empirical models. Thus, we provide a measure of the uncertainty associated with flow computations of digital media. Moreover, the reference and standards dataset generated is the first of its kind and can be used to test and improve new fluid flow algorithms. We find an overall good agreement between solvers for pipes of idealized cross-section shape. As expected, the disagreement increases with the complexity of the pore space. Numerical solutions for pipes with sinusoidal variation of cross section show larger variability than pipes of constant cross-section shape. We notice relatively larger variability in the computed permeability of digital rocks, with coefficients of variation of up to 25% between various solvers. Still, these differences are small given other subsurface uncertainties. The observed differences between solvers can be attributed to several causes, including differences in boundary conditions, numerical convergence criteria, and parameterization of the fundamental physics equations. Solvers that perform additional meshing of irregular pore shapes require an extra step in practical workflows, which involves skill and can introduce further uncertainty. Computation times for digital rocks vary from minutes to several days depending on the algorithm and available computational resources. We find that more stringent convergence criteria can improve solver accuracy, but at the expense of longer computation time.

  4. Benchmark datasets for 3D MALDI- and DESI-imaging mass spectrometry.

    PubMed

    Oetjen, Janina; Veselkov, Kirill; Watrous, Jeramie; McKenzie, James S; Becker, Michael; Hauberg-Lotte, Lena; Kobarg, Jan Hendrik; Strittmatter, Nicole; Mróz, Anna K; Hoffmann, Franziska; Trede, Dennis; Palmer, Andrew; Schiffler, Stefan; Steinhorst, Klaus; Aichler, Michaela; Goldin, Robert; Guntinas-Lichius, Orlando; von Eggeling, Ferdinand; Thiele, Herbert; Maedler, Kathrin; Walch, Axel; Maass, Peter; Dorrestein, Pieter C; Takats, Zoltan; Alexandrov, Theodore

    2015-01-01

    Three-dimensional (3D) imaging mass spectrometry (MS) is an analytical chemistry technique for the 3D molecular analysis of a tissue specimen, entire organ, or microbial colonies on an agar plate. 3D-imaging MS has unique advantages over existing 3D imaging techniques, offers novel perspectives for understanding the spatial organization of biological processes, and has growing potential to be introduced into routine use in both biology and medicine. Owing to the sheer quantity of data generated, the visualization, analysis, and interpretation of 3D imaging MS data remain a significant challenge. Bioinformatics research in this field is hampered by the lack of publicly available benchmark datasets needed to evaluate and compare algorithms. High-quality 3D imaging MS datasets from different biological systems were acquired at several labs, supplied with overview images and scripts demonstrating how to read them, and deposited into MetaboLights, an open repository for metabolomics data. 3D imaging MS data were collected from five samples using two types of 3D imaging MS. 3D matrix-assisted laser desorption/ionization (MALDI) imaging MS data were collected from murine pancreas, murine kidney, human oral squamous cell carcinoma, and interacting microbial colonies cultured in Petri dishes. 3D desorption electrospray ionization (DESI) imaging MS data were collected from a human colorectal adenocarcinoma. With the aim of stimulating research in the field of computational 3D imaging MS, selected high-quality 3D imaging MS datasets are provided that can be used by algorithm developers as benchmark datasets.

  5. Multi-Complementary Model for Long-Term Tracking

    PubMed Central

    Zhang, Deng; Zhang, Junchang; Xia, Chenyang

    2018-01-01

    In recent years, video target tracking algorithms have been widely used. However, many tracking algorithms do not achieve satisfactory performance, especially when dealing with problems such as object occlusions, background clutter, motion blur, low-illumination color images, and sudden illumination changes in real scenes. In this paper, we incorporate an object model based on contour information into a Staple tracker that combines the correlation filter model and color model to greatly improve tracking robustness. Since each model is responsible for tracking specific features, the three complementary models combine for more robust tracking. In addition, we propose an efficient object detection model with contour and color histogram features, which has good detection performance and better detection efficiency than traditional target detection algorithms. Finally, we optimize the traditional scale calculation, which greatly improves the tracking execution speed. We evaluate our tracker on the Object Tracking Benchmarks 2013 (OTB-13) and Object Tracking Benchmarks 2015 (OTB-15) benchmark datasets. On the OTB-13 benchmark datasets, our algorithm improves on the classic LCT (Long-term Correlation Tracking) algorithm by 4.8%, 9.6%, and 10.9% on the success plots of OPE, TRE and SRE, respectively. On the OTB-15 benchmark datasets, compared with the LCT algorithm, our algorithm achieves improvements of 10.4%, 12.5%, and 16.1% on the success plots of OPE, TRE, and SRE, respectively. At the same time, it should be emphasized that, due to the high computational efficiency of the color model and of the object detection model, which uses efficient data structures, together with the speed advantage of correlation filters, our tracking algorithm still achieves good tracking speed. PMID:29425170

  6. Search for the lepton flavour violating decay μ ^+ → e^+ γ with the full dataset of the MEG experiment

    NASA Astrophysics Data System (ADS)

    Baldini, A. M.; Bao, Y.; Baracchini, E.; Bemporad, C.; Berg, F.; Biasotti, M.; Boca, G.; Cascella, M.; Cattaneo, P. W.; Cavoto, G.; Cei, F.; Cerri, C.; Chiarello, G.; Chiri, C.; Corvaglia, A.; de Bari, A.; De Gerone, M.; Doke, T.; D'Onofrio, A.; Dussoni, S.; Egger, J.; Fujii, Y.; Galli, L.; Gatti, F.; Grancagnolo, F.; Grassi, M.; Graziosi, A.; Grigoriev, D. N.; Haruyama, T.; Hildebrandt, M.; Hodge, Z.; Ieki, K.; Ignatov, F.; Iwamoto, T.; Kaneko, D.; Kang, T. I.; Kettle, P.-R.; Khazin, B. I.; Khomutov, N.; Korenchenko, A.; Kravchuk, N.; Lim, G. M. A.; Maki, A.; Mihara, S.; Molzon, W.; Mori, Toshinori; Morsani, F.; Mtchedilishvili, A.; Mzavia, D.; Nakaura, S.; Nardò, R.; Nicolò, D.; Nishiguchi, H.; Nishimura, M.; Ogawa, S.; Ootani, W.; Orito, S.; Panareo, M.; Papa, A.; Pazzi, R.; Pepino, A.; Piredda, G.; Pizzigoni, G.; Popov, A.; Raffaelli, F.; Renga, F.; Ripiccini, E.; Ritt, S.; Rossella, M.; Rutar, G.; Sawada, R.; Sergiampietri, F.; Signorelli, G.; Simonetta, M.; Tassielli, G. F.; Tenchini, F.; Uchiyama, Y.; Venturini, M.; Voena, C.; Yamamoto, A.; Yoshida, K.; You, Z.; Yudin, Yu. V.; Zanello, D.

    2016-08-01

    The final results of the search for the lepton flavour violating decay μ^+ → e^+ γ based on the full dataset collected by the MEG experiment at the Paul Scherrer Institut in the period 2009-2013 and totalling 7.5× 10^{14} stopped muons on target are presented. No significant excess of events is observed in the dataset with respect to the expected background and a new upper limit on the branching ratio of this decay of B (μ ^+ → e^+ γ ) < 4.2 × 10^{-13} (90 % confidence level) is established, which represents the most stringent limit on the existence of this decay to date.

  7. Study on homogenization of synthetic GNSS-retrieved IWV time series and its impact on trend estimates with autoregressive noise

    NASA Astrophysics Data System (ADS)

    Klos, Anna; Pottiaux, Eric; Van Malderen, Roeland; Bock, Olivier; Bogusz, Janusz

    2017-04-01

A synthetic benchmark dataset of Integrated Water Vapour (IWV) was created within the "Data homogenisation" activity of sub-working group WG3 of the COST ES1206 Action. The benchmark dataset was built on an analysis of IWV differences between Global Positioning System (GPS) retrievals at International GNSS Service (IGS) stations and European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data (ERA-Interim). Having analysed a set of 120 series of IWV differences (ERAI-GPS) derived for IGS stations, we characterised the number of gaps and breaks for each station. Moreover, we estimated trends, significant seasonalities and the character of the residuals once the deterministic model was removed. We tested five different noise models and found that a combination of white noise and a first-order autoregressive process describes the stochastic part with good accuracy. Based on this analysis, we performed Monte Carlo simulations of 25-year-long series with two different noise types: white noise alone, and a combination of white and first-order autoregressive processes. We also added a few strictly defined offsets, creating three variants of the synthetic dataset: easy, less complicated and fully complicated. The 'Easy' dataset includes seasonal signals (annual, semi-annual, and 3- and 4-month terms where present for a particular station), offsets and white noise. The 'Less-complicated' dataset adds the combination of white and first-order autoregressive noise (AR(1)+WH). The 'Fully-complicated' dataset additionally includes a trend and gaps. In this research, we show the impact of manual homogenisation on the estimates of the trend and its error. We also cross-compare the results for the three datasets, as the synthesized noise type may strongly influence manual homogenisation and hence, when handled inappropriately, the estimated trends and their uncertainties. In the future, the synthetic dataset presented here will be used as a benchmark to test various statistical tools for the homogenisation task.
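The simulation recipe described above (seasonal terms, strictly defined offsets, and AR(1)+white noise) can be sketched in a few lines. All amplitudes, the AR coefficient, and the offset epochs below are hypothetical stand-ins for illustration, not the values used for the actual benchmark:

```python
import math
import random

def synthesize_iwv(n_days=25 * 365, phi=0.6, sigma_w=1.0, sigma_ar=0.5,
                   offsets=None, seed=42):
    """Generate a synthetic IWV-difference series: seasonal signals,
    AR(1)+white noise, and step offsets (all parameter values hypothetical)."""
    random.seed(seed)
    offsets = offsets or {}          # {epoch_index: offset_value in kg m-2}
    series = []
    ar = 0.0                         # AR(1) state
    step = 0.0                       # accumulated offsets
    for t in range(n_days):
        if t in offsets:             # strictly defined break
            step += offsets[t]
        seasonal = (1.5 * math.sin(2 * math.pi * t / 365.25)      # annual
                    + 0.5 * math.sin(2 * math.pi * t / 182.625))  # semi-annual
        ar = phi * ar + random.gauss(0.0, sigma_ar)               # AR(1) part
        noise = ar + random.gauss(0.0, sigma_w)                   # AR(1)+white
        series.append(seasonal + step + noise)
    return series

# a 'fully complicated'-style series with one 2 kg m-2 offset
s = synthesize_iwv(offsets={3000: 2.0})
```

A homogenisation tool would then be judged on how well it recovers the known offset epoch and size from such series.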

  8. A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.

    PubMed

    Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin

    2015-12-01

Face recognition with still face images has been widely studied, while research on video-based face recognition remains relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been established to benchmark all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can serve as a promising baseline for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that COX Face DB is a good benchmark database for evaluation.

  9. A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset

    NASA Astrophysics Data System (ADS)

    Schellekens, Jaap; Dutra, Emanuel; Martínez-de la Torre, Alberto; Balsamo, Gianpaolo; van Dijk, Albert; Sperna Weiland, Frederiek; Minvielle, Marie; Calvet, Jean-Christophe; Decharme, Bertrand; Eisner, Stephanie; Fink, Gabriel; Flörke, Martina; Peßenteiner, Stefanie; van Beek, Rens; Polcher, Jan; Beck, Hylke; Orth, René; Calton, Ben; Burke, Sophia; Dorigo, Wouter; Weedon, Graham P.

    2017-07-01

The dataset presented here consists of an ensemble of 10 global hydrological and land surface models for the period 1979-2012, driven by a reanalysis-based meteorological forcing dataset (0.5° resolution). The dataset represents the current state of the art in global hydrological modelling and serves as a benchmark for further improvements in the coming years. A signal-to-noise ratio analysis revealed low inter-model agreement over (i) snow-dominated regions and (ii) tropical rainforest and monsoon areas. The large uncertainty of precipitation in the tropics is not reflected in the ensemble runoff. Verification of the results against benchmark datasets for evapotranspiration, snow cover, snow water equivalent, soil moisture anomaly and total water storage anomaly, using the tools from the International Land Model Benchmarking Project (ILAMB), showed overall useful model performance, and the ensemble mean generally outperformed the single-model estimates. The results also show that there is currently no single best model for all variables and that model performance is spatially variable. In our unconstrained model runs the ensemble mean of total runoff into the ocean was 46 268 km3 yr-1 (334 kg m-2 yr-1), while the ensemble mean of total evaporation was 537 kg m-2 yr-1. All data are made available openly through a Water Cycle Integrator portal (WCI, wci.earth2observe.eu) and via direct HTTP and FTP download. The portal follows the protocols of the Open Geospatial Consortium, such as OPeNDAP, WCS and WMS. The DOI for the data is https://doi.org/10.5281/zenodo.167070.

  10. Application of IUS equipment and experience to orbit transfer vehicles of the 90's

    NASA Astrophysics Data System (ADS)

    Bangsund, E.; Keeney, J.; Cowgill, E.

    1985-10-01

This paper relates experience gained on the IUS program and the application of that experience to future orbit transfer vehicles. More specifically, it covers the implementation of the U.S. Air Force Space Division high-reliability parts standard (SAMSO STD 73-2C) and the component/system test standard (MIL-STD-1540A). Test results from parts- and component-level testing and the resulting system-level test program for fourteen IUS flight vehicles are discussed. The IUS program has had the highest compliance with these standards and thus offers a benchmark of experience for future programs demanding extreme reliability. In summary, application of the stringent parts standard has resulted in fewer failures during testing, and the stringent test standard has eliminated design problems in the hardware. Both have been costly in budget and schedule, and should be applied with flexibility.

  11. Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control

    PubMed Central

    Kirwan, Jennifer A; Weber, Ralf J M; Broadhurst, David I; Viant, Mark R

    2014-01-01

Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra and the quality of this correction assessed independently using the repeatedly measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data-processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. PMID:25977770

  12. Search for the lepton flavour violating decay μ+ → e+γ with the full dataset of the MEG experiment: MEG Collaboration

    DOE PAGES

    Baldini, A. M.; Bao, Y.; Baracchini, E.; ...

    2016-08-03

We present the final results of the search for the lepton flavour violating decay μ+ → e+γ, based on the full dataset collected by the MEG experiment at the Paul Scherrer Institut in the period 2009-2013 and totalling 7.5×10^14 stopped muons on target. No significant excess of events is observed in the dataset with respect to the expected background, and a new upper limit on the branching ratio of this decay, B(μ+ → e+γ) < 4.2×10^-13 (90% confidence level), is established, which represents the most stringent limit on the existence of this decay to date.

  13. Technical Report: Benchmarking for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, K.

    2016-01-22

    The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baldini, A. M.; Bao, Y.; Baracchini, E.

We present the final results of the search for the lepton flavour violating decay μ+ → e+γ, based on the full dataset collected by the MEG experiment at the Paul Scherrer Institut in the period 2009-2013 and totalling 7.5×10^14 stopped muons on target. No significant excess of events is observed in the dataset with respect to the expected background, and a new upper limit on the branching ratio of this decay, B(μ+ → e+γ) < 4.2×10^-13 (90% confidence level), is established, which represents the most stringent limit on the existence of this decay to date.

  15. Examining a Higher Education Funding Formula in a Time of Shifting Currents: Kentucky's Benchmark Approach

    ERIC Educational Resources Information Center

    Wall, Andrew; Frost, Robert; Smith, Ryan; Keeling, Richard

    2008-01-01

Although datasets such as the Integrated Postsecondary Education Data System are available as inputs to higher education funding formulas, these datasets can be unreliable, incomplete, or unresponsive to criteria identified by state education officials. State formulas do not always match the state's economic and human capital goals. This article analyzes…

  16. A Novel Performance Evaluation Methodology for Single-Target Trackers.

    PubMed

    Kristan, Matej; Matas, Jiri; Leonardis, Ales; Vojir, Tomas; Pflugfelder, Roman; Fernandez, Gustavo; Nebehay, Georg; Porikli, Fatih; Cehovin, Luka

    2016-11-01

This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most systematically constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.

  17. pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen

    2018-05-01

For an in-depth understanding of the functions of proteins in a cell, knowledge of their subcellular localization is indispensable. The current study is focused on human protein subcellular location prediction based on sequence information alone. Although considerable efforts have been made in this regard, the problem is far from solved. Most existing methods can deal with single-location proteins only. Yet proteins with multiple locations may have special biological functions that are particularly important for both basic research and drug design. Using multi-label theory, we present a new predictor called 'pLoc-mHum' that extracts the crucial GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on the same stringent benchmark dataset indicate that the proposed pLoc-mHum predictor is remarkably superior to iLoc-Hum, the state-of-the-art method for predicting human protein subcellular localization. To maximize the convenience of experimental scientists, a user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mHum/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Contact: xcheng@gordonlifescience.org. Supplementary data are available at Bioinformatics online.

  18. Predicting drug-target interactions by dual-network integrated logistic matrix factorization

    NASA Astrophysics Data System (ADS)

    Hao, Ming; Bryant, Stephen H.; Wang, Yanli

    2017-01-01

In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing a profile kernel matrix; (2) diffusing the drug profile kernel matrix with the drug structure kernel matrix; (3) diffusing the target profile kernel matrix with the target sequence kernel matrix; and (4) building the DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under the precision-recall curve) and AUC (area under the receiver operating characteristic curve) over 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends not only on the proposed objective function, but also on the nonlinear diffusion technique, which is important but understudied in the DTI prediction field. In addition, we compile a new DTI dataset to increase the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.
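As a rough illustration of the logistic-matrix-factorization core of step (4), the sketch below trains drug and target latent factors with a simplified weighted logistic loss. It omits the kernel-diffusion and neighbour-smoothing steps that distinguish DNILMF, and all hyperparameter values and the toy interaction matrix are hypothetical:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_lmf(Y, k=4, lr=0.05, reg=0.1, c=5.0, iters=500, seed=0):
    """Minimal logistic matrix factorization sketch (not the full DNILMF
    model). Y is a 0/1 drug-target interaction matrix; observed interactions
    are up-weighted by the importance constant c, in the spirit of LMF-style
    DTI models. Returns drug factors U and target factors V."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(iters):
        for i in range(n):
            for j in range(m):
                p = sigmoid(sum(U[i][f] * V[j][f] for f in range(k)))
                w = c if Y[i][j] else 1.0          # importance weighting
                g = w * (Y[i][j] - p)              # weighted logistic gradient
                for f in range(k):
                    U[i][f] += lr * (g * V[j][f] - reg * U[i][f])
                    V[j][f] += lr * (g * U[i][f] - reg * V[j][f])
    return U, V

# toy 3-drug x 3-target interaction matrix (hypothetical)
Y = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
U, V = train_lmf(Y)
```

The predicted interaction probability for a drug-target pair is then sigmoid(U[i] · V[j]), which DNILMF would further smooth using neighbour information.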

  19. Cancer Detection in Microarray Data Using a Modified Cat Swarm Optimization Clustering Approach

    PubMed

    M, Pandi; R, Balamurugan; N, Sadhasivam

    2017-12-29

Objective: A better understanding of functional genomics can be obtained by extracting patterns hidden in gene expression data. This could have paramount implications for cancer diagnosis, gene treatments and other domains. Clustering may reveal natural structures and identify interesting patterns in the underlying data. The main objective of this research was to derive a heuristic approach to detecting highly co-expressed cancer-related genes from gene expression data with minimum mean squared error (MSE). Methods: A modified CSO algorithm using Harmony Search (MCSO-HS) was applied to cluster cancer gene expression data. Experimental results were analyzed on two cancer gene expression benchmark datasets, for leukaemia and breast cancer. Results: MCSO-HS improved on HS and CSO by 13% and 9%, respectively, on the leukaemia dataset, and by 22% and 17% on the breast cancer dataset, in terms of MSE. Conclusion: The results showed MCSO-HS to outperform HS and CSO on both benchmark datasets. To validate the clustering results, this work was tested with internal and external cluster validation indices. This work also includes biological validation of clusters with gene ontology in terms of function, process and component.

  20. Metric Evaluation Pipeline for 3d Modeling of Urban Scenes

    NASA Astrophysics Data System (ADS)

    Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.

    2017-05-01

Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state-of-the-art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high-resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger-scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline, developed as publicly available open source software, to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software are made publicly available to enable further research and planned benchmarking activities.

  1. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies.

    PubMed

    Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander

    2017-09-09

The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature, and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies, a means of assessing statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.

  2. A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection.

    PubMed

    Li, Jia; Xia, Changqun; Chen, Xiaowu

    2017-10-12

Image-based salient object detection (SOD) has been extensively studied in past decades. However, video-based SOD is much less explored due to the lack of large-scale video datasets within which salient objects are unambiguously defined and annotated. Toward this end, this paper proposes a video-based SOD dataset that consists of 200 videos. In constructing the dataset, we manually annotate all objects and regions over 7,650 uniformly sampled keyframes and collect the eye-tracking data of 23 subjects who free-view all videos. From the user data, we find that salient objects in a video can be defined as objects that consistently pop out throughout the video, and objects with such attributes can be unambiguously annotated by combining manually annotated object/region masks with eye-tracking data of multiple subjects. To the best of our knowledge, it is currently the largest dataset for video-based salient object detection. Based on this dataset, this paper proposes an unsupervised baseline approach for video-based SOD by using saliency-guided stacked autoencoders. In the proposed approach, multiple spatiotemporal saliency cues are first extracted at the pixel, superpixel and object levels. With these saliency cues, stacked autoencoders are constructed in an unsupervised manner that automatically infers a saliency score for each pixel by progressively encoding the high-dimensional saliency cues gathered from the pixel and its spatiotemporal neighbors. In experiments, the proposed unsupervised approach is compared with 31 state-of-the-art models on the proposed dataset and outperforms 30 of them, including 19 image-based classic (unsupervised or non-deep learning) models, six image-based deep learning models, and five video-based unsupervised models. Moreover, benchmarking results show that the proposed dataset is very challenging and has the potential to boost the development of video-based SOD.

  3. A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images.

    PubMed

    Vázquez, David; Bernal, Jorge; Sánchez, F Javier; Fernández-Esparrach, Gloria; López, Antonio M; Romero, Adriana; Drozdzal, Michal; Courville, Aaron

    2017-01-01

Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search of polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are the polyp miss rate and the inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing decision support systems (DSS) that help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image segmentation, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. The proposed dataset consists of 4 relevant classes to inspect the endoluminal scene, targeting different clinical needs. Together with the dataset, and taking advantage of advances in the semantic segmentation literature, we provide new baselines by training standard fully convolutional networks (FCNs). We perform a comparative study to show that FCNs significantly outperform, without any further postprocessing, prior results in endoluminal scene segmentation, especially with respect to polyp segmentation and localization.

  4. Spectral relative standard deviation: a practical benchmark in metabolomics.

    PubMed

    Parsons, Helen M; Ekman, Drew R; Collette, Timothy W; Viant, Mark R

    2009-03-01

Metabolomics datasets, by definition, comprise measurements of large numbers of metabolites. Both technical (analytical) and biological factors induce variation within these measurements that is not consistent across all metabolites. Consequently, criteria derived from all the detected metabolites are required to assess the reproducibility of metabolomics datasets. Here we calculate spectrum-wide relative standard deviations (RSDs; also termed coefficients of variation, CV) for ten metabolomics datasets, spanning a variety of sample types from mammals, fish, invertebrates and a cell line, and display them succinctly as boxplots. We demonstrate multiple applications of spectral RSDs for characterising technical as well as inter-individual biological variation: for optimising metabolite extractions, comparing analytical techniques, investigating matrix effects, and comparing biofluids and tissue extracts from single and multiple species to optimise experimental design. Technical variation within metabolomics datasets, recorded using one- and two-dimensional NMR and mass spectrometry, ranges from 1.6 to 20.6% (reported as the median spectral RSD). Inter-individual biological variation is typically larger, ranging from as low as 7.2% for tissue extracts from laboratory-housed rats to 58.4% for fish plasma. In addition, for some of the datasets we confirm that the spectral RSD values are largely invariant across different spectral processing methods, such as baseline correction, normalisation and binning resolution. In conclusion, we propose spectral RSDs and the median values contained herein as practical benchmarks for metabolomics studies.
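The spectral RSD itself is straightforward to compute. A minimal sketch, assuming replicate spectra share a common peak list, with toy intensity values standing in for real peak intensities:

```python
import statistics

def spectral_rsd(replicate_spectra):
    """Per-metabolite relative standard deviation (RSD = 100 * sd / mean)
    across replicate measurements. The median over all metabolites is the
    single benchmark figure proposed in the paper. `replicate_spectra` is a
    list of replicate spectra, each a list of peak intensities aligned to the
    same peak list (toy values here)."""
    n_metabolites = len(replicate_spectra[0])
    rsds = []
    for m in range(n_metabolites):
        vals = [spec[m] for spec in replicate_spectra]
        mean = statistics.mean(vals)
        sd = statistics.stdev(vals)       # sample standard deviation
        rsds.append(100.0 * sd / mean)
    return rsds, statistics.median(rsds)

# three replicate spectra, three peaks each (hypothetical intensities)
replicates = [[100.0, 50.0, 10.0],
              [110.0, 55.0,  9.0],
              [ 90.0, 45.0, 11.0]]
rsds, median_rsd = spectral_rsd(replicates)  # every peak here has RSD = 10%
```

A boxplot of `rsds` then summarises the whole spectrum's reproducibility at a glance, as the paper does for its ten datasets.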

  5. The Harvard organic photovoltaic dataset

    DOE PAGES

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; ...

    2016-09-27

    Presented in this work is the Harvard Organic Photovoltaic Dataset (HOPV15), a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  6. The Harvard organic photovoltaic dataset

    PubMed Central

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  7. The Harvard organic photovoltaic dataset.

    PubMed

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  8. Benchmarking Foot Trajectory Estimation Methods for Mobile Gait Analysis

    PubMed Central

    Ollenschläger, Malte; Roth, Nils; Klucken, Jochen

    2017-01-01

Mobile gait analysis systems based on inertial sensing on the shoe are applied in a wide range of applications. Especially for medical applications, they can give new insights into motor impairment in, e.g., neurodegenerative disease and help objectify patient assessment. One key component in these systems is the reconstruction of the foot trajectories from inertial data. In the literature, various methods for this task have been proposed. However, performance is evaluated on a variety of datasets due to the lack of large, generally accepted benchmark datasets. This hinders a fair comparison of methods. In this work, we implement three orientation estimation and three double integration schemes for use in a foot trajectory estimation pipeline. All methods are drawn from the literature and evaluated against a marker-based motion capture reference. We provide a fair comparison on the same dataset consisting of 735 strides from 16 healthy subjects. As a result, the implemented methods are ranked and we identify the most suitable processing pipeline for foot trajectory estimation in the context of mobile gait analysis. PMID:28832511

  9. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    PubMed

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark of 983 structurally aligned pairs from the PREFAB dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS web server is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  10. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Ediger, David; Jiang, Karl

    2009-02-15

We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
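The serial kernel that the paper parallelizes is Brandes' betweenness-centrality algorithm. A minimal single-threaded sketch for unweighted graphs follows; the lock-free parallel data structures and cache-locality optimizations are the paper's contribution and are not reproduced here:

```python
from collections import deque

def betweenness(adj):
    """Serial Brandes algorithm for unweighted graphs. `adj` maps each
    vertex to a list of neighbours. Returns unnormalized betweenness
    scores (each ordered source counted separately)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        pred = {v: [] for v in adj}          # shortest-path predecessors
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:                             # BFS counting shortest paths
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                         # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# path graph 0-1-2: vertex 1 lies on every shortest path between 0 and 2
path = {0: [1], 1: [0, 2], 2: [1]}
bc = betweenness(path)
```

The per-source BFS phases are independent, which is what makes the kernel amenable to the lock-free parallelization the paper describes.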

  11. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madduri, Kamesh; Ediger, David; Jiang, Karl

    2009-05-29

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
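The parallel algorithm above builds on the standard sequential kernel for betweenness centrality, Brandes' algorithm. As a point of reference, here is a minimal single-threaded sketch of that kernel for unweighted graphs (the graph and function names are illustrative, not taken from the paper's implementation):

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs.

    adj: dict mapping each vertex to an iterable of neighbours.
    Returns unnormalised betweenness scores per vertex.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and depths (dist).
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

The lock-free contribution of the paper concerns how the `sigma`/`preds` updates are made safe under concurrent BFS expansion, which this sequential sketch does not attempt to show.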

  12. Feature pruning by upstream drainage area to support automated generalization of the United States National Hydrography Dataset

    USGS Publications Warehouse

    Stanislawski, L.V.

    2009-01-01

    The United States Geological Survey has been researching generalization approaches to enable multiple-scale display and delivery of geographic data. This paper presents automated methods to prune network and polygon features of the United States high-resolution National Hydrography Dataset (NHD) to lower resolutions. Feature-pruning rules, data enrichment, and partitioning are derived from knowledge of surface water, the NHD model, and associated feature specification standards. Relative prominence of network features is estimated from upstream drainage area (UDA). Network and polygon features are pruned by UDA and NHD reach code to achieve a drainage density appropriate for any less detailed map scale. Data partitioning maintains local drainage density variations that characterize the terrain. For demonstration, a 48-subbasin area of 1:24 000-scale NHD was pruned to 1:100 000-scale (100 K) and compared to a benchmark, the 100 K NHD. The coefficient of line correspondence (CLC) is used to evaluate how well pruned network features match the benchmark network. CLC values of 0.82 and 0.77 result from pruning with and without partitioning, respectively. The number of polygons that remain after pruning is about seven times that of the benchmark, but the area covered by the polygons that remain after pruning is only about 10% greater than the area covered by benchmark polygons. © 2009.
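The core pruning step described above, dropping network features whose upstream drainage area falls below a scale-dependent threshold, can be sketched as follows. The field names (`uda`, `reach_code`) and the threshold value are illustrative assumptions, not the NHD schema:

```python
def prune_by_uda(features, uda_threshold):
    """Keep only network features whose upstream drainage area (UDA)
    meets the target scale's threshold.

    features: list of dicts with hypothetical 'uda' (e.g. km^2) and
    'reach_code' keys. A real implementation would also apply the
    reach-code rules and partition-specific thresholds from the paper.
    """
    return [f for f in features if f['uda'] >= uda_threshold]

feats = [{'reach_code': '01', 'uda': 120.0},
         {'reach_code': '02', 'uda': 3.5},
         {'reach_code': '03', 'uda': 48.0}]
kept = prune_by_uda(feats, 10.0)  # only the two larger-UDA reaches survive
```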

  13. Integrative Analysis of High-throughput Cancer Studies with Contrasted Penalization

    PubMed Central

    Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Shia, BenChang; Ma, Shuangge

    2015-01-01

    In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms “classic” meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances beyond published methods by introducing contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance. PMID:24395534
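The idea of a contrast penalty, sparsity within each dataset plus smoothing of each covariate's coefficients across datasets, can be illustrated with a toy penalty function. This is a generic sketch of the concept, not the paper's exact penalty or its coordinate-descent solver:

```python
def contrasted_penalty(B, lam1, lam2):
    """Toy contrast-style penalty for M datasets and p covariates.

    B: list of M coefficient rows (one per dataset), each of length p.
    lam1 weights within-dataset sparsity (lasso-like term);
    lam2 penalises disagreement of the same covariate's coefficient
    across datasets (the across-dataset smoothing term).
    """
    sparsity = lam1 * sum(abs(b) for row in B for b in row)
    contrast = lam2 * sum((B[m][j] - B[k][j]) ** 2
                          for m in range(len(B))
                          for k in range(m + 1, len(B))
                          for j in range(len(B[0])))
    return sparsity + contrast
```

With identical coefficient rows the contrast term vanishes, so only sparsity is charged; rows that disagree on a covariate pay an extra smoothing cost, which is what encourages coherent marker selection across studies.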

  14. RBscore&NBench: a high-level web server for nucleic acid binding residues prediction with a large-scale benchmarking database.

    PubMed

    Miao, Zhichao; Westhof, Eric

    2016-07-08

    RBscore&NBench combines a web server, RBscore and a database, NBench. RBscore predicts RNA-/DNA-binding residues in proteins and visualizes the prediction scores and features on protein structures. The scoring scheme of RBscore directly links feature values to nucleic acid binding probabilities and illustrates the nucleic acid binding energy funnel on the protein surface. To avoid biases from the choice of dataset, binding site definition and assessment metric, we compared RBscore with 18 web servers and 3 stand-alone programs on 41 datasets, which demonstrated the high and stable accuracy of RBscore. A comprehensive comparison led us to develop a benchmark database named NBench. The web server is available on: http://ahsoka.u-strasbg.fr/rbscorenbench/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    NASA Astrophysics Data System (ADS)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  16. mPLR-Loc: an adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2015-03-15

    Localization of proteins to the appropriate cellular compartments is of paramount importance for them to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). Copyright © 2014 Elsevier Inc. All rights reserved.
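The feature-construction step described above, turning GO term occurrence frequencies into a fixed-length vector, can be sketched as follows. The GO accessions and vocabulary here are illustrative examples, not the actual mPLR-Loc feature space:

```python
def go_feature_vector(protein_go_terms, go_vocabulary):
    """Map a protein's GO annotations to a frequency vector over a
    fixed GO term vocabulary (terms and vocabulary are illustrative).
    """
    counts = [protein_go_terms.count(term) for term in go_vocabulary]
    total = sum(counts)
    # Normalise to frequencies; an all-zero vector stays zero.
    return [c / total if total else 0.0 for c in counts]

vocab = ['GO:0009507', 'GO:0005634', 'GO:0005739']  # example accessions
vec = go_feature_vector(['GO:0009507', 'GO:0009507', 'GO:0005634'], vocab)
```

In the actual predictor these vectors feed a penalized logistic regression classifier with an adaptive decision threshold for multi-label output.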

  17. pLoc-mPlant: predict subcellular localization of multi-location plant proteins by incorporating the optimal GO information into general PseAAC.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen

    2017-08-22

    One of the fundamental goals in cellular biochemistry is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. To realize this, it is indispensable to develop an automated method for fast and accurate identification of the subcellular locations of uncharacterized proteins. The current study is focused on plant protein subcellular location prediction based on the sequence information alone. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most of the existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions. This kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mPlant" by extracting the optimal GO (Gene Ontology) information into Chou's general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mPlant predictor is remarkably superior to iLoc-Plant, the state-of-the-art method for predicting plant protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at , by which users can easily get their desired results without the need to go through the complicated mathematics involved.

  18. pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen

    2017-09-10

    Knowledge of subcellular locations of proteins is crucially important for in-depth understanding of their functions in a cell. With the explosive growth of protein sequences generated in the postgenomic age, it is highly demanded to develop computational tools for timely annotating their subcellular locations based on the sequence information alone. The current study is focused on virus proteins. Although considerable efforts have been made in this regard, the problem is far from being solved yet. Most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions. This kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mVirus" by extracting the optimal GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mVirus predictor is remarkably superior to iLoc-Virus, the state-of-the-art method in predicting virus protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mVirus/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. MatchingLand, geospatial data testbed for the assessment of matching methods.

    PubMed

    Xavier, Emerson M A; Ariza-López, Francisco J; Ureña-Cámara, Manuel A

    2017-12-05

    This article presents datasets prepared to help evaluate geospatial matching methods for vector data. These datasets were built up from mapping data produced by official Spanish mapping agencies. The testbed supplied encompasses the three geometry types: point, line and area. Initial datasets were submitted to geometric transformations in order to generate synthetic datasets. These transformations represent factors that might influence the performance of geospatial matching methods, like the morphology of linear or areal features, systematic transformations, and random disturbance over initial data. We call our 11 GiB benchmark data 'MatchingLand' and we hope it can be useful for the geographic information science research community.
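The two transformation families mentioned above, systematic shifts and random disturbance, can be sketched for point geometries as follows (parameter values are illustrative, not taken from MatchingLand itself):

```python
import random

def perturb_points(points, dx=0.0, dy=0.0, noise=0.0, seed=0):
    """Generate a synthetic matching dataset by applying a systematic
    shift (dx, dy) plus uniform random disturbance of amplitude
    `noise` to each vertex. Seeded for reproducibility."""
    rng = random.Random(seed)
    return [(x + dx + rng.uniform(-noise, noise),
             y + dy + rng.uniform(-noise, noise)) for x, y in points]

src = [(0.0, 0.0), (10.0, 5.0)]
shifted = perturb_points(src, dx=2.0, dy=-1.0)  # pure systematic shift
noisy = perturb_points(src, noise=0.5)          # random disturbance only
```

A matching method is then evaluated by how reliably it pairs each perturbed feature with its source feature as the shift and noise grow.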

  20. PredictSNP: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations

    PubMed Central

    Bendl, Jaroslav; Stourac, Jan; Salanda, Ondrej; Pavelka, Antonin; Wieben, Eric D.; Zendulka, Jaroslav; Brezovsky, Jan; Damborsky, Jiri

    2014-01-01

    Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for the prediction of the effects of mutations on protein function are very important for analysis of single nucleotide variants and their prioritization for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement is hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicities, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier PredictSNP, resulting into significantly improved prediction performance, and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp. PMID:24453961
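The consensus idea behind PredictSNP can be illustrated with a simple majority vote over the individual tools' calls. Note this is a deliberately simplified, unweighted sketch; the actual PredictSNP consensus weights tools by their confidence scores:

```python
def consensus_predict(tool_predictions):
    """Unweighted majority vote over individual tools' calls
    ('deleterious' / 'neutral'); ties fall back to 'neutral'
    in this sketch."""
    votes = sum(1 if p == 'deleterious' else -1 for p in tool_predictions)
    return 'deleterious' if votes > 0 else 'neutral'
```

Because every tool always returns some call, the consensus also returns a result for every mutation, which is one of the practical advantages reported for the combined classifier.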

  1. Bio-inspired benchmark generator for extracellular multi-unit recordings

    PubMed Central

    Mondragón-González, Sirenia Lizbeth; Burguière, Eric

    2017-01-01

    The analysis of multi-unit extracellular recordings of brain activity has led to the development of numerous tools, ranging from signal processing algorithms to electronic devices and applications. Currently, the evaluation and optimisation of these tools are hampered by the lack of ground-truth databases of neural signals. These databases must be parameterisable, easy to generate and bio-inspired, i.e. containing features encountered in real electrophysiological recording sessions. Towards that end, this article introduces an original computational approach to create fully annotated and parameterised benchmark datasets, generated from the summation of three components: neural signals from compartmental models and recorded extracellular spikes, non-stationary slow oscillations, and a variety of different types of artefacts. We present three application examples. (1) We reproduced in-vivo extracellular hippocampal multi-unit recordings from either tetrode or polytrode designs. (2) We simulated recordings in two different experimental conditions: anaesthetised and awake subjects. (3) Last, we also conducted a series of simulations to study the impact of different level of artefacts on extracellular recordings and their influence in the frequency domain. Beyond the results presented here, such a benchmark dataset generator has many applications such as calibration, evaluation and development of both hardware and software architectures. PMID:28233819
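The three-component summation described above (spikes plus slow oscillation plus artefacts) can be sketched as a toy generator. All waveform shapes, amplitudes and rates here are illustrative placeholders, not the paper's compartmental-model signals:

```python
import math
import random

def synth_recording(n, fs=30000.0, seed=0):
    """Toy ground-truth generator: Gaussian-shaped spikes + a slow
    1 Hz oscillation + rare step artefacts, summed sample by sample.
    Returns the trace and the annotated spike times (the ground truth)."""
    rng = random.Random(seed)
    spike_times = sorted(rng.sample(range(100, n - 100), 5))
    trace = []
    for i in range(n):
        slow = 0.2 * math.sin(2 * math.pi * 1.0 * i / fs)      # slow drift
        spikes = sum(math.exp(-((i - t) / 15.0) ** 2) for t in spike_times)
        artefact = 0.5 if rng.random() < 1e-4 else 0.0         # rare glitch
        trace.append(slow + spikes + artefact)
    return trace, spike_times
```

Because the spike times are known by construction, a spike-sorting or detection tool run on `trace` can be scored exactly, which is the point of a fully annotated benchmark generator.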

  2. Lessons Learned over Four Benchmark Exercises from the Community Structure-Activity Resource

    PubMed Central

    Carlson, Heather A.

    2016-01-01

    Preparing datasets and analyzing the results is difficult and time-consuming, and I hope the points raised here will help other scientists avoid some of the thorny issues we wrestled with. PMID:27345761

  3. Spectral Relative Standard Deviation: A Practical Benchmark in Metabolomics

    EPA Science Inventory

    Metabolomics datasets, by definition, comprise measurements of large numbers of metabolites. Both technical (analytical) and biological factors will induce variation within these measurements that is not consistent across all metabolites. Consequently, criteria are required to...

  4. Pse-Analysis: a python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods.

    PubMed

    Liu, Bin; Wu, Hao; Zhang, Deyuan; Wang, Xiaolong; Chou, Kuo-Chen

    2017-02-21

    To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All the work a user needs to do is to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor, followed by yielding the predicted results for the submitted query samples. All the aforementioned tedious jobs can be automatically done by the computer. Moreover, the multiprocessing technique was adopted to speed up computation about six-fold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.
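The pipeline shape described above (feature extraction followed by training and prediction) can be illustrated with a self-contained toy: k-mer composition features, a stand-in for pseudo components, plus a nearest-centroid classifier. This is a conceptual sketch only and does not use the actual Pse-Analysis API:

```python
from itertools import product

def kmer_features(seq, k):
    """Step 1: composition-style features (a stand-in for the package's
    pseudo-component features)."""
    kmers = [''.join(p) for p in product('ACGT', repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in counts:
            counts[seq[i:i + k]] += 1
    total = max(1, len(seq) - k + 1)
    return [counts[km] / total for km in kmers]

def centroid_classify(train, query):
    """Steps 3 and 5 in miniature: 'train' a nearest-centroid model
    from labelled feature vectors, then predict a query's label."""
    centroids = {label: [sum(col) / len(vecs) for col in zip(*vecs)]
                 for label, vecs in train.items()}
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lb: dist(centroids[lb], query))
```

Parameter selection and cross validation (steps 2 and 4) would wrap these calls in a loop over `k` and data folds; Pse-Analysis automates all five steps, and additionally parallelises them with multiprocessing.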

  5. Evaluation of state-of-the-art segmentation algorithms for left ventricle infarct from late Gadolinium enhancement MR images.

    PubMed

    Karim, Rashed; Bhagirath, Pranav; Claus, Piet; James Housden, R; Chen, Zhong; Karimaghaloo, Zahra; Sohn, Hyon-Mok; Lara Rodríguez, Laura; Vera, Sergio; Albà, Xènia; Hennemuth, Anja; Peitgen, Heinz-Otto; Arbel, Tal; Gonzàlez Ballester, Miguel A; Frangi, Alejandro F; Götte, Marco; Razavi, Reza; Schaeffter, Tobias; Rhode, Kawal

    2016-05-01

    Studies have demonstrated the feasibility of late Gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging for guiding the management of patients with sequelae to myocardial infarction, such as ventricular tachycardia and heart failure. Clinical implementation of these developments necessitates a reproducible and reliable segmentation of the infarcted regions. It is challenging to compare new algorithms for infarct segmentation in the left ventricle (LV) with existing algorithms. Benchmarking datasets with evaluation strategies are much needed to facilitate comparison. This manuscript presents a benchmarking evaluation framework for future algorithms that segment infarct from LGE CMR of the LV. The image database consists of 30 LGE CMR images of both humans and pigs that were acquired from two separate imaging centres. A consensus ground truth was obtained for all data using maximum likelihood estimation. Six widely-used fixed-thresholding methods and five recently developed algorithms are tested on the benchmarking framework. Results demonstrate that the algorithms have better overlap with the consensus ground truth than most of the n-SD fixed-thresholding methods, with the exception of the Full-Width-at-Half-Maximum (FWHM) fixed-thresholding method. Some of the pitfalls of fixed thresholding methods are demonstrated in this work. The benchmarking evaluation framework, which is a contribution of this work, can be used to test and benchmark future algorithms that detect and quantify infarct in LGE CMR images of the LV. The datasets, ground truth and evaluation code have been made publicly available through the website: https://www.cardiacatlas.org/web/guest/challenges. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
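The two families of fixed-thresholding methods evaluated above have simple definitions, sketched here on toy intensity values (the remote-region statistics and the choice of n are illustrative):

```python
import statistics

def nsd_threshold(remote_intensities, n):
    """n-SD fixed thresholding: voxels brighter than
    mean(remote) + n * SD(remote), computed over a healthy 'remote'
    myocardium region, are labelled infarct."""
    mu = statistics.mean(remote_intensities)
    sd = statistics.stdev(remote_intensities)
    return mu + n * sd

def fwhm_threshold(max_infarct_intensity):
    """Full-Width-at-Half-Maximum: threshold at half the maximal
    intensity within the hyper-enhanced region."""
    return max_infarct_intensity / 2.0

remote = [100, 102, 98, 101, 99]          # toy remote-region intensities
candidates = [95, 150, 210, 240]          # toy candidate voxel values
segmented = [v for v in candidates if v > nsd_threshold(remote, 5)]
```

The benchmarking result quoted above, that FWHM was the only fixed-thresholding method competitive with the newer algorithms, reflects how sensitive the n-SD family is to the remote-region statistics and the choice of n.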

  6. Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies.

    PubMed

    Thorsen, Jonathan; Brejnrod, Asker; Mortensen, Martin; Rasmussen, Morten A; Stokholm, Jakob; Al-Soud, Waleed Abu; Sørensen, Søren; Bisgaard, Hans; Waage, Johannes

    2016-11-25

    There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources. Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power. Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. 
Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We also provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.
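The permutation strategy used in this benchmark, running a differential-abundance test on data whose case/control labels have been shuffled so that no real signal exists, yields a direct estimate of a method's false positive rate. A generic sketch (the `p_value_fn` interface is an assumption for illustration):

```python
import random

def false_positive_rate(p_value_fn, data, labels,
                        n_perm=200, alpha=0.05, seed=0):
    """Estimate a test's false positive rate by permuting case/control
    labels (destroying any real association) and counting how often
    the test still reports p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        if p_value_fn(data, perm) < alpha:
            hits += 1
    return hits / n_perm
```

A well-calibrated test should come out near `alpha`; the study's finding is that several popular methods land well above it on sparse 16S count data.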

  7. Deep Filter Banks for Texture Recognition, Description, and Segmentation.

    PubMed

    Cimpoi, Mircea; Maji, Subhransu; Kokkinos, Iasonas; Vedaldi, Andrea

    Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and Fisher vectors, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. We obtain in this manner state-of-the-art performance in numerous datasets well beyond textures, an efficient method to apply deep features to image regions, as well as benefit in transferring features from one domain to another.

  8. Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

    PubMed

    Kerepesi, Csaba; Grolmusz, Vince

    2016-05-01

    DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assign phylogenetic composition-information to the dataset. Here we evaluate three metagenomic analysis software packages (AmphoraNet, a web server implementation of AMPHORA2; MG-RAST; and MEGAN5) for their ability to assign quantitative phylogenetic information to the data, describing the frequency of appearance of the microorganisms of the same taxa in the sample. The difficulties of the task arise from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software assign higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software to this "genome length bias." Therefore, we have made a simple benchmark for the evaluation of the "taxon-counting" in a metagenomic sample: we took the same number of copies of three full bacterial genomes of different lengths, broke them up randomly into short reads of average length 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a software package fails on that simple task, it will surely fail on most real metagenomes. We applied the three software packages to the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We have found that AMPHORA2/AmphoraNet gave the most accurate results and that the other two packages under-performed: they reliably assigned each short read to its respective taxon, but in doing so produced the typical genome length bias in the taxon counts. The benchmark dataset is available at http://pitgroup.org/static/3RandomGenome-100kavg150bps.fna.
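The benchmark construction, shredding equal copies of genomes into short reads, is easy to reproduce in miniature. The genome strings and read counts below are toy placeholders; note how the longer genome necessarily yields proportionally more reads, which is exactly what produces the genome length bias in naive read counting:

```python
import random

def shred(genome, read_len=150, n_reads=100, seed=0):
    """Break a genome string into fixed-length reads drawn at random
    positions (a simplified version of the benchmark construction)."""
    rng = random.Random(seed)
    return [genome[i:i + read_len]
            for i in (rng.randrange(len(genome) - read_len + 1)
                      for _ in range(n_reads))]

short_genome = 'AC' * 5000    # 10 kb toy genome
long_genome = 'GT' * 20000    # 40 kb toy genome
# Equal copy numbers, but reads in proportion to genome length:
reads = shred(short_genome, n_reads=100) + shred(long_genome, n_reads=400)
```

A length-unaware tool that simply counts reads per taxon would report the long genome as four times more abundant, even though both organisms are present in equal copy numbers.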

  9. Decibel: The Relational Dataset Branching System

    PubMed Central

    Maddox, Michael; Goehring, David; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya; Deshpande, Amol

    2017-01-01

    As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This results not only in wasted storage, but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs. PMID:28149668
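The core idea, storing branch heads as chains of deltas rather than full dataset copies, can be sketched with a minimal in-memory version store. This is a conceptual toy, not Decibel's relational storage engine:

```python
class VersionedDataset:
    """Minimal branch-and-commit store over sets of record IDs; each
    commit keeps a delta (added, removed) against its parent rather
    than a full copy, avoiding the wasted storage described above."""

    def __init__(self):
        # commit id -> (parent id, added records, removed records)
        self.commits = {0: (None, frozenset(), frozenset())}
        self.branches = {'master': 0}
        self._next = 1

    def commit(self, branch, added=(), removed=()):
        parent = self.branches[branch]
        cid = self._next
        self._next += 1
        self.commits[cid] = (parent, frozenset(added), frozenset(removed))
        self.branches[branch] = cid
        return cid

    def branch(self, name, from_branch):
        """Branching is O(1): just point at the source branch's head."""
        self.branches[name] = self.branches[from_branch]

    def records(self, branch):
        """Materialise a branch head by replaying deltas from the root,
        giving full provenance of how the state was reached."""
        chain, cid = [], self.branches[branch]
        while cid is not None:
            chain.append(self.commits[cid])
            cid = self.commits[cid][0]
        state = set()
        for parent, added, removed in reversed(chain):
            state -= removed
            state |= added
        return state
```

Decibel's actual contribution is evaluating how such deltas are laid out in a relational engine so queries across versions stay fast; this sketch only shows the versioning semantics.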

  10. Traffic sign classification with dataset augmentation and convolutional neural network

    NASA Astrophysics Data System (ADS)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, firstly we transfer a color image into grayscale, and then normalize it to the range (-1, 1) as the preprocessing step. To increase robustness of the classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connected layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.
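The preprocessing step described above (grayscale conversion, then normalisation to (-1, 1)) is straightforward; a pure-Python sketch, assuming 8-bit RGB input and the usual luminance weights (the paper does not specify its exact grayscale coefficients):

```python
def preprocess(rgb_image):
    """Grayscale conversion followed by normalisation to (-1, 1).

    rgb_image: rows of (r, g, b) tuples with 8-bit values.
    The 0.299/0.587/0.114 luminance weights are a common convention,
    assumed here rather than taken from the paper."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in rgb_image]
    # Map [0, 255] to (-1, 1): v / 127.5 - 1.
    return [[v / 127.5 - 1.0 for v in row] for row in gray]

img = [[(255, 255, 255), (0, 0, 0)]]   # one white and one black pixel
out = preprocess(img)
```

Zero-centred inputs like these generally help CNN training converge, which is why this normalisation is applied before augmentation and training.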

  11. Development and Validation of a High-Quality Composite Real-World Mortality Endpoint.

    PubMed

    Curtis, Melissa D; Griffith, Sandra D; Tucker, Melisa; Taylor, Michael D; Capra, William B; Carrigan, Gillis; Holzman, Ben; Torres, Aracelis Z; You, Paul; Arnieri, Brandon; Abernethy, Amy P

    2018-05-14

    To create a high-quality electronic health record (EHR)-derived mortality dataset for retrospective and prospective real-world evidence generation. Oncology EHR data, supplemented with external commercial and US Social Security Death Index data, benchmarked against the National Death Index (NDI). We developed a recent, linkable, high-quality mortality variable amalgamated from multiple data sources to supplement EHR data, benchmarked against the NDI, the most complete source of US mortality data. Data quality of the mortality variable version 2.0 is reported here. For advanced non-small-cell lung cancer, sensitivity of mortality information improved from 66 percent in EHR structured data to 91 percent in the composite dataset, with high date agreement compared to the NDI. For advanced melanoma, metastatic colorectal cancer, and metastatic breast cancer, sensitivity of the final variable was 85 to 88 percent. Kaplan-Meier survival analyses showed that improving mortality data completeness minimized overestimation of survival relative to NDI-based estimates. For EHR-derived data to yield reliable real-world evidence, it needs to be of known and sufficiently high quality. Considering the impact of mortality data completeness on survival endpoints, we highlight the importance of data quality assessment and advocate benchmarking to the NDI. © 2018 The Authors. Health Services Research published by Wiley Periodicals, Inc. on behalf of Health Research and Educational Trust.

  12. Benchmarking the cost efficiency of community care in Australian child and adolescent mental health services: implications for future benchmarking.

    PubMed

    Furber, Gareth; Brann, Peter; Skene, Clive; Allison, Stephen

    2011-06-01

    The purpose of this study was to benchmark the cost efficiency of community care across six child and adolescent mental health services (CAMHS) drawn from different Australian states. Organizational, contact and outcome data from the National Mental Health Benchmarking Project (NMHBP) datasets were used to calculate cost per "treatment hour" and cost per episode for the six participating organizations. We also explored the relationship between intake severity, as measured by the Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA), and cost per episode. The average cost per treatment hour was $223, with cost differences across the six services ranging from a mean of $156 to $273 per treatment hour. The average cost per episode was $3349 (median $1577), and there were significant differences in the CAMHS organizational medians, ranging from $388 to $7076 per episode. HoNOSCA scores explained at best 6% of the cost variance per episode. These large cost differences indicate that community CAMHS have the potential to make substantial gains in cost efficiency through collaborative benchmarking. Benchmarking forums need considerable financial and business expertise for detailed comparison of business models for service provision.
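The two cost-efficiency indicators are straightforward to compute from episode-level data. A minimal sketch, with invented figures rather than the NMHBP data:

```python
from statistics import mean, median

def cost_indicators(episodes):
    """episodes: list of (cost, treatment_hours) tuples for one service."""
    costs = [cost for cost, _ in episodes]
    total_hours = sum(hours for _, hours in episodes)
    return {
        "cost_per_treatment_hour": sum(costs) / total_hours,
        "mean_cost_per_episode": mean(costs),
        "median_cost_per_episode": median(costs),
    }

# Invented episodes for one hypothetical CAMHS: a long episode dominates the
# mean, which is why the abstract reports medians alongside means.
ind = cost_indicators([(1200, 6), (450, 2), (8000, 32)])
```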

  13. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets (SoTC)

    EPA Science Inventory

    Background: Benchmark Dose (BMD) modelling is a mathematical approach used to determine where a dose-response change begins to take place relative to controls following chemical exposure. BMDs are being increasingly applied in regulatory toxicology to estimate acceptable exposure...

  14. BMDExpress Data Viewer: A Visualization Tool to Analyze BMDExpress Datasets (STC symposium)

    EPA Science Inventory

    Background: Benchmark Dose (BMD) modelling is a mathematical approach used to determine where a dose-response change begins to take place relative to controls following chemical exposure. BMDs are being increasingly applied in regulatory toxicology to estimate acceptable exposure...

  15. QED Tests and Search for New Physics in Molecular Hydrogen

    NASA Astrophysics Data System (ADS)

    Salumbides, E. J.; Niu, M. L.; Dickenson, G. D.; Eikema, K. S. E.; Komasa, J.; Pachucki, K.; Ubachs, W.

    2013-06-01

    The hydrogen molecule has been the benchmark system for quantum chemistry, and may provide a test ground for new physics. We present our high-resolution spectroscopic studies of the rotational series of the X¹Σg⁺ electronic ground state and the fundamental vibrational tones in molecular hydrogen. In combination with recent accurate ab initio calculations, we demonstrate systematic tests of quantum electrodynamical (QED) effects in molecules. Moreover, the precise comparison between theory and experiment can provide stringent constraints on possible new interactions that extend beyond the Standard Model. E. J. Salumbides, G. D. Dickenson, T. I. Ivanov and W. Ubachs, Phys. Rev. Lett. 107, 043005 (2011).

  16. Derivation, Validation and Application of a Pragmatic Risk Prediction Index for Benchmarking of Surgical Outcomes.

    PubMed

    Spence, Richard T; Chang, David C; Kaafarani, Haytham M A; Panieri, Eugenio; Anderson, Geoffrey A; Hutter, Matthew M

    2018-02-01

    Despite the existence of multiple validated risk assessment and quality benchmarking tools in surgery, their utility outside of high-income countries is limited. We sought to derive, validate and apply a scoring system that (1) is feasible and (2) reliably predicts mortality in a middle-income country (MIC) context. A 5-step methodology was used: (1) development of a de novo surgical outcomes database modeled around the American College of Surgeons' National Surgical Quality Improvement Program (ACS-NSQIP) in South Africa (SA dataset); (2) use of the resultant data to identify all predictors of in-hospital death with more than 90% capture, indicating feasibility of collection; (3) use of these predictors to derive and validate an integer-based score that reliably predicts in-hospital death in the 2012 ACS-NSQIP; (4) application of the score in the original SA dataset to demonstrate its performance; (5) identification of threshold cutoffs of the score to prompt action and drive quality improvement. Following steps one to three above, the 13-point Codman's score was derived and validated on 211,737 and 109,079 patients, respectively, and includes: age 65 (1), partially or completely dependent functional status (1), preoperative transfusions ≥4 units (1), emergency operation (2), sepsis or septic shock (2), American Society of Anesthesiologists score ≥3 (3) and operative procedure (1-3). Application of the score to 373 patients in the SA dataset showed good discrimination and calibration for predicting in-hospital death. A Codman Score of 8 is an optimal cutoff point for defining expected and unexpected deaths. We have designed a novel risk prediction score specific to a MIC context. The Codman Score can prove useful both for (1) preoperative decision-making and (2) benchmarking the quality of surgical care in MICs.
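The additive score described above can be expressed directly in code. This is a sketch only: the exact component definitions (e.g., the age cutoff) are not fully specified in the abstract, so the predicate names here are assumptions:

```python
def codman_score(age_criterion, dependent_functional_status,
                 transfusions_ge_4_units, emergency_operation,
                 sepsis_or_septic_shock, asa_ge_3, procedure_points):
    """Sum the integer components of the 13-point score (1+1+1+2+2+3 plus
    1-3 points for the operative procedure).  All other arguments are
    booleans for the criteria listed in the abstract."""
    if not 1 <= procedure_points <= 3:
        raise ValueError("operative procedure contributes 1-3 points")
    return (1 * age_criterion + 1 * dependent_functional_status
            + 1 * transfusions_ge_4_units + 2 * emergency_operation
            + 2 * sepsis_or_septic_shock + 3 * asa_ge_3 + procedure_points)

# Per the abstract, a total of 8 or more flags an "unexpected" death.
score = codman_score(False, False, False, True, True, True, 1)
```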

  17. Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees

    PubMed Central

    Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka

    2016-01-01

    Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by comparison with other known methods, including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs available for large MSAs, Clustal Omega and UPP, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 across the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296

  18. Refined docking as a valuable tool for lead optimization: application to histamine H3 receptor antagonists.

    PubMed

    Levoin, Nicolas; Calmels, Thierry; Poupardin-Olivier, Olivia; Labeeuw, Olivier; Danvy, Denis; Robert, Philippe; Berrebi-Bertrand, Isabelle; Ganellin, C Robin; Schunack, Walter; Stark, Holger; Capet, Marc

    2008-10-01

    Drug-discovery projects frequently employ structure-based information through protein modeling and ligand docking, and there is a plethora of reports relating their successful use in virtual screening. Hit/lead optimization, which represents the next step and the longest for the medicinal chemist, is very rarely considered. This is not surprising, because lead optimization is a much more complex task. Here, a homology model of the histamine H(3) receptor was built and tested for its ability to discriminate ligands above a defined threshold of affinity. In addition, drug safety is also evaluated during lead optimization, and "antitargets" are studied. We therefore used the same benchmarking procedure with the HERG channel and the CYP2D6 enzyme, for which a minimal affinity is strongly desired. For targets and antitargets, we report here an accuracy of at least 70% for ligands being classified above or below the chosen threshold. Such a good result is beyond what could have been predicted, especially since our test conditions were particularly stringent. First, we measured the accuracy by means of the AUC of ROC plots, i.e., considering both false positives and false negatives. Second, we used extensive chemical libraries as datasets (nearly a thousand ligands for H(3)). All molecules considered were true H(3) receptor ligands with moderate to high affinity (from the μM to the nM range). Third, the database stems from a concrete SAR campaign (the Bioprojet H(3) BF2.649 library) and is not simply a few active ligands buried in a chemical catalogue.

  19. Use of the moon to support on-orbit sensor calibration for climate change measurements

    USGS Publications Warehouse

    Stone, T.C.; Kieffer, H.H.

    2006-01-01

    Production of reliable climate datasets from multiple observational measurements acquired by remote sensing satellite systems, available now and in the future, places stringent requirements on the stability of sensors and consistency among the instruments and platforms. Detecting trends in environmental parameters measured at solar reflectance wavelengths (0.3 to 2.5 microns) requires on-orbit instrument stability at a level of 1% over a decade. This benchmark can be attained using the Moon as a radiometric reference. The lunar calibration program at the U.S. Geological Survey has an operational model to predict the lunar spectral irradiance with a precision of about 1%, explicitly accounting for the effects of phase, lunar librations, and the lunar surface photometric function. A system for utilization of the Moon by on-orbit instruments has been established. With multiple lunar views taken by a spacecraft instrument, sensor response characterization with sub-percent precision over several years has been achieved. Meteorological satellites in geostationary orbit (GEO) capture the Moon in operational images; applying lunar calibration to GEO visible-channel image archives has the potential to develop a climate record extending decades into the past. The USGS model and system can provide reliable transfer of calibration among instruments that have viewed the Moon as a common source. This capability will be enhanced with improvements to the absolute scale of the USGS model. Lunar calibration may prove essential for meeting critical calibration needs during a potential gap in observational capabilities prior to deployment of NPP/NPOESS. A key requirement is that current and future instruments observe the Moon.

  20. Creating a non-linear total sediment load formula using polynomial best subset regression model

    NASA Astrophysics Data System (ADS)

    Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali

    2016-08-01

    The aim of this study is to derive a new total sediment load formula that is more accurate and has fewer application constraints than the well-known formulae in the literature. Five of the best-known stream-power-concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called polynomial best subset regression (PBSR) analysis. The aim of PBSR analysis is to fit and test all possible combinations of the input variables and select the best subset. All input variables, together with their second and third powers, are included in the regression to test possible relations between the explanatory variables and the dependent variable. The best subset is selected with a multistep approach based on significance values and on the degree of multicollinearity among inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab datasets within this holdout data. Several goodness-of-fit statistics are used, as they represent different perspectives on model accuracy. After carrying out these detailed comparisons, we identified the most accurate equation, which is also applicable to both flume and river data. On the field dataset in particular, the prediction performance of the proposed formula outperformed the benchmark formulations.
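The best-subset idea can be sketched in a few lines, under stated assumptions: exhaustive OLS fits over a candidate pool containing each input and its second and third powers, ranked here by adjusted R² rather than by the paper's multistep significance/multicollinearity screening:

```python
import itertools
import numpy as np

def best_subset_poly(X, y, max_subset=3, degree=3):
    """Fit OLS on every subset (up to max_subset terms) of the candidate pool
    {x_j, x_j^2, x_j^3}; return (adjusted R^2, subset) of the best model.
    X: (n, p) array of dimensionless inputs, y: (n,) target."""
    n, p = X.shape
    pool = {(j, d): X[:, j] ** d for j in range(p) for d in range(1, degree + 1)}
    best = (-np.inf, None)
    for k in range(1, max_subset + 1):
        for combo in itertools.combinations(pool, k):
            A = np.column_stack([pool[c] for c in combo] + [np.ones(n)])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
            adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalize subset size
            if adj > best[0]:
                best = (adj, combo)
    return best

# Synthetic check: the true model is the square of the second input.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 3))
y = 2 * X[:, 1] ** 2 + 0.01 * rng.normal(size=200)
adj_r2, combo = best_subset_poly(X, y)   # combo should contain (1, 2)
```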

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Grace L.; Department of Health Services Research, The University of Texas MD Anderson Cancer Center, Houston, Texas; Jiang, Jing

    Purpose: High-quality treatment for intact cervical cancer requires external radiation therapy, brachytherapy, and chemotherapy, carefully sequenced and completed without delays. We sought to determine how frequently current treatment meets quality benchmarks and whether new technologies have influenced patterns of care. Methods and Materials: By searching diagnosis and procedure claims in MarketScan, an employment-based health care claims database, we identified 1508 patients with nonmetastatic, intact cervical cancer treated from 1999 to 2011, who were <65 years of age and received >10 fractions of radiation. Treatments received were identified using procedure codes and compared with 3 quality benchmarks: receipt of brachytherapy, receipt of chemotherapy, and radiation treatment duration not exceeding 63 days. The Cochran-Armitage test was used to evaluate temporal trends. Results: Seventy-eight percent of patients (n=1182) received brachytherapy, with brachytherapy receipt stable over time (Cochran-Armitage P_trend=.15). Among patients who received brachytherapy, 66% had high-dose-rate and 34% had low-dose-rate treatment, although use of high-dose-rate brachytherapy steadily increased to 75% by 2011 (P_trend<.001). Eighteen percent of patients (n=278) received intensity modulated radiation therapy (IMRT), and IMRT receipt increased to 37% by 2011 (P_trend<.001). Only 2.5% of patients (n=38) received IMRT in the setting of brachytherapy omission. Overall, 79% of patients (n=1185) received chemotherapy, and chemotherapy receipt increased to 84% by 2011 (P_trend<.001). Median radiation treatment duration was 56 days (interquartile range, 47-65 days); however, duration exceeded 63 days in 36% of patients (n=543). Although 98% of patients received at least 1 benchmark treatment, only 44% received treatment that met all 3 benchmarks.
With more stringent indicators (brachytherapy, ≥4 chemotherapy cycles, and duration not exceeding 56 days), only 25% of patients received treatment that met all benchmarks. Conclusion: In this cohort, most cervical cancer patients received treatment that did not comply with all 3 benchmarks for quality treatment. In contrast to the increasing receipt of newer radiation technologies, there was little improvement in receipt of essential treatment benchmarks.

  2. CSmetaPred: a consensus method for prediction of catalytic residues.

    PubMed

    Choudhary, Preeti; Kumar, Shailesh; Bachhawat, Anand Kumar; Pandit, Shashi Bhushan

    2017-12-22

    Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one remaining issue is the rank positions of putative catalytic residues among all ranked residues. In order to improve the ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach-based method, CSmetaPred. In this approach, residues are ranked by the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in the meta-predictor CSmetaPred_poc. Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that the meta-predictors outperform their constituent methods and that CSmetaPred_poc is the best of the evaluated methods. For instance, on the CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure for the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 as true positives in binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within the top 20 ranks for ~73% of enzymes. Furthermore, benchmarking on comparative (homology) models showed that structure-based prediction performs better than sequence-only prediction.
These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) positions, which can facilitate and expedite their experimental characterization. The benchmarking studies showed that a meta-approach combining residue-level scores from well-known catalytic residue predictors can improve prediction accuracy as well as the ranked positions of known catalytic residues. Hence, such predictions can assist experimentalists in prioritizing residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as a web server at: http://14.139.227.206/csmetapred/ .
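The core meta-scoring step (mean of normalized per-predictor scores) can be sketched as follows. Min-max normalization is an assumption here, since the abstract says only "normalized residue scores" without naming the scheme:

```python
import numpy as np

def meta_rank(scores):
    """scores: (n_residues, n_predictors) array of raw scores from the
    constituent predictors, higher = more likely catalytic.  Normalize each
    predictor's column to [0, 1], average across predictors, and return
    residue indices ordered from highest to lowest meta-score."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(axis=0), s.max(axis=0)
    norm = (s - lo) / np.where(hi > lo, hi - lo, 1.0)  # guard constant columns
    meta = norm.mean(axis=1)
    return np.argsort(-meta)

# Three residues scored by four hypothetical predictors on different scales;
# normalization makes the scales commensurable before averaging.
order = meta_rank([[0.1, 5, 0.2, 10],
                   [0.9, 50, 0.8, 90],
                   [0.5, 20, 0.5, 40]])
```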

  3. Benchmarking a Visual-Basic based multi-component one-dimensional reactive transport modeling tool

    NASA Astrophysics Data System (ADS)

    Torlapati, Jagadish; Prabhakar Clement, T.

    2013-01-01

    We present the details of a comprehensive numerical modeling tool, RT1D, which can be used for simulating biochemical and geochemical reactive transport problems. The code can be run within the standard Microsoft EXCEL Visual Basic platform, and it does not require any additional software tools. The code can be easily adapted by others for simulating different types of laboratory-scale reactive transport experiments. We illustrate the capabilities of the tool by solving five benchmark problems with varying levels of reaction complexity. These literature-derived benchmarks are used to highlight the versatility of the code for solving a variety of practical reactive transport problems. The benchmarks are described in detail to provide a comprehensive database, which can be used by model developers to test other numerical codes. The VBA code presented in the study is a practical tool that can be used by laboratory researchers for analyzing both batch and column datasets within an EXCEL platform.

  4. A benchmarking procedure for PIGE related differential cross-sections

    NASA Astrophysics Data System (ADS)

    Axiotis, M.; Lagoyannis, A.; Fazinić, S.; Harissopulos, S.; Kokkoris, M.; Preketes-Sigalas, K.; Provatas, G.

    2018-05-01

    The application of standard-less PIGE requires a priori knowledge of the differential cross section of the reaction used for the quantification of each detected light element. Towards this end, many datasets have been published in the last few years by several laboratories around the world. The discrepancies often found between different measured cross sections can be resolved by applying a rigorous benchmarking procedure based on the measurement of thick-target yields. Such a procedure is proposed in the present paper and is applied to the case of the 19F(p,p′γ)19F reaction.
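The abstract does not restate the underlying relation, so treat the following as a generic sketch of the standard surface-approximation formula used in such benchmarking: the thick-target yield at beam energy E0 is proportional to the integral of the differential cross section over the stopping power:

```python
import numpy as np

def thick_target_yield(E0, sigma, stopping_power, n_steps=2000):
    """Relative thick-target yield, Y(E0) ∝ ∫_0^E0 σ(E) / S(E) dE
    (surface approximation; straggling and detector efficiency are folded
    into the proportionality constant).  sigma and stopping_power are
    callables accepting an energy array."""
    E = np.linspace(1e-6, E0, n_steps)       # avoid E = 0 exactly
    f = sigma(E) / stopping_power(E)
    dE = E[1] - E[0]
    # Trapezoidal rule, kept explicit for portability across NumPy versions.
    return dE * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])
```

Comparing this integral of a measured cross-section dataset against a measured thick-target yield is the essence of the benchmarking procedure described above.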

  5. Building Bridges Between Geoscience and Data Science through Benchmark Data Sets

    NASA Astrophysics Data System (ADS)

    Thompson, D. R.; Ebert-Uphoff, I.; Demir, I.; Gel, Y.; Hill, M. C.; Karpatne, A.; Güereque, M.; Kumar, V.; Cabral, E.; Smyth, P.

    2017-12-01

    The changing nature of observational field data demands richer and more meaningful collaboration between data scientists and geoscientists. Thus, among other efforts, the Working Group on Case Studies of the NSF-funded RCN on Intelligent Systems Research To Support Geosciences (IS-GEO) is developing a framework to strengthen such collaborations through the creation of benchmark datasets. Benchmark datasets provide an interface between disciplines without requiring extensive background knowledge. The goals are to create (1) a means for two-way communication between geoscience and data science researchers; (2) new collaborations, which may lead to new approaches for data analysis in the geosciences; and (3) a public, permanent repository of complex datasets, representative of geoscience problems, useful to coordinate efforts in research and education. The group identified 10 key elements and characteristics of ideal benchmarks. High impact: A problem with high potential impact. Active research area: A group of geoscientists should be eager to continue working on the topic. Challenge: The problem should be challenging for data scientists. Data science generality and versatility: It should stimulate development of new general and versatile data science methods. Rich information content: Ideally, the dataset provides stimulus for analysis at many different levels. Hierarchical problem statement: A hierarchy of suggested analysis tasks, from relatively straightforward to open-ended tasks. Means for evaluating success: Data scientists and geoscientists need means to evaluate whether the algorithms are successful and achieve their intended purpose. Quick start guide: Introduction for data scientists on how to easily read the data to enable rapid initial data exploration. Geoscience context: Summary for data scientists of the specific data collection process, instruments used, any pre-processing and the science questions to be answered.
Citability: A suitable identifier to facilitate tracking the use of the benchmark later on, e.g. allowing search engines to find all research papers using it. A first sample benchmark developed in collaboration with the Jet Propulsion Laboratory (JPL) deals with the automatic analysis of imaging spectrometer data to detect significant methane sources in the atmosphere.

  6. Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models.

    PubMed

    Bruni, Renato; Cesarone, Francesco; Scozzari, Andrea; Tardella, Fabio

    2016-09-01

    A large number of portfolio selection models have appeared in the literature since the pioneering work of Markowitz. However, even when computational and empirical results are described, they are often hard to replicate and compare due to the unavailability of the datasets used in the experiments. We provide here several datasets for portfolio selection generated using real-world price values from several major stock markets. The datasets contain weekly return values, adjusted for dividends and for stock splits, which are cleaned from errors as much as possible. The datasets are available in different formats, and can be used as benchmarks for testing the performances of portfolio selection models and for comparing the efficiency of the algorithms used to solve them. We also provide, for these datasets, the portfolios obtained by several selection strategies based on Stochastic Dominance models (see "On Exact and Approximate Stochastic Dominance Strategies for Portfolio Selection" (Bruni et al. [2])). We believe that testing portfolio models on publicly available datasets greatly simplifies the comparison of the different portfolio selection strategies.
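Such datasets can be consumed directly as a matrix of weekly returns. A minimal sketch of evaluating a long-only portfolio on that matrix (the weights and return figures here are invented, not taken from the published datasets):

```python
import numpy as np

def portfolio_return_series(weekly_returns, weights):
    """weekly_returns: (T, n) matrix, one column per asset; weights: (n,)
    long-only weights summing to 1.  Returns the (T,) portfolio series."""
    w = np.asarray(weights, dtype=float)
    if not (abs(w.sum() - 1.0) < 1e-9 and (w >= 0).all()):
        raise ValueError("expected long-only weights summing to 1")
    return np.asarray(weekly_returns, dtype=float) @ w

# Two weeks, two assets, equal weights: a common baseline when comparing
# against stochastic-dominance-based selection strategies.
series = portfolio_return_series([[0.01, 0.03], [-0.02, 0.00]], [0.5, 0.5])
```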

  7. BMDExpress Data Viewer - A visualization Tool to Analyze BMDExpress Datasets (Health Canada Science Forum)

    EPA Science Inventory

    Benchmark Dose (BMD) modelling is a mathematical approach used to determine where a dose-response change begins to take place relative to controls following chemical exposure. BMDs are being increasingly applied in regulatory toxicology to determine points of departure. BMDExpres...

  8. Search for neutral MSSM Higgs bosons decaying into a pair of bottom quarks

    DOE PAGES

    Khachatryan, Vardan

    2015-11-11

    A search for neutral Higgs bosons decaying into a bb¯ quark pair and produced in association with at least one additional b quark is presented. This signature is sensitive to the Higgs sector of the minimal supersymmetric standard model (MSSM) with large values of the parameter tan β. The analysis is based on data from proton-proton collisions at a center-of-mass energy of 8 TeV collected with the CMS detector at the LHC, corresponding to an integrated luminosity of 19.7 fb⁻¹. The results are combined with a previous analysis based on 7 TeV data. No signal is observed. Stringent upper limits on the cross section times branching fraction are derived for Higgs bosons with masses up to 900 GeV, and the results are interpreted within different MSSM benchmark scenarios: m_h^max, m_h^mod+, m_h^mod−, light-stau and light-stop. Observed 95% confidence level upper limits on tan β, ranging from 14 to 50, are obtained in the m_h^mod+ benchmark scenario.

  9. Comprehensive comparison of gap filling techniques for eddy covariance net carbon fluxes

    NASA Astrophysics Data System (ADS)

    Moffat, A. M.; Papale, D.; Reichstein, M.; Hollinger, D. Y.; Richardson, A. D.; Barr, A. G.; Beckstein, C.; Braswell, B. H.; Churkina, G.; Desai, A. R.; Falge, E.; Gove, J. H.; Heimann, M.; Hui, D.; Jarvis, A. J.; Kattge, J.; Noormets, A.; Stauch, V. J.

    2007-12-01

    We review fifteen techniques for estimating missing values of net ecosystem CO2 exchange (NEE) in eddy covariance time series and evaluate their performance for different artificial gap scenarios based on a set of ten benchmark datasets from six forested sites in Europe. The goal of gap filling is the reproduction of the NEE time series, and hence the present work focuses on estimating missing NEE values, not on editing or removal of suspect values in these time series due to systematic errors in the measurements (e.g. nighttime flux, advection). The gap filling was examined by generating fifty secondary datasets with artificial gaps (ranging in length from single half-hours to twelve consecutive days) for each benchmark dataset and evaluating the performance with a variety of statistical metrics. The performance of the gap filling varied among sites and depended on the level of aggregation (native half-hourly time step versus daily); long gaps were more difficult to fill than short gaps, and differences among the techniques were more pronounced during the day than at night. The non-linear regression techniques (NLRs), the look-up table (LUT), marginal distribution sampling (MDS), and the semi-parametric model (SPM) generally showed good overall performance. The artificial neural network based techniques (ANNs) were generally, if only slightly, superior to the other techniques. The simple interpolation technique of mean diurnal variation (MDV) showed a moderate but consistent performance. Several sophisticated techniques, the dual unscented Kalman filter (UKF), the multiple imputation method (MIM), the terrestrial biosphere model (BETHY), but also one of the ANNs and one of the NLRs, showed high biases which resulted in a low reliability of the annual sums, indicating that additional development might be needed.
An uncertainty analysis comparing the estimated random error in the ten benchmark datasets with the artificial gap residuals suggested that the techniques are already at or very close to the noise limit of the measurements. Based on the techniques and site data examined here, the effect of gap filling on the annual sums of NEE is modest, with most techniques falling within a range of ±25 g C m⁻² y⁻¹.
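Of the techniques compared, mean diurnal variation (MDV) is simple enough to sketch in a few lines: each gap is filled with the mean of available observations in the same time-of-day slot on adjacent days. The window size here is a tunable choice, not a value from the study:

```python
import numpy as np

def mdv_fill(nee, slots_per_day=48, window_days=7):
    """Fill NaNs in a half-hourly NEE series with the mean of non-missing
    values at the same time-of-day within +/- window_days days."""
    src = np.asarray(nee, dtype=float)
    out = src.copy()
    n = len(src)
    for i in np.flatnonzero(np.isnan(src)):
        neighbours = [i + d * slots_per_day
                      for d in range(-window_days, window_days + 1)
                      if d != 0 and 0 <= i + d * slots_per_day < n]
        vals = src[neighbours]
        vals = vals[~np.isnan(vals)]       # neighbours may be gaps too
        if vals.size:                      # leave the gap if nothing is available
            out[i] = vals.mean()
    return out

# Toy series with 4 slots per "day": the gap in day 1, slot 1 is filled
# from the same slot on days 0 and 2.
filled = mdv_fill([0, 1, 2, 3, 0, np.nan, 2, 3, 0, 1, 2, 3],
                  slots_per_day=4, window_days=2)
```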

  10. A PDB-wide, evolution-based assessment of protein-protein interfaces.

    PubMed

    Baskaran, Kumaran; Duarte, Jose M; Biyani, Nikhil; Bliven, Spencer; Capitani, Guido

    2014-10-18

    Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features. An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/#downloads.
Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.

  11. Benchmarking quantitative label-free LC-MS data processing workflows using a complex spiked proteomic standard dataset.

    PubMed

    Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorsselaer, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne

    2016-01-30

    Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed us to finely assess their performances in terms of sensitivity and false discovery rate, by measuring the numbers of true and false positives (UPS1 and yeast background proteins, respectively, found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performances of the data processing workflow.
We provide here such a controlled standard dataset and used it to evaluate the performance of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, testing new algorithms for label-free quantitative analysis, or evaluating downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
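
    The spike-in evaluation described above can be sketched as a simple scoring loop: proteins flagged as differential count as true positives if they belong to the spiked UPS1 set and as false positives if they come from the yeast background. The protein names and hit counts below are illustrative, not values from the PXD001819 dataset.

```python
def benchmark_workflow(detected, spiked_ups1):
    """Return (sensitivity, FDR) for one label-free workflow's hit list."""
    detected = set(detected)
    spiked = set(spiked_ups1)
    tp = len(detected & spiked)        # UPS1 proteins correctly flagged
    fp = len(detected - spiked)        # yeast background proteins wrongly flagged
    sensitivity = tp / len(spiked) if spiked else 0.0
    fdr = fp / len(detected) if detected else 0.0
    return sensitivity, fdr

# Hypothetical example: 48 spiked proteins; a workflow reports 40 hits,
# 36 of them genuinely from the UPS1 standard.
spiked = {f"UPS1_{i}" for i in range(48)}
hits = {f"UPS1_{i}" for i in range(36)} | {f"YEAST_{i}" for i in range(4)}
sens, fdr = benchmark_workflow(hits, spiked)
print(round(sens, 3), round(fdr, 3))  # prints 0.75 0.1
```

    The same loop, applied per workflow and per spike concentration, yields the sensitivity/FDR comparison described in the abstract.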

  12. A benchmark for comparison of dental radiography analysis algorithms.

    PubMed

    Wang, Ching-Wei; Huang, Cheng-Ta; Lee, Jia-Hong; Li, Chung-Hsing; Chang, Sheng-Wei; Siao, Ming-Jhih; Lai, Tat-Ming; Ibragimov, Bulat; Vrtovec, Tomaž; Ronneberger, Olaf; Fischer, Philipp; Cootes, Tim F; Lindner, Claudia

    2016-07-01

    Dental radiography plays an important role in clinical diagnosis, treatment and surgery. In recent years, efforts have been made to develop computerized dental X-ray image analysis systems for clinical use. A novel framework for objective evaluation of automatic dental radiography analysis algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2015 Bitewing Radiography Caries Detection Challenge and Cephalometric X-ray Image Analysis Challenge. In this article, we present the datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. The main contributions of the challenge include the creation of the dental anatomy data repository of bitewing radiographs, the creation of the anatomical abnormality classification data repository of cephalometric radiographs, and the definition of objective quantitative evaluation for comparison and ranking of the algorithms. With this benchmark, seven automatic methods for analysing cephalometric X-ray images and two automatic methods for detecting bitewing radiography caries have been compared, and detailed quantitative evaluation results are presented in this paper. Based on the quantitative evaluation results, we believe automatic dental radiography analysis is still a challenging and unsolved problem. The datasets and the evaluation software will be made available to the research community, further encouraging future developments in this field. (http://www-o.ntust.edu.tw/~cweiwang/ISBI2015/). Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  13. Model evaluation using a community benchmarking system for land surface models

    NASA Astrophysics Data System (ADS)

    Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Kluzek, E. B.; Koven, C. D.; Randerson, J. T.

    2014-12-01

    Evaluation of atmosphere, ocean, sea ice, and land surface models is an important step in identifying deficiencies in Earth system models and developing improved estimates of future change. For the land surface and carbon cycle, the design of an open-source system has been an important objective of the International Land Model Benchmarking (ILAMB) project. Here we evaluated CMIP5 and CLM models using a benchmarking system that enables users to specify models, data sets, and scoring systems so that results can be tailored to specific model intercomparison projects. Our scoring system used information from four different aspects of global datasets, including climatological mean spatial patterns, seasonal cycle dynamics, interannual variability, and long-term trends. Variable-to-variable comparisons enable investigation of the mechanistic underpinnings of model behavior, and allow for some control of biases in model drivers. Graphics modules allow users to evaluate model performance at local, regional, and global scales. Use of modular structures makes it relatively easy for users to add new variables, diagnostic metrics, benchmarking datasets, or model simulations. Diagnostic results are automatically organized into HTML files, so users can conveniently share results with colleagues. We used this system to evaluate atmospheric carbon dioxide, burned area, global biomass and soil carbon stocks, net ecosystem exchange, gross primary production, ecosystem respiration, terrestrial water storage, evapotranspiration, and surface radiation from CMIP5 historical and ESM historical simulations. We found that the multi-model mean often performed better than many of the individual models for most variables. We plan to publicly release a stable version of the software during fall of 2014 that has land surface, carbon cycle, hydrology, radiation and energy cycle components.
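
    A scoring system of the kind described above reduces model-observation mismatch for each variable to a dimensionless score. The exponential relative-error score below is an illustrative simplification of that idea, not the actual ILAMB scoring formula, and the seasonal-cycle data are synthetic.

```python
import numpy as np

def relative_error_score(model, obs):
    """Map model-observation mismatch onto [0, 1]; 1 is a perfect match."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    rmse = np.sqrt(np.mean((model - obs) ** 2))
    return float(np.exp(-rmse / np.std(obs)))

obs = np.sin(np.linspace(0, 2 * np.pi, 12))   # idealized monthly seasonal cycle
perfect = obs.copy()
biased = obs + 0.5                            # a model with a constant bias
print(relative_error_score(perfect, obs))     # prints 1.0
print(relative_error_score(biased, obs) < 1.0)
```

    Scores of this form can then be averaged across variables (mean state, seasonal cycle, interannual variability, trend) to build the kind of multi-aspect scoring the abstract describes.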

  14. IEA Wind Task 36 Forecasting

    NASA Astrophysics Data System (ADS)

    Giebel, Gregor; Cline, Joel; Frank, Helmut; Shaw, Will; Pinson, Pierre; Hodge, Bri-Mathias; Kariniotakis, Georges; Sempreviva, Anna Maria; Draxl, Caroline

    2017-04-01

    Wind power forecasts have been used operationally for over 20 years. Despite this, there is still considerable room to improve the forecasts, both on the weather prediction side and in the usage of the forecasts. The new International Energy Agency (IEA) Task on Wind Power Forecasting organises international collaboration among national weather centres with an interest and/or large projects in wind forecast improvement (NOAA, DWD, UK MetOffice, …), operational forecasters and forecast users. The Task is divided into three work packages. Firstly, a collaboration on the improvement of the scientific basis for the wind predictions themselves. This includes numerical weather prediction model physics, but also widely distributed information on accessible datasets for verification. Secondly, we will be aiming at an international pre-standard (an IEA Recommended Practice) on benchmarking and comparing wind power forecasts, including probabilistic forecasts, aimed at industry and forecasters alike. This WP will also organise benchmarks, in cooperation with the IEA Task WakeBench. Thirdly, we will engage end users to disseminate best practice in the usage of wind power predictions, especially probabilistic ones. The Operating Agent is Gregor Giebel of DTU; the Co-Operating Agent is Joel Cline of the US Department of Energy. Collaboration in the task is solicited from everyone interested in the forecasting business. We will collaborate with IEA Task 31 WakeBench, which developed the Windbench benchmarking platform that this task will use for forecasting benchmarks. The task runs for three years, 2016-2018.
Main deliverables are an up-to-date list of current projects and main project results, including datasets which can be used by researchers around the world to improve their own models, an IEA Recommended Practice on performance evaluation of probabilistic forecasts, a position paper regarding the use of probabilistic forecasts, and one or more benchmark studies implemented on the Windbench platform hosted at CENER. Additionally, spreading relevant information in both the forecaster and user communities is paramount. The poster also shows the work done in the first half of the Task, e.g. the collection of available datasets and lessons learned from a public workshop on 9 June in Barcelona on Experiences with the Use of Forecasts and Gaps in Research. Participation is open to all interested parties in member states of the IEA Annex on Wind Power; see ieawind.org for the up-to-date list. For collaboration, please contact the author (grgi@dtu.dk).

  15. Recent trends in precision measurements of atomic and nuclear properties with lasers and ion traps

    NASA Astrophysics Data System (ADS)

    Block, Michael

    2017-11-01

    The X. international workshop on "Application of Lasers and Storage Devices in Atomic Nuclei Research" took place in Poznan in May 2016. It addressed the latest experimental and theoretical achievements in laser and ion trap-based investigations of radionuclides, highly charged ions and antiprotons. The precise determination of atomic and nuclear properties provides a stringent benchmark for theoretical models and eventually leads to a better understanding of the underlying fundamental interactions and symmetries. This article addresses some general trends in this field and highlights select recent achievements presented at the workshop. Many of these are covered in more detail within the individual contributions to this special issue of Hyperfine Interactions.

  16. From benchmarking HITS-CLIP peak detection programs to a new method for identification of miRNA-binding sites from Ago2-CLIP data.

    PubMed

    Bottini, Silvia; Hamouda-Tekaya, Nedra; Tanasa, Bogdan; Zaragosi, Laure-Emmanuelle; Grandjean, Valerie; Repetto, Emanuela; Trabucchi, Michele

    2017-05-19

    Experimental evidence indicates that about 60% of miRNA-binding activity does not follow the canonical rule of seed matching between miRNA and target mRNAs, but instead reflects non-canonical miRNA targeting activity outside the seed or with seed-like motifs. Here, we propose a new unbiased method to identify canonical and non-canonical miRNA-binding sites from peaks identified by Ago2 Cross-Linked ImmunoPrecipitation associated with high-throughput sequencing (CLIP-seq). Since the quality of peaks is of pivotal importance for the final output of the proposed method, we provide a comprehensive benchmarking of four peak detection programs, namely CIMS, PIPE-CLIP, Piranha and Pyicoclip, on four publicly available Ago2-HITS-CLIP datasets and one unpublished in-house Ago2 dataset in stem cells. We measured the sensitivity, the specificity and the positional accuracy of miRNA-binding site identification, and the agreement with TargetScan. We then developed a new pipeline, called miRBShunter, to identify canonical and non-canonical miRNA-binding sites based on de novo motif identification from Ago2 peaks and prediction of miRNA::RNA heteroduplexes. miRBShunter was tested and experimentally validated on the in-house Ago2 dataset and on an Ago2-PAR-CLIP dataset in human stem cells. Overall, we provide guidelines for choosing a suitable peak detection program and a new method for miRNA-target identification. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
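
    The canonical seed-matching rule mentioned above can be made concrete with a minimal check: a canonical site requires the mRNA to carry the reverse complement of the miRNA "seed" (roughly nucleotides 2-7 from the miRNA 5' end). This is a textbook-level sketch, not the miRBShunter algorithm; the target sequence below is made up.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna, start=1, end=7):
    """Reverse complement of the seed (positions 2-7), 5'->3' as found in the mRNA."""
    seed = mirna[start:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def has_canonical_site(mirna, mrna):
    """True if the mRNA contains a canonical (seed-matched) site."""
    return seed_site(mirna) in mrna

mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # let-7a
print(seed_site(mirna))            # prints UACCUC
print(has_canonical_site(mirna, "AAUACCUCAAA"))  # prints True
```

    Sites lacking such a match, i.e. the ~60% non-canonical cases the abstract refers to, are precisely those that simple seed scans miss, motivating de novo motif identification from CLIP peaks.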

  17. From benchmarking HITS-CLIP peak detection programs to a new method for identification of miRNA-binding sites from Ago2-CLIP data

    PubMed Central

    Bottini, Silvia; Hamouda-Tekaya, Nedra; Tanasa, Bogdan; Zaragosi, Laure-Emmanuelle; Grandjean, Valerie; Repetto, Emanuela

    2017-01-01

    Experimental evidence indicates that about 60% of miRNA-binding activity does not follow the canonical rule of seed matching between miRNA and target mRNAs, but instead reflects non-canonical miRNA targeting activity outside the seed or with seed-like motifs. Here, we propose a new unbiased method to identify canonical and non-canonical miRNA-binding sites from peaks identified by Ago2 Cross-Linked ImmunoPrecipitation associated with high-throughput sequencing (CLIP-seq). Since the quality of peaks is of pivotal importance for the final output of the proposed method, we provide a comprehensive benchmarking of four peak detection programs, namely CIMS, PIPE-CLIP, Piranha and Pyicoclip, on four publicly available Ago2-HITS-CLIP datasets and one unpublished in-house Ago2 dataset in stem cells. We measured the sensitivity, the specificity and the positional accuracy of miRNA-binding site identification, and the agreement with TargetScan. We then developed a new pipeline, called miRBShunter, to identify canonical and non-canonical miRNA-binding sites based on de novo motif identification from Ago2 peaks and prediction of miRNA::RNA heteroduplexes. miRBShunter was tested and experimentally validated on the in-house Ago2 dataset and on an Ago2-PAR-CLIP dataset in human stem cells. Overall, we provide guidelines for choosing a suitable peak detection program and a new method for miRNA-target identification. PMID:28108660

  18. The mark of vegetation change on Earth's surface energy balance: data-driven diagnostics and model validation

    NASA Astrophysics Data System (ADS)

    Cescatti, A.; Duveiller, G.; Hooker, J.

    2017-12-01

    Changing vegetation cover not only affects the atmospheric concentration of greenhouse gases but also alters the radiative and non-radiative properties of the surface. The result of competing biophysical processes on Earth's surface energy balance varies spatially and seasonally, and can lead to warming or cooling depending on the specific vegetation change and on the background climate. To date, these effects are not accounted for in land-based climate policies because of the complexity of the phenomena, contrasting model predictions and the lack of global data-driven assessments. To overcome the limitations of available observation-based diagnostics and of the ongoing model intercomparison, here we present a new benchmarking dataset derived from satellite remote sensing. This global dataset provides the potential changes induced by multiple vegetation transitions on the individual terms of the surface energy balance. We used this dataset for two major goals: 1) to quantify the impact of actual vegetation changes that occurred during the decade 2000-2010, showing the overwhelming role of tropical deforestation in warming the surface by reducing evapotranspiration despite the concurrent brightening of the Earth; and 2) to benchmark a series of ESMs against data-driven metrics of the land cover change impacts on the various terms of the surface energy budget and on the surface temperature. We anticipate that the dataset could also be used to evaluate future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes.

  19. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models

    PubMed Central

    Stephens, Zachary D.; Hudson, Matthew E.; Mainzer, Liudmila S.; Taschuk, Morgan; Weber, Matthew R.; Iyer, Ravishankar K.

    2016-01-01

    An obstacle to validating and benchmarking methods for genome analysis is that there are few reference datasets available for which the “ground truth” about the mutational landscape of the sample genome is known and fully validated. Additionally, the free and public availability of real human genome datasets is incompatible with the preservation of donor privacy. In order to better analyze and understand genomic data, we need test datasets that model all variants, reflecting known biology as well as sequencing artifacts. Read simulators can fulfill this requirement, but are often criticized for limited resemblance to true data and overall inflexibility. We present NEAT (NExt-generation sequencing Analysis Toolkit), a set of tools that not only includes an easy-to-use read simulator, but also scripts to facilitate variant comparison and tool evaluation. NEAT has a wide variety of tunable parameters which can be set manually on the default model or parameterized using real datasets. The software is freely available at github.com/zstephens/neat-genreads. PMID:27893777

  20. Improving quantitative structure-activity relationship models using Artificial Neural Networks trained with dropout.

    PubMed

    Mendenhall, Jeffrey; Meiler, Jens

    2016-02-01

    Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery pose unique challenges for ML techniques, such as heavily biased dataset composition and a large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both enrichment false positive rate and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22-46% over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods.
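
    The dropout technique benchmarked above can be sketched in a few lines: during training, each hidden unit is zeroed with probability p and the survivors are rescaled by 1/(1-p) ("inverted" dropout), so no rescaling is needed at prediction time. This is a generic illustration, not the authors' implementation; the activations are synthetic.

```python
import numpy as np

def dropout(activations, p, rng):
    """Inverted dropout: zero units with probability p, rescale survivors."""
    mask = rng.random(activations.shape) >= p   # keep each unit with prob. 1-p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((4, 1000))                 # a batch of hidden-layer activations
h_drop = dropout(h, p=0.5, rng=rng)

# Roughly half the units are zeroed, but the expected activation is preserved,
# which is what makes train-time and test-time scales consistent.
print(round(float(h_drop.mean()), 2))
```

    The dropout rate p is the hyperparameter the abstract reports as depending on the signal-to-noise ratio of the descriptor set.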

  1. Improving Quantitative Structure-Activity Relationship Models using Artificial Neural Networks Trained with Dropout

    PubMed Central

    Mendenhall, Jeffrey; Meiler, Jens

    2016-01-01

    Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery (LB-CADD) pose unique challenges for ML techniques, such as heavily biased dataset composition and a large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both enrichment false positive rate (FPR) and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22–46% over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods. PMID:26830599

  2. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors.

    PubMed

    Martin, Shawn; Pratt, Harry D; Anderson, Travis M

    2017-07-01

    We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors for QSPRs, a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. We benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
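
    One common construction of a "product descriptor" for a pair, borrowed from protein-protein interaction prediction, is the outer product of the two components' descriptor vectors, giving a QSPR model one feature per cation-anion descriptor combination. Whether this is the exact construction the authors use is an assumption; the descriptor values below are made up.

```python
import numpy as np

def product_descriptor(cation_desc, anion_desc):
    """Pair-level feature vector: all pairwise products of the two descriptor sets."""
    return np.outer(cation_desc, anion_desc).ravel()

cation = np.array([1.2, 0.4, 3.1])   # three hypothetical cation descriptors
anion = np.array([0.9, 2.5])         # two hypothetical anion descriptors
pair = product_descriptor(cation, anion)
print(pair.shape)                    # prints (6,): one feature per descriptor pair
```

    With 72 cations and 34 anions, enumerating all 2,448 such pairs is what allows the screening step described in the abstract.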

  3. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors

    DOE PAGES

    Martin, Shawn; Pratt, III, Harry D.; Anderson, Travis M.

    2017-02-21

    We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors for QSPRs, a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. In conclusion, we benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset.

  4. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, Shawn; Pratt, III, Harry D.; Anderson, Travis M.

    We seek to optimize ionic liquids (ILs) for application to redox flow batteries. As part of this effort, we have developed a computational method for suggesting ILs with high conductivity and low viscosity. Since ILs consist of cation-anion pairs, we consider a method for treating ILs as pairs using product descriptors for QSPRs, a concept borrowed from the prediction of protein-protein interactions in bioinformatics. We demonstrate the method by predicting electrical conductivity, viscosity, and melting point on a dataset taken from the ILThermo database on June 18th, 2014. The dataset consists of 4,329 measurements taken from 165 ILs made up of 72 cations and 34 anions. In conclusion, we benchmark our QSPRs on the known values in the dataset, then extend our predictions to screen all 2,448 possible cation-anion pairs in the dataset.

  5. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E.coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
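
    The relative differences reported above (e.g. "257.3% more expensive") are percentage excesses of one provider's figure over the other's. A minimal sketch of that arithmetic; the dollar amounts below are hypothetical, not the study's measurements.

```python
def percent_more(a, b):
    """How much larger a is than b, in percent (cost or wall-clock time)."""
    return (a - b) / b * 100.0

# Hypothetical per-assembly costs for two providers.
emr_cost, gce_cost = 35.7, 10.0
print(round(percent_more(emr_cost, gce_cost), 1))  # prints 257.0
```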

  6. Benchmarking Undedicated Cloud Computing Providers for Analysis of Genomic Datasets

    PubMed Central

    Yazar, Seyhan; Gooden, George E. C.; Mackey, David A.; Hewitt, Alex W.

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5–78.2) for E.coli and 53.5% (95% CI: 34.4–72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5–303.1) and 173.9% (95% CI: 134.6–213.1) more expensive for the E.coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298

  7. Benchmark Dose Analysis from Multiple Datasets: The Cumulative Risk Assessment for the N-Methyl Carbamate Pesticides

    EPA Science Inventory

    The US EPA’s N-Methyl Carbamate (NMC) Cumulative Risk assessment was based on the effect on acetylcholine esterase (AChE) activity of exposure to 10 NMC pesticides through dietary, drinking water, and residential exposures, assuming the effects of joint exposure to NMCs is dose-...

  8. Divide and Conquer-Based 1D CNN Human Activity Recognition Using Test Data Sharpening †

    PubMed Central

    Yoon, Sang Min

    2018-01-01

    Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages a two-stage learning of multiple 1D CNN models; we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during the prediction phase to further improve the activity recognition accuracy. While numerous studies have explored the benefits of activity signal denoising for HAR, few have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets, and show that our approach outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches. PMID:29614767
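
    Sharpening a 1D sensor signal, in the generic unsharp-masking sense, means adding back the difference between the signal and a smoothed copy of itself, boosting high-frequency content before prediction. Whether this matches the authors' exact formulation of test data sharpening is an assumption; the signal below is synthetic.

```python
import numpy as np

def sharpen(signal, amount=1.0, window=5):
    """Unsharp masking for a 1D signal: signal + amount * (signal - smoothed)."""
    kernel = np.ones(window) / window                 # moving-average smoother
    smoothed = np.convolve(signal, kernel, mode="same")
    return signal + amount * (signal - smoothed)

t = np.linspace(0, 1, 100)
x = np.sin(2 * np.pi * 5 * t)       # one synthetic accelerometer channel
x_sharp = sharpen(x, amount=0.8)
print(x_sharp.shape)                # prints (100,)
```

    The `amount` and `window` parameters are hypothetical knobs of this sketch; in the paper's setting the sharpening strength would be tuned on the benchmark datasets.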

  9. Divide and Conquer-Based 1D CNN Human Activity Recognition Using Test Data Sharpening.

    PubMed

    Cho, Heeryon; Yoon, Sang Min

    2018-04-01

    Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages a two-stage learning of multiple 1D CNN models; we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during the prediction phase to further improve the activity recognition accuracy. While numerous studies have explored the benefits of activity signal denoising for HAR, few have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets, and show that our approach outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches.

  10. dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data.

    PubMed

    Huynh-Thu, Vân Anh; Geurts, Pierre

    2018-02-21

    The elucidation of gene regulatory networks is one of the major challenges of systems biology. Gene measurements exploited by network inference methods are typically available either in the form of steady-state expression vectors or time series expression data. In our previous work, we proposed the GENIE3 method, which exploits variable importance scores derived from Random forests to identify the regulators of each target gene. This method provided state-of-the-art performance on several benchmark datasets; however, it could not be directly applied to time series expression data. We propose here an adaptation of the GENIE3 method, called dynamical GENIE3 (dynGENIE3), for handling both time series and steady-state expression data. The proposed method is evaluated extensively on the artificial DREAM4 benchmarks and on three real time series expression datasets. Although dynGENIE3 does not systematically yield the best performance on every network, it is competitive with diverse methods from the literature, while preserving the main advantages of GENIE3 in terms of scalability.
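
    The GENIE3 family decomposes network inference into one regression problem per target gene and ranks candidate regulators by an importance score. As a greatly simplified stand-in for the Random-forest importances that GENIE3/dynGENIE3 actually use, the sketch below ranks regulators by absolute correlation with each target; the expression data are synthetic.

```python
import numpy as np

def rank_regulators(expr, target_idx):
    """Rank all other genes as candidate regulators of one target gene."""
    target = expr[:, target_idx]
    scores = {}
    for j in range(expr.shape[1]):
        if j == target_idx:
            continue
        # Simplified importance score: |Pearson correlation| with the target.
        scores[j] = abs(np.corrcoef(expr[:, j], target)[0, 1])
    return sorted(scores, key=scores.get, reverse=True)

rng = np.random.default_rng(1)
g0 = rng.normal(size=200)
g1 = rng.normal(size=200)
g2 = 0.9 * g0 + 0.1 * rng.normal(size=200)   # gene 2 is driven by gene 0
expr = np.column_stack([g0, g1, g2])
print(rank_regulators(expr, target_idx=2))   # gene 0 ranks first
```

    Repeating this per target and pooling the ranked edges yields a directed network, which is the decomposition that dynGENIE3 extends to time series data.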

  11. A Bayesian approach to traffic light detection and mapping

    NASA Astrophysics Data System (ADS)

    Hosseinyalamdary, Siavash; Yilmaz, Alper

    2017-03-01

    Automatic traffic light detection and mapping is an open research problem. Traffic lights vary in color, shape, geolocation, activation pattern, and installation, which complicates their automated detection. In addition, images of traffic lights may be noisy, overexposed, underexposed, or occluded. In order to address this problem, we propose a Bayesian inference framework to detect and map traffic lights. In addition to the spatio-temporal consistency constraint, traffic light characteristics such as color, shape and height are shown to further improve the accuracy of the proposed approach. The proposed approach has been evaluated on two benchmark datasets and has been shown to outperform earlier studies. The results show that the precision and recall rates for the KITTI benchmark are 95.78% and 92.95%, respectively, and the precision and recall rates for the LARA benchmark are 98.66% and 94.65%.

  12. Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image

    NASA Astrophysics Data System (ADS)

    Pirotti, F.; Sunar, F.; Piragnolo, M.

    2016-06-01

    Thanks mainly to ESA and USGS, a large volume of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue, since the land cover of a specific class may present large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking nine machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered perceptron ensemble, ctree, boosting and logarithmic regression. The validation is carried out using a control dataset consisting of an independent classification into 11 land-cover classes of an area of about 60 km2, obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. In this study, five of the eleven classes are used, since the others have too few samples (pixels) for the testing and validation subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset with k-fold cross-validation (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and the control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds.
Results from validation of predictions over the whole dataset (full) show the random forests method with the highest values, with a kappa index ranging from 0.55 with the most training pixels to 0.42 with the fewest. The two neural networks (multi-layered perceptron and its ensemble) and support vector machines with the default radial basis function kernel follow closely, with comparable performance.
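
    The kappa index reported above is Cohen's kappa, the chance-corrected agreement between predicted and reference land-cover classes. A minimal from-scratch computation; the labels below are illustrative, not the study's validation data.

```python
import numpy as np

def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between two label sequences."""
    labels = sorted(set(y_true) | set(y_pred))
    idx = {c: i for i, c in enumerate(labels)}
    n = len(y_true)
    cm = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1                    # confusion matrix
    p_obs = np.trace(cm) / n                       # observed agreement
    p_exp = (cm.sum(0) * cm.sum(1)).sum() / n**2   # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

y_ref  = ["urban", "water", "water", "grass", "urban", "grass"]
y_pred = ["urban", "water", "grass", "grass", "urban", "grass"]
print(round(cohens_kappa(y_ref, y_pred), 3))  # prints 0.75
```

    Computed over the full control dataset for each classifier, this is the statistic by which the random forests method came out on top.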

  13. Successful implementation of image-guided radiation therapy quality assurance in the Trans Tasman Radiation Oncology Group 08.01 PROFIT Study.

    PubMed

    Middleton, Mark; Frantzis, Jim; Healy, Brendan; Jones, Mark; Murry, Rebecca; Kron, Tomas; Plank, Ashley; Catton, Charles; Martin, Jarad

    2011-12-01

    The quality assurance (QA) of image-guided radiation therapy (IGRT) within clinical trials is in its infancy, but its importance will continue to grow as IGRT becomes the standard of care. The purpose of this study was to demonstrate the feasibility of IGRT QA as part of the credentialing process for a clinical trial. IGRT benchmarking across multiple sites was incorporated into the accreditation process for a randomized trial of prostate cancer hypofractionation. Each participating site underwent IGRT credentialing via a site visit. In all centers, intraprostatic fiducials were used. A real-time assessment of IGRT analysis was performed using Varian's Offline Review image analysis package. Two-dimensional (2D) kV and MV electronic portal imaging datasets of prostate patients were used, consisting of 39 treatment verification images for 2D/2D comparison with the digitally reconstructed radiograph derived from the planning scan. The influence of differing sites, image modality, and observer experience on IGRT was then assessed. Statistical analysis of the mean mismatch errors showed that IGRT analysis was performed uniformly across the three orthogonal planes, regardless of institution, therapist seniority, or imaging modality. The IGRT component of clinical trials that include sophisticated planning and treatment protocols must undergo stringent QA. The IGRT technique of intraprostatic fiducials has been shown, in the context of this trial, to be undertaken in a uniform manner across Australia. Extending this concept to many sites with different equipment and IGRT experience will require a robust remote credentialing process. Crown Copyright © 2011. Published by Elsevier Inc. All rights reserved.

  14. iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

    PubMed Central

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2011-01-01

    DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating the features into the general form of pseudo amino acid composition that were extracted from protein sequences via the “grey model” and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA binding proteins based on their amino acid sequences information alone. The overall success rate by iDNA-Prot was 83.96% that was obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has pairwise sequence identity to any other in a same subset. In addition to achieving high success rate, the computational time for iDNA-Prot is remarkably shorter in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at the web-site on http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. PMID:21935457
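    The "grey model" underlying the feature extraction is the classical GM(1,1) fit of a first-order grey differential equation to a short numeric sequence; a rough sketch of the coefficient estimation (this is the textbook GM(1,1) least-squares fit, not the authors' exact feature pipeline):

```python
def gm11(x0):
    """Fit GM(1,1) development coefficient a and grey input b to a sequence,
    solving x0[k] = -a*z1[k] + b by least squares (2x2 normal equations)."""
    n = len(x0)
    # accumulated generating operation (cumulative sum)
    x1, s = [], 0.0
    for v in x0:
        s += v
        x1.append(s)
    # background values: means of consecutive accumulated terms
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # normal equations for the design matrix with rows (-z, 1)
    szz = sum(z * z for z in z1)
    sz = sum(z1)
    szy = sum(z * yi for z, yi in zip(z1, y))
    sy = sum(y)
    m = len(z1)
    det = szz * m - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b
```

For a constant sequence the fit returns a development coefficient of zero, a useful sanity check; the fitted (a, b) pair is what would serve as sequence-derived features.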

  15. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2016-02-24

    Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high-performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and 429 out of more than 8,000 GO terms, respectively, which play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that the GO terms of these categories also play vital roles in the prediction. With these essential GO terms, not only can a protein's location be determined, but the reason why it resides there can also be revealed. Experimental results show that the outputs of both mEN and mLASSO are interpretable and that they perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers' convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.
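    The sparsity that makes mLASSO and mEN interpretable comes from the L1 penalty, whose proximal operator is soft-thresholding; for orthonormal features the LASSO solution even has a closed form, sketched below on toy data (the feature matrix and target scores are made up, not the paper's GO-term features):

```python
def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty: shrinks rho toward zero,
    setting it exactly to zero when |rho| <= lam (this creates sparsity)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_orthonormal(X, y, lam):
    """LASSO weights when the columns of X are orthonormal:
    w_j = soft_threshold(X_j . y, lam)."""
    n_features = len(X[0])
    weights = []
    for j in range(n_features):
        rho = sum(X[i][j] * y[i] for i in range(len(y)))
        weights.append(soft_threshold(rho, lam))
    return weights

# Toy one-vs-rest regression: feature 0 is informative, feature 1 is noise
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
y = [3.0, 0.1, 0.0, 0.0]
print(lasso_orthonormal(X, y, lam=0.5))  # → [2.5, 0.0]
```

Features with zero weight are discarded, which is how a handful of informative GO terms can be selected out of thousands.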

  16. Benchmarking homogenization algorithms for monthly data

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.

    2012-01-01

    The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies and because they represent two important types of statistics (additive and multiplicative). The algorithms were validated against a realistic benchmark dataset. The benchmark contains real inhomogeneous data as well as simulated data with inserted inhomogeneities. Random independent break-type inhomogeneities with normally distributed breakpoint sizes were added to the simulated datasets. To approximate real world conditions, breaks were introduced that occur simultaneously in multiple station series within a simulated network of station data. The simulated time series also contained outliers, missing data periods and local station trends. Further, a stochastic nonlinear global (network-wide) trend was added. Participants provided 25 separate homogenized contributions as part of the blind study. After the deadline at which details of the imposed inhomogeneities were revealed, 22 additional solutions were submitted. These homogenized datasets were assessed by a number of performance metrics, including (i) the centered root mean square error relative to the true homogeneous value at various averaging scales, (ii) the error in linear trend estimates and (iii) traditional contingency skill scores. The metrics were computed using both the individual station series and the network average regional series. The performance of the contributions depends significantly on the error metric considered. Contingency scores by themselves are not very informative. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data.
Training the users on homogenization software was found to be very important. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that automatic algorithms can perform as well as manual ones.
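    Metric (i), the centered root mean square error, compares anomaly series so that a constant bias in the homogenized data is not penalized; a minimal sketch (the example series are made up):

```python
def centered_rmse(homogenized, truth):
    """Centered RMSE: RMSE between the two series after subtracting each
    series' own mean, so a constant offset is not penalized."""
    mh = sum(homogenized) / len(homogenized)
    mt = sum(truth) / len(truth)
    sq = [((h - mh) - (t - mt)) ** 2 for h, t in zip(homogenized, truth)]
    return (sum(sq) / len(sq)) ** 0.5

# A pure constant offset yields zero centered error
print(centered_rmse([11.0, 12.0, 13.0], [1.0, 2.0, 3.0]))  # → 0.0
```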

  17. Benchmarking monthly homogenization algorithms

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratianni, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.

    2011-08-01

    The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies and because they represent two important types of statistics (additive and multiplicative). The algorithms were validated against a realistic benchmark dataset. The benchmark contains real inhomogeneous data as well as simulated data with inserted inhomogeneities. Random break-type inhomogeneities, modeled as a Poisson process with normally distributed breakpoint sizes, were added to the simulated datasets. To approximate real world conditions, breaks were introduced that occur simultaneously in multiple station series within a simulated network of station data. The simulated time series also contained outliers, missing data periods and local station trends. Further, a stochastic nonlinear global (network-wide) trend was added. Participants provided 25 separate homogenized contributions as part of the blind study, as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics, including (i) the centered root mean square error relative to the true homogeneous value at various averaging scales, (ii) the error in linear trend estimates and (iii) traditional contingency skill scores. The metrics were computed using both the individual station series and the network average regional series. The performance of the contributions depends significantly on the error metric considered. Contingency scores by themselves are not very informative. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data.
Training was found to be very important. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that automatic algorithms can now perform as well as manual ones.
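    The break-insertion scheme described above can be illustrated with a toy generator: break onsets arrive as a discrete-time Poisson process and break sizes are drawn from a normal distribution (an illustration of the general setup, not the HOME benchmark code; the rate and spread parameters are made up):

```python
import random

def insert_breaks(series, rate=0.05, size_sd=0.8, rng=None):
    """Add step-like inhomogeneities to a monthly series: each time step
    starts a new break with probability `rate` (a discrete-time Poisson
    process), with break sizes drawn from N(0, size_sd). Returns the
    perturbed series and the break positions."""
    rng = rng or random.Random(0)
    out, offset, breaks = [], 0.0, []
    for i, v in enumerate(series):
        if i > 0 and rng.random() < rate:
            offset += rng.gauss(0.0, size_sd)
            breaks.append(i)
        out.append(v + offset)
    return out, breaks

# Perturb 20 years of monthly anomalies (all zero, to make breaks visible)
perturbed, positions = insert_breaks([0.0] * 240)
```

With the rate set to zero the series passes through unchanged, which is a convenient sanity check.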

  18. Dataset for reporting of thymic epithelial tumours: recommendations from the International Collaboration on Cancer Reporting (ICCR).

    PubMed

    Nicholson, Andrew G; Detterbeck, Frank; Marx, Alexander; Roden, Anja C; Marchevsky, Alberto M; Mukai, Kiyoshi; Chen, Gang; Marino, Mirella; den Bakker, Michael A; Yang, Woo-Ick; Judge, Meagan; Hirschowitz, Lynn

    2017-03-01

    The International Collaboration on Cancer Reporting (ICCR) is a not-for-profit organization formed by the Royal Colleges of Pathologists of Australasia and the United Kingdom, the College of American Pathologists, the Canadian Association of Pathologists-Association canadienne des pathologistes in association with the Canadian Partnership Against Cancer, and the European Society of Pathology. Its goal is to produce standardized, internationally agreed, evidence-based datasets for use throughout the world. This article describes the development of a cancer dataset by the multidisciplinary ICCR expert panel for the reporting of thymic epithelial tumours. The dataset includes 'required' (mandatory) and 'recommended' (non-mandatory) elements, which are validated by a review of current evidence and supported by explanatory text. Seven required elements and 12 recommended elements were agreed by the international dataset authoring committee to represent the essential information for the reporting of thymic epithelial tumours. The use of an internationally agreed, structured pathology dataset for reporting thymic tumours provides all of the necessary information for optimal patient management, facilitates consistent and accurate data collection, and provides valuable data for research and international benchmarking. The dataset also provides a valuable resource for those countries and institutions that are not in a position to develop their own datasets. © 2016 John Wiley & Sons Ltd.

  19. Deep learning and face recognition: the state of the art

    NASA Astrophysics Data System (ADS)

    Balaban, Stephen

    2015-05-01

    Deep Neural Networks (DNNs) have established themselves as a dominant technique in machine learning. DNNs have been top performers on a wide variety of tasks including image classification, speech recognition, and face recognition.1-3 Convolutional neural networks (CNNs) have been used in nearly all of the top performing methods on the Labeled Faces in the Wild (LFW) dataset.3-6 In this talk and accompanying paper, I attempt to provide a review and summary of the deep learning techniques used in the state of the art. In addition, I highlight the need for both larger and more challenging public datasets to benchmark these systems. Despite the ability of DNNs and autoencoders to perform unsupervised feature learning, modern facial recognition pipelines still require domain-specific engineering in the form of re-alignment. For example, in Facebook's recent DeepFace paper, a 3D "frontalization" step lies at the beginning of the pipeline. This step creates a 3D face model for the incoming image and then uses a series of affine transformations of the fiducial points to "frontalize" the image. This step enables the DeepFace system to use a neural network architecture with locally connected layers without weight sharing, as opposed to standard convolutional layers.6 Deep learning techniques combined with large datasets have allowed research groups to surpass human-level performance on the LFW dataset.3, 5 The high accuracy (99.63% for FaceNet at the time of publishing) and utilization of outside data (hundreds of millions of images in the case of Google's FaceNet) suggest that current face verification benchmarks such as LFW may not be challenging enough, nor provide enough data, for current techniques.3, 5 There exist a variety of organizations with mobile photo sharing applications that would be capable of releasing a very large scale and highly diverse dataset of facial images captured on mobile devices.
Such an "ImageNet for Face Recognition" would likely receive a warm welcome from researchers and practitioners alike.

  20. An End-to-End simulator for the development of atmospheric corrections and temperature - emissivity separation algorithms in the TIR spectral domain

    NASA Astrophysics Data System (ADS)

    Rock, Gilles; Fischer, Kim; Schlerf, Martin; Gerhards, Max; Udelhoven, Thomas

    2017-04-01

    The development and optimization of image processing algorithms requires the availability of datasets depicting every step from the Earth's surface to the sensor's detector. The lack of ground-truth data makes it necessary to develop such algorithms on simulated data. The simulation of hyperspectral remote sensing data is a useful tool for a variety of tasks, such as the design of systems, the understanding of the image formation process, and the development and validation of data processing algorithms. An end-to-end simulator has been set up, consisting of a forward simulator, a backward simulator and a validation module. The forward simulator derives radiance datasets based on laboratory sample spectra, applies atmospheric contributions using radiative transfer equations, and simulates the instrument response using configurable sensor models. This is followed by the backward simulation branch, consisting of an atmospheric correction (AC), a temperature and emissivity separation (TES) or a hybrid AC and TES algorithm. An independent validation module allows the comparison between input and output datasets and the benchmarking of different processing algorithms. In this study, hyperspectral thermal infrared scenes of a variety of surfaces have been simulated to analyze existing AC and TES algorithms. The ARTEMISS algorithm was optimized and benchmarked against the original implementation. The errors in TES were found to be related to incorrect water vapor retrieval. The atmospheric characterization could be optimized, resulting in increased accuracy of temperature and emissivity retrieval. Airborne datasets of different spectral resolutions were simulated from terrestrial HyperCam-LW measurements. The simulated airborne radiance spectra were subjected to atmospheric correction and TES, and were further used for a plant species classification study analyzing effects related to noise and mixed pixels.
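    The forward branch reduces, per spectral band, to propagating surface-leaving radiance through the atmosphere; a simplified single-band sketch using Planck's law (the physical constants are standard; the radiative transfer form below is the usual simplified TIR equation, not this simulator's implementation):

```python
import math

H = 6.626e-34   # Planck constant, J s
C = 2.998e8     # speed of light, m/s
KB = 1.381e-23  # Boltzmann constant, J/K

def planck_radiance(wavelength_m, temp_k):
    """Spectral radiance of a blackbody, W / (m^2 sr m)."""
    num = 2.0 * H * C ** 2 / wavelength_m ** 5
    return num / (math.exp(H * C / (wavelength_m * KB * temp_k)) - 1.0)

def at_sensor_radiance(emissivity, temp_k, wavelength_m, tau, l_up, l_down):
    """Simplified TIR radiative transfer: surface emission plus reflected
    downwelling radiance, attenuated by the atmospheric transmittance tau,
    plus the upwelling path radiance."""
    surface = (emissivity * planck_radiance(wavelength_m, temp_k)
               + (1.0 - emissivity) * l_down)
    return tau * surface + l_up
```

Inverting this equation for temperature and emissivity, given estimates of tau, l_up and l_down, is exactly the AC + TES problem the backward branch solves.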

  1. Benchmarking Big Data Systems and the BigData Top100 List.

    PubMed

    Baru, Chaitanya; Bhandarkar, Milind; Nambiar, Raghunath; Poess, Meikel; Rabl, Tilmann

    2013-03-01

    "Big data" has become a major force of innovation across enterprises of all sizes. New platforms with increasingly more features for managing big datasets are being announced almost weekly. Yet, there is currently a lack of any means of comparability among such platforms. While the performance of traditional database systems is well understood and measured by long-established institutions such as the Transaction Processing Performance Council (TPC), there is neither a clear definition of the performance of big data systems nor a generally agreed upon metric for comparing these systems. In this article, we describe a community-based effort for defining a big data benchmark. Over the past year, a Big Data Benchmarking Community has become established in order to fill this void. The effort focuses on defining an end-to-end application-layer benchmark for measuring the performance of big data applications, with the ability to easily adapt the benchmark specification to evolving challenges in the big data space. This article describes the efforts that have been undertaken thus far toward the definition of a BigData Top100 List. While highlighting the major technical as well as organizational challenges, through this article, we also solicit community input into this process.

  2. Size ratio performance in detecting cerebral aneurysm rupture status is insensitive to small vessel removal.

    PubMed

    Lauric, Alexandra; Baharoglu, Merih I; Malek, Adel M

    2013-04-01

    The variable definition of size ratio (SR) for sidewall (SW) vs bifurcation (BIF) aneurysms creates confusion for lesions harboring small branches, such as those at carotid ophthalmic or posterior communicating locations. These aneurysms are considered SW by many clinicians, but the SR methodology classifies them as BIF. To evaluate the effect of ignoring small vessels, and of SW vs stringent BIF labeling, on SR rupture-status detection performance in borderline aneurysms with small branches, and to reconcile SR-based labeling with clinical SW/BIF classification. Catheter rotational angiographic datasets of 134 consecutive aneurysms (60 ruptured) were automatically measured in three dimensions. Stringent BIF labeling was applied to clinically labeled aneurysms, with 21 aneurysms switching label from SW to BIF. Parent vessel size was evaluated both taking into account, and ignoring, small vessels. SR was defined accordingly as the ratio between aneurysm and parent vessel sizes. Univariate and multivariate statistics identified significant features. The square of the correlation coefficient (R(2)) was reported for bivariate analysis of alternative SR calculations. Regardless of SW/BIF labeling method, SR was equally significant in discriminating aneurysm rupture status (P < .001). Bivariate analysis of alternative SR calculations showed a high correlation of R(2) = 0.94 on the whole dataset and R(2) = 0.98 on the 21 borderline aneurysms. Ignoring small branches in the SR calculation maintains rupture-status detection performance while reducing postprocessing complexity and removing labeling ambiguity. Aneurysms adjacent to these vessels can be considered SW for morphometric analysis. It is reasonable to use the clinical SW/BIF labeling when using SR for rupture risk evaluation.
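    The two SR conventions compared above differ only in which parent-vessel branches enter the average; a minimal sketch (the diameters in mm are illustrative, and the 2 mm cutoff is a hypothetical threshold for "small" branches):

```python
def size_ratio(aneurysm_height, branch_diameters, min_diameter=0.0):
    """SR = aneurysm height / mean parent-vessel diameter.
    Branches narrower than `min_diameter` are ignored, which turns a
    stringent BIF-style calculation into a SW-style one."""
    kept = [d for d in branch_diameters if d >= min_diameter]
    return aneurysm_height / (sum(kept) / len(kept))

# Including a 1.0 mm posterior communicating branch vs ignoring it
print(size_ratio(6.0, [4.0, 4.0, 1.0]))                    # → 2.0 (BIF-style)
print(size_ratio(6.0, [4.0, 4.0, 1.0], min_diameter=2.0))  # → 1.5 (SW-style)
```

As the toy numbers show, a thin branch pulls the mean parent diameter down and inflates the SR, which is exactly the labeling ambiguity the study examines.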

  3. The EB Factory: Fundamental Stellar Astrophysics with Eclipsing Binary Stars Discovered by Kepler

    NASA Astrophysics Data System (ADS)

    Stassun, Keivan

    Eclipsing binaries (EBs) are key laboratories for determining the fundamental properties of stars. EBs are therefore foundational objects for constraining stellar evolution models, which in turn are central to determinations of stellar mass functions, of exoplanet properties, and many other areas. The primary goal of this proposal is to mine the Kepler mission light curves for: (1) EBs that include a subgiant star, from which precise ages can be derived and which can thus serve as critically needed age benchmarks; and within these, (2) long-period EBs that include low-mass M stars or brown dwarfs, which are increasingly becoming the focus of exoplanet searches, but for which there are the fewest available fundamental mass-radius-age benchmarks. A secondary goal of this proposal is to develop an end-to-end computational pipeline -- the Kepler EB Factory -- that allows automatic processing of Kepler light curves for EBs, from period finding, to object classification, to determination of EB physical properties for the most scientifically interesting EBs, and finally to accurate modeling of these EBs for detailed tests and benchmarking of theoretical stellar evolution models. We will integrate the most successful algorithms into a single, cohesive workflow environment, and apply this 'Kepler EB Factory' to the full public Kepler dataset to find and characterize new "benchmark grade" EBs, and will disseminate both the enhanced data products from this pipeline and the pipeline itself to the broader NASA science community. The proposed work responds directly to two of the defined Research Areas of the NASA Astrophysics Data Analysis Program (ADAP), specifically Research Area #2 (Stellar Astrophysics) and Research Area #9 (Astrophysical Databases). To be clear, our primary goal is the fundamental stellar astrophysics that will be enabled by the discovery and analysis of relatively rare, benchmark-grade EBs in the Kepler dataset.
At the same time, enabling this goal will require bringing a suite of extant and new custom algorithms to bear on the Kepler data, and thus our development of the Kepler EB Factory represents a value-added product that will allow the widest scientific impact of the information locked within the vast reservoir of the Kepler light curves.
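    The period-finding stage of such a pipeline can be illustrated with a simple phase-dispersion search: fold the light curve at each trial period and keep the period that minimizes within-bin scatter (a toy sketch with a synthetic eclipsing binary, not the Factory's actual algorithms):

```python
def phase_dispersion(times, fluxes, period, nbins=10):
    """Total within-bin variance of the light curve folded at `period`:
    the correct period groups similar flux values into the same phase bin."""
    bins = [[] for _ in range(nbins)]
    for t, f in zip(times, fluxes):
        phase = (t % period) / period
        bins[min(int(phase * nbins), nbins - 1)].append(f)
    total = 0.0
    for b in bins:
        if len(b) > 1:
            mean = sum(b) / len(b)
            total += sum((f - mean) ** 2 for f in b)
    return total

def best_period(times, fluxes, trial_periods):
    """Pick the trial period with the smallest phase dispersion."""
    return min(trial_periods, key=lambda p: phase_dispersion(times, fluxes, p))

# Synthetic eclipsing binary: eclipses of depth 0.5 every 2.0 days
times = [0.1 * k for k in range(400)]
fluxes = [0.5 if (t % 2.0) < 0.2 else 1.0 for t in times]
print(best_period(times, fluxes, [1.0, 1.3, 1.7, 2.0, 2.6]))  # → 2.0
```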

  4. Numerical method of lines for the relaxational dynamics of nematic liquid crystals.

    PubMed

    Bhattacharjee, A K; Menon, Gautam I; Adhikari, R

    2008-08-01

    We propose an efficient numerical scheme, based on the method of lines, for solving the Landau-de Gennes equations describing the relaxational dynamics of nematic liquid crystals. Our method is computationally easy to implement, balancing requirements of efficiency and accuracy. We benchmark our method through the study of the following problems: the isotropic-nematic interface, growth of nematic droplets in the isotropic phase, and the kinetics of coarsening following a quench into the nematic phase. Our results, obtained through solutions of the full coarse-grained equations of motion with no approximations, provide a stringent test of the de Gennes ansatz for the isotropic-nematic interface, illustrate the anisotropic character of droplets in the nucleation regime, and validate dynamical scaling in the coarsening regime.
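    The method of lines discretizes space and integrates the resulting coupled ODEs in time; a toy sketch on a scalar Ginzburg-Landau-type relaxation equation, a far simpler analogue of the tensorial Landau-de Gennes dynamics (grid size, time step, and coefficients are illustrative):

```python
def relax(phi, dx=1.0, dt=0.1, diffusion=1.0, steps=500):
    """Method of lines for dphi/dt = phi - phi**3 + D * d2phi/dx2:
    central differences in space, forward Euler in time, periodic boundary."""
    n = len(phi)
    for _ in range(steps):
        lap = [(phi[(i + 1) % n] - 2 * phi[i] + phi[(i - 1) % n]) / dx ** 2
               for i in range(n)]
        phi = [p + dt * (p - p ** 3 + diffusion * l)
               for p, l in zip(phi, lap)]
    return phi

# A small uniform perturbation relaxes toward the ordered state phi = 1
print(relax([0.1] * 16)[:3])
```

The same structure carries over to the tensor order parameter: only the local reaction term and the number of coupled fields change.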

  5. The Tapered Hybrid Undulator (THUNDER) of the visible free-electron laser oscillator experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson, K.E.; Quimby, D.C.; Slater, J.M.

    A 5 m tapered hybrid undulator (THUNDER) has been designed and built as part of the Boeing Aerospace Company and Spectra Technology, Inc. visible free-electron laser (FEL) oscillator experiment. The performance goals required of an undulator for a visible oscillator with large extraction are ambitious. They require the establishment of stringent magnetic field quality tolerances, which impact design and fabrication techniques. The performance goals of THUNDER are presented. The tolerances resulting from the FEL interaction are contrasted and compared with those of a synchrotron radiation source. The design, fabrication, and field measurements are discussed. The performance of THUNDER serves as a benchmark for future wiggler/undulator design for advanced FELs and synchrotron radiation sources.

  6. Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation

    PubMed Central

    Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B.

    2016-01-01

    Today, increasing attention is being paid to research into spike-based neural computation both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing are now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks and using digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance.
Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware implementations. With this dataset we hope to (1) promote meaningful comparison between algorithms in the field of neural computation, (2) allow comparison with conventional image recognition methods, (3) provide an assessment of the state of the art in spike-based visual recognition, and (4) help researchers identify future directions and advance the field. PMID:27853419
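    Rate-based Poisson spike generation, the first encoding technique mentioned, can be sketched by treating each small time bin as an independent Bernoulli trial (an illustration, not the benchmark's generator; the 100 Hz peak rate is an assumed scaling):

```python
import random

def poisson_spike_train(rate_hz, duration_s, dt=0.001, rng=None):
    """Spike times for a homogeneous Poisson process: in each bin of width
    dt the spike probability is rate * dt (valid while rate * dt << 1)."""
    rng = rng or random.Random(42)
    n_bins = int(duration_s / dt)
    return [i * dt for i in range(n_bins) if rng.random() < rate_hz * dt]

# Encode a pixel intensity (0-255) as a firing rate of up to 100 Hz
intensity = 128
spikes = poisson_spike_train(rate_hz=100.0 * intensity / 255, duration_s=10.0)
print(len(spikes))  # expected count ≈ rate * duration ≈ 500
```

Brighter pixels thus produce denser spike trains, which is what lets a rate code carry MNIST digit images into an SNN.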

  7. Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation.

    PubMed

    Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B

    2016-01-01

    Today, increasing attention is being paid to research into spike-based neural computation both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing are now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks and using digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance.
Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware implementations. With this dataset we hope to (1) promote meaningful comparison between algorithms in the field of neural computation, (2) allow comparison with conventional image recognition methods, (3) provide an assessment of the state of the art in spike-based visual recognition, and (4) help researchers identify future directions and advance the field.

  8. Compensatory neurofuzzy model for discrete data classification in biomedical

    NASA Astrophysics Data System (ADS)

    Ceylan, Rahime

    2015-03-01

    Biomedical data fall into two main categories: signals and discrete data. Accordingly, studies in this area concern either biomedical signal classification or biomedical discrete data classification. Artificial intelligence models exist for the classification of ECG, EMG or EEG signals. Likewise, many models in the literature classify discrete data, such as the results of blood analyses or biopsies obtained during medical procedures. No single algorithm, however, has achieved a high accuracy rate on both signal and discrete data classification. In this study, a compensatory neurofuzzy network model is presented for the classification of discrete data in biomedical pattern recognition. The compensatory neurofuzzy network is a hybrid binary classifier in which the parameters of the fuzzy systems are updated by the backpropagation algorithm. The realized classifier model was applied to two benchmark datasets (the Wisconsin Breast Cancer dataset and the Pima Indian Diabetes dataset). Experimental studies show that the compensatory neurofuzzy network model achieved a 96.11% accuracy rate in classification of the breast cancer dataset, and a 69.08% accuracy rate was obtained in experiments on the diabetes dataset with only 10 iterations.
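    The fuzzy half of such a neurofuzzy classifier can be sketched as a zero-order Sugeno system: Gaussian membership functions yield rule firing strengths, and the normalized weighted sum of rule consequents gives the class score (a toy illustration; the compensatory operators and backpropagation training described in the paper are omitted, and the glucose-based rules are hypothetical):

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def sugeno_score(x, rules):
    """Zero-order Sugeno inference: rules are (center, sigma, consequent)
    triples; output is the firing-strength-weighted mean of the consequents."""
    strengths = [gaussian_mf(x, c, s) for c, s, _ in rules]
    total = sum(strengths)
    return sum(w * out for w, (_, _, out) in zip(strengths, rules)) / total

# Two hypothetical rules: low glucose -> benign (0), high glucose -> malignant (1)
rules = [(90.0, 20.0, 0.0), (180.0, 20.0, 1.0)]
print(sugeno_score(90.0, rules))   # close to 0
print(sugeno_score(180.0, rules))  # close to 1
```

In the full neurofuzzy model the centers, widths and consequents would be the trainable parameters updated by backpropagation.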

  9. Indoor Modelling Benchmark for 3D Geometry Extraction

    NASA Astrophysics Data System (ADS)

    Thomson, C.; Boehm, J.

    2014-06-01

    A combination of faster, cheaper and more accurate hardware, more sophisticated software, and greater industry acceptance has laid the foundations for an increased desire for accurate 3D parametric models of buildings. Pointclouds are currently the data source of choice, with static terrestrial laser scanning the predominant tool for large, dense volume measurement. The current importance of pointclouds as the primary source of real-world representation is endorsed by CAD software vendor acquisitions of pointcloud engines in 2011. Both the capture and modelling of indoor environments require great effort in time by the operator (and therefore cost). Automation is seen as a way to reduce the workload of the user, and some commercial packages have appeared that provide automation to some degree. In the data capture phase, advances in indoor mobile mapping systems are speeding up the process, albeit currently with a reduction in accuracy. As a result, this paper presents freely accessible pointcloud datasets of two typical areas of a building, each captured with two different methods and each accompanied by an accurate, wholly manually created reference model. These datasets are provided as a benchmark for the research community to gauge the performance and improvements of various techniques for indoor geometry extraction. With this in mind, non-proprietary, interoperable formats are provided, such as E57 for the scans and IFC for the reference model. The datasets can be found at: http://indoor-bench.github.io/indoor-bench.

  10. Joint estimation of motion and illumination change in a sequence of images

    NASA Astrophysics Data System (ADS)

    Koo, Ja-Keoung; Kim, Hyo-Hun; Hong, Byung-Woo

    2015-09-01

    We present an algorithm that simultaneously computes optical flow and estimates illumination change from an image sequence in a unified framework. We propose an energy functional consisting of the conventional optical flow energy of the Horn-Schunck method and an additional constraint designed to compensate for illumination changes. Any undesirable illumination change that occurs in the imaging procedure while the optical flow is being computed is considered a nuisance factor. In contrast to the conventional optical flow algorithm based on the Horn-Schunck functional, which assumes the brightness constancy constraint, our algorithm is shown to be robust with respect to temporal illumination changes in the computation of optical flows. An efficient conjugate gradient descent technique is used as the numerical scheme in the optimization procedure. The experimental results obtained on the Middlebury benchmark dataset demonstrate the robustness and effectiveness of our algorithm. In addition, a comparative analysis of our algorithm and the Horn-Schunck algorithm is performed on an additional test dataset, constructed by applying a variety of synthetic bias fields to the original image sequences in the Middlebury benchmark dataset, to demonstrate that our algorithm outperforms the Horn-Schunck algorithm. The superior performance of the proposed method is observed in both qualitative visualizations and quantitative accuracy, whereas the Horn-Schunck optical flow algorithm easily yields poor results in the presence of even small illumination changes that violate the brightness constancy constraint.

  11. Benchmark for license plate character segmentation

    NASA Astrophysics Data System (ADS)

    Gonçalves, Gabriel Resende; da Silva, Sirlene Pio Gomes; Menotti, David; Shwartz, William Robson

    2016-09-01

    Automatic license plate recognition (ALPR) has been the focus of much research in recent years. In general, ALPR is divided into the following problems: detection of on-track vehicles, license plate detection, segmentation of license plate characters, and optical character recognition (OCR). Even though commercial solutions are available for controlled acquisition conditions, e.g., the entrance of a parking lot, ALPR is still an open problem when dealing with data acquired from uncontrolled environments, such as roads and highways, when relying only on imaging sensors. Due to the multiple orientations and scales of the license plates captured by the camera, license plate character segmentation (LPCS) is a very challenging step of ALPR, because its effectiveness must be (near) optimal for the OCR to achieve a high recognition rate. To tackle the LPCS problem, this work proposes a benchmark composed of a dataset designed to focus specifically on the character segmentation step of ALPR, together with an evaluation protocol. Furthermore, we propose the Jaccard-centroid coefficient, an evaluation measure more suitable than the Jaccard coefficient regarding the location of the bounding box within the ground-truth annotation. The dataset is composed of 2000 Brazilian license plates comprising 14000 alphanumeric symbols and their corresponding bounding box annotations. We also present a straightforward approach to perform LPCS efficiently. Finally, we provide an experimental evaluation of the dataset based on five LPCS approaches and demonstrate the importance of character segmentation for achieving accurate OCR.
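
    For reference, the plain Jaccard coefficient (intersection over union) that the proposed Jaccard-centroid measure refines can be sketched as follows; this is the standard definition for axis-aligned boxes, not the paper's centroid-aware variant:

    ```python
    def jaccard(box_a, box_b):
        """Jaccard coefficient (intersection over union) of two axis-aligned
        bounding boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
        ih = max(0, min(ay2, by2) - max(ay1, by1))  # intersection height
        inter = iw * ih
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union else 0.0
    ```

    A perfect segmentation scores 1.0 against the ground-truth box; partially overlapping boxes score between 0 and 1.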

  12. Pedotransfer functions for isoproturon sorption on soils and vadose zone materials.

    PubMed

    Moeys, Julien; Bergheaud, Valérie; Coquet, Yves

    2011-10-01

    Sorption coefficients (the linear K(D) or the non-linear K(F) and N(F)) are critical parameters in models of pesticide transport to groundwater or surface water. In this work, a dataset of isoproturon sorption coefficients and corresponding soil properties (264 K(D) and 55 K(F)) was compiled, and pedotransfer functions were built for predicting isoproturon sorption in soils and vadose zone materials. These were benchmarked against various other prediction methods. The results show that the organic carbon content (OC) and pH are the two main soil properties influencing isoproturon K(D). The pedotransfer function is K(D) = 1.7822 + 0.0162 OC(1.5) - 0.1958 pH (K(D) in L kg(-1) and OC in g kg(-1)). For low-OC soils (OC < 6.15 g kg(-1)), clay and pH are most influential. The pedotransfer function is then K(D) = 0.9980 + 0.0002 clay - 0.0990 pH (clay in g kg(-1)). Benchmarking K(D) estimations showed that functions calibrated on more specific subsets of the data perform better on those subsets than functions calibrated on larger subsets. Predicting isoproturon sorption in soils at unsampled locations should rely, whenever possible, and in order of preference, on (a) site- or soil-specific pedotransfer functions, (b) pedotransfer functions calibrated on a large dataset, (c) K(OC) values calculated on a large dataset or (d) K(OC) values taken from existing pesticide properties databases. Copyright © 2011 Society of Chemical Industry.
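
    The two pedotransfer functions quoted in the abstract can be combined into a small helper; this sketch transcribes the published equations, with the OC < 6.15 g kg(-1) switch as stated, while the function name and argument handling are our own:

    ```python
    def isoproturon_kd(oc_g_per_kg, ph, clay_g_per_kg=None):
        """Isoproturon sorption coefficient K_D (L/kg) from the pedotransfer
        functions in the abstract: the general function uses organic carbon
        (OC, g/kg) and pH; for low-OC soils (OC < 6.15 g/kg) a clay- and
        pH-based function is used instead, when clay content is supplied."""
        if oc_g_per_kg < 6.15 and clay_g_per_kg is not None:
            # Low-OC function: K_D = 0.9980 + 0.0002 clay - 0.0990 pH
            return 0.9980 + 0.0002 * clay_g_per_kg - 0.0990 * ph
        # General function: K_D = 1.7822 + 0.0162 OC^1.5 - 0.1958 pH
        return 1.7822 + 0.0162 * oc_g_per_kg ** 1.5 - 0.1958 * ph
    ```

    For example, a soil with OC = 20 g/kg at pH 7 gives K_D of about 1.86 L/kg from the general function.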

  13. Prediction of rat protein subcellular localization with pseudo amino acid composition based on multiple sequential features.

    PubMed

    Shi, Ruijia; Xu, Cunshuan

    2011-06-01

    The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark rat protein dataset and develop a combined approach for predicting the subcellular localization of rat proteins. From the protein primary sequence, multiple sequential features are obtained using discrete Fourier analysis, a position conservation scoring function and increment of diversity, and these sequential features are used as input parameters of a support vector machine. By the jackknife test, the overall success rate of prediction is 95.6% on the rat protein dataset. Our method was also applied to the apoptosis protein dataset and the Gram-negative bacterial protein dataset with the jackknife test; the overall success rates are 89.9% and 96.4%, respectively. The above results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.

  14. Successful implementation of diabetes audits in Australia: the Australian National Diabetes Information Audit and Benchmarking (ANDIAB) initiative.

    PubMed

    Lee, A S; Colagiuri, S; Flack, J R

    2018-04-06

    We developed and implemented a national audit and benchmarking programme to describe the clinical status of people with diabetes attending specialist diabetes services in Australia. The Australian National Diabetes Information Audit and Benchmarking (ANDIAB) initiative was established as a quality audit activity. De-identified data on demographic, clinical, biochemical and outcome items were collected from specialist diabetes services across Australia to provide cross-sectional data on people with diabetes attending specialist centres at least biennially during the years 1998 to 2011. In total, 38 155 sets of data were collected over the eight ANDIAB audits. Each ANDIAB audit achieved its primary objective to collect, collate, analyse, audit and report clinical diabetes data in Australia. Each audit resulted in the production of a pooled data report, as well as individual site reports allowing comparison and benchmarking against other participating sites. The ANDIAB initiative resulted in the largest cross-sectional national de-identified dataset describing the clinical status of people with diabetes attending specialist diabetes services in Australia. ANDIAB showed that people treated by specialist services had a high burden of diabetes complications. This quality audit activity provided a framework to guide planning of healthcare services. © 2018 Diabetes UK.

  15. Fusion and Sense Making of Heterogeneous Sensor Network and Other Sources

    DTIC Science & Technology

    2017-03-16

    multimodal fusion framework that uses both training data and web resources for scene classification, the experimental results on the benchmark datasets...show that the proposed text-aided scene classification framework could significantly improve classification performance. Experimental results also show...human whose adaptability is achieved by reliability- dependent weighting of different sensory modalities. Experimental results show that the proposed

  16. Rotation-invariant features for multi-oriented text detection in natural images.

    PubMed

    Yao, Cong; Zhang, Xin; Bai, Xiang; Liu, Wenyu; Ma, Yi; Tu, Zhuowen

    2013-01-01

    Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on texts of varying orientations in complex natural scenes.

  17. Classification and assessment tools for structural motif discovery algorithms.

    PubMed

    Badr, Ghada; Al-Turaiki, Isra; Mathkour, Hassan

    2013-01-01

    Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.

  18. Benchmarking the evaluated proton differential cross sections suitable for the EBS analysis of natSi and 16O

    NASA Astrophysics Data System (ADS)

    Kokkoris, M.; Dede, S.; Kantre, K.; Lagoyannis, A.; Ntemou, E.; Paneta, V.; Preketes-Sigalas, K.; Provatas, G.; Vlastou, R.; Bogdanović-Radović, I.; Siketić, Z.; Obajdin, N.

    2017-08-01

    The evaluated proton differential cross sections suitable for Elastic Backscattering Spectroscopy (EBS) analysis of natSi and 16O, as obtained from SigmaCalc 2.0, have been benchmarked over a wide energy and angular range at two different accelerator laboratories, namely at N.C.S.R. 'Demokritos', Athens, Greece and at the Ruđer Bošković Institute (RBI), Zagreb, Croatia, using a variety of high-purity thick targets of known stoichiometry. The results are presented in graphical and tabular forms, while the observed discrepancies, as well as the limits in accuracy of the benchmarking procedure, along with target-related effects, are thoroughly discussed and analysed. In the case of oxygen the agreement between simulated and experimental spectra was generally good, while for silicon serious discrepancies were observed above Ep,lab = 2.5 MeV, suggesting that further tuning of the appropriate nuclear model parameters in the evaluated differential cross-section datasets is required.

  19. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing various genomic operations, such as identifying overlapping/non-overlapping regions or nearest gene annotations, is a common research need. The data can be saved in a database system for easy management; however, no database currently provides a comprehensive built-in algorithm for identifying overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although the general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported, and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1). © 2015 Wiley Periodicals, Inc.
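
    The overlap predicate underlying such region operations is simple; the following naive Python sketch illustrates it on half-open interval coordinates (RegMap itself is SQL-based and indexed, which this toy version does not reproduce):

    ```python
    def regions_overlap(a_start, a_end, b_start, b_end):
        """Two regions on the same chromosome, in half-open coordinates,
        overlap iff each starts before the other ends."""
        return a_start < b_end and b_start < a_end

    def pairwise_overlaps(regions_a, regions_b):
        """Naive O(n*m) count of overlapping pairs between two region sets,
        each a list of (start, end) tuples. Database approaches index this
        operation, but the predicate being evaluated is the same."""
        return sum(
            regions_overlap(a0, a1, b0, b1)
            for a0, a1 in regions_a
            for b0, b1 in regions_b
        )
    ```

    With half-open coordinates, regions that merely touch (one ends where the other starts) are correctly reported as non-overlapping.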

  20. SAR image classification based on CNN in real and simulation datasets

    NASA Astrophysics Data System (ADS)

    Peng, Lijiang; Liu, Ming; Liu, Xiaohua; Dong, Liquan; Hui, Mei; Zhao, Yuejin

    2018-04-01

    Convolutional neural networks (CNNs) have achieved great success in image classification tasks. Even in the field of synthetic aperture radar automatic target recognition (SAR-ATR), state-of-the-art results have been obtained by learning deep representations of features on the MSTAR benchmark. However, the raw MSTAR data have shortcomings for training a SAR-ATR model because of the high similarity in background among the SAR images of each class. This suggests that a CNN would learn hierarchies of features of the backgrounds as well as of the targets. To assess the influence of the background, additional SAR image datasets were created containing simulated SAR images of 10 manufactured targets, such as tanks and fighter aircraft, with backgrounds sampled from the original MSTAR data. The simulation datasets include one in which the backgrounds of each image class correspond to one class of MSTAR target or clutter backgrounds, and one in which each image is given a random background drawn from all MSTAR targets or clutters. In addition, mixed datasets of MSTAR and simulated data were created for the experiments. The CNN architecture proposed in this paper is trained on all the datasets mentioned above. The experimental results show that the architecture achieves high performance on all datasets even when the backgrounds of the images are miscellaneous, which indicates that the architecture can learn a good representation of the targets despite drastic changes in background.

  1. Methods to increase reproducibility in differential gene expression via meta-analysis

    PubMed Central

    Sweeney, Timothy E.; Haynes, Winston A.; Vallania, Francesco; Ioannidis, John P.; Khatri, Purvesh

    2017-01-01

    Findings from clinical and biological studies are often not reproducible when tested in independent cohorts. Due to the testing of a large number of hypotheses and relatively small sample sizes, results from whole-genome expression studies in particular are often not reproducible. Compared to single-study analysis, gene expression meta-analysis can improve reproducibility by integrating data from multiple studies. However, there are multiple choices in designing and carrying out a meta-analysis, yet clear guidelines on best practices are scarce. Here, we hypothesized that studying subsets of very large meta-analyses would allow for systematic identification of best practices to improve reproducibility. We therefore constructed three very large gene expression meta-analyses from clinical samples, and then examined meta-analyses of subsets of the datasets (all combinations of datasets with up to N/2 samples and K/2 datasets) compared to a ‘silver standard’ of differentially expressed genes found in the entire cohort. We tested three random-effects meta-analysis models using this procedure. We showed relatively greater reproducibility with more stringent effect-size thresholds combined with relaxed significance thresholds; relatively lower reproducibility when imposing extraneous constraints on residual heterogeneity; and an underestimation of the actual false positive rate by Benjamini–Hochberg correction. In addition, multivariate regression showed that the accuracy of a meta-analysis increased significantly with more included datasets, even when controlling for sample size. PMID:27634930

  2. Normal Modes Expose Active Sites in Enzymes.

    PubMed

    Glantz-Gashai, Yitav; Meirson, Tomer; Samson, Abraham O

    2016-12-01

    Accurate prediction of active sites is an important tool in bioinformatics. Here we present an improved structure based technique to expose active sites that is based on large changes of solvent accessibility accompanying normal mode dynamics. The technique which detects EXPOsure of active SITes through normal modEs is named EXPOSITE. The technique is trained using a small 133 enzyme dataset and tested using a large 845 enzyme dataset, both with known active site residues. EXPOSITE is also tested in a benchmark protein ligand dataset (PLD) comprising 48 proteins with and without bound ligands. EXPOSITE is shown to successfully locate the active site in most instances, and is found to be more accurate than other structure-based techniques. Interestingly, in several instances, the active site does not correspond to the largest pocket. EXPOSITE is advantageous due to its high precision and paves the way for structure based prediction of active site in enzymes.

  3. Normal Modes Expose Active Sites in Enzymes

    PubMed Central

    Glantz-Gashai, Yitav; Samson, Abraham O.

    2016-01-01

    Accurate prediction of active sites is an important tool in bioinformatics. Here we present an improved structure based technique to expose active sites that is based on large changes of solvent accessibility accompanying normal mode dynamics. The technique which detects EXPOsure of active SITes through normal modEs is named EXPOSITE. The technique is trained using a small 133 enzyme dataset and tested using a large 845 enzyme dataset, both with known active site residues. EXPOSITE is also tested in a benchmark protein ligand dataset (PLD) comprising 48 proteins with and without bound ligands. EXPOSITE is shown to successfully locate the active site in most instances, and is found to be more accurate than other structure-based techniques. Interestingly, in several instances, the active site does not correspond to the largest pocket. EXPOSITE is advantageous due to its high precision and paves the way for structure based prediction of active site in enzymes. PMID:28002427

  4. Complex versus simple models: ion-channel cardiac toxicity prediction.

    PubMed

    Mistry, Hitesh B

    2018-01-01

    There is growing interest in applying detailed mathematical models of the heart for ion-channel-related cardiac toxicity prediction. However, there is debate as to whether such complex models are required. Here, an assessment of the predictive performance of two established large-scale biophysical cardiac models and a simple linear model, B net, was conducted. Three ion-channel datasets were extracted from the literature. Each compound was assigned a cardiac risk category using two different classification schemes based on information within CredibleMeds. The predictive performance of each model within each dataset for each classification scheme was assessed via leave-one-out cross-validation. Overall, the B net model performed as well as the leading cardiac models on two of the datasets and outperformed both cardiac models on the third. These results highlight the importance of benchmarking complex versus simple models and also encourage the development of simple models.

  5. Reevaluation of health risk benchmark for sustainable water practice through risk analysis of rooftop-harvested rainwater.

    PubMed

    Lim, Keah-Ying; Jiang, Sunny C

    2013-12-15

    Health risk concerns associated with household use of rooftop-harvested rainwater (HRW) constitute one of the main impediments to exploit the benefits of rainwater harvesting in the United States. However, the benchmark based on the U.S. EPA acceptable annual infection risk level of ≤1 case per 10,000 persons per year (≤10(-4) pppy) developed to aid drinking water regulations may be unnecessarily stringent for sustainable water practice. In this study, we challenge the current risk benchmark by quantifying the potential microbial risk associated with consumption of HRW-irrigated home produce and comparing it against the current risk benchmark. Microbial pathogen data for HRW and exposure rates reported in literature are applied to assess the potential microbial risk posed to household consumers of their homegrown produce. A Quantitative Microbial Risk Assessment (QMRA) model based on worst-case scenario (e.g. overhead irrigation, no pathogen inactivation) is applied to three crops that are most popular among home gardeners (lettuce, cucumbers, and tomatoes) and commonly consumed raw. The infection risks of household consumers attributed to consumption of these home produce vary with the type of produce. The lettuce presents the highest risk, which is followed by tomato and cucumber, respectively. Results show that the 95th percentile values of infection risk per intake event of home produce are one to three orders of magnitude (10(-7) to 10(-5)) lower than U.S. EPA risk benchmark (≤10(-4) pppy). However, annual infection risks under the same scenario (multiple intake events in a year) are very likely to exceed the risk benchmark by one order of magnitude in some cases. Estimated 95th percentile values of the annual risk are in the 10(-4) to 10(-3) pppy range, which are still lower than the 10(-3) to 10(-1) pppy risk range of reclaimed water irrigated produce estimated in comparable studies. 
We further discuss the desirability of HRW for irrigating home produce based on the relative risk of HRW to reclaimed wastewater for irrigation of food crops. The appropriateness of the ≤10(-4) pppy risk benchmark for assessing safety level of HRW-irrigated fresh produce is questioned by considering the assumptions made for the QMRA model. Consequently, the need of an updated approach to assess appropriateness of sustainable water practice for making guidelines and policies is proposed. Copyright © 2013 Elsevier Ltd. All rights reserved.
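
    The step from per-event to annual risk in QMRA studies such as this one is conventionally computed assuming independent exposure events; a minimal sketch, where the numbers in the example are illustrative rather than the study's values:

    ```python
    def annual_risk(per_event_risk, events_per_year):
        """Annual infection risk from a constant per-event risk, assuming
        independent exposure events: P_annual = 1 - (1 - p)**n."""
        return 1.0 - (1.0 - per_event_risk) ** events_per_year

    # A per-event risk of 1e-5 over 100 intake events per year accumulates
    # to roughly 1e-3, an order of magnitude above a 1e-4 annual benchmark.
    risk = annual_risk(1e-5, 100)
    ```

    This compounding is why a per-event risk well below the benchmark can still exceed it on an annual basis, as the abstract reports.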

  6. GiniClust: detecting rare cell types from single-cell gene expression data with Gini index.

    PubMed

    Jiang, Lan; Chen, Huidong; Pinello, Luca; Yuan, Guo-Cheng

    2016-07-01

    High-throughput single-cell technologies have great potential to discover new cell types; however, it remains challenging to detect rare cell types that are distinct from a large population. We present a novel computational method, called GiniClust, to overcome this challenge. Validation against a benchmark dataset indicates that GiniClust achieves high sensitivity and specificity. Application of GiniClust to public single-cell RNA-seq datasets uncovers previously unrecognized rare cell types, including Zscan4-expressing cells within mouse embryonic stem cells and hemoglobin-expressing cells in the mouse cortex and hippocampus. GiniClust also correctly detects a small number of normal cells that are mixed in a cancer cell population.
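
    A standard way to compute the Gini index of a gene's expression across cells, the quantity that GiniClust's gene selection builds on, can be sketched as follows; this is the textbook Gini coefficient, not GiniClust's full pipeline:

    ```python
    import numpy as np

    def gini(x):
        """Gini coefficient of a non-negative 1-D array: 0 for perfectly
        even values, approaching 1 when the total is concentrated in a few
        entries (e.g. a gene expressed in only a handful of cells)."""
        x = np.sort(np.asarray(x, dtype=float))
        n = x.size
        if n == 0 or x.sum() == 0:
            return 0.0
        # Closed form using ranks of the sorted values
        ranks = np.arange(1, n + 1)
        return (2.0 * (ranks * x).sum() / (n * x.sum())) - (n + 1.0) / n
    ```

    A gene expressed uniformly across cells scores 0, while a gene expressed in a single cell out of n scores (n - 1)/n, which is what makes the index useful for flagging rare-cell-type markers.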

  7. Matt: local flexibility aids protein multiple structure alignment.

    PubMed

    Menke, Matthew; Berger, Bonnie; Cowen, Lenore

    2008-01-01

    Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these "bent" alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, but outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of alpha-helices and beta-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. 
For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.

  8. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

    PubMed

    Tsatsaronis, George; Balikas, Georgios; Malakasiotis, Prodromos; Partalas, Ioannis; Zschunke, Matthias; Alvers, Michael R; Weissenborn, Dirk; Krithara, Anastasia; Petridis, Sergios; Polychronopoulos, Dimitris; Almirantis, Yannis; Pavlopoulos, John; Baskiotis, Nicolas; Gallinari, Patrick; Artiéres, Thierry; Ngomo, Axel-Cyrille Ngonga; Heino, Norman; Gaussier, Eric; Barrio-Alvers, Liliana; Schroeder, Michael; Androutsopoulos, Ion; Paliouras, Georgios

    2015-04-30

    This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe, and participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer.
In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.

  9. ClimateNet: A Machine Learning dataset for Climate Science Research

    NASA Astrophysics Data System (ADS)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

Deep Learning techniques have revolutionized commercial applications in computer vision, speech recognition, and control systems. The key to all of these developments was the creation of ImageNet, a curated, labeled dataset that enabled multiple research groups around the world to develop methods, benchmark performance, and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key prerequisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify and share datasets across groups and institutions. In this work, we propose ClimateNet: a dataset that provides labeled instances of extreme weather patterns, as well as the associated raw fields in model and observational output. We develop a NetCDF schema to enumerate weather pattern classes/types and to store bounding boxes and pixel masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in climate science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning to climate science research.

  10. Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.

    PubMed

    Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

    2015-01-01

Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions may no longer be valid. To overcome this weakness, we propose a new clustering algorithm, the localized ambient solidity separation (LASS) algorithm, which uses a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, the proposed centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that the proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but can also identify clusters that are adjacent, overlapping, or under background noise. Finally, we compared the LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records containing demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it.
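The spherical-cluster assumption called out above comes from the k-means objective itself, which scores a partition purely by the within-cluster sum of squared Euclidean distances (WCSS). A minimal pure-Python sketch (not the authors' LASS implementation; the data points are invented for illustration):

```python
# WCSS: the quantity k-means minimizes. Because it uses isotropic Euclidean
# distance to the centroid, it implicitly favors compact, spherical clusters.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def wcss(clusters):
    """Sum over clusters of squared distances from each point to its centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in pts)
    return total

# Two tight spherical clusters score low; a single elongated cluster scores
# high even though, to the eye, it is one "natural" group.
round_clusters = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
elongated = [[(0, 0), (5, 0), (10, 0)]]
print(wcss(round_clusters), wcss(elongated))
```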

  11. Dataset for Reporting of Malignant Mesothelioma of the Pleura or Peritoneum: Recommendations From the International Collaboration on Cancer Reporting (ICCR).

    PubMed

    Churg, Andrew; Attanoos, Richard; Borczuk, Alain C; Chirieac, Lucian R; Galateau-Sallé, Françoise; Gibbs, Allen; Henderson, Douglas; Roggli, Victor; Rusch, Valerie; Judge, Meagan J; Srigley, John R

    2016-10-01

The International Collaboration on Cancer Reporting is a not-for-profit organization formed by the Royal Colleges of Pathologists of Australasia and the United Kingdom; the College of American Pathologists; the Canadian Association of Pathologists-Association Canadienne des Pathologistes, in association with the Canadian Partnership Against Cancer; and the European Society of Pathology. Its goal is to produce common, internationally agreed upon, evidence-based datasets for use throughout the world. Here we describe a dataset developed by the Expert Panel of the International Collaboration on Cancer Reporting for reporting malignant mesothelioma of both the pleura and peritoneum. The dataset is composed of "required" (mandatory) and "recommended" (nonmandatory) elements, based on a review of the most recent evidence and supported by explanatory commentary. Eight required elements and 7 recommended elements were agreed upon by the Expert Panel to represent the essential information for reporting malignant mesothelioma of the pleura and peritoneum. In time, the widespread use of an internationally agreed upon, structured pathology dataset for mesothelioma will not only lead to improved patient management but also provide valuable data for research and international benchmarks.

  12. Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation

    PubMed Central

    Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

    2015-01-01

Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions may no longer be valid. To overcome this weakness, we propose a new clustering algorithm, the localized ambient solidity separation (LASS) algorithm, which uses a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, the proposed centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that the proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but can also identify clusters that are adjacent, overlapping, or under background noise. Finally, we compared the LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records containing demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133

  13. A Manual Segmentation Tool for Three-Dimensional Neuron Datasets.

    PubMed

    Magliaro, Chiara; Callara, Alejandro L; Vanello, Nicola; Ahluwalia, Arti

    2017-01-01

To date, automated or semi-automated software and algorithms for segmenting neurons from three-dimensional imaging datasets have had limited success. The gold standard for neural segmentation is considered to be manual isolation performed by an expert. To facilitate the manual isolation of complex objects from image stacks, such as neurons in their native arrangement within the brain, a new Manual Segmentation Tool (ManSegTool) has been developed. ManSegTool allows users to load an image stack, scroll through the images, and manually draw the structures of interest stack-by-stack. Users can eliminate unwanted regions, split structures (i.e., branches from different neurons that are too close to each other but, to the experienced eye, clearly belong to distinct cells), view the object in 3D, and save the results obtained. The tool can be used to test the performance of a single-neuron segmentation algorithm or to extract complex objects where the available automated methods still fail. Here we describe the software's main features and then show an example of how ManSegTool can be used to segment neuron images acquired with a confocal microscope. In particular, expert neuroscientists were asked to segment different neurons, from which morphometric variables were subsequently extracted as a benchmark for precision. In addition, a literature-defined index for evaluating the goodness of segmentation was used as a benchmark for accuracy. Neocortical layer axons from a DIADEM challenge dataset were also segmented with ManSegTool and compared with the manual "gold standard" generated for the competition.

  14. A benchmark testing ground for integrating homology modeling and protein docking.

    PubMed

    Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima

    2017-01-01

    Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target, are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.

  15. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on the nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives leverage partition function calculations to find the structure with maximum expected accuracy (MEA) or use pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value, and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving the accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs, such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy, is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. 
Second, on our large datasets, the algorithm with the best overall accuracy is a pseudo-MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods there is no clear winner, in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1, with 1 meaning perfect structure predictions), and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large datasets. PMID:22296803
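The F-measure used as the benchmark metric above is simply the harmonic mean of sensitivity and positive predictive value; a one-function sketch:

```python
def f_measure(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and positive predictive value
    (precision), as used to score predicted base pairs against a reference
    secondary structure."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# A predictor recovering 70% of the true base pairs, with 80% of its
# predicted pairs correct, scores roughly 0.747.
print(f_measure(0.70, 0.80))
```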

  16. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on the nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives leverage partition function calculations to find the structure with maximum expected accuracy (MEA) or use pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value, and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving the accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs, such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy, is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. 
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

  17. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle

    PubMed Central

    Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E

    2017-01-01

The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H′), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences, such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operating characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents or deep-sea sediments. 
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. PMID:29069412
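The entropy-based scoring idea can be sketched as follows: for each Pfam domain, compare its frequency among the genomes of interest (p) against its background frequency across all genomes (q) via relative entropy. This is a hedged illustration of the general technique, not the MEBS code; the domain frequencies below are invented.

```python
import math

def relative_entropy(p, q):
    """Pointwise relative entropy p*log2(p/q): large and positive when a
    domain is enriched in the group of interest relative to the background."""
    if p == 0.0:
        return 0.0
    return p * math.log2(p / q)

# Hypothetical frequencies: among sulfur genomes (p) vs. all genomes (q).
domains = {"DsrC_like": (0.90, 0.05), "Ubiquitous_domain": (0.99, 0.98)}
scores = {name: relative_entropy(p, q) for name, (p, q) in domains.items()}
# A strongly enriched domain scores high; a ubiquitous one scores near zero,
# so summing per-domain entropies yields a discriminative sample-level score.
print(scores)
```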

  18. SU-G-BRC-17: Using Generalized Mean for Equivalent Square Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, S; Fan, Q; Lei, Y

Purpose: Equivalent Square (ES) is a widely used concept in radiotherapy. It enables us to determine many important quantities for a rectangular treatment field, without measurement, from the corresponding values for its ES field. In this study, we propose a Generalized Mean (GM) type ES formula and compare it with other established formulae using benchmark datasets. Methods: Our GM approach is expressed as ES=(w•fx^α+(1-w)•fy^α)^(1/α), where fx and fy are the field sizes, α is a power index, and w is a weighting factor. When α=−1 it reduces to the well-known Sterling-type ES formulae. In our study, α and w are determined through least-squares fitting. The Akaike Information Criterion (AIC) was used to benchmark the performance of each formula. The BJR (Supplement 17) ES field table for X-ray PDDs and the open-field output factor tables in the Varian TrueBeam representative dataset were used for validation. Results: Switching from α=−1 to α=−1.25 achieved a 20% reduction in the standard deviation of the residual error in ES estimation for the BJR dataset. The maximum relative residual error was reduced from ∼3% (Sterling formula) or ∼2% (Vadash/Bjarngard formula) down to ∼1% with the GM formula for open fields of all energies and rectangular field sizes from 3 cm to 40 cm in the Varian dataset. The improvement of the GM over the Sterling-type ES formulae is particularly noticeable for very elongated rectangular fields with a short width. AIC analysis confirmed the superior performance of the GM formula after taking into account the expanded parameter space. Conclusion: The GM significantly outperforms Sterling-type formulae at slightly increased computational cost. The GM calculation may remove the need for data measurement for many rectangular fields and hence shorten the Linac commissioning process. Improved dose calculation accuracy is also expected from adopting the GM formula in treatment planning and secondary MU check systems.
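The GM formula above is simple to implement; as a sanity check, with α = −1 and w = 0.5 it reduces exactly to the classic Sterling equivalent square 2·fx·fy/(fx+fy). A minimal sketch (the default parameter values below are illustrative, not the fitted values from the study):

```python
def equivalent_square(fx, fy, alpha=-1.0, w=0.5):
    """Generalized-mean (GM) equivalent square:
    ES = (w*fx**alpha + (1 - w)*fy**alpha) ** (1/alpha).
    With alpha = -1 and w = 0.5 this collapses to the Sterling formula
    2*fx*fy/(fx + fy)."""
    return (w * fx ** alpha + (1 - w) * fy ** alpha) ** (1.0 / alpha)

# A square field maps to itself for any alpha and w.
print(equivalent_square(10, 10))
# Elongated field: Sterling (alpha = -1) gives 2*4*40/44, about 7.27;
# the abstract's alpha = -1.25 pulls the result closer to the short side,
# which is where the GM fit improves on Sterling.
print(equivalent_square(4, 40))
print(equivalent_square(4, 40, alpha=-1.25))
```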

  19. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle.

    PubMed

    De Anda, Valerie; Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E; Contreras-Moreira, Bruno; Souza, Valeria

    2017-11-01

The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H′), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences, such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operating characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents or deep-sea sediments. 
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. © The Author 2017. Published by Oxford University Press.

  20. BLOND, a building-level office environment dataset of typical electrical appliances.

    PubMed

    Kriechbaumer, Thomas; Jacobsen, Hans-Arno

    2018-03-27

    Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.
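The data volumes implied by those sampling rates are easy to underestimate; a back-of-the-envelope sketch of per-channel sample counts (ignoring bit depth and the number of channels):

```python
def total_samples(days, rate_sps):
    """Samples per continuously recorded channel: days * 86400 s/day * rate."""
    return days * 86400 * rate_sps

# BLOND-50 aggregate: 213 days at 50 kSps per channel.
print(f"{total_samples(213, 50_000):.2e}")   # ~9.2e11 samples
# BLOND-250 aggregate: 50 days at 250 kSps per channel.
print(f"{total_samples(50, 250_000):.2e}")   # ~1.1e12 samples
```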

  1. BLOND, a building-level office environment dataset of typical electrical appliances

    NASA Astrophysics Data System (ADS)

    Kriechbaumer, Thomas; Jacobsen, Hans-Arno

    2018-03-01

    Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.

  2. BLOND, a building-level office environment dataset of typical electrical appliances

    PubMed Central

    Kriechbaumer, Thomas; Jacobsen, Hans-Arno

    2018-01-01

    Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of. PMID:29583141

  3. Integration of heterogeneous features for remote sensing scene classification

    NASA Astrophysics Data System (ADS)

    Wang, Xin; Xiong, Xingnan; Ning, Chen; Shi, Aiye; Lv, Guofang

    2018-01-01

Scene classification is one of the most important issues in remote sensing (RS) image processing. We find that features from different channels (shape, spectral, texture, etc.), levels (low-level and middle-level), or perspectives (local and global) capture different properties of RS images, and we therefore propose a heterogeneous feature framework to extract and integrate heterogeneous features of different types for RS scene classification. The proposed method is composed of three modules: (1) heterogeneous feature extraction, in which three heterogeneous feature types, called DS-SURF-LLC, mean-Std-LLC, and MS-CLBP, are calculated; (2) heterogeneous feature fusion, in which multiple kernel learning (MKL) is utilized to integrate the heterogeneous features; and (3) an MKL support vector machine classifier for RS scene classification. The proposed method is extensively evaluated on three challenging benchmark datasets (a 6-class dataset, a 12-class dataset, and a 21-class dataset), and the experimental results show that it achieves good classification performance and produces informative features for describing RS image scenes. Moreover, the integration of heterogeneous features outperforms some state-of-the-art features on RS scene classification tasks.
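In MKL, the fused kernel is typically a weighted combination of base kernels, one per feature type (K = Σ_m β_m·K_m, with non-negative weights summing to 1), learned jointly with the SVM. A minimal sketch with two stand-in kernels and hand-picked (not learned) weights:

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def combined_kernel(x, y, betas=(0.6, 0.4)):
    """Convex combination of base kernels, one per feature type
    (e.g. a shape descriptor vs. a texture descriptor); the weights here
    are illustrative, whereas MKL would optimize them with the SVM."""
    kernels = (linear_kernel, rbf_kernel)
    return sum(b * k(x, y) for b, k in zip(betas, kernels))

x, y = (1.0, 0.0), (1.0, 1.0)
print(combined_kernel(x, y))  # 0.6*1.0 + 0.4*exp(-0.5)
```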

  4. A homology-based pipeline for global prediction of post-translational modification sites

    NASA Astrophysics Data System (ADS)

    Chen, Xiang; Shi, Shao-Ping; Xu, Hao-Dong; Suo, Sheng-Bao; Qiu, Jian-Ding

    2016-05-01

    The pathways of protein post-translational modifications (PTMs) have been shown to play particularly important roles in almost every biological process. Identification of PTM substrates, along with information on the exact sites, is fundamental for fully understanding or controlling biological processes. Computational strategies can help annotate PTMs in a high-throughput manner. Traditional algorithms are suited to the common organisms and tissues for which a complete PTM atlas or extensive experimental data exist, whereas annotating rare PTMs in most organisms remains a clear challenge. In this work, we have developed a novel homology-based pipeline, named PTMProber, that allows identification of potential modification sites for most of the proteomes lacking PTM data. The cross-promotion E-value (CPE) is used in our pipeline as a stringent benchmark to evaluate homology to known modification sites. Independent validation tests show that PTMProber achieves over 58.8% recall with high precision by the CPE benchmark. Comparisons with other machine-learning tools show that the PTMProber pipeline performs better on general predictions. We also developed a web-based tool integrating this pipeline at http://bioinfo.ncu.edu.cn/PTMProber/index.aspx. In addition to pre-constructed PTM prediction models, the website allows users to customize their own models.

  5. PPI4DOCK: large scale assessment of the use of homology models in free docking over more than 1000 realistic targets.

    PubMed

    Yu, Jinchao; Guerois, Raphaël

    2016-12-15

    Protein-protein docking methods are of great importance for understanding interactomes at the structural level. It has become increasingly appealing to use not only experimental structures but also homology models of unbound subunits as input for docking simulations. So far, a large-scale assessment of the success of rigid-body free docking methods on homology models has been missing. We explored how comparative modelling of unbound subunits can be used to expand docking benchmark datasets. Starting from a collection of 3157 non-redundant, high X-ray resolution heterodimers, we developed the PPI4DOCK benchmark containing 1417 docking targets based on unbound homology models. Rigid-body docking by Zdock showed that for 1208 cases (85.2%), at least one correct decoy was generated, emphasizing the efficiency of rigid-body docking in generating correct assemblies. Overall, the PPI4DOCK benchmark contains a large set of realistic cases and provides new ground for assessing docking and scoring methodologies. Benchmark sets can be downloaded from http://biodev.cea.fr/interevol/ppi4dock/. Contact: guerois@cea.fr. Supplementary data are available at Bioinformatics online.

  6. Navier-Stokes analysis of a liquid rocket engine disk cavity

    NASA Technical Reports Server (NTRS)

    Benjamin, Theodore G.; Mcconnaughey, Paul K.

    1991-01-01

    This paper presents a Navier-Stokes analysis of hydrodynamic phenomena occurring in the aft disk cavity of a liquid rocket engine turbine. The cavity analyzed is in the Space Shuttle Main Engine Alternate Turbopump, currently being developed by NASA and Pratt and Whitney. Comparisons of results obtained from the Navier-Stokes code against two rotating-disk datasets available in the literature are presented as benchmark validations. The benchmark results obtained using the code show good agreement with experimental data, and the turbine disk cavity was analyzed with comparable grid resolution, dissipation levels, and turbulence models. Predicted temperatures show that little mixing of hot and cold fluid occurs in the cavity and that the flow is dominated by swirl and pumping up the rotating disk.

  7. Magnesium-binding architectures in RNA crystal structures: validation, binding preferences, classification and motif detection

    PubMed Central

    Zheng, Heping; Shabalin, Ivan G.; Handing, Katarzyna B.; Bujnicki, Janusz M.; Minor, Wladek

    2015-01-01

    The ubiquitous presence of magnesium ions in RNA has long been recognized as a key factor governing RNA folding, and is crucial for many diverse functions of RNA molecules. In this work, Mg2+-binding architectures in RNA were systematically studied using a database of RNA crystal structures from the Protein Data Bank (PDB). Due to the abundance of poorly modeled or incorrectly identified Mg2+ ions, the set of all sites was comprehensively validated and filtered to identify a benchmark dataset of 15 334 ‘reliable’ RNA-bound Mg2+ sites. The normalized frequencies by which specific RNA atoms coordinate Mg2+ were derived for both the inner and outer coordination spheres. A hierarchical classification system of Mg2+ sites in RNA structures was designed and applied to the benchmark dataset, yielding a set of 41 types of inner-sphere and 95 types of outer-sphere coordinating patterns. This classification system has also been applied to describe six previously reported Mg2+-binding motifs and detect them in new RNA structures. Investigation of the most populous site types resulted in the identification of seven novel Mg2+-binding motifs, and all RNA structures in the PDB were screened for the presence of these motifs. PMID:25800744

  8. Benchmarking homogenization algorithms for monthly data

    NASA Astrophysics Data System (ADS)

    Venema, V. K. C.; Mestre, O.; Aguilar, E.; Auer, I.; Guijarro, J. A.; Domonkos, P.; Vertacnik, G.; Szentimrey, T.; Stepanek, P.; Zahradnicek, P.; Viarre, J.; Müller-Westermeier, G.; Lakatos, M.; Williams, C. N.; Menne, M. J.; Lindau, R.; Rasol, D.; Rustemeier, E.; Kolokythas, K.; Marinova, T.; Andresen, L.; Acquaotta, F.; Fratiannil, S.; Cheval, S.; Klancar, M.; Brunetti, M.; Gruber, C.; Prohom Duran, M.; Likso, T.; Esteban, P.; Brandsma, T.; Willett, K.

    2013-09-01

    The COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME) has executed a blind intercomparison and validation study for monthly homogenization algorithms. Time series of monthly temperature and precipitation were evaluated because of their importance for climate studies. The algorithms were validated against a realistic benchmark dataset. Participants provided 25 separate homogenized contributions as part of the blind study, as well as 22 additional solutions submitted after the details of the imposed inhomogeneities were revealed. These homogenized datasets were assessed by a number of performance metrics, including (i) the centered root mean square error relative to the true homogeneous values at various averaging scales, (ii) the error in linear trend estimates, and (iii) traditional contingency skill scores. The metrics were computed using both the individual station series and the network-average regional series. The performance of the contributions depends significantly on the error metric considered. Although relative homogenization algorithms typically improve the homogeneity of temperature data, only the best ones improve precipitation data. Moreover, state-of-the-art relative homogenization algorithms developed to work with an inhomogeneous reference are shown to perform best. The study showed that automatic algorithms can currently perform as well as manual ones.
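    Two of the metrics named above, the centered root mean square error and the linear-trend error, are straightforward to compute. A minimal sketch (illustrative only, not the Action's actual scoring code):

```python
import numpy as np

def centered_rmse(homogenized, truth):
    """Centered RMSE: root mean square error after removing each
    series' own mean, so constant offsets are not penalized."""
    h = homogenized - np.mean(homogenized)
    t = truth - np.mean(truth)
    return float(np.sqrt(np.mean((h - t) ** 2)))

def trend_error(homogenized, truth):
    """Absolute difference in fitted linear trend (per time step)
    between the homogenized series and the true homogeneous series."""
    x = np.arange(len(truth))
    slope_h = np.polyfit(x, homogenized, 1)[0]
    slope_t = np.polyfit(x, truth, 1)[0]
    return float(abs(slope_h - slope_t))

# Toy check: a near-perfectly corrected series scores close to zero.
rng = np.random.default_rng(0)
truth = 10 + 0.01 * np.arange(120) + rng.normal(0, 0.2, 120)
restored = truth + rng.normal(0, 0.05, 120)
print(centered_rmse(restored, truth), trend_error(restored, truth))
```

In the study these scores were computed per station and for the network-average regional series.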

  9. Benchmarking Commercial Conformer Ensemble Generators.

    PubMed

    Friedrich, Nils-Ole; de Bruyn Kops, Christina; Flachsenberg, Florian; Sommer, Kai; Rarey, Matthias; Kirchmair, Johannes

    2017-11-27

    We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.
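    The headline metric here, the minimum RMSD between a protein-bound ligand conformation and a generated ensemble, can be computed with optimal rigid superposition (the Kabsch algorithm). A hedged sketch, assuming atom correspondence is already established (this is not any vendor's implementation):

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two Nx3 coordinate arrays after optimal rigid
    superposition (Kabsch algorithm); rows are matched atoms."""
    P = P - P.mean(axis=0)              # center both coordinate sets
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))  # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

def min_rmsd_to_ensemble(ref, ensemble):
    """Minimum RMSD of any generated conformer to the reference pose."""
    return min(kabsch_rmsd(conf, ref) for conf in ensemble)
```

Per-molecule minima of this kind, taken over ensembles of up to 250 conformers, are what the reported medians (0.46 to 0.61 Å) summarize.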

  10. Simple mathematical law benchmarks human confrontations.

    PubMed

    Johnson, Neil F; Medina, Pablo; Zhao, Guannan; Messinger, Daniel S; Horgan, John; Gill, Paul; Bohorquez, Juan Camilo; Mattson, Whitney; Gangi, Devon; Qi, Hong; Manrique, Pedro; Velasquez, Nicolas; Morgenstern, Ana; Restrepo, Elvira; Johnson, Nicholas; Spagat, Michael; Zarama, Roberto

    2013-12-10

    Many high-profile societal problems involve an individual or group repeatedly attacking another - from child-parent disputes, sexual violence against women, civil unrest, violent conflicts and acts of terror, to current cyber-attacks on national infrastructure and ultrafast cyber-trades attacking stockholders. There is an urgent need to quantify the likely severity and timing of such future acts, shed light on likely perpetrators, and identify intervention strategies. Here we present a combined analysis of multiple datasets across all these domains which account for >100,000 events, and show that a simple mathematical law can benchmark them all. We derive this benchmark and interpret it, using a minimal mechanistic model grounded by state-of-the-art fieldwork. Our findings provide quantitative predictions concerning future attacks; a tool to help detect common perpetrators and abnormal behaviors; insight into the trajectory of a 'lone wolf'; identification of a critical threshold for spreading a message or idea among perpetrators; an intervention strategy to erode the most lethal clusters; and more broadly, a quantitative starting point for cross-disciplinary theorizing about human aggression at the individual and group level, in both real and online worlds.
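    The "simple mathematical law" these authors report is a power-law progress curve, in which the time interval between a group's successive attacks shrinks as tau_n = tau_1 * n**(-b). A minimal sketch of estimating the escalation parameters by log-log least squares (illustrative only, with synthetic data):

```python
import numpy as np

def fit_progress_curve(intervals):
    """Fit tau_n = tau_1 * n**(-b) by least squares in log-log space.
    intervals[k] is the time gap before the (k+1)-th event.
    Returns (tau_1, b); b > 0 indicates escalation."""
    n = np.arange(1, len(intervals) + 1)
    slope, intercept = np.polyfit(np.log(n), np.log(intervals), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic escalating confrontation: intervals shrink as n**-0.5.
intervals = 20.0 * np.arange(1, 30) ** -0.5
tau_1, b = fit_progress_curve(intervals)
print(tau_1, b)  # recovers tau_1 = 20, b = 0.5
```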

  11. Simple mathematical law benchmarks human confrontations

    NASA Astrophysics Data System (ADS)

    Johnson, Neil F.; Medina, Pablo; Zhao, Guannan; Messinger, Daniel S.; Horgan, John; Gill, Paul; Bohorquez, Juan Camilo; Mattson, Whitney; Gangi, Devon; Qi, Hong; Manrique, Pedro; Velasquez, Nicolas; Morgenstern, Ana; Restrepo, Elvira; Johnson, Nicholas; Spagat, Michael; Zarama, Roberto

    2013-12-01

    Many high-profile societal problems involve an individual or group repeatedly attacking another - from child-parent disputes, sexual violence against women, civil unrest, violent conflicts and acts of terror, to current cyber-attacks on national infrastructure and ultrafast cyber-trades attacking stockholders. There is an urgent need to quantify the likely severity and timing of such future acts, shed light on likely perpetrators, and identify intervention strategies. Here we present a combined analysis of multiple datasets across all these domains which account for >100,000 events, and show that a simple mathematical law can benchmark them all. We derive this benchmark and interpret it, using a minimal mechanistic model grounded by state-of-the-art fieldwork. Our findings provide quantitative predictions concerning future attacks; a tool to help detect common perpetrators and abnormal behaviors; insight into the trajectory of a 'lone wolf'; identification of a critical threshold for spreading a message or idea among perpetrators; an intervention strategy to erode the most lethal clusters; and more broadly, a quantitative starting point for cross-disciplinary theorizing about human aggression at the individual and group level, in both real and online worlds.

  12. The Filament Sensor for Near Real-Time Detection of Cytoskeletal Fiber Structures

    PubMed Central

    Eltzner, Benjamin; Wollnik, Carina; Gottschlich, Carsten; Huckemann, Stephan; Rehfeldt, Florian

    2015-01-01

    A reliable extraction of filament data from microscopic images is of high interest in the analysis of acto-myosin structures as early morphological markers in mechanically guided differentiation of human mesenchymal stem cells and the understanding of the underlying fiber arrangement processes. In this paper, we propose the filament sensor (FS), a fast and robust processing sequence which detects and records location, orientation, length, and width for each single filament of an image, and thus allows for the analysis described above. The extraction of these features has previously not been possible with existing methods. We evaluate the performance of the proposed FS in terms of accuracy and speed in comparison to three existing methods with respect to their limited output. Further, we provide a benchmark dataset of real cell images along with filaments manually marked by a human expert, as well as simulated benchmark images. The FS clearly outperforms existing methods in terms of computational runtime and filament extraction accuracy. The implementation of the FS and the benchmark database are available as open source. PMID:25996921

  13. IEA-Task 31 WAKEBENCH: Towards a protocol for wind farm flow model evaluation. Part 2: Wind farm wake models

    NASA Astrophysics Data System (ADS)

    Moriarty, Patrick; Sanz Rodrigo, Javier; Gancarski, Pawel; Chuchfield, Matthew; Naughton, Jonathan W.; Hansen, Kurt S.; Machefaux, Ewan; Maguire, Eoghan; Castellani, Francesco; Terzi, Ludovico; Breton, Simon-Philippe; Ueda, Yuko

    2014-06-01

    Researchers within the International Energy Agency (IEA) Task 31: Wakebench have created a framework for the evaluation of wind farm flow models operating at the microscale level. The framework consists of a model evaluation protocol integrated with a web-based portal for model benchmarking (www.windbench.net). This paper provides an overview of the building-block validation approach applied to wind farm wake models, including best practices for the benchmarking and data processing procedures for validation datasets from wind farm SCADA and meteorological databases. A hierarchy of test cases has been proposed for wake model evaluation, from similarity theory of the axisymmetric wake and idealized infinite wind farm, to single-wake wind tunnel (UMN-EPFL) and field experiments (Sexbierum), to wind farm arrays in offshore (Horns Rev, Lillgrund) and complex terrain conditions (San Gregorio). A summary of results from the axisymmetric wake, Sexbierum, Horns Rev and Lillgrund benchmarks is used to discuss the state-of-the-art of wake model validation and highlight the most relevant issues for future development.

  14. Automated benchmarking of peptide-MHC class I binding predictions.

    PubMed

    Trolle, Thomas; Metushi, Imir G; Greenbaum, Jason A; Kim, Yohan; Sidney, John; Lund, Ole; Sette, Alessandro; Peters, Bjoern; Nielsen, Morten

    2015-07-01

    Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study. The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB. Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join. Contact: mniel@cbs.dtu.dk or bpeters@liai.org. Supplementary data are available at Bioinformatics online.
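    Benchmarks of this kind typically score each tool's predictions against measured binding data with rank-based metrics such as the ROC AUC. A small illustrative sketch; the 500 nM binder threshold is a common convention assumed here, not taken from this abstract:

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC (Mann-Whitney U): the probability that a
    randomly chosen binder is scored above a random non-binder."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical evaluation: binders defined by measured IC50 < 500 nM.
ic50 = [25.0, 90.0, 4000.0, 12000.0, 300.0]
pred = [0.9, 0.8, 0.3, 0.1, 0.6]       # higher = stronger predicted binding
labels = [x < 500.0 for x in ic50]
print(roc_auc(labels, pred))  # 1.0: all binders ranked above all non-binders
```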

  15. Automated benchmarking of peptide-MHC class I binding predictions

    PubMed Central

    Trolle, Thomas; Metushi, Imir G.; Greenbaum, Jason A.; Kim, Yohan; Sidney, John; Lund, Ole; Sette, Alessandro; Peters, Bjoern; Nielsen, Morten

    2015-01-01

    Motivation: Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study. Results: The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB. Availability and implementation: Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join. Contact: mniel@cbs.dtu.dk or bpeters@liai.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25717196

  16. DeepSig: deep learning improves signal peptide detection in proteins.

    PubMed

    Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2018-05-15

    The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach to signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best-performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. Contact: pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.

  17. Blind Pose Prediction, Scoring, and Affinity Ranking of the CSAR 2014 Dataset.

    PubMed

    Martiny, Virginie Y; Martz, François; Selwa, Edithe; Iorga, Bogdan I

    2016-06-27

    The 2014 CSAR Benchmark Exercise was focused on three protein targets: coagulation factor Xa, spleen tyrosine kinase, and bacterial tRNA methyltransferase. Our protocol involved a preliminary analysis of the structural information available in the Protein Data Bank for the protein targets, which allowed the identification of the most appropriate docking software and scoring functions to be used for the rescoring of several docking conformations datasets, as well as for pose prediction and affinity ranking. The two key points of this study were (i) the prior evaluation of molecular modeling tools that are most adapted for each target and (ii) the increased search efficiency during the docking process to better explore the conformational space of big and flexible ligands.

  18. Empirical evaluation of interest-level criteria

    NASA Astrophysics Data System (ADS)

    Sahar, Sigal; Mansour, Yishay

    1999-02-01

    Efficient association rule mining algorithms already exist; however, as the size of databases increases, the number of patterns mined by the algorithms increases to such an extent that their manual evaluation becomes impractical. Automatic evaluation methods are therefore required to sift through the initial list of rules that the data mining algorithm outputs. These evaluation methods, or criteria, rank the association rules mined from the dataset. We empirically examined several such statistical criteria: new criteria as well as previously known ones. The empirical evaluation was conducted using several databases, including a large real-life dataset acquired from an order-by-phone grocery store, a dataset composed from WWW proxy logs, and several datasets from the UCI repository. We were interested in discovering whether the rankings produced by the various criteria are similar or easily distinguishable. Our evaluation detected, where significant differences exist, three patterns of behavior among the eight criteria we examined. There is an obvious dilemma in determining how many association rules to choose (in accordance with support and confidence parameters): the tradeoff is between having stringent parameters and therefore few rules, or lenient parameters and thus a multitude of rules. In many cases, our empirical evaluation revealed that most of the rules found with comparatively strict parameters still ranked highly according to the interestingness criteria when lax parameters were used (producing significantly more association rules). Finally, we discuss the association rules that ranked highest, explain why these results are sound, and show how they direct future research.
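    To make the support/confidence dilemma concrete, here is a toy miner that filters single-item rules by support and confidence and then ranks them by lift, one common interest-level criterion (an illustrative sketch, not the specific criteria evaluated in this work):

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Mine single-item rules a -> c from a list of transaction sets,
    keeping those above the support/confidence thresholds and ranking
    the survivors by lift (confidence relative to baseline support)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    def support(itemset):
        return sum(itemset <= t for t in transactions) / n
    rules = []
    for a, b in combinations(items, 2):
        for ante, cons in ((a, b), (b, a)):
            s = support({ante, cons})
            if s < min_support:
                continue
            conf = s / support({ante})
            if conf >= min_confidence:
                lift = conf / support({cons})
                rules.append((lift, conf, s, ante, cons))
    return sorted(rules, reverse=True)

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]
for lift, conf, sup, a, c in mine_rules(baskets):
    print(f"{a} -> {c}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Loosening `min_support` and `min_confidence` floods the output with rules, which is exactly why an interestingness ranking such as lift is needed on top.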

  19. Transductive multi-view zero-shot learning.

    PubMed

    Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Gong, Shaogang

    2015-11-01

    Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.

  20. Integrative missing value estimation for microarray data.

    PubMed

    Hu, Jianjun; Li, Haifeng; Waterman, Michael S; Zhou, Xianghong Jasmine

    2006-10-12

    Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples. We present the integrative Missing Value Estimation method (iMISS), which incorporates information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. Our experiments showed that iMISS can significantly and consistently improve the accuracy of the state-of-the-art local least squares (LLS) imputation algorithm, by up to 15% in our benchmark tests. We demonstrated that order-statistics-based integrative imputation algorithms can achieve significant improvements over state-of-the-art missing value estimation approaches such as LLS, and are especially good for imputing microarray datasets with a limited number of samples, high rates of missing data, or very noisy measurements. With the rapid accumulation of microarray datasets, the performance of our approach can be further improved by incorporating larger and more appropriate reference datasets.
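    As a point of reference for the neighbor-based family these methods build on, here is a KNNimpute-style sketch: this shows the baseline idea of borrowing values from similar genes, not the iMISS algorithm itself, which additionally integrates multiple reference datasets:

```python
import numpy as np

def knn_impute(matrix, k=3):
    """Fill each missing entry of a genes-x-samples matrix with the
    mean of that sample's values over the k genes most similar to the
    target gene on their mutually observed samples."""
    X = np.array(matrix, dtype=float)
    filled = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        # Rank other genes by mean squared distance on shared columns.
        dists = []
        for j, other in enumerate(X):
            if j == i:
                continue
            common = ~np.isnan(row) & ~np.isnan(other)
            if common.any():
                dists.append((float(np.mean((row[common] - other[common]) ** 2)), j))
        dists.sort()
        # Average the k nearest genes' observed values for each gap.
        for c in np.where(miss)[0]:
            vals = [X[j, c] for _, j in dists[:k] if not np.isnan(X[j, c])]
            if vals:
                filled[i, c] = float(np.mean(vals))
    return filled
```

With few samples or noisy data the neighbor ranking becomes unreliable, which is the failure mode iMISS addresses by drawing neighbors from external reference datasets.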

  1. Invasive strategy in non-ST-segment elevation acute coronary syndrome: What should be the benchmark target in the real world patients? Insights from BLITZ-4 Quality Campaign.

    PubMed

    Olivari, Zoran; Chinaglia, Alessandra; Gonzini, Lucio; Falsini, Giovanni; Pilleri, Annarita; Valente, Serafina; Gregori, Gianserafino; Rollo, Raffaele; My, Luigi; Scrimieri, Pietro; Lanzillo, Tonino; Corrado, Luigi; Chiti, Maurizio; Picardi, Elisa

    2016-10-01

    To define a benchmark target for an invasive strategy (IS) rate appropriate for performance assessment in intermediate-to-high-risk non-ST-segment elevation acute coronary syndromes (NSTE-ACS). During the BLITZ-4 campaign, which aimed at improving the quality of care in 163 Italian coronary care units, 4923/5786 (85.1%) of consecutive patients admitted with NSTE-ACS with troponin elevation and/or dynamic ST-T changes on the electrocardiogram were managed with IS. The reasons driving the choice (RDC) of a conservative strategy (CS) in the remaining 863 patients were prospectively recorded. In 33.8%, CS was mandatory because of patient refusal, known coronary anatomy, or death before coronary angiography; in 52.8% it was clinically justified because of active stroke, bleeding, advanced frailty, severe comorbidities, contraindication to antiplatelet therapy, or because the patients were considered to be at low risk; only in 13.4% were the reasons, such as renal failure, advanced age, or other, less stringent. Compared to patients undergoing IS, those in the CS group were 12 years older and had significantly more severe comorbidities. In-hospital and 6-month all-cause mortality were 9.0% vs 0.9% and 22.0% vs 3.9% in the CS and IS groups, respectively (p<0.0001 for both). As the RDC for CS were clinically correct in the vast majority of cases, the observed 85% invasive strategy rate may be considered the desirable benchmark target in patients with NSTE-ACS. For the same reason, it remains questionable whether a higher rate of IS would have improved the prognosis of CS patients, despite their highly unfavorable outlook.

  2. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae

    PubMed Central

    Reguly, Teresa; Breitkreutz, Ashton; Boucher, Lorrie; Breitkreutz, Bobby-Joe; Hon, Gary C; Myers, Chad L; Parsons, Ainslie; Friesen, Helena; Oughtred, Rose; Tong, Amy; Stark, Chris; Ho, Yuen; Botstein, David; Andrews, Brenda; Boone, Charles; Troyanskya, Olga G; Ideker, Trey; Dolinski, Kara; Batada, Nizar N; Tyers, Mike

    2006-01-01

    Background The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. Results We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID () and SGD () databases. Conclusion Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks. PMID:16762047

  3. LIPS database with LIPService: a microscopic image database of intracellular structures in Arabidopsis guard cells.

    PubMed

    Higaki, Takumi; Kutsuna, Natsumaro; Hasezawa, Seiichiro

    2013-05-16

Intracellular configuration is an important feature of cell status. Recent advances in microscopic imaging techniques allow us to easily obtain a large number of microscopic images of intracellular structures. In this circumstance, automated microscopic image recognition techniques are of extreme importance to future phenomics/visible screening approaches. However, there was no benchmark microscopic image dataset for intracellular organelles in a specified plant cell type. We previously established the Live Images of Plant Stomata (LIPS) database, a publicly available collection of optical-section images of various intracellular structures of plant guard cells, as a model system of environmental signal perception and transduction. Here we report recent updates to the LIPS database and the establishment of a new interface, LIPService. We updated the LIPS dataset and established the new interface, named LIPService, to promote efficient inspection of intracellular structure configurations. Cell nuclei, microtubules, actin microfilaments, mitochondria, chloroplasts, endoplasmic reticulum, peroxisomes, endosomes, Golgi bodies, and vacuoles can be filtered using probe names or morphometric parameters such as stomatal aperture. In addition to the serial optical sectional images of the original LIPS database, new volume-rendering data for easy web browsing of three-dimensional intracellular structures have been released to allow easy inspection of their configurations or relationships with cell status/morphology. We also demonstrate the utility of the new LIPS image database for automated organelle recognition of images from another plant cell image database using image clustering analyses. The updated LIPS database provides a benchmark image dataset for representative intracellular structures in Arabidopsis guard cells. The newly released LIPService allows users to inspect the relationship between organellar three-dimensional configurations and morphometric parameters.

  4. Benchmarking the mesoscale variability in global ocean eddy-permitting numerical systems

    NASA Astrophysics Data System (ADS)

    Cipollone, Andrea; Masina, Simona; Storto, Andrea; Iovino, Doroteaciro

    2017-10-01

The role of data assimilation procedures in representing ocean mesoscale variability is assessed by applying eddy statistics to a state-of-the-art global ocean reanalysis (C-GLORS), a free global ocean simulation (performed with the NEMO system) and an observation-based dataset (ARMOR3D) used as an independent benchmark. Numerical results are computed on a 1/4° horizontal grid (ORCA025) and share the same resolution with the ARMOR3D dataset. This "eddy-permitting" resolution is sufficient to allow ocean eddies to form. Further to assessing the eddy statistics from three different datasets, a global three-dimensional eddy detection system is implemented in order to bypass the need for region-dependent threshold definitions, typical of commonly adopted eddy detection algorithms. It thus provides full three-dimensional eddy statistics, segmenting vertical profiles from local rotational velocities. This criterion is crucial for discerning real eddies from the transient surface noise that inevitably affects any two-dimensional algorithm. Data assimilation enhances and corrects mesoscale variability across a wide range of features that cannot be well reproduced otherwise. The free simulation fairly reproduces eddies emerging from western boundary currents and deep baroclinic instabilities, while it underestimates the shallower vortices that populate the full basin. The ocean reanalysis recovers most of the missing turbulence, shown by satellite products, that is not generated by the model itself, and consistently projects surface variability deep into the water column. The comparison with the statistically reconstructed vertical profiles from ARMOR3D shows that ocean data assimilation is able to embed variability into the model dynamics, constraining eddies with in situ and altimetry observations and generating them consistently with the local environment.

  5. Technical note: RabbitCT--an open platform for benchmarking 3D cone-beam reconstruction algorithms.

    PubMed

    Rohkohl, C; Keck, B; Hofmann, H G; Hornegger, J

    2009-09-01

Fast 3D cone beam reconstruction is mandatory for many clinical workflows. For that reason, researchers and industry work hard on hardware-optimized 3D reconstruction. Backprojection is a major component of many reconstruction algorithms; it requires projecting each voxel onto the projection data, including data interpolation, before updating the voxel value. This step is the bottleneck of most reconstruction algorithms and the focus of optimization in recent publications. A crucial limitation of these publications, however, is that the presented results are not comparable to each other. This is mainly due to variations in data acquisition, preprocessing, and chosen geometries, and the lack of a common publicly available test dataset. The authors provide such a standardized dataset that allows for a fair comparison of hardware-accelerated backprojection methods. They developed an open platform, RabbitCT (www.rabbitCT.com), for worldwide comparison of backprojection performance and ranking on different architectures using a specific high-resolution C-arm CT dataset of a rabbit. This includes a sophisticated benchmark interface, a prototype implementation in C++, and image quality measures. At the time of writing, six backprojection implementations are already listed on the website. Optimizations include multithreading using Intel threading building blocks and OpenMP, vectorization using SSE, and computation on the GPU using CUDA 2.0. There is a need for objectively comparing backprojection implementations for reconstruction algorithms. RabbitCT aims to provide a solution to this problem by offering an open platform with fair chances for all participants. The authors are looking forward to a growing community and await feedback regarding future evaluations of novel software- and hardware-based acceleration schemes.
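The voxel-driven backprojection loop that such benchmarks time can be sketched as follows. This is an illustrative NumPy sketch with a toy orthographic geometry, not the RabbitCT reference implementation; the 3×4 projection matrices and distance weighting follow the standard cone-beam formulation.

```python
import numpy as np

def backproject(volume_shape, proj_images, proj_matrices, spacing=1.0):
    """Voxel-driven backprojection: project each voxel into every
    detector image via a 3x4 matrix, bilinearly interpolate the
    detector value, and accumulate it into the voxel."""
    vol = np.zeros(volume_shape)
    zs, ys, xs = volume_shape
    for img, P in zip(proj_images, proj_matrices):
        h, w = img.shape
        for iz in range(zs):
            for iy in range(ys):
                for ix in range(xs):
                    # homogeneous voxel coordinate -> detector (u, v)
                    u, v, wgt = P @ np.array([ix * spacing, iy * spacing,
                                              iz * spacing, 1.0])
                    u, v = u / wgt, v / wgt
                    u0, v0 = int(np.floor(u)), int(np.floor(v))
                    if 0 <= u0 < w - 1 and 0 <= v0 < h - 1:
                        du, dv = u - u0, v - v0
                        val = ((1 - du) * (1 - dv) * img[v0, u0]
                               + du * (1 - dv) * img[v0, u0 + 1]
                               + (1 - du) * dv * img[v0 + 1, u0]
                               + du * dv * img[v0 + 1, u0 + 1])
                        vol[iz, iy, ix] += val / (wgt * wgt)  # distance weight
    return vol

# toy geometry: one orthographic view (u = x, v = y, homogeneous weight 1)
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
vol = backproject((2, 4, 4), [np.ones((8, 8))], [P])
```

The triple loop over voxels is exactly the part that the listed implementations parallelize with OpenMP, SSE, or CUDA.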

  6. Benchmark dose and the three Rs. Part I. Getting more information from the same number of animals.

    PubMed

    Slob, Wout

    2014-08-01

Evaluating dose-response data using the benchmark dose (BMD) approach rather than the no observed adverse effect level (NOAEL) approach implies a considerable step forward from the perspective of the three Rs (Reduction, Replacement, and Refinement), in particular the R of reduction: more information is obtained from the same number of animals, or, vice versa, similar information may be obtained from fewer animals. The first part of this twin paper focuses on the former, the second on the latter aspect. Regarding the former, the BMD approach provides more information from any given dose-response dataset in various ways. First, the BMDL (the BMD lower confidence bound) provides more information by its more explicit definition. Further, compared to the NOAEL approach, the BMD approach results in more statistical precision in the value of the point of departure (PoD) used for deriving exposure limits. While some of the animals in the study do not directly contribute to the numerical value of a NOAEL, all animals are effectively used and contribute to a BMDL. In addition, the BMD approach allows for combining similar datasets for the same chemical (e.g., both sexes) in a single analysis, which further increases precision. By combining a dose-response dataset with similar historical data for other chemicals, the precision can be increased even further. The BMD approach also results in more precise estimates of relative potency factors (RPFs, or TEFs). And finally, the BMD approach is not only more precise, it also allows for quantification of the precision of the BMD estimate, which is not possible in the NOAEL approach.
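The concept can be illustrated with a minimal sketch: fit a dose-response model to all animals, define the BMD as the dose producing a 5% change in mean response, and bootstrap to obtain a BMDL. The log-linear model, synthetic data, and bootstrap scheme here are toy assumptions for illustration, not the PROAST/BMDS methodology.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dose-response study: 5 dose groups, 10 animals each,
# continuous endpoint rising ~1% per unit dose, lognormal noise
doses = np.repeat([0.0, 1.0, 3.0, 10.0, 30.0], 10)
resp = 100.0 * np.exp(0.01 * doses) * rng.lognormal(0.0, 0.05, doses.size)

def fit_bmd(d, y, bmr=0.05):
    """Fit log(y) = log(a) + b*d by least squares and return the dose
    producing a `bmr` (5%) relative change in mean response: the BMD."""
    b, _ = np.polyfit(d, np.log(y), 1)
    return np.log(1.0 + bmr) / b

bmd = fit_bmd(doses, resp)

# nonparametric bootstrap over animals; BMDL = 5th percentile of BMDs
boot = [fit_bmd(doses[i], resp[i])
        for i in (rng.integers(0, doses.size, doses.size)
                  for _ in range(500))]
bmdl = np.percentile(boot, 5)
```

Note that every animal's response enters the fit, whereas a NOAEL would be read off a single dose group: this is the precision argument of the abstract in miniature.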

  7. 3D shape representation with spatial probabilistic distribution of intrinsic shape keypoints

    NASA Astrophysics Data System (ADS)

    Ghorpade, Vijaya K.; Checchin, Paul; Malaterre, Laurent; Trassoudaine, Laurent

    2017-12-01

The accelerated advancement in modeling, digitizing, and visualizing techniques for 3D shapes has led to an increasing amount of 3D model creation and usage, thanks to 3D sensors which are readily available and easy to use. As a result, determining the similarity between 3D shapes has become consequential and is a fundamental task in shape-based recognition, retrieval, clustering, and classification. Several decades of research in Content-Based Information Retrieval (CBIR) have resulted in diverse techniques for 2D and 3D shape or object classification/retrieval and many benchmark datasets. In this article, a novel technique for 3D shape representation and object classification is proposed, based on analyses of the spatial, geometric distributions of 3D keypoints. These distributions capture the intrinsic geometric structure of 3D objects. The result of the approach is a probability distribution function (PDF) produced from the spatial disposition of 3D keypoints, which are stable on the object surface and invariant to pose changes. Each class/instance of an object can be uniquely represented by a PDF. This shape representation is conceptually simple, easy to implement, fast to compute, and yet robust. Both the Euclidean and the topological space on the object's surface are considered to build the PDFs. Topology-based geodesic distances between keypoints exploit the non-planar surface properties of the object. The performance of the novel shape signature is tested by object classification accuracy. The classification efficacy of the new shape analysis method is evaluated on a new dataset acquired with a Time-of-Flight camera, and a comparative evaluation against state-of-the-art methods is performed on a standard benchmark dataset. Experimental results demonstrate the superior classification performance of the new approach on the RGB-D dataset and depth data.
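The keypoint-distance PDF idea can be illustrated with a minimal Euclidean-only sketch (the paper additionally uses geodesic distances, and real keypoints would come from a 3D keypoint detector rather than the cube corners used here):

```python
import numpy as np

def shape_signature(keypoints, bins=16, r_max=1.9):
    """Discrete PDF (normalized histogram) of pairwise Euclidean
    distances between 3D keypoints: invariant to rotation/translation."""
    kp = np.asarray(keypoints, dtype=float)
    diff = kp[:, None, :] - kp[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    pairs = dist[np.triu_indices(len(kp), k=1)]   # each pair once
    hist, _ = np.histogram(pairs, bins=bins, range=(0.0, r_max))
    return hist / hist.sum()

# toy "keypoints": unit-cube corners, and a rotated + translated copy
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                dtype=float)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
sig_a = shape_signature(cube)
sig_b = shape_signature(cube @ R.T + 5.0)   # same shape, different pose
```

Because pairwise distances are unchanged by rigid motion, the two signatures match, which is the pose-invariance property the abstract claims for its PDFs.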

  8. Intermolecular interactions in the condensed phase: Evaluation of semi-empirical quantum mechanical methods

    NASA Astrophysics Data System (ADS)

    Christensen, Anders S.; Kromann, Jimmy C.; Jensen, Jan H.; Cui, Qiang

    2017-10-01

    To facilitate further development of approximate quantum mechanical methods for condensed phase applications, we present a new benchmark dataset of intermolecular interaction energies in the solution phase for a set of 15 dimers, each containing one charged monomer. The reference interaction energy in solution is computed via a thermodynamic cycle that integrates dimer binding energy in the gas phase at the coupled cluster level and solute-solvent interaction with density functional theory; the estimated uncertainty of such calculated interaction energy is ±1.5 kcal/mol. The dataset is used to benchmark the performance of a set of semi-empirical quantum mechanical (SQM) methods that include DFTB3-D3, DFTB3/CPE-D3, OM2-D3, PM6-D3, PM6-D3H+, and PM7 as well as the HF-3c method. We find that while all tested SQM methods tend to underestimate binding energies in the gas phase with a root-mean-squared error (RMSE) of 2-5 kcal/mol, they overestimate binding energies in the solution phase with an RMSE of 3-4 kcal/mol, with the exception of DFTB3/CPE-D3 and OM2-D3, for which the systematic deviation is less pronounced. In addition, we find that HF-3c systematically overestimates binding energies in both gas and solution phases. As most approximate QM methods are parametrized and evaluated using data measured or calculated in the gas phase, the dataset represents an important first step toward calibrating QM based methods for application in the condensed phase where polarization and exchange repulsion need to be treated in a balanced fashion.
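The thermodynamic cycle described can be written generically as follows; this is the standard decomposition for a dimer AB (the paper's exact solute-solvent partitioning may differ), with the gas-phase term from coupled cluster and the solvation terms from DFT per the abstract:

```latex
\Delta E_{\mathrm{int}}^{\mathrm{sol}}(AB)
  = \Delta E_{\mathrm{int}}^{\mathrm{gas}}(AB)
  + \Delta G_{\mathrm{solv}}(AB)
  - \Delta G_{\mathrm{solv}}(A)
  - \Delta G_{\mathrm{solv}}(B)
```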

  9. SisFall: A Fall and Movement Dataset

    PubMed Central

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-01

Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, and they are insufficient for testing new proposals due to the lack of an objective target population, the limited range of performed activities, and limited accompanying information. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one 60-year-old participant who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors are concentrated in a small number of activities, where new approaches could be focused. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates findings of other authors and encourages developing new strategies with this new dataset as the benchmark. PMID:28117691

  10. SisFall: A Fall and Movement Dataset.

    PubMed

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-20

Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, and they are insufficient for testing new proposals due to the lack of an objective target population, the limited range of performed activities, and limited accompanying information. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one 60-year-old participant who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors are concentrated in a small number of activities, where new approaches could be focused. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates findings of other authors and encourages developing new strategies with this new dataset as the benchmark.
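A threshold-based classification of the kind mentioned in the abstract can be sketched as follows; the sampling rate, threshold value, and synthetic signals here are hypothetical illustrations, not the SisFall features:

```python
import numpy as np

FS = 200        # sampling rate in Hz -- hypothetical, for the toy signals
THRESH = 2.5    # acceleration-magnitude threshold in g -- hypothetical

def detect_fall(acc_xyz, thresh=THRESH):
    """Flag a fall when the acceleration-vector magnitude exceeds
    `thresh` g anywhere in the window (the impact spike)."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    return bool((mag > thresh).any())

t = np.arange(0, 2.0, 1.0 / FS)
# quiet standing: ~1 g of gravity on z plus small sway
adl = np.column_stack([0.02 * np.sin(5 * t), 0.02 * np.cos(5 * t),
                       np.ones_like(t)])
# simulated impact: brief ~4 g spike on the z axis
fall = adl.copy()
fall[200:205, 2] = 4.0
```

Detectors of this simplicity are exactly what such a benchmark dataset lets researchers compare against more elaborate approaches.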

  11. ISRUC-Sleep: A comprehensive public dataset for sleep researchers.

    PubMed

    Khalighi, Sirvan; Sousa, Teresa; Santos, José Moutinho; Nunes, Urbano

    2016-02-01

To facilitate the performance comparison of new methods for sleep pattern analysis, publicly available datasets with quality content are very important and useful. We introduce an open-access comprehensive sleep dataset called ISRUC-Sleep. The data were obtained from human adults, including healthy subjects, subjects with sleep disorders, and subjects under the effect of sleep medication. Each recording was randomly selected among the PSG recordings acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC). The dataset comprises three groups of data: (1) data concerning 100 subjects, with one recording session per subject; (2) data gathered from 8 subjects, with two recording sessions per subject; and (3) data collected from one recording session each for 10 healthy subjects. The polysomnography (PSG) recordings associated with each subject were visually scored by two human experts. Compared with the existing sleep-related public datasets, ISRUC-Sleep provides data from a reasonable number of subjects with different characteristics, such as data useful for studies involving changes in the PSG signals over time, and data from healthy subjects useful for studies comparing healthy subjects with patients suffering from sleep disorders. This dataset was created to complement existing datasets by providing easy-to-apply data with some characteristics not covered yet. ISRUC-Sleep can be useful for new contributions: (i) in biomedical signal processing; (ii) in the development of ASSC methods; and (iii) in sleep physiology studies. To allow evaluation and comparison of new contributions that use this dataset as a benchmark, results of applying a subject-independent automatic sleep stage classification (ASSC) method to the ISRUC-Sleep dataset are presented. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  12. Scene text detection by leveraging multi-channel information and local context

    NASA Astrophysics Data System (ADS)

    Wang, Runmin; Qian, Shengyou; Yang, Jianfeng; Gao, Changxin

    2018-03-01

As an important information carrier, text plays a significant role in many applications. However, text detection in unconstrained scenes is a challenging problem due to cluttered backgrounds, varied appearances, uneven illumination, etc. In this paper, an approach based on multi-channel information and local context is proposed to detect text in natural scenes. Since character candidate detection plays a vital role in a text detection system, Maximally Stable Extremal Regions (MSERs) and a graph-cut based method are integrated to obtain character candidates by leveraging multi-channel image information. A cascaded false-positive elimination mechanism is constructed from the perspective of the character and the text line, respectively. Since local context information is very valuable, it is utilized to retrieve missing characters and boost text detection performance. Experimental results on two benchmark datasets, i.e., the ICDAR 2011 dataset and the ICDAR 2013 dataset, demonstrate that the proposed method achieves state-of-the-art performance.

  13. Performance Studies on Distributed Virtual Screening

    PubMed Central

    Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.

    2014-01-01

Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those that may bind to a drug target. Virtual screening is typically performed by docking codes, which often run sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data, because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting that maximizes the speedup while considering overhead and the cores available on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. The performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
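The chunking trade-off the study measures, speedup versus per-chunk overhead on a DCI, can be illustrated with a toy cost model; the docking time, overhead, and core count below are hypothetical, not MoSGrid's measured values:

```python
import math

def speedup(n_ligands, n_chunks, t_dock=1.0, t_overhead=30.0, cores=500):
    """Estimated speedup from splitting a docking dataset into chunks.
    Each chunk pays a fixed submission/staging overhead, and at most
    `cores` chunks run concurrently. All costs are hypothetical."""
    per_chunk = math.ceil(n_ligands / n_chunks) * t_dock + t_overhead
    waves = math.ceil(n_chunks / cores)      # chunks run `cores` at a time
    return (n_ligands * t_dock) / (waves * per_chunk)

# the sweet spot: more chunks spread the work, too many just add overhead
best = max(range(1, 2001), key=lambda k: speedup(100_000, k))
```

Under this model the optimum lands at one chunk per available core, with the fixed overhead keeping the speedup slightly below linear, mirroring the "almost linearly up to 500 concurrent processes" finding.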

  14. The generalization ability of SVM classification based on Markov sampling.

    PubMed

    Xu, Jie; Tang, Yuan Yan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang; Zhang, Baochang

    2015-06-01

Previous works studying the generalization ability of the support vector machine classification (SVMC) algorithm are usually based on the assumption of independent and identically distributed samples. In this paper, we go beyond this classical framework by studying the generalization ability of SVMC based on uniformly ergodic Markov chain (u.e.M.c.) samples. We analyze the excess misclassification error of SVMC based on u.e.M.c. samples and obtain the optimal learning rate of SVMC for u.e.M.c. samples. We also introduce a new Markov sampling algorithm for SVMC to generate u.e.M.c. samples from a given dataset, and present numerical studies on the learning performance of SVMC based on Markov sampling for benchmark datasets. The numerical studies show that SVMC based on Markov sampling not only has better generalization ability as the number of training samples grows, but also yields sparser classifiers when the dataset is large relative to the input dimension.

  15. Toward a Data Scalable Solution for Facilitating Discovery of Science Resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weaver, Jesse R.; Castellana, Vito G.; Morari, Alessandro

Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of “data scaling” are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and is growing quickly, rapidly increasing the need for a data scaling solution. We propose a system – SGEM – designed for answering graph-based queries over large datasets on cluster architectures, and we report performance results for queries on the current RDESC dataset of nearly 1.4 billion triples, and on the well-known BSBM SPARQL query benchmark.

  16. Single-shot diffraction data from the Mimivirus particle using an X-ray free-electron laser.

    PubMed

    Ekeberg, Tomas; Svenda, Martin; Seibert, M Marvin; Abergel, Chantal; Maia, Filipe R N C; Seltzer, Virginie; DePonte, Daniel P; Aquila, Andrew; Andreasson, Jakob; Iwan, Bianca; Jönsson, Olof; Westphal, Daniel; Odić, Duško; Andersson, Inger; Barty, Anton; Liang, Meng; Martin, Andrew V; Gumprecht, Lars; Fleckenstein, Holger; Bajt, Saša; Barthelmess, Miriam; Coppola, Nicola; Claverie, Jean-Michel; Loh, N Duane; Bostedt, Christoph; Bozek, John D; Krzywinski, Jacek; Messerschmidt, Marc; Bogan, Michael J; Hampton, Christina Y; Sierra, Raymond G; Frank, Matthias; Shoeman, Robert L; Lomb, Lukas; Foucar, Lutz; Epp, Sascha W; Rolles, Daniel; Rudenko, Artem; Hartmann, Robert; Hartmann, Andreas; Kimmel, Nils; Holl, Peter; Weidenspointner, Georg; Rudek, Benedikt; Erk, Benjamin; Kassemeyer, Stephan; Schlichting, Ilme; Strüder, Lothar; Ullrich, Joachim; Schmidt, Carlo; Krasniqi, Faton; Hauser, Günter; Reich, Christian; Soltau, Heike; Schorb, Sebastian; Hirsemann, Helmut; Wunderer, Cornelia; Graafsma, Heinz; Chapman, Henry; Hajdu, Janos

    2016-08-01

Free-electron lasers (FELs) hold the potential to revolutionize structural biology by producing X-ray pulses short enough to outrun radiation damage, thus allowing imaging of biological samples without the limitations imposed by radiation damage. Accordingly, a major part of the scientific case for the first FELs was three-dimensional (3D) reconstruction of non-crystalline biological objects. In a recent publication we demonstrated the first 3D reconstruction of a biological object from an X-ray FEL using this technique. The sample was the giant Mimivirus, one of the largest known viruses, with a diameter of 450 nm. Here we present the dataset used for this successful reconstruction. Data-analysis methods for single-particle imaging at FELs are under heavy development, but data collection relies on very limited time available through a highly competitive proposal process. This dataset provides experimental data to the entire community; it could boost algorithm development and provide a benchmark dataset for new algorithms.

  17. A multimodal dataset for various forms of distracted driving

    PubMed Central

    Taamneh, Salah; Tsiamyrtzis, Panagiotis; Dcosta, Malcolm; Buddharaju, Pradeep; Khatri, Ashik; Manser, Michael; Ferris, Thomas; Wunderlich, Robert; Pavlidis, Ioannis

    2017-01-01

    We describe a multimodal dataset acquired in a controlled experiment on a driving simulator. The set includes data for n=68 volunteers that drove the same highway under four different conditions: No distraction, cognitive distraction, emotional distraction, and sensorimotor distraction. The experiment closed with a special driving session, where all subjects experienced a startle stimulus in the form of unintended acceleration—half of them under a mixed distraction, and the other half in the absence of a distraction. During the experimental drives key response variables and several explanatory variables were continuously recorded. The response variables included speed, acceleration, brake force, steering, and lane position signals, while the explanatory variables included perinasal electrodermal activity (EDA), palm EDA, heart rate, breathing rate, and facial expression signals; biographical and psychometric covariates as well as eye tracking data were also obtained. This dataset enables research into driving behaviors under neatly abstracted distracting stressors, which account for many car crashes. The set can also be used in physiological channel benchmarking and multispectral face recognition. PMID:28809848

  18. A dataset mapping the potential biophysical effects of vegetation cover change

    NASA Astrophysics Data System (ADS)

    Duveiller, Gregory; Hooker, Josh; Cescatti, Alessandro

    2018-02-01

    Changing the vegetation cover of the Earth has impacts on the biophysical properties of the surface and ultimately on the local climate. Depending on the specific type of vegetation change and on the background climate, the resulting competing biophysical processes can have a net warming or cooling effect, which can further vary both spatially and seasonally. Due to uncertain climate impacts and the lack of robust observations, biophysical effects are not yet considered in land-based climate policies. Here we present a dataset based on satellite remote sensing observations that provides the potential changes i) of the full surface energy balance, ii) at global scale, and iii) for multiple vegetation transitions, as would now be required for the comprehensive evaluation of land based mitigation plans. We anticipate that this dataset will provide valuable information to benchmark Earth system models, to assess future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes.

  19. A dataset mapping the potential biophysical effects of vegetation cover change

    PubMed Central

    Duveiller, Gregory; Hooker, Josh; Cescatti, Alessandro

    2018-01-01

    Changing the vegetation cover of the Earth has impacts on the biophysical properties of the surface and ultimately on the local climate. Depending on the specific type of vegetation change and on the background climate, the resulting competing biophysical processes can have a net warming or cooling effect, which can further vary both spatially and seasonally. Due to uncertain climate impacts and the lack of robust observations, biophysical effects are not yet considered in land-based climate policies. Here we present a dataset based on satellite remote sensing observations that provides the potential changes i) of the full surface energy balance, ii) at global scale, and iii) for multiple vegetation transitions, as would now be required for the comprehensive evaluation of land based mitigation plans. We anticipate that this dataset will provide valuable information to benchmark Earth system models, to assess future scenarios of land cover change and to develop the monitoring, reporting and verification guidelines required for the implementation of mitigation plans that account for biophysical land processes. PMID:29461538

  20. Single-shot diffraction data from the Mimivirus particle using an X-ray free-electron laser

    NASA Astrophysics Data System (ADS)

    Ekeberg, Tomas; Svenda, Martin; Seibert, M. Marvin; Abergel, Chantal; Maia, Filipe R. N. C.; Seltzer, Virginie; Deponte, Daniel P.; Aquila, Andrew; Andreasson, Jakob; Iwan, Bianca; Jönsson, Olof; Westphal, Daniel; Odić, Duško; Andersson, Inger; Barty, Anton; Liang, Meng; Martin, Andrew V.; Gumprecht, Lars; Fleckenstein, Holger; Bajt, Saša; Barthelmess, Miriam; Coppola, Nicola; Claverie, Jean-Michel; Loh, N. Duane; Bostedt, Christoph; Bozek, John D.; Krzywinski, Jacek; Messerschmidt, Marc; Bogan, Michael J.; Hampton, Christina Y.; Sierra, Raymond G.; Frank, Matthias; Shoeman, Robert L.; Lomb, Lukas; Foucar, Lutz; Epp, Sascha W.; Rolles, Daniel; Rudenko, Artem; Hartmann, Robert; Hartmann, Andreas; Kimmel, Nils; Holl, Peter; Weidenspointner, Georg; Rudek, Benedikt; Erk, Benjamin; Kassemeyer, Stephan; Schlichting, Ilme; Strüder, Lothar; Ullrich, Joachim; Schmidt, Carlo; Krasniqi, Faton; Hauser, Günter; Reich, Christian; Soltau, Heike; Schorb, Sebastian; Hirsemann, Helmut; Wunderer, Cornelia; Graafsma, Heinz; Chapman, Henry; Hajdu, Janos

    2016-08-01

Free-electron lasers (FELs) hold the potential to revolutionize structural biology by producing X-ray pulses short enough to outrun radiation damage, thus allowing imaging of biological samples without the limitations imposed by radiation damage. Accordingly, a major part of the scientific case for the first FELs was three-dimensional (3D) reconstruction of non-crystalline biological objects. In a recent publication we demonstrated the first 3D reconstruction of a biological object from an X-ray FEL using this technique. The sample was the giant Mimivirus, one of the largest known viruses, with a diameter of 450 nm. Here we present the dataset used for this successful reconstruction. Data-analysis methods for single-particle imaging at FELs are under heavy development, but data collection relies on very limited time available through a highly competitive proposal process. This dataset provides experimental data to the entire community; it could boost algorithm development and provide a benchmark dataset for new algorithms.

  1. Diffusion-based recommendation with trust relations on tripartite graphs

    NASA Astrophysics Data System (ADS)

    Wang, Ximeng; Liu, Yun; Zhang, Guangquan; Xiong, Fei; Lu, Jie

    2017-08-01

    The diffusion-based recommendation approach is a vital branch in recommender systems, which successfully applies physical dynamics to make recommendations for users on bipartite or tripartite graphs. Trust links indicate users’ social relations and can provide the benefit of reducing data sparsity. However, traditional diffusion-based algorithms only consider rating links when making recommendations. In this paper, the complementarity of users’ implicit and explicit trust is exploited, and a novel resource-allocation strategy is proposed, which integrates these two kinds of trust relations on tripartite graphs. Through empirical studies on three benchmark datasets, our proposed method obtains better performance than most of the benchmark algorithms in terms of accuracy, diversity and novelty. According to the experimental results, our method is an effective and reasonable way to integrate additional features into the diffusion-based recommendation approach.
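The underlying mass-diffusion (resource-allocation) process on a bipartite graph works like this classic ProbS sketch; the trust-aware tripartite extension the paper proposes adds a third (trust) layer, which is not shown here:

```python
import numpy as np

def probs_scores(A, user):
    """One round of ProbS-style mass diffusion on a user-item bipartite
    graph (A: users x items, binary). Resource flows items -> users ->
    items, each node splitting its resource equally among neighbours."""
    k_item = np.maximum(A.sum(axis=0), 1)      # item degrees
    k_user = np.maximum(A.sum(axis=1), 1)      # user degrees
    f0 = A[user].astype(float)                 # unit resource on collected items
    u_res = A @ (f0 / k_item)                  # items -> users
    f1 = A.T @ (u_res / k_user)                # users -> items
    f1[A[user] == 1] = -np.inf                 # never re-recommend owned items
    return f1

A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]])
ranked = np.argsort(-probs_scores(A, 0))       # best recommendation first
```

Uncollected items that share more users with the target user's items receive more diffused resource and rank higher, which is the "physical dynamics" the abstract refers to.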

  2. Review of indirect detection of dark matter with neutrinos

    NASA Astrophysics Data System (ADS)

    Danninger, Matthias

    2017-09-01

    Dark Matter could be detected indirectly through the observation of neutrinos produced in dark matter self-annihilations or decays. Searches for such neutrino signals have resulted in stringent constraints on the dark matter self-annihilation cross section and the scattering cross section with matter. In recent years these searches have made significant progress in sensitivity through new search methodologies, new detection channels, and through the availability of rich datasets from neutrino telescopes and detectors, like IceCube, ANTARES, Super-Kamiokande, etc. We review recent experimental results and put them in context with respect to other direct and indirect dark matter searches. We also discuss prospects for discoveries at current and next generation neutrino detectors.

  3. Assessing Ecosystem Model Performance in Semiarid Systems

    NASA Astrophysics Data System (ADS)

    Thomas, A.; Dietze, M.; Scott, R. L.; Biederman, J. A.

    2017-12-01

    In ecosystem process modelling, comparing outputs to benchmark datasets observed in the field is an important way to validate models, allowing the modelling community to track model performance over time and compare models at specific sites. Multi-model comparison projects, as well as the models themselves, have largely focused on temperate forests and similar biomes. Semiarid regions, on the other hand, are underrepresented in land surface and ecosystem modelling efforts, yet will be disproportionately impacted by disturbances such as climate change due to their sensitivity to changes in the water balance. Benchmarking models at semiarid sites is an important step in assessing and improving models' suitability for predicting the impact of disturbance on semiarid ecosystems. In this study, several ecosystem models were compared at a semiarid grassland in southwestern Arizona using PEcAn, the Predictive Ecosystem Analyzer, an open-source eco-informatics toolbox well suited to creating the repeatable model workflows necessary for benchmarking. Models included SIPNET, DALEC, JULES, ED2, GDAY, LPJ-GUESS, MAESPA, CLM, CABLE, and FATES. Comparison between model output and benchmarks such as net ecosystem exchange (NEE) tended to produce high root mean square error and low correlation coefficients, reflecting poor simulation of seasonality and a tendency for models to simulate much larger carbon sources than observed. These results indicate that ecosystem models do not currently represent semiarid ecosystem processes adequately.
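The benchmark comparison described above reduces to scoring a model's simulated time series (e.g. NEE) against field observations with root mean square error and a correlation coefficient. A minimal sketch of those two metrics, with illustrative function names:

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def pearson(obs, sim):
    """Pearson correlation coefficient between the two series."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)
```

A model that captures the seasonal cycle but with a constant offset scores well on correlation and poorly on RMSE, which is why benchmarking systems report both.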

  4. New public dataset for spotting patterns in medieval document images

    NASA Astrophysics Data System (ADS)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In the field of historical document image analysis in particular, there is a growing need for indexing and data mining tools that allow us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may vary in color, shape, or context, making pattern spotting a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore, dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and supports the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol, along with ad hoc metrics, is provided for a fair comparison of submitted approaches. We also report first results obtained with our baseline system on this new dataset; they show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.
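Pattern localization benchmarks of this kind are typically scored by the overlap between predicted and annotated bounding boxes; the standard choice is intersection-over-union (IoU). A small sketch of that metric (a common convention assumed here; the DocExplore protocol may define its own criteria):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```

A detection is usually counted as correct when its IoU with a ground-truth box exceeds a fixed threshold (often 0.5).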

  5. Sorting protein decoys by machine-learning-to-rank

    PubMed Central

    Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen

    2016-01-01

    Much progress has been made in protein structure prediction during the last few decades. As predicted models can span a broad accuracy spectrum, accurate quality estimation has become one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue; they can be roughly divided into three categories: single-model methods, clustering-based methods and quasi-single-model methods. In this study, we first develop a single-model method, MQAPRank, based on a learning-to-rank algorithm, and then implement a quasi-single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi-single-model method can further enhance performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, Quasi-MQAPRank achieves considerable performance on the CASP11 Best150 dataset. PMID:27530967

  6. Sorting protein decoys by machine-learning-to-rank.

    PubMed

    Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen

    2016-08-17

    Much progress has been made in protein structure prediction during the last few decades. As predicted models can span a broad accuracy spectrum, accurate quality estimation has become one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue; they can be roughly divided into three categories: single-model methods, clustering-based methods and quasi-single-model methods. In this study, we first develop a single-model method, MQAPRank, based on a learning-to-rank algorithm, and then implement a quasi-single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi-single-model method can further enhance performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, Quasi-MQAPRank achieves considerable performance on the CASP11 Best150 dataset.
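The two abstracts above apply learning-to-rank to sorting protein decoys. As an illustration of the general idea (not the authors' actual algorithm), the sketch below trains a linear pairwise ranker: a perceptron-style update on difference vectors of (better, worse) decoy pairs, so that higher-quality decoys receive higher scores.

```python
# Toy pairwise learning-to-rank: learn w such that w . better > w . worse
# for every training pair. Illustrative sketch, not MQAPRank itself.

def train_pairwise_ranker(pairs, dim, epochs=50, lr=0.1):
    """pairs: list of (better, worse) feature-vector pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin <= 0:  # mis-ranked pair: update on the difference vector
                for k in range(dim):
                    w[k] += lr * (better[k] - worse[k])
    return w

def score(w, x):
    """Ranking score of a decoy with feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

At prediction time, decoys are simply sorted by their scores; the top-ranked decoy is selected as the best model.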

  7. Combined evaluation of optical and microwave satellite dataset for soil moisture deficit estimation

    NASA Astrophysics Data System (ADS)

    Srivastava, Prashant K.; Han, Dawei; Islam, Tanvir; Singh, Sudhir Kumar; Gupta, Manika; Gupta, Dileep Kumar; Kumar, Pradeep

    2016-04-01

    Soil moisture is a key variable responsible for water and energy exchanges from the land surface to the atmosphere (Srivastava et al., 2014). Soil Moisture Deficit (SMD), in turn, can help regulate the proper use of water at specified times to avoid agricultural losses (Srivastava et al., 2013b) and could help in preventing natural disasters such as floods and droughts (Srivastava et al., 2013a). In this study, Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature (LST) and soil moisture from the Soil Moisture and Ocean Salinity (SMOS) satellite are evaluated for prediction of SMD. A sophisticated algorithm, the Adaptive Neuro Fuzzy Inference System (ANFIS), is used to predict SMD from the MODIS and SMOS datasets. The benchmark SMD, estimated with the Probability Distributed Model (PDM) over the Brue catchment in the southwest of England, U.K., is used for all validation. Performance is assessed in terms of Nash-Sutcliffe Efficiency, Root Mean Square Error and the percentage bias between the ANFIS-simulated SMD and the benchmark. The performance statistics reveal good agreement between the benchmark and the ANFIS-estimated SMD using the MODIS dataset. The analysis of these satellite products (SMOS soil moisture and MODIS LST) for SMD prediction is a crucial step for successful hydrological modelling, agriculture and water resource management, and can provide important assistance in policy and decision making.
    Keywords: Land Surface Temperature, MODIS, SMOS, Soil Moisture Deficit, Fuzzy Logic System
    References:
    Srivastava, P.K., Han, D., Rico-Ramirez, M.A., Islam, T., 2013a. Appraisal of SMOS soil moisture at a catchment scale in a temperate maritime climate. Journal of Hydrology 498, 292-304.
    Srivastava, P.K., Han, D., Rico-Ramirez, M.A., Al-Shrafany, D., Islam, T., 2013b. Data fusion techniques for improving soil moisture deficit using SMOS satellite and WRF-NOAH land surface model. Water Resources Management 27, 5069-5087.
    Srivastava, P.K., Han, D., Rico-Ramirez, M.A., O'Neill, P., Islam, T., Gupta, M., 2014. Assessment of SMOS soil moisture retrieval parameters using tau-omega algorithms for soil moisture deficit estimation. Journal of Hydrology 519, 574-587.
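The SMD validation above relies on Nash-Sutcliffe Efficiency, RMSE and percentage bias. A minimal sketch of NSE and percent bias (function names illustrative):

```python
def nash_sutcliffe(obs, sim):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    1.0 is a perfect match; 0.0 means no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def percent_bias(obs, sim):
    """Percentage bias of the simulation relative to total observed volume."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)
```

NSE penalizes both offset and amplitude errors, while percent bias isolates systematic over- or under-prediction, so the two are complementary diagnostics.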

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jamieson, Kevin; Davis, IV, Warren L.

    Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning. Because of this, real-world testing and comparison of active learning algorithms requires collecting new datasets (adaptively), rather than simply applying algorithms to benchmark datasets, as is the norm in (passive) machine learning research. To facilitate the development, testing and deployment of active learning for real applications, we have built an open-source software system for large-scale active learning research and experimentation. The system, called NEXT, provides a unique platform for real-world, reproducible active learning research. This paper details the challenges of building the system and demonstrates its capabilities with several experiments. The results show how experimentation can help expose strengths and weaknesses of active learning algorithms, in sometimes unexpected and enlightening ways.
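As background to the adaptive data collection NEXT supports (and emphatically not the NEXT system itself), here is a toy pool-based active learning loop with uncertainty sampling: at each round the learner queries the unlabeled point closest to its current decision boundary, a 1-D threshold in this illustration.

```python
# Toy uncertainty-sampling active learner for a 1-D threshold classifier.
# All names are illustrative; this only sketches the adaptive query loop.

def query_most_uncertain(pool, threshold):
    """Pick the unlabeled point nearest the current decision boundary."""
    return min(pool, key=lambda x: abs(x - threshold))

def active_learn(pool, oracle, rounds=5):
    """oracle(x) -> 0/1 label. Returns the learned threshold."""
    labeled = {}
    threshold = 0.5
    for _ in range(rounds):
        candidates = [p for p in pool if p not in labeled]
        x = query_most_uncertain(candidates, threshold)
        labeled[x] = oracle(x)  # adaptively collect the informative label
        pos = [p for p, y in labeled.items() if y == 1]
        neg = [p for p, y in labeled.items() if y == 0]
        if pos and neg:  # refit: midpoint between the two classes
            threshold = (min(pos) + max(neg)) / 2
    return threshold
```

Because each query depends on the labels gathered so far, a fixed benchmark dataset cannot reproduce this interaction, which is exactly the evaluation problem the abstract describes.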

  9. A new collaborative recommendation approach based on users clustering using artificial bee colony algorithm.

    PubMed

    Ju, Chunhua; Xu, Chonghuan

    2013-01-01

    Although there are many good collaborative recommendation methods, it is still a challenge to increase their accuracy and diversity so as to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on the K-means clustering algorithm. In the clustering process, we use the artificial bee colony (ABC) algorithm to overcome the local-optimum problem caused by K-means. After that we adopt a modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on the benchmark MovieLens dataset and a real-world dataset indicates that our new collaborative filtering approach based on user clustering outperforms many other recommendation methods.

  10. A New Collaborative Recommendation Approach Based on Users Clustering Using Artificial Bee Colony Algorithm

    PubMed Central

    Ju, Chunhua

    2013-01-01

    Although there are many good collaborative recommendation methods, it is still a challenge to increase their accuracy and diversity so as to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on the K-means clustering algorithm. In the clustering process, we use the artificial bee colony (ABC) algorithm to overcome the local-optimum problem caused by K-means. After that we adopt a modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on the benchmark MovieLens dataset and a real-world dataset indicates that our new collaborative filtering approach based on user clustering outperforms many other recommendation methods. PMID:24381525
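The similarity step described in the two records above can be illustrated with a mean-centred ("adjusted") cosine similarity over co-rated items; this sketch is an assumption about the flavour of modification, not the authors' exact formula.

```python
import math

def modified_cosine(u, v, ratings):
    """Cosine similarity over co-rated items, centred on each user's mean
    rating to correct for different rating scales. ratings: user -> {item: rating}."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    mu = sum(ratings[u].values()) / len(ratings[u])
    mv = sum(ratings[v].values()) / len(ratings[v])
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0
```

Restricting the similarity computation to users inside the same K-means cluster, as the paper does, shrinks the candidate neighbor set and keeps this pairwise step cheap.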

  11. HTM Spatial Pooler With Memristor Crossbar Circuits for Sparse Biometric Recognition.

    PubMed

    James, Alex Pappachen; Fedorova, Irina; Ibrayev, Timur; Kudithipudi, Dhireesha

    2017-06-01

    Hierarchical Temporal Memory (HTM) is an online machine learning algorithm that emulates the neocortex. The development of a scalable on-chip HTM architecture is an open research area. The two core substructures of HTM are the spatial pooler and temporal memory. In this work, we propose a new spatial pooler circuit design with parallel memristive crossbar arrays for the 2D columns. The proposed design was validated on two different benchmark tasks, face recognition and speech recognition. The circuits are simulated and analyzed using a practical memristor device model and a 0.18 μm IBM CMOS technology model. The AR, YALE, ORL, and UFI databases are used to test the performance of the design in face recognition; the TIMIT dataset is used for speech recognition.
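The core operation of an HTM spatial pooler, computing each column's overlap with the input's active bits and letting the top-k columns win the inhibition step, can be sketched in a few lines (a software toy, not the memristor circuit; boosting and permanence learning are omitted):

```python
# Toy HTM spatial pooler step: overlap + k-winners-take-all inhibition.
# Illustrative only; names and simplifications are assumptions.

def spatial_pool(input_bits, columns, k):
    """input_bits: set of active input indices (a sparse binary input).
    columns: list of sets, each the connected-synapse indices of one column.
    Returns the indices of the k winning (active) columns."""
    overlaps = [len(col & input_bits) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda c: overlaps[c], reverse=True)
    return sorted(ranked[:k])
```

The output is a sparse distributed representation (SDR) of the input: only k of the columns are active, which is the property the crossbar circuit has to reproduce in hardware.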

  12. Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms I: Revisiting Cluster-Based Inferences.

    PubMed

    Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Sathian, K

    2018-02-01

    In a recent study, Eklund et al. employed resting-state functional magnetic resonance imaging data as a surrogate for null functional magnetic resonance imaging (fMRI) datasets and posited that cluster-wise family-wise error (FWE) rate-corrected inferences made by using parametric statistical methods in fMRI studies over the past two decades may have been invalid, particularly for cluster-defining thresholds less stringent than p < 0.001; this was principally because the spatial autocorrelation functions (sACF) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggested otherwise. Here, we show that accounting for non-Gaussian signal components, such as those arising from resting-state neural activity as well as physiological responses and motion artifacts, in the null fMRI datasets yields first- and second-level general linear model analysis residuals with nearly uniform and Gaussian sACF. Further comparison with nonparametric permutation tests indicates that cluster-based FWE-corrected inferences made with Gaussian spatial noise approximations are valid.

  13. A novel approach to identifying regulatory motifs in distantly related genomes

    PubMed Central

    Van Hellemont, Ruth; Monsieurs, Pieter; Thijs, Gert; De Moor, Bart; Van de Peer, Yves; Marchal, Kathleen

    2005-01-01

    Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To address these difficulties, most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size. PMID:16420672

  14. Feasibility of using a large Clinical Data Warehouse to automate the selection of diagnostic cohorts.

    PubMed

    Stephen, Reejis; Boxwala, Aziz; Gertman, Paul

    2003-01-01

    Data from Clinical Data Warehouses (CDWs) can be used for retrospective studies and for benchmarking. However, automated identification of cases from large datasets containing data items in free-text fields is challenging. We developed an algorithm for categorizing pediatric patients presenting with respiratory distress into bronchiolitis, bacterial pneumonia and asthma using clinical variables from a CDW. A feasibility study of this approach indicates that case selection may be automated.
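As a hypothetical sketch of the kind of rule-based categorization such a CDW algorithm might apply to free-text fields; the keyword rules below are illustrative, not the study's actual criteria.

```python
import re

# Illustrative keyword rules for a diagnostic cohort selector.
# These patterns are assumptions for the sketch, not validated clinical rules.
RULES = {
    "bronchiolitis": re.compile(r"bronchiolitis|rsv", re.I),
    "bacterial pneumonia": re.compile(r"pneumonia|consolidation", re.I),
    "asthma": re.compile(r"asthma|wheez", re.I),
}

def categorize(note):
    """Return all candidate diagnoses whose rule matches the free-text note."""
    hits = [dx for dx, pat in RULES.items() if pat.search(note)]
    return hits or ["unclassified"]
```

In practice such keyword rules would be combined with structured variables (vitals, labs, age) and reviewed against a manually selected gold-standard cohort, which is what a feasibility study like the one above assesses.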

  15. Does standardised structured reporting contribute to quality in diagnostic pathology? The importance of evidence-based datasets.

    PubMed

    Ellis, D W; Srigley, J

    2016-01-01

    Key quality parameters in diagnostic pathology include timeliness, accuracy, completeness, conformance with current agreed standards, consistency and clarity in communication. In this review, we argue that with worldwide developments in eHealth and big data generally, there are two further, often-overlooked parameters if our reports are to be fit for purpose. Firstly, population-level studies have clearly demonstrated the value of providing timely structured reporting data in standardised electronic format as part of system-wide quality improvement programmes. Moreover, when combined with multiple health data sources through eHealth and data linkage, structured pathology reports become central to population-level quality monitoring, benchmarking, interventions and benefit analyses in public health management. Secondly, population-level studies, particularly for benchmarking, require a single agreed international and evidence-based standard to ensure interoperability and comparability. This has been taken for granted in tumour classification and staging for many years, yet international standardisation of cancer datasets is only now underway through the International Collaboration on Cancer Reporting (ICCR). In this review, we present evidence supporting the role of structured pathology reporting in quality improvement for both clinical care and population-level health management. Although this review of available evidence largely relates to structured reporting of cancer, it is clear that the same principles can be applied throughout anatomical pathology generally, as they are elsewhere in the health system.

  16. Multi-Label Learning via Random Label Selection for Protein Subcellular Multi-Locations Prediction.

    PubMed

    Wang, Xiao; Li, Guo-Zheng

    2013-03-12

    Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most existing protein subcellular localization methods deal only with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they adopt a simple strategy of transforming multi-location proteins into multiple proteins with a single location each, which does not take correlations among different subcellular locations into account. In this paper, a novel method named RALS (multi-label learning via RAndom Label Selection) is proposed to learn from multi-location proteins in an effective and efficient way. Through a five-fold cross-validation test on a benchmark dataset, we demonstrate that our proposed method, which takes label correlations into consideration, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations really exist and contribute to the improvement of prediction performance. Experimental results on two benchmark datasets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multi-locations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public use.

  17. Measurement of the Absolute Branching Fraction for Λ_{c}^{+}→Λe^{+}ν_{e}.

    PubMed

    Ablikim, M; Achasov, M N; Ai, X C; Albayrak, O; Albrecht, M; Ambrose, D J; Amoroso, A; An, F F; An, Q; Bai, J Z; Baldini Ferroli, R; Ban, Y; Bennett, D W; Bennett, J V; Bertani, M; Bettoni, D; Bian, J M; Bianchi, F; Boger, E; Boyko, I; Briere, R A; Cai, H; Cai, X; Cakir, O; Calcaterra, A; Cao, G F; Cetin, S A; Chang, J F; Chelkov, G; Chen, G; Chen, H S; Chen, H Y; Chen, J C; Chen, M L; Chen, S J; Chen, X; Chen, X R; Chen, Y B; Cheng, H P; Chu, X K; Cibinetto, G; Dai, H L; Dai, J P; Dbeyssi, A; Dedovich, D; Deng, Z Y; Denig, A; Denysenko, I; Destefanis, M; De Mori, F; Ding, Y; Dong, C; Dong, J; Dong, L Y; Dong, M Y; Dou, Z L; Du, S X; Duan, P F; Fan, J Z; Fang, J; Fang, S S; Fang, X; Fang, Y; Fava, L; Fedorov, O; Feldbauer, F; Felici, G; Feng, C Q; Fioravanti, E; Fritsch, M; Fu, C D; Gao, Q; Gao, X L; Gao, X Y; Gao, Y; Gao, Z; Garzia, I; Goetzen, K; Gong, W X; Gradl, W; Greco, M; Gu, M H; Gu, Y T; Guan, Y H; Guo, A Q; Guo, L B; Guo, Y; Guo, Y P; Haddadi, Z; Hafner, A; Han, S; Hao, X Q; Harris, F A; He, K L; Held, T; Heng, Y K; Hou, Z L; Hu, C; Hu, H M; Hu, J F; Hu, T; Hu, Y; Huang, G M; Huang, G S; Huang, J S; Huang, X T; Huang, Y; Hussain, T; Ji, Q; Ji, Q P; Ji, X B; Ji, X L; Jiang, L W; Jiang, X S; Jiang, X Y; Jiao, J B; Jiao, Z; Jin, D P; Jin, S; Johansson, T; Julin, A; Kalantar-Nayestanaki, N; Kang, X L; Kang, X S; Kavatsyuk, M; Ke, B C; Kiese, P; Kliemt, R; Kloss, B; Kolcu, O B; Kopf, B; Kornicer, M; Kuehn, W; Kupsc, A; Lange, J S; Lara, M; Larin, P; Leng, C; Li, C; Li, Cheng; Li, D M; Li, F; Li, F Y; Li, G; Li, H B; Li, J C; Li, Jin; Li, K; Li, K; Li, Lei; Li, P R; Li, T; Li, W D; Li, W G; Li, X L; Li, X M; Li, X N; Li, X Q; Li, Z B; Liang, H; Liang, Y F; Liang, Y T; Liao, G R; Lin, D X; Liu, B J; Liu, C X; Liu, D; Liu, F H; Liu, Fang; Liu, Feng; Liu, H B; Liu, H H; Liu, H H; Liu, H M; Liu, J; Liu, J B; Liu, J P; Liu, J Y; Liu, K; Liu, K Y; Liu, L D; Liu, P L; Liu, Q; Liu, S B; Liu, X; Liu, Y B; Liu, Z A; Liu, Zhiqing; Lou, X C; Lu, H J; Lu, J G; Lu, Y; 
Lu, Y P; Luo, C L; Luo, M X; Luo, T; Luo, X L; Lyu, X R; Ma, F C; Ma, H L; Ma, L L; Ma, Q M; Ma, T; Ma, X N; Ma, X Y; Maas, F E; Maggiora, M; Mao, Y J; Mao, Z P; Marcello, S; Messchendorp, J G; Min, J; Mitchell, R E; Mo, X H; Mo, Y J; Morales Morales, C; Muchnoi, N Yu; Muramatsu, H; Nefedov, Y; Nerling, F; Nikolaev, I B; Ning, Z; Nisar, S; Niu, S L; Niu, X Y; Olsen, S L; Ouyang, Q; Pacetti, S; Pan, Y; Patteri, P; Pelizaeus, M; Peng, H P; Peters, K; Pettersson, J; Ping, J L; Ping, R G; Poling, R; Prasad, V; Qi, H R; Qi, M; Qian, S; Qiao, C F; Qin, L Q; Qin, N; Qin, X S; Qin, Z H; Qiu, J F; Rashid, K H; Redmer, C F; Ripka, M; Rong, G; Rosner, Ch; Ruan, X D; Santoro, V; Sarantsev, A; Savrié, M; Schoenning, K; Schumann, S; Shan, W; Shao, M; Shen, C P; Shen, P X; Shen, X Y; Sheng, H Y; Song, W M; Song, X Y; Sosio, S; Spataro, S; Sun, G X; Sun, J F; Sun, S S; Sun, Y J; Sun, Y Z; Sun, Z J; Sun, Z T; Tang, C J; Tang, X; Tapan, I; Thorndike, E H; Tiemens, M; Ullrich, M; Uman, I; Varner, G S; Wang, B; Wang, B L; Wang, D; Wang, D Y; Wang, K; Wang, L L; Wang, L S; Wang, M; Wang, P; Wang, P L; Wang, S G; Wang, W; Wang, W P; Wang, X F; Wang, Y D; Wang, Y F; Wang, Y Q; Wang, Z; Wang, Z G; Wang, Z H; Wang, Z Y; Weber, T; Wei, D H; Wei, J B; Weidenkaff, P; Wen, S P; Wiedner, U; Wolke, M; Wu, L H; Wu, Z; Xia, L; Xia, L G; Xia, Y; Xiao, D; Xiao, H; Xiao, Z J; Xie, Y G; Xiu, Q L; Xu, G F; Xu, L; Xu, Q J; Xu, Q N; Xu, X P; Yan, L; Yan, W B; Yan, W C; Yan, Y H; Yang, H J; Yang, H X; Yang, L; Yang, Y; Yang, Y X; Ye, M; Ye, M H; Yin, J H; Yu, B X; Yu, C X; Yu, J S; Yuan, C Z; Yuan, W L; Yuan, Y; Yuncu, A; Zafar, A A; Zallo, A; Zeng, Y; Zeng, Z; Zhang, B X; Zhang, B Y; Zhang, C; Zhang, C C; Zhang, D H; Zhang, H H; Zhang, H Y; Zhang, J J; Zhang, J L; Zhang, J Q; Zhang, J W; Zhang, J Y; Zhang, J Z; Zhang, K; Zhang, L; Zhang, X Y; Zhang, Y; Zhang, Y H; Zhang, Y N; Zhang, Y T; Zhang, Yu; Zhang, Z H; Zhang, Z P; Zhang, Z Y; Zhao, G; Zhao, J W; Zhao, J Y; Zhao, J Z; Zhao, Lei; Zhao, Ling; Zhao, 
M G; Zhao, Q; Zhao, Q W; Zhao, S J; Zhao, T C; Zhao, Y B; Zhao, Z G; Zhemchugov, A; Zheng, B; Zheng, J P; Zheng, W J; Zheng, Y H; Zhong, B; Zhou, L; Zhou, X; Zhou, X K; Zhou, X R; Zhou, X Y; Zhu, K; Zhu, K J; Zhu, S; Zhu, S H; Zhu, X L; Zhu, Y C; Zhu, Y S; Zhu, Z A; Zhuang, J; Zotti, L; Zou, B S; Zou, J H

    2015-11-27

    We report the first measurement of the absolute branching fraction for Λ_{c}^{+}→Λe^{+}ν_{e}. This measurement is based on 567  pb^{-1} of e^{+}e^{-} annihilation data produced at sqrt[s]=4.599  GeV, which is just above the Λ_{c}^{+}Λ[over ¯]_{c}^{-} threshold. The data were collected with the BESIII detector at the BEPCII storage rings. The branching fraction is determined to be B(Λ_{c}^{+}→Λe^{+}ν_{e})=[3.63±0.38(stat)±0.20(syst)]%, representing a significant improvement in precision over the current indirect determination. As the branching fraction for Λ_{c}^{+}→Λe^{+}ν_{e} is the benchmark for those of other Λ_{c}^{+} semileptonic channels, our result provides a unique test of different theoretical models, which is the most stringent to date.

  18. Robust validation of approximate 1-matrix functionals with few-electron harmonium atoms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cioslowski, Jerzy, E-mail: jerzy@wmf.univ.szczecin.pl; Piris, Mario; Matito, Eduard

    2015-12-07

    A simple comparison between the exact and approximate correlation components U of the electron-electron repulsion energy of several states of few-electron harmonium atoms with varying confinement strengths provides a stringent validation tool for 1-matrix functionals. The robustness of this tool is clearly demonstrated in a survey of 14 known functionals, which reveals their substandard performance within different electron correlation regimes. Unlike spot-testing that employs dissociation curves of diatomic molecules or more extensive benchmarking against experimental atomization energies of molecules comprising some standard set, the present approach not only uncovers the flaws and patent failures of the functionals but, even more importantly, also allows for pinpointing their root causes. Since the approximate values of U are computed at exact 1-densities, the testing requires minimal programming and thus is particularly suitable for rapid screening of new functionals.

  19. Low-temperature magnetotransport in Si/SiGe heterostructures on 300 mm Si wafers

    NASA Astrophysics Data System (ADS)

    Scappucci, Giordano; Yeoh, L.; Sabbagh, D.; Sammak, A.; Boter, J.; Droulers, G.; Kalhor, N.; Brousse, D.; Veldhorst, M.; Vandersypen, L. M. K.; Thomas, N.; Roberts, J.; Pillarisetty, R.; Amin, P.; George, H. C.; Singh, K. J.; Clarke, J. S.

    Undoped Si/SiGe heterostructures are a promising material stack for the development of spin qubits in silicon. Deploying qubits into high-volume manufacturing for a quantum computer requires stringent control over substrate uniformity and quality. Electron mobility and valley splitting are two key electrical metrics of substrate quality relevant to qubits. Here we present low-temperature magnetotransport measurements of strained Si quantum wells, with mobilities in excess of 100,000 cm2/Vs, fabricated on 300 mm wafers within the framework of advanced semiconductor manufacturing. These results are benchmarked against results obtained from Si quantum wells deposited on 100 mm Si wafers in an academic research environment. To ensure rapid progress in quantum well quality we have implemented fast feedback loops from materials growth to heterostructure FET fabrication and low-temperature characterisation. On this topic we will present recent progress in developing a cryogenic platform for high-throughput magnetotransport measurements.
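For context on the mobility figure quoted above: in magnetotransport characterisation, the carrier mobility follows from the measured sheet resistance and the Hall sheet density via mu = 1 / (e * n_s * R_s). A small sketch of that conversion (function name illustrative):

```python
# Drude-model mobility from Hall measurements. With R_s in ohm/sq and
# n_s in cm^-2, the result comes out in cm^2/Vs.

E_CHARGE = 1.602176634e-19  # elementary charge (C)

def hall_mobility(sheet_resistance_ohm_sq, sheet_density_per_cm2):
    """mu = 1 / (e * n_s * R_s)."""
    return 1.0 / (E_CHARGE * sheet_density_per_cm2 * sheet_resistance_ohm_sq)
```

For example, a quantum well with a sheet density of 2e11 cm^-2 would need a sheet resistance of roughly 310 ohm/sq to reach the 100,000 cm2/Vs mobility quoted in the abstract.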

  20. Simple mathematical law benchmarks human confrontations

    PubMed Central

    Johnson, Neil F.; Medina, Pablo; Zhao, Guannan; Messinger, Daniel S.; Horgan, John; Gill, Paul; Bohorquez, Juan Camilo; Mattson, Whitney; Gangi, Devon; Qi, Hong; Manrique, Pedro; Velasquez, Nicolas; Morgenstern, Ana; Restrepo, Elvira; Johnson, Nicholas; Spagat, Michael; Zarama, Roberto

    2013-01-01

    Many high-profile societal problems involve an individual or group repeatedly attacking another – from child-parent disputes, sexual violence against women, civil unrest, violent conflicts and acts of terror, to current cyber-attacks on national infrastructure and ultrafast cyber-trades attacking stockholders. There is an urgent need to quantify the likely severity and timing of such future acts, shed light on likely perpetrators, and identify intervention strategies. Here we present a combined analysis of multiple datasets across all these domains, which together account for >100,000 events, and show that a simple mathematical law can benchmark them all. We derive this benchmark and interpret it, using a minimal mechanistic model grounded by state-of-the-art fieldwork. Our findings provide quantitative predictions concerning future attacks; a tool to help detect common perpetrators and abnormal behaviors; insight into the trajectory of a 'lone wolf'; identification of a critical threshold for spreading a message or idea among perpetrators; an intervention strategy to erode the most lethal clusters; and more broadly, a quantitative starting point for cross-disciplinary theorizing about human aggression at the individual and group level, in both real and online worlds. PMID:24322528
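The "simple mathematical law" in such confrontation datasets is a power-law distribution of event severities. As an illustration of the family of law involved (not the paper's exact model), here is the standard Clauset-style maximum-likelihood estimate of the exponent alpha for P(x) ∝ x^(-alpha), x ≥ xmin:

```python
import math

def powerlaw_alpha(severities, xmin=1.0):
    """Continuous MLE of the power-law exponent over the tail x >= xmin:
    alpha = 1 + n / sum(ln(x_i / xmin))."""
    tail = [x for x in severities if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
```

Benchmarking a dataset against the law then amounts to fitting alpha for each conflict or attack series and comparing the fitted exponents across domains.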

  1. How well does your model capture the terrestrial ecosystem dynamics of the Arctic-Boreal Region?

    NASA Astrophysics Data System (ADS)

    Stofferahn, E.; Fisher, J. B.; Hayes, D. J.; Huntzinger, D. N.; Schwalm, C.

    2016-12-01

    The Arctic-Boreal Region (ABR) is a major source of uncertainty for terrestrial biosphere model (TBM) simulations. These uncertainties stem from a lack of observational data from the region, which affects the parameterizations of cold-environment processes in the models. Addressing these uncertainties requires a coordinated effort of data collection and integration of the following key indicators of the ABR ecosystem: disturbance, flora/fauna and related ecosystem function, carbon pools and biogeochemistry, permafrost, and hydrology. We are developing a model-data integration framework for NASA's Arctic Boreal Vulnerability Experiment (ABoVE), wherein data collection for the key ABoVE indicators is driven by matching observations and model outputs to those indicators. The data are used as reference datasets for a benchmarking system which evaluates TBM performance with respect to ABR processes. The benchmarking system utilizes performance metrics to identify intra-model and inter-model strengths and weaknesses, which in turn provides guidance to model development teams for reducing uncertainties in TBM simulations of the ABR. The system is directly connected to the International Land Model Benchmarking (ILAMB) system, as an ABR-focused application.

  2. Towards accurate modeling of noncovalent interactions for protein rigidity analysis.

    PubMed

    Fox, Naomi; Streinu, Ileana

    2013-01-01

    Protein rigidity analysis is an efficient computational method for extracting flexibility information from static, X-ray crystallography protein data. Atoms and bonds are modeled as a mechanical structure and analyzed with a fast graph-based algorithm, producing a decomposition of the flexible molecule into interconnected rigid clusters. The result depends critically on noncovalent atomic interactions, primarily on how hydrogen bonds and hydrophobic interactions are computed and modeled. Ongoing research points to the stringent need for benchmarking rigidity analysis software systems, towards the goal of increasing their accuracy and validating their results, both against each other and against biologically relevant (functional) parameters. We propose two new methods for modeling hydrogen bonds and hydrophobic interactions that more accurately reflect a mechanical model, without being computationally more intensive. We evaluate them using a novel scoring method, based on the B-cubed score from the information retrieval literature, which measures how well two cluster decompositions match. To evaluate the modeling accuracy of KINARI, our pebble-game rigidity analysis system, we use a benchmark data set of 20 proteins, each with multiple distinct conformations deposited in the Protein Data Bank. Cluster decompositions for them were previously determined with the RigidFinder method from Gerstein's lab and validated against experimental data. When KINARI's default tuning parameters are used, an improvement of the B-cubed score over a crude baseline is observed in 30% of this data. With our new modeling options, improvements were observed in over 70% of the proteins in this data set. We investigate the sensitivity of the cluster decomposition score with case studies on pyruvate phosphate dikinase and calmodulin. 
To substantially improve the accuracy of protein rigidity analysis systems, thorough benchmarking must be performed on all current systems and future extensions. We have measured the gain in performance by comparing different modeling methods for noncovalent interactions. We showed that new criteria for modeling hydrogen bonds and hydrophobic interactions can significantly improve the results. The two new methods proposed here have been implemented and made publicly available in the current version of KINARI (v1.3), together with the benchmarking tools, which can be downloaded from our software's website, http://kinari.cs.umass.edu.
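
    For intuition, the B-cubed comparison of two cluster decompositions can be sketched in a few lines. This is a generic reimplementation from the information-retrieval definition, not KINARI's code:

```python
def b_cubed(candidate, reference):
    """B-cubed precision, recall and F1 of `candidate` against `reference`.

    Each argument maps item -> cluster label over the same set of items.
    """
    items = list(candidate)

    def per_item(ref, cand):
        # Average, over items, of the fraction of an item's cluster-mates
        # in `cand` that are also its cluster-mates in `ref`.
        total = 0.0
        for i in items:
            mates = [j for j in items if cand[j] == cand[i]]
            agree = sum(1 for j in mates if ref[j] == ref[i])
            total += agree / len(mates)
        return total / len(items)

    p = per_item(reference, candidate)
    r = per_item(candidate, reference)
    return p, r, 2 * p * r / (p + r)

# Merging two reference singletons costs precision but not recall.
p, r, f = b_cubed({"a": 0, "b": 0}, {"a": 0, "b": 1})   # p=0.5, r=1.0
```

    A perfect match scores 1.0 on both axes, so the score directly quantifies "how well two cluster decompositions match" as described above.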

  3. Towards accurate modeling of noncovalent interactions for protein rigidity analysis

    PubMed Central

    2013-01-01

    Background Protein rigidity analysis is an efficient computational method for extracting flexibility information from static, X-ray crystallography protein data. Atoms and bonds are modeled as a mechanical structure and analyzed with a fast graph-based algorithm, producing a decomposition of the flexible molecule into interconnected rigid clusters. The result depends critically on noncovalent atomic interactions, primarily on how hydrogen bonds and hydrophobic interactions are computed and modeled. Ongoing research points to the stringent need for benchmarking rigidity analysis software systems, towards the goal of increasing their accuracy and validating their results, both against each other and against biologically relevant (functional) parameters. We propose two new methods for modeling hydrogen bonds and hydrophobic interactions that more accurately reflect a mechanical model, without being computationally more intensive. We evaluate them using a novel scoring method, based on the B-cubed score from the information retrieval literature, which measures how well two cluster decompositions match. Results To evaluate the modeling accuracy of KINARI, our pebble-game rigidity analysis system, we use a benchmark data set of 20 proteins, each with multiple distinct conformations deposited in the Protein Data Bank. Cluster decompositions for them were previously determined with the RigidFinder method from Gerstein's lab and validated against experimental data. When KINARI's default tuning parameters are used, an improvement of the B-cubed score over a crude baseline is observed for 30% of the proteins. With our new modeling options, improvements were observed for over 70% of the proteins in this data set. We investigate the sensitivity of the cluster decomposition score with case studies on pyruvate phosphate dikinase and calmodulin. 
Conclusion To substantially improve the accuracy of protein rigidity analysis systems, thorough benchmarking must be performed on all current systems and future extensions. We have measured the gain in performance by comparing different modeling methods for noncovalent interactions. We showed that new criteria for modeling hydrogen bonds and hydrophobic interactions can significantly improve the results. The two new methods proposed here have been implemented and made publicly available in the current version of KINARI (v1.3), together with the benchmarking tools, which can be downloaded from our software's website, http://kinari.cs.umass.edu. PMID:24564209

  4. IgSimulator: a versatile immunosequencing simulator.

    PubMed

    Safonova, Yana; Lapidus, Alla; Lill, Jennie

    2015-10-01

    The recent introduction of next-generation sequencing technologies to antibody studies has resulted in a growing number of immunoinformatics tools for antibody repertoire analysis. However, benchmarking these newly emerging tools remains problematic, since the gold standard datasets needed to validate them are typically not available. Because simulating antibody repertoires is often the only feasible way to benchmark new immunoinformatics tools, we developed the IgSimulator tool, which addresses various complications in generating realistic antibody repertoires. IgSimulator's code has a modular structure and can easily be adapted to new simulation requirements. IgSimulator is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from yana-safonova.github.io/ig_simulator. Contact: safonova.yana@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. Towards the quantitative evaluation of visual attention models.

    PubMed

    Bylinskii, Z; DeGennaro, E M; Rajalingham, R; Ruda, H; Zhang, J; Tsotsos, J K

    2015-11-01

    Scores of visual attention models have been developed over the past several decades of research. Differences in implementation, assumptions, and evaluations have made comparison of these models very difficult. Taxonomies have been constructed in an attempt to organize and classify the models, but they are not sufficient to quantify which classes of models are most capable of explaining the available data. At the same time, a multitude of physiological and behavioral findings have been published, measuring various aspects of human and non-human primate visual attention. All of these elements highlight the need to integrate the computational models with the data by (1) operationalizing the definitions of visual attention tasks and (2) designing benchmark datasets to measure success on specific tasks, under these definitions. In this paper, we provide some examples of operationalizing and benchmarking different visual attention tasks, along with the relevant design considerations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Integrating CFD, CAA, and Experiments Towards Benchmark Datasets for Airframe Noise Problems

    NASA Technical Reports Server (NTRS)

    Choudhari, Meelan M.; Yamamoto, Kazuomi

    2012-01-01

    Airframe noise corresponds to the acoustic radiation due to turbulent flow in the vicinity of airframe components such as high-lift devices and landing gears. The combination of geometric complexity, high Reynolds number turbulence, multiple regions of separation, and a strong coupling with adjacent physical components makes the problem of airframe noise highly challenging. Since 2010, the American Institute of Aeronautics and Astronautics has organized an ongoing series of workshops devoted to Benchmark Problems for Airframe Noise Computations (BANC). The BANC workshops aim to enable systematic progress in the understanding and high-fidelity prediction of airframe noise via collaborative investigations that integrate state-of-the-art computational fluid dynamics, computational aeroacoustics, and in-depth, holistic, multi-facility measurements targeting a selected set of canonical yet realistic configurations. This paper provides a brief summary of the BANC effort, including its technical objectives, strategy, and selected outcomes to date.

  7. Modeling Urban Scenarios & Experiments: Fort Indiantown Gap Data Collections Summary and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Daniel E.; Bandstra, Mark S.; Davidson, Gregory G.

    This report summarizes experimental radiation detector, contextual sensor, weather, and global positioning system (GPS) data collected to inform and validate a comprehensive, operational radiation transport modeling framework for evaluating radiation detector system and algorithm performance. This framework will be used to study the influence of systematic effects (such as geometry, background activity, background variability, and environmental shielding) on detector responses and algorithm performance using synthetic time series data. This work consists of data collection campaigns in a canonical, controlled environment suitable for complete radiological characterization, which help construct and benchmark a high-fidelity model with quantified system geometries, detector response functions, and source terms for background and threat objects. These data also provide an archival benchmark dataset for use by the radiation detection community. The data reported here span four collection campaigns conducted between May 2015 and September 2016.

  8. Object recognition using deep convolutional neural networks with complete transfer and partial frozen layers

    NASA Astrophysics Data System (ADS)

    Kruithof, Maarten C.; Bouma, Henri; Fischer, Noëlle M.; Schutte, Klamer

    2016-10-01

    Object recognition is important for understanding the content of video and enabling flexible querying across large numbers of cameras, especially in security applications. Recent benchmarks show that deep convolutional neural networks are excellent approaches for object recognition. This paper describes an approach to domain transfer, where features learned from a large annotated dataset are transferred to a target domain in which fewer annotated examples are available, as is typical for the security and defense domain. Many of these networks trained on natural images appear to learn features similar to Gabor filters and color blobs in the first layer. These first-layer features appear to be generic across many datasets and tasks, while the last layer is task-specific. In this paper, we study the effect of copying all layers and fine-tuning a variable number of them. We performed an experiment with a Caffe-based network on 1000 ImageNet classes that were randomly divided into two equal subgroups for the transfer from one to the other. We copied all layers and varied both the number of layers that are fine-tuned and the size of the target dataset. We performed additional experiments with the Keras platform on the CIFAR-10 dataset to validate general applicability. We show with both platforms and both datasets that the accuracy on the target dataset improves when more target data are used. When the target dataset is large, it is beneficial to freeze only a few layers; in that regime the network without transfer learning performs better than the transfer network, especially if many layers are frozen. When the target dataset is small, it is beneficial to transfer (and freeze) many layers; there the transfer network boosts generalization and performs much better than the network without transfer learning. Learning time can be reduced by freezing many layers in the network.
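
    The copy-then-freeze scheme studied above can be illustrated with a framework-agnostic toy (the `Layer` class, layer count, and numbers are assumptions for illustration; the paper's actual experiments used Caffe and Keras):

```python
# Toy sketch of "copy all layers, freeze the first k" transfer learning.

class Layer:
    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable

def transfer(source_layers, n_frozen):
    """Copy all source layers; freeze the first n_frozen of them."""
    return [Layer(l.weight, trainable=(i >= n_frozen))
            for i, l in enumerate(source_layers)]

def sgd_step(layers, grads, lr=0.1):
    """Update only the trainable (non-frozen) layers."""
    for layer, g in zip(layers, grads):
        if layer.trainable:
            layer.weight -= lr * g

pretrained = [Layer(w) for w in [1.0, 2.0, 3.0]]
model = transfer(pretrained, n_frozen=2)      # freeze the first two layers
sgd_step(model, grads=[0.5, 0.5, 0.5])        # only the last layer moves
```

    In a real framework the same effect is achieved by marking copied layers as non-trainable before fine-tuning on the target data.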

  9. How does spatial extent of fMRI datasets affect independent component analysis decomposition?

    PubMed

    Aragri, Adriana; Scarabino, Tommaso; Seifritz, Erich; Comani, Silvia; Cirillo, Sossio; Tedeschi, Gioacchino; Esposito, Fabrizio; Di Salle, Francesco

    2006-09-01

    Spatial independent component analysis (sICA) of functional magnetic resonance imaging (fMRI) time series can generate meaningful activation maps and associated descriptive signals, which are useful to evaluate datasets of the entire brain or selected portions of it. Besides computational implications, variations in the input dataset combined with the multivariate nature of ICA may lead to different spatial or temporal readouts of brain activation phenomena. By reducing and increasing a volume of interest (VOI), we applied sICA to different datasets from real activation experiments with multislice acquisition and single or multiple sensory-motor task-induced blood oxygenation level-dependent (BOLD) signal sources with different spatial and temporal structure. Using receiver operating characteristics (ROC) methodology for accuracy evaluation and multiple regression analysis as benchmark, we compared sICA decompositions of reduced and increased VOI fMRI time-series containing auditory, motor and hemifield visual activation occurring separately or simultaneously in time. Both approaches yielded valid results; however, the results of the increased VOI approach were spatially more accurate compared to the results of the decreased VOI approach. This is consistent with the capability of sICA to take advantage of extended samples of statistical observations and suggests that sICA is more powerful with extended rather than reduced VOI datasets to delineate brain activity. (c) 2006 Wiley-Liss, Inc.

  10. Electric load shape benchmarking for small- and medium-sized commercial buildings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Xuan; Hong, Tianzhen; Chen, Yixing

    Small- and medium-sized commercial building owners and utility managers often look for opportunities for energy cost savings through energy efficiency and energy waste minimization. However, they currently lack easy access to low-cost tools that help interpret the massive amount of data needed to understand their energy use behaviors. Benchmarking is one of the techniques used in energy audits to identify which buildings are priorities for an energy analysis. Traditional energy performance indicators, such as energy use intensity (annual energy per unit of floor area), consider only the total annual energy consumption, ignoring the fluctuation of energy use over time, which reveals time-of-use information and distinguishes energy use behaviors during different time spans. To fill this gap, this study developed a general statistical method that uses 24-hour electric load shape benchmarking to compare a building or business/tenant space against its peers. Specifically, the study developed new benchmarking metrics and data analysis methods to infer the energy performance of a building from its load shape. We first performed a data experiment with smart meter data collected from over 2,000 small- and medium-sized businesses in California. We then conducted a cluster analysis of the source data, and determined and interpreted the load shape features and parameters with peer group analysis. Finally, we implemented the load shape benchmarking feature in an open-access web-based toolkit (the Commercial Building Energy Saver) to provide straightforward and practical recommendations to users. The analysis techniques are generic and flexible enough for future datasets covering other building types and utility territories.
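
    The core idea, comparing a building's normalized 24-hour load shape against a peer group, can be sketched as follows. The normalization and distance metric here are illustrative assumptions, not the study's exact statistics:

```python
def load_shape(hourly_kwh):
    """Normalize a 24-hour profile so it sums to 1 (shape, not magnitude)."""
    total = sum(hourly_kwh)
    return [x / total for x in hourly_kwh]

def shape_distance(a, b):
    """Euclidean distance between two normalized load shapes."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def peer_mean_shape(profiles):
    """Average shape of a peer group, hour by hour."""
    shapes = [load_shape(p) for p in profiles]
    return [sum(col) / len(shapes) for col in zip(*shapes)]

# Two flat-profile peers and one evening-peaking building (toy data).
flat = [1.0] * 24
peaky = [0.5] * 18 + [4.0] * 6
mean = peer_mean_shape([flat, flat])
d_flat = shape_distance(load_shape(flat), mean)     # matches its peers
d_peaky = shape_distance(load_shape(peaky), mean)   # stands out from peers
```

    Normalizing first separates *when* energy is used from *how much*, which is exactly the information an annual energy use intensity metric discards.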

  11. Electric load shape benchmarking for small- and medium-sized commercial buildings

    DOE PAGES

    Luo, Xuan; Hong, Tianzhen; Chen, Yixing; ...

    2017-07-28

    Small- and medium-sized commercial building owners and utility managers often look for opportunities for energy cost savings through energy efficiency and energy waste minimization. However, they currently lack easy access to low-cost tools that help interpret the massive amount of data needed to understand their energy use behaviors. Benchmarking is one of the techniques used in energy audits to identify which buildings are priorities for an energy analysis. Traditional energy performance indicators, such as energy use intensity (annual energy per unit of floor area), consider only the total annual energy consumption, ignoring the fluctuation of energy use over time, which reveals time-of-use information and distinguishes energy use behaviors during different time spans. To fill this gap, this study developed a general statistical method that uses 24-hour electric load shape benchmarking to compare a building or business/tenant space against its peers. Specifically, the study developed new benchmarking metrics and data analysis methods to infer the energy performance of a building from its load shape. We first performed a data experiment with smart meter data collected from over 2,000 small- and medium-sized businesses in California. We then conducted a cluster analysis of the source data, and determined and interpreted the load shape features and parameters with peer group analysis. Finally, we implemented the load shape benchmarking feature in an open-access web-based toolkit (the Commercial Building Energy Saver) to provide straightforward and practical recommendations to users. The analysis techniques are generic and flexible enough for future datasets covering other building types and utility territories.

  12. Benchmarking of trauma care worldwide: the potential value of an International Trauma Data Bank (ITDB).

    PubMed

    Haider, Adil H; Hashmi, Zain G; Gupta, Sonia; Zafar, Syed Nabeel; David, Jean-Stephane; Efron, David T; Stevens, Kent A; Zafar, Hasnain; Schneider, Eric B; Voiglio, Eric; Coimbra, Raul; Haut, Elliott R

    2014-08-01

    National trauma registries have helped improve patient outcomes across the world. Recently, the idea of an International Trauma Data Bank (ITDB) has been suggested to establish global comparative assessments of trauma outcomes. The objective of this study was to determine whether global trauma data could be combined to perform international outcomes benchmarking. We used observed/expected (O/E) mortality ratios to compare two trauma centers [European high-income country (HIC) and Asian lower-middle income country (LMIC)] with centers in the North American National Trauma Data Bank (NTDB). Patients (≥16 years) with blunt/penetrating injuries were included. Multivariable logistic regression, adjusting for known predictors of trauma mortality, was performed. Estimates were used to predict the expected deaths at each center and to calculate O/E mortality ratios for benchmarking. A total of 375,433 patients from 301 centers were included from the NTDB (2002-2010). The LMIC trauma center had 806 patients (2002-2010), whereas the HIC reported 1,003 patients (2002-2004). The most important known predictors of trauma mortality were adequately recorded in all datasets. Mortality benchmarking revealed that the HIC center performed similarly to the NTDB centers [O/E = 1.11 (95% confidence interval (CI) 0.92-1.35)], whereas the LMIC center showed significantly worse survival [O/E = 1.52 (1.23-1.88)]. Subset analyses of patients with blunt or penetrating injury showed similar results. Using only a few key covariates, aggregated global trauma data can be used to adequately perform international trauma center benchmarking. The creation of the ITDB is feasible and recommended as it may be a pivotal step towards improving global trauma outcomes.
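
    The O/E benchmarking step can be sketched as: take a risk-adjusted logistic model of mortality, sum its predicted probabilities to obtain expected deaths, and divide observed by expected. The coefficients below are hypothetical toy values, not the NTDB model:

```python
import math

def predicted_risk(x, coefs, intercept):
    """Logistic model: P(death) = 1 / (1 + exp(-(b0 + b.x)))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

def oe_ratio(observed_deaths, patient_covariates, coefs, intercept):
    """Observed/expected mortality ratio for one center."""
    expected = sum(predicted_risk(x, coefs, intercept)
                   for x in patient_covariates)
    return observed_deaths / expected

# Toy center: 3 patients, 1 death; hypothetical model with one covariate.
patients = [[0.0], [1.0], [2.0]]
ratio = oe_ratio(1, patients, coefs=[1.0], intercept=-2.0)
```

    A ratio near 1 indicates performance in line with the reference population; significantly above 1 indicates worse-than-expected survival, as for the LMIC center above.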

  13. A Visual Evaluation Study of Graph Sampling Techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Wong, Pak C.

    2017-01-29

    We evaluate a dozen prevailing graph-sampling techniques with the ultimate goal of better visualizing and understanding big, complex graphs that exhibit different properties and structures. The evaluation uses eight benchmark datasets with four different graph types, collected from the Stanford Network Analysis Platform and NetworkX, to give a comprehensive comparison across various types of graphs. The study provides a practical guideline for visualizing big graphs of different sizes and structures. The paper discusses results and important observations from the study.

  14. Anytime query-tuned kernel machine classifiers via Cholesky factorization

    NASA Technical Reports Server (NTRS)

    DeCoste, D.

    2002-01-01

    We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste, 2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.
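
    The linear-algebra building block of the reformulation, a Cholesky factorization of a symmetric positive-definite kernel matrix K into L·Lᵀ with L lower-triangular, can be sketched in plain Python (the anytime output-bound machinery itself is not shown here):

```python
def cholesky(K):
    """Return lower-triangular L with L @ L.T == K (K must be SPD)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (K[i][i] - s) ** 0.5      # diagonal entry
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]   # below-diagonal entry
    return L

K = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(K)   # L = [[2, 0], [1, sqrt(2)]]
```

    Once L is available, kernel-machine outputs reduce to cheap triangular solves, which is the source of the lower overhead claimed above.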

  15. The total carbon column observing network.

    PubMed

    Wunch, Debra; Toon, Geoffrey C; Blavier, Jean-François L; Washenfelder, Rebecca A; Notholt, Justus; Connor, Brian J; Griffith, David W T; Sherlock, Vanessa; Wennberg, Paul O

    2011-05-28

    A global network of ground-based Fourier transform spectrometers has been founded to remotely measure column abundances of CO2, CO, CH4, N2O and other molecules that absorb in the near-infrared. These measurements are directly comparable with the near-infrared total column measurements from space-based instruments. With stringent requirements on the instrumentation, acquisition procedures, data processing and calibration, the Total Carbon Column Observing Network (TCCON) achieves an accuracy and precision in total column measurements that is unprecedented for remote-sensing observations (better than 0.25% for CO2). This has enabled carbon-cycle science investigations using the TCCON dataset, and allows the TCCON to provide a link between satellite measurements and the extensive ground-based in situ network. © 2011 The Royal Society

  16. Reliable prediction intervals with regression neural networks.

    PubMed

    Papadopoulos, Harris; Haralambous, Haris

    2011-10-01

    This paper proposes an extension to conventional regression neural networks (NNs) for replacing the point predictions they produce with prediction intervals that satisfy a required level of confidence. Our approach follows a novel machine learning framework, called Conformal Prediction (CP), for assigning reliable confidence measures to predictions without assuming anything more than that the data are independent and identically distributed (i.i.d.). We evaluate the proposed method on four benchmark datasets and on the problem of predicting Total Electron Content (TEC), which is an important parameter in trans-ionospheric links; for the latter we use a dataset of more than 60000 TEC measurements collected over a period of 11 years. Our experimental results show that the prediction intervals produced by our method are both well calibrated and tight enough to be useful in practice. Copyright © 2011 Elsevier Ltd. All rights reserved.
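
    A split (inductive) variant of Conformal Prediction for regression can be sketched as follows; the paper's exact nonconformity measure and training procedure may differ, so treat this as a generic illustration of how calibration residuals widen a point prediction into an interval:

```python
import math

def conformal_interval(predict, calibration_set, x_new, confidence=0.9):
    """Prediction interval from calibration residuals (i.i.d. assumption)."""
    residuals = sorted(abs(y - predict(x)) for x, y in calibration_set)
    n = len(residuals)
    # Conformal quantile index: ceil((n + 1) * confidence) - 1, clipped.
    k = min(n - 1, math.ceil((n + 1) * confidence) - 1)
    width = residuals[k]
    y_hat = predict(x_new)
    return y_hat - width, y_hat + width

# Toy model y ~ 2x with noisy calibration points.
def model(x):
    return 2.0 * x

errors = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.2, 0.25]
calib = [(i, 2.0 * i + e) for i, e in zip(range(10), errors)]
lo, hi = conformal_interval(model, calib, x_new=5.0, confidence=0.9)
```

    Under the i.i.d. assumption, the interval covers the true value with probability at least the requested confidence, which is what makes the intervals "well calibrated" in the sense evaluated above.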

  17. Hyperspectral Image Classification With Markov Random Fields and a Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Cao, Xiangyong; Zhou, Feng; Xu, Lin; Meng, Deyu; Xu, Zongben; Paisley, John

    2018-05-01

    This paper presents a new supervised classification algorithm for remotely sensed hyperspectral imagery (HSI), which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strategy to better use the spatial information. Next, spatial information is further considered by placing a spatial smoothness prior on the labels. Finally, we iteratively update the CNN parameters using stochastic gradient descent (SGD) and update the class labels of all pixel vectors using an alpha-expansion min-cut-based algorithm. Compared with other state-of-the-art methods, the proposed classification method achieves better performance on one synthetic dataset and two benchmark HSI datasets in a number of experimental settings.

  18. Active learning for noisy oracle via density power divergence.

    PubMed

    Sogawa, Yasuhiro; Ueno, Tsuyoshi; Kawahara, Yoshinobu; Washio, Takashi

    2013-10-01

    The accuracy of active learning is critically influenced by the presence of noisy labels supplied by a noisy oracle. In this paper, we propose a novel pool-based active learning framework built on robust measures based on density power divergence. By minimizing a density power divergence, such as the β-divergence or γ-divergence, one can estimate the model accurately even in the presence of noisy labels in the data. Accordingly, we develop query selection measures for pool-based active learning using these divergences. In addition, we propose an evaluation scheme for these measures based on asymptotic statistical analyses, which enables us to perform active learning by directly evaluating the estimation error. Experiments with benchmark datasets and real-world image datasets show that our active learning scheme performs better than several baseline methods. Copyright © 2013 Elsevier Ltd. All rights reserved.
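
    To see why density power divergence confers robustness, consider a toy location estimate under a Gaussian model: minimizing the β-divergence leads to a weighted mean in which outlying points are exponentially down-weighted. The β and σ values below are illustrative assumptions, not the paper's settings:

```python
import math

def dpd_mean(xs, beta=0.5, sigma=1.0, iters=50):
    """Robust location estimate via iteratively reweighted mean."""
    mu = sum(xs) / len(xs)                      # start from the plain mean
    for _ in range(iters):
        # Down-weight points far from the current estimate.
        w = [math.exp(-beta * (x - mu) ** 2 / (2 * sigma ** 2)) for x in xs]
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu

data = [0.1, -0.2, 0.0, 0.2, -0.1, 10.0]        # one gross outlier
robust = dpd_mean(data)                          # stays near 0
naive = sum(data) / len(data)                    # pulled toward the outlier
```

    The same down-weighting effect is what protects the query-selection measures above from mislabeled points returned by the oracle.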

  19. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

    PubMed

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

    2017-01-05

    Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples, 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leaves, even though some of them are in fact ancestral. We therefore devised a method for post-processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes. 
Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the tree and 71% of all clades. We have made all data from this experiment (raw sequencing reads, consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php .
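
    The described post-processing, relocating a leaf to an internal node when its branch is negligibly short, might look like this on a toy tuple-based tree. The data structure and cutoff are illustrative assumptions, not the authors' implementation:

```python
def collapse_short_branches(node, cutoff):
    """node = (label, [(branch_length, child_node), ...]).

    A leaf attached by a branch shorter than `cutoff` is merged into its
    parent: the parent takes the leaf's label (an observed ancestral sample).
    """
    label, children = node
    kept = []
    for length, child in children:
        child = collapse_short_branches(child, cutoff)
        c_label, c_children = child
        if length < cutoff and not c_children:
            label = c_label          # relocate the leaf onto this node
        else:
            kept.append((length, child))
    return (label, kept)

# sample_7 sits on a near-zero branch, so it is really the ancestor.
tree = ("internal", [(0.0001, ("sample_7", [])),
                     (0.4, ("sample_8", [])),
                     (0.5, ("sample_9", []))])
collapsed = collapse_short_branches(tree, cutoff=0.01)
# collapsed == ("sample_7", [(0.4, ("sample_8", [])), (0.5, ("sample_9", []))])
```

    After this step, sequenced ancestral samples label internal nodes, so inferred trees can be compared against the known phylogeny node-for-node.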

  20. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodating statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of degrees of freedom is reduced to one through a first-order decomposition of the interaction, leading to a dramatic improvement in power for testing DE genes when the number of conditions is greater than two. In our simulation settings, multiDE outperformed the benchmark methods (i.e., edgeR and DESeq2) even when the underlying model was severely misspecified, and the power gain increased with the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

  1. A high level interface to SCOP and ASTRAL implemented in python.

    PubMed

    Casbon, James A; Crooks, Gavin E; Saqi, Mansoor A S

    2006-01-10

    Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups proteins together on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains on the basis of sequence similarity, such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small, easy-to-use API written in Python to enable construction of datasets from these resources. We have designed a set of Python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.

  2. Using CHIRPS Rainfall Dataset to detect rainfall trends in West Africa

    NASA Astrophysics Data System (ADS)

    Blakeley, S. L.; Husak, G. J.

    2016-12-01

    In West Africa, agriculture is often rain-fed, subjecting agricultural productivity and food availability to climate variability. Agricultural conditions will change as warming temperatures increase evaporative demand, and with a growing population dependent on the food supply, farmers will become more reliant on improved adaptation strategies. Development of such adaptation strategies will need to consider West African rainfall trends to remain relevant in a changing climate. Here, using the CHIRPS rainfall product (provided by the Climate Hazards Group at UC Santa Barbara), I examine trends in West African rainfall variability. My analysis will focus on seasonal rainfall totals, the structure of the rainy season, and the distribution of rainfall. I then use farmer-identified drought years to conduct an in-depth analysis of intra-seasonal rainfall irregularities. I will also examine other datasets, such as potential evapotranspiration (PET) data, other remotely sensed rainfall data, rain gauge data at specific locations, and remotely sensed vegetation data. Farmer-reported bad-year data will also be used to isolate "bad" year markers in these additional datasets, providing benchmarks for identifying problematic rainy seasons in the future.
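
    A minimal version of the seasonal-trend component of such an analysis is an ordinary least-squares slope of seasonal rainfall totals against year. The numbers below are made-up toy values, not CHIRPS data:

```python
def ols_slope(years, totals):
    """Least-squares slope of seasonal totals vs. year (mm per year)."""
    n = len(years)
    mx = sum(years) / n
    my = sum(totals) / n
    num = sum((x - mx) * (y - my) for x, y in zip(years, totals))
    den = sum((x - mx) ** 2 for x in years)
    return num / den

years = list(range(1981, 1991))
totals = [600 - 5 * (y - 1981) for y in years]   # synthetic drying trend
slope = ols_slope(years, totals)                  # -5.0 mm/year
```

    A significantly negative slope over the CHIRPS record would flag a drying trend that adaptation strategies should account for; in practice the slope estimate would be paired with a significance test.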

  3. A dataset of stereoscopic images and ground-truth disparity mimicking human fixations in peripersonal space

    PubMed Central

    Canessa, Andrea; Gibaldi, Agostino; Chessa, Manuela; Fato, Marco; Solari, Fabio; Sabatini, Silvio P.

    2017-01-01

    Binocular stereopsis is the ability of a visual system, belonging to a live being or a machine, to interpret the different visual information deriving from two eyes/cameras for depth perception. From this perspective, the ground-truth information about three-dimensional visual space, which is hardly available, is an ideal tool both for evaluating human performance and for benchmarking machine vision algorithms. In the present work, we implemented a rendering methodology in which the camera pose mimics realistic eye pose for a fixating observer, thus including convergent eye geometry and cyclotorsion. The virtual environment we developed relies on highly accurate 3D virtual models, and its full controllability allows us to obtain the stereoscopic pairs together with the ground-truth depth and camera pose information. We thus created a stereoscopic dataset: GENUA PESTO—GENoa hUman Active fixation database: PEripersonal space STereoscopic images and grOund truth disparity. The dataset aims to provide a unified framework useful for a number of problems relevant to human and computer vision, from scene exploration and eye movement studies to 3D scene reconstruction. PMID:28350382

  4. First stereo video dataset with ground truth for remote car pose estimation using satellite markers

    NASA Astrophysics Data System (ADS)

    Gil, Gustavo; Savino, Giovanni; Pierini, Marco

    2018-04-01

    Leading causes of PTW (Powered Two-Wheeler) crashes and near misses in urban areas are failures or delays in predicting the changing trajectories of other vehicles. Regrettably, misperception by both car drivers and motorcycle riders results in fatal or serious consequences for riders. Intelligent vehicles could provide early warning about possible collisions, helping to avoid the crash. There is evidence that stereo cameras can be used for estimating the heading angle of other vehicles, which is key to anticipating their imminent location, but there is limited heading ground truth data available in the public domain. Consequently, we employed a marker-based technique to create ground truth of car pose and built a dataset for computer vision benchmarking purposes. This dataset of a moving vehicle collected from a statically mounted stereo camera is a simplification of a complex and dynamic reality, which serves as a test bed for car pose estimation algorithms. The dataset contains the accurate pose of the moving obstacle, and realistic imagery including texture-less and non-Lambertian surfaces (e.g. reflectance and transparency).

  5. 101 Labeled Brain Images and a Consistent Human Cortical Labeling Protocol

    PubMed Central

    Klein, Arno; Tourville, Jason

    2012-01-01

    We introduce the Mindboggle-101 dataset, the largest and most complete set of free, publicly accessible, manually labeled human brain images. To manually label the macroscopic anatomy in magnetic resonance images of 101 healthy participants, we created a new cortical labeling protocol that relies on robust anatomical landmarks and minimal manual edits after initialization with automated labels. The “Desikan–Killiany–Tourville” (DKT) protocol is intended to improve the ease, consistency, and accuracy of labeling human cortical areas. Given how difficult it is to label brains, the Mindboggle-101 dataset is intended to serve as a set of brain atlases for use in labeling other brains, as a normative dataset establishing morphometric variation in a healthy population for comparison against clinical populations, and as a resource for the development, training, testing, and evaluation of automated registration and labeling algorithms. To this end, we also introduce benchmarks for the evaluation of such algorithms by comparing our manual labels with labels automatically generated by probabilistic and multi-atlas registration-based approaches. All data, related software, and updated information are available on the http://mindboggle.info/data website. PMID:23227001

  6. A large dataset of synthetic SEM images of powder materials and their ground truth 3D structures.

    PubMed

    DeCost, Brian L; Holm, Elizabeth A

    2016-12-01

    This data article presents a data set comprising 2048 synthetic scanning electron microscope (SEM) images of powder materials and descriptions of the corresponding 3D structures that they represent. These images were created using open source rendering software, and the generating scripts are included with the data set. Eight particle size distributions (PSDs) are represented, with 256 independent images from each. The particle size distributions are relatively similar to each other, so that the dataset offers a useful benchmark to assess the fidelity of image analysis techniques. The characteristics of the PSDs and the resulting images are described and analyzed in more detail in the research article "Characterizing powder materials using keypoint-based computer vision methods" (B.L. DeCost, E.A. Holm, 2016) [1]. These data are freely available in a Mendeley Data archive "A large dataset of synthetic SEM images of powder materials and their ground truth 3D structures" (B.L. DeCost, E.A. Holm, 2016) located at http://dx.doi.org/10.17632/tj4syyj9mr.1 [2] for any academic, educational, or research purposes.

  7. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare.

    PubMed

    Mozaffari-Kermani, Mehran; Sur-Kolay, Susmita; Raghunathan, Anand; Jha, Niraj K

    2015-11-01

    Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. A hindered diagnosis may have life-threatening consequences, while a false diagnosis may cause patient distress and prompt users to distrust the machine-learning algorithm, or even abandon the entire system. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data, which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks that are based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.

  8. A critical evaluation of ecological indices for the comparative analysis of microbial communities based on molecular datasets.

    PubMed

    Lucas, Rico; Groeneveld, Jürgen; Harms, Hauke; Johst, Karin; Frank, Karin; Kleinsteuber, Sabine

    2017-01-01

    In times of global change and intensified resource exploitation, advanced knowledge of ecophysiological processes in natural and engineered systems driven by complex microbial communities is crucial for both safeguarding environmental processes and optimising rational control of biotechnological processes. To gain such knowledge, high-throughput molecular techniques are routinely employed to investigate microbial community composition and dynamics within a wide range of natural or engineered environments. However, no consensus yet exists on a generally applicable alpha diversity concept for molecular dataset analyses, nor on appropriate benchmarking of the corresponding statistical indices. To overcome this, we listed criteria for the appropriateness of an index for such analyses and systematically scrutinised commonly employed ecological indices describing diversity, evenness and richness based on artificial and real molecular datasets. We identified appropriate indices warranting interstudy comparability and intuitive interpretability. The unified diversity concept based on 'effective numbers of types' provides the mathematical framework for describing community composition. Additionally, the Bray-Curtis dissimilarity as a beta-diversity index was found to reflect compositional changes. The statistical procedure employed is presented, comprising commented R scripts and example datasets for user-friendly trial application. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
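    The 'effective numbers of types' (Hill numbers) and the Bray-Curtis dissimilarity mentioned above have compact closed forms. A minimal sketch follows, in Python for brevity (the paper itself ships R scripts); function names are ours, not the paper's.

```python
# Hill numbers (effective numbers of types) and Bray-Curtis
# dissimilarity for comparing community abundance profiles.

import math

def hill_number(abundances, q):
    """Effective number of types of order q from a vector of counts."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Limit case q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

# A perfectly even community of 4 types has 4 effective types at any q.
even = [25, 25, 25, 25]
print(hill_number(even, 0), hill_number(even, 2))  # 4.0 at both orders
print(bray_curtis(even, even))  # identical communities -> 0.0
```

    The appeal of the unified framework is visible here: q = 0 recovers richness, q = 1 the exponential of Shannon entropy, and q = 2 the inverse Simpson index, all in the same units of "effective types".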

  9. National Geospatial Data Asset Lifecycle Baseline Maturity Assessment for the Federal Geographic Data Committee

    NASA Astrophysics Data System (ADS)

    Peltz-Lewis, L. A.; Blake-Coleman, W.; Johnston, J.; DeLoatch, I. B.

    2014-12-01

    The Federal Geographic Data Committee (FGDC) is designing a portfolio management process for 193 geospatial datasets contained within the 16 topical National Spatial Data Infrastructure themes managed under OMB Circular A-16 "Coordination of Geographic Information and Related Spatial Data Activities." The 193 datasets are designated as National Geospatial Data Assets (NGDA) because of their significance to the missions of multiple levels of government, partners and stakeholders. As a starting point, the data managers of these NGDAs will conduct a baseline maturity assessment of the dataset(s) for which they are responsible. The maturity is measured against benchmarks related to each of the seven stages of the data lifecycle management framework promulgated within the OMB Circular A-16 Supplemental Guidance issued by OMB in November 2010. This framework was developed by the interagency Lifecycle Management Work Group (LMWG), consisting of 16 Federal agencies, under the 2004 Presidential Initiative, the Geospatial Line of Business, using OMB Circular A-130 "Management of Federal Information Resources" as guidance. The seven lifecycle stages are: Define, Inventory/Evaluate, Obtain, Access, Maintain, Use/Evaluate, and Archive. This paper will focus on the Lifecycle Baseline Maturity Assessment, and efforts to integrate the FGDC approach with other data maturity assessments.

  10. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts.

    PubMed

    Dashtban, M; Balafar, Mohammadali

    2017-03-01

    Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods on the DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Speeding up the Consensus Clustering methodology for microarray data analysis

    PubMed Central

    2011-01-01

    Background The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The stated problem is quite difficult, in particular for microarrays, since the inferred prediction must be sensible enough to capture the inherent biological structure in a dataset, e.g., functionally related genes. Despite the rich literature present in that area, the identification of an internal validation measure that is both fast and precise has proved to be elusive. In order to partially fill this gap, we propose a speed-up of Consensus (Consensus Clustering), a methodology whose purpose is the provision of a prediction of the number of clusters in a dataset, together with a dissimilarity matrix (the consensus matrix) that can be used by clustering algorithms. As detailed in the remainder of the paper, Consensus is a natural candidate for a speed-up. Results Since the time-precision performance of Consensus depends on two parameters, our first task is to show that a simple adjustment of the parameters is not enough to obtain a good precision-time trade-off. Our second task is to provide a fast approximation algorithm for Consensus: a closely related algorithm, FC (Fast Consensus), with the same precision as Consensus and a substantially better time performance. The performance of FC has been assessed via extensive experiments on twelve benchmark datasets that summarize key features of microarray applications, such as cancer studies, gene expression with up and down patterns, and a full spectrum of dimensionality up to over a thousand. Based on their outcome, compared with previous benchmarking results available in the literature, FC turns out to be among the fastest internal validation methods, while retaining the same outstanding precision of Consensus.
Moreover, it also provides a consensus matrix that can be used as a dissimilarity matrix, guaranteeing the same performance as the corresponding matrix produced by Consensus. We have also experimented with the use of Consensus and FC in conjunction with NMF (Nonnegative Matrix Factorization), in order to identify the correct number of clusters in a dataset. Although NMF is an increasingly popular technique for biological data mining, our results are somewhat disappointing and complement quite well the state of the art about NMF, shedding further light on its merits and limitations. Conclusions In summary, FC with a parameter setting that makes it robust with respect to small and medium-sized datasets, i.e., number of items to cluster in the hundreds and number of conditions up to a thousand, seems to be the internal validation measure of choice. Moreover, the technique we have developed here can be used in other contexts, in particular for the speed-up of stability-based validation measures. PMID:21235792
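    The consensus-matrix construction at the heart of Consensus can be sketched as follows: cluster many random subsamples, then record how often each pair of items lands in the same cluster. The toy one-dimensional clusterer and the parameter values below are assumptions for illustration, not the paper's FC algorithm.

```python
# Sketch of a consensus matrix: entry (i, j) is the fraction of
# subsampled clusterings in which items i and j co-cluster.

import random

def one_d_two_clusters(values):
    """Toy 2-cluster assignment for 1-D data: split at the midpoint."""
    cut = (min(values) + max(values)) / 2.0
    return [0 if v <= cut else 1 for v in values]

def consensus_matrix(data, n_resamples=50, frac=0.8, seed=0):
    n = len(data)
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), max(2, int(frac * n)))
        labels = one_d_two_clusters([data[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Entries lie in [0, 1]; 1 - entry serves as a dissimilarity.
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Two well-separated groups should co-cluster within, not across.
data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
cm = consensus_matrix(data)
print(cm[0][1], cm[0][3])  # near 1.0 within a group, near 0.0 across
```

    The expensive part is repeating the clustering over many resamples, which is why the methodology is, as the authors note, a natural candidate for a speed-up.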

  12. A shortest-path graph kernel for estimating gene product semantic similarity.

    PubMed

    Alvarez, Marco A; Qi, Xiaojun; Yan, Changhui

    2011-07-29

    Existing methods for calculating semantic similarity between gene products using the Gene Ontology (GO) often rely on external resources, which are not part of the ontology. Consequently, changes in these external resources, such as biased term distributions caused by the shifting of hot research topics, will affect the calculation of semantic similarity. One way to avoid this problem is to use semantic methods that are "intrinsic" to the ontology, i.e., independent of external knowledge. We present a shortest-path graph kernel (spgk) method that relies exclusively on the GO and its structure. In spgk, a gene product is represented by an induced subgraph of the GO, which consists of all the GO terms annotating it. Then a shortest-path graph kernel is used to compute the similarity between two graphs. In a comprehensive evaluation using a benchmark dataset, spgk compares favorably with other methods that depend on external resources. Compared with simUI, a method that is also intrinsic to GO, spgk achieves slightly better results on the benchmark dataset. Statistical tests show that the improvement is significant when the resolution and EC similarity correlation coefficient are used to measure the performance, but is insignificant when the Pfam similarity correlation coefficient is used. Spgk uses a graph kernel method in polynomial time to exploit the structure of the GO to calculate semantic similarity between gene products. It provides an alternative to both methods that use external resources and "intrinsic" methods with comparable performance.
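    The reduction of graph comparison to shortest-path statistics can be illustrated with a stripped-down kernel. Unlike spgk, this sketch ignores GO term labels and uses arbitrary small graphs; it only shows the general shortest-path-kernel idea of comparing histograms of pairwise path lengths.

```python
# Toy shortest-path graph kernel: each graph becomes a histogram of
# its pairwise shortest-path lengths; the kernel is the dot product
# of two such histograms.

from collections import Counter, deque

def shortest_path_lengths(adj):
    """All-pairs shortest-path lengths via BFS on an unweighted graph."""
    lengths = Counter()
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for d in dist.values():
            if d > 0:
                lengths[d] += 1
    return lengths

def sp_kernel(adj1, adj2):
    """Dot product of the two shortest-path-length histograms."""
    h1, h2 = shortest_path_lengths(adj1), shortest_path_lengths(adj2)
    return sum(h1[d] * h2[d] for d in h1)

# Two small undirected graphs: a path a-b-c and a triangle x-y-z.
path = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
tri = {'x': ['y', 'z'], 'y': ['x', 'z'], 'z': ['x', 'y']}
print(sp_kernel(path, path), sp_kernel(path, tri))
```

    The real spgk would additionally match the GO term labels at the endpoints of each shortest path, not just the path lengths.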

  13. Evaluating virtual hosted desktops for graphics-intensive astronomy

    NASA Astrophysics Data System (ADS)

    Meade, B. F.; Fluke, C. J.

    2018-04-01

    Visualisation of data is critical to understanding astronomical phenomena. Today, many instruments produce datasets that are too big to be downloaded to a local computer, yet many of the visualisation tools used by astronomers are deployed only on desktop computers. Cloud computing is increasingly used to provide a computation and simulation platform in astronomy, but it also offers great potential as a visualisation platform. Virtual hosted desktops, with graphics processing unit (GPU) acceleration, allow interactive, graphics-intensive desktop applications to operate co-located with astronomy datasets stored in remote data centres. By combining benchmarking and user experience testing, with a cohort of 20 astronomers, we investigate the viability of replacing physical desktop computers with virtual hosted desktops. In our work, we compare two Apple MacBook computers (one old and one new, representing hardware at opposite ends of its useful lifetime) with two virtual hosted desktops: one commercial (Amazon Web Services) and one in a private research cloud (the Australian NeCTAR Research Cloud). For two-dimensional image-based tasks and graphics-intensive three-dimensional operations - typical of astronomy visualisation workflows - we found that benchmarks do not necessarily provide the best indication of performance. When compared to typical laptop computers, virtual hosted desktops can provide a better user experience, even with lower performing graphics cards. We also found that virtual hosted desktops are equally simple to use, provide greater flexibility in choice of configuration, and may actually be a more cost-effective option for typical usage profiles.

  14. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    PubMed

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.

  15. Efficient algorithms for fast integration on large data sets from multiple sources.

    PubMed

    Mi, Tian; Rajasekaran, Sanguthevar; Aseltine, Robert

    2012-06-28

    Recent large scale deployments of health information technology have created opportunities for the integration of patient medical records with disparate public health, human service, and educational databases to provide comprehensive information related to health and development. Data integration techniques, which identify records belonging to the same individual that reside in multiple data sets, are essential to these efforts. Several algorithms have been proposed in the literature that are adept at integrating records from two different datasets. Our algorithms are aimed at integrating multiple (in particular more than two) datasets efficiently. Hierarchical clustering based solutions are used to integrate multiple (in particular more than two) datasets. Edit distance is used as the basic distance calculation, while distance calculation of common input errors is also studied. Several techniques have been applied to improve the algorithms in terms of both time and space: 1) Partial Construction of the Dendrogram (PCD) that ignores the levels above the threshold; 2) Ignoring the Dendrogram Structure (IDS); 3) Faster Computation of the Edit Distance (FCED) that predicts whether the distance is within the threshold using upper bounds on edit distance; and 4) a pre-processing blocking phase that limits dynamic computation within each block. We have experimentally validated our algorithms on large simulated as well as real data. Accuracy and completeness are defined stringently to show the performance of our algorithms. In addition, we employ a four-category analysis. Comparison with FEBRL shows the robustness of our approach. In the experiments we conducted, the accuracy we observed exceeded 90% for the simulated data in most cases. 97.7% and 98.1% accuracy were achieved for the constant and proportional threshold, respectively, in a real dataset of 1,083,878 records.
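    The threshold-aware edit-distance test that FCED accelerates can be sketched as follows. The length-difference filter and the row-wise early exit are generic speed-ups for bounded Levenshtein distance, not necessarily the paper's exact bounds.

```python
# Sketch: decide "edit distance <= k" without always running the
# full dynamic program.

def within_edit_distance(s, t, k):
    """Return True iff the Levenshtein distance between s and t is <= k."""
    if abs(len(s) - len(t)) > k:
        return False  # length difference is a lower bound on the distance
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        if min(cur) > k:
            return False  # row minimum never decreases: safe to stop
        prev = cur
    return prev[-1] <= k

# Record-linkage style comparisons on names with common input errors.
print(within_edit_distance("Jonathan", "Jonathon", 1))  # True
print(within_edit_distance("Smith", "Smythe", 1))       # False
```

    In a multi-dataset integration pipeline, such a predicate is called millions of times inside the clustering step, so rejecting pairs before completing the dynamic program matters far more than the per-call constant factor.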

  16. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks

    PubMed Central

    Jung, Kenneth; LePendu, Paea; Iyer, Srinivasan; Bauer-Mehren, Anna; Percha, Bethany; Shah, Nigam H

    2015-01-01

    Objective The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications. Materials We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. Results There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. Conclusions For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice. PMID:25336595
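    The "simple" side of this trade-off, dictionary-based term recognition, can be sketched as a greedy longest-match annotator. The function and term list below are hypothetical illustrations, not the NCBO Annotator's actual API or vocabulary.

```python
# Sketch: greedy longest-match dictionary annotation over tokenized
# text. Punctuation handling, stemming, and negation detection, which
# production annotators provide, are deliberately omitted.

def annotate(text, terms, max_len=5):
    """Return dictionary terms found in text, longest match first."""
    words = text.lower().split()
    hits, i = [], 0
    while i < len(words):
        for length in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + length])
            if candidate in terms:
                hits.append(candidate)
                i += length
                break
        else:
            i += 1  # no term starts here; advance one token
    return hits

terms = {"type 2 diabetes", "obesity", "hypertension"}
print(annotate("Patient has Type 2 Diabetes and obesity", terms))
```

    Because each note is a single linear scan against a hash set, this style of recognizer scales to millions of notes, which is exactly the regime where the study found it matches more sophisticated NLP.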

  17. DeepPap: Deep Convolutional Networks for Cervical Cell Classification.

    PubMed

    Zhang, Ling; Le Lu; Nogues, Isabella; Summers, Ronald M; Liu, Shaoxiong; Yao, Jianhua

    2017-11-01

    Automation-assisted cervical screening via Pap smear or liquid-based cytology (LBC) is a highly effective cell imaging based cancer detection tool, where cells are partitioned into "abnormal" and "normal" categories. However, the success of most traditional classification methods relies on the presence of accurate cell segmentations. Despite sixty years of research in this field, accurate segmentation remains a challenge in the presence of cell clusters and pathologies. Moreover, previous classification methods are only built upon the extraction of hand-crafted features, such as morphology and texture. This paper addresses these limitations by proposing a method to directly classify cervical cells-without prior segmentation-based on deep features, using convolutional neural networks (ConvNets). First, the ConvNet is pretrained on a natural image dataset. It is subsequently fine-tuned on a cervical cell dataset consisting of adaptively resampled image patches coarsely centered on the nuclei. In the testing phase, aggregation is used to average the prediction scores of a similar set of image patches. The proposed method is evaluated on both Pap smear and LBC datasets. Results show that our method outperforms previous algorithms in classification accuracy (98.3%), area under the curve (0.99) values, and especially specificity (98.3%), when applied to the Herlev benchmark Pap smear dataset and evaluated using five-fold cross validation. Similar superior performances are also achieved on the HEMLBC (H&E stained manual LBC) dataset. Our method is promising for the development of automation-assisted reading systems in primary cervical screening.

  18. An Active Patch Model for Real World Texture and Appearance Classification

    PubMed Central

    Mao, Junhua; Zhu, Jun; Yuille, Alan L.

    2014-01-01

    This paper addresses the task of natural texture and appearance classification. Our goal is to develop a simple and intuitive method that performs at state of the art on datasets ranging from homogeneous texture (e.g., material texture), to less homogeneous texture (e.g., the fur of animals), and to inhomogeneous texture (the appearance patterns of vehicles). Our method uses a bag-of-words model where the features are based on a dictionary of active patches. Active patches are raw intensity patches which can undergo spatial transformations (e.g., rotation and scaling) and adjust themselves to best match the image regions. The dictionary of active patches is required to be compact and representative, in the sense that we can use it to approximately reconstruct the images that we want to classify. We propose a probabilistic model to quantify the quality of image reconstruction and design a greedy learning algorithm to obtain the dictionary. We classify images using the occurrence frequency of the active patches. Feature extraction is fast (about 100 ms per image) using the GPU. The experimental results show that our method improves the state of the art on a challenging material texture benchmark dataset (KTH-TIPS2). To test our method on less homogeneous or inhomogeneous images, we construct two new datasets consisting of appearance image patches of animals and vehicles cropped from the PASCAL VOC dataset. Our method outperforms competing methods on these datasets. PMID:25531013

  19. CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data.

    PubMed

    Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang

    2016-12-23

    A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertexes and edges stand for genes and their regulatory interactions respectively. Reconstruction of gene regulatory networks, in particular, genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive; they are usually effective for small-scale tasks (e.g., networks with a few hundred genes), but struggle to construct GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by the conditional mutual information measurement using a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application on a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .
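    The conditional mutual information I(X;Y|Z) that this family of methods thresholds has a simple plug-in estimator for discretized data, sketched below. This is only the core quantity: CMIP itself works on continuous expression values with a path-consistency algorithm and parallel scheduling on top.

```python
# Sketch: plug-in estimator of discrete conditional mutual information
# I(X;Y|Z) from three aligned sequences of symbols.

import math
from collections import Counter

def cmi(xs, ys, zs):
    """I(X;Y|Z) in nats, estimated from empirical joint counts."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    total = 0.0
    for (x, y, z), c in pxyz.items():
        # p(x,y,z) * log( p(z) p(x,y,z) / (p(x,z) p(y,z)) )
        total += (c / n) * math.log((pz[z] * c) / (pxz[(x, z)] * pyz[(y, z)]))
    return total

# X and Y are both copies of Z: strongly dependent marginally, but
# independent once Z is given, so the conditional MI vanishes.
z = [0, 0, 1, 1, 0, 1, 0, 1]
x, y = z[:], z[:]
print(round(cmi(x, y, z), 6))  # 0.0
```

    This vanishing is exactly the edge-pruning criterion: an apparent X-Y interaction that disappears after conditioning on a third gene Z is treated as indirect and removed from the network.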

  20. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
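    The recursive feature elimination loop itself is easy to sketch. To keep the example self-contained, the SVM weight magnitudes that SVM-RFE would use as ranking scores are replaced here by a simple class-separation score; that substitution is an assumption for illustration, not the paper's method.

```python
# Sketch of the RFE loop: score features, drop the weakest, repeat.
# Real SVM-RFE would retrain an SVM each round and rank features by
# the magnitude of the learned weights.

def separation_score(column, labels):
    """|difference of class means| for one feature, two classes."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def rfe(X, y, n_keep):
    """Recursively eliminate the lowest-scoring feature until n_keep remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        scores = {f: separation_score([row[f] for row in X], y)
                  for f in remaining}
        remaining.remove(min(remaining, key=scores.get))
    return sorted(remaining)

# Feature 1 separates the classes; features 0 and 2 are mostly noise.
X = [[0.1, 0.0, 0.5], [0.2, 0.1, 0.4], [0.1, 1.0, 0.6], [0.2, 1.1, 0.5]]
y = [0, 0, 1, 1]
print(rfe(X, y, 1))  # [1]
```

    The key property of RFE, preserved in this sketch, is that scores are recomputed after every elimination, so a feature's rank can change as its redundant partners are removed.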

  1. The Next Generation Heated Halo for Blackbody Emissivity Measurement

    NASA Astrophysics Data System (ADS)

    Gero, P.; Taylor, J. K.; Best, F. A.; Revercomb, H. E.; Knuteson, R. O.; Tobin, D. C.; Adler, D. P.; Ciganovich, N. N.; Dutcher, S. T.; Garcia, R. K.

    2011-12-01

    The accuracy of radiance measurements from space-based infrared spectrometers is contingent on the quality of the calibration subsystem, as well as knowledge of its uncertainty. Future climate benchmarking missions call for measurement uncertainties better than 0.1 K (k=3) in radiance temperature for the detection of spectral climate signatures. Blackbody cavities provide the most accurate calibration for spaceborne infrared sensors, provided that their temperature and emissivity are traceably determined on-orbit. The On-Orbit Absolute Radiance Standard (OARS) has been developed at the University of Wisconsin to meet the stringent requirements of the next generation of infrared remote sensing instruments. It provides on-orbit determination of both traceable temperature and emissivity for calibration blackbodies. The Heated Halo is the component of the OARS that provides a robust and compact method to measure the spectral emissivity of a blackbody in situ. A carefully baffled thermal source is placed in front of a blackbody in an infrared spectrometer system, and the combined radiance of the blackbody and the Heated Halo reflection is observed. Knowledge of key temperatures and the viewing geometry allows the blackbody cavity spectral emissivity to be calculated. We present results from the Heated Halo methodology implemented with a new Absolute Radiance Interferometer (ARI), a prototype space-based infrared spectrometer designed for climate benchmarking that was developed under the NASA Instrument Incubator Program (IIP). We compare our findings to models and to other experimental methods of emissivity determination.

  2. A community detection algorithm using network topologies and rule-based hierarchical arc-merging strategies

    PubMed Central

    2017-01-01

    The authors use four criteria to examine a novel community detection algorithm: (a) effectiveness, in terms of producing high values of normalized mutual information (NMI) and modularity, using well-known social networks for testing; (b) the ability to mitigate resolution limit problems, examined using NMI values and synthetic networks; (c) correctness, meaning the ability to identify useful community structures, in terms of NMI values on Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks; and (d) scalability, i.e., the ability to produce comparable modularity values with fast execution times on large-scale real-world networks. In addition to describing a simple hierarchical arc-merging (HAM) algorithm that uses network topology information, we introduce rule-based arc-merging strategies for identifying community structures. Five well-studied social network datasets and eight sets of LFR benchmark networks were employed to validate the identified communities against ground truth, eight large-scale real-world complex networks were used to measure efficiency, and two synthetic networks were used to determine susceptibility to two resolution limit problems. Our experimental results indicate that the proposed HAM algorithm exhibits satisfactory performance efficiency, and that HAM-identified and ground-truth communities are comparable on the social and LFR benchmark networks, while resolution limit problems are mitigated. PMID:29121100

  3. Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods

    PubMed Central

    Mu, John C.; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B.; Wong, Wing H.; Lam, Hugo Y. K.

    2015-01-01

    A high-confidence, comprehensive human variant set is critical for assessing the accuracy of sequencing algorithms, which are crucial to precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants, including structural variants (SVs). We therefore leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set for a single individual, cross-validated with deep Illumina sequencing, population datasets, and well-established algorithms. Completely reanalyzing the HuRef genome was a necessary effort, as its previously published variants were mostly reported five years ago and suffer from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base pairs. We demonstrate the utility of our HuRef gold set by benchmarking several published SV detection tools. PMID:26412485

  4. Sustainable aggregate production planning in the chemical process industry - A benchmark problem and dataset.

    PubMed

    Brandenburg, Marcus; Hahn, Gerd J

    2018-06-01

    Process industries typically involve complex manufacturing operations and thus require adequate decision support for aggregate production planning (APP). The need for powerful and efficient approaches to solve complex APP problems persists. Problem-specific solution approaches are advantageous compared with standardized approaches, which are designed to provide basic decision support for a broad range of planning problems but are inadequate for optimization under specific settings. This in turn calls for methods to compare different approaches regarding their computational performance and solution quality. In this paper, we present a benchmark problem for APP in the chemical process industry. The presented problem focuses on (i) sustainable operations planning involving multiple alternative production modes/routings with specific production-related carbon emissions and the social dimension of varying operating rates, and (ii) integrated campaign planning with production mix/volume on the operational level. The mutual trade-offs between economic, environmental and social factors can be considered as externalized factors (production-related carbon emissions and overtime working hours) as well as internalized ones (resulting costs). We provide data for all problem parameters in addition to a detailed verbal problem statement. We refer to Hahn and Brandenburg [1] for a first numerical analysis based on this benchmark problem and for the future research perspectives arising from it.

  5. Detecting text in natural scenes with multi-level MSER and SWT

    NASA Astrophysics Data System (ADS)

    Lu, Tongwei; Liu, Renjun

    2018-04-01

    Text detection in natural scenes is susceptible to factors such as complex backgrounds, variable viewing angles and the diverse forms of language, which lead to poor detection results. To address these problems, a new text detection method is proposed, consisting of two main stages: candidate region extraction and text region detection. In the first stage, the method applies multiple scale transformations of the original image and multiple thresholds of maximally stable extremal regions (MSER), so that character regions are detected comprehensively. In the second stage, stroke width transform (SWT) maps are computed for the candidate regions, and cascaded classifiers are then used to prune non-text regions. The proposed method was evaluated on the standard ICDAR2011 benchmark dataset and on a dataset of our own. The experimental results show that the proposed method achieves a clear improvement over other text detection methods.

  6. A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification.

    PubMed

    Kang, Qi; Chen, XiaoShuang; Li, SiSi; Zhou, MengChu

    2017-12-01

    Under-sampling is a popular data preprocessing method for dealing with class imbalance problems, with the purpose of balancing datasets to achieve a high classification rate and to avoid bias toward the majority class. It always uses the full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed that incorporates a noise filter before resampling. To verify its efficiency, the scheme is implemented on top of four popular under-sampling methods, i.e., Undersampling + Adaboost, RUSBoost, UnderBagging, and EasyEnsemble, through benchmarks and significance analysis. Furthermore, this paper also summarizes the relationship between algorithm performance and the imbalance ratio. Experimental results indicate that the proposed scheme significantly improves the original under-sampling-based methods in terms of three popular metrics for imbalanced classification, i.e., the area under the curve (AUC), F-measure, and G-mean.
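
The overall scheme can be sketched as below. The details here are assumptions for illustration (a simple k-nearest-neighbour filter that discards minority examples surrounded entirely by majority ones); the paper's actual noise filter and the four ensemble variants are not reproduced.

```python
import numpy as np

def noise_filtered_undersample(X, y, k=5, seed=0):
    """(1) Drop 'noisy' minority examples none of whose k nearest
    neighbours belong to the minority class; (2) randomly under-sample
    the majority class to the size of the cleaned minority class.
    y is binary with 1 = minority, 0 = majority."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    keep_min = []
    for i in minority:
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # exclude the point itself
        nn = np.argsort(d)[:k]
        if (y[nn] == 1).any():              # has minority support: keep
            keep_min.append(i)
    maj_sample = rng.choice(majority, size=len(keep_min), replace=False)
    idx = np.concatenate([np.array(keep_min, dtype=int), maj_sample])
    return X[idx], y[idx]
```

The balanced output can then be fed to any of the boosting or bagging ensembles named in the abstract.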

  7. A novel clinical decision support system using improved adaptive genetic algorithm for the assessment of fetal well-being.

    PubMed

    Ravindran, Sindhu; Jambek, Asral Bahari; Muthusamy, Hariharan; Neoh, Siew-Chin

    2015-01-01

    A novel clinical decision support system is proposed in this paper for evaluating fetal well-being from the cardiotocogram (CTG) dataset through an Improved Adaptive Genetic Algorithm (IAGA) and an Extreme Learning Machine (ELM). IAGA employs a new scaling technique (called sigma scaling) to avoid premature convergence and applies adaptive crossover and mutation techniques with masking concepts to enhance population diversity. This search algorithm also utilizes three different fitness functions (two single-objective fitness functions and a multi-objective fitness function) to assess its performance. The classification results show that a promising classification accuracy of 94% is obtained with an optimal feature subset using IAGA. The classification results are also compared with those of other feature reduction techniques to substantiate the exhaustive search toward the global optimum. In addition, five other benchmark datasets are used to gauge the strength of the proposed IAGA algorithm.
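
Sigma scaling itself is a standard GA technique for keeping selection pressure moderate; a minimal sketch of the common textbook form follows (the constant c and the floor value are illustrative assumptions, and the paper's exact variant may differ).

```python
import numpy as np

def sigma_scale(fitness, c=2.0, floor=0.1):
    """Sigma scaling: f' = 1 + (f - mean) / (c * std), floored at a small
    positive value so every individual keeps a nonzero selection chance.
    When std == 0 (a fully converged population) all individuals score 1."""
    f = np.asarray(fitness, dtype=float)
    s = f.std()
    if s == 0:
        return np.ones_like(f)
    return np.clip(1.0 + (f - f.mean()) / (c * s), floor, None)
```

Because the scaled value depends on the population's spread rather than raw fitness ratios, early super-fit individuals cannot dominate selection, which is how the technique counters premature convergence.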

  8. NegBio: a high-performance tool for negation and uncertainty detection in radiology reports.

    PubMed

    Peng, Yifan; Wang, Xiaosong; Lu, Le; Bagheri, Mohammadhadi; Summers, Ronald; Lu, Zhiyong

    2018-01-01

    Negative and uncertain medical findings are frequent in radiology reports, but discriminating them from positive findings remains challenging for information extraction. Here, we propose a new algorithm, NegBio, to detect negative and uncertain findings in radiology reports. Unlike previous rule-based methods, NegBio utilizes patterns on universal dependencies to identify the scope of triggers that are indicative of negation or uncertainty. We evaluated NegBio on four datasets, including two public benchmarking corpora of radiology reports, a new radiology corpus that we annotated for this work, and a public corpus of general clinical texts. Evaluation on these datasets demonstrates that NegBio is highly accurate in detecting negative and uncertain findings and compares favorably to NegEx, a widely used state-of-the-art system (an average improvement of 9.5% in precision and 5.1% in F1-score). https://github.com/ncbi-nlp/NegBio.

  9. RBind: computational network method to predict RNA binding sites.

    PubMed

    Wang, Kaili; Jian, Yiren; Wang, Huiwen; Zeng, Chen; Zhao, Yunjie

    2018-04-26

    Non-coding RNA molecules play essential roles by interacting with other molecules to perform various biological functions. However, it is difficult to determine RNA structures due to their flexibility. At present, the number of experimentally solved RNA-ligand and RNA-protein structures is still insufficient. Therefore, binding site prediction for non-coding RNA is required to understand its functions. Current RNA binding site prediction algorithms produce many false positive nucleotides that are distant from the actual binding sites. Here, we present a network approach, RBind, to predict RNA binding sites. We benchmarked RBind on RNA-ligand and RNA-protein datasets. The average accuracies of 0.82 on RNA-ligand and 0.63 on RNA-protein testing show that this network strategy has reliable accuracy for binding site prediction. The codes and datasets are available at https://zhaolab.com.cn/RBind. yjzhaowh@mail.ccnu.edu.cn. Supplementary data are available at Bioinformatics online.
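
To illustrate the flavour of a network approach, here is a minimal sketch of contact-network scoring: build a graph from nucleotide coordinates and score nucleotides by degree. This is illustrative only; RBind's actual network construction and prediction criteria are not reproduced.

```python
import numpy as np

def contact_degree(coords, cutoff=8.0):
    """Build a contact network from 3-D nucleotide coordinates (an edge
    joins any two nucleotides closer than `cutoff`) and return each
    node's degree; unusually connected nucleotides are candidate sites."""
    c = np.asarray(coords, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    adj = (dist < cutoff) & ~np.eye(len(c), dtype=bool)
    return adj.sum(axis=1)
```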

  10. On macromolecular refinement at subatomic resolution with interatomic scatterers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Afonine, Pavel V.; Grosse-Kunstleve, Ralf W.; Adams, Paul D.

    2007-11-09

    A study of the accurate electron density distribution in molecular crystals at subatomic resolution, better than ~1.0 Å, requires more detailed models than those based on independent spherical atoms. A tool conventionally used in small-molecule crystallography is the multipolar model. Even at upper resolution limits of 0.8-1.0 Å, the number of experimental data is insufficient for full multipolar model refinement. As an alternative, a simpler model composed of conventional independent spherical atoms augmented by additional scatterers to model bonding effects has been proposed. Refinement of these mixed models for several benchmark datasets gave results comparable in quality with those of multipolar refinement and superior to those for conventional models. Applications to several datasets of both small molecules and macromolecules are shown. These refinements were performed using the general-purpose macromolecular refinement module phenix.refine of the PHENIX package.

  11. MetaQUAST: evaluation of metagenome assemblies.

    PubMed

    Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey

    2016-04-01

    During the past years we have witnessed rapid development of new metagenome assembly methods. Although there are many benchmarking utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for their metagenomic analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content, by detecting and downloading reference sequences; (ii) huge diversity, by giving comprehensive reports for multiple genomes; and (iii) the presence of closely related species, by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. http://bioinf.spbau.ru/metaquast aleksey.gurevich@spbu.ru Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. Adapting generalization tools to physiographic diversity for the united states national hydrography dataset

    USGS Publications Warehouse

    Buttenfield, B.P.; Stanislawski, L.V.; Brewer, C.A.

    2011-01-01

    This paper reports on generalization and data modeling to create reduced-scale versions of the National Hydrography Dataset (NHD) for dissemination through The National Map, the primary data delivery portal for USGS. Our approach distinguishes local differences in physiographic factors to demonstrate that knowledge about varying terrain (mountainous, hilly or flat) and varying climate (dry or humid) can support decisions about algorithms, parameters, and processing sequences to create generalized, smaller-scale data versions which preserve the distinct hydrographic patterns of these regions. We work with multiple subbasins of the NHD that provide a range of terrain and climate characteristics. Specifically tailored generalization sequences are used to create simplified versions of the high-resolution data, which were compiled for 1:24,000-scale mapping. Results are evaluated cartographically and metrically against a medium-resolution benchmark version compiled for 1:100,000-scale mapping, developing coefficients of linear and areal correspondence.

  13. Local feature saliency classifier for real-time intrusion monitoring

    NASA Astrophysics Data System (ADS)

    Buch, Norbert; Velastin, Sergio A.

    2014-07-01

    We propose a texture saliency classifier to detect people in a video frame by identifying salient texture regions. The image is classified into foreground and background in real time. No temporal image information is used during the classification. The system is used for the task of detecting people entering a sterile zone, which is a common scenario for visual surveillance. Testing is performed on the Imagery Library for Intelligent Detection Systems sterile zone benchmark dataset of the United Kingdom's Home Office. The basic classifier is extended by fusing its output with simple motion information, which significantly outperforms standard motion tracking. A lower detection time can be achieved by combining texture classification with Kalman filtering. The fusion approach running at 10 fps gives the highest result of F1=0.92 for the 24-h test dataset. The paper concludes with a detailed analysis of the computation time required for the different parts of the algorithm.

  14. Shot boundary detection and label propagation for spatio-temporal video segmentation

    NASA Astrophysics Data System (ADS)

    Piramanayagam, Sankaranaryanan; Saber, Eli; Cahill, Nathan D.; Messinger, David

    2015-02-01

    This paper proposes a two-stage algorithm for streaming video segmentation. In the first stage, shot boundaries are detected within a window of frames by comparing dissimilarity between the 2-D segmentations of each frame. In the second stage, the 2-D segments are propagated across the window of frames in both the spatial and temporal directions. The window is moved across the video to find all shot transitions and obtain spatio-temporal segments simultaneously. As opposed to techniques that operate on the entire video, the proposed approach consumes significantly less memory and enables segmentation of lengthy videos. We tested our segmentation-based shot detection method on the TRECVID 2007 video dataset and compared it with a block-based technique. Cut detection results on the TRECVID 2007 dataset indicate that our algorithm is comparable to the best of the block-based methods. The streaming video segmentation routine also achieves promising results on a challenging video segmentation benchmark database.

  15. Improving average ranking precision in user searches for biomedical research datasets

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick

    2017-01-01

    Availability of research datasets is a keystone of health and life science study reproducibility and scientific progress. Owing to the heterogeneity and complexity of these data, a main challenge for research data management systems is to provide users with the best answers to their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus of 800k datasets and 21 annotated user queries, and provided competitive results compared with the other challenge participants. In the official run, it achieved the highest infAP, +22.3% above the median infAP of the participants' best submissions. Overall, it ranked in the top 2 when an aggregated metric over the best official measures per participant is considered. The query expansion method had a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% on the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance under different training conditions, with small performance variations compared with the Divergence from Randomness framework. Finally, the result categorization did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. The use of data-driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments are needed to draw conclusive results. Database URL: https://biocaddie.org/benchmark-data PMID:29220475
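
The embedding-based expansion step can be sketched as follows, using a hypothetical toy vocabulary and vectors; the system described above uses embeddings trained on biomedical text and its expansion policy may differ.

```python
import numpy as np

def expand_query(term, vocab, vectors, n_extra=2):
    """Return the term plus its n_extra nearest vocabulary neighbours
    by cosine similarity of their embedding vectors."""
    vecs = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = vecs @ vecs[vocab.index(term)]
    ranked = [vocab[j] for j in np.argsort(-sims) if vocab[j] != term]
    return [term] + ranked[:n_extra]
```

Expanding each query term this way lets the retrieval engine match datasets that use synonymous vocabulary without maintaining a hand-built terminology.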

  16. Study of Huizhou architecture component point cloud in surface reconstruction

    NASA Astrophysics Data System (ADS)

    Zhang, Runmei; Wang, Guangyin; Ma, Jixiang; Wu, Yulu; Zhang, Guangbin

    2017-06-01

    Surface reconstruction software has many problems, such as complicated operations on point cloud data, too many interaction definitions, and overly stringent requirements on input data; it has therefore not yet been widely adopted. This paper selects the distinctive chuandou wooden beam framework of Huizhou architecture as the research object and presents a complete implementation covering point cloud data acquisition, preprocessing and, finally, surface reconstruction. First, the acquired point cloud data are preprocessed, including segmentation and filtering. Second, surface normals are deduced directly from the point cloud dataset. Finally, surface reconstruction is studied using the greedy projection triangulation algorithm. Compared with existing three-dimensional surface reconstruction software, the results show that the proposed scheme is smoother, more time-efficient and more portable.
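
Deducing surface normals from a raw point cloud is commonly done by local PCA, and a minimal numpy sketch of that standard technique is shown below; the paper does not spell out its exact normal-estimation method, so this is an assumption.

```python
import numpy as np

def estimate_normals(points, k=8):
    """For each point, fit a plane to its k nearest neighbours by PCA:
    the eigenvector of the smallest covariance eigenvalue is the normal
    (its sign is arbitrary without a consistent orientation step)."""
    pts = np.asarray(points, dtype=float)
    normals = np.empty_like(pts)
    for i, p in enumerate(pts):
        d = np.linalg.norm(pts - p, axis=1)
        nbrs = pts[np.argsort(d)[:k + 1]]      # neighbourhood incl. the point
        cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
        w, v = np.linalg.eigh(cov)             # eigenvalues ascending
        normals[i] = v[:, 0]                   # smallest-variance direction
    return normals
```

Greedy projection triangulation then uses these normals to project each local neighbourhood onto its tangent plane before connecting points into triangles.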

  17. Skipping Strategy (SS) for Initial Population of Job-Shop Scheduling Problem

    NASA Astrophysics Data System (ADS)

    Abdolrazzagh-Nezhad, M.; Nababan, E. B.; Sarim, H. M.

    2018-03-01

    Constructing the initial population for the job-shop scheduling problem (JSSP) is an essential step toward obtaining a near-optimal solution. Techniques used to solve the JSSP are computationally demanding. A skipping strategy (SS) is employed to acquire the initial population after the sequence of jobs on machines and the sequence of operations (expressed as Plates-jobs and mPlates-jobs) are determined. The proposed technique is applied to benchmark datasets and the results are compared with those of other initialization techniques. It is shown that the initial population obtained from the SS approach can generate the optimal solution.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reiche, Helmut Matthias; Vogel, Sven C.

    New in situ data for the U-C system are presented, with the goal of improving knowledge of the phase diagram to enable production of new ceramic fuels. The non-quenchable cubic δ-phase, which is fundamental to computational methods, was identified. Rich datasets from the formation synthesis of uranium carbide yield kinetics data that allow benchmarking of models and thermodynamic parameters. The order-disorder transition (carbon sublattice melting) was observed, owing to the equal sensitivity of neutrons to both elements; this dynamic has not been accurately described in some recent simulation-based publications.

  19. Performance evaluation of firefly algorithm with variation in sorting for non-linear benchmark problems

    NASA Astrophysics Data System (ADS)

    Umbarkar, A. J.; Balande, U. T.; Seth, P. D.

    2017-06-01

    Nature-inspired computing and optimization techniques have evolved to solve difficult optimization problems in diverse fields of engineering, science and technology. The Firefly Algorithm (FA) mimics the attraction process of fireflies to solve optimization problems. In FA, fireflies are ranked using a sorting algorithm; the original FA uses bubble sort. In this paper, quick sort replaces bubble sort to decrease the time complexity of FA. The datasets used are unconstrained benchmark functions from CEC 2005 [22]. FA with bubble sort and FA with quick sort are compared with respect to best, worst and mean values, standard deviation, number of comparisons and execution time. The experimental results show that FA with quick sort requires fewer comparisons but more execution time. Increasing the number of fireflies helps convergence to the optimal solution, while varying the dimension shows that the algorithm performs better at lower dimensions than at higher ones.
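
The attraction update that FA performs after ranking has a standard textbook form, sketched below; the parameter values are illustrative, and the sorting variation studied in the paper is orthogonal to this step.

```python
import numpy as np

def firefly_step(X, intensity, beta0=1.0, gamma=1.0, alpha=0.05, seed=0):
    """One pass of the attraction update: every firefly i moves toward
    each brighter firefly j by
        x_i += beta0 * exp(-gamma * r_ij**2) * (x_j - x_i) + alpha * noise.
    X: (n, d) positions; intensity: (n,) brightness (higher = better)."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    for i in range(len(X)):
        for j in range(len(X)):
            if intensity[j] > intensity[i]:
                r2 = np.sum((X[i] - X[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)
                X[i] += beta * (X[j] - X[i]) + alpha * (rng.random(X.shape[1]) - 0.5)
    return X
```

Sorting enters only in how fireflies are ranked by intensity before this update, which is why swapping bubble sort for quick sort changes comparison counts without changing the search behaviour.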

  20. DataMed - an open source discovery index for finding biomedical datasets.

    PubMed

    Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua

    2018-01-13

    Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
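
The P@10 figure quoted above is simple to compute; a minimal sketch:

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved dataset ids that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k
```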

  1. Deep convolutional neural networks for pan-specific peptide-MHC class I binding prediction.

    PubMed

    Han, Youngmahn; Kim, Dongsup

    2017-12-28

    Computational scanning of peptide candidates that bind to a specific major histocompatibility complex (MHC) can speed up peptide-based vaccine development, and various methods are therefore being actively developed. Recently, machine-learning-based methods have generated successful results by training on large amounts of experimental data. However, many machine-learning-based methods are generally less sensitive in recognizing locally clustered interactions, which can synergistically stabilize peptide binding. The deep convolutional neural network (DCNN) is a deep learning method inspired by the visual recognition process of the animal brain, and it is known to capture meaningful local patterns from 2D images. Once peptide-MHC interactions are encoded into image-like array (ILA) data, a DCNN can be employed to build a predictive model for peptide-MHC binding. In this study, we demonstrate that a DCNN is able not only to reliably predict peptide-MHC binding, but also to sensitively detect locally clustered interactions. Nonapeptide-HLA-A and -B binding data were encoded into ILA data, and a DCNN was trained on them as a pan-specific prediction model. The DCNN showed higher performance than other prediction tools on the latest benchmark datasets, which consist of 43 datasets for 15 HLA-A alleles and 25 datasets for 10 HLA-B alleles. In particular, the DCNN outperformed other tools for alleles belonging to the HLA-A3 supertype. The F1 scores of the DCNN were 0.86, 0.94, and 0.67 for the HLA-A*31:01, HLA-A*03:01, and HLA-A*68:01 alleles, respectively, significantly higher than those of other tools. We found that the DCNN was able to recognize locally clustered interactions that could synergistically stabilize peptide binding. We developed ConvMHC, a web server that provides user-friendly interfaces for peptide-MHC class I binding prediction using the DCNN.
The ConvMHC web server is accessible via http://jumong.kaist.ac.kr:8080/convmhc . In summary, we developed a novel method for peptide-HLA-I binding prediction using a DCNN trained on ILA data that encode peptide binding data, and demonstrated its reliable performance on nonapeptide binding predictions through independent evaluation on the latest IEDB benchmark datasets. Our approach can be applied to characterize locally clustered patterns in other molecular interactions, such as protein/DNA, protein/RNA, and drug/protein interactions.

  2. Automatic segmentation of male pelvic anatomy on computed tomography images: a comparison with multiple observers in the context of a multicentre clinical trial.

    PubMed

    Geraghty, John P; Grogan, Garry; Ebert, Martin A

    2013-04-30

    This study investigates the variation in segmentation of several pelvic anatomical structures on computed tomography (CT) between multiple observers and a commercial automatic segmentation method, in the context of quality assurance and evaluation during a multicentre clinical trial. CT scans of two prostate cancer patients ('benchmarking cases'), one high risk (HR) and one intermediate risk (IR), were sent to multiple radiotherapy centres for segmentation of prostate, rectum and bladder structures according to the TROG 03.04 "RADAR" trial protocol definitions. The same structures were automatically segmented using iPlan software for the same two patients, allowing structures defined by automatic segmentation to be quantitatively compared with those defined by multiple observers. A sample of twenty trial patient datasets were also used to automatically generate anatomical structures for quantitative comparison with structures defined by individual observers for the same datasets. There was considerable agreement amongst all observers and automatic segmentation of the benchmarking cases for bladder (mean spatial variations < 0.4 cm across the majority of image slices). Although there was some variation in interpretation of the superior-inferior (cranio-caudal) extent of rectum, human-observer contours were typically within a mean 0.6 cm of automatically-defined contours. Prostate structures were more consistent for the HR case than the IR case with all human observers segmenting a prostate with considerably more volume (mean +113.3%) than that automatically segmented. Similar results were seen across the twenty sample datasets, with disagreement between iPlan and observers dominant at the prostatic apex and superior part of the rectum, which is consistent with observations made during quality assurance reviews during the trial. This study has demonstrated quantitative analysis for comparison of multi-observer segmentation studies. 
For automatic segmentation algorithms based on image-registration as in iPlan, it is apparent that agreement between observer and automatic segmentation will be a function of patient-specific image characteristics, particularly for anatomy with poor contrast definition. For this reason, it is suggested that automatic registration based on transformation of a single reference dataset adds a significant systematic bias to the resulting volumes and their use in the context of a multicentre trial should be carefully considered.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shatsky, Maxim; Allen, Simon; Gold, Barbara

Numerous affinity purification-mass spectrometry (AP-MS) and yeast two-hybrid (Y2H) screens have each defined thousands of pairwise protein-protein interactions (PPIs), most between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial Y2H and AP-MS screens. We have identified 459 high-confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared to the nine published interactomes, our two networks are smaller; are much less highly connected; have significantly lower false discovery rates; and are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays. Lastly, our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested.

  4. Search for Resonances Decaying to Top and Bottom Quarks with the CDF Experiment.

    PubMed

    Aaltonen, T; Amerio, S; Amidei, D; Anastassov, A; Annovi, A; Antos, J; Anzà, F; Apollinari, G; Appel, J A; Arisawa, T; Artikov, A; Asaadi, J; Ashmanskas, W; Auerbach, B; Aurisano, A; Azfar, F; Badgett, W; Bae, T; Barbaro-Galtieri, A; Barnes, V E; Barnett, B A; Barria, P; Bartos, P; Bauce, M; Bedeschi, F; Behari, S; Bellettini, G; Bellinger, J; Benjamin, D; Beretvas, A; Bhatti, A; Bianchi, L; Bland, K R; Blumenfeld, B; Bocci, A; Bodek, A; Bortoletto, D; Boudreau, J; Boveia, A; Brigliadori, L; Bromberg, C; Brucken, E; Budagov, J; Budd, H S; Burkett, K; Busetto, G; Bussey, P; Butti, P; Buzatu, A; Calamba, A; Camarda, S; Campanelli, M; Canelli, F; Carls, B; Carlsmith, D; Carosi, R; Carrillo, S; Casal, B; Casarsa, M; Castro, A; Catastini, P; Cauz, D; Cavaliere, V; Cerri, A; Cerrito, L; Chen, Y C; Chertok, M; Chiarelli, G; Chlachidze, G; Cho, K; Chokheli, D; Clark, A; Clarke, C; Convery, M E; Conway, J; Corbo, M; Cordelli, M; Cox, C A; Cox, D J; Cremonesi, M; Cruz, D; Cuevas, J; Culbertson, R; d'Ascenzo, N; Datta, M; de Barbaro, P; Demortier, L; Deninno, M; D'Errico, M; Devoto, F; Di Canto, A; Di Ruzza, B; Dittmann, J R; Donati, S; D'Onofrio, M; Dorigo, M; Driutti, A; Ebina, K; Edgar, R; Elagin, A; Erbacher, R; Errede, S; Esham, B; Farrington, S; Fernández Ramos, J P; Field, R; Flanagan, G; Forrest, R; Franklin, M; Freeman, J C; Frisch, H; Funakoshi, Y; Galloni, C; Garfinkel, A F; Garosi, P; Gerberich, H; Gerchtein, E; Giagu, S; Giakoumopoulou, V; Gibson, K; Ginsburg, C M; Giokaris, N; Giromini, P; Glagolev, V; Glenzinski, D; Gold, M; Goldin, D; Golossanov, A; Gomez, G; Gomez-Ceballos, G; Goncharov, M; González López, O; Gorelov, I; Goshaw, A T; Goulianos, K; Gramellini, E; Grosso-Pilcher, C; Group, R C; Guimaraes da Costa, J; Hahn, S R; Han, J Y; Happacher, F; Hara, K; Hare, M; Harr, R F; Harrington-Taber, T; Hatakeyama, K; Hays, C; Heinrich, J; Herndon, M; Hocker, A; Hong, Z; Hopkins, W; Hou, S; Hughes, R E; Husemann, U; Hussein, M; Huston, J; Introzzi, G; Iori, M; 
Ivanov, A; James, E; Jang, D; Jayatilaka, B; Jeon, E J; Jindariani, S; Jones, M; Joo, K K; Jun, S Y; Junk, T R; Kambeitz, M; Kamon, T; Karchin, P E; Kasmi, A; Kato, Y; Ketchum, W; Keung, J; Kilminster, B; Kim, D H; Kim, H S; Kim, J E; Kim, M J; Kim, S H; Kim, S B; Kim, Y J; Kim, Y K; Kimura, N; Kirby, M; Knoepfel, K; Kondo, K; Kong, D J; Konigsberg, J; Kotwal, A V; Kreps, M; Kroll, J; Kruse, M; Kuhr, T; Kurata, M; Laasanen, A T; Lammel, S; Lancaster, M; Lannon, K; Latino, G; Lee, H S; Lee, J S; Leo, S; Leone, S; Lewis, J D; Limosani, A; Lipeles, E; Lister, A; Liu, H; Liu, Q; Liu, T; Lockwitz, S; Loginov, A; Lucchesi, D; Lucà, A; Lueck, J; Lujan, P; Lukens, P; Lungu, G; Lys, J; Lysak, R; Madrak, R; Maestro, P; Malik, S; Manca, G; Manousakis-Katsikakis, A; Marchese, L; Margaroli, F; Marino, P; Matera, K; Mattson, M E; Mazzacane, A; Mazzanti, P; McNulty, R; Mehta, A; Mehtala, P; Mesropian, C; Miao, T; Mietlicki, D; Mitra, A; Miyake, H; Moed, S; Moggi, N; Moon, C S; Moore, R; Morello, M J; Mukherjee, A; Muller, Th; Murat, P; Mussini, M; Nachtman, J; Nagai, Y; Naganoma, J; Nakano, I; Napier, A; Nett, J; Neu, C; Nigmanov, T; Nodulman, L; Noh, S Y; Norniella, O; Oakes, L; Oh, S H; Oh, Y D; Oksuzian, I; Okusawa, T; Orava, R; Ortolan, L; Pagliarone, C; Palencia, E; Palni, P; Papadimitriou, V; Parker, W; Pauletta, G; Paulini, M; Paus, C; Phillips, T J; Piacentino, G; Pianori, E; Pilot, J; Pitts, K; Plager, C; Pondrom, L; Poprocki, S; Potamianos, K; Pranko, A; Prokoshin, F; Ptohos, F; Punzi, G; Redondo Fernández, I; Renton, P; Rescigno, M; Rimondi, F; Ristori, L; Robson, A; Rodriguez, T; Rolli, S; Ronzani, M; Roser, R; Rosner, J L; Ruffini, F; Ruiz, A; Russ, J; Rusu, V; Sakumoto, W K; Sakurai, Y; Santi, L; Sato, K; Saveliev, V; Savoy-Navarro, A; Schlabach, P; Schmidt, E E; Schwarz, T; Scodellaro, L; Scuri, F; Seidel, S; Seiya, Y; Semenov, A; Sforza, F; Shalhout, S Z; Shears, T; Shepard, P F; Shimojima, M; Shochet, M; Shreyber-Tecker, I; Simonenko, A; Sliwa, K; Smith, J R; 
Snider, F D; Song, H; Sorin, V; St Denis, R; Stancari, M; Stentz, D; Strologas, J; Sudo, Y; Sukhanov, A; Suslov, I; Takemasa, K; Takeuchi, Y; Tang, J; Tecchio, M; Teng, P K; Thom, J; Thomson, E; Thukral, V; Toback, D; Tokar, S; Tollefson, K; Tomura, T; Tonelli, D; Torre, S; Torretta, D; Totaro, P; Trovato, M; Ukegawa, F; Uozumi, S; Vázquez, F; Velev, G; Vellidis, C; Vernieri, C; Vidal, M; Vilar, R; Vizán, J; Vogel, M; Volpi, G; Wagner, P; Wallny, R; Wang, S M; Waters, D; Wester, W C; Whiteson, D; Wicklund, A B; Wilbur, S; Williams, H H; Wilson, J S; Wilson, P; Winer, B L; Wittich, P; Wolbers, S; Wolfe, H; Wright, T; Wu, X; Wu, Z; Yamamoto, K; Yamato, D; Yang, T; Yang, U K; Yang, Y C; Yao, W-M; Yeh, G P; Yi, K; Yoh, J; Yorita, K; Yoshida, T; Yu, G B; Yu, I; Zanetti, A M; Zeng, Y; Zhou, C; Zucchelli, S

    2015-08-07

We report on a search for charged massive resonances decaying to top (t) and bottom (b) quarks in the full data set of proton-antiproton collisions at a center-of-mass energy of √s = 1.96 TeV collected by the CDF II detector at the Tevatron, corresponding to an integrated luminosity of 9.5 fb⁻¹. No significant excess above the standard model background prediction is observed. We set 95% Bayesian credibility mass-dependent upper limits on the heavy charged-particle production cross section times branching ratio to tb. Using a standard model extension with a W'→tb and left-right-symmetric couplings as a benchmark model, we constrain the W' mass and couplings in the 300-900 GeV/c² range. The limits presented here are the most stringent for a charged resonance with mass in the range 300-600 GeV/c² decaying to top and bottom quarks.

  5. Plasma diagnostics for x-ray driven foils at Z

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heeter, R F; Bailey, J E; Cuneo, M E

We report the development of techniques to diagnose plasmas produced by X-ray photoionization of thin foils placed near the Z-pinch on the Sandia Z Machine. The development of 100+ TW X-ray sources enables access to novel plasma regimes, such as the photoionization equilibrium. To diagnose these plasmas one must simultaneously characterize both the foil and the driving pinch. The desired photoionized plasma equilibrium is only reached transiently for a 2-ns window, placing stringent requirements on diagnostic synchronization. We have adapted existing Sandia diagnostics and fielded an additional gated 3-crystal Johann spectrometer with dual lines of sight to meet these requirements. We present sample data from experiments in which 1 cm, 180 eV tungsten pinches photoionized foils composed of 200 Å Fe and 300 Å NaF co-mixed and sandwiched between 1000 Å layers of Lexan (CHO), and discuss the application of this work to benchmarking astrophysical models.

  6. Plasma diagnostics for x-ray driven foils at Z

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heeter, R. F.; Bailey, J. E.; Cuneo, M. E.

We report the development of techniques to diagnose plasmas produced by x-ray photoionization of thin foils placed near the Z-pinch on the Sandia Z Machine. The development of 100+ TW x-ray sources enables access to novel plasma regimes, such as the photoionization equilibrium. To diagnose these plasmas one must simultaneously characterize both the foil and the driving pinch. The desired photoionized plasma equilibrium is only reached transiently for a 2-ns window, placing stringent requirements on diagnostic synchronization. We have adapted existing Sandia diagnostics and fielded an additional gated three-crystal Johann spectrometer with dual lines of sight to meet these requirements. We present sample data from experiments using 1-cm, 180-eV tungsten pinches to photoionize foils made of 200 Å Fe and 300 Å NaF co-mixed and sandwiched between 1000 Å layers of Lexan (C16H14O3), and discuss the application of this work to benchmarking astrophysical models.

  7. Search for resonances decaying to top and bottom quarks with the CDF experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aaltonen, Timo Antero

    2015-08-03

We report on a search for charged massive resonances decaying to top (t) and bottom (b) quarks in the full data set of proton-antiproton collisions at a center-of-mass energy of √s = 1.96 TeV collected by the CDF II detector at the Tevatron, corresponding to an integrated luminosity of 9.5 fb⁻¹. No significant excess above the standard model background prediction is observed. We set 95% Bayesian credibility mass-dependent upper limits on the heavy charged-particle production cross section times branching ratio to tb. Using a standard model extension with a W' → tb and left-right-symmetric couplings as a benchmark model, we constrain the W' mass and couplings in the 300–900 GeV/c² range. As a result, the limits presented here are the most stringent for a charged resonance with mass in the range 300–600 GeV/c² decaying to top and bottom quarks.

  8. Prediction of Body Fluids where Proteins are Secreted into Based on Protein Interaction Network

    PubMed Central

    Hu, Le-Le; Huang, Tao; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-01-01

Determining which body fluids proteins can be secreted into is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which body fluids human proteins can be secreted into. For a newly constructed benchmark dataset consisting of 529 human secreted proteins, the prediction accuracy for the most probable body fluid predicted by our method, assessed via the jackknife test, was 79.02%, significantly higher than the success rate of a random guess (29.36%). The likelihood that the top four predicted body fluids contain all the true body fluids into which a protein can be secreted is 62.94%. Our method was further demonstrated on two independent datasets: one contains 57 proteins that can be secreted into blood, and the other contains 61 proteins that can be secreted into plasma/serum and are possible biomarkers associated with various cancers. Of the 57 proteins in the first dataset, 55 were correctly predicted as blood-secreted proteins. Of the 61 proteins in the second dataset, 58 were predicted to be most probably secreted into plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development. PMID:21829572
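The jackknife test quoted above is leave-one-out evaluation: each protein is held out in turn and predicted from the remaining ones. A minimal sketch in Python, with a toy nearest-centroid predictor standing in for the paper's network-based method (names and data are purely illustrative):

```python
import numpy as np

def jackknife_accuracy(features, labels, predict):
    """Leave-one-out (jackknife) accuracy: each sample is held out in
    turn and predicted from a model built on the remaining n-1 samples."""
    n = len(labels)
    correct = 0
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        pred = predict(features[keep], labels[keep], features[i])
        correct += int(pred == labels[i])
    return correct / n

def nearest_centroid(train_x, train_y, query):
    """Toy stand-in predictor (NOT the paper's network-based method):
    assign the class whose mean feature vector is closest to the query."""
    classes = sorted(set(train_y.tolist()))
    dists = [np.linalg.norm(query - train_x[train_y == c].mean(axis=0))
             for c in classes]
    return classes[int(np.argmin(dists))]

x = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
y = np.array([0, 0, 1, 1])
acc = jackknife_accuracy(x, y, nearest_centroid)
```

Any classifier with the same `(train_x, train_y, query)` signature can be plugged into `jackknife_accuracy` in place of the toy predictor.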

  9. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data

    PubMed Central

    2014-01-01

Background The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. Results In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator that can generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data, for which such a comparison has not yet been established. Conclusions A benchmark procedure to compare HTS data mappers is introduced with a new definition for mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. 
The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. PMID:24708189
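The robustness definition above, under which a read counts as correctly mapped only if both its start and end positions and its indel/substitution counts are consistent with the simulated truth, can be expressed as a small predicate. A sketch under assumed field names and tolerances (the benchmark's actual thresholds may differ):

```python
def is_correctly_mapped(reported, truth, pos_tolerance=5, max_extra_edits=2):
    """Judge a mapped read 'correct' in the spirit of the benchmark's
    definition: start AND end positions close to the truth, and no more
    edit operations (indels + substitutions) than the simulator introduced,
    plus a small slack. Field names and thresholds are illustrative.
    `reported` and `truth` are dicts with 'start', 'end', 'indels', 'subs'."""
    start_ok = abs(reported["start"] - truth["start"]) <= pos_tolerance
    end_ok = abs(reported["end"] - truth["end"]) <= pos_tolerance
    edits_reported = reported["indels"] + reported["subs"]
    edits_true = truth["indels"] + truth["subs"]
    edits_ok = edits_reported <= edits_true + max_extra_edits
    return start_ok and end_ok and edits_ok
```

Start-position agreement alone would accept the second case below, which the start-and-end definition rejects.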

  10. Robust prediction of protein subcellular localization combining PCA and WSVMs.

    PubMed

    Tian, Jiang; Gu, Hong; Liu, Wenqi; Gao, Chiyang

    2011-08-01

Automated prediction of protein subcellular localization is an important tool for genome annotation and drug discovery, and Support Vector Machines (SVMs) can effectively solve this problem in a supervised manner. However, the datasets obtained from real experiments are likely to contain outliers or noise, which can lead to poor generalization ability and classification accuracy. To address this problem, we adopt strategies to lower the effect of outliers. First, we design a method based on Weighted SVMs (WSVMs), in which different weights are assigned to different data points, so that the training algorithm learns the decision boundary according to the relative importance of the data points. Second, we analyse the influence of Principal Component Analysis (PCA) on WSVM classification and propose a hybrid classifier combining the merits of both PCA and WSVM. After dimension reduction is performed on the datasets, a kernel-based possibilistic c-means algorithm can generate more suitable weights for the training, as PCA transforms the data into a new coordinate system whose largest-variance directions are greatly affected by the outliers. Experiments on benchmark datasets show promising results, which confirms the effectiveness of the proposed method in terms of prediction accuracy. Copyright © 2011 Elsevier Ltd. All rights reserved.
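The two ingredients of the hybrid classifier, PCA-based dimension reduction and per-sample weighting that down-weights outliers, can be sketched with plain NumPy. The inverse-distance weighting below is a simplified stand-in for the kernel-based possibilistic c-means step, not the authors' exact scheme; in practice the resulting weights would be passed to a weighted SVM as per-sample weights:

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components (via SVD
    of the mean-centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def outlier_weights(X, y):
    """Give each sample a weight in (0, 1] that shrinks with its distance
    from its own class centroid, so outliers influence training less.
    Illustrative stand-in for kernel-based possibilistic c-means weights."""
    w = np.empty(len(y))
    for c in np.unique(y):
        mask = (y == c)
        d = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        w[mask] = 1.0 / (1.0 + d / (d.mean() + 1e-12))
    return w

X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [10., 10.]])
y = np.zeros(5, dtype=int)
w = outlier_weights(pca_project(X, 2), y)   # the far point gets the lowest weight
```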

  11. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments

    PubMed Central

    Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D.; Kluger, Yuval

    2012-01-01

Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that parameters that are optimal in one dataset typically do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development. PMID:22307239

  12. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.

    PubMed

    Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D; Kluger, Yuval

    2012-05-01

Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that parameters that are optimal in one dataset typically do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.

  13. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

Efficiently extracting information from high-dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional approaches are mostly file-based; files can be edited and accessed handily, but their contiguous storage structure limits efficiency. Databases have been proposed as an alternative, offering advantages such as native functionality for manipulating multidimensional (MD) arrays, smart caching strategies and scalability. In this research, NetCDF file-based solutions and SciDB, a multidimensional array database management system (DBMS) with a chunked storage structure, are benchmarked to determine the best solution for storing and querying a large 5D hydrologic modelling dataset. The effect of data storage configurations, including chunk size, dimension order and compression, on query performance is explored. Results indicate that the dimension order used to organize storage of the 5D data has a significant influence on query performance when the chunk size is very large, but the effect becomes insignificant when the chunk size is properly set. Compression in SciDB mostly has a negative influence on query performance. Caching is an advantage but may be influenced by the execution of different query processes. On the whole, the NetCDF solution without compression is in general more efficient than the SciDB DBMS.
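The interaction between chunk shape, dimension order and query pattern described above can be illustrated by counting how many chunks a hyperslab query intersects, independently of NetCDF or SciDB internals; fewer touched chunks generally means less I/O. A minimal sketch (array and chunk shapes are hypothetical):

```python
from itertools import product

def chunks_touched(query_slices, chunk_shape):
    """Count the chunks of a chunked multidimensional array that a
    hyperslab query intersects. `query_slices` is one (start, stop)
    pair per dimension; `chunk_shape` is the chunk extent per dimension."""
    per_dim = [range(start // c, (stop - 1) // c + 1)
               for (start, stop), c in zip(query_slices, chunk_shape)]
    return len(list(product(*per_dim)))

# Full time series at one (y, x) cell of a (time=1000, y=100, x=100) array:
query = [(0, 1000), (50, 51), (50, 51)]
spatial_chunks = chunks_touched(query, (10, 100, 100))   # chunks span space, split time
temporal_chunks = chunks_touched(query, (1000, 10, 10))  # chunks span time, split space
```

The same query touches 100 chunks under the first layout but only 1 under the second, which is why a poorly chosen chunk shape (or dimension order, for very large chunks) dominates query cost.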

  14. Classification as clustering: a Pareto cooperative-competitive GP approach.

    PubMed

    McIntyre, Andrew R; Heywood, Malcolm I

    2011-01-01

Intuitively, population-based algorithms such as genetic programming provide a natural environment for supporting solutions that learn to decompose the overall task between multiple individuals, or a team. This work presents a framework for evolving teams without recourse to prespecifying the number of cooperating individuals. To do so, each individual evolves a mapping to a distribution of outcomes that, following clustering, establishes the parameterization of a (Gaussian) local membership function. This gives individuals the opportunity to represent subsets of tasks, where the overall task is that of classification under the supervised learning domain. Thus, rather than each team member representing an entire class, individuals are free to identify unique subsets of the overall classification task. The framework is supported by techniques from evolutionary multiobjective optimization (EMO) and Pareto competitive coevolution. EMO establishes the basis for encouraging individuals to provide accurate yet nonoverlapping behaviors, whereas competitive coevolution provides the mechanism for scaling to potentially large unbalanced datasets. Benchmarking is performed against recent examples of nonlinear SVM classifiers over 12 UCI datasets with between 150 and 200,000 training instances. Solutions from the proposed coevolutionary multiobjective GP framework appear to provide a good balance between classification performance and model complexity, especially as the dataset instance count increases.

  15. Reconstruction of Complex Directional Networks with Group Lasso Nonlinear Conditional Granger Causality.

    PubMed

    Yang, Guanxue; Wang, Lin; Wang, Xiaofan

    2017-06-07

Reconstruction of the networks underlying complex systems is one of the most crucial problems in many areas of engineering and science. In this paper, rather than identifying parameters of complex systems governed by pre-defined models or taking polynomial and rational functions as prior information for subsequent model selection, we put forward a general framework for nonlinear causal network reconstruction from time series with limited observations. Obtaining multi-source datasets through a data-fusion strategy, we propose a novel method to handle the nonlinearity and directionality of complex networked systems, namely group lasso nonlinear conditional Granger causality. Specifically, our method can exploit different sets of radial basis functions to approximate the nonlinear interactions between each pair of nodes and integrates sparsity into grouped variable selection. The performance of our approach is first assessed with two types of simulated datasets, from a nonlinear vector autoregressive model and from nonlinear dynamic models, and then verified on the benchmark datasets from DREAM3 Challenge 4. Effects of data size and noise intensity are also discussed. All of the results demonstrate that the proposed method performs better in terms of a higher area under the precision-recall curve.
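The feature construction at the heart of such a method, expanding each candidate driver series into a group of radial basis functions so that a group lasso can keep or discard the whole group together, can be sketched as follows; the centers and width are illustrative, and the group lasso solver itself is omitted:

```python
import numpy as np

def rbf_group(x_lagged, centers, width):
    """Expand one lagged driver series into a *group* of radial basis
    features exp(-(x - c)^2 / (2 * width^2)). In group-lasso Granger
    inference, each candidate driver contributes one such group of
    columns, and regularization zeroes out the whole group when the
    driver carries no causal information."""
    x = np.asarray(x_lagged, float).reshape(-1, 1)
    c = np.asarray(centers, float).reshape(1, -1)
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))

# Three lagged observations of a candidate driver, two RBF centers:
phi = rbf_group([0.0, 1.0, 2.0], centers=[0.0, 1.0], width=1.0)
```

Stacking one such block per candidate driver gives the grouped design matrix on which a group lasso regression selects directed edges.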

  16. Gene network-based analysis identifies two potential subtypes of small intestinal neuroendocrine tumors.

    PubMed

    Kidd, Mark; Modlin, Irvin M; Drozdov, Ignat

    2014-07-15

Tumor transcriptomes contain information of critical value to understanding the different capacities of a cell at both a physiological and pathological level. In terms of clinical relevance, they provide information regarding the cellular "toolbox", e.g., pathways associated with malignancy and metastasis or drug dependency. Exploration of this resource can therefore be leveraged as a translational tool to better manage and assess neoplastic behavior. The availability of public genome-wide expression datasets provides an opportunity to reassess neuroendocrine tumors at a more fundamental level. We hypothesized that stringent analysis of expression profiles as well as regulatory networks of the neoplastic cell would provide novel information that facilitates further delineation of the genomic basis of small intestinal neuroendocrine tumors. We re-analyzed two publicly available small intestinal tumor transcriptomes using stringent quality control parameters and network-based approaches and validated expression of core secretory regulatory elements, e.g., CPE, PCSK1, secretogranins, including genes involved in depolarization, e.g., SCN3A, as well as transcription factors associated with neurodevelopment (NKX2-2, NeuroD1, INSM1) and glucose homeostasis (APLP1). The candidate metastasis-associated transcription factor, ST18, was highly expressed (>14-fold, p < 0.004). Genes previously associated with neoplasia, CEBPA and SDHD, were decreased in expression (-1.5 to -2, p < 0.02). Genomic interrogation indicated that intestinal tumors may consist of two different subtypes, serotonin-producing neoplasms and serotonin/substance P/tachykinin lesions. QPCR validation in an independent dataset (n = 13 neuroendocrine tumors) confirmed up-regulated expression of 87% of genes (13/15). 
An integrated cellular transcriptomic analysis of small intestinal neuroendocrine tumors identified that they are regulated at a developmental level, have key activation of hypoxic pathways (a known regulator of malignant stem cell phenotypes), as well as activation of genes involved in apoptosis and proliferation. Further refinement of these analyses by RNAseq studies of large-scale databases will enable definition of individual master regulators and facilitate the development of novel tissue- and blood-based tools to better understand, diagnose and treat tumors.

  17. Toward a standard for the evaluation of PET-Auto-Segmentation methods following the recommendations of AAPM task group No. 211: Requirements and implementation.

    PubMed

    Berthon, Beatrice; Spezi, Emiliano; Galavis, Paulina; Shepherd, Tony; Apte, Aditya; Hatt, Mathieu; Fayad, Hadi; De Bernardi, Elisabetta; Soffientini, Chiara D; Ross Schmidtlein, C; El Naqa, Issam; Jeraj, Robert; Lu, Wei; Das, Shiva; Zaidi, Habib; Mawlawi, Osama R; Visvikis, Dimitris; Lee, John A; Kirov, Assen S

    2017-08-01

The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed to be extendable through the inclusion of bespoke segmentation methods, while maintaining its main purpose of being a standard testing platform for newly developed PET-AS methods. An example implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation template, and CT scans, was built into the PETASset application database. Specific metrics such as Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S) were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built. 
The variation of the metric agreement values with the reference contours across the PET-AS methods evaluated for demonstration was between 0.51 and 0.83, 0.44 and 0.86, and 0.61 and 1.00 for DSC, PPV, and the S metric, respectively. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state of the art. PETASset provides a platform that allows standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and contribute with more evaluation datasets. © 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
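The three metrics named above have standard definitions on binary segmentation masks; a minimal NumPy sketch of those definitions (not PETASset's actual code):

```python
import numpy as np

def segmentation_metrics(pred, ref):
    """Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV)
    and Sensitivity (S) for binary segmentation masks of equal shape."""
    pred = np.asarray(pred, bool)
    ref = np.asarray(ref, bool)
    tp = np.logical_and(pred, ref).sum()       # voxels segmented by both
    dsc = 2.0 * tp / (pred.sum() + ref.sum())  # overlap vs. combined size
    ppv = tp / pred.sum()                      # fraction of predicted voxels correct
    sens = tp / ref.sum()                      # fraction of reference voxels found
    return dsc, ppv, sens

dsc, ppv, sens = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

In real use the masks would be 3D voxel arrays rasterized from the PET-AS and reference contours.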

  18. Inclusion of Highest Glasgow Coma Scale Motor Component Score in Mortality Risk Adjustment for Benchmarking of Trauma Center Performance.

    PubMed

    Gomez, David; Byrne, James P; Alali, Aziz S; Xiong, Wei; Hoeft, Chris; Neal, Melanie; Subacius, Harris; Nathens, Avery B

    2017-12-01

    The Glasgow Coma Scale (GCS) is the most widely used measure of traumatic brain injury (TBI) severity. Currently, the arrival GCS motor component (mGCS) score is used in risk-adjustment models for external benchmarking of mortality. However, there is evidence that the highest mGCS score in the first 24 hours after injury might be a better predictor of death. Our objective was to evaluate the impact of including the highest mGCS score on the performance of risk-adjustment models and subsequent external benchmarking results. Data were derived from the Trauma Quality Improvement Program analytic dataset (January 2014 through March 2015) and were limited to the severe TBI cohort (16 years or older, isolated head injury, GCS ≤8). Risk-adjustment models were created that varied in the mGCS covariates only (initial score, highest score, or both initial and highest mGCS scores). Model performance and fit, as well as external benchmarking results, were compared. There were 6,553 patients with severe TBI across 231 trauma centers included. Initial and highest mGCS scores were different in 47% of patients (n = 3,097). Model performance and fit improved when both initial and highest mGCS scores were included, as evidenced by improved C-statistic, Akaike Information Criterion, and adjusted R-squared values. Three-quarters of centers changed their adjusted odds ratio decile, 2.6% of centers changed outlier status, and 45% of centers exhibited a ≥0.5-SD change in the odds ratio of death after including highest mGCS score in the model. This study supports the concept that additional clinical information has the potential to not only improve the performance of current risk-adjustment models, but can also have a meaningful impact on external benchmarking strategies. Highest mGCS score is a good potential candidate for inclusion in additional models. Copyright © 2017 American College of Surgeons. Published by Elsevier Inc. All rights reserved.

  19. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition.

    PubMed

    Feng, Peng-Mian; Chen, Wei; Lin, Hao; Chou, Kuo-Chen

    2013-11-01

Heat shock proteins (HSPs) are a type of functionally related proteins present in all living organisms, both prokaryotes and eukaryotes. They play essential roles in protein-protein interactions, such as assisting in the establishment of proper protein conformation and preventing unwanted protein aggregation. Their dysfunction may cause various life-threatening disorders, such as Parkinson's, Alzheimer's, and cardiovascular diseases. Based on their functions, HSPs are usually classified into six families: (i) HSP20 or sHSP, (ii) HSP40 or J-class proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Although considerable progress has been achieved in discriminating HSPs from other proteins, it is still a big challenge to identify HSPs among their six different functional types according to their sequence information alone. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop a high-throughput computational tool in this regard. To take up such a challenge, a predictor called iHSP-PseRAAAC has been developed by incorporating the reduced amino acid alphabet information into the general form of pseudo amino acid composition. One of the remarkable advantages of introducing the reduced amino acid alphabet is that it avoids the notorious dimension disaster, or overfitting, problem in statistical prediction. It was observed that the overall success rate achieved by iHSP-PseRAAAC in identifying the functional types of HSPs among the aforementioned six types was more than 87%, which was derived by the jackknife test on a stringent benchmark dataset in which none of the HSPs included has ≥40% pairwise sequence identity to any other in the same subset. It has not escaped our notice that the reduced amino acid alphabet approach can also be used to investigate other protein classification problems. 
As a user-friendly web server, iHSP-PseRAAAC is accessible to the public at http://lin.uestc.edu.cn/server/iHSP-PseRAAAC. Copyright © 2013 Elsevier Inc. All rights reserved.
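    The core feature idea described above — collapsing the 20-letter amino acid alphabet into a handful of groups before computing composition — can be sketched in a few lines. The 7-group mapping below is purely illustrative; the reductions actually evaluated for iHSP-PseRAAAC are chosen systematically and differ from this one:

    ```python
    from collections import Counter

    # Hypothetical 7-group reduction of the 20-letter amino acid alphabet.
    # (Illustration only; not the grouping used by iHSP-PseRAAAC.)
    GROUPS = {"AGV": "a", "ILFP": "b", "YMTS": "c", "HNQW": "d",
              "RK": "e", "DE": "f", "C": "g"}
    AA_TO_GROUP = {aa: g for aas, g in GROUPS.items() for aa in aas}

    def reduced_composition(seq):
        """Normalized composition of a sequence over the reduced alphabet."""
        letters = [AA_TO_GROUP[aa] for aa in seq if aa in AA_TO_GROUP]
        counts = Counter(letters)
        n = len(letters)
        return {g: counts.get(g, 0) / n
                for g in sorted(set(AA_TO_GROUP.values()))}

    comp = reduced_composition("MKVLHAGDERK")
    ```

    A 7-dimensional vector of this kind (optionally combined with sequence-order terms, as in pseudo amino acid composition) replaces the much higher-dimensional raw feature space, which is exactly how the dimension-disaster problem is sidestepped.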

  20. iNR-PhysChem: A Sequence-Based Predictor for Identifying Nuclear Receptors and Their Subfamilies via Physical-Chemical Property Matrix

    PubMed Central

    Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen

    2012-01-01

Nuclear receptors (NRs) form a family of ligand-activated transcription factors that regulate a wide variety of biological processes, such as homeostasis, reproduction, development, and metabolism. The human genome contains 48 genes encoding NRs. These receptors have become one of the most important targets for therapeutic drug development. According to their different action mechanisms or functions, NRs have been classified into seven subfamilies. With the avalanche of protein sequences generated in the postgenomic age, we are facing the following challenging problems. Given an uncharacterized protein sequence, how can we identify whether it is a nuclear receptor? If it is, which subfamily does it belong to? To address these problems, we developed a predictor called iNR-PhysChem in which the protein samples were expressed by a novel mode of pseudo amino acid composition (PseAAC) whose components were derived from a physical-chemical matrix via a series of auto-covariance and cross-covariance transformations. It was observed that the overall success rate achieved by iNR-PhysChem was over 98% in identifying NRs or non-NRs, and over 92% in identifying NRs among the following seven subfamilies: NR1 (thyroid hormone-like), NR2 (HNF4-like), NR3 (estrogen-like), NR4 (nerve growth factor IB-like), NR5 (fushi tarazu-F1-like), NR6 (germ cell nuclear factor-like), and NR0 (knirps-like). These rates were derived by the jackknife tests on a stringent benchmark dataset in which none of the protein sequences included has pairwise sequence identity to any other in the same subset. As a user-friendly web-server, iNR-PhysChem is freely accessible to the public at either http://www.jci-bioinfo.cn/iNR-PhysChem or http://icpr.jci.edu.cn/bioinfo/iNR-PhysChem. Also a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics involved in developing the predictor. 
It is anticipated that iNR-PhysChem may become a useful high-throughput tool for both basic research and drug design. PMID:22363503

  1. Post-LHC7 fine-tuning in the minimal supergravity/CMSSM model with a 125 GeV Higgs boson

    NASA Astrophysics Data System (ADS)

    Baer, Howard; Barger, Vernon; Huang, Peisi; Mickelson, Dan; Mustafayev, Azar; Tata, Xerxes

    2013-02-01

    The recent discovery of a 125 GeV Higgs-like resonance at LHC, coupled with the lack of evidence for weak scale supersymmetry (SUSY), has severely constrained SUSY models such as minimal supergravity (mSUGRA)/CMSSM. As LHC probes deeper into SUSY model parameter space, the little hierarchy problem—how to reconcile the Z and Higgs boson mass scale with the scale of SUSY breaking—will become increasingly exacerbated unless a sparticle signal is found. We evaluate two different measures of fine-tuning in the mSUGRA/CMSSM model. The more stringent of these, ΔHS, includes effects that arise from the high-scale origin of the mSUGRA parameters while the second measure, ΔEW, is determined only by weak scale parameters: hence, it is universal to any model with the same particle spectrum and couplings. Our results incorporate the latest constraints from LHC7 sparticle searches, LHCb limits from Bs→μ+μ- and also require a light Higgs scalar with mh˜123-127GeV. We present fine-tuning contours in the m0 vs m1/2 plane for several sets of A0 and tan⁡β values. We also present results for ΔHS and ΔEW from a scan over the entire viable model parameter space. We find a ΔHS≳103, or at best 0.1%, fine-tuning. For the less stringent electroweak fine-tuning, we find ΔEW≳102, or at best 1%, fine-tuning. Two benchmark points are presented that have the lowest values of ΔHS and ΔEW. Our results provide a quantitative measure for ascertaining whether or not the remaining mSUGRA/CMSSM model parameter space is excessively fine-tuned and so could provide impetus for considering alternative SUSY models.
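    For context, the electroweak measure referred to above is conventionally defined from the tree-level Z-mass minimization condition (radiative corrections are omitted here for brevity; this is the standard form from the fine-tuning literature, not a result specific to this scan):

    ```latex
    \Delta_{\mathrm{EW}} \equiv \max_i \left|C_i\right| \big/ (m_Z^2/2), \qquad
    C_{H_u} = -\frac{m_{H_u}^2 \tan^2\beta}{\tan^2\beta - 1}, \quad
    C_{H_d} = \frac{m_{H_d}^2}{\tan^2\beta - 1}, \quad
    C_\mu = -\mu^2 .
    ```

    On this scale, ΔEW ≳ 10² corresponds to fine-tuning at the ≲1% level, consistent with the numbers quoted above.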

  2. BEST: Improved Prediction of B-Cell Epitopes from Antigen Sequences

    PubMed Central

    Gao, Jianzhao; Faraggi, Eshel; Zhou, Yaoqi; Ruan, Jishou; Kurgan, Lukasz

    2012-01-01

    Accurate identification of immunogenic regions in a given antigen chain is a difficult and actively pursued problem. Although accurate predictors for T-cell epitopes are already in place, the prediction of B-cell epitopes requires further research. We overview the available approaches for the prediction of B-cell epitopes and propose a novel and accurate sequence-based solution. Our BEST (B-cell Epitope prediction using Support vector machine Tool) method predicts epitopes from antigen sequences, in contrast to some methods that predict only from short sequence fragments, using a new architecture based on averaging selected scores generated from sliding 20-mers by a Support Vector Machine (SVM). The SVM predictor utilizes a comprehensive and custom-designed set of inputs generated by combining information derived from the chain, sequence conservation, similarity to known (training) epitopes, and predicted secondary structure and relative solvent accessibility. Empirical evaluation on benchmark datasets demonstrates that BEST outperforms several modern sequence-based B-cell epitope predictors, including ABCPred, the method by Chen et al. (2007), BCPred, COBEpro, BayesB, and CBTOPE, when considering predictions from antigen chains and from chain fragments. Our method obtains a cross-validated area under the receiver operating characteristic curve (AUC) for the fragment-based prediction of 0.81 and 0.85, depending on the dataset. The AUCs of BEST on the benchmark sets of full antigen chains equal 0.57 and 0.6, which are, respectively, significantly and slightly better than the next best method we tested. We also present case studies to contrast the propensity profiles generated by BEST and several other methods. PMID:22761950
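    The windowed-averaging architecture described above can be sketched generically: score every 20-mer with some classifier, then assign each residue the mean of the scores of the windows covering it. The `score_window` callable here is a toy stand-in for the trained SVM (the real BEST also averages only a selected subset of scores):

    ```python
    def residue_scores(seq, score_window, k=20):
        """Per-residue propensity: average the scores of all sliding
        k-mers that cover each residue."""
        totals = [0.0] * len(seq)
        counts = [0] * len(seq)
        for i in range(len(seq) - k + 1):
            s = score_window(seq[i:i + k])  # one score per window
            for j in range(i, i + k):
                totals[j] += s
                counts[j] += 1
        return [t / c for t, c in zip(totals, counts)]

    # Toy scorer for illustration: fraction of lysines in the window.
    scores = residue_scores("A" * 30, lambda w: w.count("K") / len(w))
    ```

    Smoothing window scores back onto residues is what lets a fragment-trained classifier emit a per-residue profile over a full antigen chain.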

  3. Accurate prediction of bacterial type IV secreted effectors using amino acid composition and PSSM profiles.

    PubMed

    Zou, Lingyun; Nan, Chonghan; Hu, Fuquan

    2013-12-15

    Various human pathogens secrete effector proteins into host cells via the type IV secretion system (T4SS). These proteins play important roles in the interaction between bacteria and hosts. Computational methods for T4SS effector prediction have been developed for screening experimental targets in several isolated bacterial species; however, widely applicable prediction approaches are still unavailable. In this work, four types of distinctive features, namely, amino acid composition, dipeptide composition, position-specific scoring matrix composition, and auto-covariance transformation of the position-specific scoring matrix, were calculated from primary sequences. A classifier, T4EffPred, was developed using the support vector machine with these features and their different combinations for effector prediction. Various theoretical tests were performed on a newly established dataset, and the results were measured with four indexes. We demonstrated that T4EffPred can discriminate IVA and IVB effectors in benchmark datasets with positive rates of 76.7% and 89.7%, respectively. The overall accuracy of 95.9% shows that the present method is accurate for distinguishing T4SS effectors in unidentified sequences. A classifier ensemble was designed to synthesize all single classifiers. Notable performance improvement was observed using this ensemble system in benchmark tests. To demonstrate the model's application, a genome-scale prediction of effectors was performed in Bartonella henselae, an important zoonotic pathogen. A number of putative candidates were distinguished. A web server implementing the prediction method and the source code are both available at http://bioinfo.tmmu.edu.cn/T4EffPred.

  4. Modeling spatial-temporal dynamics of global wetlands: Comprehensive evaluation of a new sub-grid TOPMODEL parameterization and uncertainties

    NASA Astrophysics Data System (ADS)

    Zhang, Z.; Zimmermann, N. E.; Poulter, B.

    2015-12-01

    Simulation of the spatial-temporal dynamics of wetlands is key to understanding the role of wetland biogeochemistry under past and future climate variability. Hydrologic inundation models, such as TOPMODEL, are based on a fundamental parameter known as the compound topographic index (CTI) and provide a computationally cost-efficient approach to simulating global wetland dynamics. However, there remain large discrepancies among the implementations of TOPMODEL in land-surface models (LSMs) and thus in their performance against observations. This study describes new improvements to the TOPMODEL implementation and estimates of global wetland dynamics using the LPJ-wsl DGVM, and quantifies uncertainties by comparing the effects of three digital elevation model products (HYDRO1k, GMTED, and HydroSHEDS) of different spatial resolution and accuracy on simulated inundation dynamics. We found that calibrating TOPMODEL with a benchmark dataset can help to successfully predict the seasonal and interannual variations of wetlands, as well as improve the spatial distribution of wetlands to be consistent with inventories. The HydroSHEDS DEM, using a river-basin scheme for aggregating the CTI, shows the best accuracy for capturing the spatio-temporal dynamics of wetlands among the three DEM products. This study demonstrates the feasibility of capturing the spatial heterogeneity of inundation and estimating seasonal and interannual variations in wetlands by coupling a hydrological module in LSMs with appropriate benchmark datasets. It additionally highlights the importance of an adequate understanding of topographic indices for simulating global wetlands and shows the opportunity to converge wetland estimates in LSMs by identifying the uncertainty associated with existing wetland products.
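    The TOPMODEL idea at the heart of these parameterizations can be made concrete in a few lines: a sub-grid pixel is diagnosed as inundated when its CTI exceeds a threshold tied to the grid-mean water table. This is deliberately schematic (actual implementations, including LPJ-wsl's, differ in functional form and calibration, and the parameter `m` here is a hypothetical placeholder):

    ```python
    import numpy as np

    def inundated_fraction(cti, water_table_depth, m=1.0):
        """Fraction of sub-grid CTI pixels diagnosed as inundated.

        cti: array of sub-grid compound topographic index values
        water_table_depth: grid-mean depth to water table (0 = saturated)
        m: decay/calibration parameter (hypothetical value)
        """
        threshold = np.mean(cti) + water_table_depth / m
        return float(np.mean(cti >= threshold))

    # With the water table at the surface, the wetter half of this
    # toy CTI distribution floods.
    frac = inundated_fraction(np.array([5., 6., 7., 8., 9., 10.]), 0.0)
    ```

    Because the grid cell's wetland fraction is read off the sub-grid CTI distribution, the choice of DEM product and of the aggregation scheme for the CTI directly shapes the simulated inundation, which is the uncertainty the study quantifies.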

  5. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
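    The stability criterion used in the comparison — Tanimoto distance between feature sets selected on different runs or folds — is simple to state, assuming the usual set formulation:

    ```python
    from itertools import combinations

    def tanimoto_distance(a, b):
        """1 - |A ∩ B| / |A ∪ B| for two selected-feature sets."""
        a, b = set(a), set(b)
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0

    def stability(feature_sets):
        """Mean pairwise Tanimoto distance over repeated selections;
        0 means the same features were selected every time."""
        pairs = list(combinations(feature_sets, 2))
        return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)
    ```

    A low mean pairwise distance indicates a selection procedure that returns a reproducible biomarker panel, which is the property the study scores alongside accuracy and run-time.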

  6. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks.

    PubMed

    Jung, Kenneth; LePendu, Paea; Iyer, Srinivasan; Bauer-Mehren, Anna; Percha, Bethany; Shah, Nigam H

    2015-01-01

    The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug-drug interactions, and learning used-to-treat relationships between drugs and indications. We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.
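    A dictionary-based recognizer of the kind being compared here is, at its core, exhaustive string matching against a term list. A minimal sketch (the NCBO Annotator additionally performs term normalization and maps matches to ontology concepts; the example note and terms are hypothetical):

    ```python
    import re

    def dictionary_annotate(text, terms):
        """Case-insensitive, word-bounded dictionary matching.
        Returns (start, end, term) spans sorted by position."""
        hits = []
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                hits.append((m.start(), m.end(), term))
        return sorted(hits)

    note = "Patient denies diabetes; family history of Diabetes mellitus."
    hits = dictionary_annotate(note, ["diabetes", "diabetes mellitus"])
    ```

    Note what the sketch does not do: it has no notion of negation ("denies diabetes") or family-history context, which is precisely the linguistic understanding that systems like REVEAL add and that the paper's trade-off concerns.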

  7. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    PubMed

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
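    Two pieces of this search are easy to make concrete: the majority-voting combiner and the fitness of a candidate ensemble encoded as a bit mask over the classifier pool. A minimal sketch (the actual GA-EoC fitness uses 10-fold cross-validation rather than the fixed toy validation set shown here):

    ```python
    from collections import Counter

    def majority_vote(predictions):
        """Combine per-classifier label lists by simple majority voting."""
        return [Counter(p[i] for p in predictions).most_common(1)[0][0]
                for i in range(len(predictions[0]))]

    def ensemble_fitness(mask, predictions, y_true):
        """Accuracy of the sub-ensemble a GA chromosome (bit mask) selects."""
        chosen = [p for bit, p in zip(mask, predictions) if bit]
        if not chosen:
            return 0.0  # an empty ensemble gets the worst fitness
        voted = majority_vote(chosen)
        return sum(v == t for v, t in zip(voted, y_true)) / len(y_true)

    # Three toy base classifiers' predictions on a 3-sample validation set.
    preds = [[0, 1, 1], [0, 0, 1], [1, 1, 1]]
    y_true = [0, 1, 1]
    ```

    The genetic algorithm then evolves a population of such bit masks, with crossover and mutation, towards the mask maximizing this fitness.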

  8. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification

    PubMed Central

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers’ decisions into the ensemble’s output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)−k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911

  9. Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks

    PubMed Central

    Yamanaka, Ryota; Kitano, Hiroaki

    2013-01-01

    Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, which can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet, integrating only high-performance algorithms, provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful for determining potentially optimal algorithms for reconstruction of unknown regulatory networks; i.e., if the expression data associated with a known regulatory network are similar to those associated with an unknown regulatory network, the optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study, provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007

  10. Adsorption structures and energetics of molecules on metal surfaces: Bridging experiment and theory

    NASA Astrophysics Data System (ADS)

    Maurer, Reinhard J.; Ruiz, Victor G.; Camarillo-Cisneros, Javier; Liu, Wei; Ferri, Nicola; Reuter, Karsten; Tkatchenko, Alexandre

    2016-05-01

    Adsorption geometry and stability of organic molecules on surfaces are key parameters that determine the observable properties and functions of hybrid inorganic/organic systems (HIOSs). Despite many recent advances in precise experimental characterization and improvements in first-principles electronic structure methods, reliable databases of structures and energetics for large adsorbed molecules are largely absent. In this review, we present such a database for a range of molecules adsorbed on metal single-crystal surfaces. The systems we analyze include noble-gas atoms, conjugated aromatic molecules, carbon nanostructures, and heteroaromatic compounds adsorbed on five different metal surfaces. The overall objective is to establish a diverse benchmark dataset that enables an assessment of current and future electronic structure methods, and motivates further experimental studies that provide ever more reliable data. Specifically, the benchmark structures and energetics from experiment are here compared with the recently developed van der Waals (vdW) inclusive density-functional theory (DFT) method, DFT + vdWsurf. In comparison to 23 adsorption heights and 17 adsorption energies from experiment, we find a mean average deviation of 0.06 Å and 0.16 eV, respectively. This confirms the DFT + vdWsurf method as an accurate and efficient approach to treat HIOSs. A detailed discussion identifies remaining challenges to be addressed in future development of electronic structure methods, for which the here presented benchmark database may serve as an important reference.

  11. Stringency and relaxation among the halobacteria.

    PubMed Central

    Cimmino, C; Scoarughi, G L; Donini, P

    1993-01-01

    Accumulation of stable RNA and production of guanosine polyphosphates (ppGpp and pppGpp) were studied during amino acid starvation in four species of halobacteria. In two of the four species, stable RNA was under stringent control, whereas one of the remaining two species was relaxed and the other gave an intermediate phenotype. The stringent reaction was reversed by anisomycin, an effect analogous to the chloramphenicol-induced reversal of stringency in the eubacteria. During the stringent response, neither ppGpp nor pppGpp accumulation took place during starvation. In both growing and starved cells a very low basal level of the two polyphosphates appeared to be present. In the stringent species the intracellular concentration of GTP did not diminish but actually increased during the course of the stringent response. These data demonstrate that (i) wild-type halobacteria can have either the stringent or the relaxed phenotype (all wild-type eubacteria tested have been shown to be stringent); (ii) stringency in the halobacteria is dependent on the deaminoacylation of tRNA, as in the eubacteria; and (iii) in the halobacteria, ppGpp is not an effector of stringent control over stable-RNA synthesis. PMID:7691798

  12. Photosynthetic productivity and its efficiencies in ISIMIP2a biome models: benchmarking for impact assessment studies

    NASA Astrophysics Data System (ADS)

    Ito, Akihiko; Nishina, Kazuya; Reyer, Christopher P. O.; François, Louis; Henrot, Alexandra-Jane; Munhoven, Guy; Jacquemin, Ingrid; Tian, Hanqin; Yang, Jia; Pan, Shufen; Morfopoulos, Catherine; Betts, Richard; Hickler, Thomas; Steinkamp, Jörg; Ostberg, Sebastian; Schaphoff, Sibyll; Ciais, Philippe; Chang, Jinfeng; Rafique, Rashid; Zeng, Ning; Zhao, Fang

    2017-08-01

    Simulating vegetation photosynthetic productivity (or gross primary production, GPP) is a critical feature of the biome models used for impact assessments of climate change. We conducted a benchmarking of global GPP simulated by eight biome models participating in the second phase of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP2a) with four meteorological forcing datasets (30 simulations), using independent GPP estimates and recent satellite data of solar-induced chlorophyll fluorescence as a proxy of GPP. The simulated global terrestrial GPP ranged from 98 to 141 Pg C yr-1 (1981-2000 mean); considerable inter-model and inter-data differences were found. Major features of spatial distribution and seasonal change of GPP were captured by each model, showing good agreement with the benchmarking data. All simulations showed incremental trends of annual GPP, seasonal-cycle amplitude, radiation-use efficiency, and water-use efficiency, mainly caused by the CO2 fertilization effect. The incremental slopes were higher than those obtained by remote sensing studies, but comparable with those by recent atmospheric observation. Apparent differences were found in the relationship between GPP and incoming solar radiation, for which forcing data differed considerably. The simulated GPP trends co-varied with a vegetation structural parameter, leaf area index, at model-dependent strengths, implying the importance of constraining canopy properties. In terms of extreme events, GPP anomalies associated with a historical El Niño event and large volcanic eruption were not consistently simulated in the model experiments due to deficiencies in both forcing data and parameterized environmental responsiveness. Although the benchmarking demonstrated the overall advancement of contemporary biome models, further refinements are required, for example, for solar radiation data and vegetation canopy schemes.

  13. The Heated Halo for Space-Based Blackbody Emissivity Measurement

    NASA Astrophysics Data System (ADS)

    Gero, P.; Taylor, J. K.; Best, F. A.; Revercomb, H. E.; Garcia, R. K.; Adler, D. P.; Ciganovich, N. N.; Knuteson, R. O.; Tobin, D. C.

    2012-12-01

    The accuracy of radiance measurements with space-based infrared spectrometers is contingent on the quality of the calibration subsystem, as well as knowledge of its uncertainty. Upcoming climate benchmark missions call for measurement uncertainties better than 0.1 K (k=3) in radiance temperature for the detection of spectral climate signatures. Blackbody cavities impart the most accurate calibration for spaceborne infrared sensors, provided that their temperature and emissivity are traceably determined on-orbit. The On-Orbit Absolute Radiance Standard (OARS) has been developed at the University of Wisconsin and has undergone further refinement under the NASA Instrument Incubator Program (IIP) to meet the stringent requirements of the next generation of infrared remote sensing instruments. It provides on-orbit determination of both traceable temperature and emissivity for calibration blackbodies. The Heated Halo is the component of the OARS that provides a robust and compact method to measure the spectral emissivity of a blackbody in situ. A carefully baffled thermal source is placed in front of a blackbody in an infrared spectrometer system, and the combined radiance of the blackbody and Heated Halo reflection is observed. Knowledge of key temperatures and the viewing geometry allow the blackbody cavity spectral emissivity to be calculated. We present the results from the Heated Halo methodology implemented with a new Absolute Radiance Interferometer (ARI), which is a prototype space-based infrared spectrometer designed for climate benchmarking. We show the evolution of the technical readiness level of this technology and we compare our findings to models and other experimental methods of emissivity determination.
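    The quantity being solved for can be illustrated with the simplest two-source radiance balance: the spectrometer sees cavity emission plus halo radiance reflected once off the cavity, L = εB(T_cav) + (1 − ε)B(T_halo), so ε follows from the measured radiance and the two temperatures. This is a deliberately simplified sketch of the Heated Halo principle; the actual analysis accounts for viewing geometry, baffling, and background terms:

    ```python
    import math

    H, C, KB = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # SI constants

    def planck(sigma, T):
        """Planck spectral radiance at wavenumber sigma (m^-1) and T (K)."""
        return 2.0 * H * C**2 * sigma**3 / math.expm1(H * C * sigma / (KB * T))

    def cavity_emissivity(L_obs, T_cav, T_halo, sigma):
        """Invert L_obs = eps*B(T_cav) + (1 - eps)*B(T_halo) for eps."""
        b_cav, b_halo = planck(sigma, T_cav), planck(sigma, T_halo)
        return (L_obs - b_halo) / (b_cav - b_halo)

    # Round-trip check at 1000 cm^-1 (1e5 m^-1) with an assumed eps = 0.999.
    eps_true = 0.999
    L = eps_true * planck(1e5, 290.0) + (1.0 - eps_true) * planck(1e5, 350.0)
    eps = cavity_emissivity(L, 290.0, 350.0, 1e5)
    ```

    Heating the halo well above the cavity temperature makes the reflected term large enough to separate from cavity emission, which is what makes an in-situ, spectrally resolved emissivity retrieval practical.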

  14. Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets.

    PubMed

    García-Jacas, César R; Contreras-Torres, Ernesto; Marrero-Ponce, Yovani; Pupo-Meriño, Mario; Barigye, Stephen J; Cabrera-Leyva, Lisset

    2016-01-01

    Recently, novel 3D alignment-free molecular descriptors (also known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies aimed at assessing the quality of these novel descriptors have been performed. However, a deeper analysis of their performance is necessary. Therefore, in the present manuscript an assessment and statistical validation of the performance of these novel descriptors in QSAR studies is performed. To this end, eight molecular datasets (angiotensin converting enzyme, acetylcholinesterase inhibitors, benzodiazepine receptor, cyclooxygenase-2 inhibitors, dihydrofolate reductase inhibitors, glycogen phosphorylase b, thermolysin inhibitors, thrombin inhibitors) widely used as benchmarks in the evaluation of several procedures are utilized. Three- to nine-variable QSAR models based on Multiple Linear Regression are built for each chemical dataset according to the original division into training/test sets. Comparisons with respect to leave-one-out cross-validation correlation coefficients [Formula: see text] reveal that the models based on QuBiLS-MIDAS indices possess superior predictive ability in 7 of the 8 datasets analyzed, outperforming methodologies based on similar or more complex techniques such as Partial Least Squares, Neural Networks, Support Vector Machines and others. On the other hand, superior external correlation coefficients [Formula: see text] are attained in 6 of the 8 test sets considered, confirming the good predictive power of the obtained models. For the [Formula: see text] values, non-parametric statistical tests were performed, which demonstrated that the models based on QuBiLS-MIDAS indices have the best global performance and yield significantly better predictions in 11 of the 12 QSAR procedures used in the comparison. Lastly, a study concerning the performance of the indices according to several conformer generation methods was performed. This demonstrated that the quality of predictions of the QSAR models based on QuBiLS-MIDAS indices depends on the 3D structure generation method considered, although in this preliminary study the results achieved do not present significant statistical differences among them. In conclusion, it can be stated that the QuBiLS-MIDAS indices are suitable for extracting structural information from molecules and thus constitute a promising alternative for building models that contribute to the prediction of pharmacokinetic, pharmacodynamic and toxicological properties of novel compounds. Graphical abstract: Comparative graphical representation of the performance of the novel QuBiLS-MIDAS 3D-MDs with respect to other methodologies in QSAR modeling of eight chemical datasets.
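    The key validation statistic used here, the leave-one-out cross-validated correlation coefficient (commonly written q²), can be sketched for a Multiple Linear Regression model in a few lines. The one-descriptor dataset below is hypothetical; the QuBiLS-MIDAS studies use their own descriptor matrices:

    ```python
    import numpy as np

    def q2_loo(X, y):
        """Leave-one-out cross-validated R^2 (q^2) for an ordinary
        least-squares multiple linear regression model."""
        n = len(y)
        X1 = np.column_stack([np.ones(n), X])  # add intercept column
        preds = np.empty(n)
        for i in range(n):
            mask = np.arange(n) != i
            beta, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
            preds[i] = X1[i] @ beta
        ss_res = np.sum((y - preds) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    # Hypothetical one-descriptor dataset with an exact linear activity.
    X = np.arange(6, dtype=float).reshape(-1, 1)
    y = 2.0 * X[:, 0] + 1.0
    q2 = q2_loo(X, y)
    ```

    Because each compound is predicted by a model that never saw it, q² penalizes overfitted descriptor sets in a way the ordinary training R² cannot, which is why QSAR comparisons report it.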

  15. Extreme learning machine for ranking: generalization analysis and applications.

    PubMed

    Chen, Hong; Peng, Jiangtao; Zhou, Yicong; Li, Luoqing; Pan, Zhibin

    2014-05-01

    The extreme learning machine (ELM) has attracted increasing attention recently with its successful applications in classification and regression. In this paper, we investigate the generalization performance of ELM-based ranking. A new regularized ranking algorithm is proposed based on the combinations of activation functions in ELM. The generalization analysis is established for the ELM-based ranking (ELMRank) in terms of the covering numbers of hypothesis space. Empirical results on the benchmark datasets show the competitive performance of the ELMRank over the state-of-the-art ranking methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
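
The core ELM construction this record builds on is simple enough to sketch: hidden-layer weights are drawn at random and never trained, and only the output weights are solved for by regularized least squares. The sketch below is a minimal regression version of this idea, not the ELMRank algorithm itself (a ranking variant would additionally train on score differences of instance pairs); the layer sizes and toy target are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=50, reg=1e-3):
    """Fit a regularized ELM regressor: random hidden layer, ridge-solved output."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    # Regularized least squares for the output weights -- the only trained part
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy check: fit y = sin(x) on [0, pi]
X = np.linspace(0, np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_fit(X, y)
err = np.max(np.abs(elm_predict(X, W, b, beta) - y))
```

Because only the linear output layer is trained, fitting reduces to solving a single n_hidden x n_hidden linear system, which is what makes ELM-style methods fast.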

  16. Electron collisions with atoms, ions, molecules, and surfaces: Fundamental science empowering advances in technology

    PubMed Central

    Bartschat, Klaus; Kushner, Mark J.

    2016-01-01

    Electron collisions with atoms, ions, molecules, and surfaces are critically important to the understanding and modeling of low-temperature plasmas (LTPs), and thus to the development of technologies based on LTPs. Recent progress in obtaining experimental benchmark data and the development of highly sophisticated computational methods is highlighted. With the cesium-based diode-pumped alkali laser and remote plasma etching of Si3N4 as examples, we demonstrate how accurate and comprehensive datasets for electron collisions enable complex modeling of plasma-based technologies that empower our high-technology-based society. PMID:27317740

  17. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528

  18. Event recognition in personal photo collections via multiple instance learning-based classification of multiple images

    NASA Astrophysics Data System (ADS)

    Ahmad, Kashif; Conci, Nicola; Boato, Giulia; De Natale, Francesco G. B.

    2017-11-01

    Over the last few years, a rapid growth has been witnessed in the number of digital photos produced per year. This rapid growth poses challenges for the organization and management of multimedia collections, and one viable solution consists of arranging the media on the basis of the underlying events. However, album-level annotation and the presence of irrelevant pictures in photo collections make event-based organization of personal photo albums a challenging task. To tackle these challenges, in contrast to conventional approaches relying on supervised learning, we propose a pipeline for event recognition in personal photo collections relying on a multiple instance learning (MIL) strategy. MIL is a modified form of supervised learning that is well suited to applications with weakly labeled data. The experimental evaluation of the proposed approach is carried out on two large-scale datasets, a self-collected and a benchmark dataset. On both, our approach significantly outperforms the existing state of the art.

  19. Robust QRS peak detection by multimodal information fusion of ECG and blood pressure signals.

    PubMed

    Ding, Quan; Bai, Yong; Erol, Yusuf Bugra; Salas-Boni, Rebeca; Zhang, Xiaorong; Hu, Xiao

    2016-11-01

    QRS peak detection is a challenging problem when the ECG signal is corrupted. However, additional physiological signals may also provide information about the QRS position. In this study, we focus on a unique benchmark provided by the PhysioNet/Computing in Cardiology Challenge 2014 and the Physiological Measurement focus issue on robust detection of heart beats in multimodal data, which aimed to explore robust methods for QRS detection in multimodal physiological signals. A dataset of 200 training and 210 testing records is used, where the testing records are hidden and used for evaluating performance only. An information fusion framework for robust QRS detection is proposed by leveraging existing ECG and ABP analysis tools and combining heart beats derived from different sources. Results show that our approach achieves an overall accuracy of 90.94% and 88.66% on the training and testing datasets, respectively. Furthermore, we observe the expected performance at each step of the proposed approach, as evidence of its effectiveness. Discussion of the limitations of our approach is also provided.
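
As a toy illustration of combining heart beats derived from different sources, the sketch below starts from ECG-derived beat times and adds ABP-derived beats only where the ECG detector found nothing nearby. This is a deliberately naive stand-in for the paper's fusion framework; the beat times and the 150 ms tolerance are assumptions.

```python
def fuse_beats(ecg_beats, abp_beats, tol=0.15):
    """Keep all ECG beats; add ABP beats with no ECG beat within tol seconds."""
    fused = list(ecg_beats)
    for t in abp_beats:
        if all(abs(t - e) > tol for e in ecg_beats):
            fused.append(t)  # ECG likely missed this beat; trust the ABP channel
    return sorted(fused)

# Hypothetical beat times (seconds): the ECG channel misses the beat near t = 2.4
beats = fuse_beats([0.8, 1.6, 3.2], [0.82, 2.41, 3.18])  # -> [0.8, 1.6, 2.41, 3.2]
```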

  20. Effect of the time window on the heat-conduction information filtering model

    NASA Astrophysics Data System (ADS)

    Guo, Qiang; Song, Wen-Jun; Hou, Lei; Zhang, Yi-Lu; Liu, Jian-Guo

    2014-05-01

    Recommendation systems have been proposed to uncover the potential tastes and preferences of ordinary online users; however, an understanding of how the time window affects their performance has been missing, even though it is critical for saving memory and decreasing computational complexity. In this paper, by gradually expanding the time window, we investigate the impact of the time window on the heat-conduction information filtering model with ten similarity measures. The experimental results on the benchmark dataset Netflix indicate that by only using approximately 11.11% of the recent rating records, the accuracy could be improved by an average of 33.16% and the diversity by 30.62%. In addition, the recommendation performance on the MovieLens dataset could be preserved by only considering approximately 10.91% of the recent records. Since they improve recommendation performance while largely reducing computational time and data storage space, these findings have significant practical value.
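
The time-window idea itself is just a truncation of the rating history before training. A minimal sketch, with hypothetical timestamped records:

```python
from datetime import datetime

# Hypothetical (user, item, rating, timestamp) records
ratings = [
    ("u1", "i1", 4, datetime(2005, 1, 10)),
    ("u1", "i2", 5, datetime(2005, 6, 1)),
    ("u2", "i1", 3, datetime(2005, 3, 15)),
    ("u2", "i3", 2, datetime(2005, 9, 20)),
    ("u3", "i2", 5, datetime(2005, 11, 5)),
]

def recent_window(records, fraction):
    """Keep only the most recent `fraction` of records, ordered by timestamp."""
    ordered = sorted(records, key=lambda r: r[3])
    k = max(1, round(len(ordered) * fraction))
    return ordered[-k:]

window = recent_window(ratings, 0.4)  # roughly the most recent 40% of ratings
```

The filtering model is then trained on `window` instead of the full history, which is where the memory and computation savings come from.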

  1. Improved hybrid information filtering based on limited time window

    NASA Astrophysics Data System (ADS)

    Song, Wen-Jun; Guo, Qiang; Liu, Jian-Guo

    2014-12-01

    Adopting users' entire collection records, the hybrid information filtering of heat conduction and mass diffusion (HHM) (Zhou et al., 2010) was proposed to solve the apparent diversity-accuracy dilemma. Since recent behaviors are more effective for capturing users' potential interests, we present an improved hybrid information filtering that adopts only partial recent information. We expand the time window to generate a series of training sets, each of which is treated as known information to predict future links, verified against the testing set. The experimental results on the benchmark dataset Netflix indicate that by only using approximately 31% of the recent rating records, the accuracy could be improved by an average of 4.22% and the diversity by 13.74%. In addition, the performance on the MovieLens dataset could be preserved by considering approximately 60% of the recent records. Furthermore, we find that the improved algorithm is effective in addressing the cold-start problem. This work could improve information filtering performance and shorten computational time.
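
The HHM hybrid referred to above interpolates between heat conduction and mass diffusion with a single parameter lambda. Below is a minimal sketch of the transition matrix as it is commonly written, W_ij = (k_i^-(1-lam) * k_j^-lam) * sum_u a_ui a_uj / k_u, applied to a tiny assumed user-item matrix; the improved algorithm in this record would additionally restrict the matrix to a recent time window.

```python
import numpy as np

# Toy user-item adjacency: A[u, i] = 1 if user u collected item i (assumed data)
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1]])

def hybrid_scores(A, user, lam):
    """Hybrid heat-conduction/mass-diffusion scores (Zhou et al., 2010 style):
    lam=1 recovers pure mass diffusion, lam=0 pure heat conduction."""
    ku = A.sum(axis=1)                # user degrees
    ki = A.sum(axis=0)                # item degrees
    # M[i, j] = sum over users of A[u, i] * A[u, j] / ku[u]
    M = (A / ku[:, None]).T @ A
    W = M / (ki[:, None] ** (1 - lam) * ki[None, :] ** lam)
    return W @ A[user]                # resource spread from the user's items

scores = hybrid_scores(A, user=0, lam=0.5)  # rank uncollected items by score
```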

  2. Processing large remote sensing image data sets on Beowulf clusters

    USGS Publications Warehouse

    Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

    2003-01-01

    High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.

  3. On representation of temporal variability in electricity capacity planning models

    DOE PAGES

    Merrick, James H.

    2016-08-23

    This study systematically investigates how to represent intra-annual temporal variability in models of optimum electricity capacity investment. Inappropriate aggregation of temporal resolution can introduce substantial error into model outputs and associated economic insight. The mechanisms underlying the introduction of this error are shown. How many representative periods are needed to fully capture the variability is then investigated. For a sample dataset, a scenario-robust aggregation of hourly (8760) resolution is possible with on the order of 10 representative hours when electricity demand is the only source of variability. The inclusion of wind and solar supply variability increases the resolution of the robust aggregation to on the order of 1000. A similar scale of expansion is shown for representative days and weeks. These concepts can be applied to any such temporal dataset, providing, at the least, a benchmark that any other aggregation method can aim to emulate. Finally, how prior information about peak pricing hours can potentially reduce resolution further is also discussed.

  4. On representation of temporal variability in electricity capacity planning models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Merrick, James H.

    This study systematically investigates how to represent intra-annual temporal variability in models of optimum electricity capacity investment. Inappropriate aggregation of temporal resolution can introduce substantial error into model outputs and associated economic insight. The mechanisms underlying the introduction of this error are shown. How many representative periods are needed to fully capture the variability is then investigated. For a sample dataset, a scenario-robust aggregation of hourly (8760) resolution is possible with on the order of 10 representative hours when electricity demand is the only source of variability. The inclusion of wind and solar supply variability increases the resolution of the robust aggregation to on the order of 1000. A similar scale of expansion is shown for representative days and weeks. These concepts can be applied to any such temporal dataset, providing, at the least, a benchmark that any other aggregation method can aim to emulate. Finally, how prior information about peak pricing hours can potentially reduce resolution further is also discussed.

  5. A multi-center study benchmarks software tools for label-free proteome quantification

    PubMed Central

    Gillet, Ludovic C; Bernhardt, Oliver M.; MacLean, Brendan; Röst, Hannes L.; Tate, Stephen A.; Tsou, Chih-Chiang; Reiter, Lukas; Distler, Ute; Rosenberger, George; Perez-Riverol, Yasset; Nesvizhskii, Alexey I.; Aebersold, Ruedi; Tenzer, Stefan

    2016-01-01

    The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instruments using different SWATH isolation window setups. For consistent evaluation we developed LFQbench, an R package to calculate metrics of precision and accuracy in label-free quantitative MS, and report the identification performance, robustness and specificity of each software tool. Our reference datasets enabled developers to improve their software tools. After optimization, all tools provided highly convergent identification and reliable quantification performance, underscoring their robustness for label-free quantitative proteomics. PMID:27701404

  6. Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.

    PubMed

    She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng

    2015-01-01

    Motor imagery electroencephalography is widely used in brain-computer interface systems. Due to the inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. In order to solve this problem, a multiclass posterior probability solution for twin SVM is proposed in this paper, based on ranking continuous outputs and pairwise coupling. First, a two-class posterior probability model is constructed to approximate the posterior probability using ranking continuous output techniques and Platt's estimation method. Secondly, a solution for multiclass probabilistic outputs of twin SVM is provided by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and with multiclass posterior probability SVM using different coupling approaches. The efficacy of the proposed method, in terms of classification accuracy and time complexity, is demonstrated on both the UCI benchmark datasets and real-world EEG data from BCI Competition IV Dataset 2a.
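
The two ingredients named above, a Platt-style sigmoid on each pairwise output and pairwise coupling of the resulting probabilities, can be sketched as follows. The sigmoid parameters and decision values are assumptions (in practice A and B are fitted on held-out decision values), and the coupling shown is the simple Price et al. style formula rather than the paper's exact scheme.

```python
import math

def platt(score, A=-1.5, B=0.0):
    """Platt-style sigmoid mapping a decision value to P(class i | i or j)."""
    return 1.0 / (1.0 + math.exp(A * score + B))

def couple(r):
    """Combine pairwise probabilities r[i][j] = P(class i | i or j) into one
    multiclass distribution (Price et al. style pairwise coupling)."""
    k = len(r)
    p = []
    for i in range(k):
        s = sum(1.0 / r[i][j] for j in range(k) if j != i) - (k - 2)
        p.append(1.0 / s)
    z = sum(p)
    return [pi / z for pi in p]

# Toy 3-class example from assumed pairwise decision values (f_ij > 0 favours i)
scores = {(0, 1): 1.2, (0, 2): 0.8, (1, 2): -0.4}
r = [[0.0] * 3 for _ in range(3)]
for (i, j), f in scores.items():
    r[i][j] = platt(f)
    r[j][i] = 1.0 - r[i][j]
probs = couple(r)  # a proper distribution; class 0 wins both its pairings
```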

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pucci, Fabrizio, E-mail: fapucci@ulb.ac.be; Bourgeas, Raphaël, E-mail: rbourgeas@ulb.ac.be; Rooman, Marianne, E-mail: mrooman@ulb.ac.be

    We have set up and manually curated a dataset containing experimental information on the impact of amino acid substitutions in a protein on its thermal stability. It consists of a repository of experimentally measured melting temperatures (Tm) and their changes upon point mutations (ΔTm) for proteins having a well-resolved x-ray structure. This high-quality dataset is designed to be used for the training or benchmarking of in silico thermal stability prediction methods. It also reports other experimentally measured thermodynamic quantities when available, i.e., the folding enthalpy (ΔH) and heat capacity (ΔCp) of the wild-type proteins and their changes upon mutations (ΔΔH and ΔΔCp), as well as the change in folding free energy (ΔΔG) at a reference temperature. These data are analyzed in view of improving our insights into the correlation between thermal and thermodynamic stabilities, the asymmetry between the number of stabilizing and destabilizing mutations, and the difference in stabilization potential of thermostable versus mesostable proteins.

  8. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues.

    PubMed

    Guo, Song; Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one of the prerequisites is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicates that our newly proposed method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme could be powerful in protein functional-residue prediction research.

  9. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues

    PubMed Central

    Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one of the prerequisites is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicates that our newly proposed method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme could be powerful in protein functional-residue prediction research. PMID:27034949

  10. Stylized facts in social networks: Community-based static modeling

    NASA Astrophysics Data System (ADS)

    Jo, Hang-Hyun; Murase, Yohsuke; Török, János; Kertész, János; Kaski, Kimmo

    2018-06-01

    The past analyses of datasets of social networks have enabled us to make empirical findings of a number of aspects of human society, which are commonly featured as stylized facts of social networks, such as broad distributions of network quantities, existence of communities, assortative mixing, and intensity-topology correlations. Since the understanding of the structure of these complex social networks is far from complete, for deeper insight into human society more comprehensive datasets and modeling of the stylized facts are needed. Although the existing dynamical and static models can generate some stylized facts, here we take an alternative approach by devising a community-based static model with heterogeneous community sizes and larger communities having smaller link density and weight. With these few assumptions we are able to generate realistic social networks that show most stylized facts for a wide range of parameters, as demonstrated numerically and analytically. Since our community-based static model is simple to implement and easily scalable, it can be used as a reference system, benchmark, or testbed for further applications.

  11. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    PubMed

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include (i) sequence profiles (PSSM) and (ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models were evaluated using three different benchmarking tests, namely (i) self-consistency, (ii) seven-fold cross-validation and (iii) independent case tests. A maximum prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of the LI1264 and SP1577 datasets when the PSSM+SS(DSSP) input features were used. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and predicted secondary structures from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed at http://bioinfo.bdu.ac.in/~svmpbpred.

  12. Systematic evaluation of deep learning based detection frameworks for aerial imagery

    NASA Astrophysics Data System (ADS)

    Sommer, Lars; Steinmann, Lucas; Schumann, Arne; Beyerer, Jürgen

    2018-04-01

    Object detection in aerial imagery is crucial for many applications in the civil and military domains. In recent years, deep learning based object detection frameworks have significantly outperformed conventional approaches based on hand-crafted features on several datasets. However, these detection frameworks are generally designed and optimized for common benchmark datasets, which differ considerably from aerial imagery, especially in object sizes. As already demonstrated for Faster R-CNN, several adaptations are necessary to account for these differences. In this work, we adapt several state-of-the-art detection frameworks, including Faster R-CNN, R-FCN, and Single Shot MultiBox Detector (SSD), to aerial imagery. We discuss in detail the adaptations that mainly improve the detection accuracy of all frameworks. As the outputs of deeper convolutional layers carry more semantic information, these layers are generally used in detection frameworks as feature maps to locate and classify objects. However, the resolution of these feature maps is insufficient for handling small object instances, which results in inaccurate localization or incorrect classification of small objects. Furthermore, state-of-the-art detection frameworks perform bounding box regression to predict the exact object location, using so-called anchor or default boxes as references. We demonstrate how an appropriate choice of anchor box sizes can considerably improve detection performance. Furthermore, we evaluate the impact of the performed adaptations on two publicly available datasets to account for various ground sampling distances and differing backgrounds. The presented adaptations can be used as a guideline for further datasets or detection frameworks.
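
A simple way to make the anchor choice data-driven, in the spirit of the adaptation described above, is to read anchor sizes off the distribution of ground-truth object sizes instead of using defaults tuned for large-object benchmarks. The object sizes below are hypothetical:

```python
# Hypothetical ground-truth object sizes in pixels; aerial objects are small,
# so default anchors in the hundreds of pixels would rarely match them
box_sizes = [12, 14, 15, 18, 21, 24, 26, 30, 34, 45]

def anchor_sizes(sizes, fracs=(0.1, 0.5, 0.9)):
    """Pick anchor box sizes at chosen quantiles of the observed object sizes."""
    s = sorted(sizes)
    return [s[min(len(s) - 1, int(f * len(s)))] for f in fracs]

anchors = anchor_sizes(box_sizes)  # small/medium/large anchors for this dataset
```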

  13. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of the reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from that of the usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes, which are very common in bioinformatics.

  14. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of the reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from that of the usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes, which are very common in bioinformatics. PMID:22479449

  15. Area-to-point regression kriging for pan-sharpening

    NASA Astrophysics Data System (ADS)

    Wang, Qunming; Shi, Wenzhong; Atkinson, Peter M.

    2016-04-01

    Pan-sharpening is a technique to combine the fine spatial resolution panchromatic (PAN) band with the coarse spatial resolution multispectral bands of the same satellite to create a fine spatial resolution multispectral image. In this paper, area-to-point regression kriging (ATPRK) is proposed for pan-sharpening. ATPRK considers the PAN band as the covariate. Moreover, ATPRK is extended with a local approach, called adaptive ATPRK (AATPRK), which fits a regression model using a local, non-stationary scheme such that the regression coefficients change across the image. The two geostatistical approaches, ATPRK and AATPRK, were compared to the 13 state-of-the-art pan-sharpening approaches summarized in Vivone et al. (2015) in experiments on three separate datasets. ATPRK and AATPRK produced more accurate pan-sharpened images than the 13 benchmark algorithms in all three experiments. Unlike the benchmark algorithms, the two geostatistical solutions precisely preserved the spectral properties of the original coarse data. Furthermore, ATPRK can be enhanced by the local scheme in AATPRK in cases where the spatial character of the residuals from a global regression model varies locally.
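
The key property of ATPRK, that the regression prediction plus the interpolated coarse residuals reproduces the original coarse band exactly, can be illustrated with a much-simplified sketch. Here plain block replication of the residuals stands in for the actual area-to-point kriging step, a global regression stands in for AATPRK's local scheme, and the toy images are synthetic.

```python
import numpy as np

def regression_plus_residual(pan_fine, ms_coarse, scale):
    """Regress the coarse MS band on the degraded PAN, predict at fine
    resolution, then add back the coarse residuals (naive block upsampling
    in place of area-to-point kriging) so coarse-scale means are preserved."""
    h, w = ms_coarse.shape
    # Degrade PAN to the MS resolution by block averaging
    pan_coarse = pan_fine.reshape(h, scale, w, scale).mean(axis=(1, 3))
    # Global linear regression: ms ~ a * pan + b
    a, b = np.polyfit(pan_coarse.ravel(), ms_coarse.ravel(), 1)
    pred_fine = a * pan_fine + b
    # Residuals at the coarse scale, replicated into each fine block
    resid = ms_coarse - (a * pan_coarse + b)
    resid_fine = np.kron(resid, np.ones((scale, scale)))
    return pred_fine + resid_fine

rng = np.random.default_rng(2)
pan = rng.uniform(0, 1, (8, 8))
ms = 0.5 * pan.reshape(4, 2, 4, 2).mean(axis=(1, 3)) + 0.1  # synthetic coarse band
sharp = regression_plus_residual(pan, ms, scale=2)
```

Block-averaging `sharp` back to the coarse grid returns `ms` exactly, which is the coherence (spectral-preservation) property the record highlights.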

  16. Scalable and cost-effective NGS genotyping in the cloud.

    PubMed

    Souilmi, Yassine; Lancaster, Alex K; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared B; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter J; Wall, Dennis P

    2015-10-15

    While next-generation sequencing (NGS) costs have plummeted in recent years, the cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at an economy of scale in the tens of dollars. We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows making optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Our systematic benchmarking reveals important new insights and considerations for achieving clinical turnaround of whole genome analysis, including strategic batching of individual genomes and efficient cluster resource configuration.

  17. ePCR: an R-package for survival and time-to-event prediction in advanced prostate cancer, applied to real-world patient cohorts.

    PubMed

    Laajala, Teemu D; Murtojärvi, Mika; Virkki, Arho; Aittokallio, Tero

    2018-06-15

    Prognostic models are widely used in clinical decision-making, such as risk stratification and tailoring treatment strategies, with the aim to improve patient outcomes while reducing overall healthcare costs. While prognostic models have been adopted into clinical use, benchmarking their performance has been difficult due to lack of open clinical datasets. The recent DREAM 9.5 Prostate Cancer Challenge carried out an extensive benchmarking of prognostic models for metastatic Castration-Resistant Prostate Cancer (mCRPC), based on multiple cohorts of open clinical trial data. We make available an open-source implementation of the top-performing model, ePCR, along with an extended toolbox for its further re-use and development, and demonstrate how to best apply the implemented model to real-world data cohorts of advanced prostate cancer patients. The open-source R-package ePCR and its reference documentation are available at the Comprehensive R Archive Network (CRAN): https://CRAN.R-project.org/package=ePCR. The R vignette provides step-by-step examples for the ePCR usage. Supplementary data are available at Bioinformatics online.

  18. Big heart data: advancing health informatics through data sharing in cardiovascular imaging.

    PubMed

    Suinesiaputra, Avan; Medrano-Gracia, Pau; Cowan, Brett R; Young, Alistair A

    2015-07-01

    The burden of heart disease is rapidly worsening due to the increasing prevalence of obesity and diabetes. Data sharing and open database resources for heart health informatics are important for advancing our understanding of cardiovascular function, disease progression and therapeutics. Data sharing enables valuable information, often obtained at considerable expense and effort, to be reused beyond the specific objectives of the original study. Many government funding agencies and journal publishers are requiring data reuse, and are providing mechanisms for data curation and archival. Tools and infrastructure are available to archive anonymous data from a wide range of studies, from descriptive epidemiological data to gigabytes of imaging data. Meta-analyses can be performed to combine raw data from disparate studies to obtain unique comparisons or to enhance statistical power. Open benchmark datasets are invaluable for validating data analysis algorithms and objectively comparing results. This review provides a rationale for increased data sharing and surveys recent progress in the cardiovascular domain. We also highlight the potential of recent large cardiovascular epidemiological studies enabling collaborative efforts to facilitate data sharing, algorithms benchmarking, disease modeling and statistical atlases.

  19. DeltaSA tool for source apportionment benchmarking, description and sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Pernigotti, D.; Belis, C. A.

    2018-05-01

    DeltaSA is an R-package and a Java on-line tool developed at the EC-Joint Research Centre to assist and benchmark source apportionment applications. Its key functionalities support two critical tasks in such studies: the assignment of a factor to a source in factor analytical models (source identification) and the evaluation of model performance. The source identification is based on the similarity between a given factor and source chemical profiles from public databases. The model performance evaluation is based on statistical indicators used to compare model output with reference values generated in intercomparison exercises. The reference values are calculated as the ensemble average of the results reported by participants that have passed a set of testing criteria based on chemical profile and time series similarity. In this study, a sensitivity analysis of the model performance criteria is accomplished using the results of a synthetic dataset where "a priori" references are available. The consensus-modulated standard deviation punc is the best choice for model performance evaluation when a conservative approach is adopted.

  20. Object Detection and Classification by Decision-Level Fusion for Intelligent Vehicle Systems.

    PubMed

    Oh, Sang-Il; Kang, Hang-Bong

    2017-01-22

    Accurate detection and classification of objects sensed by intelligent vehicle systems are significantly important tasks for understanding driving environments effectively. Object detection is performed for the localization of objects, whereas object classification recognizes object classes from detected object regions. For accurate object detection and classification, fusing information from multiple sensors is a key component of the representation and perception processes. In this paper, we propose a new object-detection and classification method using decision-level fusion. We fuse the classification outputs from independent unary classifiers operating on 3D point clouds and image data using a convolutional neural network (CNN). The unary classifiers for the two sensors are CNNs with five layers, which use more than two pre-trained convolutional layers to consider local to global features as data representation. To represent data using convolutional layers, we apply region of interest (ROI) pooling to the outputs of each layer on the object candidate regions generated using object proposal generation, realizing color flattening and semantic grouping for charge-coupled device and Light Detection And Ranging (LiDAR) sensors. We evaluate our proposed method on the KITTI benchmark dataset to detect and classify three object classes: cars, pedestrians and cyclists. The evaluation results show that the proposed method achieves better performance than previous methods. Our proposed method extracted approximately 500 proposals on a 1226 × 370 image, whereas the original selective search method extracted approximately 10^6 × n proposals. We obtained classification performance with 77.72% mean average precision over the entirety of the classes in the moderate detection level of the KITTI benchmark dataset.
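    The general idea of decision-level fusion can be illustrated with a minimal sketch: each unary classifier emits per-class scores for an object candidate, and the fused decision averages the normalized scores and takes the argmax. This illustrates the fusion concept only; the paper's CNN-based fusion rule and the score values below are invented for the example.

```python
# Minimal decision-level fusion sketch (illustrative, not the paper's exact
# rule): average the normalized per-class scores of two independent unary
# classifiers (camera and LiDAR) and pick the highest-scoring class.

CLASSES = ["car", "pedestrian", "cyclist"]

def normalize(scores):
    total = sum(scores)
    return [s / total for s in scores]

def fuse(camera_scores, lidar_scores):
    cam = normalize(camera_scores)
    lid = normalize(lidar_scores)
    fused = [(c + l) / 2 for c, l in zip(cam, lid)]
    return CLASSES[fused.index(max(fused))], fused

# The camera is unsure between car and cyclist; LiDAR strongly favors car,
# so the fused decision resolves the ambiguity.
label, fused = fuse([0.45, 0.10, 0.45], [0.80, 0.10, 0.10])
print(label)  # car
```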

  1. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.

    PubMed

    Lenselink, Eelke B; Ten Dijke, Niels; Bongers, Brandon; Papadatos, George; van Vlijmen, Herman W T; Kowalczyk, Wojtek; IJzerman, Adriaan P; van Westen, Gerard J P

    2017-08-14

    The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective performance. Deep Neural Networks were the top-performing classifiers, highlighting their added value over other more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better, at almost one standard deviation above the mean performance. Furthermore, multi-task and PCM implementations were shown to improve performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations below the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around the mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with the unoptimized 'DNN_PCM'). Here, the data and protocols are provided, offering a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning.
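    The Matthews Correlation Coefficient used as one of the standardized metrics above is computed from the binary confusion counts. A sketch of the standard formula (the counts are invented for the example):

```python
import math

# Matthews Correlation Coefficient from binary confusion counts (standard
# formula): ranges from -1 (total disagreement) through 0 (random) to +1
# (perfect prediction), and is robust to class imbalance.
def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(tp=90, tn=80, fp=20, fn=10))  # ≈ 0.70
```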

  2. Object Detection and Classification by Decision-Level Fusion for Intelligent Vehicle Systems

    PubMed Central

    Oh, Sang-Il; Kang, Hang-Bong

    2017-01-01

    Accurate detection and classification of objects sensed by sensor-based intelligent vehicle systems are significantly important tasks for understanding driving environments effectively. Object detection is performed for the localization of objects, whereas object classification recognizes object classes from detected object regions. For accurate object detection and classification, fusing information from multiple sensors is a key component of the representation and perception processes. In this paper, we propose a new object-detection and classification method using decision-level fusion. We fuse the classification outputs from independent unary classifiers operating on 3D point clouds and image data using a convolutional neural network (CNN). The unary classifiers for the two sensors are CNNs with five layers, which use more than two pre-trained convolutional layers to consider local to global features as data representation. To represent data using convolutional layers, we apply region of interest (ROI) pooling to the outputs of each layer on the object candidate regions generated using object proposal generation, realizing color flattening and semantic grouping for charge-coupled device and Light Detection And Ranging (LiDAR) sensors. We evaluate our proposed method on the KITTI benchmark dataset to detect and classify three object classes: cars, pedestrians and cyclists. The evaluation results show that the proposed method achieves better performance than previous methods. Our proposed method extracted approximately 500 proposals on a 1226 × 370 image, whereas the original selective search method extracted approximately 10^6 × n proposals. We obtained classification performance with 77.72% mean average precision over the entirety of the classes in the moderate detection level of the KITTI benchmark dataset. PMID:28117742

  3. SparseBeads data: benchmarking sparsity-regularized computed tomography

    NASA Astrophysics Data System (ADS)

    Jørgensen, Jakob S.; Coban, Sophia B.; Lionheart, William R. B.; McDonald, Samuel A.; Withers, Philip J.

    2017-12-01

    Sparsity regularization (SR) such as total variation (TV) minimization allows accurate image reconstruction in x-ray computed tomography (CT) from fewer projections than analytical methods. Exactly how few projections suffice, and how this number may depend on the image, remain poorly understood. Compressive sensing connects the critical number of projections to the image sparsity but does not cover CT; however, empirical results suggest a similar connection. The present work establishes, for real CT data, a connection between gradient sparsity and the sufficient number of projections for accurate TV-regularized reconstruction. A collection of 48 x-ray CT datasets called SparseBeads was designed for benchmarking SR reconstruction algorithms. Beadpacks comprising glass beads of five different sizes, as well as mixtures, were scanned in a micro-CT scanner to provide structured datasets with variable image sparsity levels, numbers of projections and noise levels, allowing the systematic assessment of parameters affecting the performance of SR reconstruction algorithms. Using the SparseBeads data, TV-regularized reconstruction quality was assessed as a function of the number of projections and gradient sparsity. The critical number of projections for satisfactory TV-regularized reconstruction increased almost linearly with the gradient sparsity. This establishes a quantitative guideline from which one may predict how few projections to acquire based on the expected sample sparsity level, as an aid in planning dose- or time-critical experiments. The results are expected to hold for samples of similar characteristics, i.e. consisting of few, distinct phases with relatively simple structure. Such cases are plentiful in porous media, composite materials, foams, as well as non-destructive testing and metrology. For samples of other characteristics the proposed methodology may be used to investigate similar relations.
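    Gradient sparsity, the quantity the projection requirement is tied to above, can be sketched as the number of pixels with a nonzero finite-difference gradient: a piecewise-constant image like a bead phantom has nonzero gradients only at phase boundaries. The toy image below is invented, not SparseBeads data.

```python
# Gradient sparsity sketch: count pixels whose forward finite-difference
# gradient is nonzero. Piecewise-constant images (few distinct phases with
# simple structure) have nonzero gradients only at edges, which is why few
# projections suffice for their TV-regularized reconstruction.

def gradient_sparsity(image, tol=1e-9):
    rows, cols = len(image), len(image[0])
    nonzero = 0
    for i in range(rows):
        for j in range(cols):
            dx = image[i][j + 1] - image[i][j] if j + 1 < cols else 0.0
            dy = image[i + 1][j] - image[i][j] if i + 1 < rows else 0.0
            if abs(dx) > tol or abs(dy) > tol:
                nonzero += 1
    return nonzero

# A tiny piecewise-constant "bead" image: a bright square on a dark background.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(gradient_sparsity(image))  # 7 edge pixels out of 16
```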

  4. Modeling spatial-temporal dynamics of global wetlands: comprehensive evaluation of a new sub-grid TOPMODEL parameterization and uncertainties

    NASA Astrophysics Data System (ADS)

    Zhang, Z.; Zimmermann, N. E.; Poulter, B.

    2015-11-01

    Simulations of the spatial-temporal dynamics of wetlands are key to understanding the role of wetland biogeochemistry under past and future climate variability. Hydrologic inundation models, such as TOPMODEL, are based on a fundamental parameter known as the compound topographic index (CTI) and provide a computationally cost-efficient approach to simulating wetland dynamics at global scales. However, there remain large discrepancies among the implementations of TOPMODEL in land-surface models (LSMs) and thus in their performance against observations. This study describes new improvements to the TOPMODEL implementation and estimates of global wetland dynamics using the LPJ-wsl dynamic global vegetation model (DGVM), and quantifies uncertainties by comparing the effects of three digital elevation model products (HYDRO1k, GMTED, and HydroSHEDS) of different spatial resolution and accuracy on simulated inundation dynamics. In addition, we found that calibrating TOPMODEL with a benchmark wetland dataset can help to successfully delineate the seasonal and interannual variations of wetlands, as well as improve the spatial distribution of wetlands to be consistent with inventories. The HydroSHEDS DEM, using a river-basin scheme for aggregating the CTI, shows the best accuracy among the three DEM products for capturing the spatio-temporal dynamics of wetlands. The estimate of the global wetland potential/maximum is ∼10.3 Mkm2 (1 Mkm2 = 10^6 km2), with a mean annual maximum of ∼5.17 Mkm2 for 1980-2010. This study demonstrates the feasibility of capturing the spatial heterogeneity of inundation and estimating seasonal and interannual variations in wetlands by coupling a hydrological module in LSMs with appropriate benchmark datasets. It additionally highlights the importance of an adequate investigation of topographic indices for simulating global wetlands and shows the opportunity to converge wetland estimates across LSMs by identifying the uncertainty associated with existing wetland products.
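    A TOPMODEL-style sub-grid parameterization of the kind discussed above can be caricatured as follows: a grid cell's inundated fraction is the share of its sub-grid pixels whose CTI exceeds a threshold that rises as the cell's mean water table deepens. This is a schematic sketch with invented values and an invented scaling parameter `m`, not the LPJ-wsl formulation.

```python
# Schematic TOPMODEL-style sub-grid sketch (illustrative only): pixels with a
# high compound topographic index (CTI) accumulate water first, so the
# inundated fraction is the share of sub-grid CTI values above a threshold
# that rises as the mean water table deepens (parameter m is a made-up scale).

def inundated_fraction(cti_values, mean_cti, water_table_depth, m=1.0):
    # Deeper water table -> higher threshold -> fewer pixels inundated.
    threshold = mean_cti + water_table_depth / m
    wet = sum(1 for cti in cti_values if cti >= threshold)
    return wet / len(cti_values)

cti = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]   # sub-grid CTI distribution
mean_cti = sum(cti) / len(cti)            # 9.0
print(inundated_fraction(cti, mean_cti, water_table_depth=0.0))  # 0.5
print(inundated_fraction(cti, mean_cti, water_table_depth=3.0))  # drier: ≈ 0.33
```

    The monotone link between water table depth and inundated fraction is what lets a single prognostic water table variable drive seasonal and interannual wetland dynamics in such schemes.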

  5. A novel binary shape context for 3D local surface description

    NASA Astrophysics Data System (ADS)

    Dong, Zhen; Yang, Bisheng; Liu, Yuan; Liang, Fuxun; Li, Bijun; Zang, Yufu

    2017-08-01

    3D local surface description is now at the core of many computer vision technologies, such as 3D object recognition, intelligent driving, and 3D model reconstruction. However, most existing 3D feature descriptors still suffer from low descriptiveness, weak robustness, and inefficiency in both time and memory. To overcome these challenges, this paper presents a robust and descriptive 3D Binary Shape Context (BSC) descriptor with high efficiency in both time and memory. First, a novel BSC descriptor is generated for 3D local surface description, and the performance of the BSC descriptor under different settings of its parameters is analyzed. Next, the descriptiveness, robustness, and efficiency in both time and memory of the BSC descriptor are evaluated and compared to those of several state-of-the-art 3D feature descriptors. Finally, the performance of the BSC descriptor for 3D object recognition is evaluated on a number of popular benchmark datasets, as well as on an urban-scene dataset collected by a terrestrial laser scanner system. Comprehensive experiments demonstrate that the proposed BSC descriptor achieves high descriptiveness, strong robustness, and high efficiency in both time and memory, attaining recognition rates of 94.8%, 94.1% and 82.1% on the UWA, Queen, and WHU datasets, respectively.
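    One reason binary descriptors such as BSC are efficient in both time and memory is that matching reduces to a Hamming distance, an XOR followed by a bit count. The sketch below shows the matching step only, with invented descriptor bits; it does not show how BSC builds the descriptor from the local surface.

```python
# Binary descriptors stored as integers can be matched with a Hamming
# distance: XOR the two bit strings and count the set bits. This is the
# source of the time and memory efficiency of binary descriptors in general.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def nearest(query: int, database: list) -> int:
    """Index of the database descriptor closest to the query."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

db = [0b10110010, 0b01101100, 0b11110000]
print(nearest(0b11100000, db))  # 2: differs from db[2] in only one bit
```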

  6. Identification of DNA-binding proteins using multi-features fusion and binary firefly optimization algorithm.

    PubMed

    Zhang, Jian; Gao, Bo; Chai, Haiting; Ma, Zhiqiang; Yang, Guifu

    2016-08-26

    DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Therefore, developing effective computational tools for identifying DBPs is highly desirable. In this study, we propose an accurate method for the prediction of DBPs. Firstly, we focused on the challenge of improving DBP prediction accuracy with information derived solely from the sequence. Secondly, we used multiple informative features to encode the protein. These features included the evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Thirdly, we introduced a novel improved Binary Firefly Algorithm (BFA) to remove redundant or noisy features as well as to select optimal parameters for the classifier. On two benchmark datasets, our predictor outperformed many state-of-the-art predictors, which reveals the effectiveness of our method. The promising prediction performance on a newly compiled independent testing dataset from PDB and a large-scale dataset from UniProt proved the good generalization ability of our method. In addition, the BFA developed in this research has great potential for practical applications in optimization fields, especially in feature selection problems. In summary, a highly accurate method is proposed for the identification of DBPs, and a user-friendly web-server named iDbP (identification of DNA-binding Proteins) was constructed and provided for academic use.

  7. BMDExpress Data Viewer: A Visualization Tool to Analyze ...

    EPA Pesticide Factsheets

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure in human risk assessments. BMDExpress applies BMD modeling to transcriptomics datasets and groups genes to biological processes and pathways for rapid assessment of doses at which biological perturbations occur. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web-based application, BMDExpress Data Viewer, for visualization and graphical analyses of BMDExpress output files. The software application consists of two main components: ‘Summary Visualization Tools’ and ‘Dataset Exploratory Tools’. We demonstrate through two case studies that the ‘Summary Visualization Tools’ can be used to examine and assess the distributions of probe and pathway BMD outputs, as well as derive a potential regulatory BMD through the modes or means of the distributions. The ‘Functional Enrichment Analysis’ tool presents biological processes in a two-dimensional bubble chart view. By applying filters of pathway enrichment p-value and minimum number of significant genes, we showed that the Functional Enrichment Analysis tool can be applied to select pathways that are potentially sensitive to chemical perturbations. The ‘Multiple Dataset Comparison’ tool enables comparison of BMDs across multiple experiments (e.g., across time points, tissues, or organisms). The ‘BMDL-BM

  8. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    PubMed

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts hidden and useful information from datasets can fail to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but does not efficiently address the association process. To monitor higher rates of expression between genes, a hierarchical clustering model is proposed in which the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared to existing gene expression analyses, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.
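    The Pearson-correlation proximity underlying such clustering can be sketched directly: genes with parallel expression profiles get a distance near 0 and anti-correlated genes a distance near 2, so linkage-based clustering groups co-expressed genes first. This minimal sketch uses invented expression profiles and shows only the proximity measure, not the PCPHC algorithm itself.

```python
import math

# Pearson-correlation proximity sketch for gene expression profiles:
# distance = 1 - r, so co-expressed genes score near 0 and anti-correlated
# genes near 2, regardless of absolute expression scale.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def proximity(x, y):
    return 1.0 - pearson(x, y)

gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same pattern, different scale
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite pattern
print(proximity(gene_a, gene_b))  # ~0.0 (perfectly co-expressed)
print(proximity(gene_a, gene_c))  # ~2.0 (anti-correlated)
```

    Scale invariance is the design point: correlation-based proximity groups genes by the shape of their response, not its magnitude, which is what "biological association" asks for here.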

  9. An Improved Pearson's Correlation Proximity-Based Hierarchical Clustering for Mining Biological Association between Genes

    PubMed Central

    Booma, P. M.; Prabhakaran, S.; Dhanalakshmi, R.

    2014-01-01

    Microarray gene expression datasets have attracted great attention among molecular biologists, statisticians, and computer scientists. Data mining that extracts hidden and useful information from datasets can fail to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but does not efficiently address the association process. To monitor higher rates of expression between genes, a hierarchical clustering model is proposed in which the biological association between genes is measured simultaneously using a proximity measure based on an improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared to existing gene expression analyses, the PCPHC model achieves better performance. Experimental evaluations are conducted for the GL-PCPHC model with standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality. PMID:25136661

  10. Automatic building extraction from LiDAR data fusion of point and grid-based features

    NASA Astrophysics Data System (ADS)

    Du, Shouji; Zhang, Yunsheng; Zou, Zhengrong; Xu, Shenghua; He, Xue; Chen, Siyang

    2017-08-01

    This paper proposes a method for extracting buildings from LiDAR point cloud data by combining point-based and grid-based features. To accurately discriminate buildings from vegetation, a point feature based on the variance of normal vectors is proposed. For robust building extraction, a graph cuts algorithm is employed to combine the used features and consider the neighborhood contextual information. As grid feature computation and the graph cuts algorithm are performed on a grid structure, a feature-retained DSM interpolation method is also proposed in this paper. The proposed method is validated on the benchmark ISPRS Test Project on Urban Classification and 3D Building Reconstruction and compared to state-of-the-art methods. The evaluation shows that the proposed method obtains promising results at both the area level and the object level. The method is further applied to the entire ISPRS dataset and to a real dataset of Wuhan City. The results show a completeness of 94.9% and a correctness of 92.2% at the per-area level for the former dataset, and a completeness of 94.4% and a correctness of 95.8% for the latter. The proposed method shows good potential for large-size LiDAR data.

  11. TSPmap, a tool making use of traveling salesperson problem solvers in the efficient and accurate construction of high-density genetic linkage maps.

    PubMed

    Monroe, J Grey; Allen, Zachariah A; Tanger, Paul; Mullen, Jack L; Lovell, John T; Moyers, Brook T; Whitley, Darrell; McKay, John K

    2017-01-01

    Recent advances in nucleic acid sequencing technologies have led to a dramatic increase in the number of markers available to generate genetic linkage maps. This increased marker density can be used to improve genome assemblies as well as add much-needed resolution for loci controlling variation in ecologically and agriculturally important traits. However, traditional genetic map construction methods applied to these large marker datasets can be computationally prohibitive and highly error prone. We present TSPmap, a method that implements both approximate and exact Traveling Salesperson Problem solvers to generate linkage maps. We demonstrate that for datasets with large numbers of genomic markers (e.g. 10,000) and in multiple population types generated from inbred parents, TSPmap can rapidly produce high quality linkage maps with low sensitivity to missing and erroneous genotyping data compared to two other benchmark methods, JoinMap and MSTmap. TSPmap is open source and freely available as an R package. With the advancement of low cost sequencing technologies, the number of markers used in the generation of genetic maps is expected to continue to rise. TSPmap will be a useful tool to handle such large datasets into the future, quickly producing high quality maps using a large number of genomic markers.
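    The framing of marker ordering as a traveling salesperson problem can be sketched with a greedy nearest-neighbour tour over pairwise recombination fractions: markers are cities, recombination fractions are distances, and a short tour visiting every marker once recovers the linkage order. The greedy heuristic below stands in for the approximate and exact TSP solvers TSPmap actually wraps, and the marker distances are invented.

```python
# Marker ordering as a TSP tour (sketch): starting from one marker, repeatedly
# step to the nearest unvisited marker by recombination fraction. A good tour
# visits markers in their chromosomal order, since recombination fraction
# grows with genetic distance.

def greedy_order(markers, dist, start):
    order, remaining = [start], set(markers) - {start}
    while remaining:
        last = order[-1]
        order.append(min(remaining, key=lambda m: dist[last][m]))
        remaining.discard(order[-1])
    return order

# Hypothetical recombination-fraction matrix for four markers whose true
# chromosomal order is A-B-C-D.
dist = {
    "A": {"B": 0.05, "C": 0.20, "D": 0.35},
    "B": {"A": 0.05, "C": 0.10, "D": 0.25},
    "C": {"A": 0.20, "B": 0.10, "D": 0.15},
    "D": {"A": 0.35, "B": 0.25, "C": 0.15},
}
print(greedy_order(list(dist), dist, start="A"))  # ['A', 'B', 'C', 'D']
```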

  12. Online collaboration and model sharing in volcanology via VHub.org

    NASA Astrophysics Data System (ADS)

    Valentine, G.; Patra, A. K.; Bajo, J. V.; Bursik, M. I.; Calder, E.; Carn, S. A.; Charbonnier, S. J.; Connor, C.; Connor, L.; Courtland, L. M.; Gallo, S.; Jones, M.; Palma Lizana, J. L.; Moore-Russo, D.; Renschler, C. S.; Rose, W. I.

    2013-12-01

    VHub (short for VolcanoHub, and accessible at vhub.org) is an online platform for barrier free access to high end modeling and simulation and collaboration in research and training related to volcanoes, the hazards they pose, and risk mitigation. The underlying concept is to provide a platform, building upon the successful HUBzero software infrastructure (hubzero.org), that enables workers to collaborate online and to easily share information, modeling and analysis tools, and educational materials with colleagues around the globe. Collaboration occurs around several different points: (1) modeling and simulation; (2) data sharing; (3) education and training; (4) volcano observatories; and (5) project-specific groups. VHub promotes modeling and simulation in two ways: (1) some models can be implemented on VHub for online execution. VHub can provide a central warehouse for such models that should result in broader dissemination. VHub also provides a platform that supports the more complex CFD models by enabling the sharing of code development and problem-solving knowledge, benchmarking datasets, and the development of validation exercises. VHub also provides a platform for sharing of data and datasets. The VHub development team is implementing the iRODS data sharing middleware (see irods.org). iRODS allows a researcher to access data that are located at participating data sources around the world (a cloud of data) as if the data were housed in a single virtual database. Projects associated with VHub are also going to introduce the use of data driven workflow tools to support the use of multistage analysis processes where computing and data are integrated for model validation, hazard analysis etc. Audio-video recordings of seminars, PowerPoint slide sets, and educational simulations are all items that can be placed onto VHub for use by the community or by selected collaborators. 
An important point is that the manager of a given educational resource (or any other resource, such as a dataset or a model) can control the privacy of that resource, ranging from private (only accessible by, and known to, specific collaborators) to completely public. VHub is a very useful platform for project-specific collaborations. With a group site on VHub collaborators share documents, datasets, maps, and have ongoing discussions using the discussion board function. VHub is funded by the U.S. National Science Foundation, and is participating in development of larger earth-science cyberinfrastructure initiatives (EarthCube), as well as supporting efforts such as the Global Volcano Model. Emerging VHub-facilitated efforts include model benchmarking, collaborative code development, and growth in online modeling tools.

  13. SWAT use of gridded observations for simulating runoff - a Vietnam river basin study

    NASA Astrophysics Data System (ADS)

    Vu, M. T.; Raghavan, S. V.; Liong, S. Y.

    2011-12-01

    Many research studies that focus on basin hydrology have used the SWAT model to simulate runoff. One common practice in calibrating the SWAT model is the application of station rainfall data to simulate runoff. But over regions lacking robust station data, there is a problem in applying the model to study the hydrological responses. For some countries and remote areas, rainfall data availability might be a constraint due to many different reasons, such as lack of technology, wartime conditions, and financial limitations, which make it difficult to construct the runoff data. To overcome such a limitation, this research study uses some of the available globally gridded high resolution precipitation datasets to simulate runoff. Five popular gridded observation precipitation datasets: (1) Asian Precipitation Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources (APHRODITE), (2) Tropical Rainfall Measuring Mission (TRMM), (3) Precipitation Estimation from Remote Sensing Information using Artificial Neural Network (PERSIANN), (4) Global Precipitation Climatology Project (GPCP), (5) modified Global Historical Climatology Network version 2 (GHCN2), and one reanalysis dataset, National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR), are used to simulate runoff over the Dakbla River (a small tributary of the Mekong River) in Vietnam. Wherever possible, available station data are also used for comparison. Bilinear interpolation of these gridded datasets is used to input the precipitation data at the grid points closest to the station locations. Sensitivity analysis and auto-calibration are performed for the SWAT model. The Nash-Sutcliffe Efficiency (NSE) and Coefficient of Determination (R2) indices are used to benchmark the model performance. This entails a good understanding of the response of the hydrological model to different datasets and a quantification of the uncertainties in these datasets. Such a methodology is also useful for planning rainfall-runoff and reservoir/river management at both rural and urban scales.
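    The Nash-Sutcliffe Efficiency used for benchmarking above follows a standard formula: 1 indicates a perfect fit, 0 means the model does no better than predicting the observed mean, and negative values are worse than the mean predictor. A minimal sketch with invented flow values:

```python
# Nash-Sutcliffe Efficiency (standard formula): one minus the ratio of the
# model's squared error to the variance of the observations about their mean.

def nse(observed, simulated):
    mean_obs = sum(observed) / len(observed)
    err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    var = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - err / var

obs = [10.0, 12.0, 15.0, 11.0]          # invented daily flows
print(nse(obs, obs))                    # 1.0: perfect simulation
print(nse(obs, [12.0] * 4))             # 0.0: the mean-predictor baseline
```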

  14. Automated labelling of cancer textures in colorectal histopathology slides using quasi-supervised learning.

    PubMed

    Onder, Devrim; Sarioglu, Sulen; Karacali, Bilge

    2013-04-01

    Quasi-supervised learning is a statistical learning algorithm that contrasts two datasets by computing an estimate of the posterior probability of each sample in either dataset. This method has not been applied to histopathological images before. The purpose of this study is to evaluate the performance of the method in identifying colorectal tissues with or without adenocarcinoma. Light microscopic digital images of histopathological sections were obtained from 30 colorectal radical surgery materials including adenocarcinoma and non-neoplastic regions. Texture features were extracted by using local histograms and co-occurrence matrices. The quasi-supervised learning algorithm operates on two datasets, one containing samples of normal tissues labelled only indirectly, and the other containing an unlabelled collection of samples of both normal and cancer tissues. As such, the algorithm eliminates the need for manually labelled samples of normal and cancer tissues for conventional supervised learning and significantly reduces expert intervention. Several texture feature vector datasets corresponding to different extraction parameters were tested within the proposed framework. Independent component analysis was also identified as the dimensionality reduction approach that improved labelling performance in this series. The proposed method was applied to a dataset of 22,080 vectors whose dimensionality was reduced from 132 to 119. Regions containing cancer tissue could be identified accurately, with false and true positive rates of up to 19% and 88%, respectively, without using manually labelled ground-truth datasets. The resulting labelling performance was compared to that of a conventional powerful supervised classifier using manually labelled ground-truth data; the supervised classifier results were calculated as 3.5% and 95% for the same case.
Compared with the benchmark classifier, these results suggest that quasi-supervised image texture labelling may be a useful method for the analysis and classification of pathological slides, although further study is required to improve the results. Copyright © 2013 Elsevier Ltd. All rights reserved.
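The contrast between a labelled reference dataset and an unlabelled mixed dataset can be illustrated with a generic nearest-neighbour posterior estimate. This is a simplified stand-in for the quasi-supervised estimator, not the authors' exact algorithm, and the two-dimensional "feature vectors" are synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)

def knn_posterior(reference, mixed, k=15):
    """For each sample in `mixed`, estimate the posterior probability of
    being 'abnormal' as the fraction of its k nearest neighbours (over
    both datasets, excluding itself) that come from `mixed`."""
    pool = np.vstack([reference, mixed])
    labels = np.r_[np.zeros(len(reference)), np.ones(len(mixed))]
    posteriors = []
    for x in mixed:
        dist = np.linalg.norm(pool - x, axis=1)
        posteriors.append(labels[np.argsort(dist)[1:k + 1]].mean())
    return np.array(posteriors)

reference = rng.normal(size=(100, 2))          # "normal tissue" only
normal = rng.normal(size=(50, 2))              # unlabelled normal
abnormal = rng.normal(loc=4.0, size=(50, 2))   # unlabelled "cancer"
mixed = np.vstack([normal, abnormal])
p = knn_posterior(reference, mixed)
print(p[:50].mean(), p[50:].mean())
```

Samples in the mixed set that resemble the reference distribution receive low posteriors, while samples far from it receive posteriors near 1, which is the intuition behind labelling cancer regions without cancer ground truth.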

  15. Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

    PubMed

    Chiu, Chia-Chun; Chan, Shih-Yao; Wang, Chung-Ching; Wu, Wei-Sheng

    2013-01-01

    Microarray data are usually peppered with missing values due to various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there is still much debate on the selection of the optimal algorithm. Existing studies comparing the performance of different algorithms are still not comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether datasets from different species have a different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of dataset, not on the species from which the samples come. In addition to the statistical measure, two other measures with biological meanings are useful for reflecting the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices for handling missing values in most microarray datasets. In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers can easily choose the optimal algorithm for their datasets.
Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses.
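The simulation-based evaluation described for MissVIA (hide known entries, impute them, score the result) can be sketched generically. The k-nearest-rows imputer below is an illustrative stand-in for the surveyed algorithms, and the data matrix is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_impute(data, k=3):
    """Fill NaNs in each row from the k most similar complete rows
    (Euclidean distance over the row's observed columns)."""
    data = data.copy()
    complete = data[~np.isnan(data).any(axis=1)]
    for row in data:
        miss = np.isnan(row)
        if not miss.any():
            continue
        d = np.sqrt(((complete[:, ~miss] - row[~miss]) ** 2).sum(axis=1))
        row[miss] = complete[np.argsort(d)[:k]][:, miss].mean(axis=0)
    return data

def nrmse(truth, imputed, mask):
    """Normalized RMSE over the artificially masked entries."""
    err = truth[mask] - imputed[mask]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[mask])

# One simulation round: mask ~10% of a complete toy matrix, impute, score.
truth = rng.normal(size=(50, 8))
mask = rng.random(truth.shape) < 0.1
corrupted = np.where(mask, np.nan, truth)
imputed = knn_impute(corrupted)
score = nrmse(truth, imputed, mask)
print(score)
```

Repeating this masking over many rounds and candidate algorithms, and keeping the lowest average score, is the kind of per-dataset algorithm selection the tool is described as automating.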

  16. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    PubMed

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA), for which millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA), used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3%, 97.6% and 96.1% accuracy. A larger benchmark MSA comprising 38,772 sequences could be reproduced with 98.9% and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences, respectively. SINA achieved higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, the user manual and a tutorial. SINA is made available under a personal use license.

  17. Search for long-lived, weakly interacting particles that decay to displaced hadronic jets in proton-proton collisions at √s = 8 TeV with the ATLAS detector

    DOE PAGES

    Aad, G.; Abbott, B.; Abdallah, J.; ...

    2015-07-17

    A search for the decay of neutral, weakly interacting, long-lived particles using data collected by the ATLAS detector at the LHC is presented. This analysis uses the full data set recorded in 2012: 20.3 fb-1 of proton-proton collision data at √s = 8 TeV. The search employs techniques for reconstructing decay vertices of long-lived particles decaying to jets in the inner tracking detector and muon spectrometer. Signal events require at least two reconstructed vertices. No significant excess of events over the expected background is found, and limits as a function of proper lifetime are reported for the decay of the Higgs boson and other scalar bosons to long-lived particles and for Hidden Valley Z' and Stealth SUSY benchmark models. The first search results for displaced decays in Z' and Stealth SUSY models are presented. The upper bounds of the excluded proper lifetimes are the most stringent to date.

  18. Bacterial interactomes: Interacting protein partners share similar function and are validated in independent assays more frequently than previously reported.

    DOE PAGES

    Shatsky, Maxim; Allen, Simon; Gold, Barbara; ...

    2016-05-01

    Numerous affinity purification – mass-spectrometry (AP-MS) and yeast two hybrid (Y2H) screens have each defined thousands of pairwise protein-protein interactions (PPIs), most between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial Y2H and AP-MS screens. We have identified 459 high confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared to the nine published interactomes, our two networks are smaller; are much less highly connected; have significantly lower false discovery rates; and are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays. Lastly, our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested.

  19. Early Benchmarks of Product Generation Capabilities of the GOES-R Ground System for Operational Weather Prediction

    NASA Astrophysics Data System (ADS)

    Kalluri, S. N.; Haman, B.; Vititoe, D.

    2014-12-01

    The ground system under development for the Geostationary Operational Environmental Satellite-R (GOES-R) series of weather satellites has completed a key milestone in implementing the science algorithms that process raw sensor data into higher level products in preparation for launch. Real time observations from GOES-R are expected to make significant contributions to Earth and space weather prediction, and there are stringent requirements to produce weather products at very low latency to meet NOAA's operational needs. Simulated test data from all six GOES-R sensors are being processed by the system to test and verify performance of the fielded system. Early results show that the system development is on track to meet functional and performance requirements to process science data. Comparison of science products generated by the ground system from simulated data with those generated by the algorithm developers shows close agreement among the data sets, which demonstrates that the algorithms are implemented correctly. Successful delivery of products to AWIPS and the Product Distribution and Access (PDA) system from the core system demonstrates that the external interfaces are working.

  20. Promoting Charge Separation and Injection by Optimizing the Interfaces of GaN:ZnO Photoanode for Efficient Solar Water Oxidation.

    PubMed

    Wang, Zhiliang; Zong, Xu; Gao, Yuying; Han, Jingfeng; Xu, Zhiqiang; Li, Zheng; Ding, Chunmei; Wang, Shengyang; Li, Can

    2017-09-13

    Photoelectrochemical water splitting provides an attractive way to store solar energy in molecular hydrogen as a sustainable fuel. To achieve high solar conversion efficiency, the most stringent criteria are effective charge separation and injection in the electrodes. Herein, efficient photoelectrochemical water oxidation is realized by optimizing charge separation and surface charge transfer of a GaN:ZnO photoanode. Charge separation can be greatly improved through modified moisture-assisted nitridation and HCl acid treatment, by which the interfaces in GaN:ZnO solid solution particles are optimized and recombination centers existing at the interfaces are suppressed. Moreover, a multimetal phosphide, NiCoFeP, was employed as a water oxidation cocatalyst to improve charge injection at the photoanode/electrolyte interface. Consequently, the overpotential is significantly decreased and the photocurrent reaches a benchmark of 3.9 mA cm-2 at 1.23 V vs RHE, with a solar conversion efficiency of over 1%.

  1. Validation and Comparison of 2D and 3D Codes for Nearshore Motion of Long Waves Using Benchmark Problems

    NASA Astrophysics Data System (ADS)

    Velioǧlu, Deniz; Cevdet Yalçıner, Ahmet; Zaytsev, Andrey

    2016-04-01

    Tsunamis are huge waves with long wave periods and wave lengths that can cause great devastation and loss of life when they strike a coast. The interest in experimental and numerical modeling of tsunami propagation and inundation increased considerably after the 2011 Great East Japan earthquake. In this study, two numerical codes, FLOW 3D and NAMI DANCE, that analyze tsunami propagation and inundation patterns are considered. FLOW 3D simulates linear and nonlinear propagating surface waves as well as long waves by solving three-dimensional Navier-Stokes (3D-NS) equations. NAMI DANCE uses a finite difference computational method to solve the 2D depth-averaged linear and nonlinear forms of the shallow water equations (NSWE) for long wave problems, specifically tsunamis. In order to validate these two codes and analyze the differences between the 3D-NS and 2D depth-averaged NSWE equations, two benchmark problems are applied. One benchmark problem investigates the runup of long waves over a complex 3D beach. The experimental setup is a 1:400 scale model of Monai Valley, located on the west coast of Okushiri Island, Japan. The other benchmark problem was discussed at the 2015 National Tsunami Hazard Mitigation Program (NTHMP) annual meeting in Portland, USA; it is a field dataset recording the 2011 Japan tsunami in Hilo Harbor, Hawaii. The computed water surface elevation and velocity data are compared with the measured data. The comparisons show that both codes are in fairly good agreement with each other and with the benchmark data. The differences between the 3D-NS and 2D depth-averaged NSWE equations are highlighted. All results are presented with discussions and comparisons.
Acknowledgements: Partial support by Japan-Turkey Joint Research Project by JICA on earthquakes and tsunamis in Marmara Region (JICA SATREPS - MarDiM Project), 603839 ASTARTE Project of EU, UDAP-C-12-14 project of AFAD Turkey, 108Y227, 113M556 and 213M534 projects of TUBITAK Turkey, RAPSODI (CONCERT_Dis-021) of CONCERT-Japan Joint Call and Istanbul Metropolitan Municipality are all acknowledged.

  2. Assessing streamflow sensitivity to variations in glacier mass balance

    USGS Publications Warehouse

    O'Neel, Shad; Hood, Eran; Arendt, Anthony; Sass, Louis

    2014-01-01

    The purpose of this paper is to evaluate relationships among seasonal and annual glacier mass balances, glacier runoff and streamflow in two glacierized basins in different climate settings. We use long-term glacier mass balance and streamflow datasets from the United States Geological Survey (USGS) Alaska Benchmark Glacier Program to compare and contrast glacier-streamflow interactions in a maritime climate (Wolverine Glacier) with those in a continental climate (Gulkana Glacier). Our overall goal is to improve our understanding of how glacier mass balance processes impact streamflow, ultimately sharpening our conceptual picture of the future evolution of glacier runoff in continental and maritime climates.

  3. Auction dynamics: A volume constrained MBO scheme

    NASA Astrophysics Data System (ADS)

    Jacobs, Matt; Merkurjev, Ekaterina; Esedoǧlu, Selim

    2018-02-01

    We show how auction algorithms, originally developed for the assignment problem, can be utilized in Merriman, Bence, and Osher's threshold dynamics scheme to simulate multi-phase motion by mean curvature in the presence of equality and inequality volume constraints on the individual phases. The resulting algorithms are highly efficient and robust, and can be used in simulations ranging from minimal partition problems in Euclidean space to semi-supervised machine learning via clustering on graphs. In the case of the latter application, numerous experimental results on benchmark machine learning datasets show that our approach exceeds the performance of current state-of-the-art methods, while requiring a fraction of the computation time.
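The underlying threshold-dynamics step (diffuse the phase indicator, then threshold at 1/2) is compact enough to sketch. This is plain two-phase MBO without the auction-based volume constraints that the paper adds:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mbo_step(phase, dt=1.0):
    """One MBO step for two-phase motion by mean curvature:
    convolve the indicator function with a Gaussian, threshold at 1/2."""
    return gaussian_filter(phase.astype(float), sigma=np.sqrt(2 * dt)) > 0.5

# A square interface rounds its corners and shrinks under iteration.
n = 64
yy, xx = np.mgrid[:n, :n]
phase = (np.abs(xx - 32) < 15) & (np.abs(yy - 32) < 15)   # 29 x 29 square
for _ in range(10):
    phase = mbo_step(phase)
area = int(phase.sum())
print(area)
```

In the constrained setting of the paper, the pointwise threshold is replaced by an assignment of grid points to phases, which is where the auction algorithm enters.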

  4. Comparison between extreme learning machine and wavelet neural networks in data classification

    NASA Astrophysics Data System (ADS)

    Yahia, Siwar; Said, Salwa; Jemai, Olfa; Zaied, Mourad; Ben Amar, Chokri

    2017-03-01

    The extreme learning machine is a well-known learning algorithm in the field of machine learning. It is a feed-forward neural network with a single hidden layer, and an extremely fast learning algorithm with good generalization performance. In this paper, we aim to compare the extreme learning machine with wavelet neural networks, another widely used algorithm. We used six benchmark data sets to evaluate each technique: Wisconsin Breast Cancer, Glass Identification, Ionosphere, Pima Indians Diabetes, Wine Recognition and Iris Plant. Experimental results have shown that both extreme learning machines and wavelet neural networks achieve good results.
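The extreme learning machine itself is compact: input-to-hidden weights are drawn at random and only the output weights are solved for, in closed form, via the pseudo-inverse. The two-class toy problem below is a hypothetical stand-in for the UCI datasets listed above:

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_train(X, y, hidden=40):
    """Random input weights, sigmoid hidden layer, output weights
    obtained by least squares (Moore-Penrose pseudo-inverse)."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden activations
    return W, b, np.linalg.pinv(H) @ y

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # synthetic two-class labels
W, b, beta = elm_train(X, y)
acc = float(((elm_predict(X, W, b, beta) > 0.5) == y).mean())
print(acc)
```

Because no iterative backpropagation is involved, training reduces to one matrix pseudo-inverse, which is the source of the speed the abstract notes.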

  5. Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

    NASA Astrophysics Data System (ADS)

    Yasid, A.

    2018-01-01

    Clustering analysis is important in data mining for unsupervised data, because no adequate prior knowledge is available. One of the important tasks is defining the number of clusters without user involvement, which is known as automatic clustering. This study aims to acquire the cluster number automatically utilizing forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely a constant parameter and a variable parameter, are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the result is compared with other state-of-the-art automatic clustering methods. The experimental results evidence that AC-FSDE is better than, or competitive with, other existing automatic clustering algorithms.
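The differential evolution core that AC-FSDE builds on can be sketched in its classic DE/rand/1/bin form; the forced strategy and the automatic determination of the cluster number are not reproduced here, so a simple sphere function stands in for the clustering objective:

```python
import numpy as np

rng = np.random.default_rng(1)

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, iters=200):
    """Classic DE/rand/1/bin: mutate with a scaled difference vector,
    apply binomial crossover, keep the trial only if it improves."""
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True      # at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            if (ft := f(trial)) < fit[i]:
                pop[i], fit[i] = trial, ft
    return pop[fit.argmin()], fit.min()

best, val = differential_evolution(lambda x: (x ** 2).sum(), [(-5, 5)] * 3)
print(val)
```

In an automatic-clustering setting each population member would instead encode candidate cluster centres plus activation flags, with a cluster-validity index as the objective f.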

  6. Digital orthoimagery base specification V1.0

    USGS Publications Warehouse

    Rufe, Philip P.

    2014-01-01

    The resolution requirement for orthoimagery in support of The National Map of the U.S. Geological Survey (USGS) is 1 meter. However, as the Office of Management and Budget A-16 designated Federal agency responsible for base orthoimagery, the USGS National Geospatial Program (NGP) has developed this base specification to include higher resolution orthoimagery. Many Federal, State, and local programs use high-resolution orthoimagery for various purposes including critical infrastructure management, vector data updates, land-use analysis, natural resource inventory, and extraction of data. The complex nature of large-area orthoimagery datasets, combined with the broad interest in orthoimagery of consistent quality and spatial accuracy, requires high-resolution orthoimagery to meet or exceed the format and content outlined in this specification. The USGS intends to use this specification primarily to create consistency across all NGP funded and managed orthoimagery collections, in particular, collections in support of the National Digital Orthoimagery Program (NDOP). In the absence of other comprehensive specifications or standards, the USGS intends that this specification will, to the highest degree practical, be adopted by other USGS programs and mission areas, and by other Federal agencies. This base specification defines minimum parameters for orthoimagery data collection. Local conditions in any given project area, specialized applications for the data, or the preferences of cooperators may mandate more stringent requirements. The USGS fully supports the acquisition of more detailed, accurate, or value-added data that exceed the base specification outlined herein. A partial list of common “buy-up” options is provided in appendix 1 for those areas and projects that require more stringent or expanded specifications.

  7. Development and validation of a national data registry for midwife-led births: the Midwives Alliance of North America Statistics Project 2.0 dataset.

    PubMed

    Cheyney, Melissa; Bovbjerg, Marit; Everson, Courtney; Gordon, Wendy; Hannibal, Darcy; Vedam, Saraswathi

    2014-01-01

    In 2004, the Midwives Alliance of North America's (MANA's) Division of Research developed a Web-based data collection system to gather information on the practices and outcomes associated with midwife-led births in the United States. This system, called the MANA Statistics Project (MANA Stats), grew out of a widely acknowledged need for more reliable data on outcomes by intended place of birth. This article describes the history and development of the MANA Stats birth registry and provides an analysis of the 2.0 dataset's content, strengths, and limitations. Data collection and review procedures for the MANA Stats 2.0 dataset are described, along with methods for the assessment of data accuracy. We calculated descriptive statistics for client demographics and contributing midwife credentials, and assessed the quality of data by calculating point estimates, 95% confidence intervals, and kappa statistics for key outcomes on pre- and postreview samples of records. The MANA Stats 2.0 dataset (2004-2009) contains 24,848 courses of care, 20,893 of which are for women who planned a home or birth center birth at the onset of labor. The majority of these records were planned home births (81%). Births were attended primarily by certified professional midwives (73%), and clients were largely white (92%), married (87%), and college-educated (49%). Data quality analyses of 9932 records revealed no differences between pre- and postreviewed samples for 7 key benchmarking variables (kappa, 0.98-1.00). The MANA Stats 2.0 data were accurately entered by participants; any errors in this dataset are likely random and not systematic. The primary limitation of the 2.0 dataset is that the sample was captured through voluntary participation; thus, it may not accurately reflect population-based outcomes. The dataset's primary strength is that it will allow for the examination of research questions on normal physiologic birth and midwife-led birth outcomes by intended place of birth. 
© 2014 by the American College of Nurse-Midwives.
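Cohen's kappa, used above to compare pre- and post-review samples, corrects observed agreement for the agreement expected by chance from each rater's marginal label frequencies. A minimal sketch with hypothetical codings of a yes/no outcome variable:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected) / (1 - expected), where
    expected agreement comes from the raters' marginal frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical pre-/post-review coding of one benchmarking variable
pre  = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
post = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
kappa = cohens_kappa(pre, post)
print(round(kappa, 3))
```

Values of 0.98-1.00, as reported for the seven key variables, indicate near-perfect agreement between pre- and post-review records.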

  8. Robust Fusion of Color and Depth Data for RGB-D Target Tracking Using Adaptive Range-Invariant Depth Models and Spatio-Temporal Consistency Constraints.

    PubMed

    Xiao, Jingjing; Stolkin, Rustam; Gao, Yuqing; Leonardis, Ales

    2017-09-06

    This paper presents a novel robust method for single target tracking in RGB-D images, and also contributes a substantial new benchmark dataset for evaluating RGB-D trackers. While a target object's color distribution is reasonably motion-invariant, this is not true for the target's depth distribution, which continually varies as the target moves relative to the camera. It is therefore nontrivial to design target models which can fully exploit (potentially very rich) depth information for target tracking. For this reason, much of the previous RGB-D literature relies on color information for tracking, while exploiting depth information only for occlusion reasoning. In contrast, we propose an adaptive range-invariant target depth model, and show how both depth and color information can be fully and adaptively fused during the search for the target in each new RGB-D image. We introduce a new, hierarchical, two-layered target model (comprising local and global models) which uses spatio-temporal consistency constraints to achieve stable and robust on-the-fly target relearning. In the global layer, multiple features, derived from both color and depth data, are adaptively fused to find a candidate target region. In ambiguous frames, where one or more features disagree, this global candidate region is further decomposed into smaller local candidate regions for matching to local-layer models of small target parts. We also note that conventional use of depth data, for occlusion reasoning, can easily trigger false occlusion detections when the target moves rapidly toward the camera. To overcome this problem, we show how combining target information with contextual information enables the target's depth constraint to be relaxed. Our adaptively relaxed depth constraints can robustly accommodate large and rapid target motion in the depth direction, while still enabling the use of depth data for highly accurate reasoning about occlusions. 
For evaluation, we introduce a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis. Our tracker is evaluated using two different state-of-the-art methodologies, VOT and object tracking benchmark, and in both cases it significantly outperforms four other state-of-the-art RGB-D trackers from the literature.

  9. Learning a peptide-protein binding affinity predictor with kernel ridge regression

    PubMed Central

    2013-01-01

    Background The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interactions. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. Results We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function kernels. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets.
Conclusion On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/. PMID:23497081
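The kernel ridge regression component has a closed-form dual solution, alpha = (K + λI)^-1 y. The sketch below substitutes an ordinary RBF kernel for the paper's specialized string kernel and fits a synthetic one-dimensional target:

```python
import numpy as np

rng = np.random.default_rng(7)

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF Gram matrix (a stand-in for the string kernel)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def krr_fit(K, y, lam=1e-2):
    """Dual coefficients of kernel ridge regression."""
    return np.linalg.solve(K + lam * np.eye(len(K)), y)

X = rng.uniform(-2, 2, size=(60, 1))
y = np.sin(X[:, 0])                        # synthetic "affinity" target
K = rbf_kernel(X, X)
alpha = krr_fit(K, y)
max_err = float(np.abs(K @ alpha - y).max())   # in-sample residual
print(max_err)
```

Because only the Gram matrix K enters the solution, swapping in a string kernel over peptides changes nothing else in the fitting procedure.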

  10. mBEEF-vdW: Robust fitting of error estimation density functionals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lundgaard, Keld T.; Wellendorff, Jess; Voss, Johannes

    Here, we propose a general-purpose semilocal/nonlocal exchange-correlation functional approximation, named mBEEF-vdW. The exchange is a meta generalized gradient approximation, and the correlation is a semilocal and nonlocal mixture, with the Rutgers-Chalmers approximation for van der Waals (vdW) forces. The functional is fitted within the Bayesian error estimation functional (BEEF) framework. We improve the previously used fitting procedures by introducing a robust MM-estimator based loss function, reducing the sensitivity to outliers in the datasets. To more reliably determine the optimal model complexity, we furthermore introduce a generalization of the bootstrap 0.632 estimator with hierarchical bootstrap sampling and geometric mean estimator over the training datasets. Using this estimator, we show that the robust loss function leads to a 10% improvement in the estimated prediction error over the previously used least-squares loss function. The mBEEF-vdW functional is benchmarked against popular density functional approximations over a wide range of datasets relevant for heterogeneous catalysis, including datasets that were not used for its training. Overall, we find that mBEEF-vdW has a higher general accuracy than competing popular functionals, and it is one of the best performing functionals on chemisorption systems, surface energies, lattice constants, and dispersion. We also show the potential-energy curve of graphene on the nickel(111) surface, where mBEEF-vdW matches the experimental binding length. mBEEF-vdW is currently available in gpaw and other density functional theory codes through Libxc, version 3.0.0.

  11. mBEEF-vdW: Robust fitting of error estimation density functionals

    DOE PAGES

    Lundgaard, Keld T.; Wellendorff, Jess; Voss, Johannes; ...

    2016-06-15

    Here, we propose a general-purpose semilocal/nonlocal exchange-correlation functional approximation, named mBEEF-vdW. The exchange is a meta generalized gradient approximation, and the correlation is a semilocal and nonlocal mixture, with the Rutgers-Chalmers approximation for van der Waals (vdW) forces. The functional is fitted within the Bayesian error estimation functional (BEEF) framework. We improve the previously used fitting procedures by introducing a robust MM-estimator based loss function, reducing the sensitivity to outliers in the datasets. To more reliably determine the optimal model complexity, we furthermore introduce a generalization of the bootstrap 0.632 estimator with hierarchical bootstrap sampling and geometric mean estimator over the training datasets. Using this estimator, we show that the robust loss function leads to a 10% improvement in the estimated prediction error over the previously used least-squares loss function. The mBEEF-vdW functional is benchmarked against popular density functional approximations over a wide range of datasets relevant for heterogeneous catalysis, including datasets that were not used for its training. Overall, we find that mBEEF-vdW has a higher general accuracy than competing popular functionals, and it is one of the best performing functionals on chemisorption systems, surface energies, lattice constants, and dispersion. We also show the potential-energy curve of graphene on the nickel(111) surface, where mBEEF-vdW matches the experimental binding length. mBEEF-vdW is currently available in gpaw and other density functional theory codes through Libxc, version 3.0.0.
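The bootstrap 0.632 estimator mentioned in both records weights the optimistic training error against the pessimistic out-of-bag error. A minimal sketch, using an ordinary least-squares line fit as a hypothetical model (the paper's hierarchical sampling and geometric mean generalization are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)

def bootstrap_632(X, y, fit, predict, B=100):
    """Efron's 0.632 estimator: 0.368 * training error plus
    0.632 * mean out-of-bag error over B bootstrap resamples."""
    n = len(X)
    train_err = np.mean((predict(fit(X, y), X) - y) ** 2)
    oob_errs = []
    for _ in range(B):
        idx = rng.integers(n, size=n)                # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)        # left-out points
        if oob.size == 0:
            continue
        model = fit(X[idx], y[idx])
        oob_errs.append(np.mean((predict(model, X[oob]) - y[oob]) ** 2))
    return 0.368 * train_err + 0.632 * np.mean(oob_errs)

# Ordinary least-squares line fit as a hypothetical model under test.
fit = lambda X, y: np.polyfit(X, y, 1)
predict = lambda m, X: np.polyval(m, X)
X = rng.uniform(0, 1, 80)
y = 2 * X + rng.normal(scale=0.1, size=80)
est = bootstrap_632(X, y, fit, predict)
print(est)
```

Each bootstrap sample leaves out roughly 1 - 0.632 of the data, which motivates the fixed weights blending the two error estimates.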

  12. The Streptococcus agalactiae Stringent Response Enhances Virulence and Persistence in Human Blood

    PubMed Central

    Hooven, Thomas A.; Catomeris, Andrew J.; Bonakdar, Maryam; Tallon, Luke J.; Santana-Cruz, Ivette; Ott, Sandra; Daugherty, Sean C.; Tettelin, Hervé

    2017-01-01

    Streptococcus agalactiae (group B Streptococcus [GBS]) causes serious infections in neonates. We previously reported a transposon sequencing (Tn-seq) system for performing genomewide assessment of gene fitness in GBS. In order to identify molecular mechanisms required for GBS to transition from a mucosal commensal lifestyle to bloodstream invasion, we performed Tn-seq on GBS strain A909 with human whole blood. Our analysis identified 16 genes conditionally essential for GBS survival in blood, of which 75% were members of the capsular polysaccharide (cps) operon. Among the non-cps genes identified as conditionally essential was relA, which encodes an enzyme whose activity is central to the bacterial stringent response, a conserved adaptation to environmental stress. We used blood coincubation studies of targeted knockout strains to confirm the expected growth defects of GBS deficient in capsule or stringent response activation. Unexpectedly, we found that the relA knockout strains demonstrated decreased expression of β-hemolysin/cytolysin, an important cytotoxin implicated in facilitating GBS invasion. Furthermore, chemical activation of the stringent response with serine hydroxamate increased β-hemolysin/cytolysin expression. To establish a mechanism by which the stringent response leads to increased cytotoxicity, we performed transcriptome sequencing (RNA-seq) on two GBS strains grown under stringent response or control conditions. This revealed a conserved decrease in the expression of genes in the arginine deiminase pathway during stringent response activation. Through coincubation with supplemental arginine and the arginine antagonist canavanine, we show that arginine availability is a determinant of GBS cytotoxicity and that the pathway between stringent response activation and increased virulence is arginine dependent. PMID:29109175

  13. Regional restoration benchmarks for Acropora cervicornis

    NASA Astrophysics Data System (ADS)

    Schopmeyer, Stephanie A.; Lirman, Diego; Bartels, Erich; Gilliam, David S.; Goergen, Elizabeth A.; Griffin, Sean P.; Johnson, Meaghan E.; Lustic, Caitlin; Maxwell, Kerry; Walter, Cory S.

    2017-12-01

    Coral gardening plays an important role in the recovery of depleted populations of threatened Acropora cervicornis in the Caribbean. Over the past decade, high survival coupled with fast growth of in situ nursery corals has allowed practitioners to create healthy and genotypically diverse nursery stocks. Currently, thousands of corals are propagated and outplanted onto degraded reefs on a yearly basis, representing a substantial increase in the abundance, biomass, and overall footprint of A. cervicornis. Here, we combined an extensive dataset collected by restoration practitioners to document early (1-2 yr) restoration success metrics in Florida and Puerto Rico, USA. By reporting region-specific data on the impacts of fragment collection on donor colonies, survivorship and productivity of nursery corals, and survivorship and productivity of outplanted corals during normal conditions, we provide the basis for a stop-light indicator framework for new or existing restoration programs to evaluate their performance. We show that current restoration methods are very effective, that no excess damage is caused to donor colonies, and that, once outplanted, corals behave just like wild colonies. We also provide science-based benchmarks that programs can use to evaluate the successes and challenges of their efforts, and to make modifications where needed. We propose that up to 10% of the biomass can be collected from healthy, large A. cervicornis donor colonies for nursery propagation. We also propose the following benchmarks for the first year of A. cervicornis restoration activities: (1) >75% live tissue cover on donor colonies; (2) >80% survivorship of nursery corals; and (3) >70% survivorship of outplanted corals. Finally, we report mean productivities of 4.4 cm yr-1 for nursery corals and 4.8 cm yr-1 for outplants as a frame of reference for ranking performance within programs. Such benchmarks, and potential subsequent adaptive actions, are needed to fully assess the long-term success of coral restoration and species recovery programs.

  14. Computational identification of binding energy hot spots in protein-RNA complexes using an ensemble approach.

    PubMed

    Pan, Yuliang; Wang, Zixiang; Zhan, Weihua; Deng, Lei

    2018-05-01

    Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. However, the limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures makes it difficult to develop empirical identification approaches, and computational prediction of RNA-binding hot spot residues is still in its infancy. Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined and selected for classification with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performance with an AUC of 0.86 and a sensitivity of 0.78, significantly better than those of the pioneering RNA-binding hot spot prediction method HotSPRing. We also demonstrate the capability of our method on the independent test dataset, where it retains a competitive advantage. The PrabHot webserver is freely available at http://denglab.org/PrabHot/. Contact: leideng@csu.edu.cn. Supplementary data are available at Bioinformatics online.

  15. Accuracy and robustness evaluation in stereo matching

    NASA Astrophysics Data System (ADS)

    Nguyen, Duc M.; Hanca, Jan; Lu, Shao-Ping; Schelkens, Peter; Munteanu, Adrian

    2016-09-01

    Stereo matching has received a lot of attention from the computer vision community thanks to its wide range of applications. Despite the large variety of algorithms that have been proposed so far, it is not trivial to select suitable algorithms for the construction of practical systems. One of the main problems is that many algorithms lack sufficient robustness when employed in various operational conditions, because most methods in the literature are tested and tuned to perform well on one specific dataset. To alleviate this problem, an extensive evaluation of the accuracy and robustness of state-of-the-art stereo matching algorithms is presented. Three datasets (Middlebury, KITTI, and MPEG FTV) representing different operational conditions are employed. Based on this analysis, improvements over existing algorithms are proposed. The experimental results show that our improved versions of the cross-based and cost volume filtering algorithms outperform the original versions by large margins on the Middlebury and KITTI datasets, and the latter of the two ranks among the best local stereo matching approaches on the KITTI benchmark. Under evaluations using settings specific to depth-image-based-rendering applications, our improved belief propagation algorithm is less complex than MPEG's FTV depth estimation reference software (DERS), while yielding similar depth estimation performance. Finally, several conclusions on stereo matching algorithms are presented.
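
    For readers unfamiliar with local stereo matching, a minimal winner-take-all block matcher, far simpler than the cross-based and cost-volume-filtering methods evaluated in the record above, can be sketched as follows; the window size, disparity range, and synthetic image pair are arbitrary toy settings, not the paper's setup.

```python
import numpy as np

def block_match(left, right, max_disp=8, win=3):
    """Winner-take-all local stereo: for each left-image pixel, pick the
    disparity d that minimizes the sum of absolute differences (SAD)
    between a small window in the left image and the window shifted
    d pixels leftward in the right image."""
    h, w = left.shape
    pad = win // 2
    L = np.pad(left.astype(float), pad)
    R = np.pad(right.astype(float), pad)
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            patch = L[y:y + win, x:x + win]
            best, best_d = np.inf, 0
            for d in range(min(max_disp + 1, x + 1)):  # stay inside the image
                sad = np.abs(patch - R[y:y + win, x - d:x - d + win]).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp

# Synthetic pair: the right view is the left view shifted 2 px leftward,
# so the true disparity is 2 everywhere away from the borders.
rng = np.random.default_rng(0)
left = rng.uniform(size=(20, 20))
right = np.roll(left, -2, axis=1)
disp = block_match(left, right)
print(disp[5:15, 5:15])   # interior disparities recover the shift
```

Real algorithms add cost aggregation and consistency checks on top of this raw per-pixel minimum, which is exactly where the robustness differences studied in the paper arise.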

  16. When drug discovery meets web search: Learning to Rank for ligand-based virtual screening.

    PubMed

    Zhang, Wei; Ji, Lijuan; Chen, Yanan; Tang, Kailin; Wang, Haiping; Zhu, Ruixin; Jia, Wei; Cao, Zhiwei; Liu, Qi

    2015-01-01

    The rapid increase in the emergence of novel chemical substances presents a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search was introduced into drug virtual screening, offering two unique capabilities: (1) identifying compounds for novel targets for which not enough training data are available, and (2) integrating heterogeneous data when compound affinities are measured on different platforms. A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated based on two public datasets collected from Binding Database and the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly because of its novel use in cross-target virtual screening and heterogeneous data integration. To the best of our knowledge, this is the first application of Learning to Rank in virtual screening. The experiment workflow and algorithm assessment designed in this study will provide a standard protocol for other similar studies. All the datasets as well as the implementations of the Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html. Graphical Abstract: The analogy between web search and ligand-based drug discovery.
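
    As a hedged illustration of the pairwise idea behind Learning to Rank (this is a generic RankSVM-style sketch, not one of the six algorithms benchmarked in the paper), a linear scorer can be trained so that higher-affinity compounds outrank lower-affinity ones; the compounds, features, and relevance grades below are invented.

```python
import numpy as np

def train_pairwise_ranker(X, y, epochs=200, lr=0.01):
    """Learn a linear scoring function w.x with a pairwise hinge loss:
    for every pair (i, j) with y[i] > y[j], push w.x_i above w.x_j + 1."""
    n, d = X.shape
    w = np.zeros(d)
    pairs = [(i, j) for i in range(n) for j in range(n) if y[i] > y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            margin = w @ (X[i] - X[j])
            if margin < 1.0:          # hinge active: separate the pair further
                w += lr * (X[i] - X[j])
    return w

# Toy screening set: relevance grade (binding affinity rank) grows with feature 0.
X = np.array([[0.1, 1.0], [0.4, 0.2], [0.9, 0.5]])
y = np.array([0, 1, 2])
w = train_pairwise_ranker(X, y)
scores = X @ w
print(np.argsort(-scores))            # compounds ranked by learned score
```

The cross-target capability described above comes from training on pairs pooled across targets, so only relative orderings, never absolute affinity values, need to be comparable between platforms.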

  17. Application of 3D Zernike descriptors to shape-based ligand similarity searching.

    PubMed

    Venkatraman, Vishwesh; Chakravarthy, Padmasini Ramji; Kihara, Daisuke

    2009-12-17

    The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. In this study, we have examined the efficacy of the alignment-independent three-dimensional Zernike descriptor (3DZD) for fast shape-based similarity searching. The performance of this approach was compared with several other methods, including the statistical-moments-based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance in situations that simulate actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparably to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape-based methods in specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. The 3DZD has a unique ability for fast comparison of the three-dimensional shapes of compounds. The examples analyzed illustrate both the advantages of the 3DZD and the room for improvement.
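
    The USR baseline mentioned above is compact enough to sketch in full: it summarizes the distribution of atomic distances to four reference points by three statistical moments each, giving a 12-number, alignment-free descriptor; the four-atom "molecule" below is a toy example.

```python
import numpy as np

def usr_descriptor(coords):
    """12-number shape descriptor in the style of ultrafast shape recognition:
    mean, standard deviation and cube-root skewness of atomic distances to
    four reference points derived from the molecule itself."""
    ctd = coords.mean(axis=0)                    # molecular centroid
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[d_ctd.argmin()]                 # closest atom to centroid
    fct = coords[d_ctd.argmax()]                 # farthest atom from centroid
    d_fct = np.linalg.norm(coords - fct, axis=1)
    ftf = coords[d_fct.argmax()]                 # farthest atom from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(coords - ref, axis=1)
        mu = d.mean()
        third = ((d - mu) ** 3).mean()
        desc += [mu, d.std(), np.cbrt(third)]    # cube root keeps distance units
    return np.array(desc)

def usr_similarity(a, b):
    """Normalized inverse L1 distance: 1.0 for identical descriptors."""
    return 1.0 / (1.0 + np.abs(a - b).mean())

mol = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
shifted = mol + 5.0                              # rigid translation
print(usr_similarity(usr_descriptor(mol), usr_descriptor(shifted)))
```

Because only interatomic distances enter the descriptor, the similarity of a molecule to a translated copy of itself is exactly 1.0, which is the alignment independence that both USR and the 3DZD exploit for speed.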

  18. Application of 3D Zernike descriptors to shape-based ligand similarity searching

    PubMed Central

    2009-01-01

    Background The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results In this study, we have examined the efficacy of the alignment-independent three-dimensional Zernike descriptor (3DZD) for fast shape-based similarity searching. The performance of this approach was compared with several other methods, including the statistical-moments-based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance in situations that simulate actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparably to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape-based methods in specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. Conclusion The 3DZD has a unique ability for fast comparison of the three-dimensional shapes of compounds. The examples analyzed illustrate both the advantages of the 3DZD and the room for improvement. PMID:20150998

  19. Finding undetected protein associations in cell signaling by belief propagation.

    PubMed

    Bailly-Bechet, M; Borgs, C; Braunstein, A; Chayes, J; Dagkessamanskaia, A; François, J-M; Zecchina, R

    2011-01-11

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing the cell to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected by integrating a protein-protein interaction network with mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbation data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.

  20. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer

    PubMed Central

    2018-01-01

    This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training neural networks is still a challenging exercise in machine learning. Traditional training algorithms often become trapped in local optima and lead to premature convergence, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computation are becoming popular due to their robustness in overcoming these drawbacks. Accordingly, this paper proposes a hybrid training procedure in which the differential search (DS) algorithm is functionally integrated with particle swarm optimization (PSO). To surmount local trapping of the search procedure, a new population initialization scheme based on a Logistic chaotic sequence is proposed, which enhances population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analyses on 7 publicly available benchmark datasets were performed. Subsequently, experiments were conducted on a practical application case of wind speed prediction to show the superiority of the proposed RBF training algorithm in terms of prediction accuracy. PMID:29768463

  1. RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection.

    PubMed

    Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S

    Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation with a high-probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features a high detection rate, fast response, and insensitivity to most parameter settings. Algorithm implementations and datasets are available upon request.
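
    A stripped-down sketch of the core idea, randomized space trees whose node counts yield a piecewise-constant density estimate, is given below. It omits the paper's streaming-specific machinery (attribute range estimation and dual node profiles), and the tree depth, forest size, and data are arbitrary toy choices.

```python
import numpy as np

def build_rs_tree(lo, hi, depth, rng, max_depth=6):
    """One fully randomized space tree over the box [lo, hi]: every internal
    node splits a randomly chosen dimension at a random cut point."""
    if depth == max_depth:
        return {"count": 0}
    dim = int(rng.integers(len(lo)))
    cut = rng.uniform(lo[dim], hi[dim])
    left_hi, right_lo = hi.copy(), lo.copy()
    left_hi[dim], right_lo[dim] = cut, cut
    return {"count": 0, "dim": dim, "cut": cut,
            "left": build_rs_tree(lo, left_hi, depth + 1, rng, max_depth),
            "right": build_rs_tree(right_lo, hi, depth + 1, rng, max_depth)}

def update(node, x):
    """Route one instance down the tree, incrementing node counts (the profile)."""
    while True:
        node["count"] += 1
        if "dim" not in node:
            return
        node = node["left"] if x[node["dim"]] <= node["cut"] else node["right"]

def score(node, x):
    """Descend toward x while nodes remain non-empty; the score is the reached
    node's count scaled by 2^depth, so instances in sparse regions score low."""
    depth = 0
    while "dim" in node and node["count"] > 0:
        node = node["left"] if x[node["dim"]] <= node["cut"] else node["right"]
        depth += 1
    return node["count"] * (2 ** depth)

rng = np.random.default_rng(42)
lo, hi = np.zeros(2), np.ones(2)
forest = [build_rs_tree(lo, hi, 0, rng) for _ in range(25)]
for x in rng.uniform(0.4, 0.6, size=(300, 2)):   # stream: a dense central cluster
    for t in forest:
        update(t, x)
inlier = np.mean([score(t, np.array([0.5, 0.5])) for t in forest])
outlier = np.mean([score(t, np.array([0.02, 0.98])) for t in forest])
print(inlier, outlier)    # forest-averaged density estimates for the two queries
```

Averaging over many shallow random trees is what makes the estimate both fast (each score is a short path) and stable, mirroring the bias-variance argument in the abstract.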

  2. A Generic Deep-Learning-Based Approach for Automated Surface Inspection.

    PubMed

    Ren, Ruoxu; Hung, Terence; Tan, Kay Chen

    2018-03-01

    Automated surface inspection (ASI) is a challenging task in industry, as collecting a training dataset is usually costly and related methods are highly dataset-dependent. In this paper, a generic approach that requires only a small amount of training data for ASI is proposed. First, this approach builds a classifier on the features of image patches, where the features are transferred from a pretrained deep learning network. Next, pixel-wise prediction is obtained by convolving the trained classifier over the input image. An experiment on three public datasets and one industrial dataset is carried out. The experiment involves two tasks: 1) image classification and 2) defect segmentation. The results of the proposed algorithm are compared against several of the best benchmarks in the literature. In the classification tasks, the proposed method improves accuracy by 0.66%-25.50%. In the segmentation tasks, the proposed method reduces error escape rates by 6.00%-19.00% for three defect types and improves accuracies by 2.29%-9.86% for all seven defect types. In addition, the proposed method achieves a 0.0% error escape rate in the segmentation task on the industrial data.

  3. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer.

    PubMed

    Rani R, Hannah Jessie; Victoire T, Aruldoss Albert

    2018-01-01

    This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training neural networks is still a challenging exercise in machine learning. Traditional training algorithms often become trapped in local optima and lead to premature convergence, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computation are becoming popular due to their robustness in overcoming these drawbacks. Accordingly, this paper proposes a hybrid training procedure in which the differential search (DS) algorithm is functionally integrated with particle swarm optimization (PSO). To surmount local trapping of the search procedure, a new population initialization scheme based on a Logistic chaotic sequence is proposed, which enhances population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analyses on 7 publicly available benchmark datasets were performed. Subsequently, experiments were conducted on a practical application case of wind speed prediction to show the superiority of the proposed RBF training algorithm in terms of prediction accuracy.
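
    Independent of the optimizer, the RBF network itself is simple. The sketch below fixes the hidden-layer centers on an evenly spaced grid (in the paper, center placement is precisely what the DS+PSO hybrid tunes) and solves the output weights in closed form; the sine target is a toy stand-in for wind speed data.

```python
import numpy as np

def rbf_design(X, centers, sigma):
    """Gaussian hidden-layer activations:
    phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])                        # toy regression target

centers = np.linspace(-1, 1, 12).reshape(-1, 1)  # fixed grid; DS+PSO would optimize these
sigma = 0.3
Phi = rbf_design(X, centers, sigma)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # output weights in closed form

pred = Phi @ w
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(rmse)                                    # small fitting error on the toy problem
```

Because the output layer is linear, only the centers and widths create the non-convex landscape where gradient methods stall, which is why evolutionary hybrids like DS+PSO are applied to that part of the model.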

  4. Discrimination of Breast Cancer with Microcalcifications on Mammography by Deep Learning.

    PubMed

    Wang, Jinhua; Yang, Xi; Cai, Hongmin; Tan, Wanchang; Jin, Cangzheng; Li, Li

    2016-06-07

    Microcalcification is an effective indicator of early breast cancer. To improve diagnostic accuracy for microcalcifications, this study evaluates the performance of deep-learning-based models trained on large datasets for their discrimination. A semi-automated segmentation method was used to characterize all microcalcifications. A discrimination classifier model was constructed to assess the accuracy of microcalcifications and breast masses, either in isolation or in combination, for classifying breast lesions. Performance was compared to benchmark models. Our deep learning model achieved a discriminative accuracy of 87.3% when microcalcifications were characterized alone, compared to 85.8% with a support vector machine. The accuracies were 61.3% for both methods with masses alone, and improved to 89.7% and 85.8% after combined analysis with microcalcifications. Image segmentation with our deep learning model yielded 15, 26 and 41 features for the three scenarios, respectively. Overall, deep learning based on large datasets was superior to standard methods for the discrimination of microcalcifications. Accuracy was increased by adopting a combinatorial approach that detects microcalcifications and masses simultaneously. This may have clinical value for early detection and treatment of breast cancer.

  5. Robust Pedestrian Classification Based on Hierarchical Kernel Sparse Representation.

    PubMed

    Sun, Rui; Zhang, Guanghai; Yan, Xiaoxing; Gao, Jun

    2016-08-16

    Vision-based pedestrian detection has become an active topic in computer vision and autonomous vehicles. It aims at detecting pedestrians appearing ahead of the vehicle using a camera so that autonomous vehicles can assess the danger and take action. Due to varied illumination and appearance, complex backgrounds, and occlusion, pedestrian detection in outdoor environments is a difficult problem. In this paper, we propose a novel hierarchical feature extraction and weighted kernel sparse representation model for pedestrian classification. Initially, hierarchical feature extraction based on the CENTRIST descriptor is used to capture discriminative structures, and a max pooling operation is used to enhance invariance to varying appearance. Then, a kernel sparse representation model is proposed to fully exploit the discriminative information embedded in the hierarchical local features, with a Gaussian weight function as the measure to effectively handle occlusion in pedestrian images. Extensive experiments are conducted on benchmark databases, including INRIA, Daimler, an artificially generated dataset and a real occluded dataset, demonstrating the more robust performance of the proposed method compared to state-of-the-art pedestrian classification methods.

  6. A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.

    PubMed

    Fan, Ming; Zheng, Bin; Li, Lihua

    2015-10-01

    Knowledge of the structural class of a given protein is important for understanding its folding patterns. Despite considerable effort, however, predicting a protein's structural class solely from its sequence remains challenging; feature extraction and classification are the two main problems in such prediction. In this research, we extended our earlier work on both aspects. For protein feature extraction, we proposed a scheme that calculates word frequency and word position from sequences of amino acids, reduced amino acids, and secondary structure. For accurate classification of protein structural class, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method that integrates a Multi-Agent system into the Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method on four benchmark datasets with low sequence homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of existing methods. The source code and dataset are available on request.
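
    The word-frequency part of such a feature scheme can be sketched directly (the word-position features are omitted here); the three-letter secondary-structure alphabet H/E/C (helix, strand, coil) and the example string are illustrative choices, not the paper's datasets.

```python
import numpy as np
from itertools import product

def kmer_frequency(seq, alphabet, k=2):
    """Word (k-mer) frequency features over a fixed alphabet: the fraction of
    sequence positions at which each possible k-letter word occurs."""
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {w: 0 for w in words}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(len(seq) - k + 1, 1)
    return np.array([counts[w] / total for w in words])

# A toy secondary-structure string: two helices flanking a strand.
f = kmer_frequency("HHHHCEEEECHHHH", "HEC", k=2)
print(f.sum())   # frequencies over all 9 possible 2-mers sum to 1.0
```

With a 3-letter alphabet and k = 2 this yields a fixed-length 9-dimensional vector regardless of sequence length, which is what allows sequences of different lengths to feed one classifier.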

  7. Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection

    PubMed Central

    Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem

    2013-01-01

    The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, the online least squares one-class support vector machine (online LS-OC-SVM), together with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns from a training set with a limited number of samples to provide a basic normal model, then updates the model with the remaining data. In the sparse online scheme, the model complexity is controlled by a coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem: each frame of the video is characterized by a covariance matrix descriptor encoding the motion information and is then classified as normal or abnormal. Experiments are conducted on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629
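
    The covariance matrix descriptor used per frame can be sketched as below; note that the paper builds it from motion features, whereas this toy version uses pixel coordinates, intensity, and spatial gradient magnitudes as stand-in per-pixel features.

```python
import numpy as np

def covariance_descriptor(frame):
    """Region covariance descriptor: each pixel yields a feature vector
    (x, y, intensity, |dI/dx|, |dI/dy|); the region is summarized by the
    covariance matrix of these vectors, a compact fixed-size signature."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(frame.astype(float))   # gradients along rows, columns
    feats = np.stack([xs.ravel(), ys.ravel(), frame.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()])
    return np.cov(feats)                        # 5x5 symmetric PSD matrix

frame = np.random.default_rng(1).uniform(size=(32, 32))
C = covariance_descriptor(frame)
print(C.shape)    # (5, 5), regardless of the frame size
```

Because the descriptor size depends only on the number of per-pixel features, every frame maps to the same small matrix, which is what makes it convenient input for the one-class classifier.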

  8. Quantifying density cues in grouping displays.

    PubMed

    Machilsen, Bart; Wagemans, Johan; Demeyer, Maarten

    2016-09-01

    Perceptual grouping processes are typically studied using sparse displays of spatially separated elements. Unless the grouping cue of interest is a proximity cue, researchers will want to ascertain that such a cue is absent from the display. Various solutions to this problem have been employed in the literature; however, no validation of these methods exists. Here, we test a number of local density metrics both through their performance as constrained ideal observer models, and through a comparison with a large dataset of human detection trials. We conclude that for the selection of stimuli without a density cue, the Voronoi density metric is preferable, especially if combined with a measurement of the distance to each element's nearest neighbor. We offer the entirety of the dataset as a benchmark for the evaluation of future, possibly improved, metrics. With regard to human processes of grouping by proximity, we found observers to be insensitive to target groupings that are more sparse than the surrounding distractor elements, and less sensitive to regularity cues in element positioning than to local clusterings of target elements. Copyright © 2015 Elsevier Ltd. All rights reserved.
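
    The nearest-neighbour distance recommended above as a companion to the Voronoi metric is straightforward to compute; the sketch below shows only that NN-distance component (the Voronoi density itself would need a computational-geometry routine), with invented point sets.

```python
import numpy as np

def nn_distance(points):
    """Distance from each element to its nearest neighbour: a simple local
    density measure (small distances indicate local clustering)."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)        # ignore each point's zero self-distance
    return d.min(axis=1)

rng = np.random.default_rng(3)
sparse = rng.uniform(0, 10, size=(50, 2))
clustered = np.vstack([sparse, rng.uniform(4, 5, size=(50, 2))])  # add a dense clump
print(nn_distance(sparse).mean(), nn_distance(clustered).mean())
```

A stimulus generator can reject element configurations whose target region shows an NN-distance distribution differing from the distractor background, which is exactly the density-cue screening the study evaluates.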

  9. Brain-Inspired Constructive Learning Algorithms with Evolutionally Additive Nonlinear Neurons

    NASA Astrophysics Data System (ADS)

    Fang, Le-Heng; Lin, Wei; Luo, Qiang

    In this article, inspired in part by physiological evidence of the brain's growth and development, we develop a new type of constructive learning algorithm with evolutionally additive nonlinear neurons. The new algorithms show a remarkable ability to perform effective regression and accurate classification. In particular, they are able to sustain a reduction of the loss function even when the dynamics of the trained network are bogged down in the vicinity of local minima. The algorithm augments the neural network by adding only a few connections as well as neurons whose activation functions are nonlinear, nonmonotonic, and self-adapted to the dynamics of the loss function. Indeed, we analytically demonstrate the reduction dynamics of the algorithm for different problems, and further modify the algorithms so as to obtain improved generalization capability for the augmented neural networks. Finally, by comparing with classical algorithms and architectures for neural network construction, we show that our constructive learning algorithms, as well as their modified versions, achieve better performance, such as faster training speed and smaller network size, on several representative benchmark datasets including the MNIST dataset of handwritten digits.

  10. Robust Pedestrian Classification Based on Hierarchical Kernel Sparse Representation

    PubMed Central

    Sun, Rui; Zhang, Guanghai; Yan, Xiaoxing; Gao, Jun

    2016-01-01

    Vision-based pedestrian detection has become an active topic in computer vision and autonomous vehicles. It aims at detecting pedestrians appearing ahead of the vehicle using a camera so that autonomous vehicles can assess the danger and take action. Due to varied illumination and appearance, complex backgrounds, and occlusion, pedestrian detection in outdoor environments is a difficult problem. In this paper, we propose a novel hierarchical feature extraction and weighted kernel sparse representation model for pedestrian classification. Initially, hierarchical feature extraction based on the CENTRIST descriptor is used to capture discriminative structures, and a max pooling operation is used to enhance invariance to varying appearance. Then, a kernel sparse representation model is proposed to fully exploit the discriminative information embedded in the hierarchical local features, with a Gaussian weight function as the measure to effectively handle occlusion in pedestrian images. Extensive experiments are conducted on benchmark databases, including INRIA, Daimler, an artificially generated dataset and a real occluded dataset, demonstrating the more robust performance of the proposed method compared to state-of-the-art pedestrian classification methods. PMID:27537888

  11. Online least squares one-class support vector machines-based abnormal visual event detection.

    PubMed

    Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem

    2013-12-12

    The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, the online least squares one-class support vector machine (online LS-OC-SVM), together with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns from a training set with a limited number of samples to provide a basic normal model, then updates the model with the remaining data. In the sparse online scheme, the model complexity is controlled by a coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem: each frame of the video is characterized by a covariance matrix descriptor encoding the motion information and is then classified as normal or abnormal. Experiments are conducted on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset to demonstrate the promising results of the proposed online LS-OC-SVM method.

  12. RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection

    PubMed Central

    Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S.

    2015-01-01

    Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation with a high-probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features a high detection rate, fast response, and insensitivity to most parameter settings. Algorithm implementations and datasets are available upon request. PMID:25685112

  13. Relevance popularity: A term event model based feature selection scheme for text classification.

    PubMed

    Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao

    2017-01-01

    Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. Traditional feature selection methods such as information gain and chi-square often use the number of documents that contain a particular term (i.e., the document frequency). However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising basis for accurate classification. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. Under the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing the inner parameters with their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments with two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms representative feature selection methods.
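
    The paper's factorized matching score is specific to its derivation, but the underlying idea, scoring terms by how sharply their multinomial class-conditional frequencies differ, can be sketched as follows; the max-log-ratio form and the toy counts are our simplification, not the paper's measurement.

```python
import numpy as np

def term_scores(counts):
    """Score each term by how unevenly it occurs across classes under a
    Laplace-smoothed multinomial naive Bayes model: the maximum over classes
    of log P(term | class) - log P(term | other classes)."""
    counts = np.asarray(counts, dtype=float) + 1.0   # rows: classes, cols: terms
    p = counts / counts.sum(axis=1, keepdims=True)   # P(term | class)
    rest = counts.sum(axis=0) - counts + 1.0         # counts in the other classes
    q = rest / rest.sum(axis=1, keepdims=True)       # P(term | not class)
    return np.max(np.log(p) - np.log(q), axis=0)

# Term 0 is class-specific; term 1 is uniform background vocabulary.
counts = [[40, 100],    # class A term counts
          [ 2, 100]]    # class B term counts
s = term_scores(counts)
print(s)                # the discriminative term receives the higher score
```

Crucially, the scores use within-document term counts (the multinomial event model), not just document frequency, which is the distinction the abstract draws against information gain and chi-square.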

  14. A trace ratio maximization approach to multiple kernel-based dimensionality reduction.

    PubMed

    Jiang, Wenhao; Chung, Fu-lai

    2014-01-01

    Most dimensionality reduction techniques are based on one metric or one kernel, so an appropriate kernel must be selected for kernel-based dimensionality reduction. Multiple kernel learning for dimensionality reduction (MKL-DR) has recently been proposed to learn a kernel from a set of base kernels, which are seen as different descriptions of the data. As MKL-DR does not involve regularization, it might be ill-posed under some conditions and consequently its applications are hindered. This paper proposes a multiple kernel learning framework for dimensionality reduction based on regularized trace ratio, termed MKL-TR. Our method aims at learning a transformation into a space of lower dimension and a corresponding kernel from the given base kernels, among which some may not be suitable for the given data. The solutions for the proposed framework can be found based on trace ratio maximization. The experimental results demonstrate its effectiveness on benchmark datasets, which include text, image and sound datasets, in supervised, unsupervised and semi-supervised settings. Copyright © 2013 Elsevier Ltd. All rights reserved.
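
    At the core of MKL-TR is trace ratio maximization. Below is a minimal sketch of the classical iterative scheme for maximizing tr(W'AW)/tr(W'BW) over orthonormal W, assuming numpy is available; this is the generic solver from the trace-ratio literature, not the full MKL-TR framework with learned kernel weights:

```python
import numpy as np

def trace_ratio(A, B, k, iters=50):
    """Iteratively maximize tr(W'AW)/tr(W'BW) over orthonormal d x k matrices W:
    set lambda to the current ratio, then replace W with the top-k eigenvectors
    of A - lambda * B (monotonically convergent for positive definite B)."""
    d = A.shape[0]
    W = np.eye(d)[:, :k]
    lam = 0.0
    for _ in range(iters):
        lam = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
        vals, vecs = np.linalg.eigh(A - lam * B)   # ascending eigenvalues
        W = vecs[:, np.argsort(vals)[::-1][:k]]    # keep top-k eigenvectors
    return W, lam
```

Unlike the ratio-trace relaxation solved by a single generalized eigenproblem, this scheme optimizes the trace ratio objective itself, which is why it appears as the inner solver in trace-ratio-based reduction methods.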

  15. Sparse coding joint decision rule for ear print recognition

    NASA Astrophysics Data System (ADS)

    Guermoui, Mawloud; Melaab, Djamel; Mekhalfi, Mohamed Lamine

    2016-09-01

    Human ear recognition has been promoted as a profitable biometric over the past few years. Compared with other modalities, such as the face and iris, that have undergone significant investigation in the literature, the ear pattern remains relatively unexplored. We put forth a sparse coding-induced decision-making scheme for ear recognition. It jointly involves the reconstruction residuals and the respective reconstruction coefficients pertaining to the input features (co-occurrence of adjacent local binary patterns) for a further fusion. We particularly show that combining both components (i.e., the residuals as well as the coefficients) yields better outcomes than when either of them is considered singly. The proposed method has been evaluated on two benchmark datasets, namely IITD1 (125 subjects) and IITD2 (221 subjects). The recognition rates of the suggested scheme amount to 99.5% and 98.95% for the two datasets, respectively, which suggests that our method compares favorably with reference state-of-the-art methodologies. Furthermore, experiments show that the presented scheme manifests promising robustness under large-scale occlusion scenarios.

  16. Discrimination of Breast Cancer with Microcalcifications on Mammography by Deep Learning

    PubMed Central

    Wang, Jinhua; Yang, Xi; Cai, Hongmin; Tan, Wanchang; Jin, Cangzheng; Li, Li

    2016-01-01

    Microcalcification is an effective indicator of early breast cancer. To improve the diagnostic accuracy of microcalcifications, this study evaluates the performance of deep learning-based models on large datasets for their discrimination. A semi-automated segmentation method was used to characterize all microcalcifications. A discrimination classifier model was constructed to assess the accuracies of microcalcifications and breast masses, either in isolation or in combination, for classifying breast lesions. Performances were compared to benchmark models. Our deep learning model achieved a discriminative accuracy of 87.3% if microcalcifications were characterized alone, compared to 85.8% with a support vector machine. The accuracies were 61.3% for both methods with masses alone and improved to 89.7% and 85.8% after the combined analysis with microcalcifications. Image segmentation with our deep learning model yielded 15, 26 and 41 features for the three scenarios, respectively. Overall, deep learning based on large datasets was superior to standard methods for the discrimination of microcalcifications. Accuracy was increased by adopting a combinatorial approach to detect microcalcifications and masses simultaneously. This may have clinical value for early detection and treatment of breast cancer. PMID:27273294

  17. Robust visual tracking via multiscale deep sparse networks

    NASA Astrophysics Data System (ADS)

    Wang, Xin; Hou, Zhiqiang; Yu, Wangsheng; Xue, Yang; Jin, Zefenfen; Dai, Bo

    2017-04-01

    In visual tracking, deep learning with offline pretraining can extract more intrinsic and robust features. It has achieved significant success in solving tracking drift in complicated environments. However, offline pretraining requires numerous auxiliary training datasets and is considerably time-consuming for tracking tasks. To solve these problems, a multiscale sparse networks-based tracker (MSNT) under the particle filter framework is proposed. Based on stacked sparse autoencoders and rectified linear units, the tracker has a flexible and adjustable architecture without the offline pretraining process and effectively exploits robust and powerful features through online training on limited labeled data alone. Meanwhile, the tracker builds four deep sparse networks of different scales, according to the target's profile type. During tracking, the tracker adaptively selects the matched tracking network in accordance with the initial target's profile type. It preserves the inherent structural information more efficiently than single-scale networks. Additionally, a corresponding update strategy is proposed to improve the robustness of the tracker. Extensive experimental results on a large-scale benchmark dataset show that the proposed method performs favorably against state-of-the-art methods in challenging environments.

  18. MLACP: machine-learning-based prediction of anticancer peptides

    PubMed Central

    Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Choi, Sun; Kim, Myeong Ok; Lee, Gwang

    2017-01-01

    Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html. PMID:29100375

  19. Textual data compression in computational biology: a synopsis.

    PubMed

    Giancarlo, Raffaele; Scaturro, Davide; Utro, Filippo

    2009-07-01

    Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made to apply textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. The main focus of this review is a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used. When possible, a unifying organization of the main ideas and techniques is also provided. It goes without saying that most of the research results reviewed here offer software prototypes to the bioinformatics community. The Supplementary Material provides pointers to software and benchmark datasets for a range of applications of broad interest. In addition to providing references to software, the Supplementary Material also gives a brief presentation of some fundamental results and techniques related to this paper. It is at: http://www.math.unipa.it/~raffaele/suppMaterial/compReview/

  20. Orphan Toxin OrtT (YdcX) of Escherichia coli Reduces Growth during the Stringent Response

    DTIC Science & Technology

    2015-01-29

    antimicrobials trimethoprim and sulfamethoxazole; these antimicrobials induce the stringent response by inhibiting tetrahydrofolate synthesis... Despite these difficulties in determining physiological roles, TA systems are clearly phage inhibition systems

  1. Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris; Little, Mike; Huang, Thomas; Jacob, Joseph; Yang, Phil; Kuo, Kwo-Sen

    2016-01-01

    Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based file systems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.

  2. Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations

    NASA Astrophysics Data System (ADS)

    Lynnes, C.; Little, M. M.; Huang, T.; Jacob, J. C.; Yang, C. P.; Kuo, K. S.

    2016-12-01

    Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based filesystems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.

  3. Improving Protein Fold Recognition by Deep Learning Networks.

    PubMed

    Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin

    2015-12-04

    For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict whether a given query-template protein pair belongs to the same structural fold. The input stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75, consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6%, and for Top 5 is 91.2%, 76.5%, and 60.7% at the family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed results comparable to ensemble DN-Fold at the family and superfamily levels. Finally, we extended the binary classification problem of fold recognition to a real-value regression task, which also shows promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.

  4. Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function

    PubMed Central

    Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P

    2008-01-01

    Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951

  5. Results of the Australasian (Trans-Tasman Oncology Group) radiotherapy benchmarking exercise in preparation for participation in the PORTEC-3 trial.

    PubMed

    Jameson, Michael G; McNamara, Jo; Bailey, Michael; Metcalfe, Peter E; Holloway, Lois C; Foo, Kerwyn; Do, Viet; Mileshkin, Linda; Creutzberg, Carien L; Khaw, Pearly

    2016-08-01

    Protocol deviations in randomised controlled trials have been found to result in a significant decrease in survival and local control. In some cases, the magnitude of the detrimental effect can be larger than the anticipated benefits of the interventions involved. The implementation of appropriate quality assurance of radiotherapy measures for clinical trials has been found to result in fewer deviations from protocol. This paper reports on a benchmarking study conducted in preparation for the PORTEC-3 trial in Australasia. A benchmarking CT dataset was sent to each of the Australasian investigators, who were asked to contour and plan the case according to the trial protocol using their local treatment planning systems. These data were then sent back to the Trans-Tasman Oncology Group for collation and analysis. Thirty-three investigators from eighteen institutions across Australia and New Zealand took part in the study. The mean clinical target volume (CTV) volume was 383.4 (228.5-497.8) cm(3) and the mean dose to a reference gold standard CTV was 48.8 (46.4-50.3) Gy. Although there were some large differences in the contouring of the CTV and its constituent parts, these did not translate into large variations in dosimetry. Where individual investigators deviated from the trial contouring protocol, feedback was provided. The results of this study will be compared with those of the international QA study for the PORTEC-3 trial. © 2016 The Royal Australian and New Zealand College of Radiologists.

  6. Benchmark of four popular virtual screening programs: construction of the active/decoy dataset remains a major determinant of measured performance.

    PubMed

    Chaput, Ludovic; Martinez-Sanz, Juan; Saettel, Nicolas; Mouawad, Liliane

    2016-01-01

    In structure-based virtual screening, the choice of the docking program is essential for the success of hit identification. Benchmarks are meant to help guide this choice, especially when undertaken on a large variety of protein targets. Here, the performance of four popular virtual screening programs, Gold, Glide, Surflex and FlexX, is compared using the Directory of Useful Decoys-Enhanced database (DUD-E), which includes 102 targets with an average of 224 ligands per target and 50 decoys per ligand, generated to avoid biases in the benchmarking. Then, the relationship between these program performances and the properties of the targets or the small molecules was investigated. The comparison was based on two metrics, with three different parameters each. The BEDROC scores with α = 80.5 indicated that, on the overall database, Glide succeeded (score > 0.5) for 30 targets, Gold for 27, FlexX for 14 and Surflex for 11. The performance did not depend on the hydrophobicity or the openness of the protein cavities, nor on the families to which the proteins belong. However, despite the care taken in the construction of the DUD-E database, the small differences that remain between the actives and the decoys likely explain the successes of Gold, Surflex and FlexX. Moreover, the similarity between the actives of a target and its crystal structure ligand seems to be at the basis of the good performance of Glide. When all targets with significant biases are removed from the benchmarking, a subset of 47 targets remains, for which Glide succeeded for only 5 targets, Gold for 4, and FlexX and Surflex for 2. The dramatic performance drop of all four programs when the biases are removed shows that we should beware of virtual screening benchmarks, because good performances may be due to wrong reasons.
    Therefore, benchmarking hardly provides guidelines for virtual screening experiments, although the overall tendency is maintained, i.e., Glide and Gold display better performance than FlexX and Surflex. We recommend always using several programs and combining their results. Graphical Abstract: Summary of the results obtained by virtual screening with the four programs, Glide, Gold, Surflex and FlexX, on the 102 targets of the DUD-E database. The percentage of targets with successful results, i.e., with BEDROC(α = 80.5) > 0.5, is shown in blue when the entire database is considered, and in red when targets with biased chemical libraries are removed.
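
    For reference, the BEDROC metric used in this benchmark (Truchon and Bayly's early-recognition score) can be computed directly from the ranks of the actives; a minimal sketch, assuming 1-based ranks and no ties:

```python
import math

def bedroc(active_ranks, n_total, alpha=80.5):
    """Truchon-Bayly BEDROC early-recognition score from the 1-based ranks
    of the actives in a screened list of n_total compounds."""
    n = len(active_ranks)
    ra = n / n_total
    s = sum(math.exp(-alpha * r / n_total) for r in active_ranks)
    # RIE: observed exponential-weighted sum over its expectation under random ranking
    rie = s / (n * (1.0 / n_total) * (1 - math.exp(-alpha))
               / (math.exp(alpha / n_total) - 1))
    # Rescale RIE to [0, 1]
    factor = ra * math.sinh(alpha / 2) / (math.cosh(alpha / 2)
                                          - math.cosh(alpha / 2 - alpha * ra))
    return rie * factor + 1.0 / (1 - math.exp(alpha * (1 - ra)))
```

With α = 80.5 the exponential weight concentrates on roughly the top 2% of the ranked list, which is why the metric rewards early recognition rather than overall enrichment.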

  7. Missing value imputation for microarray data: a comprehensive comparison study and a web tool

    PubMed Central

    2013-01-01

    Background Microarray data are usually peppered with missing values for various reasons. However, most of the downstream analyses for microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed for improving the performance of microarray data analyses. Although many algorithms have been developed, there are many debates on the selection of the optimal algorithm. Studies comparing the performance of different algorithms are still not comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. Results In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether datasets from different species have different impacts on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of dataset but not on the species from which the samples come. In addition to the statistical measure, two other measures with biological meanings are useful for reflecting the impact of missing value imputation on the downstream data analyses. Our study suggests that local-least-squares-based methods are good choices for handling missing values in most microarray datasets. Conclusions In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation.
Based on such a comprehensive comparison, researchers could choose the optimal algorithm for their datasets easily. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to assist researchers in dealing with missing values easily, we built a web-based and easy-to-use imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and then the imputed results can be downloaded for the downstream data analyses. PMID:24565220
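
    As a hedged illustration of the local, similarity-based imputation family the study favours (local least squares itself solves a regression on the neighbouring rows rather than averaging), the sketch below fills each missing entry with the mean of that column over the k most similar rows; `None` marks a missing value and all names are illustrative:

```python
def knn_impute(matrix, k=2):
    """Fill each missing entry (None) with the mean of that column over the
    k rows most similar on the jointly observed columns (brute-force search)."""
    def dist(r, s):
        # mean squared difference over columns observed in both rows
        pairs = [(a, b) for a, b in zip(r, s) if a is not None and b is not None]
        if not pairs:
            return float("inf")
        return sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    out = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is None:
                donors = [r for r in matrix if r is not row and r[j] is not None]
                donors.sort(key=lambda r: dist(row, r))
                vals = [r[j] for r in donors[:k]]
                out[i][j] = sum(vals) / len(vals) if vals else None
    return out
```

Swapping the neighbour average for a least-squares fit on the donors' observed columns would turn this into the local-least-squares variant the authors recommend.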

  8. Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets

    PubMed Central

    2011-01-01

    Background Verbal autopsy methods are critically important for evaluating the leading causes of death in populations without adequate vital registration systems. With a myriad of analytical and data collection approaches, it is essential to create a high quality validation dataset from different populations to evaluate comparative method performance and make recommendations for future verbal autopsy implementation. This study was undertaken to compile a set of strictly defined gold standard deaths for which verbal autopsies were collected to validate the accuracy of different methods of verbal autopsy cause of death assignment. Methods Data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. The Population Health Metrics Research Consortium (PHMRC) developed stringent diagnostic criteria including laboratory, pathology, and medical imaging findings to identify gold standard deaths in health facilities as well as an enhanced verbal autopsy instrument based on World Health Organization (WHO) standards. A cause list was constructed based on the WHO Global Burden of Disease estimates of the leading causes of death, potential to identify unique signs and symptoms, and the likely existence of sufficient medical technology to ascertain gold standard cases. Blinded verbal autopsies were collected on all gold standard deaths. Results Over 12,000 verbal autopsies on deaths with gold standard diagnoses were collected (7,836 adults, 2,075 children, 1,629 neonates, and 1,002 stillbirths). Difficulties in finding sufficient cases to meet gold standard criteria as well as problems with misclassification for certain causes meant that the target list of causes for analysis was reduced to 34 for adults, 21 for children, and 10 for neonates, excluding stillbirths. 
To ensure strict independence for the validation of methods and assessment of comparative performance, 500 test-train datasets were created from the universe of cases, covering a range of cause-specific compositions. Conclusions This unique, robust validation dataset will allow scholars to evaluate the performance of different verbal autopsy analytic methods as well as instrument design. This dataset can be used to inform the implementation of verbal autopsies to more reliably ascertain cause of death in national health information systems. PMID:21816095

  9. Detection of Abnormal Events via Optical Flow Feature Analysis

    PubMed Central

    Wang, Tian; Snoussi, Hichem

    2015-01-01

    In this paper, a novel algorithm is proposed to detect abnormal events in video streams. The algorithm is based on the histogram of the optical flow orientation descriptor and a classification method. The details of the histogram of the optical flow orientation descriptor are illustrated for describing the movement information of the global video frame or the foreground frame. By combining one-class support vector machine and kernel principal component analysis methods, the abnormal events in the current frame can be detected after a learning period characterizing normal behaviors. The different abnormal detection results are analyzed and explained. The proposed detection method is tested on benchmark datasets, and the experimental results show the effectiveness of the algorithm. PMID:25811227
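
    The core descriptor, a histogram of optical flow orientation, can be sketched as follows: orientations are binned over [0, 2π), weighted by flow magnitude, and L1-normalized. This is a hedged illustration (the flow field itself would come from an optical flow estimator), not the authors' exact implementation:

```python
import math

def flow_orientation_histogram(flow, bins=8):
    """Magnitude-weighted, L1-normalized histogram of optical flow orientation;
    `flow` is an iterable of (dx, dy) displacement vectors for one frame."""
    hist = [0.0] * bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # zero vectors carry no orientation information
        ang = math.atan2(dy, dx) % (2 * math.pi)       # orientation in [0, 2*pi)
        hist[min(int(ang / (2 * math.pi) * bins), bins - 1)] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Frames of normal motion produce histograms near the learned normal region, and a one-class classifier trained on those histograms flags frames whose descriptor falls outside it.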

  10. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.
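
    The described modification, a standard BP error plus a penalty term realized by maximizing the distance between positive and negative labels, might look like the following sketch for one sample's output vector; the hinge form, margin, and weighting are assumptions for illustration, not the paper's exact term:

```python
def penalized_error(outputs, targets, lam=0.5, margin=1.0):
    """Mean squared error plus a hinge-style penalty that is zero only when
    every positive-label output exceeds every negative-label output by `margin`
    (i.e. the positive/negative label distance is maximized)."""
    mse = sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)
    pos = [o for o, t in zip(outputs, targets) if t == 1]
    neg = [o for o, t in zip(outputs, targets) if t == 0]
    gap = min(pos) - max(neg) if pos and neg else margin
    return mse + lam * max(0.0, margin - gap)
```

Training against such an objective pushes the network to rank all relevant labels above all irrelevant ones, not merely to fit the 0/1 targets.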

  11. Symmetrical and overloaded effect of diffusion in information filtering

    NASA Astrophysics Data System (ADS)

    Zhu, Xuzhen; Tian, Hui; Chen, Guilin; Cai, Shimin

    2017-10-01

    In physical dynamics, mass diffusion theory has been applied to design effective information filtering models on bipartite networks. In previous works, researchers have assumed that objects' similarities are determined by single-directional mass diffusion from the collected objects to the uncollected ones, while inadvertently ignoring the adverse influence of diffusion overload. To some extent this veils the essence of diffusion in physical dynamics and hurts recommendation accuracy and diversity. After careful investigation, we argue that symmetrical diffusion effectively discloses the essence of mass diffusion and that high diffusion overload should be penalized. Accordingly, in this paper, we propose a symmetrical and overload-penalized diffusion-based model (SOPD), which shows excellent performance in extensive experiments on the benchmark datasets MovieLens and Netflix.
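
    For context, the classical two-step mass diffusion that SOPD refines can be sketched as follows: resource flows from the target user's collected items to users and back to items, divided by node degree at each step. A minimal illustration (self-collected items are not filtered out here, and the symmetrical/overload-penalty refinements are omitted):

```python
def mass_diffusion(user_items, target_user):
    """Two-step mass diffusion on a user-item bipartite graph: each item
    collected by the target starts with one unit of resource, spreads it
    equally to its users, and each user spreads what it received equally
    to its items. Higher final item scores = stronger recommendations."""
    item_deg = {}
    for items in user_items.values():
        for i in items:
            item_deg[i] = item_deg.get(i, 0) + 1
    target = user_items[target_user]
    # step 1: items -> users, divided by item degree
    user_res = {}
    for u, items in user_items.items():
        received = sum(1.0 / item_deg[i] for i in items if i in target)
        user_res[u] = received / len(items) if items else 0.0  # pre-split by user degree
    # step 2: users -> items
    scores = {}
    for u, items in user_items.items():
        for i in items:
            scores[i] = scores.get(i, 0.0) + user_res[u]
    return scores
```

Note that total resource is conserved across both steps, which gives a simple sanity check: the scores sum to the number of items the target user collected.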

  12. Elastic K-means using posterior probability.

    PubMed

    Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris

    2017-01-01

    The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability, with a soft capability whereby each data point can belong to multiple clusters fractionally, and show the benefit of the proposed Elastic K-means. Furthermore, in many applications, besides vector attribute information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.
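
    A soft K-means with posterior-style fractional memberships, in the spirit of (but not identical to) the proposed EKM, can be sketched as below; the Gaussian-like responsibility weighting with parameter `beta` and the deterministic initialization are assumptions for illustration:

```python
import math

def elastic_kmeans(X, k, iters=25, beta=0.5):
    """Soft K-means sketch: fractional memberships r_ik proportional to
    exp(-beta * ||x_i - mu_k||^2), followed by membership-weighted centroid
    updates. Each row of R sums to 1, so points belong to clusters fractionally."""
    mus = [X[i * len(X) // k] for i in range(k)]  # deterministic spread-out init
    R = []
    for _ in range(iters):
        R = []
        for x in X:
            w = [math.exp(-beta * sum((a - b) ** 2 for a, b in zip(x, m)))
                 for m in mus]
            s = sum(w)
            R.append([wi / s for wi in w])
        mus = [tuple(sum(r[j] * x[d] for r, x in zip(R, X)) / sum(r[j] for r in R)
                     for d in range(len(X[0])))
               for j in range(k)]
    return mus, R
```

Points near a cluster border receive split memberships instead of an all-or-nothing assignment, which is the "elastic" behaviour hard K-means lacks.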

  13. Automatic multi-organ segmentation using learning-based segmentation and level set optimization.

    PubMed

    Kohlberger, Timo; Sofka, Michal; Zhang, Jingdan; Birkbeck, Neil; Wetzl, Jens; Kaftan, Jens; Declerck, Jérôme; Zhou, S Kevin

    2011-01-01

    We present a novel generic segmentation system for fully automatic multi-organ segmentation from CT medical images. It combines the advantages of learning-based approaches on point cloud-based shape representation, such as speed, robustness, and point correspondences, with those of PDE-optimization-based level set approaches, such as high accuracy and the straightforward prevention of segment overlaps. In a benchmark on 10-100 annotated datasets for the liver, the lungs, and the kidneys, we show that the proposed system yields segmentation accuracies of 1.17-2.89 mm average surface error. The level set segmentation (which is initialized by the learning-based segmentations) thereby contributes a 20%-40% increase in accuracy.

  14. Imbalanced Learning for Functional State Assessment

    NASA Technical Reports Server (NTRS)

    Li, Feng; McKenzie, Frederick; Li, Jiang; Zhang, Guangfan; Xu, Roger; Richey, Carl; Schnell, Tom

    2011-01-01

    This paper presents the results of several imbalanced learning techniques applied to operator functional state assessment, where the data are highly imbalanced, i.e., some functional states (majority classes) have many more training samples than other states (minority classes). Conventional machine learning techniques usually tend to classify all data samples into majority classes and perform poorly for minority classes. In this study, we implemented five imbalanced learning techniques, including random under-sampling, random over-sampling, synthetic minority over-sampling technique (SMOTE), borderline-SMOTE and adaptive synthetic sampling (ADASYN), to solve this problem. Experimental results on a benchmark driving test dataset show that accuracies for minority classes could be improved dramatically at a cost of slight performance degradations for majority classes.
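
    Of the five techniques, SMOTE is the easiest to sketch: each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority neighbours. A minimal illustration with brute-force neighbour search and illustrative names (borderline-SMOTE and ADASYN refine how the seed points are chosen):

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point, pick
    one of its k nearest minority neighbours, and interpolate at a random
    fraction along the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = neighbours[rng.randrange(len(neighbours))]
        gap = rng.random()  # position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because every synthetic point lies between two real minority points, oversampling fills in the minority region instead of merely duplicating existing samples as random over-sampling does.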

  15. Exploring the QSAR's predictive truthfulness of the novel N-tuple discrete derivative indices on benchmark datasets.

    PubMed

    Martínez-Santiago, O; Marrero-Ponce, Y; Vivas-Reyes, R; Rivera-Borroto, O M; Hurtado, E; Treto-Suarez, M A; Ramos, Y; Vergara-Murillo, F; Orozco-Ugarriza, M E; Martínez-López, Y

    2017-05-01

    Graph derivative indices (GDIs) have recently been defined over N-atoms (N = 2, 3 and 4) simultaneously, which are based on the concept of derivatives in discrete mathematics (finite difference), metaphorical to the derivative concept in classical mathematical analysis. These molecular descriptors (MDs) codify topo-chemical and topo-structural information based on the concept of the derivative of a molecular graph with respect to a given event (S) over duplex, triplex and quadruplex relations of atoms (vertices). These GDIs have been successfully applied in the description of physicochemical properties like reactivity, solubility and chemical shift, among others, and in several comparative quantitative structure activity/property relationship (QSAR/QSPR) studies. Although satisfactory results have been obtained in previous modelling studies with the aforementioned indices, it is necessary to develop new, more rigorous analysis to assess the true predictive performance of the novel structure codification. So, in the present paper, an assessment and statistical validation of the performance of these novel approaches in QSAR studies are executed, as well as a comparison with those of other QSAR procedures reported in the literature. To achieve the main aim of this research, QSARs were developed on eight chemical datasets widely used as benchmarks in the evaluation/validation of several QSAR methods and/or many different MDs (fundamentally 3D MDs). Three to seven variable QSAR models were built for each chemical dataset, according to the original dissection into training/test sets. The models were developed by using multiple linear regression (MLR) coupled with a genetic algorithm as the feature wrapper selection technique in the MobyDigs software. Each family of GDIs (for duplex, triplex and quadruplex) behaves similarly in all modelling, although there were some exceptions. 
However, when all families were used in combination, the results achieved were quantitatively better than those reported by other authors in similar experiments. Comparisons of the external correlation coefficients (q²ext) revealed that the models based on GDIs possess superior predictive ability in seven of the eight datasets analysed, outperforming methodologies based on similar or more complex techniques and confirming the good predictive power of the obtained models. For the q²ext values, the non-parametric comparison revealed significantly different results from those reported so far, demonstrating that the models based on DIVATI's indices presented the best global performance and yielded significantly better predictions than the 12 0D-3D QSAR procedures used in the comparison. Therefore, GDIs are suitable for structure codification of molecules and constitute a good alternative for building QSARs to predict physicochemical, biological and environmental endpoints.
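
    The wrapper selection step described above can be sketched in miniature. This is an illustration only: the study used a genetic algorithm in MobyDigs, which is replaced here by a simple exhaustive search over small descriptor subsets; the function name and toy data are assumptions.

```python
from itertools import combinations
import numpy as np

def wrapper_mlr_select(X, y, subset_size=2):
    """Exhaustive wrapper selection (a stand-in for the GA used in
    MobyDigs): fit ordinary least squares on every descriptor subset
    of the given size and return the subset with the highest R^2."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    best = (-np.inf, None)
    for cols in combinations(range(X.shape[1]), subset_size):
        A = np.column_stack([X[:, cols], np.ones(len(y))])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        best = max(best, (r2, cols))
    return best  # (R^2, selected descriptor indices)

# toy data: the endpoint depends on descriptors 0 and 2 only
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = 2 * X[:, 0] - 3 * X[:, 2]
r2, cols = wrapper_mlr_select(X, y)
```

    A genetic algorithm replaces the exhaustive loop with crossover/mutation over subset encodings, which matters once the descriptor pool is too large to enumerate.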

  16. The first MICCAI challenge on PET tumor segmentation.

    PubMed

    Hatt, Mathieu; Laurent, Baptiste; Ouahabi, Anouar; Fayad, Hadi; Tan, Shan; Li, Laquan; Lu, Wei; Jaouen, Vincent; Tauber, Clovis; Czakon, Jakub; Drapejkowski, Filip; Dyrka, Witold; Camarasu-Pop, Sorina; Cervenansky, Frédéric; Girard, Pascal; Glatard, Tristan; Kain, Michael; Yao, Yao; Barillot, Christian; Kirov, Assen; Visvikis, Dimitris

    2018-02-01

Automatic functional volume segmentation in PET images is a challenge that has been addressed using a large array of methods. A major limitation for the field has been the lack of a benchmark dataset that would allow direct comparison of the results in the various publications. In the present work, we describe a comparison of recent methods on a large dataset following recommendations by the American Association of Physicists in Medicine (AAPM) task group (TG) 211, which was carried out within a MICCAI (Medical Image Computing and Computer Assisted Intervention) challenge. Organization and funding were provided by France Life Imaging (FLI). A dataset of 176 images combining simulated, phantom and clinical images was assembled. A website allowed the participants to register and download training data (n = 19). Challengers then submitted encapsulated pipelines on an online platform that autonomously ran the algorithms on the testing data (n = 157) and evaluated the results. The methods were ranked according to the arithmetic mean of sensitivity and positive predictive value. Sixteen teams registered but only four provided manuscripts and pipeline(s), for a total of 10 methods. In addition, results using two thresholds and the Fuzzy Locally Adaptive Bayesian (FLAB) method were generated. All competing methods except one performed with median accuracy above 0.8. The method with the highest score was the convolutional neural network-based segmentation, which significantly outperformed 9 of the 12 other methods, but not the improved K-Means, Gaussian Mixture Model and Fuzzy C-Means methods. The most rigorous comparative study of PET segmentation algorithms to date was carried out using a dataset that is the largest used in such studies so far. The hierarchy amongst the methods in terms of accuracy did not depend strongly on the subset of datasets or the metrics (or combination of metrics).
All the methods submitted by the challengers except one demonstrated good performance with median accuracy scores above 0.8. Copyright © 2017 Elsevier B.V. All rights reserved.
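
    The ranking criterion quoted above (the arithmetic mean of sensitivity and positive predictive value) can be sketched for a binary segmentation from its confusion counts; the function name and toy counts are illustrative, not taken from the challenge code.

```python
def challenge_score(tp, fp, fn):
    """Arithmetic mean of sensitivity and positive predictive value,
    computed from true-positive, false-positive and false-negative
    voxel counts of a segmentation against its reference."""
    sensitivity = tp / (tp + fn)   # fraction of reference voxels recovered
    ppv = tp / (tp + fp)           # fraction of segmented voxels that are correct
    return 0.5 * (sensitivity + ppv)

# toy example: 80 of 100 reference voxels found, 20 spurious voxels
print(challenge_score(tp=80, fp=20, fn=20))  # 0.8
```

    Averaging the two terms penalizes both under-segmentation (low sensitivity) and over-segmentation (low PPV) symmetrically.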

  17. Multisource Estimation of Long-term Global Terrestrial Surface Radiation

    NASA Astrophysics Data System (ADS)

    Peng, L.; Sheffield, J.

    2017-12-01

Land surface net radiation is the essential energy source at the earth's surface. It determines the surface energy budget and its partitioning, drives the hydrological cycle by providing available energy, and supplies heat, light, and energy for biological processes. Individual components of net radiation have changed historically due to natural and anthropogenic climate change and land use change. Decadal variations in radiation such as global dimming or brightening have important implications for the hydrological and carbon cycles. In order to assess the trends and variability of net radiation and evapotranspiration, there is a need for accurate estimates of long-term terrestrial surface radiation. While great progress has been made in measuring the top-of-atmosphere energy budget, large discrepancies exist among ground observations, satellite retrievals, and reanalysis fields of surface radiation, due to the sparsity of observational networks, the difficulty of measuring from space, and the uncertainty in algorithm parameters. To overcome the weaknesses of single-source datasets, we propose a multi-source merging approach that fully utilizes and combines multiple datasets of the individual radiation components, as they are complementary in space and time. First, we conduct diagnostic analysis of multiple satellite and reanalysis datasets based on in-situ measurements such as the Global Energy Balance Archive (GEBA), existing validation studies, and other information such as network density and consistency with other meteorological variables. Then, we calculate the optimal weighted average of multiple datasets by minimizing the variance of the error between in-situ measurements and other observations. Finally, we quantify the uncertainties in the estimates of surface net radiation and employ physical constraints based on the surface energy balance to reduce these uncertainties.
The final dataset is evaluated in terms of the long-term variability and its attribution to changes in individual components. The goal of this study is to provide a merged observational benchmark for large-scale diagnostic analyses, remote sensing and land surface modeling.
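
    A minimal sketch of the merging step, assuming the "optimal weighted average" is inverse-error-variance weighting (the weights that minimize the variance of the merged estimate when dataset errors are independent); the function name and numbers are illustrative, not from the study.

```python
def merge_inverse_variance(estimates, error_variances):
    """Weight each dataset's estimate by the inverse of its error
    variance; for independent errors this minimizes the variance of
    the combined estimate."""
    weights = [1.0 / v for v in error_variances]
    total = sum(weights)
    merged = sum(w * x for w, x in zip(weights, estimates)) / total
    merged_variance = 1.0 / total  # always smaller than any input variance
    return merged, merged_variance

# three hypothetical surface-radiation estimates (W m^-2) with differing error variances
est, var = merge_inverse_variance([150.0, 160.0, 155.0], [25.0, 100.0, 50.0])
```

    The best single dataset here has variance 25.0; the merged estimate's variance is lower, which is the payoff of combining complementary sources.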

  18. An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

    PubMed

    Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi

    2017-03-15

Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is template-based fold assignment; the other is ab-initio prediction using machine learning algorithms. Combining the two solutions to improve prediction accuracy had not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information. http://yanglab.nankai.edu.cn/TA-fold/. yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
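
    The abstract describes the combination only at a high level. A plausible sketch (the decision rule, function name and threshold below are assumptions, not the published TA-fold algorithm) is: trust the template-based assignment when its score is confident, otherwise fall back to the machine-learning classifier.

```python
def ta_fold_combine(hh_fold, hh_score, svm_fold, threshold=0.5):
    """Hypothetical ensemble rule: the template-based (HHsearch-style)
    assignment wins when its confidence clears the threshold;
    otherwise the ab-initio SVM prediction is used."""
    return hh_fold if hh_score >= threshold else svm_fold

print(ta_fold_combine("beta-barrel", 0.9, "tim-barrel"))  # beta-barrel
print(ta_fold_combine("beta-barrel", 0.2, "tim-barrel"))  # tim-barrel
```

    The appeal of such a rule is that template assignment is very reliable when a close template exists, while the classifier covers the remote-homology cases where no confident template is found.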

  19. Cerebrospinal fluid lactate and pyruvate concentrations and their ratio.

    PubMed

    Zhang, Wan-Ming; Natowicz, Marvin R

    2013-05-01

Determinations of cerebrospinal fluid (CSF) lactate and pyruvate concentrations and CSF lactate:pyruvate (L/P) ratios are important in several clinical settings, yet published normative data have significant limitations. We sought to determine a large dataset of stringently defined normative data for CSF lactate and pyruvate concentrations and CSF L/P ratios. We evaluated data from 627 patients who had determinations of CSF lactate and/or CSF pyruvate from 2001 to 2011 at the Cleveland Clinic. Inclusion in the normal reference population required normal CSF cell counts, glucose and protein and routine serum chemistries, and absence of progressive brain disorder, epilepsy, or seizure within 24 h. Brain MRI, if done, showed no evidence of tumor, acute changes or basal ganglia abnormality. CSF cytology, CSF alanine and immunoglobulin levels, and oligoclonal band analysis were required to be normal, if done. Various inclusion/exclusion criteria were compared. Ninety-two patients fulfilled the inclusion/exclusion criteria for a reference population. The 95% central intervals (2.5%-97.5%) for CSF lactate and pyruvate levels were 1.01-2.09 mM and 0.03-0.15 mM, respectively, and 9.05-26.37 for CSF L/P. There were no significant gender-related differences in CSF lactate or pyruvate concentrations or in CSF L/P. Weak positive correlations between the concentration of CSF lactate or pyruvate and age were noted. Using stringent inclusion/exclusion criteria, we determined normative data for CSF lactate and pyruvate concentrations and CSF L/P ratios in a large, well-characterized reference population. Normalcy of routine CSF and blood analytes is the most important parameter in determining reference intervals for CSF lactate and pyruvate. Copyright © 2012 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
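
    The 95% central interval quoted above is simply the 2.5th and 97.5th percentiles of the reference population. A minimal sketch using NumPy's percentile function; the sample below is synthetic, for illustration only, and not the study's data.

```python
import numpy as np

def central_interval_95(values):
    """The 95% central interval: the 2.5th and 97.5th percentiles,
    bounding the middle 95% of the reference population."""
    lo, hi = np.percentile(values, [2.5, 97.5])
    return lo, hi

# synthetic lactate-like values in mM (illustration only, not study data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.55, scale=0.27, size=92)
lo, hi = central_interval_95(sample)
```

    Note that a central interval makes no distributional assumption, unlike a mean ± 1.96 SD interval, which is why it is preferred for reference ranges on possibly skewed analytes.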

  20. Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic.

    PubMed

    Brown, Peter; Pullan, Wayne; Yang, Yuedong; Zhou, Yaoqi

    2016-02-01

The three-dimensional tertiary structure of a protein at near-atomic resolution provides insight into its function and evolution. Because structure largely determines function, similarity in structure usually implies similarity in function, and structure alignment techniques are therefore useful in the classification of protein function. Given the rapidly growing number of new, experimentally determined structures made available through repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods on commonly used datasets. These benchmark datasets, used to gauge alignment accuracy, include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean-square distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. SPalignNS is implemented in C++. The source code, binary executable, and a web server version are freely available at: http://sparks-lab.org. Contact: yaoqi.zhou@griffith.edu.au. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
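
    The asymmetric greedy search itself is not specified in the abstract. As a hedged illustration of the general idea of a greedy heuristic for the linear sum assignment problem (repeatedly taking the cheapest remaining row/column pair), not the SPalignNS algorithm:

```python
def greedy_assignment(cost):
    """Greedy heuristic for linear sum assignment: repeatedly pick the
    lowest-cost (row, col) pair among still-unassigned rows and columns.
    Fast, but not guaranteed optimal, unlike the Hungarian algorithm."""
    pairs = sorted(
        (cost[i][j], i, j)
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    used_r, used_c, match = set(), set(), []
    for c, i, j in pairs:
        if i not in used_r and j not in used_c:
            used_r.add(i)
            used_c.add(j)
            match.append((i, j))
    return sorted(match)

# toy residue-residue distance matrix: the diagonal pairing is cheapest
print(greedy_assignment([[1, 9, 9], [9, 1, 9], [9, 9, 1]]))  # [(0, 0), (1, 1), (2, 2)]
```

    In structure alignment the cost matrix would hold inter-residue distances after superposition, and the speed of such a heuristic is what makes repeated rescoring during the search affordable.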

  1. shinyheatmap: Ultra fast low memory heatmap web interface for big data genomics.

    PubMed

    Khomtchouk, Bohdan B; Hennessy, James R; Wahlestedt, Claes

    2017-01-01

Transcriptomics, metabolomics, metagenomics, and other next-generation sequencing (-omics) fields are known for their production of large datasets, especially across single-cell sequencing studies. Visualizing such big data has posed technical challenges in biology, both in terms of available computational resources and programming acumen. Since heatmaps are used to depict high-dimensional numerical data as a colored grid of cells, efficiency and speed have often proven to be critical considerations in the process of successfully converting data into graphics. For example, rendering interactive heatmaps from large input datasets (e.g., 100k+ rows) has been computationally infeasible on both desktop computers and web browsers. In addition to memory requirements, programming skills and knowledge have frequently been barriers to entry for creating highly customizable heatmaps. We propose shinyheatmap: an advanced user-friendly heatmap software suite capable of efficiently creating highly customizable static and interactive biological heatmaps in a web browser. shinyheatmap is a low memory footprint program, making it particularly well-suited for the interactive visualization of extremely large datasets that cannot typically be computed in-memory due to size restrictions. Also, shinyheatmap features a built-in high performance web plug-in, fastheatmap, for rapidly plotting interactive heatmaps of datasets as large as 10^5-10^7 rows within seconds, effectively shattering previous performance benchmarks of heatmap rendering speed. shinyheatmap is hosted online as a freely available web server with an intuitive graphical user interface: http://shinyheatmap.com. The methods are implemented in R, and are available as part of the shinyheatmap project at: https://github.com/Bohdan-Khomtchouk/shinyheatmap.
Users can access fastheatmap directly from within the shinyheatmap web interface, and all source code has been made publicly available on Github: https://github.com/Bohdan-Khomtchouk/fastheatmap.

  2. mBEEF-vdW: Robust fitting of error estimation density functionals

    NASA Astrophysics Data System (ADS)

    Lundgaard, Keld T.; Wellendorff, Jess; Voss, Johannes; Jacobsen, Karsten W.; Bligaard, Thomas

    2016-06-01

We propose a general-purpose semilocal/nonlocal exchange-correlation functional approximation, named mBEEF-vdW. The exchange is a meta generalized gradient approximation, and the correlation is a semilocal and nonlocal mixture, with the Rutgers-Chalmers approximation for van der Waals (vdW) forces. The functional is fitted within the Bayesian error estimation functional (BEEF) framework [J. Wellendorff et al., Phys. Rev. B 85, 235149 (2012), 10.1103/PhysRevB.85.235149; J. Wellendorff et al., J. Chem. Phys. 140, 144107 (2014), 10.1063/1.4870397]. We improve the previously used fitting procedures by introducing a robust MM-estimator based loss function, reducing the sensitivity to outliers in the datasets. To more reliably determine the optimal model complexity, we furthermore introduce a generalization of the bootstrap 0.632 estimator with hierarchical bootstrap sampling and a geometric mean estimator over the training datasets. Using this estimator, we show that the robust loss function leads to a 10% improvement in the estimated prediction error over the previously used least-squares loss function. The mBEEF-vdW functional is benchmarked against popular density functional approximations over a wide range of datasets relevant for heterogeneous catalysis, including datasets that were not used for its training. Overall, we find that mBEEF-vdW has a higher general accuracy than competing popular functionals, and it is one of the best performing functionals on chemisorption systems, surface energies, lattice constants, and dispersion. We also show the potential-energy curve of graphene on the nickel(111) surface, where mBEEF-vdW matches the experimental binding length. mBEEF-vdW is currently available in gpaw and other density functional theory codes through Libxc, version 3.0.0.
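
    For reference, the classic bootstrap 0.632 estimator that the authors generalize blends the optimistic training (resubstitution) error with the pessimistic out-of-sample bootstrap error. A minimal sketch of the classic form only; the hierarchical-sampling generalization in the paper is not reproduced here.

```python
def bootstrap_632(train_err, bootstrap_err):
    """Classic .632 estimator of prediction error: a fixed blend of
    resubstitution error (too optimistic) and leave-out bootstrap
    error (too pessimistic)."""
    return 0.368 * train_err + 0.632 * bootstrap_err

print(round(bootstrap_632(0.05, 0.20), 4))  # 0.1448
```

    The 0.632 weight is the asymptotic probability that a given sample appears in a bootstrap resample, which is why it governs the blend.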

  3. densityCut: an efficient and versatile topological approach for automatic clustering of biological data

    PubMed Central

    Ding, Jiarui; Shah, Sohrab; Condon, Anne

    2016-01-01

Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on the recent cancer mutation clustering and single-cell data analyses, namely to cluster variant allele frequencies of somatic mutations to reveal clonal architectures of individual tumours, to cluster single-cell gene expression data to uncover cell population compositions, and to cluster single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets. Availability and Implementation: Data and the densityCut R package are available from https://bitbucket.org/jerry00/densitycut_dev. Contact: condon@cs.ubc.ca or sshah@bccrc.ca or jiaruid@cs.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153661
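
    A minimal sketch of the density-based mode-seeking idea described above (an illustration, not the densityCut implementation, and without its random-walk refinement or tree-based cluster selection): estimate each point's density from its K-nearest-neighbour distances, link each point uphill to its densest near neighbour, and treat self-linked points as modes whose basins of attraction define the clusters.

```python
import numpy as np

def knn_mode_clustering(points, k=3):
    """Toy density-peak clustering: kNN density estimate plus uphill
    links to each point's densest near neighbour; following the links
    assigns every point to the basin of attraction of a mode."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    # density ~ inverse of the mean distance to the k nearest neighbours
    knn_dists = np.sort(d, axis=1)[:, 1:k + 1]
    density = 1.0 / (knn_dists.mean(axis=1) + 1e-12)
    # each point links to the densest of its k nearest neighbours,
    # but only if that neighbour is denser than the point itself
    order = np.argsort(d, axis=1)[:, 1:k + 1]
    parent = np.arange(n)
    for i in range(n):
        best = order[i][np.argmax(density[order[i]])]
        if density[best] > density[i]:
            parent[i] = best
    # follow uphill links until a mode (self-parent point) is reached
    labels = np.empty(n, dtype=int)
    for i in range(n):
        j = i
        while parent[j] != j:
            j = parent[j]
        labels[i] = j
    return labels

# two well-separated blobs should yield two modes
data = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
labels = knn_mode_clustering(data, k=2)
```

    The strict "denser neighbour" condition guarantees the uphill links contain no cycles, so the label-propagation loop always terminates at a mode.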

  4. a Comparison of Two Strategies for Avoiding Negative Transfer in Domain Adaptation Based on Logistic Regression

    NASA Astrophysics Data System (ADS)

    Paul, A.; Vogt, K.; Rottensteiner, F.; Ostermann, J.; Heipke, C.

    2018-05-01

In this paper we deal with the problem of measuring the similarity between training and test datasets in the context of transfer learning (TL) for image classification. TL tries to transfer knowledge from a source domain, where labelled training samples are abundant but the data may follow a different distribution, to a target domain, where labelled training samples are scarce or even unavailable, assuming that the domains are related. Thus, the requirements w.r.t. the availability of labelled training samples in the target domain are reduced. In particular, if no labelled target data are available, it is inherently difficult to find a robust measure of relatedness between the source and target domains. This is of crucial importance for the performance of TL, because knowledge transfer between unrelated data may lead to negative transfer, i.e. to a decrease of classification performance after transfer. We address the problem of measuring the relatedness between source and target datasets and investigate three different strategies to predict and, consequently, avoid negative transfer. The first strategy is based on circular validation. The second strategy relies on the Maximum Mean Discrepancy (MMD) similarity metric, whereas the third one is an extension of MMD which incorporates knowledge about the class labels in the source domain. Our method is evaluated using two different benchmark datasets. The experiments highlight the strengths and weaknesses of the investigated methods. We also show that it is possible to reduce the amount of negative transfer using these strategies for a TL method and to generate a consistent performance improvement over the whole dataset.
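
    A minimal sketch of the (biased) squared Maximum Mean Discrepancy estimate between a source and a target sample with an RBF kernel, the standard form of the metric named above; this is an illustration, not the authors' code.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased MMD^2 estimate with an RBF kernel
    k(a, b) = exp(-gamma * ||a - b||^2). Values near zero suggest the
    two domains follow similar distributions; larger values suggest
    the domains are unrelated and transfer may be risky."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)

    def kernel_mean(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2).mean()

    return kernel_mean(X, X) + kernel_mean(Y, Y) - 2.0 * kernel_mean(X, Y)

# identical samples give MMD^2 of 0; a shifted sample gives a positive value
src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = mmd_rbf(src, src)
far = mmd_rbf(src, [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
```

    Used as a relatedness measure, one would compare the MMD between source and target features against a threshold before deciding whether to transfer.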

  5. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

    PubMed

    Ozçift, Akin

    2011-05-01

Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling-strategy-based Random Forests (RF) ensemble classifier to improve the diagnosis of cardiac arrhythmia. Random Forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets with an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset, with multiple classes of small sample sizes, and it is therefore well suited to testing our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias, and eleven of these classes have sample sizes of less than 15. Our diagnosis strategy consists of two parts: (i) a correlation-based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrate the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
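
    A minimal sketch of the simple-random-resampling idea for unbalanced multiclass data (illustrative only; function names, the per-class size and the toy labels are assumptions, not the study's code): resample each class with replacement to a common size before training the ensemble.

```python
import random

def resample_balanced(samples, labels, per_class=15, seed=0):
    """Simple random sampling with replacement so that every class
    contributes the same number of training instances, softening the
    bias toward majority classes."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for _ in range(per_class):
            out_x.append(rng.choice(xs))
            out_y.append(y)
    return out_x, out_y

# toy unbalanced dataset: 8 "normal" vs 2 "arrhythmia" samples
X = [[i] for i in range(10)]
y = ["normal"] * 8 + ["arrhythmia"] * 2
Xb, yb = resample_balanced(X, y, per_class=5)
```

    The balanced output can then be fed to any classifier; the trade-off is that minority-class samples are duplicated, so the ensemble sees them repeatedly rather than seeing genuinely new instances.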

  6. The Interaction Network Ontology-supported modeling and mining of complex interactions represented with multiple keywords in biomedical literature.

    PubMed

    Özgür, Arzucan; Hur, Junguk; He, Yongqun

    2016-01-01

The Interaction Network Ontology (INO) logically represents biological interactions, pathways, and networks. INO has been demonstrated to be valuable in providing a set of structured ontological terms and associated keywords to support literature mining of gene-gene interactions from biomedical literature. However, previous work using INO focused on single keyword matching, while many interactions are represented with two or more interaction keywords used in combination. This paper reports our extension of INO to include combinatory patterns of two or more literature mining keywords co-existing in one sentence to represent specific INO interaction classes. Such keyword combinations and the related INO interaction type information can be automatically obtained via SPARQL queries, exported in Excel format, and used in the INO-supported SciMiner, an in-house literature mining program. We studied the gene interaction sentences from the commonly used benchmark Learning Logic in Language (LLL) dataset and one internally generated vaccine-related dataset to identify and analyze interaction types containing multiple keywords. Patterns obtained from the dependency parse trees of the sentences were used to identify the interaction keywords that are related to each other and collectively represent an interaction type. The INO ontology currently has 575 terms, including 202 terms under the interaction branch. The relations between the INO interaction types and associated keywords are represented using the INO annotation relations 'has literature mining keywords' and 'has keyword dependency pattern'. The keyword dependency patterns were generated by running the Stanford Parser to obtain dependency relation types. Out of the 107 interactions in the LLL dataset represented with two-keyword interaction types, 86 were identified by using the direct dependency relations.
The LLL dataset contained 34 gene regulation interaction types, each of which associated with multiple keywords. A hierarchical display of these 34 interaction types and their ancestor terms in INO resulted in the identification of specific gene-gene interaction patterns from the LLL dataset. The phenomenon of having multi-keyword interaction types was also frequently observed in the vaccine dataset. By modeling and representing multiple textual keywords for interaction types, the extended INO enabled the identification of complex biological gene-gene interactions represented with multiple keywords.

  7. Fast Longwave and Shortwave Radiative Fluxes (FLASHFlux) From CERES and MODIS Measurements

    NASA Astrophysics Data System (ADS)

    Stackhouse, Paul; Gupta, Shashi; Kratz, David; Geier, Erika; Edwards, Anne; Wilber, Anne

The Clouds and the Earth's Radiant Energy System (CERES) project is currently producing highly accurate surface and top-of-atmosphere (TOA) radiation budget datasets from measurements taken by CERES broadband radiometers and a subset of imaging channels on the Moderate-resolution Imaging Spectroradiometer (MODIS) instrument operating onboard the Terra and Aqua satellites. The primary objective of CERES is to produce highly accurate and stable time-series datasets of radiation budget parameters to meet the needs of climate change research. Accomplishing such accuracy and stability requires monitoring the calibration and stability of the instruments, maintaining constancy of processing algorithms and meteorological inputs, and extensively validating the products against independent measurements. Such stringent requirements inevitably delay the release of products to the user community by as much as six months to a year. While such delays are inconsequential for climate research, other applications such as short-term and seasonal prediction, agricultural and solar energy research, ocean and atmosphere assimilation, and field experiment support could greatly benefit if CERES products were available soon after the satellite measurements. To meet the needs of the latter class of applications, FLASHFlux was developed and is being implemented at NASA/LaRC. FLASHFlux produces reliable surface and TOA radiative parameters within one week of the satellite observations using the CERES "quicklook" data stream and fast surface flux algorithms. Cloud properties used in the flux computation are derived concurrently using MODIS channel radiances. In the process, a modest degree of accuracy is sacrificed in the interest of speed. All fluxes are derived initially on a CERES footprint basis. Daily average fluxes are then derived on a 1° × 1° grid in the next stage of processing.
To date, FLASHFlux datasets have been used in the operational processing of CloudSat data, in support of a field experiment, and for the S'COOL education outreach program. This presentation will show examples of footprint-level and gridded/daily-averaged fluxes and their validation. FLASHFlux datasets are available to the science community at the LaRC Atmospheric Sciences Data Center (ASDC) at: eosweb.larc.nasa.gov/PRODOCS/flashflux/table flashflux.html.

  8. Registering medicines for low-income countries: how suitable are the stringent review procedures of the World Health Organisation, the US Food and Drug Administration and the European Medicines Agency?

    PubMed

    Doua, Joachim Y; Van Geertruyden, Jean-Pierre

    2014-01-01

New medicines are registered after a resource-demanding process. Unfortunately, in low-income countries (LICs), demand outweighs resources. To facilitate registration in LICs, stringent review procedures of the European Medicines Agency (EMA Article 58), the Food and Drug Administration (FDA PEPFAR-linked review) and the WHO Prequalification programme have been established. Only the PEPFAR-linked review grants approval; the others make recommendations for approval. This study assessed the performance and discussed the challenges of these three stringent review procedures. Data from WHO, the FDA, the EMA, Medline and the Internet were analysed. Over 60% of the medicines reviewed by stringent review procedures are manufactured in India. Until 2012, WHO prequalified 400 medicines (211 vaccines, 130 antiretrovirals, 29 tuberculostatics, 15 antimalarials and 15 others). The PEPFAR-linked review approved 156 antiretrovirals, while EMA Article 58 recommended approval of 3 antiretrovirals, 1 vaccine and 1 antimalarial. WHO Prequalification and the PEPFAR-linked review are free of charge and as a result have accelerated access to antiretrovirals. They have both built capacity in sub-Saharan Africa, although WHO prequalification relies technically on stringent regulatory authorities and financially on donors. EMA Article 58 offers the largest disease coverage and the strongest technical capacities, but is costly and involves fewer LICs. To meet the high demand for quality medicines in LICs, these stringent review procedures need to enlarge their disease coverage. To improve registration, EMA Article 58 should actively involve LICs. Furthermore, LIC regulatory activities must not be fully ceded to the stringent review procedures. © 2013 John Wiley & Sons Ltd.

  9. Benchmarking successional progress in a quantitative food web.

    PubMed

    Boit, Alice; Gaedke, Ursula

    2014-01-01

    Central to ecology and ecosystem management, succession theory aims to mechanistically explain and predict the assembly and development of ecological communities. Yet processes at lower hierarchical levels, e.g. at the species and functional group level, are rarely mechanistically linked to the under-investigated system-level processes which drive changes in ecosystem properties and functioning and are comparable across ecosystems. As a model system for secondary succession, seasonal plankton succession during the growing season is readily observable and largely driven autogenically. We used a long-term dataset from large, deep Lake Constance comprising biomasses, auto- and heterotrophic production, food quality, functional diversity, and mass-balanced food webs of the energy and nutrient flows between functional guilds of plankton and partly fish. Extracting population- and system-level indices from this dataset, we tested current hypotheses about the directionality of successional progress which are rooted in ecosystem theory, the metabolic theory of ecology, quantitative food web theory, thermodynamics, and information theory. Our results indicate that successional progress in Lake Constance is quantifiable, passing through predictable stages. Mean body mass, functional diversity, predator-prey weight ratios, trophic positions, system residence times of carbon and nutrients, and the complexity of the energy flow patterns increased during succession. In contrast, both the mass-specific metabolic activity and the system export decreased, while the succession rate exhibited a bimodal pattern. The weighted connectance introduced here represents a suitable index for assessing the evenness and interconnectedness of energy flows during succession. Diverging from earlier predictions, ascendency and eco-exergy did not increase during succession. 
Linking aspects of functional diversity to metabolic theory and food web complexity, we reconcile previously disjoint bodies of ecological theory to form a complete picture of successional progress within a pelagic food web. This comprehensive synthesis may be used as a benchmark for quantifying successional progress in other ecosystems.

  10. Biclustering as a method for RNA local multiple sequence alignment.

    PubMed

    Wang, Shu; Gutell, Robin R; Miranker, Daniel P

    2007-12-15

    Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; this is precisely the kind of dual situation biclustering is intended to address. We define a representation of the MSA problem that enables the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension of that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and Group IC1 Introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using the Sum-of-Pairs score (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or more sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations; MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
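The Sum-of-Pairs score (SPS) used above to compare alignments can be sketched concisely: for every pair of residues that share a column in the reference alignment, check whether the test alignment reproduces that pairing. The snippet below is an illustrative Python implementation of the generic SPS metric; the function names are our own and not taken from BlockMSA.

```python
def aligned_pairs(alignment):
    """Collect all pairs of residue positions that share a column.

    alignment: list of equal-length gapped strings ('-' is a gap).
    Returns a set of ((seq_i, res_i), (seq_j, res_j)) tuples, where
    res_i is the residue's index in the ungapped sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        placed = []
        for s, row in enumerate(alignment):
            if row[col] != '-':
                placed.append((s, counters[s]))
                counters[s] += 1
        # every pair of residues in this column is "aligned"
        for a in range(len(placed)):
            for b in range(a + 1, len(placed)):
                pairs.add((placed[a], placed[b]))
    return pairs

def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)
```

For example, an alignment identical to the reference scores 1.0, and one that aligns only half of the reference pairs scores 0.5.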

  11. Identifying Drug-Target Interactions with Decision Templates.

    PubMed

    Yan, Xiao-Ying; Zhang, Shao-Wu

    2018-01-01

    During the development of new drugs, identification of drug-target interactions (DTIs) is of primary concern. However, chemical and biological experiments are limited in coverage and incur huge costs in both time and money. Based on drug similarity and target similarity, chemogenomic methods can predict potential DTIs on a large scale without requiring target structures or ligand entries. Existing similarity metrics, however, struggle to reflect cases in which drugs with dissimilar structures interact with a common target, or targets with dissimilar sequences interact with the same drug. In addition, although several other similarity metrics have been developed to predict DTIs, the combination of multiple similarity metrics (especially heterogeneous similarities) is often too naïve to exploit them sufficiently. In this paper, based on Gene Ontology and pathway annotation, we introduce two novel target similarity metrics to address the above issues. More importantly, we propose a more effective strategy, via decision templates, to integrate multiple classifiers designed with multiple similarity metrics. In the scenarios of predicting targets for new drugs and predicting approved drugs for new protein targets, the results on the DTI benchmark datasets show that our target similarity metrics enhance predictive accuracy in both scenarios, and that the elaborate fusion strategy of multiple classifiers has better predictive power than the naïve combination of multiple similarity metrics. Compared with two other state-of-the-art approaches on the four popular benchmark datasets of binary drug-target interactions, our method achieves the best results in terms of AUC and AUPR for predicting available targets for new drugs (S2) and predicting approved drugs for new protein targets (S3). These results demonstrate that our method can effectively predict drug-target interactions. 
The software package is freely available at https://github.com/NwpuSY/DT_all.git for academic users.
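Decision-template fusion, used here to combine classifiers built on different similarity metrics, can be sketched in a few lines. The snippet below is a minimal illustration of the generic decision-template method (mean decision profile per class, nearest-template prediction), not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def fit_decision_templates(profiles, labels, n_classes):
    """Build one decision template per class.

    profiles: (n_samples, n_classifiers, n_classes) array of soft
    classifier outputs; labels: class index per sample.
    The template of class c is the mean decision profile over the
    training samples belonging to c.
    """
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    return np.stack([profiles[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def predict_decision_template(templates, profile):
    """Assign the class whose template is nearest in squared Euclidean
    distance to the sample's decision profile."""
    diffs = templates - np.asarray(profile, dtype=float)
    return int(np.argmin((diffs ** 2).sum(axis=(1, 2))))
```

A sample whose classifiers lean toward class 0 ends up closer to the class-0 template and is assigned accordingly; the same comparison works unchanged for any number of base classifiers.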

  12. Benchmarking Successional Progress in a Quantitative Food Web

    PubMed Central

    Boit, Alice; Gaedke, Ursula

    2014-01-01

    Central to ecology and ecosystem management, succession theory aims to mechanistically explain and predict the assembly and development of ecological communities. Yet processes at lower hierarchical levels, e.g. at the species and functional group level, are rarely mechanistically linked to the under-investigated system-level processes which drive changes in ecosystem properties and functioning and are comparable across ecosystems. As a model system for secondary succession, seasonal plankton succession during the growing season is readily observable and largely driven autogenically. We used a long-term dataset from large, deep Lake Constance comprising biomasses, auto- and heterotrophic production, food quality, functional diversity, and mass-balanced food webs of the energy and nutrient flows between functional guilds of plankton and partly fish. Extracting population- and system-level indices from this dataset, we tested current hypotheses about the directionality of successional progress which are rooted in ecosystem theory, the metabolic theory of ecology, quantitative food web theory, thermodynamics, and information theory. Our results indicate that successional progress in Lake Constance is quantifiable, passing through predictable stages. Mean body mass, functional diversity, predator-prey weight ratios, trophic positions, system residence times of carbon and nutrients, and the complexity of the energy flow patterns increased during succession. In contrast, both the mass-specific metabolic activity and the system export decreased, while the succession rate exhibited a bimodal pattern. The weighted connectance introduced here represents a suitable index for assessing the evenness and interconnectedness of energy flows during succession. Diverging from earlier predictions, ascendency and eco-exergy did not increase during succession. 
Linking aspects of functional diversity to metabolic theory and food web complexity, we reconcile previously disjoint bodies of ecological theory to form a complete picture of successional progress within a pelagic food web. This comprehensive synthesis may be used as a benchmark for quantifying successional progress in other ecosystems. PMID:24587353
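A flow-weighted connectance can be computed from an energy-flow matrix via the Shannon entropy of each node's in- and outflows, which turns entropies into effective numbers of prey and predators per node. The sketch below follows the common Bersier-style quantitative formulation and is only an approximation of the index introduced in the paper; the function names are illustrative.

```python
import math

def shannon(flows):
    """Shannon entropy (base 2) of a list of positive flows."""
    total = sum(flows)
    return -sum((f / total) * math.log2(f / total) for f in flows if f > 0)

def weighted_connectance(T):
    """Flow-weighted connectance of a flow matrix T, where T[i][j] is
    the energy flow from node i to node j.

    Each node contributes its throughflow times 2**entropy (the
    effective number of in/out links); the sum is normalised by twice
    the total system flow and the number of nodes.
    """
    n = len(T)
    total = sum(sum(row) for row in T)
    ld = 0.0
    for k in range(n):
        inflows = [T[i][k] for i in range(n) if T[i][k] > 0]
        outflows = [T[k][j] for j in range(n) if T[k][j] > 0]
        if inflows:
            ld += sum(inflows) * 2 ** shannon(inflows)
        if outflows:
            ld += sum(outflows) * 2 ** shannon(outflows)
    return ld / (2 * total * n)
```

Evenly spread flows maximise the entropy terms, so a web with even flows scores higher than one of identical topology whose flows are dominated by a single link, which is exactly the evenness-plus-interconnectedness behaviour described above.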

  13. Homogenising time series: Beliefs, dogmas and facts

    NASA Astrophysics Data System (ADS)

    Domonkos, P.

    2010-09-01

    For obtaining reliable information about climate change and climate variability, the use of high-quality data series is essential, and one basic tool for quality improvement is the statistical homogenisation of observed time series. In recent decades a large number of homogenisation methods have been developed, but the real effects of their application on time series are still not entirely known. The ongoing COST HOME project (COST ES0601) is devoted to revealing the real impacts of homogenisation methods in more detail and with higher confidence than before. As part of the COST activity, a benchmark dataset was built whose characteristics closely approximate those of real networks of observed time series. This dataset offers a much better opportunity than ever before to test the wide variety of homogenisation methods and to analyse the real effects of selected theoretical recommendations. The author believes that several old theoretical rules have to be re-evaluated. Some examples of the open questions: a) Can statistically detected change-points be accepted only when confirmed by metadata? b) Do semi-hierarchic algorithms for detecting multiple change-points in time series function effectively in practice? c) Is it advisable to limit the spatial comparison of a candidate series to at most five other series in its neighbourhood? Empirical results - from the COST benchmark and from other experiments - show that real observed time series usually include several inhomogeneities of different sizes. Small inhomogeneities resemble part of the climatic variability, so the pure application of the classic theory that change-points of observed time series can be found and corrected one by one is impossible. However, after homogenisation the linear trends, seasonal changes and long-term fluctuations of time series are usually much closer to reality than in raw time series. 
The developers and users of homogenisation methods have to bear in mind that the eventual purpose of homogenisation is not to find change-points, but to obtain observed time series whose statistical properties characterise well the climate change and climate variability.
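The basic object of homogenisation, a mean-shift change-point, can be located with an SNHT-like statistic that compares standardised means before and after each candidate break. The sketch below is a deliberately simplified single-break illustration, not any specific method benchmarked in COST HOME; the function name is our own.

```python
def best_changepoint(series):
    """Locate the most likely single mean-shift in a series.

    Maximises the SNHT-like statistic T(k) = k*z1**2 + (n-k)*z2**2,
    where z1 and z2 are the standardised means of the segments before
    and after a candidate break at position k.
    Returns (break position, statistic); (None, 0.0) for a flat series.
    """
    n = len(series)
    mean = sum(series) / n
    sd = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    if sd == 0:
        return None, 0.0
    z = [(x - mean) / sd for x in series]
    best_k, best_t = None, float('-inf')
    for k in range(1, n):
        z1 = sum(z[:k]) / k
        z2 = sum(z[k:]) / (n - k)
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t
```

Applied one break at a time, such a detector illustrates the "classic theory" discussed above; as the abstract argues, real series with several small inhomogeneities defeat this one-by-one approach.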

  14. Modeling and Prediction of Fan Noise

    NASA Technical Reports Server (NTRS)

    Envia, Ed

    2008-01-01

    Fan noise is a significant contributor to the total noise signature of a modern high bypass ratio aircraft engine, and with the advent of ultra high bypass ratio engines like the geared turbofan, it is likely to remain so in the future. As such, accurate modeling and prediction of the basic characteristics of fan noise are necessary ingredients in designing quieter aircraft engines in order to ensure compliance with ever more stringent aviation noise regulations. In this paper, results from a comprehensive study aimed at establishing the utility of current tools for modeling and predicting fan noise will be summarized. It should be emphasized that these tools exemplify the present state of the practice and embody what is currently used at NASA and in industry for predicting fan noise. The ability of these tools to model and predict fan noise is assessed against a set of benchmark fan noise databases obtained for a range of representative fan cycles and operating conditions. Detailed comparisons between the predicted and measured narrowband spectral and directivity characteristics of fan noise will be presented in the full paper. General conclusions regarding the utility of current tools and recommendations for future improvements will also be given.

  15. Underwater Robot Task Planning Using Multi-Objective Meta-Heuristics

    PubMed Central

    Landa-Torres, Itziar; Manjarres, Diana; Bilbao, Sonia; Del Ser, Javier

    2017-01-01

    Robotic systems deployed in the underwater medium are subject to stringent operational conditions that impose a high degree of criticality on the allocation of resources and the schedule of operations in mission planning. In this context, the so-called cost of a mission must be considered as an additional criterion when designing optimal task schedules within the mission at hand. Such a cost can be conceived as the impact of the mission on the robotic resources themselves, which ranges from the consumption of battery to other negative effects such as mechanical erosion. This manuscript focuses on this issue by devising three heuristic solvers aimed at efficiently scheduling tasks in robotic swarms, which collaborate together to accomplish a mission, and by presenting experimental results obtained over realistic scenarios in the underwater environment. The heuristic techniques resort to a Random-Keys encoding strategy to represent the allocation of robots to tasks and the relative execution order of such tasks within the schedule of certain robots. The obtained results reveal interesting differences in terms of Pareto optimality and spread between the algorithms considered in the benchmark, which are insightful for the selection of a proper task scheduler in real underwater campaigns. PMID:28375160
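The Random-Keys encoding mentioned above can be illustrated with a small decoder. This sketch shows the standard random-keys scheme, in which the integer part of each key assigns a task to a robot and the fractional part orders that robot's tasks; the paper's exact encoding may differ, and all names are hypothetical.

```python
def decode_random_keys(keys, n_robots):
    """Decode a random-keys chromosome into a task schedule.

    keys: one real number in [0, n_robots) per task. int(key) assigns
    the task to a robot; the fractional part sets the relative
    execution order of that robot's tasks (smaller fraction runs first).
    Returns {robot: [task indices in execution order]}.
    """
    schedule = {r: [] for r in range(n_robots)}
    for task, key in enumerate(keys):
        schedule[int(key)].append(task)
    for r in schedule:
        # order each robot's tasks by the fractional part of their key
        schedule[r].sort(key=lambda t: keys[t] - int(keys[t]))
    return schedule
```

Because any vector of reals decodes to a valid schedule, meta-heuristics can mutate and recombine the keys freely without ever producing an infeasible assignment, which is the main appeal of this encoding.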

  16. Search for heavy resonances that decay into a vector boson and a Higgs boson in hadronic final states at √{s} = 13 {TeV}

    NASA Astrophysics Data System (ADS)

    Sirunyan, A. M.; Tumasyan, A.; Adam, W.; Ambrogi, F.; Asilar, E.; Bergauer, T.; Brandstetter, J.; Brondolin, E.; Dragicevic, M.; Erö, J.; Flechl, M.; Friedl, M.; Frühwirth, R.; Ghete, V. M.; Grossmann, J.; Hrubec, J.; Jeitler, M.; König, A.; Krammer, N.; Krätschmer, I.; Liko, D.; Madlener, T.; Mikulec, I.; Pree, E.; Rabady, D.; Rad, N.; Rohringer, H.; Schieck, J.; Schöfbeck, R.; Spanring, M.; Spitzbart, D.; Strauss, J.; Waltenberger, W.; Wittmann, J.; Wulz, C.-E.; Zarucki, M.; Chekhovsky, V.; Mossolov, V.; Gonzalez, J. Suarez; De Wolf, E. A.; Di Croce, D.; Janssen, X.; Lauwers, J.; Van Haevermaet, H.; Van Mechelen, P.; Van Remortel, N.; Zeid, S. Abu; Blekman, F.; D'Hondt, J.; De Bruyn, I.; De Clercq, J.; Deroover, K.; Flouris, G.; Lontkovskyi, D.; Lowette, S.; Moortgat, S.; Moreels, L.; Olbrechts, A.; Python, Q.; Skovpen, K.; Tavernier, S.; Van Doninck, W.; Van Mulders, P.; Van Parijs, I.; Brun, H.; Clerbaux, B.; De Lentdecker, G.; Delannoy, H.; Fasanella, G.; Favart, L.; Goldouzian, R.; Grebenyuk, A.; Karapostoli, G.; Lenzi, T.; Luetic, J.; Maerschalk, T.; Marinov, A.; Randle-conde, A.; Seva, T.; Velde, C. Vander; Vanlaer, P.; Vannerom, D.; Yonamine, R.; Zenoni, F.; Zhang, F.; Cimmino, A.; Cornelis, T.; Dobur, D.; Fagot, A.; Gul, M.; Khvastunov, I.; Poyraz, D.; Roskas, C.; Salva, S.; Tytgat, M.; Verbeke, W.; Zaganidis, N.; Bakhshiansohi, H.; Bondu, O.; Brochet, S.; Bruno, G.; Caudron, A.; De Visscher, S.; Delaere, C.; Delcourt, M.; Francois, B.; Giammanco, A.; Jafari, A.; Komm, M.; Krintiras, G.; Lemaitre, V.; Magitteri, A.; Mertens, A.; Musich, M.; Piotrzkowski, K.; Quertenmont, L.; Marono, M. Vidal; Wertz, S.; Beliy, N.; Aldá Júnior, W. L.; Alves, F. L.; Alves, G. A.; Brito, L.; Martins Junior, M. Correa; Hensel, C.; Moraes, A.; Pol, M. E.; Teles, P. Rebello; Chagas, E. Belchior Batista Das; Carvalho, W.; Chinellato, J.; Custódio, A.; Costa, E. M. Da; Silveira, G. G. Da; Damiao, D. De Jesus; De Souza, S. Fonseca; Guativa, L. M. 
Huertas; Malbouisson, H.; De Almeida, M. Melo; Herrera, C. Mora; Mundim, L.; Nogima, H.; Santoro, A.; Sznajder, A.; Manganote, E. J. Tonelli; De Araujo, F. Torres Da Silva; Pereira, A. Vilela; Ahuja, S.; Bernardes, C. A.; Tomei, T. R. Fernandez Perez; Gregores, E. M.; Mercadante, P. G.; Novaes, S. F.; Padula, Sandra S.; Abad, D. Romero; Vargas, J. C. Ruiz; Aleksandrov, A.; Hadjiiska, R.; Iaydjiev, P.; Misheva, M.; Rodozov, M.; Shopova, M.; Stoykova, S.; Sultanov, G.; Dimitrov, A.; Glushkov, I.; Litov, L.; Pavlov, B.; Petkov, P.; Fang, W.; Gao, X.; Ahmad, M.; Bian, J. G.; Chen, G. M.; Chen, H. S.; Chen, M.; Chen, Y.; Jiang, C. H.; Leggat, D.; Liao, H.; Liu, Z.; Romeo, F.; Shaheen, S. M.; Spiezia, A.; Tao, J.; Wang, C.; Wang, Z.; Yazgan, E.; Zhang, H.; Zhao, J.; Ban, Y.; Chen, G.; Li, Q.; Liu, S.; Mao, Y.; Qian, S. J.; Wang, D.; Xu, Z.; Avila, C.; Cabrera, A.; Sierra, L. F. Chaparro; Florez, C.; Hernández, C. F. González; Alvarez, J. D. Ruiz; Courbon, B.; Godinovic, N.; Lelas, D.; Puljak, I.; Cipriano, P. M. Ribeiro; Sculac, T.; Antunovic, Z.; Kovac, M.; Brigljevic, V.; Ferencek, D.; Kadija, K.; Mesic, B.; Starodumov, A.; Susa, T.; Ather, M. W.; Attikis, A.; Mavromanolakis, G.; Mousa, J.; Nicolaou, C.; Ptochos, F.; Razis, P. A.; Rykaczewski, H.; Finger, M.; Finger, M.; Jarrin, E. Carrera; Abdelalim, A. A.; Mohammed, Y.; Salama, E.; Dewanjee, R. K.; Kadastik, M.; Perrini, L.; Raidal, M.; Tiko, A.; Veelken, C.; Eerola, P.; Pekkanen, J.; Voutilainen, M.; Härkönen, J.; Järvinen, T.; Karimäki, V.; Kinnunen, R.; Lampén, T.; Lassila-Perini, K.; Lehti, S.; Lindén, T.; Luukka, P.; Tuominen, E.; Tuominiemi, J.; Tuovinen, E.; Talvitie, J.; Tuuva, T.; Besancon, M.; Couderc, F.; Dejardin, M.; Denegri, D.; Faure, J. L.; Ferri, F.; Ganjour, S.; Ghosh, S.; Givernaud, A.; Gras, P.; de Monchenault, G. Hamel; Jarry, P.; Kucher, I.; Locci, E.; Machet, M.; Malcles, J.; Negro, G.; Rander, J.; Rosowsky, A.; Sahin, M. 
Ö.; Titov, M.; Abdulsalam, A.; Antropov, I.; Baffioni, S.; Beaudette, F.; Busson, P.; Cadamuro, L.; Charlot, C.; de Cassagnac, R. Granier; Jo, M.; Lisniak, S.; Lobanov, A.; Blanco, J. Martin; Nguyen, M.; Ochando, C.; Ortona, G.; Paganini, P.; Pigard, P.; Regnard, S.; Salerno, R.; Sauvan, J. B.; Sirois, Y.; Leiton, A. G. Stahl; Strebler, T.; Yilmaz, Y.; Zabi, A.; Zghiche, A.; Agram, J.-L.; Andrea, J.; Bloch, D.; Brom, J.-M.; Buttignol, M.; Chabert, E. C.; Chanon, N.; Collard, C.; Conte, E.; Coubez, X.; Fontaine, J.-C.; Gelé, D.; Goerlach, U.; Jansová, M.; Bihan, A.-C. Le; Tonon, N.; Van Hove, P.; Gadrat, S.; Beauceron, S.; Bernet, C.; Boudoul, G.; Chierici, R.; Contardo, D.; Depasse, P.; Mamouni, H. El; Fay, J.; Finco, L.; Gascon, S.; Gouzevitch, M.; Grenier, G.; Ille, B.; Lagarde, F.; Laktineh, I. B.; Lethuillier, M.; Mirabito, L.; Pequegnot, A. L.; Perries, S.; Popov, A.; Sordini, V.; Donckt, M. Vander; Viret, S.; Toriashvili, T.; Tsamalaidze, Z.; Autermann, C.; Beranek, S.; Feld, L.; Kiesel, M. K.; Klein, K.; Lipinski, M.; Preuten, M.; Schomakers, C.; Schulz, J.; Verlage, T.; Albert, A.; Dietz-Laursonn, E.; Duchardt, D.; Endres, M.; Erdmann, M.; Erdweg, S.; Esch, T.; Fischer, R.; Güth, A.; Hamer, M.; Hebbeker, T.; Heidemann, C.; Hoepfner, K.; Knutzen, S.; Merschmeyer, M.; Meyer, A.; Millet, P.; Mukherjee, S.; Olschewski, M.; Padeken, K.; Pook, T.; Radziej, M.; Reithler, H.; Rieger, M.; Scheuch, F.; Teyssier, D.; Thüer, S.; Flügge, G.; Kargoll, B.; Kress, T.; Künsken, A.; Lingemann, J.; Müller, T.; Nehrkorn, A.; Nowack, A.; Pistone, C.; Pooth, O.; Stahl, A.; Martin, M. Aldaya; Arndt, T.; Asawatangtrakuldee, C.; Beernaert, K.; Behnke, O.; Behrens, U.; Martínez, A. Bermúdez; Anuar, A. A. Bin; Borras, K.; Botta, V.; Campbell, A.; Connor, P.; Contreras-Campana, C.; Costanza, F.; Pardos, C. Diez; Eckerlin, G.; Eckstein, D.; Eichhorn, T.; Eren, E.; Gallo, E.; Garcia, J. Garay; Geiser, A.; Gizhko, A.; Luyando, J. M. 
Grados; Grohsjean, A.; Gunnellini, P.; Harb, A.; Hauk, J.; Hempel, M.; Jung, H.; Kalogeropoulos, A.; Kasemann, M.; Keaveney, J.; Kleinwort, C.; Korol, I.; Krücker, D.; Lange, W.; Lelek, A.; Lenz, T.; Leonard, J.; Lipka, K.; Lohmann, W.; Mankel, R.; Melzer-Pellmann, I.-A.; Meyer, A. B.; Mittag, G.; Mnich, J.; Mussgiller, A.; Ntomari, E.; Pitzl, D.; Placakyte, R.; Raspereza, A.; Roland, B.; Savitskyi, M.; Saxena, P.; Shevchenko, R.; Spannagel, S.; Stefaniuk, N.; Van Onsem, G. P.; Walsh, R.; Wen, Y.; Wichmann, K.; Wissing, C.; Zenaiev, O.; Bein, S.; Blobel, V.; Vignali, M. Centis; Draeger, A. R.; Dreyer, T.; Garutti, E.; Gonzalez, D.; Haller, J.; Hinzmann, A.; Hoffmann, M.; Karavdina, A.; Klanner, R.; Kogler, R.; Kovalchuk, N.; Kurz, S.; Lapsien, T.; Marchesini, I.; Marconi, D.; Meyer, M.; Niedziela, M.; Nowatschin, D.; Pantaleo, F.; Peiffer, T.; Perieanu, A.; Scharf, C.; Schleper, P.; Schmidt, A.; Schumann, S.; Schwandt, J.; Sonneveld, J.; Stadie, H.; Steinbrück, G.; Stober, F. M.; Stöver, M.; Tholen, H.; Troendle, D.; Usai, E.; Vanelderen, L.; Vanhoefer, A.; Vormwald, B.; Akbiyik, M.; Barth, C.; Baur, S.; Butz, E.; Caspart, R.; Chwalek, T.; Colombo, F.; De Boer, W.; Dierlamm, A.; Freund, B.; Friese, R.; Giffels, M.; Gilbert, A.; Haitz, D.; Hartmann, F.; Heindl, S. M.; Husemann, U.; Kassel, F.; Kudella, S.; Mildner, H.; Mozer, M. U.; Müller, Th.; Plagge, M.; Quast, G.; Rabbertz, K.; Schröder, M.; Shvetsov, I.; Sieber, G.; Simonis, H. J.; Ulrich, R.; Wayand, S.; Weber, M.; Weiler, T.; Williamson, S.; Wöhrmann, C.; Wolf, R.; Anagnostou, G.; Daskalakis, G.; Geralis, T.; Giakoumopoulou, V. A.; Kyriakis, A.; Loukas, D.; Topsis-Giotis, I.; Kesisoglou, S.; Panagiotou, A.; Saoulidou, N.; Evangelou, I.; Foudas, C.; Kokkas, P.; Mallios, S.; Manthos, N.; Papadopoulos, I.; Paradas, E.; Strologas, J.; Triantis, F. A.; Csanad, M.; Filipovic, N.; Pasztor, G.; Bencze, G.; Hajdu, C.; Horvath, D.; Hunyadi, Á.; Sikler, F.; Veszpremi, V.; Vesztergombi, G.; Zsigmond, A. 
J.; Beni, N.; Czellar, S.; Karancsi, J.; Makovec, A.; Molnar, J.; Szillasi, Z.; Bartók, M.; Raics, P.; Trocsanyi, Z. L.; Ujvari, B.; Choudhury, S.; Komaragiri, J. R.; Bahinipati, S.; Bhowmik, S.; Mal, P.; Mandal, K.; Nayak, A.; Sahoo, D. K.; Sahoo, N.; Swain, S. K.; Bansal, S.; Beri, S. B.; Bhatnagar, V.; Bhawandeep, U.; Chawla, R.; Dhingra, N.; Kalsi, A. K.; Kaur, A.; Kaur, M.; Kumar, R.; Kumari, P.; Mehta, A.; Singh, J. B.; Walia, G.; Kumar, Ashok; Shah, Aashaq; Bhardwaj, A.; Chauhan, S.; Choudhary, B. C.; Garg, R. B.; Keshri, S.; Kumar, A.; Malhotra, S.; Naimuddin, M.; Ranjan, K.; Sharma, R.; Sharma, V.; Bhardwaj, R.; Bhattacharya, R.; Bhattacharya, S.; Dey, S.; Dutt, S.; Dutta, S.; Ghosh, S.; Majumdar, N.; Modak, A.; Mondal, K.; Mukhopadhyay, S.; Nandan, S.; Purohit, A.; Roy, A.; Roy, D.; Chowdhury, S. Roy; Sarkar, S.; Sharan, M.; Thakur, S.; Behera, P. K.; Chudasama, R.; Dutta, D.; Jha, V.; Kumar, V.; Mohanty, A. K.; Netrakanti, P. K.; Pant, L. M.; Shukla, P.; Topkar, A.; Aziz, T.; Dugad, S.; Mahakud, B.; Mitra, S.; Mohanty, G. B.; Parida, B.; Sur, N.; Sutar, B.; Banerjee, S.; Bhattacharya, S.; Chatterjee, S.; Das, P.; Guchait, M.; Jain, Sa.; Kumar, S.; Maity, M.; Majumder, G.; Mazumdar, K.; Sarkar, T.; Wickramage, N.; Chauhan, S.; Dube, S.; Hegde, V.; Kapoor, A.; Kothekar, K.; Pandey, S.; Rane, A.; Sharma, S.; Chenarani, S.; Tadavani, E. Eskandari; Etesami, S. M.; Khakzad, M.; Najafabadi, M. Mohammadi; Naseri, M.; Mehdiabadi, S. Paktinat; Hosseinabadi, F. 
Rezaei; Safarzadeh, B.; Zeinali, M.; Felcini, M.; Grunewald, M.; Abbrescia, M.; Calabria, C.; Caputo, C.; Colaleo, A.; Creanza, D.; Cristella, L.; De Filippis, N.; De Palma, M.; Errico, F.; Fiore, L.; Iaselli, G.; Lezki, S.; Maggi, G.; Maggi, M.; Miniello, G.; My, S.; Nuzzo, S.; Pompili, A.; Pugliese, G.; Radogna, R.; Ranieri, A.; Selvaggi, G.; Sharma, A.; Silvestris, L.; Venditti, R.; Verwilligen, P.; Abbiendi, G.; Battilana, C.; Bonacorsi, D.; Braibant-Giacomelli, S.; Campanini, R.; Capiluppi, P.; Castro, A.; Cavallo, F. R.; Chhibra, S. S.; Codispoti, G.; Cuffiani, M.; Dallavalle, G. M.; Fabbri, F.; Fanfani, A.; Fasanella, D.; Giacomelli, P.; Grandi, C.; Guiducci, L.; Marcellini, S.; Masetti, G.; Montanari, A.; Navarria, F. L.; Perrotta, A.; Rossi, A. M.; Rovelli, T.; Siroli, G. P.; Tosi, N.; Albergo, S.; Costa, S.; Di Mattia, A.; Giordano, F.; Potenza, R.; Tricomi, A.; Tuve, C.; Barbagli, G.; Chatterjee, K.; Ciulli, V.; Civinini, C.; D'Alessandro, R.; Focardi, E.; Lenzi, P.; Meschini, M.; Paoletti, S.; Russo, L.; Sguazzoni, G.; Strom, D.; Viliani, L.; Benussi, L.; Bianco, S.; Fabbri, F.; Piccolo, D.; Primavera, F.; Calvelli, V.; Ferro, F.; Robutti, E.; Tosi, S.; Brianza, L.; Brivio, F.; Ciriolo, V.; Dinardo, M. E.; Fiorendi, S.; Gennai, S.; Ghezzi, A.; Govoni, P.; Malberti, M.; Malvezzi, S.; Manzoni, R. A.; Menasce, D.; Moroni, L.; Paganoni, M.; Pauwels, K.; Pedrini, D.; Pigazzini, S.; Ragazzi, S.; de Fatis, T. Tabarelli; Buontempo, S.; Cavallo, N.; Di Guida, S.; Fabozzi, F.; Fienga, F.; Iorio, A. O. M.; Khan, W. A.; Lista, L.; Meola, S.; Paolucci, P.; Sciacca, C.; Thyssen, F.; Azzi, P.; Bacchetta, N.; Benato, L.; Bisello, D.; Boletti, A.; Carlin, R.; De Oliveira, A. Carvalho Antunes; Checchia, P.; Manzano, P. De Castro; Dorigo, T.; Dosselli, U.; Gasparini, F.; Gasparini, U.; Gozzelino, A.; Lacaprara, S.; Margoni, M.; Meneguzzo, A. 
T.; Pozzobon, N.; Ronchese, P.; Rossin, R.; Simonetto, F.; Torassa, E.; Zanetti, M.; Zotto, P.; Zumerle, G.; Braghieri, A.; Fallavollita, F.; Magnani, A.; Montagna, P.; Ratti, S. P.; Re, V.; Ressegotti, M.; Riccardi, C.; Salvini, P.; Vai, I.; Vitulo, P.; Solestizi, L. Alunni; Biasini, M.; Bilei, G. M.; Cecchi, C.; Ciangottini, D.; Fanò, L.; Lariccia, P.; Leonardi, R.; Manoni, E.; Mantovani, G.; Mariani, V.; Menichelli, M.; Rossi, A.; Santocchia, A.; Spiga, D.; Androsov, K.; Azzurri, P.; Bagliesi, G.; Bernardini, J.; Boccali, T.; Borrello, L.; Castaldi, R.; Ciocci, M. A.; Dell'Orso, R.; Fedi, G.; Giannini, L.; Giassi, A.; Grippo, M. T.; Ligabue, F.; Lomtadze, T.; Manca, E.; Mandorli, G.; Martini, L.; Messineo, A.; Palla, F.; Rizzi, A.; Savoy-Navarro, A.; Spagnolo, P.; Tenchini, R.; Tonelli, G.; Venturi, A.; Verdini, P. G.; Barone, L.; Cavallari, F.; Cipriani, M.; Del Re, D.; Diemoz, M.; Gelli, S.; Longo, E.; Margaroli, F.; Marzocchi, B.; Meridiani, P.; Organtini, G.; Paramatti, R.; Preiato, F.; Rahatlou, S.; Rovelli, C.; Santanastasio, F.; Amapane, N.; Arcidiacono, R.; Argiro, S.; Arneodo, M.; Bartosik, N.; Bellan, R.; Biino, C.; Cartiglia, N.; Cenna, F.; Costa, M.; Covarelli, R.; Degano, A.; Demaria, N.; Kiani, B.; Mariotti, C.; Maselli, S.; Migliore, E.; Monaco, V.; Monteil, E.; Monteno, M.; Obertino, M. M.; Pacher, L.; Pastrone, N.; Pelliccioni, M.; Angioni, G. L. Pinna; Ravera, F.; Romero, A.; Ruspa, M.; Sacchi, R.; Shchelina, K.; Sola, V.; Solano, A.; Staiano, A.; Traczyk, P.; Belforte, S.; Casarsa, M.; Cossutti, F.; Della Ricca, G.; Zanetti, A.; Kim, D. H.; Kim, G. N.; Kim, M. S.; Lee, J.; Lee, S.; Lee, S. W.; Moon, C. S.; Oh, Y. D.; Sekmen, S.; Son, D. C.; Yang, Y. C.; Lee, A.; Kim, H.; Moon, D. H.; Oh, G.; Cifuentes, J. A. Brochero; Goh, J.; Kim, T. J.; Cho, S.; Choi, S.; Go, Y.; Gyun, D.; Ha, S.; Hong, B.; Jo, Y.; Kim, Y.; Lee, K.; Lee, K. S.; Lee, S.; Lim, J.; Park, S. K.; Roh, Y.; Almond, J.; Kim, J.; Kim, J. S.; Lee, H.; Lee, K.; Nam, K.; Oh, S. 
B.; Radburn-Smith, B. C.; Seo, S. h.; Yang, U. K.; Yoo, H. D.; Yu, G. B.; Choi, M.; Kim, H.; Kim, J. H.; Lee, J. S. H.; Park, I. C.; Ryu, G.; Choi, Y.; Hwang, C.; Lee, J.; Yu, I.; Dudenas, V.; Juodagalvis, A.; Vaitkus, J.; Ahmed, I.; Ibrahim, Z. A.; Ali, M. A. B. Md; Idris, F. Mohamad; Abdullah, W. A. T. Wan; Yusli, M. N.; Zolkapli, Z.; Castilla-Valdez, H.; Cruz-Burelo, E. De La; Cruz, I. Heredia-De La; Lopez-Fernandez, R.; Guisao, J. Mejia; Sanchez-Hernandez, A.; Moreno, S. Carrillo; Barrera, C. Oropeza; Valencia, F. Vazquez; Pedraza, I.; Ibarguen, H. A. Salazar; Estrada, C. Uribe; Pineda, A. Morelos; Krofcheck, D.; Butler, P. H.; Ahmad, A.; Ahmad, M.; Hassan, Q.; Hoorani, H. R.; Saddique, A.; Shah, M. A.; Shoaib, M.; Waqas, M.; Bialkowska, H.; Bluj, M.; Boimska, B.; Frueboes, T.; Górski, M.; Kazana, M.; Nawrocki, K.; Romanowska-Rybinska, K.; Szleper, M.; Zalewski, P.; Bunkowski, K.; Byszuk, A.; Doroba, K.; Kalinowski, A.; Konecki, M.; Krolikowski, J.; Misiura, M.; Olszewski, M.; Pyskir, A.; Walczak, M.; Bargassa, P.; Silva, C. Beirão Da Cruz E.; Calpas, B.; Di Francesco, A.; Faccioli, P.; Gallinaro, M.; Hollar, J.; Leonardo, N.; Iglesias, L. Lloret; Nemallapudi, M. 
V.; Seixas, J.; Toldaiev, O.; Vadruccio, D.; Varela, J.; Afanasiev, S.; Bunin, P.; Gavrilenko, M.; Golutvin, I.; Gorbunov, I.; Kamenev, A.; Karjavin, V.; Lanev, A.; Malakhov, A.; Matveev, V.; Palichik, V.; Perelygin, V.; Shmatov, S.; Shulha, S.; Skatchkov, N.; Smirnov, V.; Voytishin, N.; Zarubin, A.; Ivanov, Y.; Kim, V.; Kuznetsova, E.; Levchenko, P.; Murzin, V.; Oreshkin, V.; Smirnov, I.; Sulimov, V.; Uvarov, L.; Vavilov, S.; Vorobyev, A.; Andreev, Yu.; Dermenev, A.; Gninenko, S.; Golubev, N.; Karneyeu, A.; Kirsanov, M.; Krasnikov, N.; Pashenkov, A.; Tlisov, D.; Toropin, A.; Epshteyn, V.; Gavrilov, V.; Lychkovskaya, N.; Popov, V.; Pozdnyakov, I.; Safronov, G.; Spiridonov, A.; Stepennov, A.; Toms, M.; Vlasov, E.; Zhokin, A.; Aushev, T.; Bylinkin, A.; Chistov, R.; Danilov, M.; Parygin, P.; Philippov, D.; Polikarpov, S.; Tarkovskii, E.; Andreev, V.; Azarkin, M.; Dremin, I.; Kirakosyan, M.; Terkulov, A.; Baskakov, A.; Belyaev, A.; Boos, E.; Dubinin, M.; Dudko, L.; Ershov, A.; Gribushin, A.; Klyukhin, V.; Kodolova, O.; Lokhtin, I.; Miagkov, I.; Obraztsov, S.; Petrushanko, S.; Savrin, V.; Snigirev, A.; Blinov, V.; Skovpen, Y.; Shtol, D.; Azhgirey, I.; Bayshev, I.; Bitioukov, S.; Elumakhov, D.; Kachanov, V.; Kalinin, A.; Konstantinov, D.; Krychkine, V.; Petrov, V.; Ryutin, R.; Sobol, A.; Troshin, S.; Tyurin, N.; Uzunian, A.; Volkov, A.; Adzic, P.; Cirkovic, P.; Devetak, D.; Dordevic, M.; Milosevic, J.; Rekovic, V.; Maestre, J. Alcaraz; Luna, M. Barrio; Cerrada, M.; Colino, N.; Cruz, B. De La; Peris, A. Delgado; Del Valle, A. Escalante; Bedoya, C. Fernandez; Ramos, J. P. Fernández; Flix, J.; Fouz, M. C.; Garcia-Abia, P.; Lopez, O. Gonzalez; Lopez, S. Goy; Hernandez, J. M.; Josa, M. I.; Yzquierdo, A. Pérez-Calero; Pelayo, J. Puerta; Olmeda, A. Quintario; Redondo, I.; Romero, L.; Soares, M. S.; Fernández, A. Álvarez; de Trocóniz, J. F.; Missiroli, M.; Moran, D.; Cuevas, J.; Erice, C.; Menendez, J. Fernandez; Caballero, I. Gonzalez; Fernández, J. R. González; Cortezon, E. 
Palencia; Cruz, S. Sanchez; Andrés, I. Suárez; Vischia, P.; Garcia, J. M. Vizan; Cabrillo, I. J.; Calderon, A.; Quero, B. Chazin; Curras, E.; Fernandez, M.; Garcia-Ferrero, J.; Gomez, G.; Virto, A. Lopez; Marco, J.; Rivero, C. Martinez; Arbol, P. Martinez Ruiz del; Matorras, F.; Gomez, J. Piedra; Rodrigo, T.; Ruiz-Jimeno, A.; Scodellaro, L.; Trevisani, N.; Vila, I.; Cortabitarte, R. Vilar; Abbaneo, D.; Auffray, E.; Baillon, P.; Ball, A. H.; Barney, D.; Bianco, M.; Bloch, P.; Bocci, A.; Botta, C.; Camporesi, T.; Castello, R.; Cepeda, M.; Cerminara, G.; Chapon, E.; Chen, Y.; d'Enterria, D.; Dabrowski, A.; Daponte, V.; David, A.; De Gruttola, M.; De Roeck, A.; Di Marco, E.; Dobson, M.; Dorney, B.; du Pree, T.; Dünser, M.; Dupont, N.; Elliott-Peisert, A.; Everaerts, P.; Franzoni, G.; Fulcher, J.; Funk, W.; Gigi, D.; Gill, K.; Glege, F.; Gulhan, D.; Gundacker, S.; Guthoff, M.; Harris, P.; Hegeman, J.; Innocente, V.; Janot, P.; Karacheban, O.; Kieseler, J.; Kirschenmann, H.; Knünz, V.; Kornmayer, A.; Kortelainen, M. J.; Lange, C.; Lecoq, P.; Lourenço, C.; Lucchini, M. T.; Malgeri, L.; Mannelli, M.; Martelli, A.; Meijers, F.; Merlin, J. A.; Mersi, S.; Meschi, E.; Milenovic, P.; Moortgat, F.; Mulders, M.; Neugebauer, H.; Orfanelli, S.; Orsini, L.; Pape, L.; Perez, E.; Peruzzi, M.; Petrilli, A.; Petrucciani, G.; Pfeiffer, A.; Pierini, M.; Racz, A.; Reis, T.; Rolandi, G.; Rovere, M.; Sakulin, H.; Schäfer, C.; Schwick, C.; Seidel, M.; Selvaggi, M.; Sharma, A.; Silva, P.; Sphicas, P.; Steggemann, J.; Stoye, M.; Tosi, M.; Treille, D.; Triossi, A.; Tsirou, A.; Veckalns, V.; Veres, G. I.; Verweij, M.; Wardle, N.; Zeuner, W. D.; Bertl, W.; Caminada, L.; Deiters, K.; Erdmann, W.; Horisberger, R.; Ingram, Q.; Kaestli, H. C.; Kotlinski, D.; Langenegger, U.; Rohe, T.; Wiederkehr, S. 
A.; Bachmair, F.; Bäni, L.; Berger, P.; Bianchini, L.; Casal, B.; Dissertori, G.; Dittmar, M.; Donegà, M.; Grab, C.; Heidegger, C.; Hits, D.; Hoss, J.; Kasieczka, G.; Klijnsma, T.; Lustermann, W.; Mangano, B.; Marionneau, M.; Meinhard, M. T.; Meister, D.; Micheli, F.; Musella, P.; Nessi-Tedaldi, F.; Pandolfi, F.; Pata, J.; Pauss, F.; Perrin, G.; Perrozzi, L.; Quittnat, M.; Schönenberger, M.; Shchutska, L.; Tavolaro, V. R.; Theofilatos, K.; Olsson, M. L. Vesterbacka; Wallny, R.; Zagozdzinska, A.; Zhu, D. H.; Aarrestad, T. K.; Amsler, C.; Canelli, M. F.; De Cosa, A.; Donato, S.; Galloni, C.; Hreus, T.; Kilminster, B.; Ngadiuba, J.; Pinna, D.; Rauco, G.; Robmann, P.; Salerno, D.; Seitz, C.; Zucchetta, A.; Candelise, V.; Doan, T. H.; Jain, Sh.; Khurana, R.; Kuo, C. M.; Lin, W.; Pozdnyakov, A.; Yu, S. S.; Kumar, Arun; Chang, P.; Chao, Y.; Chen, K. F.; Chen, P. H.; Fiori, F.; Hou, W.-S.; Hsiung, Y.; Liu, Y. F.; Lu, R.-S.; Moya, M. Miñano; Paganis, E.; Psallidas, A.; Tsai, J. f.; Asavapibhop, B.; Kovitanggoon, K.; Singh, G.; Srimanobhas, N.; Adiguzel, A.; Boran, F.; Cerci, S.; Damarseckin, S.; Demiroglu, Z. S.; Dozen, C.; Dumanoglu, I.; Girgis, S.; Gokbulut, G.; Guler, Y.; Hos, I.; Kangal, E. E.; Kara, O.; Topaksu, A. Kayis; Kiminsu, U.; Oglakci, M.; Onengut, G.; Ozdemir, K.; Cerci, D. Sunar; Tali, B.; Turkcapar, S.; Zorbakir, I. S.; Zorbilmez, C.; Bilin, B.; Karapinar, G.; Ocalan, K.; Yalvac, M.; Zeyrek, M.; Gülmez, E.; Kaya, M.; Kaya, O.; Tekten, S.; Yetkin, E. A.; Agaras, M. N.; Atay, S.; Cakir, A.; Cankocak, K.; Grynyov, B.; Levchuk, L.; Sorokin, P.; Aggleton, R.; Ball, F.; Beck, L.; Brooke, J. J.; Burns, D.; Clement, E.; Cussans, D.; Davignon, O.; Flacher, H.; Goldstein, J.; Grimes, M.; Heath, G. P.; Heath, H. F.; Jacob, J.; Kreczko, L.; Lucas, C.; Newbold, D. M.; Paramesvaran, S.; Poll, A.; Sakuma, T.; Nasr-storey, S. Seif El; Smith, D.; Smith, V. J.; Bell, K. W.; Belyaev, A.; Brew, C.; Brown, R. M.; Calligaris, L.; Cieri, D.; Cockerill, D. J. A.; Coughlan, J. 
A.; Harder, K.; Harper, S.; Olaiya, E.; Petyt, D.; Shepherd-Themistocleous, C. H.; Thea, A.; Tomalin, I. R.; Williams, T.; Bainbridge, R.; Breeze, S.; Buchmuller, O.; Bundock, A.; Casasso, S.; Citron, M.; Colling, D.; Corpe, L.; Dauncey, P.; Davies, G.; De Wit, A.; Della Negra, M.; Di Maria, R.; Elwood, A.; Haddad, Y.; Hall, G.; Iles, G.; James, T.; Lane, R.; Laner, C.; Lyons, L.; Magnan, A.-M.; Malik, S.; Mastrolorenzo, L.; Matsushita, T.; Nash, J.; Nikitenko, A.; Palladino, V.; Pesaresi, M.; Raymond, D. M.; Richards, A.; Rose, A.; Scott, E.; Seez, C.; Shtipliyski, A.; Summers, S.; Tapper, A.; Uchida, K.; Acosta, M. Vazquez; Virdee, T.; Winterbottom, D.; Wright, J.; Zenz, S. C.; Cole, J. E.; Hobson, P. R.; Khan, A.; Kyberd, P.; Reid, I. D.; Symonds, P.; Teodorescu, L.; Turner, M.; Borzou, A.; Call, K.; Dittmann, J.; Hatakeyama, K.; Liu, H.; Pastika, N.; Smith, C.; Bartek, R.; Dominguez, A.; Buccilli, A.; Cooper, S. I.; Henderson, C.; Rumerio, P.; West, C.; Arcaro, D.; Avetisyan, A.; Bose, T.; Gastler, D.; Rankin, D.; Richardson, C.; Rohlf, J.; Sulak, L.; Zou, D.; Benelli, G.; Cutts, D.; Garabedian, A.; Hakala, J.; Heintz, U.; Hogan, J. M.; Kwok, K. H. M.; Laird, E.; Landsberg, G.; Mao, Z.; Narain, M.; Pazzini, J.; Piperov, S.; Sagir, S.; Syarif, R.; Yu, D.; Band, R.; Brainerd, C.; Burns, D.; Sanchez, M. Calderon De La Barca; Chertok, M.; Conway, J.; Conway, R.; Cox, P. T.; Erbacher, R.; Flores, C.; Funk, G.; Gardner, M.; Ko, W.; Lander, R.; Mclean, C.; Mulhearn, M.; Pellett, D.; Pilot, J.; Shalhout, S.; Shi, M.; Smith, J.; Squires, M.; Stolp, D.; Tos, K.; Tripathi, M.; Wang, Z.; Bachtis, M.; Bravo, C.; Cousins, R.; Dasgupta, A.; Florent, A.; Hauser, J.; Ignatenko, M.; Mccoll, N.; Saltzberg, D.; Schnaible, C.; Valuev, V.; Bouvier, E.; Burt, K.; Clare, R.; Ellison, J.; Gary, J. W.; Shirazi, S. M. A. Ghiasi; Hanson, G.; Heilman, J.; Jandir, P.; Kennedy, E.; Lacroix, F.; Long, O. R.; Negrete, M. Olmedo; Paneva, M. 
I.; Shrinivas, A.; Si, W.; Wang, L.; Wei, H.; Wimpenny, S.; Yates, B. R.; Branson, J. G.; Cittolin, S.; Derdzinski, M.; Hashemi, B.; Holzner, A.; Klein, D.; Kole, G.; Krutelyov, V.; Letts, J.; Macneill, I.; Masciovecchio, M.; Olivito, D.; Padhi, S.; Pieri, M.; Sani, M.; Sharma, V.; Simon, S.; Tadel, M.; Vartak, A.; Wasserbaech, S.; Wood, J.; Würthwein, F.; Yagil, A.; Della Porta, G. Zevi; Amin, N.; Bhandari, R.; Bradmiller-Feld, J.; Campagnari, C.; Dishaw, A.; Dutta, V.; Sevilla, M. Franco; George, C.; Golf, F.; Gouskos, L.; Gran, J.; Heller, R.; Incandela, J.; Mullin, S. D.; Ovcharova, A.; Qu, H.; Richman, J.; Stuart, D.; Suarez, I.; Yoo, J.; Anderson, D.; Bendavid, J.; Bornheim, A.; Lawhorn, J. M.; Newman, H. B.; Nguyen, T.; Pena, C.; Spiropulu, M.; Vlimant, J. R.; Xie, S.; Zhang, Z.; Zhu, R. Y.; Andrews, M. B.; Ferguson, T.; Mudholkar, T.; Paulini, M.; Russ, J.; Sun, M.; Vogel, H.; Vorobiev, I.; Weinberg, M.; Cumalat, J. P.; Ford, W. T.; Jensen, F.; Johnson, A.; Krohn, M.; Leontsinis, S.; Mulholland, T.; Stenson, K.; Wagner, S. R.; Alexander, J.; Chaves, J.; Chu, J.; Dittmer, S.; Mcdermott, K.; Mirman, N.; Patterson, J. R.; Rinkevicius, A.; Ryd, A.; Skinnari, L.; Soffi, L.; Tan, S. M.; Tao, Z.; Thom, J.; Tucker, J.; Wittich, P.; Zientek, M.; Abdullin, S.; Albrow, M.; Apollinari, G.; Apresyan, A.; Apyan, A.; Banerjee, S.; Bauerdick, L. A. T.; Beretvas, A.; Berryhill, J.; Bhat, P. C.; Bolla, G.; Burkett, K.; Butler, J. N.; Canepa, A.; Cerati, G. B.; Cheung, H. W. K.; Chlebana, F.; Cremonesi, M.; Duarte, J.; Elvira, V. D.; Freeman, J.; Gecse, Z.; Gottschalk, E.; Gray, L.; Green, D.; Grünendahl, S.; Gutsche, O.; Harris, R. M.; Hasegawa, S.; Hirschauer, J.; Hu, Z.; Jayatilaka, B.; Jindariani, S.; Johnson, M.; Joshi, U.; Klima, B.; Kreis, B.; Lammel, S.; Lincoln, D.; Lipton, R.; Liu, M.; Liu, T.; De Sá, R. Lopes; Lykken, J.; Maeshima, K.; Magini, N.; Marraffino, J. 
M.; Maruyama, S.; Mason, D.; McBride, P.; Merkel, P.; Mrenna, S.; Nahn, S.; O'Dell, V.; Pedro, K.; Prokofyev, O.; Rakness, G.; Ristori, L.; Schneider, B.; Sexton-Kennedy, E.; Soha, A.; Spalding, W. J.; Spiegel, L.; Stoynev, S.; Strait, J.; Strobbe, N.; Taylor, L.; Tkaczyk, S.; Tran, N. V.; Uplegger, L.; Vaandering, E. W.; Vernieri, C.; Verzocchi, M.; Vidal, R.; Wang, M.; Weber, H. A.; Whitbeck, A.; Acosta, D.; Avery, P.; Bortignon, P.; Bourilkov, D.; Brinkerhoff, A.; Carnes, A.; Carver, M.; Curry, D.; Das, S.; Field, R. D.; Furic, I. K.; Konigsberg, J.; Korytov, A.; Kotov, K.; Ma, P.; Matchev, K.; Mei, H.; Mitselmakher, G.; Rank, D.; Sperka, D.; Terentyev, N.; Thomas, L.; Wang, J.; Wang, S.; Yelton, J.; Joshi, Y. R.; Linn, S.; Markowitz, P.; Rodriguez, J. L.; Ackert, A.; Adams, T.; Askew, A.; Hagopian, S.; Hagopian, V.; Johnson, K. F.; Kolberg, T.; Martinez, G.; Perry, T.; Prosper, H.; Saha, A.; Santra, A.; Yohay, R.; Baarmand, M. M.; Bhopatkar, V.; Colafranceschi, S.; Hohlmann, M.; Noonan, D.; Roy, T.; Yumiceva, F.; Adams, M. R.; Apanasevich, L.; Berry, D.; Betts, R. R.; Cavanaugh, R.; Chen, X.; Evdokimov, O.; Gerber, C. E.; Hangal, D. A.; Hofman, D. J.; Jung, K.; Kamin, J.; Gonzalez, I. D. Sandoval; Tonjes, M. B.; Trauger, H.; Varelas, N.; Wang, H.; Wu, Z.; Zhang, J.; Bilki, B.; Clarida, W.; Dilsiz, K.; Durgut, S.; Gandrajula, R. P.; Haytmyradov, M.; Khristenko, V.; Merlo, J.-P.; Mermerkaya, H.; Mestvirishvili, A.; Moeller, A.; Nachtman, J.; Ogul, H.; Onel, Y.; Ozok, F.; Penzo, A.; Snyder, C.; Tiras, E.; Wetzel, J.; Yi, K.; Blumenfeld, B.; Cocoros, A.; Eminizer, N.; Fehling, D.; Feng, L.; Gritsan, A. V.; Maksimovic, P.; Roskes, J.; Sarica, U.; Swartz, M.; Xiao, M.; You, C.; Al-bataineh, A.; Baringer, P.; Bean, A.; Boren, S.; Bowen, J.; Castle, J.; Khalil, S.; Kropivnitskaya, A.; Majumder, D.; Mcbrayer, W.; Murray, M.; Royon, C.; Sanders, S.; Schmitz, E.; Stringer, R.; Takaki, J. D. Tapia; Wang, Q.; Ivanov, A.; Kaadze, K.; Maravin, Y.; Mohammadi, A.; Saini, L. 
K.; Skhirtladze, N.; Toda, S.; Rebassoo, F.; Wright, D.; Anelli, C.; Baden, A.; Baron, O.; Belloni, A.; Calvert, B.; Eno, S. C.; Ferraioli, C.; Hadley, N. J.; Jabeen, S.; Jeng, G. Y.; Kellogg, R. G.; Kunkle, J.; Mignerey, A. C.; Ricci-Tam, F.; Shin, Y. H.; Skuja, A.; Tonwar, S. C.; Abercrombie, D.; Allen, B.; Azzolini, V.; Barbieri, R.; Baty, A.; Bi, R.; Brandt, S.; Busza, W.; Cali, I. A.; D'Alfonso, M.; Demiragli, Z.; Ceballos, G. Gomez; Goncharov, M.; Hsu, D.; Iiyama, Y.; Innocenti, G. M.; Klute, M.; Kovalskyi, D.; Lai, Y. S.; Lee, Y.-J.; Levin, A.; Luckey, P. D.; Maier, B.; Marini, A. C.; Mcginn, C.; Mironov, C.; Narayanan, S.; Niu, X.; Paus, C.; Roland, C.; Roland, G.; Salfeld-Nebgen, J.; Stephans, G. S. F.; Tatar, K.; Velicanu, D.; Wang, J.; Wang, T. W.; Wyslouch, B.; Benvenuti, A. C.; Chatterjee, R. M.; Evans, A.; Hansen, P.; Kalafut, S.; Kubota, Y.; Lesko, Z.; Mans, J.; Nourbakhsh, S.; Ruckstuhl, N.; Rusack, R.; Turkewitz, J.; Acosta, J. G.; Oliveros, S.; Avdeeva, E.; Bloom, K.; Claes, D. R.; Fangmeier, C.; Suarez, R. Gonzalez; Kamalieddin, R.; Kravchenko, I.; Monroy, J.; Siado, J. E.; Snow, G. R.; Stieger, B.; Alyari, M.; Dolen, J.; Godshalk, A.; Harrington, C.; Iashvili, I.; Nguyen, D.; Parker, A.; Rappoccio, S.; Roozbahani, B.; Alverson, G.; Barberis, E.; Hortiangtham, A.; Massironi, A.; Morse, D. M.; Nash, D.; Orimoto, T.; De Lima, R. Teixeira; Trocino, D.; Wood, D.; Bhattacharya, S.; Charaf, O.; Hahn, K. A.; Mucia, N.; Odell, N.; Pollack, B.; Schmitt, M. H.; Sung, K.; Trovato, M.; Velasco, M.; Dev, N.; Hildreth, M.; Anampa, K. Hurtado; Jessop, C.; Karmgard, D. J.; Kellams, N.; Lannon, K.; Loukas, N.; Marinelli, N.; Meng, F.; Mueller, C.; Musienko, Y.; Planer, M.; Reinsvold, A.; Ruchti, R.; Smith, G.; Taroni, S.; Wayne, M.; Wolf, M.; Woodard, A.; Alimena, J.; Antonelli, L.; Bylsma, B.; Durkin, L. S.; Flowers, S.; Francis, B.; Hart, A.; Hill, C.; Ji, W.; Liu, B.; Luo, W.; Puigh, D.; Winer, B. L.; Wulsin, H. 
W.; Benaglia, A.; Cooperstein, S.; Driga, O.; Elmer, P.; Hardenbrook, J.; Hebda, P.; Higginbotham, S.; Lange, D.; Luo, J.; Marlow, D.; Mei, K.; Ojalvo, I.; Olsen, J.; Palmer, C.; Piroué, P.; Stickland, D.; Tully, C.; Malik, S.; Norberg, S.; Barker, A.; Barnes, V. E.; Folgueras, S.; Gutay, L.; Jha, M. K.; Jones, M.; Jung, A. W.; Khatiwada, A.; Miller, D. H.; Neumeister, N.; Peng, C. C.; Schulte, J. F.; Sun, J.; Wang, F.; Xie, W.; Cheng, T.; Parashar, N.; Stupak, J.; Adair, A.; Akgun, B.; Chen, Z.; Ecklund, K. M.; Geurts, F. J. M.; Guilbaud, M.; Li, W.; Michlin, B.; Northup, M.; Padley, B. P.; Roberts, J.; Rorie, J.; Tu, Z.; Zabel, J.; Bodek, A.; de Barbaro, P.; Demina, R.; Duh, Y. t.; Ferbel, T.; Galanti, M.; Garcia-Bellido, A.; Han, J.; Hindrichs, O.; Khukhunaishvili, A.; Lo, K. H.; Tan, P.; Verzetti, M.; Ciesielski, R.; Goulianos, K.; Mesropian, C.; Agapitos, A.; Chou, J. P.; Gershtein, Y.; Espinosa, T. A. Gómez; Halkiadakis, E.; Heindl, M.; Hughes, E.; Kaplan, S.; Elayavalli, R. Kunnawalkam; Kyriacou, S.; Lath, A.; Montalvo, R.; Nash, K.; Osherson, M.; Saka, H.; Salur, S.; Schnetzer, S.; Sheffield, D.; Somalwar, S.; Stone, R.; Thomas, S.; Thomassen, P.; Walker, M.; Delannoy, A. G.; Foerster, M.; Heideman, J.; Riley, G.; Rose, K.; Spanier, S.; Thapa, K.; Bouhali, O.; Hernandez, A. Castaneda; Celik, A.; Dalchenko, M.; De Mattia, M.; Delgado, A.; Dildick, S.; Eusebi, R.; Gilmore, J.; Huang, T.; Kamon, T.; Mueller, R.; Pakhotin, Y.; Patel, R.; Perloff, A.; Perniè, L.; Rathjens, D.; Safonov, A.; Tatarinov, A.; Ulmer, K. A.; Akchurin, N.; Damgov, J.; De Guio, F.; Dudero, P. R.; Faulkner, J.; Gurpinar, E.; Kunori, S.; Lamichhane, K.; Lee, S. W.; Libeiro, T.; Peltola, T.; Undleeb, S.; Volobouev, I.; Wang, Z.; Greene, S.; Gurrola, A.; Janjam, R.; Johns, W.; Maguire, C.; Melo, A.; Ni, H.; Sheldon, P.; Tuo, S.; Velkovska, J.; Xu, Q.; Arenton, M. 
W.; Barria, P.; Cox, B.; Hirosky, R.; Ledovskoy, A.; Li, H.; Neu, C.; Sinthuprasith, T.; Sun, X.; Wang, Y.; Wolfe, E.; Xia, F.; Harr, R.; Karchin, P. E.; Sturdy, J.; Zaleski, S.; Brodski, M.; Buchanan, J.; Caillol, C.; Dasu, S.; Dodd, L.; Duric, S.; Gomber, B.; Grothe, M.; Herndon, M.; Hervé, A.; Hussain, U.; Klabbers, P.; Lanaro, A.; Levine, A.; Long, K.; Loveless, R.; Pierro, G. A.; Polese, G.; Ruggles, T.; Savin, A.; Smith, N.; Smith, W. H.; Taylor, D.; Woods, N.

    2017-09-01

A search for heavy resonances with masses above 1 TeV, decaying to final states containing a vector boson and a Higgs boson, is presented. The search considers hadronic decays of the vector boson, and Higgs boson decays to b quarks. The decay products are highly boosted, and each collimated pair of quarks is reconstructed as a single, massive jet. The analysis is performed using a data sample collected in 2016 by the CMS experiment at the LHC in proton-proton collisions at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb⁻¹. The data are consistent with the background expectation and are used to place limits on the parameters of a theoretical model with a heavy vector triplet. In the benchmark scenario with mass-degenerate W' and Z' bosons decaying predominantly to pairs of standard model bosons, heavy resonances with masses as high as 3.3 TeV are excluded for the first time at 95% confidence level, setting the most stringent constraints to date on such states decaying into a vector boson and a Higgs boson.

  17. Diagrammatic Monte Carlo study of Fröhlich polaron dispersion in two and three dimensions

    NASA Astrophysics Data System (ADS)

    Hahn, Thomas; Klimin, Sergei; Tempere, Jacques; Devreese, Jozef T.; Franchini, Cesare

    2018-04-01

We present results for the solution of the large polaron Fröhlich Hamiltonian in three dimensions (3D) and two dimensions (2D) obtained via the diagrammatic Monte Carlo (DMC) method. Our implementation is based on the approach by Mishchenko [A. S. Mishchenko et al., Phys. Rev. B 62, 6317 (2000), 10.1103/PhysRevB.62.6317]. Polaron ground state energies and effective polaron masses are successfully benchmarked with data obtained using Feynman's path integral formalism. By comparing 3D and 2D data, we verify the analytically exact scaling relations for energies and effective masses from 3D → 2D, which provides a stringent test for the quality of DMC predictions. The accuracy of our results is further proven by providing values for the exactly known coefficients in weak- and strong-coupling expansions. Moreover, we compute polaron dispersion curves which are validated with analytically known lower and upper limits in the small-coupling regime and verify the first-order expansion results for larger couplings, thus disproving previous critiques on the apparent incompatibility of DMC with analytical results and furnishing a useful reference for a wide range of coupling strengths.
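The exact 3D → 2D scaling relations referred to in the abstract can be written as follows (a sketch assuming the Peeters–Devreese form for the Fröhlich polaron ground-state energy and effective mass; the abstract itself does not spell out the relations):

```latex
% 2D polaron quantities follow from 3D ones at a rescaled coupling:
E_{2\mathrm{D}}(\alpha) \;=\; \tfrac{2}{3}\, E_{3\mathrm{D}}\!\left(\tfrac{3\pi}{4}\,\alpha\right),
\qquad
\frac{m^{*}_{2\mathrm{D}}(\alpha)}{m} \;=\; \frac{m^{*}_{3\mathrm{D}}\!\left(\tfrac{3\pi}{4}\,\alpha\right)}{m}.
```

Because both sides can be computed independently by DMC, agreement between the 2D data and the rescaled 3D data is a nontrivial cross-check of the method.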

  18. Underwater Robot Task Planning Using Multi-Objective Meta-Heuristics.

    PubMed

    Landa-Torres, Itziar; Manjarres, Diana; Bilbao, Sonia; Del Ser, Javier

    2017-04-04

Robots deployed in the underwater medium are subject to stringent operational conditions that impose a high degree of criticality on the allocation of resources and the schedule of operations in mission planning. In this context, the so-called cost of a mission must be considered as an additional criterion when designing optimal task schedules within the mission at hand. Such a cost can be conceived as the impact of the mission on the robotic resources themselves, which ranges from the consumption of battery to other negative effects such as mechanical erosion. This manuscript focuses on this issue by devising three heuristic solvers aimed at efficiently scheduling tasks in robotic swarms, which collaborate together to accomplish a mission, and by presenting experimental results obtained over realistic scenarios in the underwater environment. The heuristic techniques resort to a Random-Keys encoding strategy to represent the allocation of robots to tasks and the relative execution order of such tasks within the schedule of certain robots. The obtained results reveal interesting differences in terms of Pareto optimality and spread between the algorithms considered in the benchmark, which are insightful for the selection of a proper task scheduler in real underwater campaigns.
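The Random-Keys idea mentioned above can be sketched as follows: each task receives a real-valued key whose integer part selects a robot and whose fractional part fixes the task's position in that robot's schedule. This is a minimal illustrative decoder of the general Random-Keys encoding, not the authors' solver; all names are hypothetical.

```python
import random

def decode_random_keys(keys, n_robots):
    """Decode a random-key vector into per-robot task schedules.

    Each key k in [0, n_robots): int(k) selects the robot,
    and the fractional part k - int(k) orders tasks within
    that robot's schedule (smaller fraction runs earlier).
    """
    schedules = {r: [] for r in range(n_robots)}
    for task, k in enumerate(keys):
        robot = int(k)                              # integer part -> robot index
        schedules[robot].append((k - robot, task))  # fractional part -> order
    # Sort each robot's task list by the fractional key.
    return {r: [t for _, t in sorted(pairs)] for r, pairs in schedules.items()}

# Example: 6 tasks spread over 3 robots with random keys in [0, 3).
keys = [random.random() * 3 for _ in range(6)]
print(decode_random_keys(keys, 3))
```

Because any real-valued key vector decodes to a valid schedule, standard continuous meta-heuristics can search the key space directly without repair operators.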

  19. Search for the production of an excited bottom quark decaying to tW in proton-proton collisions at √s = 8 TeV

    DOE PAGES

    Khachatryan, Vardan

    2016-01-27

A search is presented for a singly produced excited bottom quark (b*) decaying to a top quark and a W boson in the all-hadronic, lepton+jets, and dilepton final states in proton-proton collisions at √s = 8 TeV recorded by the CMS experiment at the CERN LHC. Data corresponding to an integrated luminosity of 19.7 fb⁻¹ are used. No significant excess of events is observed with respect to standard model expectations. Limits are set at 95% confidence on the product of the b* quark production cross section and its branching fraction to tW. Furthermore, the cross section limits are interpreted for scenarios including left-handed, right-handed, and vector-like couplings of the b* quark and are presented in the two-dimensional coupling plane based on the production and decay coupling constants. The masses of the left-handed, right-handed, and vector-like b* quark states are excluded at 95% confidence below 1390, 1430, and 1530 GeV, respectively, for benchmark couplings. This analysis gives the most stringent limits on the mass of the b* quark to date.

  20. Bacterial Interactomes: Interacting Protein Partners Share Similar Function and Are Validated in Independent Assays More Frequently Than Previously Reported*

    PubMed Central

    Shatsky, Maxim; Allen, Simon; Gold, Barbara L.; Liu, Nancy L.; Juba, Thomas R.; Reveco, Sonia A.; Elias, Dwayne A.; Prathapam, Ramadevi; He, Jennifer; Yang, Wenhong; Szakal, Evelin D.; Liu, Haichuan; Singer, Mary E.; Geller, Jil T.; Lam, Bonita R.; Saini, Avneesh; Trotter, Valentine V.; Hall, Steven C.; Fisher, Susan J.; Brenner, Steven E.; Chhabra, Swapnil R.; Hazen, Terry C.; Wall, Judy D.; Witkowska, H. Ewa; Biggin, Mark D.; Chandonia, John-Marc; Butland, Gareth

    2016-01-01

    Numerous affinity purification-mass spectrometry (AP-MS) and yeast two-hybrid screens have each defined thousands of pairwise protein-protein interactions (PPIs), most of which are between functionally unrelated proteins. The accuracy of these networks, however, is under debate. Here, we present an AP-MS survey of the bacterium Desulfovibrio vulgaris together with a critical reanalysis of nine published bacterial yeast two-hybrid and AP-MS screens. We have identified 459 high confidence PPIs from D. vulgaris and 391 from Escherichia coli. Compared with the nine published interactomes, our two networks are smaller, are much less highly connected, and have significantly lower false discovery rates. In addition, our interactomes are much more enriched in protein pairs that are encoded in the same operon, have similar functions, and are reproducibly detected in other physical interaction assays than the pairs reported in prior studies. Our work establishes more stringent benchmarks for the properties of protein interactomes and suggests that bona fide PPIs much more frequently involve protein partners that are annotated with similar functions or that can be validated in independent assays than earlier studies suggested. PMID:26873250

  1. Search for new physics with dijet angular distributions in proton-proton collisions at √s = 13 TeV

    NASA Astrophysics Data System (ADS)

    Sirunyan, A. M.; Tumasyan, A.; Adam, W.; Asilar, E.; Bergauer, T.; Brandstetter, J.; Brondolin, E.; Dragicevic, M.; Erö, J.; Flechl, M.; Friedl, M.; Frühwirth, R.; Ghete, V. M.; Hartl, C.; Hörmann, N.; Hrubec, J.; Jeitler, M.; König, A.; Krätschmer, I.; Liko, D.; Matsushita, T.; Mikulec, I.; Rabady, D.; Rad, N.; Rahbaran, B.; Rohringer, H.; Schieck, J.; Strauss, J.; Waltenberger, W.; Wulz, C.-E.; Dvornikov, O.; Makarenko, V.; Mossolov, V.; Suarez Gonzalez, J.; Zykunov, V.; Shumeiko, N.; Alderweireldt, S.; De Wolf, E. A.; Janssen, X.; Lauwers, J.; Van De Klundert, M.; Van Haevermaet, H.; Van Mechelen, P.; Van Remortel, N.; Van Spilbeeck, A.; Abu Zeid, S.; Blekman, F.; D'Hondt, J.; Daci, N.; De Bruyn, I.; Deroover, K.; Lowette, S.; Moortgat, S.; Moreels, L.; Olbrechts, A.; Python, Q.; Skovpen, K.; Tavernier, S.; Van Doninck, W.; Van Mulders, P.; Van Parijs, I.; Brun, H.; Clerbaux, B.; De Lentdecker, G.; Delannoy, H.; Fasanella, G.; Favart, L.; Goldouzian, R.; Grebenyuk, A.; Karapostoli, G.; Lenzi, T.; Léonard, A.; Luetic, J.; Maerschalk, T.; Marinov, A.; Randle-conde, A.; Seva, T.; Vander Velde, C.; Vanlaer, P.; Vannerom, D.; Yonamine, R.; Zenoni, F.; Zhang, F.; Cimmino, A.; Cornelis, T.; Dobur, D.; Fagot, A.; Gul, M.; Khvastunov, I.; Poyraz, D.; Salva, S.; Schöfbeck, R.; Tytgat, M.; Van Driessche, W.; Yazgan, E.; Zaganidis, N.; Bakhshiansohi, H.; Beluffi, C.; Bondu, O.; Brochet, S.; Bruno, G.; Caudron, A.; De Visscher, S.; Delaere, C.; Delcourt, M.; Francois, B.; Giammanco, A.; Jafari, A.; Komm, M.; Krintiras, G.; Lemaitre, V.; Magitteri, A.; Mertens, A.; Musich, M.; Piotrzkowski, K.; Quertenmont, L.; Selvaggi, M.; Vidal Marono, M.; Wertz, S.; Beliy, N.; Aldá Júnior, W. L.; Alves, F. L.; Alves, G. A.; Brito, L.; Hensel, C.; Moraes, A.; Pol, M. E.; Rebello Teles, P.; Belchior Batista Das Chagas, E.; Carvalho, W.; Chinellato, J.; Custódio, A.; Da Costa, E. M.; Da Silveira, G. 
G.; De Jesus Damiao, D.; De Oliveira Martins, C.; Fonseca De Souza, S.; Huertas Guativa, L. M.; Malbouisson, H.; Matos Figueiredo, D.; Mora Herrera, C.; Mundim, L.; Nogima, H.; Prado Da Silva, W. L.; Santoro, A.; Sznajder, A.; Tonelli Manganote, E. J.; Torres Da Silva De Araujo, F.; Vilela Pereira, A.; Ahuja, S.; Bernardes, C. A.; Dogra, S.; Fernandez Perez Tomei, T. R.; Gregores, E. M.; Mercadante, P. G.; Moon, C. S.; Novaes, S. F.; Padula, Sandra S.; Romero Abad, D.; Ruiz Vargas, J. C.; Aleksandrov, A.; Hadjiiska, R.; Iaydjiev, P.; Rodozov, M.; Stoykova, S.; Sultanov, G.; Vutova, M.; Dimitrov, A.; Glushkov, I.; Litov, L.; Pavlov, B.; Petkov, P.; Fang, W.; Ahmad, M.; Bian, J. G.; Chen, G. M.; Chen, H. S.; Chen, M.; Chen, Y.; Cheng, T.; Jiang, C. H.; Leggat, D.; Liu, Z.; Romeo, F.; Ruan, M.; Shaheen, S. M.; Spiezia, A.; Tao, J.; Wang, C.; Wang, Z.; Zhang, H.; Zhao, J.; Ban, Y.; Chen, G.; Li, Q.; Liu, S.; Mao, Y.; Qian, S. J.; Wang, D.; Xu, Z.; Avila, C.; Cabrera, A.; Chaparro Sierra, L. F.; Florez, C.; Gomez, J. P.; González Hernández, C. F.; Ruiz Alvarez, J. D.; Sanabria, J. C.; Godinovic, N.; Lelas, D.; Puljak, I.; Ribeiro Cipriano, P. M.; Sculac, T.; Antunovic, Z.; Kovac, M.; Brigljevic, V.; Ferencek, D.; Kadija, K.; Mesic, B.; Susa, T.; Attikis, A.; Mavromanolakis, G.; Mousa, J.; Nicolaou, C.; Ptochos, F.; Razis, P. A.; Rykaczewski, H.; Tsiakkouri, D.; Finger, M.; Finger, M.; Carrera Jarrin, E.; Abdelalim, A. A.; Mohammed, Y.; Salama, E.; Kadastik, M.; Perrini, L.; Raidal, M.; Tiko, A.; Veelken, C.; Eerola, P.; Pekkanen, J.; Voutilainen, M.; Härkönen, J.; Järvinen, T.; Karimäki, V.; Kinnunen, R.; Lampén, T.; Lassila-Perini, K.; Lehti, S.; Lindén, T.; Luukka, P.; Tuominiemi, J.; Tuovinen, E.; Wendland, L.; Talvitie, J.; Tuuva, T.; Besancon, M.; Couderc, F.; Dejardin, M.; Denegri, D.; Fabbro, B.; Faure, J. 
L.; Favaro, C.; Ferri, F.; Ganjour, S.; Ghosh, S.; Givernaud, A.; Gras, P.; Hamel de Monchenault, G.; Jarry, P.; Kucher, I.; Locci, E.; Machet, M.; Malcles, J.; Rander, J.; Rosowsky, A.; Titov, M.; Abdulsalam, A.; Antropov, I.; Baffioni, S.; Beaudette, F.; Busson, P.; Cadamuro, L.; Chapon, E.; Charlot, C.; Davignon, O.; Granier de Cassagnac, R.; Jo, M.; Lisniak, S.; Miné, P.; Nguyen, M.; Ochando, C.; Ortona, G.; Paganini, P.; Pigard, P.; Regnard, S.; Salerno, R.; Sirois, Y.; Strebler, T.; Yilmaz, Y.; Zabi, A.; Zghiche, A.; Agram, J.-L.; Andrea, J.; Aubin, A.; Bloch, D.; Brom, J.-M.; Buttignol, M.; Chabert, E. C.; Chanon, N.; Collard, C.; Conte, E.; Coubez, X.; Fontaine, J.-C.; Gelé, D.; Goerlach, U.; Le Bihan, A.-C.; Van Hove, P.; Gadrat, S.; Beauceron, S.; Bernet, C.; Boudoul, G.; Carrillo Montoya, C. A.; Chierici, R.; Contardo, D.; Courbon, B.; Depasse, P.; El Mamouni, H.; Fay, J.; Gascon, S.; Gouzevitch, M.; Grenier, G.; Ille, B.; Lagarde, F.; Laktineh, I. B.; Lethuillier, M.; Mirabito, L.; Pequegnot, A. L.; Perries, S.; Popov, A.; Sabes, D.; Sordini, V.; Vander Donckt, M.; Verdier, P.; Viret, S.; Toriashvili, T.; Tsamalaidze, Z.; Autermann, C.; Beranek, S.; Feld, L.; Kiesel, M. K.; Klein, K.; Lipinski, M.; Preuten, M.; Schomakers, C.; Schulz, J.; Verlage, T.; Albert, A.; Brodski, M.; Dietz-Laursonn, E.; Duchardt, D.; Endres, M.; Erdmann, M.; Erdweg, S.; Esch, T.; Fischer, R.; Güth, A.; Hamer, M.; Hebbeker, T.; Heidemann, C.; Hoepfner, K.; Knutzen, S.; Merschmeyer, M.; Meyer, A.; Millet, P.; Mukherjee, S.; Olschewski, M.; Padeken, K.; Pook, T.; Radziej, M.; Reithler, H.; Rieger, M.; Scheuch, F.; Sonnenschein, L.; Teyssier, D.; Thüer, S.; Cherepanov, V.; Flügge, G.; Kargoll, B.; Kress, T.; Künsken, A.; Lingemann, J.; Müller, T.; Nehrkorn, A.; Nowack, A.; Pistone, C.; Pooth, O.; Stahl, A.; Aldaya Martin, M.; Arndt, T.; Asawatangtrakuldee, C.; Beernaert, K.; Behnke, O.; Behrens, U.; Bin Anuar, A. 
A.; Borras, K.; Campbell, A.; Connor, P.; Contreras-Campana, C.; Costanza, F.; Diez Pardos, C.; Dolinska, G.; Eckerlin, G.; Eckstein, D.; Eichhorn, T.; Eren, E.; Gallo, E.; Garay Garcia, J.; Geiser, A.; Gizhko, A.; Grados Luyando, J. M.; Grohsjean, A.; Gunnellini, P.; Harb, A.; Hauk, J.; Hempel, M.; Jung, H.; Kalogeropoulos, A.; Karacheban, O.; Kasemann, M.; Keaveney, J.; Kleinwort, C.; Korol, I.; Krücker, D.; Lange, W.; Lelek, A.; Lenz, T.; Leonard, J.; Lipka, K.; Lobanov, A.; Lohmann, W.; Mankel, R.; Melzer-Pellmann, I.-A.; Meyer, A. B.; Mittag, G.; Mnich, J.; Mussgiller, A.; Pitzl, D.; Placakyte, R.; Raspereza, A.; Roland, B.; Sahin, M. Ö.; Saxena, P.; SchoernerSadenius, T.; Spannagel, S.; Stefaniuk, N.; Van Onsem, G. P.; Walsh, R.; Wissing, C.; Blobel, V.; Centis Vignali, M.; Draeger, A. R.; Dreyer, T.; Garutti, E.; Gonzalez, D.; Haller, J.; Hoffmann, M.; Junkes, A.; Klanner, R.; Kogler, R.; Kovalchuk, N.; Lapsien, T.; Marchesini, I.; Marconi, D.; Meyer, M.; Niedziela, M.; Nowatschin, D.; Pantaleo, F.; Peiffer, T.; Perieanu, A.; Poehlsen, J.; Scharf, C.; Schleper, P.; Schmidt, A.; Schumann, S.; Schwandt, J.; Stadie, H.; Steinbrück, G.; Stober, F. M.; Stöver, M.; Tholen, H.; Troendle, D.; Usai, E.; Vanelderen, L.; Vanhoefer, A.; Vormwald, B.; Akbiyik, M.; Barth, C.; Baur, S.; Baus, C.; Berger, J.; Butz, E.; Caspart, R.; Chwalek, T.; Colombo, F.; De Boer, W.; Dierlamm, A.; Fink, S.; Freund, B.; Friese, R.; Giffels, M.; Gilbert, A.; Goldenzweig, P.; Haitz, D.; Hartmann, F.; Heindl, S. M.; Husemann, U.; Katkov, I.; Kudella, S.; Mildner, H.; Mozer, M. U.; Müller, Th.; Plagge, M.; Quast, G.; Rabbertz, K.; Röcker, S.; Roscher, F.; Schröder, M.; Shvetsov, I.; Sieber, G.; Simonis, H. J.; Ulrich, R.; Wayand, S.; Weber, M.; Weiler, T.; Williamson, S.; Wöhrmann, C.; Wolf, R.; Anagnostou, G.; Daskalakis, G.; Geralis, T.; Giakoumopoulou, V. 
A.; Kyriakis, A.; Loukas, D.; Topsis-Giotis, I.; Kesisoglou, S.; Panagiotou, A.; Saoulidou, N.; Tziaferi, E.; Evangelou, I.; Flouris, G.; Foudas, C.; Kokkas, P.; Loukas, N.; Manthos, N.; Papadopoulos, I.; Paradas, E.; Filipovic, N.; Pasztor, G.; Bencze, G.; Hajdu, C.; Horvath, D.; Sikler, F.; Veszpremi, V.; Vesztergombi, G.; Zsigmond, A. J.; Beni, N.; Czellar, S.; Karancsi, J.; Makovec, A.; Molnar, J.; Szillasi, Z.; Bartók, M.; Raics, P.; Trocsanyi, Z. L.; Ujvari, B.; Komaragiri, J. R.; Bahinipati, S.; Bhowmik, S.; Choudhury, S.; Mal, P.; Mandal, K.; Nayak, A.; Sahoo, D. K.; Sahoo, N.; Swain, S. K.; Bansal, S.; Beri, S. B.; Bhatnagar, V.; Chawla, R.; Bhawandeep, U.; Kalsi, A. K.; Kaur, A.; Kaur, M.; Kumar, R.; Kumari, P.; Mehta, A.; Mittal, M.; Singh, J. B.; Walia, G.; Kumar, Ashok; Bhardwaj, A.; Choudhary, B. C.; Garg, R. B.; Keshri, S.; Malhotra, S.; Naimuddin, M.; Ranjan, K.; Sharma, R.; Sharma, V.; Bhattacharya, R.; Bhattacharya, S.; Chatterjee, K.; Dey, S.; Dutt, S.; Dutta, S.; Ghosh, S.; Majumdar, N.; Modak, A.; Mondal, K.; Mukhopadhyay, S.; Nandan, S.; Purohit, A.; Roy, A.; Roy, D.; Roy Chowdhury, S.; Sarkar, S.; Sharan, M.; Thakur, S.; Behera, P. K.; Chudasama, R.; Dutta, D.; Jha, V.; Kumar, V.; Mohanty, A. K.; Netrakanti, P. K.; Pant, L. M.; Shukla, P.; Topkar, A.; Aziz, T.; Dugad, S.; Kole, G.; Mahakud, B.; Mitra, S.; Mohanty, G. B.; Parida, B.; Sur, N.; Sutar, B.; Banerjee, S.; Dewanjee, R. K.; Ganguly, S.; Guchait, M.; Jain, Sa.; Kumar, S.; Maity, M.; Majumder, G.; Mazumdar, K.; Sarkar, T.; Wickramage, N.; Chauhan, S.; Dube, S.; Hegde, V.; Kapoor, A.; Kothekar, K.; Pandey, S.; Rane, A.; Sharma, S.; Chenarani, S.; Eskandari Tadavani, E.; Etesami, S. 
M.; Khakzad, M.; Mohammadi Najafabadi, M.; Naseri, M.; Paktinat Mehdiabadi, S.; Rezaei Hosseinabadi, F.; Safarzadeh, B.; Zeinali, M.; Felcini, M.; Grunewald, M.; Abbrescia, M.; Calabria, C.; Caputo, C.; Colaleo, A.; Creanza, D.; Cristella, L.; De Filippis, N.; De Palma, M.; Fiore, L.; Iaselli, G.; Maggi, G.; Maggi, M.; Miniello, G.; My, S.; Nuzzo, S.; Pompili, A.; Pugliese, G.; Radogna, R.; Ranieri, A.; Selvaggi, G.; Sharma, A.; Silvestris, L.; Venditti, R.; Verwilligen, P.; Abbiendi, G.; Battilana, C.; Bonacorsi, D.; Braibant-Giacomelli, S.; Brigliadori, L.; Campanini, R.; Capiluppi, P.; Castro, A.; Cavallo, F. R.; Chhibra, S. S.; Codispoti, G.; Cuffiani, M.; Dallavalle, G. M.; Fabbri, F.; Fanfani, A.; Fasanella, D.; Giacomelli, P.; Grandi, C.; Guiducci, L.; Marcellini, S.; Masetti, G.; Montanari, A.; Navarria, F. L.; Perrotta, A.; Rossi, A. M.; Rovelli, T.; Siroli, G. P.; Tosi, N.; Albergo, S.; Costa, S.; Di Mattia, A.; Giordano, F.; Potenza, R.; Tricomi, A.; Tuve, C.; Barbagli, G.; Ciulli, V.; Civinini, C.; D'Alessandro, R.; Focardi, E.; Lenzi, P.; Meschini, M.; Paoletti, S.; Russo, L.; Sguazzoni, G.; Strom, D.; Viliani, L.; Benussi, L.; Bianco, S.; Fabbri, F.; Piccolo, D.; Primavera, F.; Calvelli, V.; Ferro, F.; Monge, M. R.; Robutti, E.; Tosi, S.; Brianza, L.; Brivio, F.; Ciriolo, V.; Dinardo, M. E.; Fiorendi, S.; Gennai, S.; Ghezzi, A.; Govoni, P.; Malberti, M.; Malvezzi, S.; Manzoni, R. A.; Menasce, D.; Moroni, L.; Paganoni, M.; Pedrini, D.; Pigazzini, S.; Ragazzi, S.; Tabarelli de Fatis, T.; Buontempo, S.; Cavallo, N.; De Nardo, G.; Di Guida, S.; Esposito, M.; Fabozzi, F.; Fienga, F.; Iorio, A. O. M.; Lanza, G.; Lista, L.; Meola, S.; Paolucci, P.; Sciacca, C.; Thyssen, F.; Azzi, P.; Bacchetta, N.; Benato, L.; Bisello, D.; Boletti, A.; Carlin, R.; Carvalho Antunes De Oliveira, A.; Checchia, P.; Dall'Osso, M.; De Castro Manzano, P.; Dorigo, T.; Dosselli, U.; Gasparini, F.; Gasparini, U.; Gozzelino, A.; Lacaprara, S.; Margoni, M.; Meneguzzo, A. 
T.; Pazzini, J.; Pozzobon, N.; Ronchese, P.; Simonetto, F.; Torassa, E.; Zanetti, M.; Zotto, P.; Zumerle, G.; Braghieri, A.; Fallavollita, F.; Magnani, A.; Montagna, P.; Ratti, S. P.; Re, V.; Riccardi, C.; Salvini, P.; Vai, I.; Vitulo, P.; Alunni Solestizi, L.; Bilei, G. M.; Ciangottini, D.; Fanò, L.; Lariccia, P.; Leonardi, R.; Mantovani, G.; Menichelli, M.; Saha, A.; Santocchia, A.; Androsov, K.; Azzurri, P.; Bagliesi, G.; Bernardini, J.; Boccali, T.; Castaldi, R.; Ciocci, M. A.; Dell'Orso, R.; Donato, S.; Fedi, G.; Giassi, A.; Grippo, M. T.; Ligabue, F.; Lomtadze, T.; Martini, L.; Messineo, A.; Palla, F.; Rizzi, A.; Savoy-Navarro, A.; Spagnolo, P.; Tenchini, R.; Tonelli, G.; Venturi, A.; Verdini, P. G.; Barone, L.; Cavallari, F.; Cipriani, M.; Del Re, D.; Diemoz, M.; Gelli, S.; Longo, E.; Margaroli, F.; Marzocchi, B.; Meridiani, P.; Organtini, G.; Paramatti, R.; Preiato, F.; Rahatlou, S.; Rovelli, C.; Santanastasio, F.; Amapane, N.; Arcidiacono, R.; Argiro, S.; Arneodo, M.; Bartosik, N.; Bellan, R.; Biino, C.; Cartiglia, N.; Cenna, F.; Costa, M.; Covarelli, R.; Degano, A.; Demaria, N.; Finco, L.; Kiani, B.; Mariotti, C.; Maselli, S.; Migliore, E.; Monaco, V.; Monteil, E.; Monteno, M.; Obertino, M. M.; Pacher, L.; Pastrone, N.; Pelliccioni, M.; Pinna Angioni, G. L.; Ravera, F.; Romero, A.; Ruspa, M.; Sacchi, R.; Shchelina, K.; Sola, V.; Solano, A.; Staiano, A.; Traczyk, P.; Belforte, S.; Casarsa, M.; Cossutti, F.; Della Ricca, G.; Zanetti, A.; Kim, D. H.; Kim, G. N.; Kim, M. S.; Lee, S.; Lee, S. W.; Oh, Y. D.; Sekmen, S.; Son, D. C.; Yang, Y. C.; Lee, A.; Kim, H.; Brochero Cifuentes, J. A.; Kim, T. J.; Cho, S.; Choi, S.; Go, Y.; Gyun, D.; Ha, S.; Hong, B.; Jo, Y.; Kim, Y.; Lee, K.; Lee, K. S.; Lee, S.; Lim, J.; Park, S. K.; Roh, Y.; Almond, J.; Kim, J.; Lee, H.; Oh, S. B.; Radburn-Smith, B. C.; Seo, S. h.; Yang, U. K.; Yoo, H. D.; Yu, G. B.; Choi, M.; Kim, H.; Kim, J. H.; Lee, J. S. H.; Park, I. C.; Ryu, G.; Ryu, M. 
S.; Choi, Y.; Goh, J.; Hwang, C.; Lee, J.; Yu, I.; Dudenas, V.; Juodagalvis, A.; Vaitkus, J.; Ahmed, I.; Ibrahim, Z. A.; Ali, M. A. B. Md; Mohamad Idris, F.; Wan Abdullah, W. A. T.; Yusli, M. N.; Zolkapli, Z.; Castilla-Valdez, H.; De La Cruz-Burelo, E.; Heredia-De La Cruz, I.; Hernandez-Almada, A.; Lopez-Fernandez, R.; Magaña Villalba, R.; Mejia Guisao, J.; Sanchez-Hernandez, A.; Carrillo Moreno, S.; Oropeza Barrera, C.; Vazquez Valencia, F.; Carpinteyro, S.; Pedraza, I.; Salazar Ibarguen, H. A.; Uribe Estrada, C.; Morelos Pineda, A.; Krofcheck, D.; Butler, P. H.; Ahmad, A.; Ahmad, M.; Hassan, Q.; Hoorani, H. R.; Khan, W. A.; Saddique, A.; Shah, M. A.; Shoaib, M.; Waqas, M.; Bialkowska, H.; Bluj, M.; Boimska, B.; Frueboes, T.; Górski, M.; Kazana, M.; Nawrocki, K.; Romanowska-Rybinska, K.; Szleper, M.; Zalewski, P.; Bunkowski, K.; Byszuk, A.; Doroba, K.; Kalinowski, A.; Konecki, M.; Krolikowski, J.; Misiura, M.; Olszewski, M.; Walczak, M.; Bargassa, P.; Beirão Da Cruz E Silva, C.; Calpas, B.; Di Francesco, A.; Faccioli, P.; Ferreira Parracho, P. G.; Gallinaro, M.; Hollar, J.; Leonardo, N.; Lloret Iglesias, L.; Nemallapudi, M. 
V.; Rodrigues Antunes, J.; Seixas, J.; Toldaiev, O.; Vadruccio, D.; Varela, J.; Vischia, P.; Afanasiev, S.; Alexakhin, V.; Bunin, P.; Gavrilenko, M.; Golutvin, I.; Kamenev, A.; Karjavin, V.; Lanev, A.; Malakhov, A.; Matveev, V.; Palichik, V.; Perelygin, V.; Savina, M.; Shmatov, S.; Shulha, S.; Skatchkov, N.; Smirnov, V.; Zarubin, A.; Chtchipounov, L.; Golovtsov, V.; Ivanov, Y.; Kim, V.; Kuznetsova, E.; Murzin, V.; Oreshkin, V.; Sulimov, V.; Vorobyev, A.; Andreev, Yu.; Dermenev, A.; Gninenko, S.; Golubev, N.; Karneyeu, A.; Kirsanov, M.; Krasnikov, N.; Pashenkov, A.; Tlisov, D.; Toropin, A.; Epshteyn, V.; Gavrilov, V.; Lychkovskaya, N.; Popov, V.; Pozdnyakov, I.; Safronov, G.; Spiridonov, A.; Toms, M.; Vlasov, E.; Zhokin, A.; Bylinkin, A.; Chistov, R.; Polikarpov, S.; Zhemchugov, E.; Andreev, V.; Azarkin, M.; Dremin, I.; Kirakosyan, M.; Leonidov, A.; Terkulov, A.; Baskakov, A.; Belyaev, A.; Boos, E.; Dubinin, M.; Dudko, L.; Ershov, A.; Gribushin, A.; Klyukhin, V.; Kodolova, O.; Lokhtin, I.; Miagkov, I.; Obraztsov, S.; Petrushanko, S.; Savrin, V.; Snigirev, A.; Blinov, V.; Skovpen, Y.; Shtol, D.; Azhgirey, I.; Bayshev, I.; Bitioukov, S.; Elumakhov, D.; Kachanov, V.; Kalinin, A.; Konstantinov, D.; Krychkine, V.; Petrov, V.; Ryutin, R.; Sobol, A.; Troshin, S.; Tyurin, N.; Uzunian, A.; Volkov, A.; Adzic, P.; Cirkovic, P.; Devetak, D.; Dordevic, M.; Milosevic, J.; Rekovic, V.; Alcaraz Maestre, J.; Barrio Luna, M.; Calvo, E.; Cerrada, M.; Chamizo Llatas, M.; Colino, N.; De La Cruz, B.; Delgado Peris, A.; Escalante Del Valle, A.; Fernandez Bedoya, C.; Fernández Ramos, J. P.; Flix, J.; Fouz, M. C.; Garcia-Abia, P.; Gonzalez Lopez, O.; Goy Lopez, S.; Hernandez, J. M.; Josa, M. I.; Navarro De Martino, E.; Pérez-Calero Yzquierdo, A.; Puerta Pelayo, J.; Quintario Olmeda, A.; Redondo, I.; Romero, L.; Soares, M. S.; de Trocóniz, J. F.; Missiroli, M.; Moran, D.; Cuevas, J.; Fernandez Menendez, J.; Gonzalez Caballero, I.; González Fernández, J. 
R.; Palencia Cortezon, E.; Sanchez Cruz, S.; Suárez Andrés, I.; Vizan Garcia, J. M.; Cabrillo, I. J.; Calderon, A.; Curras, E.; Fernandez, M.; Garcia-Ferrero, J.; Gomez, G.; Lopez Virto, A.; Marco, J.; Martinez Rivero, C.; Matorras, F.; Piedra Gomez, J.; Rodrigo, T.; Ruiz-Jimeno, A.; Scodellaro, L.; Trevisani, N.; Vila, I.; Vilar Cortabitarte, R.; Abbaneo, D.; Auffray, E.; Auzinger, G.; Baillon, P.; Ball, A. H.; Barney, D.; Bloch, P.; Bocci, A.; Botta, C.; Camporesi, T.; Castello, R.; Cepeda, M.; Cerminara, G.; Chen, Y.; d'Enterria, D.; Dabrowski, A.; Daponte, V.; David, A.; De Gruttola, M.; De Roeck, A.; Di Marco, E.; Dobson, M.; Dorney, B.; du Pree, T.; Duggan, D.; Dünser, M.; Dupont, N.; Elliott-Peisert, A.; Everaerts, P.; Fartoukh, S.; Franzoni, G.; Fulcher, J.; Funk, W.; Gigi, D.; Gill, K.; Girone, M.; Glege, F.; Gulhan, D.; Gundacker, S.; Guthoff, M.; Harris, P.; Hegeman, J.; Innocente, V.; Janot, P.; Kieseler, J.; Kirschenmann, H.; Knünz, V.; Kornmayer, A.; Kortelainen, M. J.; Kousouris, K.; Krammer, M.; Lange, C.; Lecoq, P.; Lourenço, C.; Lucchini, M. T.; Malgeri, L.; Mannelli, M.; Martelli, A.; Meijers, F.; Merlin, J. A.; Mersi, S.; Meschi, E.; Milenovic, P.; Moortgat, F.; Morovic, S.; Mulders, M.; Neugebauer, H.; Orfanelli, S.; Orsini, L.; Pape, L.; Perez, E.; Peruzzi, M.; Petrilli, A.; Petrucciani, G.; Pfeiffer, A.; Pierini, M.; Racz, A.; Reis, T.; Rolandi, G.; Rovere, M.; Sakulin, H.; Sauvan, J. B.; Schäfer, C.; Schwick, C.; Seidel, M.; Sharma, A.; Silva, P.; Sphicas, P.; Steggemann, J.; Stoye, M.; Takahashi, Y.; Tosi, M.; Treille, D.; Triossi, A.; Tsirou, A.; Veckalns, V.; Veres, G. I.; Verweij, M.; Wardle, N.; Wöhri, H. K.; Zagozdzinska, A.; Zeuner, W. D.; Bertl, W.; Deiters, K.; Erdmann, W.; Horisberger, R.; Ingram, Q.; Kaestli, H. 
C.; Kotlinski, D.; Langenegger, U.; Rohe, T.; Bachmair, F.; Bäni, L.; Bianchini, L.; Casal, B.; Dissertori, G.; Dittmar, M.; Donegà, M.; Grab, C.; Heidegger, C.; Hits, D.; Hoss, J.; Kasieczka, G.; Lustermann, W.; Mangano, B.; Marionneau, M.; Martinez Ruiz del Arbol, P.; Masciovecchio, M.; Meinhard, M. T.; Meister, D.; Micheli, F.; Musella, P.; Nessi-Tedaldi, F.; Pandolfi, F.; Pata, J.; Pauss, F.; Perrin, G.; Perrozzi, L.; Quittnat, M.; Rossini, M.; Schönenberger, M.; Starodumov, A.; Tavolaro, V. R.; Theofilatos, K.; Wallny, R.; Aarrestad, T. K.; Amsler, C.; Caminada, L.; Canelli, M. F.; De Cosa, A.; Galloni, C.; Hinzmann, A.; Hreus, T.; Kilminster, B.; Ngadiuba, J.; Pinna, D.; Rauco, G.; Robmann, P.; Salerno, D.; Seitz, C.; Yang, Y.; Zucchetta, A.; Candelise, V.; Doan, T. H.; Jain, Sh.; Khurana, R.; Konyushikhin, M.; Kuo, C. M.; Lin, W.; Pozdnyakov, A.; Yu, S. S.; Kumar, Arun; Chang, P.; Chang, Y. H.; Chao, Y.; Chen, K. F.; Chen, P. H.; Fiori, F.; Hou, W.-S.; Hsiung, Y.; Liu, Y. F.; Lu, R.-S.; Miñano Moya, M.; Paganis, E.; Psallidas, A.; Tsai, J. f.; Asavapibhop, B.; Singh, G.; Srimanobhas, N.; Suwonjandee, N.; Adiguzel, A.; Cerci, S.; Damarseckin, S.; Demiroglu, Z. S.; Dozen, C.; Dumanoglu, I.; Girgis, S.; Gokbulut, G.; Guler, Y.; Hos, I.; Kangal, E. E.; Kara, O.; Kiminsu, U.; Oglakci, M.; Onengut, G.; Ozdemir, K.; Sunar Cerci, D.; Tali, B.; Topakli, H.; Turkcapar, S.; Zorbakir, I. S.; Zorbilmez, C.; Bilin, B.; Bilmis, S.; Isildak, B.; Karapinar, G.; Yalvac, M.; Zeyrek, M.; Gülmez, E.; Kaya, M.; Kaya, O.; Yetkin, E. A.; Yetkin, T.; Cakir, A.; Cankocak, K.; Sen, S.; Grynyov, B.; Levchuk, L.; Sorokin, P.; Aggleton, R.; Ball, F.; Beck, L.; Brooke, J. J.; Burns, D.; Clement, E.; Cussans, D.; Flacher, H.; Goldstein, J.; Grimes, M.; Heath, G. P.; Heath, H. F.; Jacob, J.; Kreczko, L.; Lucas, C.; Newbold, D. M.; Paramesvaran, S.; Poll, A.; Sakuma, T.; Seif El Nasr-storey, S.; Smith, D.; Smith, V. J.; Bell, K. W.; Belyaev, A.; Brew, C.; Brown, R. 
M.; Calligaris, L.; Cieri, D.; Cockerill, D. J. A.; Coughlan, J. A.; Harder, K.; Harper, S.; Olaiya, E.; Petyt, D.; Shepherd-Themistocleous, C. H.; Thea, A.; Tomalin, I. R.; Williams, T.; Baber, M.; Bainbridge, R.; Buchmuller, O.; Bundock, A.; Burton, D.; Casasso, S.; Citron, M.; Colling, D.; Corpe, L.; Dauncey, P.; Davies, G.; De Wit, A.; Della Negra, M.; Di Maria, R.; Dunne, P.; Elwood, A.; Futyan, D.; Haddad, Y.; Hall, G.; Iles, G.; James, T.; Lane, R.; Laner, C.; Lucas, R.; Lyons, L.; Magnan, A.-M.; Malik, S.; Mastrolorenzo, L.; Nash, J.; Nikitenko, A.; Pela, J.; Penning, B.; Pesaresi, M.; Raymond, D. M.; Richards, A.; Rose, A.; Scott, E.; Seez, C.; Summers, S.; Tapper, A.; Uchida, K.; Vazquez Acosta, M.; Virdee, T.; Wright, J.; Zenz, S. C.; Cole, J. E.; Hobson, P. R.; Khan, A.; Kyberd, P.; Reid, I. D.; Symonds, P.; Teodorescu, L.; Turner, M.; Borzou, A.; Call, K.; Dittmann, J.; Hatakeyama, K.; Liu, H.; Pastika, N.; Bartek, R.; Dominguez, A.; Buccilli, A.; Cooper, S. I.; Henderson, C.; Rumerio, P.; West, C.; Arcaro, D.; Avetisyan, A.; Bose, T.; Gastler, D.; Rankin, D.; Richardson, C.; Rohlf, J.; Sulak, L.; Zou, D.; Benelli, G.; Cutts, D.; Garabedian, A.; Hakala, J.; Heintz, U.; Hogan, J. M.; Jesus, O.; Kwok, K. H. M.; Laird, E.; Landsberg, G.; Mao, Z.; Narain, M.; Piperov, S.; Sagir, S.; Spencer, E.; Syarif, R.; Breedon, R.; Burns, D.; Calderon De La Barca Sanchez, M.; Chauhan, S.; Chertok, M.; Conway, J.; Conway, R.; Cox, P. T.; Erbacher, R.; Flores, C.; Funk, G.; Gardner, M.; Ko, W.; Lander, R.; Mclean, C.; Mulhearn, M.; Pellett, D.; Pilot, J.; Shalhout, S.; Shi, M.; Smith, J.; Squires, M.; Stolp, D.; Tos, K.; Tripathi, M.; Bachtis, M.; Bravo, C.; Cousins, R.; Dasgupta, A.; Florent, A.; Hauser, J.; Ignatenko, M.; Mccoll, N.; Saltzberg, D.; Schnaible, C.; Valuev, V.; Weber, M.; Bouvier, E.; Burt, K.; Clare, R.; Ellison, J.; Gary, J. W.; Ghiasi Shirazi, S. M. A.; Hanson, G.; Heilman, J.; Jandir, P.; Kennedy, E.; Lacroix, F.; Long, O. 
R.; Olmedo Negrete, M.; Paneva, M. I.; Shrinivas, A.; Si, W.; Wei, H.; Wimpenny, S.; Yates, B. R.; Branson, J. G.; Cerati, G. B.; Cittolin, S.; Derdzinski, M.; Gerosa, R.; Holzner, A.; Klein, D.; Krutelyov, V.; Letts, J.; Macneill, I.; Olivito, D.; Padhi, S.; Pieri, M.; Sani, M.; Sharma, V.; Simon, S.; Tadel, M.; Vartak, A.; Wasserbaech, S.; Welke, C.; Wood, J.; Würthwein, F.; Yagil, A.; Zevi Della Porta, G.; Amin, N.; Bhandari, R.; Bradmiller-Feld, J.; Campagnari, C.; Dishaw, A.; Dutta, V.; Franco Sevilla, M.; George, C.; Golf, F.; Gouskos, L.; Gran, J.; Heller, R.; Incandela, J.; Mullin, S. D.; Ovcharova, A.; Qu, H.; Richman, J.; Stuart, D.; Suarez, I.; Yoo, J.; Anderson, D.; Bendavid, J.; Bornheim, A.; Bunn, J.; Duarte, J.; Lawhorn, J. M.; Mott, A.; Newman, H. B.; Pena, C.; Spiropulu, M.; Vlimant, J. R.; Xie, S.; Zhu, R. Y.; Andrews, M. B.; Ferguson, T.; Paulini, M.; Russ, J.; Sun, M.; Vogel, H.; Vorobiev, I.; Weinberg, M.; Cumalat, J. P.; Ford, W. T.; Jensen, F.; Johnson, A.; Krohn, M.; Leontsinis, S.; Mulholland, T.; Stenson, K.; Wagner, S. R.; Alexander, J.; Chaves, J.; Chu, J.; Dittmer, S.; Mcdermott, K.; Mirman, N.; Nicolas Kaufman, G.; Patterson, J. R.; Rinkevicius, A.; Ryd, A.; Skinnari, L.; Soffi, L.; Tan, S. M.; Tao, Z.; Thom, J.; Tucker, J.; Wittich, P.; Zientek, M.; Winn, D.; Abdullin, S.; Albrow, M.; Apollinari, G.; Apresyan, A.; Banerjee, S.; Bauerdick, L. A. T.; Beretvas, A.; Berryhill, J.; Bhat, P. C.; Bolla, G.; Burkett, K.; Butler, J. N.; Cheung, H. W. K.; Chlebana, F.; Cihangir, S.; Cremonesi, M.; Elvira, V. D.; Fisk, I.; Freeman, J.; Gottschalk, E.; Gray, L.; Green, D.; Grünendahl, S.; Gutsche, O.; Hare, D.; Harris, R. M.; Hasegawa, S.; Hirschauer, J.; Hu, Z.; Jayatilaka, B.; Jindariani, S.; Johnson, M.; Joshi, U.; Klima, B.; Kreis, B.; Lammel, S.; Linacre, J.; Lincoln, D.; Lipton, R.; Liu, M.; Liu, T.; Lopes De Sá, R.; Lykken, J.; Maeshima, K.; Magini, N.; Marraffino, J. 
M.; Maruyama, S.; Mason, D.; McBride, P.; Merkel, P.; Mrenna, S.; Nahn, S.; O'Dell, V.; Pedro, K.; Prokofyev, O.; Rakness, G.; Ristori, L.; Sexton-Kennedy, E.; Soha, A.; Spalding, W. J.; Spiegel, L.; Stoynev, S.; Strait, J.; Strobbe, N.; Taylor, L.; Tkaczyk, S.; Tran, N. V.; Uplegger, L.; Vaandering, E. W.; Vernieri, C.; Verzocchi, M.; Vidal, R.; Wang, M.; Weber, H. A.; Whitbeck, A.; Wu, Y.; Acosta, D.; Avery, P.; Bortignon, P.; Bourilkov, D.; Brinkerhoff, A.; Carnes, A.; Carver, M.; Curry, D.; Das, S.; Field, R. D.; Furic, I. K.; Konigsberg, J.; Korytov, A.; Low, J. F.; Ma, P.; Matchev, K.; Mei, H.; Mitselmakher, G.; Rank, D.; Shchutska, L.; Sperka, D.; Thomas, L.; Wang, J.; Wang, S.; Yelton, J.; Linn, S.; Markowitz, P.; Martinez, G.; Rodriguez, J. L.; Ackert, A.; Adams, T.; Askew, A.; Bein, S.; Hagopian, S.; Hagopian, V.; Johnson, K. F.; Prosper, H.; Santra, A.; Yohay, R.; Baarmand, M. M.; Bhopatkar, V.; Colafranceschi, S.; Hohlmann, M.; Noonan, D.; Roy, T.; Yumiceva, F.; Adams, M. R.; Apanasevich, L.; Berry, D.; Betts, R. R.; Bucinskaite, I.; Cavanaugh, R.; Evdokimov, O.; Gauthier, L.; Gerber, C. E.; Hofman, D. J.; Jung, K.; Sandoval Gonzalez, I. D.; Varelas, N.; Wang, H.; Wu, Z.; Zakaria, M.; Zhang, J.; Bilki, B.; Clarida, W.; Dilsiz, K.; Durgut, S.; Gandrajula, R. P.; Haytmyradov, M.; Khristenko, V.; Merlo, J.-P.; Mermerkaya, H.; Mestvirishvili, A.; Moeller, A.; Nachtman, J.; Ogul, H.; Onel, Y.; Ozok, F.; Penzo, A.; Snyder, C.; Tiras, E.; Wetzel, J.; Yi, K.; Anderson, I.; Blumenfeld, B.; Cocoros, A.; Eminizer, N.; Fehling, D.; Feng, L.; Gritsan, A. V.; Maksimovic, P.; Roskes, J.; Sarica, U.; Swartz, M.; Xiao, M.; Xin, Y.; You, C.; Al-bataineh, A.; Baringer, P.; Bean, A.; Boren, S.; Bowen, J.; Castle, J.; Forthomme, L.; Kenny, R. P.; Khalil, S.; Kropivnitskaya, A.; Majumder, D.; Mcbrayer, W.; Murray, M.; Sanders, S.; Stringer, R.; Takaki, J. D. Tapia; Wang, Q.; Ivanov, A.; Kaadze, K.; Maravin, Y.; Mohammadi, A.; Saini, L. 
K.; Skhirtladze, N.; Toda, S.; Rebassoo, F.; Wright, D.; Anelli, C.; Baden, A.; Baron, O.; Belloni, A.; Calvert, B.; Eno, S. C.; Ferraioli, C.; Gomez, J. A.; Hadley, N. J.; Jabeen, S.; Jeng, G. Y.; Kellogg, R. G.; Kolberg, T.; Kunkle, J.; Mignerey, A. C.; Ricci-Tam, F.; Shin, Y. H.; Skuja, A.; Tonjes, M. B.; Tonwar, S. C.; Abercrombie, D.; Allen, B.; Apyan, A.; Azzolini, V.; Barbieri, R.; Baty, A.; Bi, R.; Bierwagen, K.; Brandt, S.; Busza, W.; Cali, I. A.; D'Alfonso, M.; Demiragli, Z.; Di Matteo, L.; Gomez Ceballos, G.; Goncharov, M.; Hsu, D.; Iiyama, Y.; Innocenti, G. M.; Klute, M.; Kovalskyi, D.; Krajczar, K.; Lai, Y. S.; Lee, Y.-J.; Levin, A.; Luckey, P. D.; Maier, B.; Marini, A. C.; Mcginn, C.; Mironov, C.; Narayanan, S.; Niu, X.; Paus, C.; Roland, C.; Roland, G.; Salfeld-Nebgen, J.; Stephans, G. S. F.; Tatar, K.; Varma, M.; Velicanu, D.; Veverka, J.; Wang, J.; Wang, T. W.; Wyslouch, B.; Yang, M.; Benvenuti, A. C.; Chatterjee, R. M.; Evans, A.; Hansen, P.; Kalafut, S.; Kao, S. C.; Kubota, Y.; Lesko, Z.; Mans, J.; Nourbakhsh, S.; Ruckstuhl, N.; Rusack, R.; Tambe, N.; Turkewitz, J.; Acosta, J. G.; Oliveros, S.; Avdeeva, E.; Bloom, K.; Claes, D. R.; Fangmeier, C.; Gonzalez Suarez, R.; Kamalieddin, R.; Kravchenko, I.; Malta Rodrigues, A.; Meier, F.; Monroy, J.; Siado, J. E.; Snow, G. R.; Stieger, B.; Alyari, M.; Dolen, J.; Godshalk, A.; Harrington, C.; Iashvili, I.; Kaisen, J.; Nguyen, D.; Parker, A.; Rappoccio, S.; Roozbahani, B.; Alverson, G.; Barberis, E.; Hortiangtham, A.; Massironi, A.; Morse, D. M.; Nash, D.; Orimoto, T.; Teixeira De Lima, R.; Trocino, D.; Wang, R.-J.; Wood, D.; Bhattacharya, S.; Charaf, O.; Hahn, K. A.; Kumar, A.; Mucia, N.; Odell, N.; Pollack, B.; Schmitt, M. H.; Sung, K.; Trovato, M.; Velasco, M.; Dev, N.; Hildreth, M.; Hurtado Anampa, K.; Jessop, C.; Karmgard, D. 
J.; Kellams, N.; Lannon, K.; Marinelli, N.; Meng, F.; Mueller, C.; Musienko, Y.; Planer, M.; Reinsvold, A.; Ruchti, R.; Rupprecht, N.; Smith, G.; Taroni, S.; Wayne, M.; Wolf, M.; Woodard, A.; Alimena, J.; Antonelli, L.; Bylsma, B.; Durkin, L. S.; Flowers, S.; Francis, B.; Hart, A.; Hill, C.; Hughes, R.; Ji, W.; Liu, B.; Luo, W.; Puigh, D.; Winer, B. L.; Wulsin, H. W.; Cooperstein, S.; Driga, O.; Elmer, P.; Hardenbrook, J.; Hebda, P.; Lange, D.; Luo, J.; Marlow, D.; Medvedeva, T.; Mei, K.; Ojalvo, I.; Olsen, J.; Palmer, C.; Piroué, P.; Stickland, D.; Svyatkovskiy, A.; Tully, C.; Malik, S.; Barker, A.; Barnes, V. E.; Folgueras, S.; Gutay, L.; Jha, M. K.; Jones, M.; Jung, A. W.; Khatiwada, A.; Miller, D. H.; Neumeister, N.; Schulte, J. F.; Shi, X.; Sun, J.; Wang, F.; Xie, W.; Parashar, N.; Stupak, J.; Adair, A.; Akgun, B.; Chen, Z.; Ecklund, K. M.; Geurts, F. J. M.; Guilbaud, M.; Li, W.; Michlin, B.; Northup, M.; Padley, B. P.; Roberts, J.; Rorie, J.; Tu, Z.; Zabel, J.; Betchart, B.; Bodek, A.; de Barbaro, P.; Demina, R.; Duh, Y. t.; Ferbel, T.; Galanti, M.; Garcia-Bellido, A.; Han, J.; Hindrichs, O.; Khukhunaishvili, A.; Lo, K. H.; Tan, P.; Verzetti, M.; Agapitos, A.; Chou, J. P.; Gershtein, Y.; Gómez Espinosa, T. A.; Halkiadakis, E.; Heindl, M.; Hughes, E.; Kaplan, S.; Elayavalli, R. Kunnawalkam; Kyriacou, S.; Lath, A.; Nash, K.; Osherson, M.; Saka, H.; Salur, S.; Schnetzer, S.; Sheffield, D.; Somalwar, S.; Stone, R.; Thomas, S.; Thomassen, P.; Walker, M.; Delannoy, A. G.; Foerster, M.; Heideman, J.; Riley, G.; Rose, K.; Spanier, S.; Thapa, K.; Bouhali, O.; Celik, A.; Dalchenko, M.; De Mattia, M.; Delgado, A.; Dildick, S.; Eusebi, R.; Gilmore, J.; Huang, T.; Juska, E.; Kamon, T.; Mueller, R.; Pakhotin, Y.; Patel, R.; Perloff, A.; Perniè, L.; Rathjens, D.; Safonov, A.; Tatarinov, A.; Ulmer, K. A.; Akchurin, N.; Cowden, C.; Damgov, J.; De Guio, F.; Dragoiu, C.; Dudero, P. R.; Faulkner, J.; Gurpinar, E.; Kunori, S.; Lamichhane, K.; Lee, S. 
W.; Libeiro, T.; Peltola, T.; Undleeb, S.; Volobouev, I.; Wang, Z.; Greene, S.; Gurrola, A.; Janjam, R.; Johns, W.; Maguire, C.; Melo, A.; Ni, H.; Sheldon, P.; Tuo, S.; Velkovska, J.; Xu, Q.; Arenton, M. W.; Barria, P.; Cox, B.; Goodell, J.; Hirosky, R.; Ledovskoy, A.; Li, H.; Neu, C.; Sinthuprasith, T.; Sun, X.; Wang, Y.; Wolfe, E.; Xia, F.; Clarke, C.; Harr, R.; Karchin, P. E.; Sturdy, J.; Belknap, D. A.; Buchanan, J.; Caillol, C.; Dasu, S.; Dodd, L.; Duric, S.; Gomber, B.; Grothe, M.; Herndon, M.; Hervé, A.; Klabbers, P.; Lanaro, A.; Levine, A.; Long, K.; Loveless, R.; Perry, T.; Pierro, G. A.; Polese, G.; Ruggles, T.; Savin, A.; Smith, N.; Smith, W. H.; Taylor, D.; Woods, N.

    2017-07-01

    A search is presented for extra spatial dimensions, quantum black holes, and quark contact interactions in measurements of dijet angular distributions in proton-proton collisions at √s = 13 TeV. The data were collected with the CMS detector at the LHC and correspond to an integrated luminosity of 2.6 fb⁻¹. The distributions are found to be in agreement with predictions from perturbative quantum chromodynamics that include electroweak corrections. Limits for different contact interaction models are obtained. In a benchmark model, valid to next-to-leading order in QCD and in which only left-handed quarks participate, quark contact interactions are excluded up to a scale of 11.5 or 14.7 TeV for destructive or constructive interference, respectively. The production of quantum black holes is excluded for masses below 7.8 or 5.3 TeV, depending on the model. The lower limits for the scales of virtual graviton exchange in the Arkani-Hamed-Dimopoulos-Dvali model of extra spatial dimensions are in the range 7.9-11.2 TeV, and are the most stringent set of limits available.
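The dijet angular distributions referenced in this abstract are conventionally binned in the variable χ = e^|y₁−y₂|, built from the rapidities of the two highest-momentum jets: QCD t-channel exchange yields a nearly flat χ spectrum, while the contact-interaction and extra-dimension signals searched for enhance the small-χ region. A minimal sketch of this variable (the function name and example rapidities are illustrative assumptions, not taken from the record):

```python
import math

def chi_dijet(y1: float, y2: float) -> float:
    """Dijet angular variable chi = exp(|y1 - y2|), where y1 and y2
    are the rapidities of the two leading jets. chi = 1 corresponds
    to jets at equal rapidity; large chi to forward/backward jets."""
    return math.exp(abs(y1 - y2))

# Example: leading jets at rapidities 0.5 and -0.7 (hypothetical values)
print(chi_dijet(0.5, -0.7))  # exp(1.2), roughly 3.32
```

New-physics contributions would show up as an excess of events at low χ relative to the flat QCD expectation.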

  2. Search for new physics with dijet angular distributions in proton-proton collisions at √s = 13 TeV

    DOE PAGES

    Sirunyan, Albert M.

    2017-07-05

    A search is presented for extra spatial dimensions, quantum black holes, and quark contact interactions in measurements of dijet angular distributions in proton-proton collisions at √s = 13 TeV. The data were collected with the CMS detector at the LHC and correspond to an integrated luminosity of 2.6 fb⁻¹. The distributions are found to be in agreement with predictions from perturbative quantum chromodynamics that include electroweak corrections. Limits for different contact interaction models are obtained. In a benchmark model, valid to next-to-leading order in QCD, in which only left-handed quarks participate, quark contact interactions are excluded up to a scale of 11.5 or 14.7 TeV for destructive or constructive interference, respectively. The production of quantum black holes is excluded for masses below 7.8 or 5.3 TeV, depending on the model. Finally, the lower limits for the scales of virtual graviton exchange in the Arkani-Hamed-Dimopoulos-Dvali model of extra spatial dimensions are in the range 7.9-11.2 TeV, and are the most stringent set of limits available.

  3. Search for the production of an excited bottom quark decaying to tW in proton-proton collisions at √s = 8 TeV

    NASA Astrophysics Data System (ADS)

    Khachatryan, V.; Sirunyan, A. M.; Tumasyan, A.; Adam, W.; Asilar, E.; Bergauer, T.; Brandstetter, J.; Brondolin, E.; Dragicevic, M.; Erö, J.; Flechl, M.; Friedl, M.; Frühwirth, R.; Ghete, V. M.; Hartl, C.; Hörmann, N.; Hrubec, J.; Jeitler, M.; Knünz, V.; König, A.; Krammer, M.; Krätschmer, I.; Liko, D.; Matsushita, T.; Mikulec, I.; Rabady, D.; Rahbaran, B.; Rohringer, H.; Schieck, J.; Schöfbeck, R.; Strauss, J.; Treberer-Treberspurg, W.; Waltenberger, W.; Wulz, C.-E.; Mossolov, V.; Shumeiko, N.; Suarez Gonzalez, J.; Alderweireldt, S.; Cornelis, T.; de Wolf, E. A.; Janssen, X.; Knutsson, A.; Lauwers, J.; Luyckx, S.; Ochesanu, S.; Rougny, R.; van de Klundert, M.; van Haevermaet, H.; van Mechelen, P.; van Remortel, N.; van Spilbeeck, A.; Abu Zeid, S.; Blekman, F.; D'Hondt, J.; Daci, N.; de Bruyn, I.; Deroover, K.; Heracleous, N.; Keaveney, J.; Lowette, S.; Moreels, L.; Olbrechts, A.; Python, Q.; Strom, D.; Tavernier, S.; van Doninck, W.; van Mulders, P.; van Onsem, G. P.; van Parijs, I.; Barria, P.; Caillol, C.; Clerbaux, B.; de Lentdecker, G.; Delannoy, H.; Fasanella, G.; Favart, L.; Gay, A. P. R.; Grebenyuk, A.; Karapostoli, G.; Lenzi, T.; Léonard, A.; Maerschalk, T.; Marinov, A.; Perniè, L.; Randle-Conde, A.; Reis, T.; Seva, T.; Vander Velde, C.; Vanlaer, P.; Yonamine, R.; Zenoni, F.; Zhang, F.; Beernaert, K.; Benucci, L.; Cimmino, A.; Crucy, S.; Dobur, D.; Fagot, A.; Garcia, G.; Gul, M.; McCartin, J.; Ocampo Rios, A. A.; Poyraz, D.; Ryckbosch, D.; Salva, S.; Sigamani, M.; Strobbe, N.; Tytgat, M.; van Driessche, W.; Yazgan, E.; Zaganidis, N.; Basegmez, S.; Beluffi, C.; Bondu, O.; Brochet, S.; Bruno, G.; Castello, R.; Caudron, A.; Ceard, L.; da Silveira, G. G.; Delaere, C.; Favart, D.; Forthomme, L.; Giammanco, A.; Hollar, J.; Jafari, A.; Jez, P.; Komm, M.; Lemaitre, V.; Mertens, A.; Nuttens, C.; Perrini, L.; Pin, A.; Piotrzkowski, K.; Popov, A.; Quertenmont, L.; Selvaggi, M.; Vidal Marono, M.; Beliy, N.; Hammad, G. H.; Aldá Júnior, W. L.; Alves, G. 
A.; Brito, L.; Correa Martins Junior, M.; Hamer, M.; Hensel, C.; Mora Herrera, C.; Moraes, A.; Pol, M. E.; Rebello Teles, P.; Belchior Batista Das Chagas, E.; Carvalho, W.; Chinellato, J.; Custódio, A.; da Costa, E. M.; de Jesus Damiao, D.; de Oliveira Martins, C.; Fonseca de Souza, S.; Huertas Guativa, L. M.; Malbouisson, H.; Matos Figueiredo, D.; Mundim, L.; Nogima, H.; Prado da Silva, W. L.; Santoro, A.; Sznajder, A.; Tonelli Manganote, E. J.; Vilela Pereira, A.; Ahuja, S.; Bernardes, C. A.; de Souza Santos, A.; Dogra, S.; Fernandez Perez Tomei, T. R.; Gregores, E. M.; Mercadante, P. G.; Moon, C. S.; Novaes, S. F.; Padula, Sandra S.; Romero Abad, D.; Ruiz Vargas, J. C.; Aleksandrov, A.; Hadjiiska, R.; Iaydjiev, P.; Rodozov, M.; Stoykova, S.; Sultanov, G.; Vutova, M.; Dimitrov, A.; Glushkov, I.; Litov, L.; Pavlov, B.; Petkov, P.; Ahmad, M.; Bian, J. G.; Chen, G. M.; Chen, H. S.; Chen, M.; Cheng, T.; Du, R.; Jiang, C. H.; Romeo, F.; Shaheen, S. M.; Tao, J.; Wang, C.; Wang, Z.; Zhang, H.; Zhu, S.; Asawatangtrakuldee, C.; Ban, Y.; Li, Q.; Liu, S.; Mao, Y.; Qian, S. J.; Wang, D.; Xu, Z.; Zou, W.; Avila, C.; Cabrera, A.; Chaparro Sierra, L. F.; Florez, C.; Gomez, J. P.; Gomez Moreno, B.; Sanabria, J. C.; Godinovic, N.; Lelas, D.; Puljak, I.; Ribeiro Cipriano, P. M.; Antunovic, Z.; Kovac, M.; Brigljevic, V.; Kadija, K.; Luetic, J.; Micanovic, S.; Sudic, L.; Attikis, A.; Mavromanolakis, G.; Mousa, J.; Nicolaou, C.; Ptochos, F.; Razis, P. A.; Rykaczewski, H.; Bodlak, M.; Finger, M.; Finger, M.; Awad, A.; El-Khateeb, E.; Mohamed, A.; Salama, E.; Calpas, B.; Kadastik, M.; Murumaa, M.; Raidal, M.; Tiko, A.; Veelken, C.; Eerola, P.; Pekkanen, J.; Voutilainen, M.; Härkönen, J.; Karimäki, V.; Kinnunen, R.; Lampén, T.; Lassila-Perini, K.; Lehti, S.; Lindén, T.; Luukka, P.; Mäenpää, T.; Peltola, T.; Tuominen, E.; Tuominiemi, J.; Tuovinen, E.; Wendland, L.; Talvitie, J.; Tuuva, T.; Besancon, M.; Couderc, F.; Dejardin, M.; Denegri, D.; Fabbro, B.; Faure, J. 
L.; Favaro, C.; Ferri, F.; Ganjour, S.; Givernaud, A.; Gras, P.; Hamel de Monchenault, G.; Jarry, P.; Locci, E.; Machet, M.; Malcles, J.; Rander, J.; Rosowsky, A.; Titov, M.; Zghiche, A.; Antropov, I.; Baffioni, S.; Beaudette, F.; Busson, P.; Cadamuro, L.; Chapon, E.; Charlot, C.; Dahms, T.; Davignon, O.; Filipovic, N.; Florent, A.; Granier de Cassagnac, R.; Lisniak, S.; Mastrolorenzo, L.; Miné, P.; Naranjo, I. N.; Nguyen, M.; Ochando, C.; Ortona, G.; Paganini, P.; Regnard, S.; Salerno, R.; Sauvan, J. B.; Sirois, Y.; Strebler, T.; Yilmaz, Y.; Zabi, A.; Agram, J.-L.; Andrea, J.; Aubin, A.; Bloch, D.; Brom, J.-M.; Buttignol, M.; Chabert, E. C.; Chanon, N.; Collard, C.; Conte, E.; Coubez, X.; Fontaine, J.-C.; Gelé, D.; Goerlach, U.; Goetzmann, C.; Le Bihan, A.-C.; Merlin, J. A.; Skovpen, K.; van Hove, P.; Gadrat, S.; Beauceron, S.; Bernet, C.; Boudoul, G.; Bouvier, E.; Carrillo Montoya, C. A.; Chasserat, J.; Chierici, R.; Contardo, D.; Courbon, B.; Depasse, P.; El Mamouni, H.; Fan, J.; Fay, J.; Gascon, S.; Gouzevitch, M.; Ille, B.; Lagarde, F.; Laktineh, I. B.; Lethuillier, M.; Mirabito, L.; Pequegnot, A. L.; Perries, S.; Ruiz Alvarez, J. D.; Sabes, D.; Sgandurra, L.; Sordini, V.; Vander Donckt, M.; Verdier, P.; Viret, S.; Xiao, H.; Toriashvili, T.; Tsamalaidze, Z.; Autermann, C.; Beranek, S.; Edelhoff, M.; Feld, L.; Heister, A.; Kiesel, M. K.; Klein, K.; Lipinski, M.; Ostapchuk, A.; Preuten, M.; Raupach, F.; Schael, S.; Schulte, J. 
F.; Verlage, T.; Weber, H.; Wittmer, B.; Zhukov, V.; Ata, M.; Brodski, M.; Dietz-Laursonn, E.; Duchardt, D.; Endres, M.; Erdmann, M.; Erdweg, S.; Esch, T.; Fischer, R.; Güth, A.; Hebbeker, T.; Heidemann, C.; Hoepfner, K.; Klingebiel, D.; Knutzen, S.; Kreuzer, P.; Merschmeyer, M.; Meyer, A.; Millet, P.; Olschewski, M.; Padeken, K.; Papacz, P.; Pook, T.; Radziej, M.; Reithler, H.; Rieger, M.; Scheuch, F.; Sonnenschein, L.; Teyssier, D.; Thüer, S.; Cherepanov, V.; Erdogan, Y.; Flügge, G.; Geenen, H.; Geisler, M.; Hoehle, F.; Kargoll, B.; Kress, T.; Kuessel, Y.; Künsken, A.; Lingemann, J.; Nehrkorn, A.; Nowack, A.; Nugent, I. M.; Pistone, C.; Pooth, O.; Stahl, A.; Aldaya Martin, M.; Asin, I.; Bartosik, N.; Behnke, O.; Behrens, U.; Bell, A. J.; Borras, K.; Burgmeier, A.; Cakir, A.; Calligaris, L.; Campbell, A.; Choudhury, S.; Costanza, F.; Diez Pardos, C.; Dolinska, G.; Dooling, S.; Dorland, T.; Eckerlin, G.; Eckstein, D.; Eichhorn, T.; Flucke, G.; Gallo, E.; Garay Garcia, J.; Geiser, A.; Gizhko, A.; Gunnellini, P.; Hauk, J.; Hempel, M.; Jung, H.; Kalogeropoulos, A.; Karacheban, O.; Kasemann, M.; Katsas, P.; Kieseler, J.; Kleinwort, C.; Korol, I.; Lange, W.; Leonard, J.; Lipka, K.; Lobanov, A.; Lohmann, W.; Mankel, R.; Marfin, I.; Melzer-Pellmann, I.-A.; Meyer, A. B.; Mittag, G.; Mnich, J.; Mussgiller, A.; Naumann-Emme, S.; Nayak, A.; Ntomari, E.; Perrey, H.; Pitzl, D.; Placakyte, R.; Raspereza, A.; Roland, B.; Sahin, M. Ö.; Saxena, P.; Schoerner-Sadenius, T.; Schröder, M.; Seitz, C.; Spannagel, S.; Trippkewitz, K. D.; Walsh, R.; Wissing, C.; Blobel, V.; Centis Vignali, M.; Draeger, A. R.; Erfle, J.; Garutti, E.; Goebel, K.; Gonzalez, D.; Görner, M.; Haller, J.; Hoffmann, M.; Höing, R. 
S.; Junkes, A.; Klanner, R.; Kogler, R.; Lapsien, T.; Lenz, T.; Marchesini, I.; Marconi, D.; Meyer, M.; Nowatschin, D.; Ott, J.; Pantaleo, F.; Peiffer, T.; Perieanu, A.; Pietsch, N.; Poehlsen, J.; Rathjens, D.; Sander, C.; Schettler, H.; Schleper, P.; Schlieckau, E.; Schmidt, A.; Schwandt, J.; Seidel, M.; Sola, V.; Stadie, H.; Steinbrück, G.; Tholen, H.; Troendle, D.; Usai, E.; Vanelderen, L.; Vanhoefer, A.; Vormwald, B.; Akbiyik, M.; Barth, C.; Baus, C.; Berger, J.; Böser, C.; Butz, E.; Chwalek, T.; Colombo, F.; de Boer, W.; Descroix, A.; Dierlamm, A.; Fink, S.; Frensch, F.; Giffels, M.; Gilbert, A.; Hartmann, F.; Heindl, S. M.; Husemann, U.; Katkov, I.; Kornmayer, A.; Lobelle Pardo, P.; Maier, B.; Mildner, H.; Mozer, M. U.; Müller, T.; Müller, Th.; Plagge, M.; Quast, G.; Rabbertz, K.; Röcker, S.; Roscher, F.; Simonis, H. J.; Stober, F. M.; Ulrich, R.; Wagner-Kuhr, J.; Wayand, S.; Weber, M.; Weiler, T.; Wöhrmann, C.; Wolf, R.; Anagnostou, G.; Daskalakis, G.; Geralis, T.; Giakoumopoulou, V. A.; Kyriakis, A.; Loukas, D.; Psallidas, A.; Topsis-Giotis, I.; Agapitos, A.; Kesisoglou, S.; Panagiotou, A.; Saoulidou, N.; Tziaferi, E.; Evangelou, I.; Flouris, G.; Foudas, C.; Kokkas, P.; Loukas, N.; Manthos, N.; Papadopoulos, I.; Paradas, E.; Strologas, J.; Bencze, G.; Hajdu, C.; Hazi, A.; Hidas, P.; Horvath, D.; Sikler, F.; Veszpremi, V.; Vesztergombi, G.; Zsigmond, A. J.; Beni, N.; Czellar, S.; Karancsi, J.; Molnar, J.; Szillasi, Z.; Bartók, M.; Makovec, A.; Raics, P.; Trocsanyi, Z. L.; Ujvari, B.; Mal, P.; Mandal, K.; Sahoo, N.; Swain, S. K.; Bansal, S.; Beri, S. B.; Bhatnagar, V.; Chawla, R.; Gupta, R.; Bhawandeep, U.; Kalsi, A. K.; Kaur, A.; Kaur, M.; Kumar, R.; Mehta, A.; Mittal, M.; Singh, J. B.; Walia, G.; Kumar, Ashok; Bhardwaj, A.; Choudhary, B. C.; Garg, R. 
B.; Kumar, A.; Malhotra, S.; Naimuddin, M.; Nishu, N.; Ranjan, K.; Sharma, R.; Sharma, V.; Banerjee, S.; Bhattacharya, S.; Chatterjee, K.; Dey, S.; Dutta, S.; Jain, Sa.; Majumdar, N.; Modak, A.; Mondal, K.; Mukherjee, S.; Mukhopadhyay, S.; Roy, A.; Roy, D.; Roy Chowdhury, S.; Sarkar, S.; Sharan, M.; Abdulsalam, A.; Chudasama, R.; Dutta, D.; Jha, V.; Kumar, V.; Mohanty, A. K.; Pant, L. M.; Shukla, P.; Topkar, A.; Aziz, T.; Banerjee, S.; Bhowmik, S.; Chatterjee, R. M.; Dewanjee, R. K.; Dugad, S.; Ganguly, S.; Ghosh, S.; Guchait, M.; Gurtu, A.; Kole, G.; Kumar, S.; Mahakud, B.; Maity, M.; Majumder, G.; Mazumdar, K.; Mitra, S.; Mohanty, G. B.; Parida, B.; Sarkar, T.; Sudhakar, K.; Sur, N.; Sutar, B.; Wickramage, N.; Chauhan, S.; Dube, S.; Sharma, S.; Bakhshiansohi, H.; Behnamian, H.; Etesami, S. M.; Fahim, A.; Goldouzian, R.; Khakzad, M.; Mohammadi Najafabadi, M.; Naseri, M.; Paktinat Mehdiabadi, S.; Rezaei Hosseinabadi, F.; Safarzadeh, B.; Zeinali, M.; Felcini, M.; Grunewald, M.; Abbrescia, M.; Calabria, C.; Caputo, C.; Chhibra, S. S.; Colaleo, A.; Creanza, D.; Cristella, L.; de Filippis, N.; de Palma, M.; Fiore, L.; Iaselli, G.; Maggi, G.; Maggi, M.; Miniello, G.; My, S.; Nuzzo, S.; Pompili, A.; Pugliese, G.; Radogna, R.; Ranieri, A.; Selvaggi, G.; Silvestris, L.; Venditti, R.; Verwilligen, P.; Abbiendi, G.; Battilana, C.; Benvenuti, A. C.; Bonacorsi, D.; Braibant-Giacomelli, S.; Brigliadori, L.; Campanini, R.; Capiluppi, P.; Castro, A.; Cavallo, F. R.; Codispoti, G.; Cuffiani, M.; Dallavalle, G. M.; Fabbri, F.; Fanfani, A.; Fasanella, D.; Giacomelli, P.; Grandi, C.; Guiducci, L.; Marcellini, S.; Masetti, G.; Montanari, A.; Navarria, F. L.; Perrotta, A.; Rossi, A. M.; Rovelli, T.; Siroli, G. 
P.; Tosi, N.; Travaglini, R.; Cappello, G.; Chiorboli, M.; Costa, S.; Giordano, F.; Potenza, R.; Tricomi, A.; Tuve, C.; Barbagli, G.; Ciulli, V.; Civinini, C.; D'Alessandro, R.; Focardi, E.; Gonzi, S.; Gori, V.; Lenzi, P.; Meschini, M.; Paoletti, S.; Sguazzoni, G.; Tropiano, A.; Viliani, L.; Benussi, L.; Bianco, S.; Fabbri, F.; Piccolo, D.; Primavera, F.; Calvelli, V.; Ferro, F.; Lo Vetere, M.; Monge, M. R.; Robutti, E.; Tosi, S.; Brianza, L.; Dinardo, M. E.; Fiorendi, S.; Gennai, S.; Gerosa, R.; Ghezzi, A.; Govoni, P.; Malvezzi, S.; Manzoni, R. A.; Marzocchi, B.; Menasce, D.; Moroni, L.; Paganoni, M.; Pedrini, D.; Ragazzi, S.; Redaelli, N.; Tabarelli de Fatis, T.; Buontempo, S.; Cavallo, N.; di Guida, S.; Esposito, M.; Fabozzi, F.; Iorio, A. O. M.; Lanza, G.; Lista, L.; Meola, S.; Merola, M.; Paolucci, P.; Sciacca, C.; Thyssen, F.; Azzi, P.; Bacchetta, N.; Benato, L.; Bisello, D.; Boletti, A.; Branca, A.; Carlin, R.; Checchia, P.; Dall'Osso, M.; Dorigo, T.; Gasparini, F.; Gasparini, U.; Gozzelino, A.; Kanishchev, K.; Lacaprara, S.; Margoni, M.; Meneguzzo, A. T.; Passaseo, M.; Pazzini, J.; Pegoraro, M.; Pozzobon, N.; Ronchese, P.; Simonetto, F.; Torassa, E.; Tosi, M.; Zanetti, M.; Zotto, P.; Zucchetta, A.; Zumerle, G.; Braghieri, A.; Magnani, A.; Montagna, P.; Ratti, S. P.; Re, V.; Riccardi, C.; Salvini, P.; Vai, I.; Vitulo, P.; Alunni Solestizi, L.; Biasini, M.; Bilei, G. M.; Ciangottini, D.; Fanò, L.; Lariccia, P.; Mantovani, G.; Menichelli, M.; Saha, A.; Santocchia, A.; Spiezia, A.; Androsov, K.; Azzurri, P.; Bagliesi, G.; Bernardini, J.; Boccali, T.; Broccolo, G.; Castaldi, R.; Ciocci, M. A.; Dell'Orso, R.; Donato, S.; Fedi, G.; Foà, L.; Giassi, A.; Grippo, M. T.; Ligabue, F.; Lomtadze, T.; Martini, L.; Messineo, A.; Palla, F.; Rizzi, A.; Savoy-Navarro, A.; Serban, A. T.; Spagnolo, P.; Squillacioti, P.; Tenchini, R.; Tonelli, G.; Venturi, A.; Verdini, P. 
G.; Barone, L.; Cavallari, F.; D'Imperio, G.; Del Re, D.; Diemoz, M.; Gelli, S.; Jorda, C.; Longo, E.; Margaroli, F.; Meridiani, P.; Micheli, F.; Organtini, G.; Paramatti, R.; Preiato, F.; Rahatlou, S.; Rovelli, C.; Santanastasio, F.; Traczyk, P.; Amapane, N.; Arcidiacono, R.; Argiro, S.; Arneodo, M.; Bellan, R.; Biino, C.; Cartiglia, N.; Costa, M.; Covarelli, R.; Dattola, D.; Degano, A.; Demaria, N.; Finco, L.; Kiani, B.; Mariotti, C.; Maselli, S.; Migliore, E.; Monaco, V.; Monteil, E.; Musich, M.; Obertino, M. M.; Pacher, L.; Pastrone, N.; Pelliccioni, M.; Pinna Angioni, G. L.; Ravera, F.; Romero, A.; Ruspa, M.; Sacchi, R.; Solano, A.; Staiano, A.; Belforte, S.; Candelise, V.; Casarsa, M.; Cossutti, F.; Della Ricca, G.; Gobbo, B.; La Licata, C.; Marone, M.; Schizzi, A.; Umer, T.; Zanetti, A.; Kropivnitskaya, A.; Nam, S. K.; Kim, D. H.; Kim, G. N.; Kim, M. S.; Kong, D. J.; Lee, S.; Oh, Y. D.; Sakharov, A.; Son, D. C.; Brochero Cifuentes, J. A.; Kim, H.; Kim, T. J.; Ryu, M. S.; Song, S.; Choi, S.; Go, Y.; Gyun, D.; Hong, B.; Jo, M.; Kim, H.; Kim, Y.; Lee, B.; Lee, K.; Lee, K. S.; Lee, S.; Park, S. K.; Roh, Y.; Yoo, H. D.; Choi, M.; Kim, H.; Kim, J. H.; Lee, J. S. H.; Park, I. C.; Ryu, G.; Choi, Y.; Choi, Y. K.; Goh, J.; Kim, D.; Kwon, E.; Lee, J.; Yu, I.; Juodagalvis, A.; Vaitkus, J.; Ahmed, I.; Ibrahim, Z. A.; Komaragiri, J. R.; Ali, M. A. B. Md; Mohamad Idris, F.; Wan Abdullah, W. A. T.; Yusli, M. N.; Casimiro Linares, E.; Castilla-Valdez, H.; de La Cruz-Burelo, E.; Heredia-de La Cruz, I.; Hernandez-Almada, A.; Lopez-Fernandez, R.; Sanchez-Hernandez, A.; Carrillo Moreno, S.; Vazquez Valencia, F.; Pedraza, I.; Salazar Ibarguen, H. A.; Morelos Pineda, A.; Krofcheck, D.; Butler, P. H.; Reucroft, S.; Ahmad, A.; Ahmad, M.; Hassan, Q.; Hoorani, H. R.; Khan, W. 
A.; Khurshid, T.; Shoaib, M.; Bialkowska, H.; Bluj, M.; Boimska, B.; Frueboes, T.; Górski, M.; Kazana, M.; Nawrocki, K.; Romanowska-Rybinska, K.; Szleper, M.; Zalewski, P.; Brona, G.; Bunkowski, K.; Doroba, K.; Kalinowski, A.; Konecki, M.; Krolikowski, J.; Misiura, M.; Olszewski, M.; Walczak, M.; Bargassa, P.; Beirão da Cruz E Silva, C.; di Francesco, A.; Faccioli, P.; Ferreira Parracho, P. G.; Gallinaro, M.; Leonardo, N.; Lloret Iglesias, L.; Nguyen, F.; Rodrigues Antunes, J.; Seixas, J.; Toldaiev, O.; Vadruccio, D.; Varela, J.; Vischia, P.; Afanasiev, S.; Bunin, P.; Gavrilenko, M.; Golutvin, I.; Gorbunov, I.; Kamenev, A.; Karjavin, V.; Konoplyanikov, V.; Lanev, A.; Malakhov, A.; Matveev, V.; Moisenz, P.; Palichik, V.; Perelygin, V.; Shmatov, S.; Shulha, S.; Skatchkov, N.; Smirnov, V.; Zarubin, A.; Golovtsov, V.; Ivanov, Y.; Kim, V.; Kuznetsova, E.; Levchenko, P.; Murzin, V.; Oreshkin, V.; Smirnov, I.; Sulimov, V.; Uvarov, L.; Vavilov, S.; Vorobyev, A.; Andreev, Yu.; Dermenev, A.; Gninenko, S.; Golubev, N.; Karneyeu, A.; Kirsanov, M.; Krasnikov, N.; Pashenkov, A.; Tlisov, D.; Toropin, A.; Epshteyn, V.; Gavrilov, V.; Lychkovskaya, N.; Popov, V.; Pozdnyakov, I.; Safronov, G.; Spiridonov, A.; Vlasov, E.; Zhokin, A.; Bylinkin, A.; Andreev, V.; Azarkin, M.; Dremin, I.; Kirakosyan, M.; Leonidov, A.; Mesyats, G.; Rusakov, S. 
V.; Vinogradov, A.; Baskakov, A.; Belyaev, A.; Boos, E.; Dubinin, M.; Dudko, L.; Ershov, A.; Gribushin, A.; Klyukhin, V.; Kodolova, O.; Lokhtin, I.; Myagkov, I.; Obraztsov, S.; Petrushanko, S.; Savrin, V.; Snigirev, A.; Azhgirey, I.; Bayshev, I.; Bitioukov, S.; Kachanov, V.; Kalinin, A.; Konstantinov, D.; Krychkine, V.; Petrov, V.; Ryutin, R.; Sobol, A.; Tourtchanovitch, L.; Troshin, S.; Tyurin, N.; Uzunian, A.; Volkov, A.; Adzic, P.; Ekmedzic, M.; Milosevic, J.; Rekovic, V.; Alcaraz Maestre, J.; Calvo, E.; Cerrada, M.; Chamizo Llatas, M.; Colino, N.; de La Cruz, B.; Delgado Peris, A.; Domínguez Vázquez, D.; Escalante Del Valle, A.; Fernandez Bedoya, C.; Fernández Ramos, J. P.; Flix, J.; Fouz, M. C.; Garcia-Abia, P.; Gonzalez Lopez, O.; Goy Lopez, S.; Hernandez, J. M.; Josa, M. I.; Navarro de Martino, E.; Pérez-Calero Yzquierdo, A.; Puerta Pelayo, J.; Quintario Olmeda, A.; Redondo, I.; Romero, L.; Soares, M. S.; Albajar, C.; de Trocóniz, J. F.; Missiroli, M.; Moran, D.; Brun, H.; Cuevas, J.; Fernandez Menendez, J.; Folgueras, S.; Gonzalez Caballero, I.; Palencia Cortezon, E.; Vizan Garcia, J. M.; Cabrillo, I. J.; Calderon, A.; Castiñeiras de Saa, J. R.; de Castro Manzano, P.; Duarte Campderros, J.; Fernandez, M.; Garcia-Ferrero, J.; Gomez, G.; Graziano, A.; Lopez Virto, A.; Marco, J.; Marco, R.; Martinez Rivero, C.; Matorras, F.; Munoz Sanchez, F. J.; Piedra Gomez, J.; Rodrigo, T.; Rodríguez-Marrero, A. Y.; Ruiz-Jimeno, A.; Scodellaro, L.; Vila, I.; Vilar Cortabitarte, R.; Abbaneo, D.; Auffray, E.; Auzinger, G.; Bachtis, M.; Baillon, P.; Ball, A. H.; Barney, D.; Benaglia, A.; Bendavid, J.; Benhabib, L.; Benitez, J. F.; Berruti, G. 
M.; Bloch, P.; Bocci, A.; Bonato, A.; Botta, C.; Breuker, H.; Camporesi, T.; Cerminara, G.; Colafranceschi, S.; D'Alfonso, M.; D'Enterria, D.; Dabrowski, A.; Daponte, V.; David, A.; de Gruttola, M.; de Guio, F.; de Roeck, A.; de Visscher, S.; di Marco, E.; Dobson, M.; Dordevic, M.; Dorney, B.; Du Pree, T.; Dünser, M.; Dupont, N.; Elliott-Peisert, A.; Franzoni, G.; Funk, W.; Gigi, D.; Gill, K.; Giordano, D.; Girone, M.; Glege, F.; Guida, R.; Gundacker, S.; Guthoff, M.; Hammer, J.; Harris, P.; Hegeman, J.; Innocente, V.; Janot, P.; Kirschenmann, H.; Kortelainen, M. J.; Kousouris, K.; Krajczar, K.; Lecoq, P.; Lourenço, C.; Lucchini, M. T.; Magini, N.; Malgeri, L.; Mannelli, M.; Martelli, A.; Masetti, L.; Meijers, F.; Mersi, S.; Meschi, E.; Moortgat, F.; Morovic, S.; Mulders, M.; Nemallapudi, M. V.; Neugebauer, H.; Orfanelli, S.; Orsini, L.; Pape, L.; Perez, E.; Peruzzi, M.; Petrilli, A.; Petrucciani, G.; Pfeiffer, A.; Piparo, D.; Racz, A.; Rolandi, G.; Rovere, M.; Ruan, M.; Sakulin, H.; Schäfer, C.; Schwick, C.; Sharma, A.; Silva, P.; Simon, M.; Sphicas, P.; Spiga, D.; Steggemann, J.; Stieger, B.; Stoye, M.; Takahashi, Y.; Treille, D.; Triossi, A.; Tsirou, A.; Veres, G. I.; Wardle, N.; Wöhri, H. K.; Zagozdzinska, A.; Zeuner, W. D.; Bertl, W.; Deiters, K.; Erdmann, W.; Horisberger, R.; Ingram, Q.; Kaestli, H. C.; Kotlinski, D.; Langenegger, U.; Renker, D.; Rohe, T.; Bachmair, F.; Bäni, L.; Bianchini, L.; Buchmann, M. A.; Casal, B.; Dissertori, G.; Dittmar, M.; Donegà, M.; Eller, P.; Grab, C.; Heidegger, C.; Hits, D.; Hoss, J.; Kasieczka, G.; Lustermann, W.; Mangano, B.; Marini, A. C.; Marionneau, M.; Martinez Ruiz Del Arbol, P.; Masciovecchio, M.; Meister, D.; Musella, P.; Nessi-Tedaldi, F.; Pandolfi, F.; Pata, J.; Pauss, F.; Perrozzi, L.; Quittnat, M.; Rossini, M.; Starodumov, A.; Takahashi, M.; Tavolaro, V. R.; Theofilatos, K.; Wallny, R.; Aarrestad, T. K.; Amsler, C.; Caminada, L.; Canelli, M. 
F.; Chiochia, V.; de Cosa, A.; Galloni, C.; Hinzmann, A.; Hreus, T.; Kilminster, B.; Lange, C.; Ngadiuba, J.; Pinna, D.; Robmann, P.; Ronga, F. J.; Salerno, D.; Yang, Y.; Cardaci, M.; Chen, K. H.; Doan, T. H.; Jain, Sh.; Khurana, R.; Konyushikhin, M.; Kuo, C. M.; Lin, W.; Lu, Y. J.; Volpe, R.; Yu, S. S.; Kumar, Arun; Bartek, R.; Chang, P.; Chang, Y. H.; Chang, Y. W.; Chao, Y.; Chen, K. F.; Chen, P. H.; Dietz, C.; Fiori, F.; Grundler, U.; Hou, W.-S.; Hsiung, Y.; Liu, Y. F.; Lu, R.-S.; Miñano Moya, M.; Petrakou, E.; Tsai, J. F.; Tzeng, Y. M.; Asavapibhop, B.; Kovitanggoon, K.; Singh, G.; Srimanobhas, N.; Suwonjandee, N.; Adiguzel, A.; Cerci, S.; Demiroglu, Z. S.; Dozen, C.; Dumanoglu, I.; Girgis, S.; Gokbulut, G.; Guler, Y.; Gurpinar, E.; Hos, I.; Kangal, E. E.; Kayis Topaksu, A.; Onengut, G.; Ozdemir, K.; Ozturk, S.; Tali, B.; Topakli, H.; Vergili, M.; Zorbilmez, C.; Akin, I. V.; Bilin, B.; Bilmis, S.; Isildak, B.; Karapinar, G.; Surat, U. E.; Yalvac, M.; Zeyrek, M.; Albayrak, E. A.; Gülmez, E.; Kaya, M.; Kaya, O.; Yetkin, T.; Cankocak, K.; Sen, S.; Vardarlı, F. I.; Grynyov, B.; Levchuk, L.; Sorokin, P.; Aggleton, R.; Ball, F.; Beck, L.; Brooke, J. J.; Clement, E.; Cussans, D.; Flacher, H.; Goldstein, J.; Grimes, M.; Heath, G. P.; Heath, H. F.; Jacob, J.; Kreczko, L.; Lucas, C.; Meng, Z.; Newbold, D. M.; Paramesvaran, S.; Poll, A.; Sakuma, T.; Seif El Nasr-Storey, S.; Senkin, S.; Smith, D.; Smith, V. J.; Bell, K. W.; Belyaev, A.; Brew, C.; Brown, R. M.; Cockerill, D. J. A.; Coughlan, J. A.; Harder, K.; Harper, S.; Olaiya, E.; Petyt, D.; Shepherd-Themistocleous, C. H.; Thea, A.; Thomas, L.; Tomalin, I. R.; Williams, T.; Womersley, W. J.; Worm, S. 
D.; Baber, M.; Bainbridge, R.; Buchmuller, O.; Bundock, A.; Burton, D.; Casasso, S.; Citron, M.; Colling, D.; Corpe, L.; Cripps, N.; Dauncey, P.; Davies, G.; de Wit, A.; Della Negra, M.; Dunne, P.; Elwood, A.; Ferguson, W.; Fulcher, J.; Futyan, D.; Hall, G.; Iles, G.; Kenzie, M.; Lane, R.; Lucas, R.; Lyons, L.; Magnan, A.-M.; Malik, S.; Nash, J.; Nikitenko, A.; Pela, J.; Pesaresi, M.; Petridis, K.; Raymond, D. M.; Richards, A.; Rose, A.; Seez, C.; Tapper, A.; Uchida, K.; Vazquez Acosta, M.; Virdee, T.; Zenz, S. C.; Cole, J. E.; Hobson, P. R.; Khan, A.; Kyberd, P.; Leggat, D.; Leslie, D.; Reid, I. D.; Symonds, P.; Teodorescu, L.; Turner, M.; Borzou, A.; Call, K.; Dittmann, J.; Hatakeyama, K.; Kasmi, A.; Liu, H.; Pastika, N.; Charaf, O.; Cooper, S. I.; Henderson, C.; Rumerio, P.; Avetisyan, A.; Bose, T.; Fantasia, C.; Gastler, D.; Lawson, P.; Rankin, D.; Richardson, C.; Rohlf, J.; St. John, J.; Sulak, L.; Zou, D.; Alimena, J.; Berry, E.; Bhattacharya, S.; Cutts, D.; Dhingra, N.; Ferapontov, A.; Garabedian, A.; Heintz, U.; Laird, E.; Landsberg, G.; Mao, Z.; Narain, M.; Piperov, S.; Sagir, S.; Sinthuprasith, T.; Syarif, R.; Breedon, R.; Breto, G.; Calderon de La Barca Sanchez, M.; Chauhan, S.; Chertok, M.; Conway, J.; Conway, R.; Cox, P. T.; Erbacher, R.; Gardner, M.; Ko, W.; Lander, R.; Mulhearn, M.; Pellett, D.; Pilot, J.; Ricci-Tam, F.; Shalhout, S.; Smith, J.; Squires, M.; Stolp, D.; Tripathi, M.; Wilbur, S.; Yohay, R.; Cousins, R.; Everaerts, P.; Farrell, C.; Hauser, J.; Ignatenko, M.; Saltzberg, D.; Takasugi, E.; Valuev, V.; Weber, M.; Burt, K.; Clare, R.; Ellison, J.; Gary, J. W.; Hanson, G.; Heilman, J.; Ivova Paneva, M.; Jandir, P.; Kennedy, E.; Lacroix, F.; Long, O. R.; Luthra, A.; Malberti, M.; Olmedo Negrete, M.; Shrinivas, A.; Wei, H.; Wimpenny, S.; Branson, J. G.; Cerati, G. B.; Cittolin, S.; D'Agnolo, R. 
T.; Holzner, A.; Kelley, R.; Klein, D.; Letts, J.; MacNeill, I.; Olivito, D.; Padhi, S.; Pieri, M.; Sani, M.; Sharma, V.; Simon, S.; Tadel, M.; Vartak, A.; Wasserbaech, S.; Welke, C.; Würthwein, F.; Yagil, A.; Zevi Della Porta, G.; Barge, D.; Bradmiller-Feld, J.; Campagnari, C.; Dishaw, A.; Dutta, V.; Flowers, K.; Franco Sevilla, M.; Geffert, P.; George, C.; Golf, F.; Gouskos, L.; Gran, J.; Incandela, J.; Justus, C.; McColl, N.; Mullin, S. D.; Richman, J.; Stuart, D.; Suarez, I.; To, W.; West, C.; Yoo, J.; Anderson, D.; Apresyan, A.; Bornheim, A.; Bunn, J.; Chen, Y.; Duarte, J.; Mott, A.; Newman, H. B.; Pena, C.; Pierini, M.; Spiropulu, M.; Vlimant, J. R.; Xie, S.; Zhu, R. Y.; Azzolini, V.; Calamba, A.; Carlson, B.; Ferguson, T.; Paulini, M.; Russ, J.; Sun, M.; Vogel, H.; Vorobiev, I.; Cumalat, J. P.; Ford, W. T.; Gaz, A.; Jensen, F.; Johnson, A.; Krohn, M.; Mulholland, T.; Nauenberg, U.; Stenson, K.; Wagner, S. R.; Alexander, J.; Chatterjee, A.; Chaves, J.; Chu, J.; Dittmer, S.; Eggert, N.; Mirman, N.; Nicolas Kaufman, G.; Patterson, J. R.; Rinkevicius, A.; Ryd, A.; Skinnari, L.; Soffi, L.; Sun, W.; Tan, S. M.; Teo, W. D.; Thom, J.; Thompson, J.; Tucker, J.; Weng, Y.; Wittich, P.; Abdullin, S.; Albrow, M.; Anderson, J.; Apollinari, G.; Bauerdick, L. A. T.; Beretvas, A.; Berryhill, J.; Bhat, P. C.; Bolla, G.; Burkett, K.; Butler, J. N.; Cheung, H. W. K.; Chlebana, F.; Cihangir, S.; Elvira, V. D.; Fisk, I.; Freeman, J.; Gottschalk, E.; Gray, L.; Green, D.; Grünendahl, S.; Gutsche, O.; Hanlon, J.; Hare, D.; Harris, R. M.; Hirschauer, J.; Hooberman, B.; Hu, Z.; Jindariani, S.; Johnson, M.; Joshi, U.; Jung, A. W.; Klima, B.; Kreis, B.; Kwan, S.; Lammel, S.; Linacre, J.; Lincoln, D.; Lipton, R.; Liu, T.; Lopes de Sá, R.; Lykken, J.; Maeshima, K.; Marraffino, J. M.; Martinez Outschoorn, V. 
I.; Maruyama, S.; Mason, D.; McBride, P.; Merkel, P.; Mishra, K.; Mrenna, S.; Nahn, S.; Newman-Holmes, C.; O'Dell, V.; Pedro, K.; Prokofyev, O.; Rakness, G.; Sexton-Kennedy, E.; Soha, A.; Spalding, W. J.; Spiegel, L.; Taylor, L.; Tkaczyk, S.; Tran, N. V.; Uplegger, L.; Vaandering, E. W.; Vernieri, C.; Verzocchi, M.; Vidal, R.; Weber, H. A.; Whitbeck, A.; Yang, F.; Yin, H.; Acosta, D.; Avery, P.; Bortignon, P.; Bourilkov, D.; Carnes, A.; Carver, M.; Curry, D.; Das, S.; di Giovanni, G. P.; Field, R. D.; Fisher, M.; Furic, I. K.; Hugon, J.; Konigsberg, J.; Korytov, A.; Low, J. F.; Ma, P.; Matchev, K.; Mei, H.; Milenovic, P.; Mitselmakher, G.; Muniz, L.; Rank, D.; Rossin, R.; Shchutska, L.; Snowball, M.; Sperka, D.; Wang, J.; Wang, S.; Yelton, J.; Hewamanage, S.; Linn, S.; Markowitz, P.; Martinez, G.; Rodriguez, J. L.; Ackert, A.; Adams, J. R.; Adams, T.; Askew, A.; Bochenek, J.; Diamond, B.; Haas, J.; Hagopian, S.; Hagopian, V.; Johnson, K. F.; Khatiwada, A.; Prosper, H.; Veeraraghavan, V.; Weinberg, M.; Bhopatkar, V.; Hohlmann, M.; Kalakhety, H.; Mareskas-Palcek, D.; Noonan, D.; Roy, T.; Yumiceva, F.; Adams, M. R.; Apanasevich, L.; Berry, D.; Betts, R. R.; Bucinskaite, I.; Cavanaugh, R.; Evdokimov, O.; Gauthier, L.; Gerber, C. E.; Hofman, D. J.; Kurt, P.; O'Brien, C.; Sandoval Gonzalez, I. D.; Silkworth, C.; Turner, P.; Varelas, N.; Wu, Z.; Zakaria, M.; Bilki, B.; Clarida, W.; Dilsiz, K.; Durgut, S.; Gandrajula, R. P.; Haytmyradov, M.; Khristenko, V.; Merlo, J.-P.; Mermerkaya, H.; Mestvirishvili, A.; Moeller, A.; Nachtman, J.; Ogul, H.; Onel, Y.; Ozok, F.; Penzo, A.; Snyder, C.; Tan, P.; Tiras, E.; Wetzel, J.; Yi, K.; Anderson, I.; Barnett, B. A.; Blumenfeld, B.; Fehling, D.; Feng, L.; Gritsan, A. V.; Maksimovic, P.; Martin, C.; Osherson, M.; Swartz, M.; Xiao, M.; Xin, Y.; You, C.; Baringer, P.; Bean, A.; Benelli, G.; Bruner, C.; Gray, J.; Kenny, R. P.; Majumder, D.; Malek, M.; Murray, M.; Sanders, S.; Stringer, R.; Wang, Q.; Wood, J. 
S.; Chakaberia, I.; Ivanov, A.; Kaadze, K.; Khalil, S.; Makouski, M.; Maravin, Y.; Mohammadi, A.; Saini, L. K.; Skhirtladze, N.; Svintradze, I.; Toda, S.; Lange, D.; Rebassoo, F.; Wright, D.; Anelli, C.; Baden, A.; Baron, O.; Belloni, A.; Calvert, B.; Eno, S. C.; Ferraioli, C.; Gomez, J. A.; Hadley, N. J.; Jabeen, S.; Kellogg, R. G.; Kolberg, T.; Kunkle, J.; Lu, Y.; Mignerey, A. C.; Shin, Y. H.; Skuja, A.; Tonjes, M. B.; Tonwar, S. C.; Apyan, A.; Barbieri, R.; Baty, A.; Bierwagen, K.; Brandt, S.; Busza, W.; Cali, I. A.; Demiragli, Z.; Di Matteo, L.; Gomez Ceballos, G.; Goncharov, M.; Gulhan, D.; Iiyama, Y.; Innocenti, G. M.; Klute, M.; Kovalskyi, D.; Lai, Y. S.; Lee, Y.-J.; Levin, A.; Luckey, P. D.; McGinn, C.; Mironov, C.; Niu, X.; Paus, C.; Ralph, D.; Roland, C.; Roland, G.; Salfeld-Nebgen, J.; Stephans, G. S. F.; Sumorok, K.; Varma, M.; Velicanu, D.; Veverka, J.; Wang, J.; Wang, T. W.; Wyslouch, B.; Yang, M.; Zhukova, V.; Dahmes, B.; Finkel, A.; Gude, A.; Hansen, P.; Kalafut, S.; Kao, S. C.; Klapoetke, K.; Kubota, Y.; Lesko, Z.; Mans, J.; Nourbakhsh, S.; Ruckstuhl, N.; Rusack, R.; Tambe, N.; Turkewitz, J.; Acosta, J. G.; Oliveros, S.; Avdeeva, E.; Bloom, K.; Bose, S.; Claes, D. R.; Dominguez, A.; Fangmeier, C.; Gonzalez Suarez, R.; Kamalieddin, R.; Keller, J.; Knowlton, D.; Kravchenko, I.; Lazo-Flores, J.; Meier, F.; Monroy, J.; Ratnikov, F.; Siado, J. E.; Snow, G. R.; Alyari, M.; Dolen, J.; George, J.; Godshalk, A.; Iashvili, I.; Kaisen, J.; Kharchilava, A.; Kumar, A.; Rappoccio, S.; Alverson, G.; Barberis, E.; Baumgartel, D.; Chasco, M.; Hortiangtham, A.; Knapp, B.; Massironi, A.; Morse, D. M.; Nash, D.; Orimoto, T.; Teixeira de Lima, R.; Trocino, D.; Wang, R.-J.; Wood, D.; Zhang, J.; Hahn, K. A.; Kubik, A.; Mucia, N.; Odell, N.; Pollack, B.; Pozdnyakov, A.; Schmitt, M.; Stoynev, S.; Sung, K.; Trovato, M.; Velasco, M.; Brinkerhoff, A.; Dev, N.; Hildreth, M.; Jessop, C.; Karmgard, D. 
J.; Kellams, N.; Lannon, K.; Lynch, S.; Marinelli, N.; Meng, F.; Mueller, C.; Musienko, Y.; Pearson, T.; Planer, M.; Reinsvold, A.; Ruchti, R.; Smith, G.; Taroni, S.; Valls, N.; Wayne, M.; Wolf, M.; Woodard, A.; Antonelli, L.; Brinson, J.; Bylsma, B.; Durkin, L. S.; Flowers, S.; Hart, A.; Hill, C.; Hughes, R.; Kotov, K.; Ling, T. Y.; Liu, B.; Luo, W.; Puigh, D.; Rodenburg, M.; Winer, B. L.; Wulsin, H. W.; Driga, O.; Elmer, P.; Hardenbrook, J.; Hebda, P.; Koay, S. A.; Lujan, P.; Marlow, D.; Medvedeva, T.; Mooney, M.; Olsen, J.; Palmer, C.; Piroué, P.; Quan, X.; Saka, H.; Stickland, D.; Tully, C.; Werner, J. S.; Zuranski, A.; Malik, S.; Barnes, V. E.; Benedetti, D.; Bortoletto, D.; Gutay, L.; Jha, M. K.; Jones, M.; Jung, K.; Kress, M.; Miller, D. H.; Neumeister, N.; Radburn-Smith, B. C.; Shi, X.; Shipsey, I.; Silvers, D.; Sun, J.; Svyatkovskiy, A.; Wang, F.; Xie, W.; Xu, L.; Zablocki, J.; Parashar, N.; Stupak, J.; Adair, A.; Akgun, B.; Chen, Z.; Ecklund, K. M.; Geurts, F. J. M.; Guilbaud, M.; Li, W.; Michlin, B.; Northup, M.; Padley, B. P.; Redjimi, R.; Roberts, J.; Rorie, J.; Tu, Z.; Zabel, J.; Betchart, B.; Bodek, A.; de Barbaro, P.; Demina, R.; Eshaq, Y.; Ferbel, T.; Galanti, M.; Garcia-Bellido, A.; Goldenzweig, P.; Han, J.; Harel, A.; Hindrichs, O.; Khukhunaishvili, A.; Petrillo, G.; Verzetti, M.; Demortier, L.; Arora, S.; Barker, A.; Chou, J. 
P.; Contreras-Campana, C.; Contreras-Campana, E.; Duggan, D.; Ferencek, D.; Gershtein, Y.; Gray, R.; Halkiadakis, E.; Hidas, D.; Hughes, E.; Kaplan, S.; Kunnawalkam Elayavalli, R.; Lath, A.; Nash, K.; Panwalkar, S.; Park, M.; Salur, S.; Schnetzer, S.; Sheffield, D.; Somalwar, S.; Stone, R.; Thomas, S.; Thomassen, P.; Walker, M.; Foerster, M.; Riley, G.; Rose, K.; Spanier, S.; York, A.; Bouhali, O.; Castaneda Hernandez, A.; Dalchenko, M.; de Mattia, M.; Delgado, A.; Dildick, S.; Eusebi, R.; Flanagan, W.; Gilmore, J.; Kamon, T.; Krutelyov, V.; Montalvo, R.; Mueller, R.; Osipenkov, I.; Pakhotin, Y.; Patel, R.; Perloff, A.; Roe, J.; Rose, A.; Safonov, A.; Tatarinov, A.; Ulmer, K. A.; Akchurin, N.; Cowden, C.; Damgov, J.; Dragoiu, C.; Dudero, P. R.; Faulkner, J.; Kunori, S.; Lamichhane, K.; Lee, S. W.; Libeiro, T.; Undleeb, S.; Volobouev, I.; Appelt, E.; Delannoy, A. G.; Greene, S.; Gurrola, A.; Janjam, R.; Johns, W.; Maguire, C.; Mao, Y.; Melo, A.; Ni, H.; Sheldon, P.; Snook, B.; Tuo, S.; Velkovska, J.; Xu, Q.; Arenton, M. W.; Boutle, S.; Cox, B.; Francis, B.; Goodell, J.; Hirosky, R.; Ledovskoy, A.; Li, H.; Lin, C.; Neu, C.; Wolfe, E.; Wood, J.; Xia, F.; Clarke, C.; Harr, R.; Karchin, P. E.; Kottachchi Kankanamge Don, C.; Lamichhane, P.; Sturdy, J.; Belknap, D. A.; Carlsmith, D.; Cepeda, M.; Christian, A.; Dasu, S.; Dodd, L.; Duric, S.; Friis, E.; Gomber, B.; Hall-Wilton, R.; Herndon, M.; Hervé, A.; Klabbers, P.; Lanaro, A.; Levine, A.; Long, K.; Loveless, R.; Mohapatra, A.; Ojalvo, I.; Perry, T.; Pierro, G. A.; Polese, G.; Ross, I.; Ruggles, T.; Sarangi, T.; Savin, A.; Sharma, A.; Smith, N.; Smith, W. H.; Taylor, D.; Woods, N.

    2016-01-01

    A search is presented for a singly produced excited bottom quark (b*) decaying to a top quark and a W boson in the all-hadronic, lepton+jets, and dilepton final states in proton-proton collisions at √{s} = 8 TeV recorded by the CMS experiment at the CERN LHC. Data corresponding to an integrated luminosity of 19.7 fb⁻¹ are used. No significant excess of events is observed with respect to standard model expectations. We set limits at 95% confidence level (CL) on the product of the b* quark production cross section and its branching fraction to tW. The cross section limits are interpreted for scenarios including left-handed, right-handed, and vector-like couplings of the b* quark and are presented in the two-dimensional coupling plane spanned by the production and decay coupling constants. Left-handed, right-handed, and vector-like b* quark states with masses below 1390, 1430, and 1530 GeV, respectively, are excluded at 95% CL for benchmark couplings. These are the most stringent limits on the mass of the b* quark to date. [Figure not available: see fulltext.]
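
    The 95% CL limit-setting described in the abstract can be illustrated with a toy counting experiment. This is a minimal sketch under simplifying assumptions, not the analysis's actual statistical machinery (which involves a full likelihood over several final states with systematic uncertainties); the function names and the luminosity/efficiency numbers below are illustrative only.

    ```python
    import math

    def poisson_cdf(n, mu):
        """P(N <= n) for N ~ Poisson(mu)."""
        return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))

    def upper_limit(n_obs, bkg, cl=0.95, tol=1e-6):
        """Frequentist upper limit on the signal yield s: the largest s for
        which observing <= n_obs events under a Poisson mean of bkg + s is
        still probable at the (1 - cl) level. Found by bisection."""
        lo, hi = 0.0, 10.0 * (n_obs + bkg + 10)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if poisson_cdf(n_obs, bkg + mid) > 1.0 - cl:
                lo = mid  # still compatible with the data; the limit lies higher
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Zero observed events, zero background: the textbook 95% CL limit,
    # s_up = -ln(0.05) ≈ 3.0 signal events.
    s_up = upper_limit(0, 0.0)

    # Converting an event-yield limit to a cross section limit divides by
    # integrated luminosity times selection efficiency; the efficiency of
    # 0.3 here is a made-up placeholder.
    sigma_up = s_up / (19.7e3 * 0.3)  # pb, for 19.7 fb⁻¹ = 19.7e3 pb⁻¹
    ```

    A CLs-style procedure, as commonly used at the LHC, further normalizes this tail probability by the background-only one to avoid excluding signals the experiment is not sensitive to; the simple construction above is the underlying ingredient.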

  4. Search for neutral MSSM Higgs bosons decaying to μ+μ- in pp collisions at √{s} = 7 and 8 TeV

    NASA Astrophysics Data System (ADS)

    Khachatryan, V.; Sirunyan, A. M.; Tumasyan, A.; Adam, W.; Asilar, E.; Bergauer, T.; Brandstetter, J.; Brondolin, E.; Dragicevic, M.; Erö, J.; Flechl, M.; Friedl, M.; Frühwirth, R.; Ghete, V. M.; Hartl, C.; Hörmann, N.; Hrubec, J.; Jeitler, M.; Knünz, V.; König, A.; Krammer, M.; Krätschmer, I.; Liko, D.; Matsushita, T.; Mikulec, I.; Rabady, D.; Rahbaran, B.; Rohringer, H.; Schieck, J.; Schöfbeck, R.; Strauss, J.; Treberer-Treberspurg, W.; Waltenberger, W.; Wulz, C.-E.; Mossolov, V.; Shumeiko, N.; Suarez Gonzalez, J.; Alderweireldt, S.; Cornelis, T.; De Wolf, E. A.; Janssen, X.; Knutsson, A.; Lauwers, J.; Luyckx, S.; Ochesanu, S.; Rougny, R.; Van De Klundert, M.; Van Haevermaet, H.; Van Mechelen, P.; Van Remortel, N.; Van Spilbeeck, A.; Abu Zeid, S.; Blekman, F.; D'Hondt, J.; Daci, N.; De Bruyn, I.; Deroover, K.; Heracleous, N.; Keaveney, J.; Lowette, S.; Moreels, L.; Olbrechts, A.; Python, Q.; Strom, D.; Tavernier, S.; Van Doninck, W.; Van Mulders, P.; Van Onsem, G. P.; Van Parijs, I.; Barria, P.; Caillol, C.; Clerbaux, B.; De Lentdecker, G.; Delannoy, H.; Dobur, D.; Fasanella, G.; Favart, L.; Gay, A. P. R.; Grebenyuk, A.; Lenzi, T.; Léonard, A.; Maerschalk, T.; Mohammadi, A.; Perniè, L.; Randle-conde, A.; Reis, T.; Seva, T.; Thomas, L.; Vander Velde, C.; Vanlaer, P.; Wang, J.; Zenoni, F.; Zhang, F.; Beernaert, K.; Benucci, L.; Cimmino, A.; Crucy, S.; Fagot, A.; Garcia, G.; Gul, M.; Mccartin, J.; Ocampo Rios, A. A.; Poyraz, D.; Ryckbosch, D.; Salva, S.; Sigamani, M.; Strobbe, N.; Tytgat, M.; Van Driessche, W.; Yazgan, E.; Zaganidis, N.; Basegmez, S.; Beluffi, C.; Bondu, O.; Bruno, G.; Castello, R.; Caudron, A.; Ceard, L.; Da Silveira, G. G.; Delaere, C.; Favart, D.; Forthomme, L.; Giammanco, A.; Hollar, J.; Jafari, A.; Jez, P.; Komm, M.; Lemaitre, V.; Mertens, A.; Nuttens, C.; Perrini, L.; Pin, A.; Piotrzkowski, K.; Popov, A.; Quertenmont, L.; Selvaggi, M.; Vidal Marono, M.; Beliy, N.; Caebergs, T.; Hammad, G. H.; Aldá Júnior, W. L.; Alves, G. 
A.; Brito, L.; Correa Martins Junior, M.; Dos Reis Martins, T.; Hensel, C.; Mora Herrera, C.; Moraes, A.; Pol, M. E.; Rebello Teles, P.; Belchior Batista Das Chagas, E.; Carvalho, W.; Chinellato, J.; Custódio, A.; Da Costa, E. M.; De Jesus Damiao, D.; De Oliveira Martins, C.; Fonseca De Souza, S.; Huertas Guativa, L. M.; Malbouisson, H.; Matos Figueiredo, D.; Mundim, L.; Nogima, H.; Prado Da Silva, W. L.; Santoro, A.; Sznajder, A.; Tonelli Manganote, E. J.; Vilela Pereira, A.; Ahuja, S.; Bernardes, C. A.; De Souza Santos, A.; Dogra, S.; Fernandez Perez Tomei, T. R.; Gregores, E. M.; Mercadante, P. G.; Moon, C. S.; Novaes, S. F.; Padula, Sandra S.; Romero Abad, D.; Ruiz Vargas, J. C.; Aleksandrov, A.; Genchev, V.; Hadjiiska, R.; Iaydjiev, P.; Marinov, A.; Piperov, S.; Rodozov, M.; Stoykova, S.; Sultanov, G.; Vutova, M.; Dimitrov, A.; Glushkov, I.; Litov, L.; Pavlov, B.; Petkov, P.; Ahmad, M.; Bian, J. G.; Chen, G. M.; Chen, H. S.; Chen, M.; Cheng, T.; Du, R.; Jiang, C. H.; Plestina, R.; Romeo, F.; Shaheen, S. M.; Tao, J.; Wang, C.; Wang, Z.; Zhang, H.; Asawatangtrakuldee, C.; Ban, Y.; Li, Q.; Liu, S.; Mao, Y.; Qian, S. J.; Wang, D.; Xu, Z.; Zou, W.; Avila, C.; Cabrera, A.; Chaparro Sierra, L. F.; Florez, C.; Gomez, J. P.; Gomez Moreno, B.; Sanabria, J. C.; Godinovic, N.; Lelas, D.; Polic, D.; Puljak, I.; Antunovic, Z.; Kovac, M.; Brigljevic, V.; Kadija, K.; Luetic, J.; Sudic, L.; Attikis, A.; Mavromanolakis, G.; Mousa, J.; Nicolaou, C.; Ptochos, F.; Razis, P. 
A.; Rykaczewski, H.; Bodlak, M.; Finger, M.; Finger, M.; Aly, R.; Aly, S.; El-khateeb, E.; Elkafrawy, T.; Lotfy, A.; Mohamed, A.; Radi, A.; Salama, E.; Sayed, A.; Calpas, B.; Kadastik, M.; Murumaa, M.; Raidal, M.; Tiko, A.; Veelken, C.; Eerola, P.; Voutilainen, M.; Härkönen, J.; Karimäki, V.; Kinnunen, R.; Lampén, T.; Lassila-Perini, K.; Lehti, S.; Lindén, T.; Luukka, P.; Mäenpää, T.; Pekkanen, J.; Peltola, T.; Tuominen, E.; Tuominiemi, J.; Tuovinen, E.; Wendland, L.; Talvitie, J.; Tuuva, T.; Besancon, M.; Couderc, F.; Dejardin, M.; Denegri, D.; Fabbro, B.; Faure, J. L.; Favaro, C.; Ferri, F.; Ganjour, S.; Givernaud, A.; Gras, P.; Hamel de Monchenault, G.; Jarry, P.; Locci, E.; Machet, M.; Malcles, J.; Rander, J.; Rosowsky, A.; Titov, M.; Zghiche, A.; Baffioni, S.; Beaudette, F.; Busson, P.; Cadamuro, L.; Chapon, E.; Charlot, C.; Dahms, T.; Davignon, O.; Filipovic, N.; Florent, A.; Granier de Cassagnac, R.; Lisniak, S.; Mastrolorenzo, L.; Miné, P.; Naranjo, I. N.; Nguyen, M.; Ochando, C.; Ortona, G.; Paganini, P.; Regnard, S.; Salerno, R.; Sauvan, J. B.; Sirois, Y.; Strebler, T.; Yilmaz, Y.; Zabi, A.; Agram, J.-L.; Andrea, J.; Aubin, A.; Bloch, D.; Brom, J.-M.; Buttignol, M.; Chabert, E. C.; Chanon, N.; Collard, C.; Conte, E.; Fontaine, J.-C.; Gelé, D.; Goerlach, U.; Goetzmann, C.; Le Bihan, A.-C.; Merlin, J. A.; Skovpen, K.; Van Hove, P.; Gadrat, S.; Beauceron, S.; Bernet, C.; Boudoul, G.; Bouvier, E.; Brochet, S.; Carrillo Montoya, C. A.; Chasserat, J.; Chierici, R.; Contardo, D.; Courbon, B.; Depasse, P.; El Mamouni, H.; Fan, J.; Fay, J.; Gascon, S.; Gouzevitch, M.; Ille, B.; Laktineh, I. B.; Lethuillier, M.; Mirabito, L.; Pequegnot, A. L.; Perries, S.; Ruiz Alvarez, J. D.; Sabes, D.; Sgandurra, L.; Sordini, V.; Vander Donckt, M.; Verdier, P.; Viret, S.; Xiao, H.; Toriashvili, T.; Tsamalaidze, Z.; Autermann, C.; Beranek, S.; Edelhoff, M.; Feld, L.; Heister, A.; Kiesel, M. 
K.; Klein, K.; Lipinski, M.; Ostapchuk, A.; Preuten, M.; Raupach, F.; Sammet, J.; Schael, S.; Schulte, J. F.; Verlage, T.; Weber, H.; Wittmer, B.; Zhukov, V.; Ata, M.; Brodski, M.; Dietz-Laursonn, E.; Duchardt, D.; Endres, M.; Erdmann, M.; Erdweg, S.; Esch, T.; Fischer, R.; Güth, A.; Hebbeker, T.; Heidemann, C.; Hoepfner, K.; Klingebiel, D.; Knutzen, S.; Kreuzer, P.; Merschmeyer, M.; Meyer, A.; Millet, P.; Olschewski, M.; Padeken, K.; Papacz, P.; Pook, T.; Radziej, M.; Reithler, H.; Rieger, M.; Scheuch, F.; Sonnenschein, L.; Teyssier, D.; Thüer, S.; Cherepanov, V.; Erdogan, Y.; Flügge, G.; Geenen, H.; Geisler, M.; Haj Ahmad, W.; Hoehle, F.; Kargoll, B.; Kress, T.; Kuessel, Y.; Künsken, A.; Lingemann, J.; Nehrkorn, A.; Nowack, A.; Nugent, I. M.; Pistone, C.; Pooth, O.; Stahl, A.; Aldaya Martin, M.; Asin, I.; Bartosik, N.; Behnke, O.; Behrens, U.; Bell, A. J.; Borras, K.; Burgmeier, A.; Cakir, A.; Calligaris, L.; Campbell, A.; Choudhury, S.; Costanza, F.; Diez Pardos, C.; Dolinska, G.; Dooling, S.; Dorland, T.; Eckerlin, G.; Eckstein, D.; Eichhorn, T.; Flucke, G.; Gallo, E.; Garay Garcia, J.; Geiser, A.; Gizhko, A.; Gunnellini, P.; Hauk, J.; Hempel, M.; Jung, H.; Kalogeropoulos, A.; Karacheban, O.; Kasemann, M.; Katsas, P.; Kieseler, J.; Kleinwort, C.; Korol, I.; Lange, W.; Leonard, J.; Lipka, K.; Lobanov, A.; Lohmann, W.; Mankel, R.; Marfin, I.; Melzer-Pellmann, I.-A.; Meyer, A. B.; Mittag, G.; Mnich, J.; Mussgiller, A.; Naumann-Emme, S.; Nayak, A.; Ntomari, E.; Perrey, H.; Pitzl, D.; Placakyte, R.; Raspereza, A.; Ribeiro Cipriano, P. M.; Roland, B.; Sahin, M. Ö.; Salfeld-Nebgen, J.; Saxena, P.; Schoerner-Sadenius, T.; Schröder, M.; Seitz, C.; Spannagel, S.; Trippkewitz, K. D.; Wissing, C.; Blobel, V.; Centis Vignali, M.; Draeger, A. R.; Erfle, J.; Garutti, E.; Goebel, K.; Gonzalez, D.; Görner, M.; Haller, J.; Hoffmann, M.; Höing, R. 
S.; Junkes, A.; Klanner, R.; Kogler, R.; Lapsien, T.; Lenz, T.; Marchesini, I.; Marconi, D.; Nowatschin, D.; Ott, J.; Pantaleo, F.; Peiffer, T.; Perieanu, A.; Pietsch, N.; Poehlsen, J.; Rathjens, D.; Sander, C.; Schettler, H.; Schleper, P.; Schlieckau, E.; Schmidt, A.; Schwandt, J.; Seidel, M.; Sola, V.; Stadie, H.; Steinbrück, G.; Tholen, H.; Troendle, D.; Usai, E.; Vanelderen, L.; Vanhoefer, A.; Akbiyik, M.; Barth, C.; Baus, C.; Berger, J.; Böser, C.; Butz, E.; Chwalek, T.; Colombo, F.; De Boer, W.; Descroix, A.; Dierlamm, A.; Feindt, M.; Frensch, F.; Giffels, M.; Gilbert, A.; Hartmann, F.; Husemann, U.; Kassel, F.; Katkov, I.; Kornmayer, A.; Lobelle Pardo, P.; Mozer, M. U.; Müller, T.; Müller, Th.; Plagge, M.; Quast, G.; Rabbertz, K.; Röcker, S.; Roscher, F.; Simonis, H. J.; Stober, F. M.; Ulrich, R.; Wagner-Kuhr, J.; Wayand, S.; Weiler, T.; Wöhrmann, C.; Wolf, R.; Anagnostou, G.; Daskalakis, G.; Geralis, T.; Giakoumopoulou, V. A.; Kyriakis, A.; Loukas, D.; Markou, A.; Psallidas, A.; Topsis-Giotis, I.; Agapitos, A.; Kesisoglou, S.; Panagiotou, A.; Saoulidou, N.; Tziaferi, E.; Evangelou, I.; Flouris, G.; Foudas, C.; Kokkas, P.; Loukas, N.; Manthos, N.; Papadopoulos, I.; Paradas, E.; Strologas, J.; Bencze, G.; Hajdu, C.; Hazi, A.; Hidas, P.; Horvath, D.; Sikler, F.; Veszpremi, V.; Vesztergombi, G.; Zsigmond, A. J.; Beni, N.; Czellar, S.; Karancsi, J.; Molnar, J.; Szillasi, Z.; Bartók, M.; Makovec, A.; Raics, P.; Trocsanyi, Z. L.; Ujvari, B.; Mal, P.; Mandal, K.; Sahoo, N.; Swain, S. K.; Bansal, S.; Beri, S. B.; Bhatnagar, V.; Chawla, R.; Gupta, R.; Bhawandeep, U.; Kalsi, A. K.; Kaur, A.; Kaur, M.; Kumar, R.; Mehta, A.; Mittal, M.; Nishu, N.; Singh, J. B.; Walia, G.; Kumar, Ashok; Kumar, Arun; Bhardwaj, A.; Choudhary, B. C.; Garg, R. 
B.; Kumar, A.; Malhotra, S.; Naimuddin, M.; Ranjan, K.; Sharma, R.; Sharma, V.; Banerjee, S.; Bhattacharya, S.; Chatterjee, K.; Dey, S.; Dutta, S.; Jain, Sa.; Jain, Sh.; Khurana, R.; Majumdar, N.; Modak, A.; Mondal, K.; Mukherjee, S.; Mukhopadhyay, S.; Roy, A.; Roy, D.; Roy Chowdhury, S.; Sarkar, S.; Sharan, M.; Abdulsalam, A.; Chudasama, R.; Dutta, D.; Jha, V.; Kumar, V.; Mohanty, A. K.; Pant, L. M.; Shukla, P.; Topkar, A.; Aziz, T.; Banerjee, S.; Bhowmik, S.; Chatterjee, R. M.; Dewanjee, R. K.; Dugad, S.; Ganguly, S.; Ghosh, S.; Guchait, M.; Gurtu, A.; Kole, G.; Kumar, S.; Mahakud, B.; Maity, M.; Majumder, G.; Mazumdar, K.; Mitra, S.; Mohanty, G. B.; Parida, B.; Sarkar, T.; Sudhakar, K.; Sur, N.; Sutar, B.; Wickramage, N.; Sharma, S.; Bakhshiansohi, H.; Behnamian, H.; Etesami, S. M.; Fahim, A.; Goldouzian, R.; Khakzad, M.; Mohammadi Najafabadi, M.; Naseri, M.; Paktinat Mehdiabadi, S.; Rezaei Hosseinabadi, F.; Safarzadeh, B.; Zeinali, M.; Felcini, M.; Grunewald, M.; Abbrescia, M.; Calabria, C.; Caputo, C.; Chhibra, S. S.; Colaleo, A.; Creanza, D.; Cristella, L.; De Filippis, N.; De Palma, M.; Fiore, L.; Iaselli, G.; Maggi, G.; Maggi, M.; Miniello, G.; My, S.; Nuzzo, S.; Pompili, A.; Pugliese, G.; Radogna, R.; Ranieri, A.; Selvaggi, G.; Sharma, A.; Silvestris, L.; Venditti, R.; Verwilligen, P.; Abbiendi, G.; Battilana, C.; Benvenuti, A. C.; Bonacorsi, D.; Braibant-Giacomelli, S.; Brigliadori, L.; Campanini, R.; Capiluppi, P.; Castro, A.; Cavallo, F. R.; Codispoti, G.; Cuffiani, M.; Dallavalle, G. M.; Fabbri, F.; Fanfani, A.; Fasanella, D.; Giacomelli, P.; Grandi, C.; Guiducci, L.; Marcellini, S.; Masetti, G.; Montanari, A.; Navarria, F. L.; Perrotta, A.; Rossi, A. M.; Rovelli, T.; Siroli, G. 
P.; Tosi, N.; Travaglini, R.; Cappello, G.; Chiorboli, M.; Costa, S.; Giordano, F.; Potenza, R.; Tricomi, A.; Tuve, C.; Barbagli, G.; Ciulli, V.; Civinini, C.; D'Alessandro, R.; Focardi, E.; Gonzi, S.; Gori, V.; Lenzi, P.; Meschini, M.; Paoletti, S.; Sguazzoni, G.; Tropiano, A.; Viliani, L.; Benussi, L.; Bianco, S.; Fabbri, F.; Piccolo, D.; Calvelli, V.; Ferro, F.; Lo Vetere, M.; Robutti, E.; Tosi, S.; Dinardo, M. E.; Fiorendi, S.; Gennai, S.; Gerosa, R.; Ghezzi, A.; Govoni, P.; Malvezzi, S.; Manzoni, R. A.; Marzocchi, B.; Menasce, D.; Moroni, L.; Paganoni, M.; Pedrini, D.; Ragazzi, S.; Redaelli, N.; Tabarelli de Fatis, T.; Buontempo, S.; Cavallo, N.; Di Guida, S.; Esposito, M.; Fabozzi, F.; Iorio, A. O. M.; Lanza, G.; Lista, L.; Meola, S.; Merola, M.; Paolucci, P.; Sciacca, C.; Thyssen, F.; Azzi, P.; Bacchetta, N.; Bisello, D.; Branca, A.; Carlin, R.; Carvalho Antunes De Oliveira, A.; Checchia, P.; Dall'Osso, M.; Dorigo, T.; Gasparini, F.; Gasparini, U.; Gozzelino, A.; Kanishchev, K.; Lacaprara, S.; Margoni, M.; Meneguzzo, A. T.; Montecassiano, F.; Passaseo, M.; Pazzini, J.; Pozzobon, N.; Ronchese, P.; Simonetto, F.; Torassa, E.; Tosi, M.; Zanetti, M.; Zotto, P.; Zucchetta, A.; Braghieri, A.; Gabusi, M.; Magnani, A.; Ratti, S. P.; Re, V.; Riccardi, C.; Salvini, P.; Vai, I.; Vitulo, P.; Alunni Solestizi, L.; Biasini, M.; Bilei, G. M.; Ciangottini, D.; Fanò, L.; Lariccia, P.; Mantovani, G.; Menichelli, M.; Saha, A.; Santocchia, A.; Spiezia, A.; Androsov, K.; Azzurri, P.; Bagliesi, G.; Bernardini, J.; Boccali, T.; Broccolo, G.; Castaldi, R.; Ciocci, M. A.; Dell'Orso, R.; Donato, S.; Fedi, G.; Foà, L.; Giassi, A.; Grippo, M. T.; Ligabue, F.; Lomtadze, T.; Martini, L.; Messineo, A.; Palla, F.; Rizzi, A.; Savoy-Navarro, A.; Serban, A. T.; Spagnolo, P.; Squillacioti, P.; Tenchini, R.; Tonelli, G.; Venturi, A.; Verdini, P. 
G.; Barone, L.; Cavallari, F.; D'imperio, G.; Del Re, D.; Diemoz, M.; Gelli, S.; Jorda, C.; Longo, E.; Margaroli, F.; Meridiani, P.; Micheli, F.; Organtini, G.; Paramatti, R.; Preiato, F.; Rahatlou, S.; Rovelli, C.; Santanastasio, F.; Soffi, L.; Traczyk, P.; Amapane, N.; Arcidiacono, R.; Argiro, S.; Arneodo, M.; Bellan, R.; Biino, C.; Cartiglia, N.; Costa, M.; Covarelli, R.; Degano, A.; Demaria, N.; Finco, L.; Mariotti, C.; Maselli, S.; Migliore, E.; Monaco, V.; Monteil, E.; Musich, M.; Obertino, M. M.; Pacher, L.; Pastrone, N.; Pelliccioni, M.; Pinna Angioni, G. L.; Ravera, F.; Romero, A.; Ruspa, M.; Sacchi, R.; Solano, A.; Staiano, A.; Tamponi, U.; Trapani, P. P.; Belforte, S.; Candelise, V.; Casarsa, M.; Cossutti, F.; Della Ricca, G.; Gobbo, B.; La Licata, C.; Marone, M.; Schizzi, A.; Umer, T.; Zanetti, A.; Chang, S.; Kropivnitskaya, A.; Nam, S. K.; Kim, D. H.; Kim, G. N.; Kim, M. S.; Kong, D. J.; Lee, S.; Oh, Y. D.; Sakharov, A.; Son, D. C.; Kim, H.; Kim, T. J.; Ryu, M. S.; Song, S.; Choi, S.; Go, Y.; Gyun, D.; Hong, B.; Jo, M.; Kim, H.; Kim, Y.; Lee, B.; Lee, K.; Lee, K. S.; Lee, S.; Park, S. K.; Roh, Y.; Yoo, H. D.; Choi, M.; Kim, J. H.; Lee, J. S. H.; Park, I. C.; Ryu, G.; Choi, Y.; Choi, Y. K.; Goh, J.; Kim, D.; Kwon, E.; Lee, J.; Yu, I.; Juodagalvis, A.; Vaitkus, J.; Ibrahim, Z. A.; Komaragiri, J. R.; Md Ali, M. A. B.; Mohamad Idris, F.; Wan Abdullah, W. A. T.; Casimiro Linares, E.; Castilla-Valdez, H.; De La Cruz-Burelo, E.; Heredia-de La Cruz, I.; Hernandez-Almada, A.; Lopez-Fernandez, R.; Ramirez Sanchez, G.; Sanchez-Hernandez, A.; Carrillo Moreno, S.; Vazquez Valencia, F.; Carpinteyro, S.; Pedraza, I.; Salazar Ibarguen, H. A.; Morelos Pineda, A.; Krofcheck, D.; Butler, P. H.; Reucroft, S.; Ahmad, A.; Ahmad, M.; Hassan, Q.; Hoorani, H. R.; Khan, W. 
A.; Khurshid, T.; Shoaib, M.; Bialkowska, H.; Bluj, M.; Boimska, B.; Frueboes, T.; Górski, M.; Kazana, M.; Nawrocki, K.; Romanowska-Rybinska, K.; Szleper, M.; Zalewski, P.; Brona, G.; Bunkowski, K.; Doroba, K.; Kalinowski, A.; Konecki, M.; Krolikowski, J.; Misiura, M.; Olszewski, M.; Walczak, M.; Bargassa, P.; Beirão Da Cruz E Silva, C.; Di Francesco, A.; Faccioli, P.; Ferreira Parracho, P. G.; Gallinaro, M.; Lloret Iglesias, L.; Nguyen, F.; Rodrigues Antunes, J.; Seixas, J.; Toldaiev, O.; Vadruccio, D.; Varela, J.; Vischia, P.; Afanasiev, S.; Bunin, P.; Gavrilenko, M.; Golutvin, I.; Gorbunov, I.; Kamenev, A.; Karjavin, V.; Konoplyanikov, V.; Lanev, A.; Malakhov, A.; Matveev, V.; Moisenz, P.; Palichik, V.; Perelygin, V.; Shmatov, S.; Shulha, S.; Skatchkov, N.; Smirnov, V.; Zarubin, A.; Golovtsov, V.; Ivanov, Y.; Kim, V.; Kuznetsova, E.; Levchenko, P.; Murzin, V.; Oreshkin, V.; Smirnov, I.; Sulimov, V.; Uvarov, L.; Vavilov, S.; Vorobyev, A.; Andreev, Yu.; Dermenev, A.; Gninenko, S.; Golubev, N.; Karneyeu, A.; Kirsanov, M.; Krasnikov, N.; Pashenkov, A.; Tlisov, D.; Toropin, A.; Epshteyn, V.; Gavrilov, V.; Lychkovskaya, N.; Popov, V.; Pozdnyakov, I.; Safronov, G.; Spiridonov, A.; Vlasov, E.; Zhokin, A.; Bylinkin, A.; Andreev, V.; Azarkin, M.; Dremin, I.; Kirakosyan, M.; Leonidov, A.; Mesyats, G.; Rusakov, S. 
V.; Vinogradov, A.; Baskakov, A.; Belyaev, A.; Boos, E.; Bunichev, V.; Dubinin, M.; Dudko, L.; Gribushin, A.; Klyukhin, V.; Kodolova, O.; Lokhtin, I.; Myagkov, I.; Obraztsov, S.; Petrushanko, S.; Savrin, V.; Snigirev, A.; Azhgirey, I.; Bayshev, I.; Bitioukov, S.; Kachanov, V.; Kalinin, A.; Konstantinov, D.; Krychkine, V.; Petrov, V.; Ryutin, R.; Sobol, A.; Tourtchanovitch, L.; Troshin, S.; Tyurin, N.; Uzunian, A.; Volkov, A.; Adzic, P.; Ekmedzic, M.; Milosevic, J.; Rekovic, V.; Alcaraz Maestre, J.; Calvo, E.; Cerrada, M.; Chamizo Llatas, M.; Colino, N.; De La Cruz, B.; Delgado Peris, A.; Domínguez Vázquez, D.; Escalante Del Valle, A.; Fernandez Bedoya, C.; Fernández Ramos, J. P.; Flix, J.; Fouz, M. C.; Garcia-Abia, P.; Gonzalez Lopez, O.; Goy Lopez, S.; Hernandez, J. M.; Josa, M. I.; Navarro De Martino, E.; Pérez-Calero Yzquierdo, A.; Puerta Pelayo, J.; Quintario Olmeda, A.; Redondo, I.; Romero, L.; Soares, M. S.; Albajar, C.; de Trocóniz, J. F.; Missiroli, M.; Moran, D.; Brun, H.; Cuevas, J.; Fernandez Menendez, J.; Folgueras, S.; Gonzalez Caballero, I.; Palencia Cortezon, E.; Vizan Garcia, J. M.; Brochero Cifuentes, J. A.; Cabrillo, I. J.; Calderon, A.; Castiñeiras De Saa, J. R.; Duarte Campderros, J.; Fernandez, M.; Gomez, G.; Graziano, A.; Lopez Virto, A.; Marco, J.; Marco, R.; Martinez Rivero, C.; Matorras, F.; Munoz Sanchez, F. J.; Piedra Gomez, J.; Rodrigo, T.; Rodríguez-Marrero, A. Y.; Ruiz-Jimeno, A.; Scodellaro, L.; Vila, I.; Vilar Cortabitarte, R.; Abbaneo, D.; Auffray, E.; Auzinger, G.; Bachtis, M.; Baillon, P.; Ball, A. H.; Barney, D.; Benaglia, A.; Bendavid, J.; Benhabib, L.; Benitez, J. F.; Berruti, G. 
M.; Bianchi, G.; Bloch, P.; Bocci, A.; Bonato, A.; Botta, C.; Breuker, H.; Camporesi, T.; Cerminara, G.; Colafranceschi, S.; D'Alfonso, M.; d'Enterria, D.; Dabrowski, A.; Daponte, V.; David, A.; De Gruttola, M.; De Guio, F.; De Roeck, A.; De Visscher, S.; Di Marco, E.; Dobson, M.; Dordevic, M.; du Pree, T.; Dupont, N.; Elliott-Peisert, A.; Eugster, J.; Franzoni, G.; Funk, W.; Gigi, D.; Gill, K.; Giordano, D.; Girone, M.; Glege, F.; Guida, R.; Gundacker, S.; Guthoff, M.; Hammer, J.; Hansen, M.; Harris, P.; Hegeman, J.; Innocente, V.; Janot, P.; Kirschenmann, H.; Kortelainen, M. J.; Kousouris, K.; Krajczar, K.; Lecoq, P.; Lourenço, C.; Lucchini, M. T.; Magini, N.; Malgeri, L.; Mannelli, M.; Marrouche, J.; Martelli, A.; Masetti, L.; Meijers, F.; Mersi, S.; Meschi, E.; Moortgat, F.; Morovic, S.; Mulders, M.; Nemallapudi, M. V.; Neugebauer, H.; Orfanelli, S.; Orsini, L.; Pape, L.; Perez, E.; Petrilli, A.; Petrucciani, G.; Pfeiffer, A.; Piparo, D.; Racz, A.; Rolandi, G.; Rovere, M.; Ruan, M.; Sakulin, H.; Schäfer, C.; Schwick, C.; Sharma, A.; Silva, P.; Simon, M.; Sphicas, P.; Spiga, D.; Steggemann, J.; Stieger, B.; Stoye, M.; Takahashi, Y.; Treille, D.; Tsirou, A.; Veres, G. I.; Wardle, N.; Wöhri, H. K.; Zagozdzinska, A.; Zeuner, W. D.; Bertl, W.; Deiters, K.; Erdmann, W.; Horisberger, R.; Ingram, Q.; Kaestli, H. C.; Kotlinski, D.; Langenegger, U.; Rohe, T.; Bachmair, F.; Bäni, L.; Bianchini, L.; Buchmann, M. A.; Casal, B.; Dissertori, G.; Dittmar, M.; Donegà, M.; Dünser, M.; Eller, P.; Grab, C.; Heidegger, C.; Hits, D.; Hoss, J.; Kasieczka, G.; Lustermann, W.; Mangano, B.; Marini, A. C.; Marionneau, M.; Martinez Ruiz del Arbol, P.; Masciovecchio, M.; Meister, D.; Mohr, N.; Musella, P.; Nessi-Tedaldi, F.; Pandolfi, F.; Pata, J.; Pauss, F.; Perrozzi, L.; Peruzzi, M.; Quittnat, M.; Rossini, M.; Starodumov, A.; Takahashi, M.; Tavolaro, V. R.; Theofilatos, K.; Wallny, R.; Weber, H. A.; Aarrestad, T. K.; Amsler, C.; Canelli, M. 
F.; Chiochia, V.; De Cosa, A.; Galloni, C.; Hinzmann, A.; Hreus, T.; Kilminster, B.; Lange, C.; Ngadiuba, J.; Pinna, D.; Robmann, P.; Ronga, F. J.; Salerno, D.; Taroni, S.; Yang, Y.; Cardaci, M.; Chen, K. H.; Doan, T. H.; Ferro, C.; Konyushikhin, M.; Kuo, C. M.; Lin, W.; Lu, Y. J.; Volpe, R.; Yu, S. S.; Bartek, R.; Chang, P.; Chang, Y. H.; Chang, Y. W.; Chao, Y.; Chen, K. F.; Chen, P. H.; Dietz, C.; Fiori, F.; Grundler, U.; Hou, W.-S.; Hsiung, Y.; Liu, Y. F.; Lu, R.-S.; Miñano Moya, M.; Petrakou, E.; Tsai, J. F.; Tzeng, Y. M.; Asavapibhop, B.; Kovitanggoon, K.; Singh, G.; Srimanobhas, N.; Suwonjandee, N.; Adiguzel, A.; Bakirci, M. N.; Dozen, C.; Dumanoglu, I.; Eskut, E.; Girgis, S.; Gokbulut, G.; Guler, Y.; Gurpinar, E.; Hos, I.; Kangal, E. E.; Onengut, G.; Ozdemir, K.; Polatoz, A.; Sunar Cerci, D.; Vergili, M.; Zorbilmez, C.; Akin, I. V.; Bilin, B.; Bilmis, S.; Isildak, B.; Karapinar, G.; Surat, U. E.; Yalvac, M.; Zeyrek, M.; Albayrak, E. A.; Gülmez, E.; Kaya, M.; Kaya, O.; Yetkin, T.; Cankocak, K.; Günaydin, Y. O.; Vardarlı, F. I.; Grynyov, B.; Levchuk, L.; Sorokin, P.; Aggleton, R.; Ball, F.; Beck, L.; Brooke, J. J.; Clement, E.; Cussans, D.; Flacher, H.; Goldstein, J.; Grimes, M.; Heath, G. P.; Heath, H. F.; Jacob, J.; Kreczko, L.; Lucas, C.; Meng, Z.; Newbold, D. M.; Paramesvaran, S.; Poll, A.; Sakuma, T.; Seif El Nasr-storey, S.; Senkin, S.; Smith, D.; Smith, V. J.; Bell, K. W.; Belyaev, A.; Brew, C.; Brown, R. M.; Cockerill, D. J. A.; Coughlan, J. A.; Harder, K.; Harper, S.; Olaiya, E.; Petyt, D.; Shepherd-Themistocleous, C. H.; Thea, A.; Tomalin, I. R.; Williams, T.; Womersley, W. J.; Worm, S. 
D.; Baber, M.; Bainbridge, R.; Buchmuller, O.; Bundock, A.; Burton, D.; Casasso, S.; Citron, M.; Colling, D.; Corpe, L.; Cripps, N.; Dauncey, P.; Davies, G.; De Wit, A.; Della Negra, M.; Dunne, P.; Elwood, A.; Ferguson, W.; Fulcher, J.; Futyan, D.; Hall, G.; Iles, G.; Karapostoli, G.; Kenzie, M.; Lane, R.; Lucas, R.; Lyons, L.; Magnan, A.-M.; Malik, S.; Nash, J.; Nikitenko, A.; Pela, J.; Pesaresi, M.; Petridis, K.; Raymond, D. M.; Richards, A.; Rose, A.; Seez, C.; Sharp, P.; Tapper, A.; Uchida, K.; Vazquez Acosta, M.; Virdee, T.; Zenz, S. C.; Cole, J. E.; Hobson, P. R.; Khan, A.; Kyberd, P.; Leggat, D.; Leslie, D.; Reid, I. D.; Symonds, P.; Teodorescu, L.; Turner, M.; Borzou, A.; Dittmann, J.; Hatakeyama, K.; Kasmi, A.; Liu, H.; Pastika, N.; Scarborough, T.; Charaf, O.; Cooper, S. I.; Henderson, C.; Rumerio, P.; Avetisyan, A.; Bose, T.; Fantasia, C.; Gastler, D.; Lawson, P.; Rankin, D.; Richardson, C.; Rohlf, J.; St. John, J.; Sulak, L.; Zou, D.; Alimena, J.; Berry, E.; Bhattacharya, S.; Cutts, D.; Demiragli, Z.; Dhingra, N.; Ferapontov, A.; Garabedian, A.; Heintz, U.; Laird, E.; Landsberg, G.; Mao, Z.; Narain, M.; Sagir, S.; Sinthuprasith, T.; Breedon, R.; Breto, G.; Calderon De La Barca Sanchez, M.; Chauhan, S.; Chertok, M.; Conway, J.; Conway, R.; Cox, P. T.; Erbacher, R.; Gardner, M.; Ko, W.; Lander, R.; Mulhearn, M.; Pellett, D.; Pilot, J.; Ricci-Tam, F.; Shalhout, S.; Smith, J.; Squires, M.; Stolp, D.; Tripathi, M.; Wilbur, S.; Yohay, R.; Cousins, R.; Everaerts, P.; Farrell, C.; Hauser, J.; Ignatenko, M.; Rakness, G.; Saltzberg, D.; Takasugi, E.; Valuev, V.; Weber, M.; Burt, K.; Clare, R.; Ellison, J.; Gary, J. W.; Hanson, G.; Heilman, J.; Ivova Paneva, M.; Jandir, P.; Kennedy, E.; Lacroix, F.; Long, O. R.; Luthra, A.; Malberti, M.; Olmedo Negrete, M.; Shrinivas, A.; Sumowidagdo, S.; Wei, H.; Wimpenny, S.; Branson, J. G.; Cerati, G. B.; Cittolin, S.; D'Agnolo, R. 
T.; Holzner, A.; Kelley, R.; Klein, D.; Letts, J.; Macneill, I.; Olivito, D.; Padhi, S.; Pieri, M.; Sani, M.; Sharma, V.; Simon, S.; Tadel, M.; Tu, Y.; Vartak, A.; Wasserbaech, S.; Welke, C.; Würthwein, F.; Yagil, A.; Zevi Della Porta, G.; Barge, D.; Bradmiller-Feld, J.; Campagnari, C.; Dishaw, A.; Dutta, V.; Flowers, K.; Franco Sevilla, M.; Geffert, P.; George, C.; Golf, F.; Gouskos, L.; Gran, J.; Incandela, J.; Justus, C.; Mccoll, N.; Mullin, S. D.; Richman, J.; Stuart, D.; To, W.; West, C.; Yoo, J.; Anderson, D.; Apresyan, A.; Bornheim, A.; Bunn, J.; Chen, Y.; Duarte, J.; Mott, A.; Newman, H. B.; Pena, C.; Pierini, M.; Spiropulu, M.; Vlimant, J. R.; Xie, S.; Zhu, R. Y.; Azzolini, V.; Calamba, A.; Carlson, B.; Ferguson, T.; Iiyama, Y.; Paulini, M.; Russ, J.; Sun, M.; Vogel, H.; Vorobiev, I.; Cumalat, J. P.; Ford, W. T.; Gaz, A.; Jensen, F.; Johnson, A.; Krohn, M.; Mulholland, T.; Nauenberg, U.; Smith, J. G.; Stenson, K.; Wagner, S. R.; Alexander, J.; Chatterjee, A.; Chaves, J.; Chu, J.; Dittmer, S.; Eggert, N.; Mirman, N.; Nicolas Kaufman, G.; Patterson, J. R.; Rinkevicius, A.; Ryd, A.; Skinnari, L.; Sun, W.; Tan, S. M.; Teo, W. D.; Thom, J.; Thompson, J.; Tucker, J.; Weng, Y.; Wittich, P.; Abdullin, S.; Albrow, M.; Anderson, J.; Apollinari, G.; Bauerdick, L. A. T.; Beretvas, A.; Berryhill, J.; Bhat, P. C.; Bolla, G.; Burkett, K.; Butler, J. N.; Cheung, H. W. K.; Chlebana, F.; Cihangir, S.; Elvira, V. D.; Fisk, I.; Freeman, J.; Gottschalk, E.; Gray, L.; Green, D.; Grünendahl, S.; Gutsche, O.; Hanlon, J.; Hare, D.; Harris, R. M.; Hirschauer, J.; Hooberman, B.; Hu, Z.; Jindariani, S.; Johnson, M.; Joshi, U.; Jung, A. W.; Klima, B.; Kreis, B.; Kwan, S.; Lammel, S.; Linacre, J.; Lincoln, D.; Lipton, R.; Liu, T.; Lopes De Sá, R.; Lykken, J.; Maeshima, K.; Marraffino, J. M.; Martinez Outschoorn, V. 
I.; Maruyama, S.; Mason, D.; McBride, P.; Merkel, P.; Mishra, K.; Mrenna, S.; Nahn, S.; Newman-Holmes, C.; O'Dell, V.; Prokofyev, O.; Sexton-Kennedy, E.; Soha, A.; Spalding, W. J.; Spiegel, L.; Taylor, L.; Tkaczyk, S.; Tran, N. V.; Uplegger, L.; Vaandering, E. W.; Vernieri, C.; Verzocchi, M.; Vidal, R.; Whitbeck, A.; Yang, F.; Yin, H.; Acosta, D.; Avery, P.; Bortignon, P.; Bourilkov, D.; Carnes, A.; Carver, M.; Curry, D.; Das, S.; Di Giovanni, G. P.; Field, R. D.; Fisher, M.; Furic, I. K.; Hugon, J.; Konigsberg, J.; Korytov, A.; Kypreos, T.; Low, J. F.; Ma, P.; Matchev, K.; Mei, H.; Milenovic, P.; Mitselmakher, G.; Muniz, L.; Rank, D.; Shchutska, L.; Snowball, M.; Sperka, D.; Wang, S.; Yelton, J.; Hewamanage, S.; Linn, S.; Markowitz, P.; Martinez, G.; Rodriguez, J. L.; Ackert, A.; Adams, J. R.; Adams, T.; Askew, A.; Bochenek, J.; Diamond, B.; Haas, J.; Hagopian, S.; Hagopian, V.; Johnson, K. F.; Khatiwada, A.; Prosper, H.; Veeraraghavan, V.; Weinberg, M.; Bhopatkar, V.; Hohlmann, M.; Kalakhety, H.; Mareskas-palcek, D.; Roy, T.; Yumiceva, F.; Adams, M. R.; Apanasevich, L.; Berry, D.; Betts, R. R.; Bucinskaite, I.; Cavanaugh, R.; Evdokimov, O.; Gauthier, L.; Gerber, C. E.; Hofman, D. J.; Kurt, P.; O'Brien, C.; Sandoval Gonzalez, I. D.; Silkworth, C.; Turner, P.; Varelas, N.; Wu, Z.; Zakaria, M.; Bilki, B.; Clarida, W.; Dilsiz, K.; Durgut, S.; Gandrajula, R. P.; Haytmyradov, M.; Khristenko, V.; Merlo, J.-P.; Mermerkaya, H.; Mestvirishvili, A.; Moeller, A.; Nachtman, J.; Ogul, H.; Onel, Y.; Ozok, F.; Penzo, A.; Sen, S.; Snyder, C.; Tan, P.; Tiras, E.; Wetzel, J.; Yi, K.; Anderson, I.; Barnett, B. A.; Blumenfeld, B.; Fehling, D.; Feng, L.; Gritsan, A. V.; Maksimovic, P.; Martin, C.; Nash, K.; Osherson, M.; Swartz, M.; Xiao, M.; Xin, Y.; Baringer, P.; Bean, A.; Benelli, G.; Bruner, C.; Gray, J.; Kenny, R. P., III; Majumder, D.; Malek, M.; Murray, M.; Noonan, D.; Sanders, S.; Stringer, R.; Wang, Q.; Wood, J. 
S.; Chakaberia, I.; Ivanov, A.; Kaadze, K.; Khalil, S.; Makouski, M.; Maravin, Y.; Saini, L. K.; Skhirtladze, N.; Svintradze, I.; Toda, S.; Lange, D.; Rebassoo, F.; Wright, D.; Anelli, C.; Baden, A.; Baron, O.; Belloni, A.; Calvert, B.; Eno, S. C.; Ferraioli, C.; Gomez, J. A.; Hadley, N. J.; Jabeen, S.; Kellogg, R. G.; Kolberg, T.; Kunkle, J.; Lu, Y.; Mignerey, A. C.; Pedro, K.; Shin, Y. H.; Skuja, A.; Tonjes, M. B.; Tonwar, S. C.; Apyan, A.; Barbieri, R.; Baty, A.; Bierwagen, K.; Brandt, S.; Busza, W.; Cali, I. A.; Di Matteo, L.; Gomez Ceballos, G.; Goncharov, M.; Gulhan, D.; Innocenti, G. M.; Klute, M.; Kovalskyi, D.; Lai, Y. S.; Lee, Y.-J.; Levin, A.; Luckey, P. D.; Mcginn, C.; Niu, X.; Paus, C.; Ralph, D.; Roland, C.; Roland, G.; Stephans, G. S. F.; Sumorok, K.; Varma, M.; Velicanu, D.; Veverka, J.; Wang, J.; Wang, T. W.; Wyslouch, B.; Yang, M.; Zhukova, V.; Dahmes, B.; Finkel, A.; Gude, A.; Hansen, P.; Kalafut, S.; Kao, S. C.; Klapoetke, K.; Kubota, Y.; Lesko, Z.; Mans, J.; Nourbakhsh, S.; Ruckstuhl, N.; Rusack, R.; Tambe, N.; Turkewitz, J.; Acosta, J. G.; Oliveros, S.; Avdeeva, E.; Bloom, K.; Bose, S.; Claes, D. R.; Dominguez, A.; Fangmeier, C.; Gonzalez Suarez, R.; Kamalieddin, R.; Keller, J.; Knowlton, D.; Kravchenko, I.; Lazo-Flores, J.; Meier, F.; Monroy, J.; Ratnikov, F.; Siado, J. E.; Snow, G. R.; Alyari, M.; Dolen, J.; George, J.; Godshalk, A.; Iashvili, I.; Kaisen, J.; Kharchilava, A.; Kumar, A.; Rappoccio, S.; Alverson, G.; Barberis, E.; Baumgartel, D.; Chasco, M.; Hortiangtham, A.; Massironi, A.; Morse, D. M.; Nash, D.; Orimoto, T.; Teixeira De Lima, R.; Trocino, D.; Wang, R.-J.; Wood, D.; Zhang, J.; Hahn, K. A.; Kubik, A.; Mucia, N.; Odell, N.; Pollack, B.; Pozdnyakov, A.; Schmitt, M.; Stoynev, S.; Sung, K.; Trovato, M.; Velasco, M.; Won, S.; Brinkerhoff, A.; Dev, N.; Hildreth, M.; Jessop, C.; Karmgard, D. 
J.; Kellams, N.; Lannon, K.; Lynch, S.; Marinelli, N.; Meng, F.; Mueller, C.; Musienko, Y.; Pearson, T.; Planer, M.; Ruchti, R.; Smith, G.; Valls, N.; Wayne, M.; Wolf, M.; Woodard, A.; Antonelli, L.; Brinson, J.; Bylsma, B.; Durkin, L. S.; Flowers, S.; Hart, A.; Hill, C.; Hughes, R.; Kotov, K.; Ling, T. Y.; Liu, B.; Luo, W.; Puigh, D.; Rodenburg, M.; Winer, B. L.; Wulsin, H. W.; Driga, O.; Elmer, P.; Hardenbrook, J.; Hebda, P.; Koay, S. A.; Lujan, P.; Marlow, D.; Medvedeva, T.; Mooney, M.; Olsen, J.; Palmer, C.; Piroué, P.; Quan, X.; Saka, H.; Stickland, D.; Tully, C.; Werner, J. S.; Zuranski, A.; Malik, S.; Barnes, V. E.; Benedetti, D.; Bortoletto, D.; Gutay, L.; Jha, M. K.; Jones, M.; Jung, K.; Kress, M.; Leonardo, N.; Miller, D. H.; Neumeister, N.; Primavera, F.; Radburn-Smith, B. C.; Shi, X.; Shipsey, I.; Silvers, D.; Sun, J.; Svyatkovskiy, A.; Wang, F.; Xie, W.; Xu, L.; Zablocki, J.; Parashar, N.; Stupak, J.; Adair, A.; Akgun, B.; Chen, Z.; Ecklund, K. M.; Geurts, F. J. M.; Guilbaud, M.; Li, W.; Michlin, B.; Northup, M.; Padley, B. P.; Redjimi, R.; Roberts, J.; Rorie, J.; Tu, Z.; Zabel, J.; Betchart, B.; Bodek, A.; de Barbaro, P.; Demina, R.; Eshaq, Y.; Ferbel, T.; Galanti, M.; Garcia-Bellido, A.; Goldenzweig, P.; Han, J.; Harel, A.; Hindrichs, O.; Khukhunaishvili, A.; Petrillo, G.; Verzetti, M.; Vishnevskiy, D.; Demortier, L.; Arora, S.; Barker, A.; Chou, J. 
P.; Contreras-Campana, C.; Contreras-Campana, E.; Duggan, D.; Ferencek, D.; Gershtein, Y.; Gray, R.; Halkiadakis, E.; Hidas, D.; Hughes, E.; Kaplan, S.; Kunnawalkam Elayavalli, R.; Lath, A.; Panwalkar, S.; Park, M.; Salur, S.; Schnetzer, S.; Sheffield, D.; Somalwar, S.; Stone, R.; Thomas, S.; Thomassen, P.; Walker, M.; Foerster, M.; Riley, G.; Rose, K.; Spanier, S.; York, A.; Bouhali, O.; Castaneda Hernandez, A.; Dalchenko, M.; De Mattia, M.; Delgado, A.; Dildick, S.; Eusebi, R.; Flanagan, W.; Gilmore, J.; Kamon, T.; Krutelyov, V.; Montalvo, R.; Mueller, R.; Osipenkov, I.; Pakhotin, Y.; Patel, R.; Perloff, A.; Roe, J.; Rose, A.; Safonov, A.; Suarez, I.; Tatarinov, A.; Ulmer, K. A.; Akchurin, N.; Cowden, C.; Damgov, J.; Dragoiu, C.; Dudero, P. R.; Faulkner, J.; Kunori, S.; Lamichhane, K.; Lee, S. W.; Libeiro, T.; Undleeb, S.; Volobouev, I.; Appelt, E.; Delannoy, A. G.; Greene, S.; Gurrola, A.; Janjam, R.; Johns, W.; Maguire, C.; Mao, Y.; Melo, A.; Sheldon, P.; Snook, B.; Tuo, S.; Velkovska, J.; Xu, Q.; Arenton, M. W.; Boutle, S.; Cox, B.; Francis, B.; Goodell, J.; Hirosky, R.; Ledovskoy, A.; Li, H.; Lin, C.; Neu, C.; Wolfe, E.; Wood, J.; Xia, F.; Clarke, C.; Harr, R.; Karchin, P. E.; Kottachchi Kankanamge Don, C.; Lamichhane, P.; Sturdy, J.; Belknap, D. A.; Carlsmith, D.; Cepeda, M.; Christian, A.; Dasu, S.; Dodd, L.; Duric, S.; Friis, E.; Gomber, B.; Hall-Wilton, R.; Herndon, M.; Hervé, A.; Klabbers, P.; Lanaro, A.; Levine, A.; Long, K.; Loveless, R.; Mohapatra, A.; Ojalvo, I.; Perry, T.; Pierro, G. A.; Polese, G.; Ross, I.; Ruggles, T.; Sarangi, T.; Savin, A.; Smith, N.; Smith, W. H.; Taylor, D.; Woods, N.

    2016-01-01

    A search for neutral Higgs bosons predicted in the minimal supersymmetric standard model (MSSM) for μ⁺μ⁻ decay channels is presented. The analysis uses data collected by the CMS experiment at the LHC in proton-proton collisions at centre-of-mass energies of 7 and 8 TeV, corresponding to integrated luminosities of 5.1 and 19.3 fb⁻¹, respectively. The search is sensitive to Higgs bosons produced either through the gluon fusion process or in association with a bb̄ quark pair. No statistically significant excess is observed in the μ⁺μ⁻ mass spectrum. Results are interpreted in the framework of several benchmark scenarios, and the data are used to set an upper limit on the MSSM parameter tan β as a function of the mass of the pseudoscalar A boson in the range from 115 to 300 GeV. Model-independent upper limits are given for the product of the cross section and branching fraction for gluon fusion and b-quark associated production at √s = 8 TeV. They are the most stringent limits obtained to date in this channel.

  5. Resonances in the reaction ortho- and para- D2 + H at temperatures below 10 K

    NASA Astrophysics Data System (ADS)

    Simbotin, I.; Côté, R.

    2016-05-01

    In a previous study we reported cross sections for the reaction H2 + D in the temperature regime 10⁻⁶ K < T < 10 K, and found pronounced shape resonances, especially in the p and d partial waves. We found that the resonant structures were sensitive to the initial rovibrational state of H2; in particular, we showed that the effect of the nuclear-spin symmetry was very important, since ortho- and para-H2 gave significantly different results. We now investigate the reaction D2 + H for vibrationally excited ortho- and para-D2, and compare and contrast these results with those for H2 + D. We remark that this benchmark system is a prototypical example of reactions with a strong barrier, which have very small cross sections in the cold and ultracold regimes. However, shape resonances can enhance the reaction cross sections by orders of magnitude for temperatures around and below T = 1 K. Moreover, resonant features would provide stringent tests for quantum chemistry calculations of potential energy surfaces. Partial support was provided by the US Army Research Office (Grant No. W911NF-13-1-0213).

  6. Molprobity's ultimate rotamer-library distributions for model validation.

    PubMed

    Hintze, Bradley J; Lewis, Steven M; Richardson, Jane S; Richardson, David C

    2016-09-01

    Here we describe the updated MolProbity rotamer-library distributions derived from an order-of-magnitude larger and more stringently quality-filtered dataset of about 8000 (vs. 500) protein chains, and we explain the resulting changes and improvements to model validation as seen by users. To include only side-chains with satisfactory justification for their given conformation, we added residue-specific filters for electron-density value and model-to-density fit. The combined new protocol retains a million residues of data, while cleaning up false-positive noise in the multi-χ datapoint distributions. It enables unambiguous characterization of conformational clusters nearly 1000-fold less frequent than the most common ones. We describe examples of local interactions that favor these rare conformations, including the role of authentic covalent bond-angle deviations in enabling presumably strained side-chain conformations. Further, along with favored and outlier, an allowed category (0.3-2.0% occurrence in reference data) has been added, analogous to Ramachandran validation categories. The new rotamer distributions are used for current rotamer validation in MolProbity and PHENIX, and for rotamer choice in PHENIX model-building and refinement. The multi-dimensional χ distributions and Top8000 reference dataset are freely available on GitHub. These rotamers are termed "ultimate" because data sampling and quality are now fully adequate for this task, and also because we believe the future of conformational validation should integrate side-chain with backbone criteria. Proteins 2016; 84:1177-1189. © 2016 Wiley Periodicals, Inc.
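
    The favored/allowed/outlier split described above is a simple threshold rule on a rotamer's occurrence frequency in the reference data; a minimal sketch, where the 0.3-2.0% "allowed" band is from the text and treating anything above 2.0% as "favored" is an assumption:

```python
def rotamer_category(occurrence_pct):
    """Map a side-chain rotamer's occurrence in the reference data to a
    validation category. The 0.3-2.0% 'allowed' band follows the abstract;
    the >2.0% 'favored' cutoff is an assumption."""
    if occurrence_pct < 0.3:
        return "outlier"
    if occurrence_pct <= 2.0:
        return "allowed"
    return "favored"
```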

  7. MolProbity’s Ultimate Rotamer-Library Distributions for Model Validation

    PubMed Central

    Hintze, Bradley J.; Lewis, Steven M.; Richardson, Jane S.; Richardson, David C.

    2016-01-01

    Here we describe the updated MolProbity rotamer-library distributions derived from an order-of-magnitude larger and more stringently quality-filtered dataset of about 8000 (vs. 500) protein chains, and we explain the resulting changes and improvements to model validation as seen by users. To include only sidechains with satisfactory justification for their given conformation, we added residue-specific filters for electron-density value and model-to-density fit. The combined new protocol retains a million residues of data, while cleaning up false-positive noise in the multi-χ datapoint distributions. It enables unambiguous characterization of conformational clusters nearly 1000-fold less frequent than the most common ones. We describe examples of local interactions that favor these rare conformations, including the role of authentic covalent bond-angle deviations in enabling presumably strained sidechain conformations. Further, along with favored and outlier, an allowed category (0.3% to 2.0% occurrence in reference data) has been added, analogous to Ramachandran validation categories. The new rotamer distributions are used for current rotamer validation in MolProbity and PHENIX, and for rotamer choice in PHENIX model-building and refinement. The multi-dimensional χ distributions and Top8000 reference dataset are freely available on GitHub. These rotamers are termed “ultimate” because data sampling and quality are now fully adequate for this task, and also because we believe the future of conformational validation should integrate sidechain with backbone criteria. PMID:27018641

  8. Human Vision-Motivated Algorithm Allows Consistent Retinal Vessel Classification Based on Local Color Contrast for Advancing General Diagnostic Exams.

    PubMed

    Ivanov, Iliya V; Leitritz, Martin A; Norrenberg, Lars A; Völker, Michael; Dynowski, Marek; Ueffing, Marius; Dietter, Johannes

    2016-02-01

    Abnormalities of blood vessel anatomy, morphology, and ratio can serve as important diagnostic markers for retinal diseases such as AMD or diabetic retinopathy. Large cohort studies demand automated and quantitative image analysis of vascular abnormalities. Therefore, we developed an analytical software tool to enable automated standardized classification of blood vessels supporting clinical reading. A dataset of 61 images was collected from a total of 33 women and 8 men with a median age of 38 years. The pupils were not dilated, and images were taken after dark adaptation. In contrast to current methods in which classification is based on vessel profile intensity averages, and similar to human vision, local color contrast was chosen as a discriminator to allow artery-vein discrimination and arterial-venous ratio (AVR) calculation without vessel tracking. With 83% ± 1 standard error of the mean for our dataset, we achieved the best classification for weighted lightness information from a combination of the red, green, and blue channels. Tested on an independent dataset, our method reached 89% correct classification, which, when benchmarked against conventional ophthalmologic classification, shows significantly improved classification scores. Our study demonstrates that vessel classification based on local color contrast can cope with inter- or intraimage lightness variability and allows consistent AVR calculation. We offer an open-source implementation of this method upon request, which can be integrated into existing tool sets and applied to general diagnostic exams.
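
    The two stages described above — labelling vessel segments by local color contrast without vessel tracking, then computing the AVR from the labels — can be sketched as follows. The median-contrast split and the assumption that arteries appear lighter than veins are illustrative, not the authors' exact weighting of the red, green, and blue channels:

```python
import numpy as np

def classify_segments(segment_lightness, background_lightness):
    """Label each vessel segment by its contrast against the local
    background; the lighter half is taken as arteries (an assumption
    standing in for the paper's learned contrast discriminator)."""
    contrast = np.asarray(segment_lightness) - np.asarray(background_lightness)
    return np.where(contrast > np.median(contrast), "artery", "vein")

def avr(widths, labels):
    """Arterial-venous ratio: mean artery width over mean vein width."""
    widths = np.asarray(widths, dtype=float)
    labels = np.asarray(labels)
    return widths[labels == "artery"].mean() / widths[labels == "vein"].mean()
```

With toy lightness values [0.8, 0.7, 0.3, 0.2] against a uniform 0.5 background, the two lighter segments are labelled arteries, and the AVR follows directly from the segment widths.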

  9. WRF-TMH: predicting transmembrane helix by fusing composition index and physicochemical properties of amino acids.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2013-05-01

    Membrane protein is the prime constituent of a cell, which performs a role of mediator between intra- and extracellular processes. The prediction of transmembrane (TM) helix and its topology provides essential information regarding the function and structure of membrane proteins. However, prediction of TM helix and its topology is a challenging issue in bioinformatics and computational biology due to experimental complexities and the lack of established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using a compositional index and physicochemical properties. The redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. A weighted random forest is adopted as the classification approach. We have used two benchmark datasets, including low- and high-resolution datasets. Tenfold cross-validation is employed to assess the performance of the WRF-TMH model at different levels, including per protein, per segment, and per residue. The success rates of the WRF-TMH model are quite promising and are the best reported so far on the same datasets. It is observed that the WRF-TMH model might play a substantial role, and will provide essential information for further structural and functional studies on membrane proteins. The accompanying web predictor is accessible at http://111.68.99.218/WRF-TMH/ .
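
    The pipeline shape described above — feature reduction by singular value decomposition, then a weighted random forest evaluated under tenfold cross validation — can be sketched with scikit-learn on synthetic data. The balanced class_weight stands in for the paper's weighted random forest, and all sizes here are illustrative, not the authors' settings:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))      # stand-in for fused compositional/physicochemical features
y = rng.integers(0, 2, size=200)    # stand-in TM-helix / non-helix labels

model = make_pipeline(
    TruncatedSVD(n_components=20, random_state=0),   # drop redundant/irrelevant dimensions
    RandomForestClassifier(n_estimators=100, class_weight="balanced",
                           random_state=0),          # approximates the weighted random forest
)
scores = cross_val_score(model, X, y, cv=10)         # tenfold cross validation
```

On real data the cross-validation folds would be scored per protein, per segment, and per residue rather than with plain accuracy.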

  10. PSOFuzzySVM-TMH: identification of transmembrane helix segments using ensemble feature space by incorporated fuzzy support vector machine.

    PubMed

    Hayat, Maqsood; Tahir, Muhammad

    2015-08-01

    Membrane protein is a central component of the cell that manages intra- and extracellular processes. Membrane proteins execute a diversity of functions that are vital for the survival of organisms. The topology of transmembrane proteins describes the number of transmembrane (TM) helix segments and their orientation. However, owing to the lack of recognized structures, the identification of TM helices and their topology through experimental methods is laborious with low throughput. In order to identify TM helix segments reliably, accurately, and effectively from topogenic sequences, we propose the PSOFuzzySVM-TMH model. In this model, evolutionary information (position-specific scoring matrix) and discrete information (6-letter exchange group) are used to formulate transmembrane protein sequences. The noisy and extraneous attributes are eradicated from both feature spaces using an optimization selection technique, particle swarm optimization. Finally, the selected feature spaces are combined in order to form an ensemble feature space. A fuzzy support vector machine is utilized as the classification algorithm. Two benchmark datasets, including low- and high-resolution datasets, are used. At various levels, the performance of the PSOFuzzySVM-TMH model is assessed through a 10-fold cross-validation test. The empirical results reveal that the proposed PSOFuzzySVM-TMH framework achieves superior classification performance on the examined datasets. It is ascertained that the proposed model might be a useful and high-throughput tool for the academic and research community for further structural and functional studies on transmembrane proteins.
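
    The particle swarm optimization step used above to discard noisy attributes can be sketched as a binary PSO over 0/1 feature masks. The separability fitness below is a stand-in (the paper optimizes classifier performance), and the swarm size, iteration count, and coefficients are conventional defaults, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))   # toy ensemble feature space
X[:50, :5] -= 1.0                # make the first 5 features informative
X[50:, :5] += 1.0
y = np.repeat([0, 1], 50)

def fitness(mask):
    """Hypothetical separability score: mean absolute gap between the two
    class means over the selected features."""
    if not mask.any():
        return 0.0
    sel = X[:, mask]
    return float(np.abs(sel[y == 0].mean(0) - sel[y == 1].mean(0)).mean())

# Binary PSO: velocities are real-valued; positions are 0/1 masks sampled
# through a sigmoid of the velocity.
n_particles, n_features, n_iter = 20, 30, 30
pos = rng.integers(0, 2, size=(n_particles, n_features)).astype(bool)
vel = rng.normal(size=(n_particles, n_features))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()
for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_features))
    vel = 0.7 * vel + 1.5 * r1 * (pbest ^ pos) + 1.5 * r2 * (gbest ^ pos)
    pos = rng.random((n_particles, n_features)) < 1.0 / (1.0 + np.exp(-vel))
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()
```

The final gbest mask is the selected feature subset that would be handed to the downstream classifier.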

  11. Local receptive field constrained stacked sparse autoencoder for classification of hyperspectral images.

    PubMed

    Wan, Xiaoqing; Zhao, Chunhui

    2017-06-01

    As a competitive machine learning algorithm, the stacked sparse autoencoder (SSA) has achieved outstanding popularity in exploiting high-level features for classification of hyperspectral images (HSIs). In general, in the SSA architecture, the nodes between adjacent layers are fully connected and need to be iteratively fine-tuned during the pretraining stage; however, nodes of previous layers that are further away may be less likely to have a dense correlation to a given node of subsequent layers. Therefore, to reduce the classification error and increase the learning rate, this paper proposes a general framework of locally connected SSA; that is, the biologically inspired local receptive field (LRF) constrained SSA architecture is employed to simultaneously characterize the local correlations of spectral features and extract high-level feature representations of hyperspectral data. In addition, the appropriate receptive field constraint is concurrently updated by measuring the spatial distances from the neighbor nodes to the corresponding node. Finally, an efficient random forest classifier is cascaded to the last hidden layer of the SSA architecture as a benchmark classifier. Experimental results on two real HSI datasets demonstrate that the proposed hierarchical LRF constrained stacked sparse autoencoder and random forest (SSARF) provides encouraging results compared with other methods, for instance, improvements in overall accuracy ranging from 0.72% to 10.87% for the Indian Pines dataset and from 0.74% to 7.90% for the Kennedy Space Center dataset; moreover, it requires less running time than similar SSARF-based methodologies.
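
    The local receptive field constraint amounts to a sparse 0/1 connectivity mask between spectral bands and hidden units, replacing the fully connected layer. A minimal sketch of building and applying such a mask — the fixed-window construction is illustrative, not the paper's distance-based update rule:

```python
import numpy as np

def lrf_mask(n_bands, n_hidden, radius):
    """0/1 connectivity mask: hidden unit j is connected only to spectral
    bands within `radius` of its centre band, so distant, weakly
    correlated bands stay disconnected."""
    centres = np.linspace(0, n_bands - 1, n_hidden)
    bands = np.arange(n_bands)
    return (np.abs(bands[None, :] - centres[:, None]) <= radius).astype(float)

mask = lrf_mask(n_bands=200, n_hidden=50, radius=10)        # 200 toy spectral bands
W = np.random.default_rng(0).normal(size=(50, 200)) * mask  # masked encoder weights
```

During pretraining, multiplying both the weights and their gradients by the same mask keeps the locality constraint intact across updates.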

  12. Network information improves cancer outcome prediction.

    PubMed

    Roy, Janine; Winter, Christof; Isik, Zerrin; Schroeder, Michael

    2014-07-01

    Disease progression in cancer can vary substantially between patients. Yet, patients often receive the same treatment. Recently, there has been much work on predicting disease progression and patient outcome variables from gene expression in order to personalize treatment options. Despite the first diagnostic kits on the market, there are open problems such as the choice of random gene signatures or noisy expression data. One approach to deal with these two problems employs protein-protein interaction networks and ranks genes using the random surfer model of Google's PageRank algorithm. In this work, we created a benchmark dataset collection comprising 25 cancer outcome prediction datasets from the literature and systematically evaluated the use of networks and a PageRank derivative, NetRank, for signature identification. We show that NetRank performs significantly better than classical methods such as fold change or t-test. Despite an order-of-magnitude difference in network size, a regulatory and a protein-protein interaction network perform equally well. Experimental evaluation on cancer outcome prediction in all of the 25 underlying datasets suggests that the network-based methodology identifies highly overlapping signatures over all cancer types, in contrast to classical methods that fail to identify highly common gene sets across the same cancer types. Integration of network information into gene expression analysis allows the identification of more reliable and accurate biomarkers and provides a deeper understanding of processes occurring in cancer development and progression. © The Author 2012. Published by Oxford University Press.
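
    The random surfer idea described above — each gene's rank mixing its own expression-based score with the ranks of its interaction-network neighbours — can be sketched by power iteration. The damping value, toy network, and scores below are illustrative assumptions, not NetRank's published parameters:

```python
import numpy as np

def netrank(adj, gene_scores, d=0.5, tol=1e-10):
    """Iterate r = (1 - d) * s + d * M @ r, where M is the column-normalised
    adjacency matrix and s the normalised per-gene score (e.g. a fold
    change or t-statistic). d = 0.5 is an illustrative damping choice."""
    s = np.asarray(gene_scores, dtype=float)
    s = s / s.sum()
    out_deg = adj.sum(axis=0)
    M = adj / np.where(out_deg == 0, 1.0, out_deg)   # column-stochastic
    r = np.full(len(s), 1.0 / len(s))
    while True:
        r_new = (1 - d) * s + d * (M @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Toy PPI network: gene 2 has a weak score but interacts with two strong genes.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
r = netrank(adj, [3.0, 3.0, 0.5])
```

Ranking genes by r and taking the top of the list yields the candidate signature; the weakly scored but well-connected gene ends up ranked higher than its raw score alone would suggest.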

  13. GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains

    PubMed Central

    Lu, Zhiyong

    2015-01-01

    The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available and these tools are typically restricted to detecting only mention level gene names or only document level gene identifiers. In this work, we report GNormPlus: an end-to-end and open source system that handles both gene mention and identifier detection. We created a new corpus of 694 PubMed articles to support our development of GNormPlus, containing manual annotations for not only gene names and their identifiers, but also closely related concepts useful for gene name disambiguation, such as gene families and protein domains. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset. The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator. PMID:26380306
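
    The F1-scores quoted above are the harmonic mean of precision and recall over the predicted gene identifiers; for reference, using toy counts rather than the BioCreative tallies:

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision (tp / (tp + fp)) and
    recall (tp / (tp + fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

f1(tp=80, fp=20, fn=20)  # precision = recall = 0.8, so F1 = 0.8
```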

  14. A community resource benchmarking predictions of peptide binding to MHC-I molecules.

    PubMed

    Peters, Bjoern; Bui, Huynh-Hoa; Frankild, Sune; Nielson, Morten; Lundegaard, Claus; Kostem, Emrah; Basch, Derek; Lamberth, Kasper; Harndahl, Mikkel; Fleri, Ward; Wilson, Stephen S; Sidney, John; Lund, Ole; Buus, Soren; Sette, Alessandro

    2006-06-09

    Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions, mainly due to its ability to generalize even from a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.
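
    The "matrix-based prediction methods" mentioned above score a peptide by summing position-specific residue weights. A minimal sketch, using an invented three-position matrix rather than a real MHC allele model:

```python
# Position-specific scoring matrix (PSSM) sketch of matrix-based peptide
# binding prediction. The weights below are toy values, not a trained model.

pssm = [  # one dict of residue -> weight per peptide position
    {"A": 1.2, "K": -0.5, "L": 0.3},
    {"A": 0.1, "K": 0.8, "L": -0.2},
    {"A": -0.4, "K": 0.0, "L": 1.5},
]

def score_peptide(peptide, matrix, default=-1.0):
    """Sum per-position residue weights; unseen residues get a penalty."""
    return sum(pos.get(res, default) for pos, res in zip(matrix, peptide))

binder, nonbinder = score_peptide("AKL", pssm), score_peptide("KAL", pssm)
```

    Because the score is a simple sum, such matrices are fast enough to scan whole proteomes, but they cannot model interactions between positions; that is where neural network methods gain their edge.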

  15. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk.

    PubMed

    Cheng, Liang; Jiang, Yue; Ju, Hong; Sun, Jie; Peng, Jiajie; Zhou, Meng; Hu, Yang

    2018-01-19

    Since the establishment of the first biomedical ontology, the Gene Ontology (GO), the number of biomedical ontologies has increased dramatically. Nowadays, over 300 ontologies have been built, including the extensively used Disease Ontology (DO) and Human Phenotype Ontology (HPO). Because of its advantage in identifying novel relationships between terms, calculating similarity between ontology terms is one of the major tasks in this research area. Though similarities between terms within each ontology have been studied with in silico methods, term similarities across different ontologies have not been investigated as deeply. The latest method took advantage of a gene functional interaction network (GFIN) to explore such inter-ontology term similarities. However, it used only gene interactions and failed to make full use of the connectivity among the gene nodes of the network. In addition, all existing methods were designed specifically for GO, and their performance on the wider ontology community remains unknown. We propose a method, InfAcrOnt, to infer similarities between terms across ontologies utilizing the entire GFIN. InfAcrOnt builds a term-gene-gene network comprising ontology annotations and the GFIN, and acquires similarities between terms across ontologies by modeling the information flow within the network as a random walk. In our benchmark experiments on sub-ontologies of GO, InfAcrOnt achieves a high average area under the receiver operating characteristic curve (AUC) (0.9322 and 0.9309) and low standard deviations (1.8746e-6 and 3.0977e-6) on both the human and yeast benchmark datasets, exhibiting superior performance. Meanwhile, comparisons of InfAcrOnt results with prior knowledge on pair-wise DO-HPO terms and pair-wise DO-GO terms show high correlations. The experimental results show that InfAcrOnt significantly improves the performance of inferring similarities between terms across ontologies on the benchmark sets.
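
    The core idea, modeling information flow with a random walk over a term-gene network, can be sketched as a walk with restart on a toy graph: start a walker at one ontology term, let probability diffuse through shared annotated genes, and read off the stationary probability at another term as a similarity signal. The term and gene names below are placeholders; InfAcrOnt's actual network construction and normalization are more involved.

```python
# Random walk with restart on a toy term-gene graph. Terms from different
# ontologies (DO:x, HPO:y, HPO:z) are linked through shared gene annotations.

def walk_with_restart(graph, start, restart=0.3, iters=200):
    """Iterate p <- restart*e_start + (1-restart)*W p until (near) stationary."""
    prob = {n: 0.0 for n in graph}
    prob[start] = 1.0
    for _ in range(iters):
        new = {n: restart if n == start else 0.0 for n in graph}
        for node, p in prob.items():
            share = (1 - restart) * p / len(graph[node])
            for nb in graph[node]:
                new[nb] += share
        prob = new
    return prob

g = {"DO:x": {"g1", "g2"}, "HPO:y": {"g2", "g3"}, "HPO:z": {"g3"},
     "g1": {"DO:x"}, "g2": {"DO:x", "HPO:y"}, "g3": {"HPO:y", "HPO:z"}}
p = walk_with_restart(g, "DO:x")
```

    Here "DO:x" shares an annotated gene directly with "HPO:y" but reaches "HPO:z" only through a longer path, so the walk assigns it a higher similarity to "HPO:y", which is exactly the kind of cross-ontology signal the abstract describes.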

  16. 3D Modeling of Industrial Heritage Building Using COTSs System: Test, Limits and Performances

    NASA Astrophysics Data System (ADS)

    Piras, M.; Di Pietra, V.; Visintini, D.

    2017-08-01

    The role of UAV systems in applied geomatics is continuously increasing in several applications, such as inspection, surveying and geospatial data acquisition. This evolution is mainly due to two factors: new technologies and new algorithms for data processing. Regarding technologies, commercial UAVs, even COTSs (Commercial Off-The-Shelf) systems, have been in very wide use for some years. Moreover, these UAVs make it easy to acquire oblique images, giving the possibility to overcome the limitations of the nadir approach related to the field of view and occlusions. In order to test the potential and issues of COTSs systems, the Italian Society of Photogrammetry and Topography (SIFET) has organised SBM2017, a benchmark in which everyone can participate in a shared experience. This benchmark, called "Photogrammetry with oblique images from UAV: potentialities and challenges", makes it possible to collect feedback from users, highlight the potential of these systems, define the critical aspects and technological challenges, and compare distinct approaches and software. The case study is the "Fornace Penna" in Scicli (Ragusa, Italy), an inaccessible monument of industrial architecture from the early 1900s. The datasets (images and video) were acquired with three different UAV systems: Parrot Bebop 2, DJI Phantom 4 and Flytop Flynovex. The aim of this benchmark is to generate the 3D model of the "Fornace Penna", analysing different software, imaging geometries and processing strategies. This paper describes the surveying strategies, the methodologies and five different photogrammetric results obtained (sensor calibration, external orientation, dense point cloud and two orthophotos), using separately the single images and the frames extracted from the video acquired with the DJI system.

  17. Multi-task learning with group information for human action recognition

    NASA Astrophysics Data System (ADS)

    Qian, Li; Wu, Song; Pu, Nan; Xu, Shulin; Xiao, Guoqiang

    2018-04-01

    Human action recognition is an important and challenging task in computer vision research, owing to variations in human motion performance, interpersonal differences and recording settings. In this paper, we propose a novel multi-task learning framework with group information (MTL-GI) for accurate and efficient human action recognition. Specifically, we first obtain group information by calculating the mutual information implied by the latent relationship between Gaussian components and action categories, and by clustering similar action categories into the same group using affinity propagation. In addition, in order to explore the relationships among related tasks, we incorporate this group information into multi-task learning. Experimental results on two popular benchmarks (the UCF50 and HMDB51 datasets) demonstrate the superiority of our proposed MTL-GI framework.

  18. A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model

    PubMed Central

    2017-01-01

    The discovery of cis-regulatory modules (CRMs) is key to understanding the mechanisms of transcription regulation. Since CRMs have specific regulatory structures that are the basis for the regulation of gene expression, how the regulatory structure of CRMs is modeled has a considerable impact on the performance of CRM identification. This paper proposes a CRM discovery algorithm called ComSPS. ComSPS builds a regulatory structure model of CRMs based on a hidden Markov model (HMM) by exploring the rules of the CRM transcriptional grammar that governs the internal arrangement of motif sites within CRMs. We test ComSPS on three benchmark datasets and compare it with five existing methods. Experimental results show that ComSPS outperforms them. PMID:28497059
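
    As a toy illustration of the HMM machinery involved (not ComSPS's actual state structure, whose states encode CRM grammar), a two-state model separating background sequence from motif sites can be decoded with the Viterbi algorithm. All probabilities below are invented for the sketch.

```python
# Two-state HMM (background vs. motif site) decoded with Viterbi over a
# DNA string. Transition and emission probabilities are toy values.
import math

states = ("bg", "site")
trans = {"bg": {"bg": 0.8, "site": 0.2}, "site": {"bg": 0.3, "site": 0.7}}
emit = {"bg": {c: 0.25 for c in "ACGT"},                      # uniform background
        "site": {"A": 0.45, "C": 0.05, "G": 0.45, "T": 0.05}}  # AG-rich motif

def viterbi(seq):
    """Return the most probable state sequence for seq (uniform start probs)."""
    v = [{s: math.log(0.5) + math.log(emit[s][seq[0]]) for s in states}]
    back = []
    for ch in seq[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            ptr[s] = prev
            col[s] = v[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][ch])
        v.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

labels = viterbi("TTTTAGAGAGTTTT")
```

    On this sequence the AG-rich stretch in the middle is labeled as a motif site while the flanking T-runs stay background; a CRM model generalizes this by wiring many such site states together according to the module's internal grammar.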

  19. An iterative bidirectional heuristic placement algorithm for solving the two-dimensional knapsack packing problem

    NASA Astrophysics Data System (ADS)

    Shiangjen, Kanokwatt; Chaijaruwanich, Jeerayut; Srisujjalertwaja, Wijak; Unachak, Prakarn; Somhom, Samerkae

    2018-02-01

    This article presents an efficient heuristic placement algorithm, namely a bidirectional heuristic placement, for solving the two-dimensional rectangular knapsack packing problem. The heuristic maximizes space utilization by fitting the most appropriate rectangle from both side walls of the current residual space, layer by layer. An iterative local search with a shift strategy is developed and applied to the heuristic to balance exploitation and exploration in the solution space without tuning any parameters. Experimental results on packing problems of many scales show that this approach produces high-quality solutions for most of the benchmark datasets, especially for large-scale problems, within a reasonable amount of computational time.
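
    The layer-by-layer placement idea can be sketched with a basic first-fit shelf heuristic; note that this simplified version fills each layer from one wall only, omitting the paper's bidirectional filling, shift strategy and iterative local search. The strip width and rectangle sizes are arbitrary examples.

```python
# First-fit "shelf" sketch of layer-by-layer rectangle placement into a
# fixed-width strip. Not the paper's bidirectional heuristic.

def shelf_pack(rects, strip_width):
    """rects: list of (w, h). Returns placements [(x, y, w, h)] and used height."""
    rects = sorted(rects, key=lambda r: r[1], reverse=True)  # tallest first
    placements, shelves = [], []   # each shelf: [y, shelf_height, x_cursor]
    for w, h in rects:
        for shelf in shelves:
            # first shelf with enough remaining width and enough height
            if shelf[2] + w <= strip_width and h <= shelf[1]:
                placements.append((shelf[2], shelf[0], w, h))
                shelf[2] += w
                break
        else:  # no shelf fits: open a new layer on top of the previous ones
            y = shelves[-1][0] + shelves[-1][1] if shelves else 0
            shelves.append([y, h, w])
            placements.append((0, y, w, h))
    used_height = shelves[-1][0] + shelves[-1][1] if shelves else 0
    return placements, used_height

pts, height = shelf_pack([(4, 3), (3, 3), (2, 2), (5, 2)], strip_width=10)
```

    Filling each layer from both walls, as the paper proposes, reduces the fragmented gap that a one-sided cursor leaves at the end of a shelf, and the iterative local search then reorders rectangles to escape poor greedy choices.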

  20. Volatility forecasting for low-volatility portfolio selection in the US and the Korean equity markets

    NASA Astrophysics Data System (ADS)

    Kim, Saejoon

    2018-01-01

    We consider the problem of low-volatility portfolio selection, which has been the subject of extensive research in the field of portfolio selection. To improve currently existing techniques that rely purely on past information to select low-volatility portfolios, this paper investigates the use of time series regression techniques that forecast future volatility to select the portfolios. In particular, for the first time, the utility of support vector regression and its enhancements as portfolio selection techniques is demonstrated. It is shown that our regression-based portfolio selection delivers attractive outperformance relative to the benchmark index and to a portfolio defined by a well-known strategy on the S&P 500 and KOSPI 200 datasets.
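
    The forecast-then-select idea can be sketched as follows, with plain least-squares AR(1) regression standing in for the paper's support vector regression, and invented volatility histories in place of market data.

```python
# Forecast each asset's next-period volatility from its own history, then
# pick the assets with the lowest forecasts. AR(1) least squares stands in
# for the paper's support vector regression; histories are toy values.

def ar1_fit(series):
    """Least-squares fit of y[t+1] = a + b * y[t]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def select_low_vol(vol_histories, k=1):
    """Return the k asset names with the lowest one-step-ahead forecasts."""
    forecasts = {}
    for asset, series in vol_histories.items():
        a, b = ar1_fit(series)
        forecasts[asset] = a + b * series[-1]
    return sorted(forecasts, key=forecasts.get)[:k]

histories = {"calm": [0.10, 0.11, 0.10, 0.11, 0.10],
             "wild": [0.30, 0.35, 0.32, 0.36, 0.34]}
picked = select_low_vol(histories, k=1)
```

    Swapping the regressor is the point of the paper: a nonlinear forecaster such as SVR can capture volatility dynamics that a purely backward-looking estimate (or this linear stand-in) misses, while the selection step stays the same.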
