Sample records for randomly selected subset

  1. Unbiased feature selection in learning random forests for high-dimensional data.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.

  2. Transferability of optimally-selected climate models in the quantification of climate change impacts on hydrology

    NASA Astrophysics Data System (ADS)

    Chen, Jie; Brissette, François P.; Lucas-Picher, Philippe

    2016-11-01

    Given the ever increasing number of climate change simulations being carried out, it has become impractical to use all of them to cover the uncertainty of climate change impacts. Various methods have been proposed to optimally select subsets of a large ensemble of climate simulations for impact studies. However, the behaviour of optimally-selected subsets of climate simulations for climate change impacts is unknown, since the transfer process from climate projections to the impact study world is usually highly non-linear. Consequently, this study investigates the transferability of optimally-selected subsets of climate simulations in the case of hydrological impacts. Two different methods were used for the optimal selection of subsets of climate scenarios, and both were found to be capable of adequately representing the spread of selected climate model variables contained in the original large ensemble. However, in both cases, the optimal subsets had limited transferability to hydrological impacts. To capture a similar variability in the impact model world, many more simulations have to be used than those that are needed to simply cover variability from the climate model variables' perspective. Overall, both optimal subset selection methods were better than random selection when small subsets were selected from a large ensemble for impact studies. However, as the number of selected simulations increased, random selection often performed better than the two optimal methods. To ensure adequate uncertainty coverage, the results of this study imply that selecting as many climate change simulations as possible is the best avenue. Where this was not possible, the two optimal methods were found to perform adequately.

  3. Computerized stratified random site-selection approaches for design of a ground-water-quality sampling network

    USGS Publications Warehouse

    Scott, J.C.

    1990-01-01

    Computer software was written to randomly select sites for a ground-water-quality sampling network. The software uses digital cartographic techniques and subroutines from a proprietary geographic information system. The report presents the approaches, computer software, and sample applications. It is often desirable to collect ground-water-quality samples from various areas in a study region that have different values of a spatial characteristic, such as land-use or hydrogeologic setting. A stratified network can be used for testing hypotheses about relations between spatial characteristics and water quality, or for calculating statistical descriptions of water-quality data that account for variations that correspond to the spatial characteristic. In the software described, a study region is subdivided into areal subsets that have a common spatial characteristic to stratify the population into several categories from which sampling sites are selected. Different numbers of sites may be selected from each category of areal subsets. A population of potential sampling sites may be defined by either specifying a fixed population of existing sites, or by preparing an equally spaced population of potential sites. In either case, each site is identified with a single category, depending on the value of the spatial characteristic of the areal subset in which the site is located. Sites are selected from one category at a time. One of two approaches may be used to select sites. Sites may be selected randomly, or the areal subsets in the category can be grouped into cells and sites selected randomly from each cell.
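
    The site-selection procedure described above (stratify potential sites by a spatial characteristic, then draw sites from one category at a time) can be illustrated with a short sketch. This is not the USGS software, which relies on a proprietary GIS; it is a minimal Python illustration of the simpler of the two selection approaches (purely random selection within each category), with hypothetical site identifiers and category quotas.

```python
import random

def stratified_site_selection(sites, n_per_category, seed=0):
    """Randomly select a fixed number of sampling sites from each category
    (stratum) of a potential-site population.

    sites          -- list of (site_id, category) tuples
    n_per_category -- dict mapping category -> number of sites to draw
    """
    rng = random.Random(seed)
    selected = {}
    for category, n in n_per_category.items():
        pool = [sid for sid, cat in sites if cat == category]
        selected[category] = rng.sample(pool, min(n, len(pool)))
    return selected

# Hypothetical example: draw more sites from the agricultural stratum.
sites = [(f"well_{i}", "agricultural" if i % 2 else "urban") for i in range(40)]
print(stratified_site_selection(sites, {"agricultural": 5, "urban": 3}))
```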

  4. Effects of Sample Selection on Estimates of Economic Impacts of Outdoor Recreation

    Treesearch

    Donald B.K. English

    1997-01-01

    Estimates of the economic impacts of recreation often come from spending data provided by a self-selected subset of a random sample of site visitors. The subset is frequently less than half the onsite sample. Biased vectors of per trip spending and impact estimates can result if self-selection is related to spending patterns, and proper corrective procedures are not...

  5. Data-driven confounder selection via Markov and Bayesian networks.

    PubMed

    Häggström, Jenny

    2018-06-01

    To unbiasedly estimate a causal effect on an outcome unconfoundedness is often assumed. If there is sufficient knowledge on the underlying causal structure then existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates, X, sufficient for unconfoundedness, if such subsets exist. Here, estimation of these target subsets is considered when the underlying causal structure is unknown. The proposed method is to model the causal structure by a probabilistic graphical model, for example, a Markov or Bayesian network, estimate this graph from observed data and select the target subsets given the estimated graph. The approach is evaluated by simulation both in a high-dimensional setting where unconfoundedness holds given X and in a setting where unconfoundedness only holds given subsets of X. Several common target subsets are investigated and the selected subsets are compared with respect to accuracy in estimating the average causal effect. The proposed method is implemented with existing software that can easily handle high-dimensional data, in terms of large samples and large number of covariates. The results from the simulation study show that, if unconfoundedness holds given X, this approach is very successful in selecting the target subsets, outperforming alternative approaches based on random forests and LASSO, and that the subset estimating the target subset containing all causes of outcome yields smallest MSE in the average causal effect estimation. © 2017, The International Biometric Society.

  6. Comparative study of feature selection with ensemble learning using SOM variants

    NASA Astrophysics Data System (ADS)

    Filali, Ameni; Jlassi, Chiraz; Arous, Najet

    2017-03-01

    Ensemble learning has improved the stability and accuracy of clustering, but its runtime prevents it from scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster of a dataset. The proposed method is an extension of the Random Forests approach, using self-organizing map (SOM) variants on unlabeled data, that estimates the out-of-bag feature importance from a set of partitions. Each partition is created using a different bootstrap sample and a random subset of the features. We show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. The approach aims at dimensionality reduction, visualization and cluster characterization at the same time. Empirical results on nineteen benchmark data sets indicate that RFS can lead to significant improvements in clustering accuracy over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.

  7. Defining an essence of structure determining residue contacts in proteins.

    PubMed

    Sathyapriya, R; Duarte, Jose M; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-12-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this "structural essence" has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts-such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed "cone-peeling" that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 A Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This "structural essence" opens new avenues in the fields of structure prediction, empirical potentials and docking.

  8. Defining an Essence of Structure Determining Residue Contacts in Proteins

    PubMed Central

    Sathyapriya, R.; Duarte, Jose M.; Stehr, Henning; Filippis, Ioannis; Lappe, Michael

    2009-01-01

    The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to-date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Ca RMSD). It is not only of theoretical interest (i.e., for design of advanced statistical potentials) to identify the number and nature of essential native contacts—such a subset of spatial constraints is very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test-set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, series of random subsets (ranging from 10% to 90% of native contacts) are generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling” that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Ca RMSD with as little as 8% of the native contacts (Ca-Ca and Cb-Cb). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking. PMID:19997489

  9. Sample size determination for bibliographic retrieval studies

    PubMed Central

    Yao, Xiaomei; Wilczynski, Nancy L; Walter, Stephen D; Haynes, R Brian

    2008-01-01

    Background Research for developing search strategies to retrieve high-quality clinical journal articles from MEDLINE is expensive and time-consuming. The objective of this study was to determine the minimal number of high-quality articles in a journal subset that would need to be hand-searched to update or create new MEDLINE search strategies for treatment, diagnosis, and prognosis studies. Methods The desired width of the 95% confidence intervals (W) for the lowest sensitivity among existing search strategies was used to calculate the number of high-quality articles needed to reliably update search strategies. New search strategies were derived in journal subsets formed by 2 approaches: random sampling of journals and top journals (having the most high-quality articles). The new strategies were tested in both the original large journal database and in a low-yielding journal (having few high-quality articles) subset. Results For treatment studies, if W was 10% or less for the lowest sensitivity among our existing search strategies, a subset of 15 randomly selected journals or 2 top journals were adequate for updating search strategies, based on each approach having at least 99 high-quality articles. The new strategies derived in 15 randomly selected journals or 2 top journals performed well in the original large journal database. Nevertheless, the new search strategies developed using the random sampling approach performed better than those developed using the top journal approach in a low-yielding journal subset. For studies of diagnosis and prognosis, no journal subset had enough high-quality articles to achieve the expected W (10%). Conclusion The approach of randomly sampling a small subset of journals that includes sufficient high-quality articles is an efficient way to update or create search strategies for high-quality articles on therapy in MEDLINE. The concentrations of diagnosis and prognosis articles are too low for this approach. PMID:18823538
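
    The sample-size logic in this abstract (choose the number of high-quality articles so that a 95% confidence interval for the lowest observed sensitivity has a desired width W) can be sketched with the usual normal-approximation formula for a binomial proportion. The paper's exact derivation is not given in the abstract, so the formula and the sensitivity value below are assumptions for illustration only.

```python
from math import ceil

def articles_needed(sensitivity, full_width, z=1.96):
    """Articles required so that the 95% CI for a sensitivity estimate has full
    width at most W, using the normal approximation n = z^2 p(1-p) / (W/2)^2."""
    half_width = full_width / 2
    return ceil(z**2 * sensitivity * (1 - sensitivity) / half_width**2)

# Hypothetical example: lowest existing sensitivity 0.93, target CI width 10%.
print(articles_needed(0.93, 0.10))   # -> 101 articles under these assumptions
```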

  10. Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE.

    PubMed

    Chen, Qi; Meng, Zhaopeng; Liu, Xinyi; Jin, Qianguo; Su, Ran

    2018-06-15

    Feature selection, which identifies a set of most informative features from the original feature space, has been widely used to simplify the predictor. Recursive feature elimination (RFE), as one of the most popular feature selection approaches, is effective in data dimension reduction and efficiency increase. A ranking of features, as well as candidate subsets with the corresponding accuracy, is produced through RFE. The subset with highest accuracy (HA) or a preset number of features (PreNum) are often used as the final subset. However, this may lead to a large number of features being selected, or if there is no prior knowledge about this preset number, it is often ambiguous and subjective regarding final subset selection. A proper decision variant is in high demand to automatically determine the optimal subset. In this study, we conduct pioneering work to explore the decision variant after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants to automatically select the optimal feature subset. Random forest (RF)-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two totally different molecular biology datasets, one for a toxicogenomic study and the other one for protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
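
    The RF-RFE loop itself (rank features with a random forest, drop the weakest, and record each candidate subset with its accuracy) is straightforward to sketch. The snippet below uses scikit-learn and synthetic data, and it applies only the simplest decision variant, highest accuracy (HA); it is not the authors' voting strategy or their exact protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for a molecular-biology feature matrix.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8,
                           random_state=0)
remaining = list(range(X.shape[1]))
candidates = []          # (feature subset, cross-validated accuracy) pairs

while remaining:
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    acc = cross_val_score(rf, X[:, remaining], y, cv=5).mean()
    candidates.append((list(remaining), acc))
    rf.fit(X[:, remaining], y)
    weakest = remaining[np.argmin(rf.feature_importances_)]
    remaining.remove(weakest)   # eliminate the least important feature

# Simplest decision variant: keep the subset with the highest accuracy (HA).
best_subset, best_acc = max(candidates, key=lambda c: c[1])
print(len(best_subset), round(best_acc, 3))
```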

  11. An Active RBSE Framework to Generate Optimal Stimulus Sequences in a BCI for Spelling

    NASA Astrophysics Data System (ADS)

    Moghadamfalahi, Mohammad; Akcakaya, Murat; Nezamfar, Hooman; Sourati, Jamshid; Erdogmus, Deniz

    2017-10-01

    A class of brain computer interfaces (BCIs) employs noninvasive recordings of electroencephalography (EEG) signals to enable users with severe speech and motor impairments to interact with their environment and social network. For example, EEG-based BCIs for typing popularly utilize event related potentials (ERPs) for inference. Presentation paradigm design in current ERP-based letter-by-letter typing BCIs typically queries the user with an arbitrary subset of characters. However, both typing accuracy and typing speed can potentially be enhanced with more informed subset selection and flash assignment. In this manuscript, we introduce the active recursive Bayesian state estimation (active-RBSE) framework for inference and sequence optimization. Prior to presentation in each iteration, rather than showing a subset of randomly selected characters, the developed framework optimally selects a subset based on a query function. Selected queries are made adaptively specialized for users during each intent detection. Through a simulation-based study, we assess the effect of active-RBSE on the performance of a language-model assisted typing BCI in terms of typing speed and accuracy. To provide a baseline for comparison, we also utilize standard presentation paradigms, namely the row-and-column matrix presentation paradigm and random rapid serial visual presentation paradigms. The results show that utilization of active-RBSE can enhance the online performance of the system, both in terms of typing accuracy and speed.

  12. Issues Relating to Selective Reporting When Including Non-Randomized Studies in Systematic Reviews on the Effects of Healthcare Interventions

    ERIC Educational Resources Information Center

    Norris, Susan L.; Moher, David; Reeves, Barnaby C.; Shea, Beverley; Loke, Yoon; Garner, Sarah; Anderson, Laurie; Tugwell, Peter; Wells, George

    2013-01-01

    Background: Selective outcome and analysis reporting (SOR and SAR) occur when only a subset of outcomes measured and analyzed in a study is fully reported, and are an important source of potential bias. Key methodological issues: We describe what is known about the prevalence and effects of SOR and SAR in both randomized controlled trials (RCTs)…

  13. Reducing seed dependent variability of non-uniformly sampled multidimensional NMR data

    NASA Astrophysics Data System (ADS)

    Mobli, Mehdi

    2015-07-01

    The application of NMR spectroscopy to study the structure, dynamics and function of macromolecules requires the acquisition of several multidimensional spectra. The one-dimensional NMR time-response from the spectrometer is extended to additional dimensions by introducing incremented delays in the experiment that cause oscillation of the signal along "indirect" dimensions. For a given dimension the delay is incremented at twice the rate of the maximum frequency (Nyquist rate). To achieve high-resolution requires acquisition of long data records sampled at the Nyquist rate. This is typically a prohibitive step due to time constraints, resulting in sub-optimal data records to the detriment of subsequent analyses. The multidimensional NMR spectrum itself is typically sparse, and it has been shown that in such cases it is possible to use non-Fourier methods to reconstruct a high-resolution multidimensional spectrum from a random subset of non-uniformly sampled (NUS) data. For a given acquisition time, NUS has the potential to improve the sensitivity and resolution of a multidimensional spectrum, compared to traditional uniform sampling. The improvements in sensitivity and/or resolution achieved by NUS are heavily dependent on the distribution of points in the random subset acquired. Typically, random points are selected from a probability density function (PDF) weighted according to the NMR signal envelope. In extreme cases as little as 1% of the data is subsampled. The heavy under-sampling can result in poor reproducibility, i.e. when two experiments are carried out where the same number of random samples is selected from the same PDF but using different random seeds. Here, a jittered sampling approach is introduced that is shown to improve random seed dependent reproducibility of multidimensional spectra generated from NUS data, compared to commonly applied NUS methods. It is shown that this is achieved due to the low variability of the inherent sensitivity of the random subset chosen from a given PDF. Finally, it is demonstrated that metrics used to find optimal NUS distributions are heavily dependent on the inherent sensitivity of the random subset, and such optimisation is therefore less critical when using the proposed sampling scheme.
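
    The two sampling ideas contrasted in this abstract, drawing indirect-dimension increments from an envelope-weighted probability density versus constraining the draw so that different random seeds give schedules of similar inherent sensitivity, can be sketched as below. The exponential envelope, the bin construction, and the parameter values are illustrative assumptions, not the paper's exact jittered scheme.

```python
import numpy as np

def weighted_random_schedule(n_total, n_samples, decay=0.02, seed=0):
    """Conventional NUS: draw increments from an exponentially weighted PDF."""
    rng = np.random.default_rng(seed)
    grid = np.arange(n_total)
    pdf = np.exp(-decay * grid)
    pdf /= pdf.sum()
    return np.sort(rng.choice(grid, size=n_samples, replace=False, p=pdf))

def jittered_schedule(n_total, n_samples, decay=0.02, seed=0):
    """Binned ('jittered') variant: split the grid into equal-probability bins
    of the same PDF and draw one increment per bin, so that schedules from
    different seeds have similar inherent sensitivity."""
    rng = np.random.default_rng(seed)
    grid = np.arange(n_total)
    pdf = np.exp(-decay * grid)
    cdf = np.cumsum(pdf) / pdf.sum()
    edges = np.searchsorted(cdf, np.linspace(0, 1, n_samples + 1)[1:-1])
    bins = np.split(grid, edges)
    return np.sort([rng.choice(b) for b in bins if len(b)])

print(weighted_random_schedule(256, 32))
print(jittered_schedule(256, 32))
```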

  14. A Metacommunity Framework for Enhancing the Effectiveness of Biological Monitoring Strategies

    PubMed Central

    Roque, Fabio O.; Cottenie, Karl

    2012-01-01

    Because of inadequate knowledge and funding, the use of biodiversity indicators is often suggested as a way to support management decisions. Consequently, many studies have analyzed the performance of certain groups as indicator taxa. However, in addition to knowing whether certain groups can adequately represent the biodiversity as a whole, we must also know whether they show similar responses to the main structuring processes affecting biodiversity. Here we present an application of the metacommunity framework for evaluating the effectiveness of biodiversity indicators. Although the metacommunity framework has contributed to a better understanding of biodiversity patterns, there is still limited discussion about its implications for conservation and biomonitoring. We evaluated the effectiveness of indicator taxa in representing spatial variation in macroinvertebrate community composition in Atlantic Forest streams, and the processes that drive this variation. We focused on analyzing whether some groups conform to environmental processes and other groups are more influenced by spatial processes, and on how this can help in deciding which indicator group or groups should be used. We showed that a relatively small subset of taxa from the metacommunity would represent 80% of the variation in community composition shown by the entire metacommunity. Moreover, this subset does not have to be composed of predetermined taxonomic groups, but rather can be defined based on random subsets. We also found that some random subsets composed of a small number of genera performed better in responding to major environmental gradients. There were also random subsets that seemed to be affected by spatial processes, which could indicate important historical processes. We were able to integrate in the same theoretical and practical framework, the selection of biodiversity surrogates, indicators of environmental conditions, and more importantly, an explicit integration of environmental and spatial processes into the selection approach. PMID:22937068

  15. VARIABLE SELECTION FOR QUALITATIVE INTERACTIONS IN PERSONALIZED MEDICINE WHILE CONTROLLING THE FAMILY-WISE ERROR RATE

    PubMed Central

    Gunter, Lacey; Zhu, Ji; Murphy, Susan

    2012-01-01

    For many years, subset analysis has been a popular topic for the biostatistics and clinical trials literature. In more recent years, the discussion has focused on finding subsets of genomes which play a role in the effect of treatment, often referred to as stratified or personalized medicine. Though highly sought after, methods for detecting subsets with altering treatment effects are limited and lacking in power. In this article we discuss variable selection for qualitative interactions with the aim to discover these critical patient subsets. We propose a new technique designed specifically to find these interaction variables among a large set of variables while still controlling for the number of false discoveries. We compare this new method against standard qualitative interaction tests using simulations and give an example of its use on data from a randomized controlled trial for the treatment of depression. PMID:22023676

  16. Column Subset Selection, Matrix Factorization, and Eigenvalue Optimization

    DTIC Science & Technology

    2008-07-01

    Pietsch and Grothendieck, which are regarded as basic instruments in modern functional analysis [Pis86]. • The methods for computing these... Pietsch factorization and the maxcut semi- definite program [GW95]. 1.2. Overview. We focus on the algorithmic version of the Kashin–Tzafriri theorem...will see that the desired subset is exposed by factoring the random submatrix. This factorization, which was invented by Pietsch , is regarded as a basic

  17. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    PubMed

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.

  18. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145

  19. Spectral Band Selection for Urban Material Classification Using Hyperspectral Libraries

    NASA Astrophysics Data System (ADS)

    Le Bris, A.; Chehata, N.; Briottet, X.; Paparoditis, N.

    2016-06-01

    In urban areas, information concerning very high resolution land cover and especially material maps is necessary for several city modelling or monitoring applications. That is to say, knowledge concerning the roofing materials or the different kinds of ground areas is required. Airborne remote sensing techniques appear to be convenient for providing such information at a large scale. However, results obtained using most traditional processing methods based on usual red-green-blue-near infrared multispectral images remain limited for such applications. A possible way to improve classification results is to enhance the imagery spectral resolution using superspectral or hyperspectral sensors. In this study, it is intended to design a superspectral sensor dedicated to urban materials classification, and this work particularly focused on the selection of the optimal spectral band subsets for such a sensor. First, reflectance spectral signatures of urban materials were collected from 7 spectral libraries. Then, spectral optimization was performed using this data set. The band selection workflow included two steps, optimising first the number of spectral bands using an incremental method and then examining several possible optimised band subsets using a stochastic algorithm. The same wrapper relevance criterion, relying on a confidence measure of a Random Forests classifier, was used at both steps. To cope with the limited number of available spectra for several classes, additional synthetic spectra were generated from the collection of reference spectra: intra-class variability was simulated by multiplying reference spectra by a random coefficient. In the end, selected band subsets were evaluated considering the classification quality reached using an RBF SVM classifier. It was confirmed that a limited band subset was sufficient to classify common urban materials. The important contribution of bands from the Short Wave Infra-Red (SWIR) spectral domain (1000-2400 nm) to material classification was also shown.
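
    One concrete step in the band-selection workflow, generating synthetic spectra by multiplying reference spectra by a random coefficient to simulate intra-class variability, is easy to sketch. The coefficient range below is an assumption (the abstract does not state it), and the random reference spectra are placeholders for library spectra.

```python
import numpy as np

def augment_spectra(reference_spectra, n_new, coeff_range=(0.9, 1.1), seed=0):
    """Generate synthetic spectra by multiplying randomly chosen reference
    spectra by a random scalar coefficient (simulated intra-class variability).

    reference_spectra -- array of shape (n_ref, n_bands)
    coeff_range       -- assumed range for the random coefficient
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(reference_spectra), size=n_new)
    coeffs = rng.uniform(*coeff_range, size=(n_new, 1))
    return reference_spectra[idx] * coeffs

refs = np.abs(np.random.default_rng(1).normal(size=(5, 200)))  # 5 spectra, 200 bands
print(augment_spectra(refs, 20).shape)   # (20, 200)
```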

  20. Optimized probability sampling of study sites to improve generalizability in a multisite intervention trial.

    PubMed

    Kraschnewski, Jennifer L; Keyserling, Thomas C; Bangdiwala, Shrikant I; Gizlice, Ziya; Garcia, Beverly A; Johnston, Larry F; Gustafson, Alison; Petrovic, Lindsay; Glasgow, Russell E; Samuel-Hodge, Carmen D

    2010-01-01

    Studies of type 2 translation, the adaption of evidence-based interventions to real-world settings, should include representative study sites and staff to improve external validity. Sites for such studies are, however, often selected by convenience sampling, which limits generalizability. We used an optimized probability sampling protocol to select an unbiased, representative sample of study sites to prepare for a randomized trial of a weight loss intervention. We invited North Carolina health departments within 200 miles of the research center to participate (N = 81). Of the 43 health departments that were eligible, 30 were interested in participating. To select a representative and feasible sample of 6 health departments that met inclusion criteria, we generated all combinations of 6 from the 30 health departments that were eligible and interested. From the subset of combinations that met inclusion criteria, we selected 1 at random. Of 593,775 possible combinations of 6 counties, 15,177 (3%) met inclusion criteria. Sites in the selected subset were similar to all eligible sites in terms of health department characteristics and county demographics. Optimized probability sampling improved generalizability by ensuring an unbiased and representative sample of study sites.
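
    The selection protocol in this abstract is explicit enough to reproduce in outline: enumerate every combination of 6 sites from the 30 eligible and interested health departments, keep the combinations that meet the inclusion criteria, and draw one of those at random. The inclusion criterion below is hypothetical; only the enumerate-filter-draw structure comes from the abstract.

```python
import itertools
import random

def select_site_subset(eligible_sites, k, meets_criteria, seed=0):
    """Enumerate all k-site combinations, keep those meeting the inclusion
    criteria, and draw one valid combination at random."""
    valid = [c for c in itertools.combinations(eligible_sites, k)
             if meets_criteria(c)]
    return random.Random(seed).choice(valid), len(valid)

# 30 eligible-and-interested sites, choose 6: 593,775 combinations, as in the
# abstract. The population-based inclusion criterion here is hypothetical.
sites = {f"county_{i}": 20_000 + 5_000 * i for i in range(30)}
criterion = lambda combo: sum(sites[c] for c in combo) > 400_000
chosen, n_valid = select_site_subset(list(sites), 6, criterion)
print(n_valid, chosen)
```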

  1. A new mosaic method for three-dimensional surface

    NASA Astrophysics Data System (ADS)

    Yuan, Yun; Zhu, Zhaokun; Ding, Yongjun

    2011-08-01

    Three-dimensional (3-D) data mosaic is an indispensable link in surface measurement and digital terrain map generation. To address the problem of mosaicking locally unorganized point clouds with coarse registration and many mismatched points, a new mosaic method for 3-D surfaces based on RANSAC is proposed. Each cycle of this method proceeds sequentially through random sampling with an additional shape constraint, data normalization of the point cloud, absolute orientation, data denormalization of the point cloud, inlier counting, etc. After N random sample trials the largest consensus set is selected, and finally the model is re-estimated using all the points in the selected subset. The minimal subset is composed of three non-collinear points which form a triangle. The shape of the triangle is considered during random sample selection in order to make the sample selection reasonable. A new coordinate system transformation algorithm presented in this paper is used to avoid singularity. The whole rotation transformation between the two coordinate systems can be solved by two successive rotations expressed by Euler angle vectors, each with an explicit physical meaning. Both simulation and real data are used to prove the correctness and validity of this mosaic method. The method has better noise immunity due to its robust estimation property, and high accuracy because the shape constraint is added to the random sampling and data normalization is added to the absolute orientation. It is applicable to high-precision measurement of three-dimensional surfaces as well as to 3-D terrain mosaicking.
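
    A generic version of the RANSAC loop described above, random 3-point samples filtered by a collinearity (shape) check, a rigid absolute-orientation fit, inlier counting, and a final re-estimation on the largest consensus set, is sketched below. It omits the paper's data normalization, scale estimation, and Euler-angle decomposition, and the tolerance and trial count are arbitrary.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping points P onto Q
    (Kabsch / absolute-orientation step; scale is omitted for brevity)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    return R, cQ - R @ cP

def ransac_mosaic(P, Q, n_trials=500, inlier_tol=0.05, seed=0):
    """RANSAC over 3-point samples: keep the transform with the largest
    consensus set, then re-estimate it from all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(n_trials):
        idx = rng.choice(len(P), 3, replace=False)
        a, b, c = P[idx]
        # crude shape constraint: skip nearly collinear (degenerate) triangles
        if np.linalg.norm(np.cross(b - a, c - a)) < 1e-6:
            continue
        R, t = rigid_transform(P[idx], Q[idx])
        resid = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = resid < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return rigid_transform(P[best_inliers], Q[best_inliers])

# Synthetic check: a known rotation and translation plus a few gross outliers.
rng = np.random.default_rng(1)
P = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
Q[:10] += rng.normal(scale=5.0, size=(10, 3))   # mismatched points
R_est, t_est = ransac_mosaic(P, Q)
print(np.round(t_est, 2))
```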

  2. An overview of the Columbia Habitat Monitoring Program's (CHaMP) spatial-temporal design framework

    EPA Science Inventory

    We briefly review the concept of a master sample applied to stream networks in which a randomized set of stream sites is selected across a broad region to serve as a list of sites from which a subset of sites is selected to achieve multiple objectives of specific designs. The Col...

  3. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    PubMed Central

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  4. A small number of candidate gene SNPs reveal continental ancestry in African Americans

    PubMed Central

    KODAMAN, NURI; ALDRICH, MELINDA C.; SMITH, JEFFREY R.; SIGNORELLO, LISA B.; BRADLEY, KEVIN; BREYER, JOAN; COHEN, SARAH S.; LONG, JIRONG; CAI, QIUYIN; GILES, JUSTIN; BUSH, WILLIAM S.; BLOT, WILLIAM J.; MATTHEWS, CHARLES E.; WILLIAMS, SCOTT M.

    2013-01-01

    SUMMARY Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K=2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99±0.003; 0.98± 0.01; 0.93±0.03; and 0.81± 0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably. PMID:23278390

  5. Feature selection for the classification of traced neurons.

    PubMed

    López-Cabrera, José D; Lorenzo-Ginori, Juan V

    2018-06-01

    The great availability of computational tools to calculate the properties of traced neurons leads to the existence of many descriptors which allow the automated classification of neurons from these reconstructions. This situation determines the necessity to eliminate irrelevant features as well as to select the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts into 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most used computational tools in neuroinformatics to quantify traced neurons. We review some current feature selection techniques such as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained by applying feature selection methods were evaluated using supervised classifiers, with Random Forest, C4.5, SVM, Naïve Bayes, kNN, Decision Table and the Logistic classifier used as classification algorithms. Feature selection methods of the filter, embedded, wrapper and ensemble types were compared, and the subsets returned were tested in classification tasks with different classification algorithms. The L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets, which provides evidence of their importance in the classification of these neurons. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Identifying Depressed Older Adults in Primary Care: A Secondary Analysis of a Multisite Randomized Controlled Trial

    PubMed Central

    Voils, Corrine I.; Olsen, Maren K.; Williams, John W.; for the IMPACT Study Investigators

    2008-01-01

    Objective: To determine whether a subset of depressive symptoms could be identified to facilitate diagnosis of depression in older adults in primary care. Method: Secondary analysis was conducted on 898 participants aged 60 years or older with major depressive disorder and/or dysthymic disorder (according to DSM-IV criteria) who participated in the Improving Mood–Promoting Access to Collaborative Treatment (IMPACT) study, a multisite, randomized trial of collaborative care for depression (recruitment from July 1999 to August 2001). Linear regression was used to identify a core subset of depressive symptoms associated with decreased social, physical, and mental functioning. The sensitivity and specificity, adjusting for selection bias, were evaluated for these symptoms. The sensitivity and specificity of a second subset of 4 depressive symptoms previously validated in a midlife sample was also evaluated. Results: Psychomotor changes, fatigue, and suicidal ideation were associated with decreased functioning and served as the core set of symptoms. Adjusting for selection bias, the sensitivity of these 3 symptoms was 0.012 and specificity 0.994. The sensitivity of the 4 symptoms previously validated in a midlife sample was 0.019 and specificity was 0.997. Conclusion: We identified 3 depression symptoms that were highly specific for major depressive disorder in older adults. However, these symptoms and a previously identified subset were too insensitive for accurate diagnosis. Therefore, we recommend a full assessment of DSM-IV depression criteria for accurate diagnosis. PMID:18311416

  7. PyCCF: Python Cross Correlation Function for reverberation mapping studies

    NASA Astrophysics Data System (ADS)

    Sun, Mouyuan; Grier, C. J.; Peterson, B. M.

    2018-05-01

    PyCCF emulates a Fortran program written by B. Peterson for use in reverberation mapping. The code cross-correlates two unevenly sampled light curves using linear interpolation and measures the peak and centroid of the cross-correlation function. In addition, it is possible to run Monte Carlo iterations using flux randomization and random subset selection (RSS) to produce cross-correlation centroid distributions to estimate the uncertainties in the cross-correlation results.
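
    The FR/RSS Monte Carlo idea, re-drawing epochs with replacement (random subset selection) and perturbing fluxes by their quoted errors (flux randomization), then recomputing the cross-correlation centroid each time, can be sketched roughly as follows. This is not PyCCF's actual implementation; the simplified centroid definition, the 0.8 peak threshold, and the toy light curves are assumptions.

```python
import numpy as np

def ccf_centroid(t1, f1, t2, f2, lags):
    """Centroid of a simple interpolated cross-correlation function."""
    r = []
    for lag in lags:
        f2_shift = np.interp(t1, t2 - lag, f2)   # linear interpolation
        r.append(np.corrcoef(f1, f2_shift)[0, 1])
    r = np.array(r)
    keep = r > 0.8 * r.max()                     # points near the CCF peak
    return np.sum(lags[keep] * r[keep]) / np.sum(r[keep])

def rss_fr_centroids(t1, f1, e1, t2, f2, e2, lags, n_mc=500, seed=0):
    """Monte Carlo FR/RSS: resample epochs with replacement (random subset
    selection) and perturb fluxes by their errors (flux randomization)."""
    rng = np.random.default_rng(seed)
    cents = []
    for _ in range(n_mc):
        i1 = np.unique(rng.integers(0, len(t1), len(t1)))
        i2 = np.unique(rng.integers(0, len(t2), len(t2)))
        g1 = f1[i1] + rng.normal(0, e1[i1])
        g2 = f2[i2] + rng.normal(0, e2[i2])
        cents.append(ccf_centroid(t1[i1], g1, t2[i2], g2, lags))
    return np.array(cents)   # centroid distribution -> lag uncertainty

# Toy light curves: a sinusoidal continuum and a line echoing it with a lag.
t = np.linspace(0, 100, 60)
cont = np.sin(t / 10)
line = np.sin((t - 5) / 10)
err = np.full_like(t, 0.05)
lags = np.linspace(-20, 20, 81)
dist = rss_fr_centroids(t, cont, err, t, line, err, lags, n_mc=200)
print(np.percentile(dist, [16, 50, 84]))
```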

  8. Efficient one-cycle affinity selection of binding proteins or peptides specific for a small-molecule using a T7 phage display pool.

    PubMed

    Takakusagi, Yoichi; Kuramochi, Kouji; Takagi, Manami; Kusayanagi, Tomoe; Manita, Daisuke; Ozawa, Hiroko; Iwakiri, Kanako; Takakusagi, Kaori; Miyano, Yuka; Nakazaki, Atsuo; Kobayashi, Susumu; Sugawara, Fumio; Sakaguchi, Kengo

    2008-11-15

    Here, we report an efficient one-cycle affinity selection using a natural-protein or random-peptide T7 phage pool for identification of binding proteins or peptides specific for small-molecules. The screening procedure involved a cuvette type 27-MHz quartz-crystal microbalance (QCM) apparatus with introduction of self-assembled monolayer (SAM) for a specific small-molecule immobilization on the gold electrode surface of a sensor chip. Using this apparatus, we attempted an affinity selection of proteins or peptides against synthetic ligand for FK506-binding protein (SLF) or irinotecan (Iri, CPT-11). An affinity selection using SLF-SAM and a natural-protein T7 phage pool successfully detected FK506-binding protein 12 (FKBP12)-displaying T7 phage after an interaction time of only 10 min. Extensive exploration of time-consuming wash and/or elution conditions together with several rounds of selection was not required. Furthermore, in the selection using a 15-mer random-peptide T7 phage pool and subsequent analysis utilizing receptor ligand contact (RELIC) software, a subset of SLF-selected peptides clearly pinpointed several amino-acid residues within the binding site of FKBP12. Likewise, a subset of Iri-selected peptides pinpointed part of the positive amino-acid region of residues from the Iri-binding site of the well-known direct targets, acetylcholinesterase (AChE) and carboxylesterase (CE). Our findings demonstrate the effectiveness of this method and general applicability for a wide range of small-molecules.

  9. The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

    PubMed

    Cheng, Qiang; Zhou, Hongbo; Cheng, Jie

    2011-06-01

    Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid--dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

  10. Different hunting strategies select for different weights in red deer.

    PubMed

    Martínez, María; Rodríguez-Vigal, Carlos; Jones, Owen R; Coulson, Tim; San Miguel, Alfonso

    2005-09-22

    Much insight can be derived from records of shot animals. Most researchers using such data assume that their data represents a random sample of a particular demographic class. However, hunters typically select a non-random subset of the population and hunting is, therefore, not a random process. Here, with red deer (Cervus elaphus) hunting data from a ranch in Toledo, Spain, we demonstrate that data collection methods have a significant influence upon the apparent relationship between age and weight. We argue that a failure to correct for such methodological bias may have significant consequences for the interpretation of analyses involving weight or correlated traits such as breeding success, and urge researchers to explore methods to identify and correct for such bias in their data.

  11. Classification of urine sediment based on convolution neural network

    NASA Astrophysics Data System (ADS)

    Pan, Jingjing; Jiang, Cunbo; Zhu, Tiantian

    2018-04-01

    By designing a new convolutional neural network framework, this paper overcomes the constraints of the original convolutional neural network framework, which requires large training samples and samples of the same size. The input images are shifted and cropped to generate sub-graphs of the same size. Dropout is then applied to the generated sub-graphs, increasing the diversity of samples and preventing overfitting. Proper subsets are randomly selected from the sub-graph set such that each proper subset contains the same number of elements and no two subsets are identical. The proper subsets are used as input layers for the convolutional neural network. Through the convolution layers, pooling, the fully connected layer and the output layer, the classification loss rates of the test set and training set are obtained. In an experiment classifying red blood cells, white blood cells and calcium oxalate crystals, the classification accuracy reached 97% or more.

  12. Eradication of melanomas by targeted elimination of a minor subset of tumor cells

    PubMed Central

    Schmidt, Patrick; Kopecky, Caroline; Hombach, Andreas; Zigrino, Paola; Mauch, Cornelia; Abken, Hinrich

    2011-01-01

    Proceeding on the assumption that all cancer cells have equal malignant capacities, current regimens in cancer therapy attempt to eradicate all malignant cells of a tumor lesion. Using in vivo targeting of tumor cell subsets, we demonstrate that selective elimination of a definite, minor tumor cell subpopulation is particularly effective in eradicating established melanoma lesions irrespective of the bulk of cancer cells. Tumor cell subsets were specifically eliminated in a tumor lesion by adoptive transfer of engineered cytotoxic T cells redirected in an antigen-restricted manner via a chimeric antigen receptor. Targeted elimination of less than 2% of the tumor cells that coexpress high molecular weight melanoma-associated antigen (HMW-MAA) (melanoma-associated chondroitin sulfate proteoglycan, MCSP) and CD20 lastingly eradicated melanoma lesions, whereas targeting of any random 10% tumor cell subset was not effective. Our data challenge the biological therapy and current drug development paradigms in the treatment of cancer. PMID:21282657

  13. An evaluation of the genetic-matched pair study design using genome-wide SNP data from the European population.

    PubMed

    Lu, Timothy Tehua; Lao, Oscar; Nothnagel, Michael; Junge, Olaf; Freitag-Wolf, Sandra; Caliebe, Amke; Balascakova, Miroslava; Bertranpetit, Jaume; Bindoff, Laurence Albert; Comas, David; Holmlund, Gunilla; Kouvatsi, Anastasia; Macek, Milan; Mollet, Isabelle; Nielsen, Finn; Parson, Walther; Palo, Jukka; Ploski, Rafal; Sajantila, Antti; Tagliabracci, Adriano; Gether, Ulrik; Werge, Thomas; Rivadeneira, Fernando; Hofman, Albert; Uitterlinden, André Gerardus; Gieger, Christian; Wichmann, Heinz-Erich; Ruether, Andreas; Schreiber, Stefan; Becker, Christian; Nürnberg, Peter; Nelson, Matthew Roberts; Kayser, Manfred; Krawczak, Michael

    2009-07-01

    Genetic matching potentially provides a means to alleviate the effects of incomplete Mendelian randomization in population-based gene-disease association studies. We therefore evaluated the genetic-matched pair study design on the basis of genome-wide SNP data (309,790 markers; Affymetrix GeneChip Human Mapping 500K Array) from 2457 individuals, sampled at 23 different recruitment sites across Europe. Using pair-wise identity-by-state (IBS) as a matching criterion, we tried to derive a subset of markers that would allow identification of the best overall matching (BOM) partner for a given individual, based on the IBS status for the subset alone. However, our results suggest that, by following this approach, the prediction accuracy is only notably improved by the first 20 markers selected, and increases proportionally to the marker number thereafter. Furthermore, in a considerable proportion of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself. A second marker set, specifically selected for ancestry sensitivity using singular value decomposition, performed even more poorly and was no more capable of predicting the BOM than randomly chosen subsets. This leads us to conclude that, at least in Europe, the utility of the genetic-matched pair study design depends critically on the availability of comprehensive genotype information for both cases and controls.

  14. Efficacy and Safety of Atomoxetine in Childhood Attention-Deficit/Hyperactivity Disorder with Comorbid Oppositional Defiant Disorder

    ERIC Educational Resources Information Center

    Kaplan, S.; Heiligenstein, J.; West, S.; Busner, J.; Harder, D.; Dittmann, R.; Casat, C.; Wernicke, J. F.

    2004-01-01

    Objective: To compare the safety and efficacy of atomoxetine, a selective inhibitor of the norepinephrine transporter, versus placebo in Attention-Deficit/Hyperactivity Disorder (ADHD) patients with comorbid Oppositional Defiant Disorder (ODD). Methods: A subset analysis of 98 children from two identical, multi-site, double-blind, randomized,…

  15. Different hunting strategies select for different weights in red deer

    PubMed Central

    Martínez, María; Rodríguez-Vigal, Carlos; Jones, Owen R; Coulson, Tim; Miguel, Alfonso San

    2005-01-01

    Much insight can be derived from records of shot animals. Most researchers using such data assume that their data represents a random sample of a particular demographic class. However, hunters typically select a non-random subset of the population and hunting is, therefore, not a random process. Here, with red deer (Cervus elaphus) hunting data from a ranch in Toledo, Spain, we demonstrate that data collection methods have a significant influence upon the apparent relationship between age and weight. We argue that a failure to correct for such methodological bias may have significant consequences for the interpretation of analyses involving weight or correlated traits such as breeding success, and urge researchers to explore methods to identify and correct for such bias in their data. PMID:17148205

  16. Model selection for logistic regression models

    NASA Astrophysics Data System (ADS)

    Duller, Christine

    2012-09-01

    Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The application shows some results of recent research projects in medicine and business administration.

  17. Minimizing the average distance to a closest leaf in a phylogenetic tree.

    PubMed

    Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

    2013-11-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.
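
    The quantity being minimized, the Average Distance to the Closest Leaf, is simple to state given a matrix of pairwise (patristic) tree distances: for a candidate subset of leaves, average each leaf's distance to its nearest selected leaf. The sketch below works from an arbitrary distance matrix and uses brute-force enumeration, which is exactly the scaling problem that motivates the PAM-style and dynamic-programming algorithms in the paper; the toy Euclidean distances are placeholders, and the paper's formulation may restrict the average to query leaves only.

```python
import numpy as np
from itertools import combinations

def adcl(dist, selected):
    """Average Distance to the Closest Leaf: mean, over all leaves, of the
    distance to the nearest selected leaf."""
    return dist[:, selected].min(axis=1).mean()

def best_subset_exhaustive(dist, k):
    """Brute-force ADCL minimization; only feasible for very small trees."""
    leaves = range(dist.shape[0])
    return min(combinations(leaves, k), key=lambda s: adcl(dist, list(s)))

# Toy symmetric distance matrix for 8 leaves.
rng = np.random.default_rng(0)
pts = rng.uniform(size=(8, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
best = best_subset_exhaustive(dist, 3)
print(best, round(adcl(dist, list(best)), 3))
```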

  18. Overlapping meta-analyses on the same topic: survey of published studies.

    PubMed

    Siontis, Konstantinos C; Hernandez-Boussard, Tina; Ioannidis, John P A

    2013-07-19

    To assess how common it is to have multiple overlapping meta-analyses of randomized trials published on the same topic. Survey of published meta-analyses. PubMed. Meta-analyses published in 2010 were identified, and 5% of them were randomly selected. We further selected those that included randomized trials and examined effectiveness of any medical intervention. For eligible meta-analyses, we searched for other meta-analyses on the same topic (covering the same comparisons, indications/settings, and outcomes or overlapping subsets of them) published until February 2013. Of 73 eligible meta-analyses published in 2010, 49 (67%) had at least one other overlapping meta-analysis (median two meta-analyses per topic, interquartile range 1-4, maximum 13). In 17 topics at least one author was involved in at least two of the overlapping meta-analyses. No characteristics of the index meta-analyses were associated with the potential for overlapping meta-analyses. Among pairs of overlapping meta-analyses in 20 randomly selected topics, 13 of the more recent meta-analyses did not include any additional outcomes. In three of the four topics with eight or more published meta-analyses, many meta-analyses examined only a subset of the eligible interventions or indications/settings covered by the index meta-analysis. Conversely, for statins in the prevention of atrial fibrillation after cardiac surgery, 11 meta-analyses were published with similar eligibility criteria for interventions and setting: there was still variability on which studies were included, but the results were always similar or even identical across meta-analyses. While some independent replication of meta-analyses by different teams is possibly useful, the overall picture suggests that there is a waste of efforts with many topics covered by multiple overlapping meta-analyses.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bromberger, Seth A.; Klymko, Christine F.; Henderson, Keith A.

    Betweenness centrality is a graph statistic used to find vertices that participate in a large number of shortest paths in a graph. This centrality measure is commonly used in path and network interdiction problems, and its complete form requires the calculation of all-pairs shortest paths. This leads to a time complexity of O(|V||E|), which is impractical for large graphs. Estimation of betweenness centrality has focused on performing shortest-path calculations from a subset of randomly selected vertices. This reduces the complexity of the centrality estimation to O(|S||E|), |S| < |V|, which can be scaled appropriately based on the computing resources available. An estimation strategy that uses random selection of vertices for seed selection is fast and simple to implement, but may not provide optimal estimation of betweenness centrality when the number of samples is constrained. Our experimentation has identified a number of alternate seed-selection strategies that provide lower error than random selection in common scale-free graphs. These strategies are discussed and experimental results are presented.
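
    For reference, the sampling-based estimator described above is available off the shelf: networkx's betweenness_centrality accepts a k parameter giving the number of randomly chosen source vertices. The scale-free test graph below is a stand-in; non-random seed-selection strategies like those studied here would require assembling the partial path contributions manually.

```python
import networkx as nx

G = nx.barabasi_albert_graph(n=2000, m=3, seed=42)   # scale-free test graph (assumption)

# Exact betweenness: shortest paths from every vertex, O(|V||E|) for unweighted graphs.
exact = nx.betweenness_centrality(G)

# Estimated betweenness: shortest paths from k randomly selected seed vertices, O(|S||E|).
approx = nx.betweenness_centrality(G, k=200, seed=1)

top_exact = max(exact, key=exact.get)
print("top vertex (exact):", top_exact, "estimate for it:", round(approx[top_exact], 5))
```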

  20. Longitudinal analyses of correlated response efficiencies of fillet traits in Nile tilapia.

    PubMed

    Turra, E M; Fernandes, A F A; de Alvarenga, E R; Teixeira, E A; Alves, G F O; Manduca, L G; Murphy, T W; Silva, M A

    2018-03-01

    Recent studies with Nile tilapia have shown divergent results regarding the possibility of selecting on morphometric measurements to promote indirect genetic gains in fillet yield (FY). The use of indirect selection for fillet traits is important as these traits are only measurable after harvesting. Random regression models are a powerful tool in association studies to identify the best time point to measure and select animals. Random regression models can also be applied in a multiple trait approach to analyze indirect response to selection, which would avoid the need to sacrifice candidate fish. Therefore, the aim of this study was to investigate the genetic relationships between several body measurements, weight and fillet traits throughout the growth period and to evaluate the possibility of indirect selection for fillet traits in Nile tilapia. Data were collected from 2042 fish and were divided into two subsets. The first subset was used to estimate genetic parameters, including the permanent environmental effect for BW and body measurements (8758 records for each body measurement, as each fish was individually weighed and measured a maximum of six times). The second subset (2042 records for each trait) was used to estimate genetic correlations and heritabilities, which enabled the calculation of correlated response efficiencies between body measurements and the fillet traits. Heritability estimates across ages ranged from 0.05 to 0.5 for height, 0.02 to 0.48 for corrected length (CL), 0.05 to 0.68 for width, 0.08 to 0.57 for fillet weight (FW) and 0.12 to 0.42 for FY. All genetic correlation estimates between body measurements and FW were positive and strong (0.64 to 0.98). The estimates of genetic correlation between body measurements and FY were positive (except for CL at some ages), but weak to moderate (-0.08 to 0.68). These estimates resulted in strong and favorable correlated response efficiencies for FW, and positive but moderate efficiencies for FY. These results indicate the possibility of achieving indirect genetic gains for FW by selecting for morphometric traits, but low efficiency for FY when compared with direct selection.

  1. Identification of selection signatures in cattle breeds selected for dairy production.

    PubMed

    Stella, Alessandra; Ajmone-Marsan, Paolo; Lazzari, Barbara; Boettcher, Paul

    2010-08-01

    The genomics revolution has spurred the undertaking of HapMap studies of numerous species, allowing for population genomics to increase the understanding of how selection has created genetic differences between subspecies populations. The objectives of this study were to (1) develop an approach to detect signatures of selection in subsets of phenotypically similar breeds of livestock by comparing single nucleotide polymorphism (SNP) diversity between the subset and a larger population, (2) verify this method in breeds selected for simply inherited traits, and (3) apply this method to the dairy breeds in the International Bovine HapMap (IBHM) study. The data consisted of genotypes for 32,689 SNPs of 497 animals from 19 breeds. For a given subset of breeds, the test statistic was the parametric composite log likelihood (CLL) of the differences in allelic frequencies between the subset and the IBHM for a sliding window of SNPs. The null distribution was obtained by calculating CLL for 50,000 random subsets (per chromosome) of individuals. The validity of this approach was confirmed by obtaining extremely large CLLs at the sites of causative variation for polled (BTA1) and black-coat-color (BTA18) phenotypes. Across the 30 bovine chromosomes, 699 putative selection signatures were detected. The largest CLL was on BTA6 and corresponded to KIT, which is responsible for the piebald phenotype present in four of the five dairy breeds. Potassium channel-related genes were at the site of the largest CLL on three chromosomes (BTA14, -16, and -25) whereas integrins (BTA18 and -19) and serine/arginine rich splicing factors (BTA20 and -23) each had the largest CLL on two chromosomes. On the basis of the results of this study, the application of population genomics to farm animals seems quite promising. Comparisons between breed groups have the potential to identify genomic regions influencing complex traits with no need for complex equipment and the collection of extensive phenotypic records and can contribute to the identification of candidate genes and to the understanding of the biological mechanisms controlling complex traits.

  2. Do bioclimate variables improve performance of climate envelope models?

    USGS Publications Warehouse

    Watling, James I.; Romañach, Stephanie S.; Bucklin, David N.; Speroterra, Carolina; Brandt, Laura A.; Pearlstine, Leonard G.; Mazzotti, Frank J.

    2012-01-01

    Climate envelope models are widely used to forecast potential effects of climate change on species distributions. A key issue in climate envelope modeling is the selection of predictor variables that most directly influence species. To determine whether model performance and spatial predictions were related to the selection of predictor variables, we compared models using bioclimate variables with models constructed from monthly climate data for twelve terrestrial vertebrate species in the southeastern USA using two different algorithms (random forests or generalized linear models), and two model selection techniques (using uncorrelated predictors or a subset of user-defined biologically relevant predictor variables). There were no differences in performance between models created with bioclimate or monthly variables, but one metric of model performance was significantly greater using the random forest algorithm compared with generalized linear models. Spatial predictions between maps using bioclimate and monthly variables were very consistent using the random forest algorithm with uncorrelated predictors, whereas we observed greater variability in predictions using generalized linear models.

  3. Input Decimated Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
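
    A minimal sketch of the input-decimation idea under assumptions: one base classifier per class, each trained on the features most correlated with that class's indicator, with predictions combined by averaging class probabilities. The logistic-regression base learner, the correlation-based ranking, and the digits dataset are illustrative stand-ins, not the article's exact setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
classes = np.unique(y)
n_keep = 20                                   # features kept per decimated classifier (assumption)

models, subsets = [], []
for c in classes:
    indicator = (ytr == c).astype(float)
    # Rank features by |correlation| with the one-vs-rest class indicator.
    corr = np.array([abs(np.corrcoef(Xtr[:, j], indicator)[0, 1])
                     if Xtr[:, j].std() > 0 else 0.0 for j in range(Xtr.shape[1])])
    subset = np.argsort(corr)[-n_keep:]
    clf = LogisticRegression(max_iter=2000).fit(Xtr[:, subset], ytr)
    models.append(clf)
    subsets.append(subset)

# Combine the decoupled base classifiers by averaging predicted class probabilities.
proba = np.mean([m.predict_proba(Xte[:, s]) for m, s in zip(models, subsets)], axis=0)
print("ensemble accuracy:", (proba.argmax(axis=1) == yte).mean())
```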

  4. Chemical library subset selection algorithms: a unified derivation using spatial statistics.

    PubMed

    Hamprecht, Fred A; Thiel, Walter; van Gunsteren, Wilfred F

    2002-01-01

    If similar compounds have similar activity, rational subset selection becomes superior to random selection in screening for pharmacological lead discovery programs. Traditional approaches to this experimental design problem fall into two classes: (i) a linear or quadratic response function is assumed; (ii) some space filling criterion is optimized. The assumptions underlying the first approach are clear but not always defensible; the second approach yields more intuitive designs but lacks a clear theoretical foundation. We model activity in a bioassay as a realization of a stochastic process and use the best linear unbiased estimator to construct spatial sampling designs that optimize the integrated mean square prediction error, the maximum mean square prediction error, or the entropy. We argue that our approach constitutes a unifying framework encompassing most proposed techniques as limiting cases and sheds light on their underlying assumptions. In particular, vector quantization is obtained, in dimensions up to eight, in the limiting case of very smooth response surfaces for the integrated mean square error criterion. Closest packing is obtained for very rough surfaces under the integrated mean square error and entropy criteria. We suggest using either the integrated mean square prediction error or the entropy as optimization criteria rather than approximations thereof and propose a scheme for direct iterative minimization of the integrated mean square prediction error. Finally, we discuss how the quality of chemical descriptors manifests itself and clarify the assumptions underlying the selection of diverse or representative subsets.
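
    The space-filling end of the spectrum discussed above can be illustrated with a greedy maximin design in descriptor space: repeatedly add the compound farthest from the already-selected subset. This is a generic sketch of one space-filling criterion, not the paper's kriging-based IMSE or entropy designs; the descriptor matrix is a random stand-in.

```python
import numpy as np

def greedy_maximin_subset(X, k, seed=0):
    """Greedy space-filling selection: start from a random compound, then repeatedly
    add the compound whose minimum distance to the selected set is largest."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]
    # min_dist[i] = distance from compound i to its nearest selected compound
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

descriptors = np.random.default_rng(1).normal(size=(1000, 8))   # hypothetical descriptor matrix
print(greedy_maximin_subset(descriptors, k=10))
```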

  5. Biochemical Sensors Using Carbon Nanotube Arrays

    NASA Technical Reports Server (NTRS)

    Meyyappan, Meyya (Inventor); Cassell, Alan M. (Inventor); Li, Jun (Inventor)

    2011-01-01

    Method and system for detecting presence of biomolecules in a selected subset, or in each of several selected subsets, in a fluid. Each of an array of two or more carbon nanotubes ("CNTs") is connected at a first CNT end to one or more electronics devices, each of which senses a selected electrochemical signal that is generated when a target biomolecule in the selected subset becomes attached to a functionalized second end of the CNT, which is covalently bonded with a probe molecule. This approach indicates when target biomolecules in the selected subset are present and indicates presence or absence of target biomolecules in two or more selected subsets. Alternatively, presence or absence of an analyte can be detected.

  6. Changes in DNA Methylation from Age 18 to Pregnancy in Type 1, 2, and 17 T Helper and Regulatory T-Cells Pathway Genes

    PubMed Central

    Iqbal, Sabrina; Lockett, Gabrielle A.; Arshad, S. Hasan; Zhang, Hongmei; Kaushal, Akhilesh; Tetali, Sabarinath R.; Mukherjee, Nandini

    2018-01-01

    To succeed, pregnancies need to initiate immune biases towards T helper 2 (Th2) responses, yet little is known about what establishes this bias. Using the Illumina 450 K platform, we explored changes in DNA methylation (DNAm) of Th1, Th2, Th17, and regulatory T cell pathway genes before and during pregnancy. Female participants were recruited at birth (1989), and followed through age 18 years and their pregnancy (2011–2015). Peripheral blood DNAm was measured in 245 girls at 18 years; from among these girls, the DNAm of 54 women was repeatedly measured in the first (weeks 8–21, n = 39) and second (weeks 22–38, n = 35) halves of pregnancy, respectively. M-values (logit-transformed β-values of DNAm) were analyzed: First, with repeated measurement models, cytosine–phosphate–guanine sites (CpGs) of pathway genes in pregnancy and at age 18 (nonpregnant) were compared for changes (p ≤ 0.05). Second, we tested how many of the 348 pathway-related CpGs changed compared to 10 randomly selected subsets of all other CpGs and compared to 10 randomly selected subsets of other CD4+-related CpGs (348 in each subset). Contrasted to the nonpregnant state, 27.7% of Th1-related CpGs changed in the first and 36.1% in the second half of pregnancy. Among the Th2 pathway CpGs, proportions of changes were 35.1% (first) and 33.8% (second half). The methylation changes suggest involvement of both Th1 and Th2 pathway CpGs in the immune bias during pregnancy. Changes in regulatory T cell and Th17 pathways need further exploration. PMID:29415463
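
    For readers unfamiliar with the M-value transform mentioned above, it is the base-2 logit of the methylation β-value; a tiny sketch of the conversion follows (the small offset guarding against division by zero is an assumption, not from the paper).

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """Convert DNA methylation beta-values (0..1) to M-values via a base-2 logit."""
    beta = np.clip(beta, eps, 1 - eps)   # avoid log of 0 or division by 0
    return np.log2(beta / (1 - beta))

print(beta_to_m(np.array([0.1, 0.5, 0.9])))   # roughly [-3.17, 0.0, 3.17]
```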

  7. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, but mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that underlies the RFE algorithm. The rationale is that building ensemble SVM models on randomly drawn bootstrap samples from the training set will produce different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over a random forest-based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters within the selected data. PMID:27304923
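
    A minimal sketch of the bootstrap-and-aggregate idea behind ESVM-RFE, using scikit-learn's RFE with a linear SVM: each bootstrap sample yields its own feature ranking, and the rankings are averaged into a consensus ranking. The synthetic dataset, the number of bootstraps, and plain rank averaging are illustrative assumptions rather than the published algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=100, n_informative=10, random_state=0)
rng = np.random.default_rng(0)
n_boot = 25

rank_sum = np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.choice(len(y), size=len(y), replace=True)       # bootstrap sample
    rfe = RFE(LinearSVC(max_iter=5000), n_features_to_select=10, step=5)
    rfe.fit(X[idx], y[idx])
    rank_sum += rfe.ranking_                                   # 1 = most important

consensus = np.argsort(rank_sum)          # features with the lowest average rank first
print("top 10 genes by consensus ranking:", consensus[:10])
```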

  8. Writing on wet paper

    NASA Astrophysics Data System (ADS)

    Fridrich, Jessica; Goljan, Miroslav; Lisonek, Petr; Soukal, David

    2005-03-01

    In this paper, we show that the communication channel known as writing in memory with defective cells is a relevant information-theoretical model for a specific case of passive warden steganography when the sender embeds a secret message into a subset C of the cover object X without sharing the selection channel C with the recipient. The set C could be arbitrary, determined by the sender from the cover object using a deterministic, pseudo-random, or a truly random process. We call this steganography "writing on wet paper" and realize it using low-density random linear codes with the encoding step based on the LT process. The importance of writing on wet paper for covert communication is discussed within the context of adaptive steganography and perturbed quantization steganography. Heuristic arguments supported by tests using blind steganalysis indicate that the wet paper steganography provides improved steganographic security for embedding in JPEG images and is less vulnerable to attacks when compared to existing methods with shared selection channels.

  9. Rethinking the assessment of risk of bias due to selective reporting: a cross-sectional study.

    PubMed

    Page, Matthew J; Higgins, Julian P T

    2016-07-08

    Selective reporting is included as a core domain of Cochrane's tool for assessing risk of bias in randomised trials. There has been no evaluation of review authors' use of this domain. We aimed to evaluate assessments of selective reporting in a cross-section of Cochrane reviews and to outline areas for improvement. We obtained data on selective reporting judgements for 8434 studies included in 586 Cochrane reviews published from issue 1-8, 2015. One author classified the reasons for judgements of high risk of selective reporting bias. We randomly selected 100 reviews with at least one trial rated at high risk of outcome non-reporting bias (non-/partial reporting of an outcome on the basis of its results). One author recorded whether the authors of these reviews incorporated the selective reporting assessment when interpreting results. Of the 8434 studies, 1055 (13 %) were rated at high risk of bias on the selective reporting domain. The most common reason was concern about outcome non-reporting bias. Few studies were rated at high risk because of concerns about bias in selection of the reported result (e.g. reporting of only a subset of measurements, analysis methods or subsets of the data that were pre-specified). Review authors often specified in the risk of bias tables the study outcomes that were not reported (84 % of studies) but less frequently specified the outcomes that were partially reported (61 % of studies). At least one study was rated at high risk of outcome non-reporting bias in 31 % of reviews. In the random sample of these reviews, only 30 % incorporated this information when interpreting results, by acknowledging that the synthesis of an outcome was missing data that were not/partially reported. Our audit of user practice in Cochrane reviews suggests that the assessment of selective reporting in the current risk of bias tool does not work well. It is not always clear which outcomes were selectively reported or what the corresponding risk of bias is in the synthesis with missing outcome data. New tools that will make it easier for reviewers to convey this information are being developed.

  10. Computer access security code system

    NASA Technical Reports Server (NTRS)

    Collins, Earl R., Jr. (Inventor)

    1990-01-01

    A security code system for controlling access to computer and computer-controlled entry situations comprises a plurality of subsets of alpha-numeric characters disposed in random order in matrices of at least two dimensions forming theoretical rectangles, cubes, etc., such that when access is desired, at least one pair of previously unused character subsets not found in the same row or column of the matrix is chosen at random and transmitted by the computer. The proper response to gain access is transmittal of subsets which complete the rectangle, and/or a parallelepiped whose opposite corners were defined by first groups of code. Once used, subsets are not used again to absolutely defeat unauthorized access by eavesdropping, and the like.

  11. Effects of prey abundance, distribution, visual contrast and morphology on selection by a pelagic piscivore

    USGS Publications Warehouse

    Hansen, Adam G.; Beauchamp, David A.

    2014-01-01

    Most predators eat only a subset of possible prey. However, studies evaluating diet selection rarely measure prey availability in a manner that accounts for temporal–spatial overlap with predators, the sensory mechanisms employed to detect prey, and constraints on prey capture. We evaluated the diet selection of cutthroat trout (Oncorhynchus clarkii) feeding on a diverse planktivore assemblage in Lake Washington to test the hypothesis that the diet selection of piscivores would reflect random (opportunistic) as opposed to non-random (targeted) feeding, after accounting for predator–prey overlap, visual detection and capture constraints. Diets of cutthroat trout were sampled in autumn 2005, when the abundance of transparent, age-0 longfin smelt (Spirinchus thaleichthys) was low, and 2006, when the abundance of smelt was nearly seven times higher. Diet selection was evaluated separately using depth-integrated and depth-specific (accounted for predator–prey overlap) prey abundance. The abundance of different prey was then adjusted for differences in detectability and vulnerability to predation to see whether these factors could explain diet selection. In 2005, cutthroat trout fed non-randomly by selecting against the smaller, transparent age-0 longfin smelt, but for the larger age-1 longfin smelt. After adjusting prey abundance for visual detection and capture, cutthroat trout fed randomly. In 2006, depth-integrated and depth-specific abundance explained the diets of cutthroat trout well, indicating random feeding. Feeding became non-random after adjusting for visual detection and capture. Cutthroat trout selected strongly for age-0 longfin smelt, but against similar sized threespine stickleback (Gasterosteus aculeatus) and larger age-1 longfin smelt in 2006. Overlap with juvenile sockeye salmon (O. nerka) was minimal in both years, and sockeye salmon were rare in the diets of cutthroat trout. The direction of the shift between random and non-random selection depended on the presence of a weak versus a strong year class of age-0 longfin smelt. These fish were easy to catch, but hard to see. When their density was low, poor detection could explain their rarity in the diet. When their density was high, poor detection was compensated by higher encounter rates with cutthroat trout, sufficient to elicit a targeted feeding response. The nature of the feeding selectivity of a predator can be highly dependent on fluctuations in the abundance and suitability of key prey.

  12. Randomized interpolative decomposition of separated representations

    NASA Astrophysics Data System (ADS)

    Biagioni, David J.; Beylkin, Daniel; Beylkin, Gregory

    2015-01-01

    We introduce an algorithm to compute tensor interpolative decomposition (dubbed CTD-ID) for the reduction of the separation rank of Canonical Tensor Decompositions (CTDs). Tensor ID selects, for a user-defined accuracy ɛ, a near optimal subset of terms of a CTD to represent the remaining terms via a linear combination of the selected terms. CTD-ID can be used as an alternative to or in combination with the Alternating Least Squares (ALS) algorithm. We present examples of its use within a convergent iteration to compute inverse operators in high dimensions. We also briefly discuss the spectral norm as a computational alternative to the Frobenius norm in estimating approximation errors of tensor ID. We reduce the problem of finding tensor IDs to that of constructing interpolative decompositions of certain matrices. These matrices are generated via randomized projection of the terms of the given tensor. We provide cost estimates and several examples of the new approach to the reduction of separation rank.
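
    A rough illustration, under assumptions, of reducing a term count by interpolative decomposition: the terms are flattened into matrix columns, a random Gaussian projection compresses the rows, and a pivoted QR of the projected matrix picks a skeleton subset of terms; the remaining terms are then expressed as a linear combination of the selected ones by least squares. This mirrors the matrix-ID step described above, not the full CTD-ID algorithm.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)

# Columns of A play the role of flattened CTD terms; the terms are nearly
# rank-deficient by construction (hypothetical data).
basis = rng.normal(size=(500, 12))
A = basis @ rng.normal(size=(12, 40))            # 40 terms with ~12 independent directions
k = 12                                           # target separation rank

# Randomized projection: compress the long dimension before the ID.
omega = rng.normal(size=(k + 8, A.shape[0]))
Y = omega @ A                                    # (k+8) x 40 sketch of the terms

# Pivoted QR on the sketch selects k skeleton columns (terms).
_, _, piv = qr(Y, pivoting=True)
skeleton = piv[:k]

# Express all terms as linear combinations of the skeleton terms (least squares).
coeffs, *_ = np.linalg.lstsq(A[:, skeleton], A, rcond=None)
err = np.linalg.norm(A - A[:, skeleton] @ coeffs) / np.linalg.norm(A)
print("selected terms:", skeleton, "relative error:", err)
```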

  13. How to Select a Good Training-data Subset for Transcription: Submodular Active Selection for Sequences

    DTIC Science & Technology

    2009-01-01

    selection and uncertainty sampling significantly. Index Terms: Transcription, labeling, submodularity, submodular selection, active learning, sequence...name of batch active learning, where a subset of data that is most informative and representative of the whole is selected for labeling. Often...representative subset. Note that our Fisher kernel is over an unsupervised generative model, which enables us to bootstrap our active learning approach

  14. Adenovirus-specific T-cell Subsets in Human Peripheral Blood and After IFN-γ Immunomagnetic Selection.

    PubMed

    Qian, Chongsheng; Wang, Yingying; Cai, Huili; Laroye, Caroline; De Carvalho Bittencourt, Marcelo; Clement, Laurence; Stoltz, Jean-François; Decot, Véronique; Reppel, Loïc; Bensoussan, Danièle

    2016-01-01

    Adoptive antiviral cellular immunotherapy by infusion of virus-specific T cells (VSTs) is becoming an alternative treatment for viral infection after hematopoietic stem cell transplantation. The T memory stem cell (TSCM) subset was recently described as exhibiting self-renewal and multipotency properties which are required for sustained efficacy in vivo. We wondered if such a crucial subset for immunotherapy was present in VSTs. We identified, by flow cytometry, TSCM in adenovirus (ADV)-specific interferon (IFN)-γ+ T cells before and after IFN-γ-based immunomagnetic selection, and analyzed the distribution of the main T-cell subsets in VSTs: naive T cells (TN), TSCM, T central memory cells (TCM), T effector memory cell (TEM), and effector T cells (TEFF). In this study all of the different T-cell subsets were observed in the blood sample from healthy donor ADV-VSTs, both before and after IFN-γ-based immunomagnetic selection. As the IFN-γ-based immunomagnetic selection system sorts mainly the most differentiated T-cell subsets, we observed that TEM was always the major T-cell subset of ADV-specific T cells after immunomagnetic isolation and especially after expansion in vitro. Comparing T-cell subpopulation profiles before and after in vitro expansion, we observed that in vitro cell culture with interleukin-2 resulted in a significant expansion of TN-like, TCM, TEM, and TEFF subsets in CD4IFN-γ T cells and of TCM and TEM subsets only in CD8IFN-γ T cells. We demonstrated the presence of all T-cell subsets in IFN-γ VSTs including the TSCM subpopulation, although this was weakly selected by the IFN-γ-based immunomagnetic selection system.

  15. Atlas ranking and selection for automatic segmentation of the esophagus from CT scans

    NASA Astrophysics Data System (ADS)

    Yang, Jinzhong; Haas, Benjamin; Fang, Raymond; Beadle, Beth M.; Garden, Adam S.; Liao, Zhongxing; Zhang, Lifei; Balter, Peter; Court, Laurence

    2017-12-01

    In radiation treatment planning, the esophagus is an important organ-at-risk that should be spared in patients with head and neck cancer or thoracic cancer who undergo intensity-modulated radiation therapy. However, automatic segmentation of the esophagus from CT scans is extremely challenging because of the structure’s inconsistent intensity, low contrast against the surrounding tissues, complex and variable shape and location, and random air bubbles. The goal of this study is to develop an online atlas selection approach to choose a subset of optimal atlases for multi-atlas segmentation to delineate the esophagus automatically. We performed atlas selection in two phases. In the first phase, we used the correlation coefficient of the image content in a cubic region between each atlas and the new image to evaluate their similarity and to rank the atlases in an atlas pool. A subset of atlases based on this ranking was selected, and deformable image registration was performed to generate deformed contours and deformed images in the new image space. In the second phase of atlas selection, we used Kullback-Leibler divergence to measure the similarity of local-intensity histograms between the new image and each of the deformed images, and the measurements were used to rank the previously selected atlases. Deformed contours were overlapped sequentially, from the most to the least similar, and the overlap ratio was examined. We further identified a subset of optimal atlases by analyzing the variation of the overlap ratio versus the number of atlases. The deformed contours from these optimal atlases were fused together using a modified simultaneous truth and performance level estimation algorithm to produce the final segmentation. The approach was validated with promising results using both internal data sets (21 head and neck cancer patients and 15 thoracic cancer patients) and external data sets (30 thoracic patients).
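
    A minimal sketch of the two ranking measures described above: Pearson correlation of intensities inside a cubic region of interest for the first phase, and Kullback-Leibler divergence between local intensity histograms for the second. The toy volumes, ROI bounds, and histogram binning are assumptions standing in for real CT data and deformable registration.

```python
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
new_image = rng.normal(size=(64, 64, 64))
atlases = [rng.normal(size=(64, 64, 64)) for _ in range(5)]   # stand-ins for atlas CTs
roi = (slice(16, 48), slice(16, 48), slice(16, 48))           # cubic region around the esophagus (assumed)

# Phase 1: rank atlases by correlation coefficient within the cubic ROI.
def roi_correlation(a, b):
    return np.corrcoef(a[roi].ravel(), b[roi].ravel())[0, 1]

phase1 = sorted(range(len(atlases)), key=lambda i: roi_correlation(atlases[i], new_image), reverse=True)

# Phase 2: rank the (hypothetically deformed) top atlases by KL divergence of local histograms.
def kl_divergence(a, b, bins=64):
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a[roi], bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(b[roi], bins=bins, range=(lo, hi), density=True)
    return entropy(p + 1e-12, q + 1e-12)          # smoothed to avoid zero bins

top = phase1[:3]
phase2 = sorted(top, key=lambda i: kl_divergence(new_image, atlases[i]))
print("phase-1 ranking:", phase1, "phase-2 ranking of top atlases:", phase2)
```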

  16. Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Singh, Renu; Steinley, Douglas

    2009-01-01

    The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the…

  17. The NSO FTS database program and archive (FTSDBM)

    NASA Technical Reports Server (NTRS)

    Lytle, D. M.

    1992-01-01

    Data from the NSO Fourier transform spectrometer is being re-archived from half inch tape onto write-once compact disk. In the process, information about each spectrum and a low resolution copy of each spectrum is being saved into an on-line database. FTSDBM is a simple database management program in the NSO external package for IRAF. A command language allows the FTSDBM user to add entries to the database, delete entries, select subsets from the database based on keyword values including ranges of values, create new database files based on these subsets, make keyword lists, examine low resolution spectra graphically, and make disk number/file number lists. Once the archive is complete, FTSDBM will allow the database to be efficiently searched for data of interest to the user and the compact disk format will allow random access to that data.

  18. Selecting climate change scenarios for regional hydrologic impact studies based on climate extremes indices

    NASA Astrophysics Data System (ADS)

    Seo, Seung Beom; Kim, Young-Oh; Kim, Youngil; Eum, Hyung-Il

    2018-04-01

    When selecting a subset of climate change scenarios (GCM models), the priority is to ensure that the subset reflects the comprehensive range of possible model results for all variables concerned. Though many studies have attempted to improve the scenario selection, there is a lack of studies that discuss methods to ensure that the results from a subset of climate models contain the same range of uncertainty in hydrologic variables as when all models are considered. We applied the Katsavounidis-Kuo-Zhang (KKZ) algorithm to select a subset of climate change scenarios and demonstrated its ability to reduce the number of GCM models in an ensemble, while the ranges of multiple climate extremes indices were preserved. First, we analyzed the role of 27 ETCCDI climate extremes indices for scenario selection and selected the representative climate extreme indices. Before the selection of a subset, we excluded a few deficient GCM models that could not represent the observed climate regime. Subsequently, we discovered that a subset of GCM models selected by the KKZ algorithm with the representative climate extreme indices could not capture the full potential range of changes in hydrologic extremes (e.g., 3-day peak flow and 7-day low flow) in some regional case studies. However, the application of the KKZ algorithm with a different set of climate indices, which are correlated to the hydrologic extremes, enabled the overcoming of this limitation. Key climate indices, dependent on the hydrologic extremes to be projected, must therefore be determined prior to the selection of a subset of GCM models.
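
    The KKZ selection itself is simple enough to sketch: starting from the model closest to the ensemble mean (one common convention), it repeatedly adds the model whose minimum distance to the already selected models is largest, so the subset spans the range of the index space. The feature matrix of standardized climate indices below is an assumed stand-in; published applications differ in the indices used and in the starting model.

```python
import numpy as np

def kkz_select(F, k):
    """KKZ-style subset selection on a models x indices feature matrix F.
    Start from the model nearest the ensemble mean, then repeatedly add
    the model farthest from the current subset (maximin distance)."""
    centered = (F - F.mean(axis=0)) / F.std(axis=0)          # standardize indices
    first = int(np.argmin(np.linalg.norm(centered, axis=1))) # closest to ensemble mean
    selected = [first]
    min_dist = np.linalg.norm(centered - centered[first], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(centered - centered[nxt], axis=1))
    return selected

# Hypothetical matrix: 27 GCMs described by a handful of climate extremes indices.
F = np.random.default_rng(0).normal(size=(27, 6))
print("selected GCM indices:", kkz_select(F, k=8))
```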

  19. Using Parental Profiles to Predict Membership in a Subset of College Students Experiencing Excessive Alcohol Consequences: Findings From a Longitudinal Study

    PubMed Central

    Varvil-Weld, Lindsey; Mallett, Kimberly A.; Turrisi, Rob; Abar, Caitlin C.

    2012-01-01

    Objective: Previous research identified a high-risk subset of college students experiencing a disproportionate number of alcohol-related consequences at the end of their first year. With the goal of identifying pre-college predictors of membership in this high-risk subset, the present study used a prospective design to identify latent profiles of student-reported maternal and paternal parenting styles and alcohol-specific behaviors and to determine whether these profiles were associated with membership in the high-risk consequences subset. Method: A sample of randomly selected 370 incoming first-year students at a large public university reported on their mothers’ and fathers’ communication quality, monitoring, approval of alcohol use, and modeling of drinking behaviors and on consequences experienced across the first year of college. Results: Students in the high-risk subset comprised 15.5% of the sample but accounted for almost half (46.6%) of the total consequences reported by the entire sample. Latent profile analyses identified four parental profiles: positive pro-alcohol, positive anti-alcohol, negative mother, and negative father. Logistic regression analyses revealed that students in the negative-father profile were at greatest odds of being in the high-risk consequences subset at a follow-up assessment 1 year later, even after drinking at baseline was controlled for. Students in the positive pro-alcohol profile also were at increased odds of being in the high-risk subset, although this association was attenuated after baseline drinking was controlled for. Conclusions: These findings have important implications for the improvement of existing parent- and individual-based college student drinking interventions designed to reduce alcohol-related consequences. PMID:22456248

  20. Using parental profiles to predict membership in a subset of college students experiencing excessive alcohol consequences: findings from a longitudinal study.

    PubMed

    Varvil-Weld, Lindsey; Mallett, Kimberly A; Turrisi, Rob; Abar, Caitlin C

    2012-05-01

    Previous research identified a high-risk subset of college students experiencing a disproportionate number of alcohol-related consequences at the end of their first year. With the goal of identifying pre-college predictors of membership in this high-risk subset, the present study used a prospective design to identify latent profiles of student-reported maternal and paternal parenting styles and alcohol-specific behaviors and to determine whether these profiles were associated with membership in the high-risk consequences subset. A sample of randomly selected 370 incoming first-year students at a large public university reported on their mothers' and fathers' communication quality, monitoring, approval of alcohol use, and modeling of drinking behaviors and on consequences experienced across the first year of college. Students in the high-risk subset comprised 15.5% of the sample but accounted for almost half (46.6%) of the total consequences reported by the entire sample. Latent profile analyses identified four parental profiles: positive pro-alcohol, positive anti-alcohol, negative mother, and negative father. Logistic regression analyses revealed that students in the negative-father profile were at greatest odds of being in the high-risk consequences subset at a follow-up assessment 1 year later, even after drinking at baseline was controlled for. Students in the positive pro-alcohol profile also were at increased odds of being in the high-risk subset, although this association was attenuated after baseline drinking was controlled for. These findings have important implications for the improvement of existing parent- and individual-based college student drinking interventions designed to reduce alcohol-related consequences.

  1. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling

    PubMed Central

    Zhou, Fuqun; Zhang, Aining

    2016-01-01

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2–3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests’ features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data. PMID:27792152

  2. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    PubMed

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests' features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.
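
    As a rough illustration of using Random Forests' variable importance to prune a time-series stack, the sketch below ranks variables by feature_importances_, keeps roughly the top half, and compares cross-validated accuracy with the full set. The synthetic data stands in for the 10-day MODIS composites and training samples; the one-half cut-off echoes the finding quoted above rather than reproducing the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for a stack of time-series MODIS variables (bands x dates) and land cover labels.
X, y = make_classification(n_samples=1500, n_features=60, n_informative=15, random_state=0)

rf_full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(rf_full.feature_importances_)[::-1]
half = order[: X.shape[1] // 2]                      # keep the most important half of the variables

acc_full = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5).mean()
acc_half = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0), X[:, half], y, cv=5).mean()
print(f"all {X.shape[1]} variables: {acc_full:.3f}   top {len(half)} variables: {acc_half:.3f}")
```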

  3. Systematic wavelength selection for improved multivariate spectral analysis

    DOEpatents

    Thomas, Edward V.; Robinson, Mark R.; Haaland, David M.

    1995-01-01

    Methods and apparatus for determining in a biological material one or more unknown values of at least one known characteristic (e.g. the concentration of an analyte such as glucose in blood or the concentration of one or more blood gas parameters) with a model based on a set of samples with known values of the known characteristics and a multivariate algorithm using several wavelength subsets. The method includes selecting multiple wavelength subsets, from the electromagnetic spectral region appropriate for determining the known characteristic, for use by an algorithm wherein the selection of wavelength subsets improves the model's fitness of the determination for the unknown values of the known characteristic. The selection process utilizes multivariate search methods that select both predictive and synergistic wavelengths within the range of wavelengths utilized. The fitness of the wavelength subsets is determined by the fitness function F = f(cost, performance). The method includes the steps of: (1) using one or more applications of a genetic algorithm to produce one or more count spectra, with multiple count spectra then combined to produce a combined count spectrum; (2) smoothing the count spectrum; (3) selecting a threshold count from a count spectrum to select these wavelength subsets which optimize the fitness function; and (4) eliminating a portion of the selected wavelength subsets. The determination of the unknown values can be made: (1) noninvasively and in vivo; (2) invasively and in vivo; or (3) in vitro.

  4. Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines.

    PubMed

    Abuassba, Adnan O M; Zhang, Dezheng; Luo, Xiong; Shaheryar, Ahmad; Ali, Hazrat

    2017-01-01

    Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets.

  5. Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines

    PubMed Central

    Abuassba, Adnan O. M.; Ali, Hazrat

    2017-01-01

    Extreme Learning Machine (ELM) is a fast-learning algorithm for a single-hidden layer feedforward neural network (SLFN). It often has good generalization performance. However, there are chances that it might overfit the training data due to having more hidden nodes than needed. To address the generalization performance, we use a heterogeneous ensemble approach. We propose an Advanced ELM Ensemble (AELME) for classification, which includes Regularized-ELM, L2-norm-optimized ELM (ELML2), and Kernel-ELM. The ensemble is constructed by training a randomly chosen ELM classifier on a subset of training data selected through random resampling. The proposed AELM-Ensemble is evolved by employing an objective function of increasing diversity and accuracy among the final ensemble. Finally, the class label of unseen data is predicted using majority vote approach. Splitting the training data into subsets and incorporation of heterogeneous ELM classifiers result in higher prediction accuracy, better generalization, and a lower number of base classifiers, as compared to other models (Adaboost, Bagging, Dynamic ELM ensemble, data splitting ELM ensemble, and ELM ensemble). The validity of AELME is confirmed through classification on several real-world benchmark datasets. PMID:28546808

  6. Selection of core animals in the Algorithm for Proven and Young using a simulation model.

    PubMed

    Bradford, H L; Pocrnić, I; Fragomeni, B O; Lourenco, D A L; Misztal, I

    2017-12-01

    The Algorithm for Proven and Young (APY) enables the implementation of single-step genomic BLUP (ssGBLUP) in large, genotyped populations by separating genotyped animals into core and non-core subsets and creating a computationally efficient inverse for the genomic relationship matrix (G). As APY became the choice for large-scale genomic evaluations in BLUP-based methods, a common question is how to choose the animals in the core subset. We compared several core definitions to answer this question. Simulations comprised a moderately heritable trait for 95,010 animals and 50,000 genotypes for animals across five generations. Genotypes consisted of 25,500 SNP distributed across 15 chromosomes. Genotyping errors and missing pedigree were also mimicked. Core animals were defined based on individual generations, equal representation across generations, and at random. For a sufficiently large core size, core definitions had the same accuracies and biases, even if the core animals had imperfect genotypes. When genotyped animals had unknown parents, accuracy and bias were significantly better (p ≤ .05) for random and across generation core definitions. © 2017 The Authors. Journal of Animal Breeding and Genetics Published by Blackwell Verlag GmbH.

  7. Feature Selection for Ridge Regression with Provable Guarantees.

    PubMed

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
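
    A small sketch of the leverage-score sampling idea for ridge regression features: compute column leverage scores from the SVD of the (centered) design matrix, sample a feature subset with probabilities proportional to those scores, and fit ridge regression in the sampled space. The dataset, ridge penalty, and sample size are assumptions, and the paper's rescaling of sampled columns is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=200, n_informative=20, noise=5.0, random_state=0)

# Column leverage scores from the top-k right singular vectors of X.
k = 20
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
scores = (Vt[:k] ** 2).sum(axis=0)
probs = scores / scores.sum()

# Randomized feature selection: sample columns proportionally to their leverage scores.
rng = np.random.default_rng(0)
n_keep = 50
cols = rng.choice(X.shape[1], size=n_keep, replace=False, p=probs)

full = cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()
sampled = cross_val_score(Ridge(alpha=1.0), X[:, cols], y, cv=5).mean()
print(f"R^2 with all features: {full:.3f}   with {n_keep} leverage-sampled features: {sampled:.3f}")
```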

  8. Plate-based diversity subset screening generation 2: an improved paradigm for high-throughput screening of large compound files.

    PubMed

    Bell, Andrew S; Bradley, Joseph; Everett, Jeremy R; Loesel, Jens; McLoughlin, David; Mills, James; Peakman, Marie-Claire; Sharp, Robert E; Williams, Christine; Zhu, Hongyao

    2016-11-01

    High-throughput screening (HTS) is an effective method for lead and probe discovery that is widely used in industry and academia to identify novel chemical matter and to initiate the drug discovery process. However, HTS can be time consuming and costly and the use of subsets as an efficient alternative to screening entire compound collections has been investigated. Subsets may be selected on the basis of chemical diversity, molecular properties, biological activity diversity or biological target focus. Previously, we described a novel form of subset screening: plate-based diversity subset (PBDS) screening, in which the screening subset is constructed by plate selection (rather than individual compound cherry-picking), using algorithms that select for compound quality and chemical diversity on a plate basis. In this paper, we describe a second-generation approach to the construction of an updated subset: PBDS2, using both plate and individual compound selection, that has an improved coverage of the chemical space of the screening file, whilst only selecting the same number of plates for screening. We describe the validation of PBDS2 and its successful use in hit and lead discovery. PBDS2 screening became the default mode of singleton (one compound per well) HTS for lead discovery in Pfizer.

  9. Clustering of financial time series with application to index and enhanced index tracking portfolio

    NASA Astrophysics Data System (ADS)

    Dose, Christian; Cincotti, Silvano

    2005-09-01

    A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). Present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, whilst not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.

  10. Prediction of lysine ubiquitylation with ensemble classifier and feature selection.

    PubMed

    Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao

    2011-01-01

    Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.

  11. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    NASA Astrophysics Data System (ADS)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2017-04-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed on the classifier by using many features. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total number of possible subsets of features is 2^n. Thus, the problem of selecting an optimal subset of features belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCCs features from all possible subsets of features using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCCs samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from benign and malignant MCCs samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From experimental results, it is also observed that the performance of the PSO-based and BBO-based algorithms in selecting an optimal subset of features for classifying MCCs as benign or malignant is better than that of the GA-based algorithm.

  12. Using ArcMap, Google Earth, and Global Positioning Systems to select and locate random households in rural Haiti.

    PubMed

    Wampler, Peter J; Rediske, Richard R; Molla, Azizur R

    2013-01-18

    A remote sensing technique was developed which combines a Geographic Information System (GIS); Google Earth, and Microsoft Excel to identify home locations for a random sample of households in rural Haiti. The method was used to select homes for ethnographic and water quality research in a region of rural Haiti located within 9 km of a local hospital and source of health education in Deschapelles, Haiti. The technique does not require access to governmental records or ground based surveys to collect household location data and can be performed in a rapid, cost-effective manner. The random selection of households and the location of these households during field surveys were accomplished using GIS, Google Earth, Microsoft Excel, and handheld Garmin GPSmap 76CSx GPS units. Homes were identified and mapped in Google Earth, exported to ArcMap 10.0, and a random list of homes was generated using Microsoft Excel which was then loaded onto handheld GPS units for field location. The development and use of a remote sensing method was essential to the selection and location of random households. A total of 537 homes initially were mapped and a randomized subset of 96 was identified as potential survey locations. Over 96% of the homes mapped using Google Earth imagery were correctly identified as occupied dwellings. Only 3.6% of the occupants of mapped homes visited declined to be interviewed. 16.4% of the homes visited were not occupied at the time of the visit due to work away from the home or market days. A total of 55 households were located using this method during the 10 days of fieldwork in May and June of 2012. The method used to generate and field locate random homes for surveys and water sampling was an effective means of selecting random households in a rural environment lacking geolocation infrastructure. The success rate for locating households using a handheld GPS was excellent and only rarely was local knowledge required to identify and locate households. This method provides an important technique that can be applied to other developing countries where a randomized study design is needed but infrastructure is lacking to implement more traditional participant selection methods.

  13. Dealing with correlated choices: how a spin-glass model can help political parties select their policies.

    PubMed

    Moore, M A; Katzgraber, Helmut G

    2014-10-01

    Starting from preferences on N proposed policies obtained via questionnaires from a sample of the electorate, an Ising spin-glass model in a field can be constructed from which a political party could find the subset of the proposed policies which would maximize its appeal, form a coherent choice in the eyes of the electorate, and have maximum overlap with the party's existing policies. We illustrate the application of the procedure by simulations of a spin glass in a random field on scale-free networks.
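
    A toy version of this construction is easy to write down: survey responses define pairwise couplings J between policies and a field h encoding overlap with existing positions, and a greedy single-spin-flip descent searches for a low-energy (coherent, appealing) subset of policies. The random couplings and the greedy dynamics below are a schematic reading of the abstract, not the authors' fitted model or scale-free network setup.

      import numpy as np

      rng = np.random.default_rng(1)
      N = 12                                    # number of proposed policies
      # Hypothetical couplings from questionnaire correlations between policy preferences,
      # plus a field encoding overlap with the party's existing positions.
      J = rng.normal(0.0, 1.0, (N, N))
      J = (J + J.T) / 2
      np.fill_diagonal(J, 0.0)
      h = rng.normal(0.0, 0.5, N)

      def energy(s):
          """Ising energy: lower energy = a more coherent, more appealing policy subset."""
          return -0.5 * s @ J @ s - h @ s

      s = rng.choice([-1, 1], size=N)           # +1: adopt the policy, -1: drop it
      improved = True
      while improved:                           # greedy single-spin-flip descent
          improved = False
          for i in range(N):
              trial = s.copy()
              trial[i] *= -1
              if energy(trial) < energy(s):
                  s, improved = trial, True

      print("adopted policies:", np.flatnonzero(s == 1), "energy:", round(float(energy(s)), 3))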

  14. Skin lesion computational diagnosis of dermoscopic images: Ensemble models based on input feature manipulation.

    PubMed

    Oliveira, Roberta B; Pereira, Aledir S; Tavares, João Manuel R S

    2017-10-01

    The number of deaths worldwide due to melanoma has risen in recent times, in part because melanoma is the most aggressive type of skin cancer. Computational systems have been developed to assist dermatologists in early diagnosis of skin cancer, or even to monitor skin lesions. However, improving classifiers for the diagnosis of such skin lesions remains a challenge. The main objective of this article is to evaluate different ensemble classification models based on input feature manipulation to diagnose skin lesions. Input feature manipulation processes are based on feature subset selections from shape properties, colour variation and texture analysis to generate diversity for the ensemble models. Three subset selection models are presented here: (1) a subset selection model based on specific feature groups, (2) a correlation-based subset selection model, and (3) a subset selection model based on feature selection algorithms. Each ensemble classification model is generated using an optimum-path forest classifier and integrated with a majority voting strategy. The proposed models were applied to a set of 1104 dermoscopic images using a cross-validation procedure. The best results were obtained by the first ensemble classification model, which generates a feature subset ensemble based on specific feature groups. The skin lesion diagnosis computational system achieved 94.3% accuracy, 91.8% sensitivity and 96.7% specificity. The input feature manipulation process based on specific feature subsets generated the greatest diversity for the ensemble classification model with very promising results. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    PubMed

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    Because Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only ways to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose Chronic Kidney Disease, two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, the classifier subset evaluator with the greedy stepwise search engine and the wrapper subset evaluator with the Best First search engine were used. In the filter approach, the correlation feature selection subset evaluator with the greedy stepwise search engine and the filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.

  16. Predicting ovarian malignancy: application of artificial neural networks to transvaginal and color Doppler flow US.

    PubMed

    Biagiotti, R; Desii, C; Vanzi, E; Gacci, G

    1999-02-01

    To compare the performance of artificial neural networks (ANNs) with that of multiple logistic regression (MLR) models for predicting ovarian malignancy in patients with adnexal masses by using transvaginal B-mode and color Doppler flow ultrasonography (US). A total of 226 adnexal masses were examined before surgery: Fifty-one were malignant and 175 were benign. The data were divided into training and testing subsets by using a "leave n out method." The training subsets were used to compute the optimum MLR equations and to train the ANNs. The cross-validation subsets were used to estimate the performance of each of the two models in predicting ovarian malignancy. At testing, three-layer back-propagation networks, based on the same input variables selected by using MLR (i.e., women's ages, papillary projections, random echogenicity, peak systolic velocity, and resistance index), had a significantly higher sensitivity than did MLR (96% vs 84%; McNemar test, p = .04). The Brier scores for ANNs were significantly lower than those calculated for MLR (Student t test for paired samples, P = .004). ANNs might have potential for categorizing adnexal masses as either malignant or benign on the basis of multiple variables related to demographic and US features.

  17. The SIMRAND methodology: Theory and application for the simulation of research and development projects

    NASA Technical Reports Server (NTRS)

    Miles, R. F., Jr.

    1986-01-01

    A research and development (R&D) project often involves a number of decisions that must be made concerning which subset of systems or tasks is to be undertaken to achieve the goal of the R&D project. To help in this decision making, SIMRAND (SIMulation of Research ANd Development Projects) is a methodology for the selection of the optimal subset of systems or tasks to be undertaken on an R&D project. Using alternative networks, the SIMRAND methodology models the alternative subsets of systems or tasks under consideration. Each path through an alternative network represents one way of satisfying the project goals. Equations are developed that relate the system or task variables to the measure of preference. Uncertainty is incorporated by treating the variables of the equations probabilistically as random variables, with cumulative distribution functions assessed by technical experts. Analytical techniques of probability theory are used to reduce the complexity of the alternative networks. Cardinal utility functions over the measure of preference are assessed for the decision makers. A run of the SIMRAND I computer program combines, in a Monte Carlo simulation model, the network structure, the equations, the cumulative distribution functions, and the utility functions.
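
    The Monte Carlo step can be illustrated compactly: for each alternative path through the network, sample the task variables from assessed distributions, map the resulting measure of preference through a cardinal utility function, and rank alternatives by expected utility. In the sketch below, the triangular distributions and the exponential utility are placeholder assumptions standing in for the expert-assessed CDFs and the decision makers' utilities.

      import numpy as np

      rng = np.random.default_rng(2)

      # Hypothetical alternative paths: each task cost is a (low, mode, high) triangular
      # distribution standing in for an expert-assessed CDF.
      alternatives = {
          "path_A": [(2, 4, 9), (1, 3, 5)],
          "path_B": [(1, 2, 4), (3, 6, 12), (1, 1.5, 2)],
      }

      def utility(cost, k=0.15):
          """Exponential (risk-averse) cardinal utility: lower, less variable cost is preferred."""
          return -np.exp(k * cost)

      n_draws = 20000
      for name, tasks in alternatives.items():
          total = sum(rng.triangular(lo, mode, hi, n_draws) for lo, mode, hi in tasks)
          print(name, "expected utility:", round(float(utility(total).mean()), 4))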

  18. An evaluation of exact methods for the multiple subset maximum cardinality selection problem.

    PubMed

    Brusco, Michael J; Köhn, Hans-Friedrich; Steinley, Douglas

    2016-05-01

    The maximum cardinality subset selection problem requires finding the largest possible subset from a set of objects, such that one or more conditions are satisfied. An important extension of this problem is to extract multiple subsets, where the addition of one more object to a larger subset would always be preferred to increases in the size of one or more smaller subsets. We refer to this as the multiple subset maximum cardinality selection problem (MSMCSP). A recently published branch-and-bound algorithm solves the MSMCSP as a partitioning problem. Unfortunately, the computational requirement associated with the algorithm is often enormous, thus rendering the method infeasible from a practical standpoint. In this paper, we present an alternative approach that successively solves a series of binary integer linear programs to obtain a globally optimal solution to the MSMCSP. Computational comparisons of the methods using published similarity data for 45 food items reveal that the proposed sequential method is computationally far more efficient than the branch-and-bound approach. © 2016 The British Psychological Society.

  19. The Causal Effect of Tracing by Peer Health Workers on Return to Clinic Among Patients Who Were Lost to Follow-up From Antiretroviral Therapy in Eastern Africa: A "Natural Experiment" Arising From Surveillance of Lost Patients.

    PubMed

    Bershetyn, Anna; Odeny, Thomas A; Lyamuya, Rita; Nakiwogga-Muwanga, Alice; Diero, Lameck; Bwana, Mwebesa; Braitstein, Paula; Somi, Geoffrey; Kambugu, Andrew; Bukusi, Elizabeth; Hartogensis, Wendy; Glidden, David V; Wools-Kaloustian, Kara; Yiannoutsos, Constantin; Martin, Jeffrey; Geng, Elvin H

    2017-06-01

    The effect of tracing human immunodeficiency virus (HIV)-infected patients who are lost to follow-up (LTFU) on reengagement has not been rigorously assessed. We carried out an ex post analysis of a surveillance study in which LTFU patients were randomly selected for tracing to identify the effect of tracing on reengagement. We evaluated HIV-infected adults on antiretroviral therapy who were LTFU (>90 days late for last visit) at 14 clinics in Uganda, Kenya, and Tanzania. A random sample of LTFU patients was selected for tracing by peer health workers. We assessed the effect of selection for tracing using Kaplan-Meier estimates of reengagement among all patients as well as the subset of LTFU patients who were alive, contacted in person by the tracer, and out of care. Of 5781 eligible patients, 991 (17%) were randomly selected for tracing. One year after selection for tracing, 13.3% (95% confidence interval [CI], 11.1%-15.3%) of those selected for tracing returned compared with 10.0% (95% CI, 9.1%-10.8%) of those not randomly selected, an adjusted risk difference of 3.0% (95% CI, .7%-5.3%). Among patients found to be alive, personally contacted, and out of care, tracing increased the absolute probability of return at 1 year by 22% (95% CI, 7.1%-36.2%). The effect of tracing on the rate of return to clinic decayed with a half-life of 7.0 days after tracing (95% CI, 2.6-12.9 days). Tracing interventions increase reengagement, but developing methods for targeting LTFU patients most likely to benefit can make this practice more efficient. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
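
    The comparison of patients selected for tracing versus those not selected is, in effect, a pair of Kaplan-Meier curves of time to return to clinic. A minimal sketch with the lifelines package is shown below; the durations and return indicators are simulated, not the study data, and deaths and transfers are ignored for simplicity.

      import numpy as np
      from lifelines import KaplanMeierFitter

      rng = np.random.default_rng(3)
      n = 500

      # Simulated cohort: roughly 17% randomly selected for tracing; returns within one year.
      traced = rng.random(n) < 0.17
      returned = rng.random(n) < np.where(traced, 0.13, 0.10)   # assumed 1-year return rates
      days = np.where(returned, rng.integers(1, 365, n), 365)   # censor non-returners at 365 days

      kmf = KaplanMeierFitter()
      for group, label in [(traced, "selected for tracing"), (~traced, "not selected")]:
          kmf.fit(durations=days[group], event_observed=returned[group], label=label)
          back = 1 - kmf.survival_function_at_times(365).iloc[0]
          print(label, "cumulative return by 1 year:", round(float(back), 3))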

  20. System-level multi-target drug discovery from natural products with applications to cardiovascular diseases.

    PubMed

    Zheng, Chunli; Wang, Jinan; Liu, Jianling; Pei, Mengjie; Huang, Chao; Wang, Yonghua

    2014-08-01

    The term systems pharmacology describes a field of study that uses computational and experimental approaches to broaden the view of drug actions rooted in molecular interactions and to advance the process of drug discovery. The aim of this work is to highlight the role that systems pharmacology plays in multi-target drug discovery from natural products for cardiovascular diseases (CVDs). Firstly, based on network pharmacology methods, we reconstructed the drug-target and target-target networks to determine the putative protein target set of multi-target drugs for CVD treatment. Secondly, we integrated a compound dataset of natural products and then obtained a multi-target compound subset through a virtual-screening process. Thirdly, a drug-likeness evaluation was applied to find the ADME-favorable compounds in this subset. Finally, we conducted in vitro experiments to evaluate the reliability of the selected chemicals and targets. We found that four of the five randomly selected natural molecules can effectively act on the target set for CVDs, indicating the soundness of our systems-based method. This strategy may serve as a new model for multi-target drug discovery for complex diseases.

  1. Stochastic subset selection for learning with kernel machines.

    PubMed

    Rhinelander, Jason; Liu, Xiaoping P

    2012-06-01

    Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super-linearly in computational effort with the number of training samples and is often used for offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm uses a stochastic indexing technique to select a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.

  2. Hybrid spread spectrum radio system

    DOEpatents

    Smith, Stephen F.; Dress, William B.

    2010-02-02

    Systems and methods are described for hybrid spread spectrum radio systems. A method includes modulating a signal by utilizing a subset of bits from a pseudo-random code generator to control an amplification circuit that provides a gain to the signal. Another method includes: modulating a signal by utilizing a subset of bits from a pseudo-random code generator to control a fast hopping frequency synthesizer; and fast frequency hopping the signal with the fast hopping frequency synthesizer, wherein multiple frequency hops occur within a single data-bit time.
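
    The core idea of steering an amplifier gain and a frequency synthesizer from subsets of pseudo-random code bits can be sketched with a small linear-feedback shift register; the register taps, the 2-bit gain field, the 3-bit hop field and the 902-928 MHz channel plan below are arbitrary illustrative choices, not the patented design.

      import numpy as np

      def lfsr_bits(seed=0b1010110, taps=(7, 6), n=64):
          """Pseudo-random bits from a 7-bit Fibonacci LFSR (illustrative taps)."""
          state, out = seed, []
          for _ in range(n):
              out.append(state & 1)
              fb = ((state >> (taps[0] - 1)) ^ (state >> (taps[1] - 1))) & 1
              state = (state >> 1) | (fb << 6)
          return np.array(out)

      bits = lfsr_bits()
      gain_levels = np.array([0.5, 1.0, 2.0, 4.0])     # amplifier gain selected by a 2-bit field
      hop_freqs = np.linspace(902e6, 928e6, 8)         # 8 hop channels selected by a 3-bit field

      for i in range(0, len(bits) - 5, 5):             # consume 5 PN bits per hop interval
          gain = gain_levels[bits[i] * 2 + bits[i + 1]]
          freq = hop_freqs[bits[i + 2] * 4 + bits[i + 3] * 2 + bits[i + 4]]
          print(f"hop {i // 5}: gain x{gain}, carrier {freq / 1e6:.1f} MHz")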

  3. Spectra of random operators with absolutely continuous integrated density of states

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rio, Rafael del, E-mail: delrio@iimas.unam.mx, E-mail: delriomagia@gmail.com

    2014-04-15

    The structure of the spectrum of random operators is studied. It is shown that if the density of states measure of some subsets of the spectrum is zero, then these subsets are empty. In particular, it follows that absolute continuity of the integrated density of states implies that the singular spectrum of ergodic operators is either empty or of positive measure. Our results apply to Anderson and alloy-type models, perturbed Landau Hamiltonians, almost periodic potentials, and models which are not ergodic.

  4. Communication methods, systems, apparatus, and devices involving RF tag registration

    DOEpatents

    Burghard, Brion J [W. Richland, WA; Skorpik, James R [Kennewick, WA

    2008-04-22

    One technique of the present invention includes a number of Radio Frequency (RF) tags that each have a different identifier. Information is broadcast to the tags from an RF tag interrogator. This information corresponds to a maximum quantity of tag response time slots that are available. This maximum quantity may be less than the total number of tags. The tags each select one of the time slots as a function of the information and a random number provided by each respective tag. The different identifiers are transmitted to the interrogator from at least a subset of the RF tags.

  5. Random-subset fitting of digital holograms for fast three-dimensional particle tracking [invited].

    PubMed

    Dimiduk, Thomas G; Perry, Rebecca W; Fung, Jerome; Manoharan, Vinothan N

    2014-09-20

    Fitting scattering solutions to time series of digital holograms is a precise way to measure three-dimensional dynamics of microscale objects such as colloidal particles. However, this inverse-problem approach is computationally expensive. We show that the computational time can be reduced by an order of magnitude or more by fitting to a random subset of the pixels in a hologram. We demonstrate our algorithm on experimentally measured holograms of micrometer-scale colloidal particles, and we show that 20-fold increases in speed, relative to fitting full frames, can be attained while introducing errors in the particle positions of 10 nm or less. The method is straightforward to implement and works for any scattering model. It also enables a parallelization strategy wherein random-subset fitting is used to quickly determine initial guesses that are subsequently used to fit full frames in parallel. This approach may prove particularly useful for studying rare events, such as nucleation, that can only be captured with high frame rates over long times.
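
    The speed-up comes from evaluating the forward model only at a random subset of hologram pixels inside the least-squares objective. In the sketch below, a Gaussian blob stands in for a real scattering model (e.g., a Lorenz-Mie hologram); it is used only to show the structure of a random-subset fit.

      import numpy as np
      from scipy.optimize import least_squares

      rng = np.random.default_rng(4)
      ny = nx = 200
      yy, xx = np.mgrid[0:ny, 0:nx]

      def model(params, x, y):
          """Toy stand-in for a scattering model: a Gaussian blob with centre (x0, y0), width w."""
          x0, y0, w = params
          return np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * w ** 2))

      true_params = (92.0, 121.0, 14.0)
      holo = model(true_params, xx, yy) + 0.02 * rng.standard_normal((ny, nx))  # noisy "hologram"

      # Fit using only a random 5% subset of the 40,000 pixels.
      idx = rng.choice(ny * nx, size=int(0.05 * ny * nx), replace=False)
      xs, ys, data = xx.ravel()[idx], yy.ravel()[idx], holo.ravel()[idx]

      fit = least_squares(lambda p: model(p, xs, ys) - data, x0=(100.0, 100.0, 10.0))
      print("recovered parameters:", np.round(fit.x, 2))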

  6. Detection of Nitrogen Content in Rubber Leaves Using Near-Infrared (NIR) Spectroscopy with Correlation-Based Successive Projections Algorithm (SPA).

    PubMed

    Tang, Rongnian; Chen, Xupeng; Li, Chuang

    2018-05-01

    Near-infrared spectroscopy is an efficient, low-cost technology that has potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to the fluctuation of correlation between variables, high collinearity may still exist among non-adjacent variables of the subset obtained by basic SPA. Based on an analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA) that applies the successive projections algorithm in regions with consistent correlation. The result shows that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established on the CB-SPA subset outperform those established on basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is more efficient, as the time cost of its selection procedure is one-twelfth that of basic SPA.
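
    The projection step of SPA itself can be written as a short loop: starting from an initial wavelength, each iteration projects the remaining spectral columns onto the orthogonal complement of the most recently selected one and keeps the column with the largest residual norm. The sketch below is generic, basic SPA on a synthetic matrix, not the correlation-segmented CB-SPA variant.

      import numpy as np

      def spa(X, n_select, start=0):
          """Basic SPA: greedily pick columns (wavelengths) with minimal collinearity."""
          X = X.astype(float).copy()
          selected = [start]
          for _ in range(n_select - 1):
              v = X[:, selected[-1]]
              # Project every column onto the orthogonal complement of the last selected column.
              X = X - np.outer(v, v @ X) / (v @ v)
              norms = np.linalg.norm(X, axis=0)
              norms[selected] = -1.0               # never re-select a chosen wavelength
              selected.append(int(np.argmax(norms)))
          return selected

      rng = np.random.default_rng(5)
      spectra = rng.random((60, 200))              # 60 samples x 200 wavelengths (synthetic)
      print("selected wavelength indices:", spa(spectra, n_select=8))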

  7. Effect of Cytomegalovirus Co-Infection on Normalization of Selected T-Cell Subsets in Children with Perinatally Acquired HIV Infection Treated with Combination Antiretroviral Therapy

    PubMed Central

    Kapetanovic, Suad; Aaron, Lisa; Montepiedra, Grace; Anthony, Patricia; Thuvamontolrat, Kasalyn; Pahwa, Savita; Burchett, Sandra; Weinberg, Adriana; Kovacs, Andrea

    2015-01-01

    Background We examined the effect of cytomegalovirus (CMV) co-infection and viremia on reconstitution of selected CD4+ and CD8+ T-cell subsets in perinatally HIV-infected (PHIV+) children ≥ 1-year old who participated in a partially randomized, open-label, 96-week combination antiretroviral therapy (cART)-algorithm study. Methods Participants were categorized as CMV-naïve, CMV-positive (CMV+) viremic, and CMV+ aviremic, based on blood, urine, or throat culture, CMV IgG and DNA polymerase chain reaction measured at baseline. At weeks 0, 12, 20 and 40, T-cell subsets including naïve (CD62L+CD45RA+; CD95-CD28+), activated (CD38+HLA-DR+) and terminally differentiated (CD62L-CD45RA+; CD95+CD28-) CD4+ and CD8+ T-cells were measured by flow cytometry. Results Of the 107 participants included in the analysis, 14% were CMV+ viremic; 49% CMV+ aviremic; 37% CMV-naïve. In longitudinal adjusted models, compared with CMV+ status, baseline CMV-naïve status was significantly associated with faster recovery of CD8+CD62L+CD45RA+% and CD8+CD95-CD28+% and faster decrease of CD8+CD95+CD28-%, independent of HIV VL response to treatment, cART regimen and baseline CD4%. Surprisingly, CMV status did not have a significant impact on longitudinal trends in CD8+CD38+HLA-DR+%. CMV status did not have a significant impact on any CD4+ T-cell subsets. Conclusions In this cohort of PHIV+ children, the normalization of naïve and terminally differentiated CD8+ T-cell subsets in response to cART was detrimentally affected by the presence of CMV co-infection. These findings may have implications for adjunctive treatment strategies targeting CMV co-infection in PHIV+ children, especially those that are now adults or reaching young adulthood and may have accelerated immunologic aging, increased opportunistic infections and aging diseases of the immune system. PMID:25794163

  8. Random sampling of elementary flux modes in large-scale metabolic networks.

    PubMed

    Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel

    2012-09-15

    The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. dmachado@deb.uminho.pt Supplementary data are available at Bioinformatics online.

  9. Predictive value of initial FDG-PET features for treatment response and survival in esophageal cancer patients treated with chemo-radiation therapy using a random forest classifier.

    PubMed

    Desbordes, Paul; Ruan, Su; Modzelewski, Romain; Pineau, Pascal; Vauclin, Sébastien; Gouel, Pierrick; Michel, Pierre; Di Fiore, Frédéric; Vera, Pierre; Gardin, Isabelle

    2017-01-01

    In oncology, texture features extracted from positron emission tomography with 18-fluorodeoxyglucose images (FDG-PET) are of increasing interest for predictive and prognostic studies, leading to several tens of features per tumor. To select the best features, the use of a random forest (RF) classifier was investigated. Sixty-five patients with esophageal cancer treated with combined chemo-radiation therapy were retrospectively included. All patients underwent a pretreatment whole-body FDG-PET. The patients were followed for 3 years after the end of the treatment. The response assessment was performed 1 month after the end of the therapy. Patients were classified as complete responders or non-complete responders. Sixty-one features were extracted from medical records and PET images. First, Spearman's analysis was performed to eliminate correlated features. Then, the best predictive and prognostic subsets of features were selected using an RF algorithm. These results were compared to those obtained by a Mann-Whitney U test (predictive study) and a univariate Kaplan-Meier analysis (prognostic study). Among the 61 initial features, 28 were not correlated. From these 28 features, the best subset of complementary features found using the RF classifier to predict response was composed of 2 features: metabolic tumor volume (MTV) and homogeneity from the co-occurrence matrix. The corresponding predictive value (AUC = 0.836 ± 0.105, Se = 82 ± 9%, Sp = 91 ± 12%) was higher than the best predictive results found using the Mann-Whitney test: busyness from the gray level difference matrix (P < 0.0001, AUC = 0.810, Se = 66%, Sp = 88%). The best prognostic subset found using RF was composed of 3 features: MTV and 2 clinical features (WHO status and nutritional risk index) (AUC = 0.822 ± 0.059, Se = 79 ± 9%, Sp = 95 ± 6%), while no feature was significantly prognostic according to the Kaplan-Meier analysis. The RF classifier can improve predictive and prognostic values compared to the Mann-Whitney U test and the univariate Kaplan-Meier survival analysis when applied to several tens of features in a limited patient database.
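
    The two-step screening described here, dropping one feature of every highly correlated pair and then ranking the survivors with a random forest, can be sketched with scipy and scikit-learn; the synthetic feature table, the 0.8 correlation threshold and the forest size are illustrative assumptions rather than the study's settings.

      import numpy as np
      from scipy.stats import spearmanr
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier

      # Synthetic stand-in for the 61 clinical and PET texture features of 65 patients.
      X, y = make_classification(n_samples=65, n_features=61, n_informative=8, random_state=0)

      # Step 1: Spearman screening - drop the second feature of every pair with |rho| > 0.8.
      rho, _ = spearmanr(X)
      keep = []
      for j in range(X.shape[1]):
          if all(abs(rho[j, k]) <= 0.8 for k in keep):
              keep.append(j)

      # Step 2: rank the surviving features by random-forest importance.
      rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X[:, keep], y)
      order = np.argsort(rf.feature_importances_)[::-1]
      print("top features (original indices):", [keep[i] for i in order[:3]])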

  10. Analysis of clinical data to determine the minimum number of sensors required for adequate skin temperature monitoring of superficial hyperthermia treatments.

    PubMed

    Bakker, Akke; Holman, Rebecca; Rodrigues, Dario B; Dobšíček Trefná, Hana; Stauffer, Paul R; van Tienhoven, Geertjan; Rasch, Coen R N; Crezee, Hans

    2018-04-27

    Tumor response and treatment toxicity are related to minimum and maximum tissue temperatures during hyperthermia, respectively. Using a large set of clinical data, we analyzed the number of sensors required to adequately monitor skin temperature during superficial hyperthermia treatment of breast cancer patients. Hyperthermia treatments monitored with >60 stationary temperature sensors were selected from a database of patients with recurrent breast cancer treated with re-irradiation (23 × 2 Gy) and hyperthermia using single 434 MHz applicators (effective field size 351-396 cm²). Reduced temperature monitoring schemes involved randomly selected subsets of stationary skin sensors, and another subset simulating continuous thermal mapping of the skin. Temperature differences (ΔT) between subsets and complete sets of sensors were evaluated in terms of overall minimum (Tmin) and maximum (Tmax) temperature, as well as T90 and T10. Eighty patients were included yielding a total of 400 hyperthermia sessions. Median ΔT was <0.01 °C for T90; its 95% confidence interval (95% CI) decreased to ≤0.5 °C when >50 sensors were used. Subsets of <10 sensors result in underestimation of Tmax up to -2.1 °C (ΔT 95% CI), which decreased to -0.5 °C when >50 sensors were used. Thermal profiles (8-21 probes) yielded a median ΔT < 0.01 °C for T90 and Tmax, with 95% CIs of -0.2 °C and 0.4 °C, respectively. The detection rate of Tmax ≥43 °C is ≥85% while using >50 stationary sensors or thermal profiles. Adequate coverage of the skin temperature distribution during superficial hyperthermia treatment requires the use of >50 stationary sensors per 400 cm² applicator. Thermal mapping is a valid alternative.
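
    The subset-versus-full-set comparison amounts to recomputing Tmin, Tmax, T90 and T10 from a random draw of sensors and examining the deviation from the full sensor set; a minimal numpy sketch on synthetic skin-temperature readings follows.

      import numpy as np

      rng = np.random.default_rng(6)
      full = rng.normal(41.0, 1.0, size=70)        # synthetic readings from 70 skin sensors (deg C)

      def summary(t):
          # T90: temperature exceeded by 90% of sensors; T10: exceeded by only 10% of sensors.
          return {"Tmin": t.min(), "T90": np.percentile(t, 10),
                  "T10": np.percentile(t, 90), "Tmax": t.max()}

      for n_sensors in (10, 30, 50):
          subset = rng.choice(full, size=n_sensors, replace=False)
          deltas = {k: round(float(sub_v - full_v), 2)
                    for (k, sub_v), full_v in zip(summary(subset).items(), summary(full).values())}
          print(n_sensors, "sensors, deviation from full set:", deltas)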

  11. Selection of a Representative Subset of Global Climate Models that Captures the Profile of Regional Changes for Integrated Climate Impacts Assessment

    NASA Technical Reports Server (NTRS)

    Ruane, Alex C.; Mcdermid, Sonali P.

    2017-01-01

    We present the Representative Temperature and Precipitation (T&P) GCM Subsetting Approach developed within the Agricultural Model Intercomparison and Improvement Project (AgMIP) to select a practical subset of global climate models (GCMs) for regional integrated assessment of climate impacts when resource limitations do not permit the full ensemble of GCMs to be evaluated given the need to also focus on impacts sector and economics models. Subsetting inherently leads to a loss of information but can free up resources to explore important uncertainties in the integrated assessment that would otherwise be prohibitive. The Representative T&P GCM Subsetting Approach identifies five individual GCMs that capture a profile of the full ensemble of temperature and precipitation change within the growing season while maintaining information about the probability that basic classes of climate changes (relatively cool/wet, cool/dry, middle, hot/wet, and hot/dry) are projected in the full GCM ensemble. We demonstrate the selection methodology for maize impacts in Ames, Iowa, and discuss limitations and situations when additional information may be required to select representative GCMs. We then classify 29 GCMs over all land areas to identify regions and seasons with characteristic diagonal skewness related to surface moisture as well as extreme skewness connected to snow-albedo feedbacks and GCM uncertainty. Finally, we employ this basic approach to recognize that GCM projections demonstrate coherence across space, time, and greenhouse gas concentration pathway. The Representative T&P GCM Subsetting Approach provides a quantitative basis for the determination of useful GCM subsets, provides a practical and coherent approach where previous assessments selected solely on availability of scenarios, and may be extended for application to a range of scales and sectoral impacts.
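
    The subsetting logic can be illustrated by placing each GCM's growing-season temperature and precipitation change on the ensemble's percentile scale, labelling it as one of the five classes (relatively cool/wet, cool/dry, middle, hot/wet, hot/dry), and drawing one model per class. In the sketch below, the synthetic changes and the tercile cut-offs are assumptions, not AgMIP's exact thresholds.

      import numpy as np

      rng = np.random.default_rng(7)
      gcms = [f"GCM_{i:02d}" for i in range(29)]
      dT = rng.normal(2.5, 0.8, 29)      # growing-season temperature change (deg C), synthetic
      dP = rng.normal(0.0, 12.0, 29)     # growing-season precipitation change (%), synthetic

      t_lo, t_hi = np.percentile(dT, [33, 67])
      p_lo, p_hi = np.percentile(dP, [33, 67])

      def classify(t, p):
          """Label a GCM by where its T&P change falls within the ensemble."""
          if t_lo <= t <= t_hi and p_lo <= p <= p_hi:
              return "middle"
          warm = "hot" if t > np.median(dT) else "cool"
          moist = "wet" if p > np.median(dP) else "dry"
          return f"{warm}/{moist}"

      classes = {}
      for name, t, p in zip(gcms, dT, dP):
          classes.setdefault(classify(t, p), []).append(name)

      for cls, members in classes.items():         # one representative model per class
          print(f"{cls}: {len(members) / 29:.0%} of ensemble -> representative {members[0]}")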

  12. Theoretical analysis on the measurement errors of local 2D DIC: Part I temporal and spatial uncertainty quantification of displacement measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Yueqi; Lava, Pascal; Reu, Phillip

    This study presents a theoretical uncertainty quantification of displacement measurements by subset-based 2D-digital image correlation. A generalized solution to estimate the random error of displacement measurement is presented. The obtained solution suggests that the random error of displacement measurements is determined by the image noise, the summation of the intensity gradient in a subset, the subpixel part of displacement, and the interpolation scheme. The proposed method is validated with virtual digital image correlation tests.

  13. Theoretical analysis on the measurement errors of local 2D DIC: Part I temporal and spatial uncertainty quantification of displacement measurements

    DOE PAGES

    Wang, Yueqi; Lava, Pascal; Reu, Phillip; ...

    2015-12-23

    This study presents a theoretical uncertainty quantification of displacement measurements by subset-based 2D-digital image correlation. A generalized solution to estimate the random error of displacement measurement is presented. The obtained solution suggests that the random error of displacement measurements is determined by the image noise, the summation of the intensity gradient in a subset, the subpixel part of displacement, and the interpolation scheme. The proposed method is validated with virtual digital image correlation tests.

  14. Learning a constrained conditional random field for enhanced segmentation of fallen trees in ALS point clouds

    NASA Astrophysics Data System (ADS)

    Polewski, Przemyslaw; Yao, Wei; Heurich, Marco; Krzystek, Peter; Stilla, Uwe

    2018-06-01

    In this study, we present a method for improving the quality of automatic single fallen tree stem segmentation in ALS data by applying a specialized constrained conditional random field (CRF). The entire processing pipeline is composed of two steps. First, short stem segments of equal length are detected and a subset of them is selected for further processing, while in the second step the chosen segments are merged to form entire trees. The first step is accomplished using the specialized CRF defined on the space of segment labelings, capable of finding segment candidates which are easier to merge subsequently. To achieve this, the CRF considers not only the features of every candidate individually, but incorporates pairwise spatial interactions between adjacent segments into the model. In particular, pairwise interactions include a collinearity/angular deviation probability which is learned from training data as well as the ratio of spatial overlap, whereas unary potentials encode a learned probabilistic model of the laser point distribution around each segment. Each of these components enters the CRF energy with its own balance factor. To process previously unseen data, we first calculate the subset of segments for merging on a grid of balance factors by minimizing the CRF energy. Then, we perform the merging and rank the balance configurations according to the quality of their resulting merged trees, obtained from a learned tree appearance model. The final result is derived from the top-ranked configuration. We tested our approach on 5 plots from the Bavarian Forest National Park using reference data acquired in a field inventory. Compared to our previous segment selection method without pairwise interactions, an increase in detection correctness and completeness of up to 7 and 9 percentage points, respectively, was observed.

  15. Evaluation and application of multiple scoring functions for a virtual screening experiment

    NASA Astrophysics Data System (ADS)

    Xing, Li; Hodgkin, Edward; Liu, Qian; Sedlock, David

    2004-05-01

    In order to identify novel chemical classes of factor Xa inhibitors, five scoring functions (FlexX, DOCK, GOLD, ChemScore and PMF) were employed to evaluate the multiple docking poses generated by FlexX. The compound collection was composed of confirmed potent factor Xa inhibitors and a subset of the LeadQuest® screening compound library. Except for PMF, the other four scoring functions succeeded in reproducing the crystal complex (PDB code: 1FAX). During virtual screening, the highest hit rate (80%) was achieved by FlexX at an energy cutoff of -40 kJ/mol, which is about 40-fold over random screening (2.06%). Limited results suggest that presenting more poses of a single molecule to the scoring functions could deteriorate their enrichment factors. A series of promising scaffolds with favorable binding scores was retrieved from LeadQuest. Consensus scoring by pair-wise intersection failed to enrich the hit rate yielded by single scorings (i.e., FlexX). We note that reported successes of consensus scoring in hit-rate enrichment could be artificial because their comparisons were based on a selected subset of single scoring and a markedly reduced subset of double or triple scoring. The findings presented in this report are based upon a single biological system and support further studies.

  16. Reliable Refuge: Two Sky Island Scorpion Species Select Larger, Thermally Stable Retreat Sites.

    PubMed

    Becker, Jamie E; Brown, Christopher A

    2016-01-01

    Sky island scorpions shelter under rocks and other surface debris, but, as with other scorpions, it is unclear whether these species select retreat sites randomly. Furthermore, little is known about the thermal preferences of scorpions, and no research has been done to identify whether reproductive condition might influence retreat site selection. The objectives were to (1) identify physical or thermal characteristics for retreat sites occupied by two sky island scorpions (Vaejovis cashi Graham 2007 and V. electrum Hughes 2011) and those not occupied; (2) determine whether retreat site selection differs between the two study species; and (3) identify whether thermal selection differs between species and between gravid and non-gravid females of the same species. Within each scorpion's habitat, maximum dimensions of rocks along a transect line were measured and compared to occupied rocks to determine whether retreat site selection occurred randomly. Temperature loggers were placed under a subset of occupied and unoccupied rocks for 48 hours to compare the thermal characteristics of these rocks. Thermal gradient trials were conducted before parturition and after dispersal of young in order to identify whether gravidity influences thermal preference. Vaejovis cashi and V. electrum both selected larger retreat sites that had more stable thermal profiles. Neither species appeared to have thermal preferences influenced by reproductive condition. However, while thermal selection did not differ among non-gravid individuals, gravid V. electrum selected warmer temperatures than its gravid congener. Sky island scorpions appear to select large retreat sites to maintain thermal stability, although biotic factors (e.g., competition) could also be involved in this choice. Future studies should focus on identifying the various biotic or abiotic factors that could influence retreat site selection in scorpions, as well as determining whether reproductive condition affects thermal selection in other arachnids.

  17. Using learning automata to determine proper subset size in high-dimensional spaces

    NASA Astrophysics Data System (ADS)

    Seyyedi, Seyyed Hossein; Minaei-Bidgoli, Behrouz

    2017-03-01

    In this paper, we offer a new method called FSLA (Finding the best candidate Subset using Learning Automata), which combines the filter and wrapper approaches for feature selection in high-dimensional spaces. Considering the difficulties of dimension reduction in high-dimensional spaces, FSLA's multi-objective functionality is to determine, in an efficient manner, a feature subset that leads to an appropriate tradeoff between the learning algorithm's accuracy and efficiency. First, using an existing weighting function, the feature list is sorted, and subsets of different sizes selected from the list are considered. Then, a learning automaton verifies the performance of each subset when it is used as the input space of the learning algorithm and estimates its fitness based on the algorithm's accuracy and the subset size, which determines the algorithm's efficiency. Finally, FSLA introduces the fittest subset as the best choice. We tested FSLA in the framework of text classification. The results confirm its promising performance in attaining the identified goal.

  18. Two-stage atlas subset selection in multi-atlas based image segmentation.

    PubMed

    Zhao, Tingting; Ruan, Dan

    2015-06-01

    Fast-growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation and also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs in the face of a large atlas collection with varied quality, so that high-accuracy segmentation can be achieved with low computational cost. An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates end-to-end segmentation performance comparable to the conventional single-stage selection method, but with significant computation reduction. Compared with the alternative computation reduction method, their scheme improves the mean and median Dice similarity coefficient values from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.
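
    The two-stage idea, a cheap relevance metric to cut the atlas pool down to an augmented subset followed by an expensive metric that picks the final fusion set, can be sketched in a few lines. In the sketch below, the preliminary score is modelled as a noisy proxy of the refined score; in practice these would come from low-cost and full-fledged registration, respectively.

      import numpy as np

      rng = np.random.default_rng(8)
      n_atlases, augmented_size, fusion_size = 200, 20, 5

      # Hypothetical relevance of each atlas to the target image. The refined score stands in
      # for full-fledged registration; the preliminary score is a cheap, noisy proxy of it.
      refined = rng.random(n_atlases)
      preliminary = refined + rng.normal(0.0, 0.15, n_atlases)

      # Stage 1: keep an augmented subset using only the cheap preliminary metric.
      stage1 = np.argsort(preliminary)[-augmented_size:]

      # Stage 2: apply the expensive metric only to the augmented subset, keep the fusion set.
      stage2 = stage1[np.argsort(refined[stage1])[-fusion_size:]]

      ideal = set(np.argsort(refined)[-fusion_size:])
      print("fusion set:", sorted(int(i) for i in stage2),
            "| overlap with ideal set:", len(ideal & set(stage2)), "/", fusion_size)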

  19. Hierarchical Kohonenen net for anomaly detection in network security.

    PubMed

    Sarasamma, Suseela T; Zhu, Qiuming A; Huff, Julie

    2005-04-01

    A novel multilevel hierarchical Kohonen Net (K-Map) for an intrusion detection system is presented. Each level of the hierarchical map is modeled as a simple winner-take-all K-Map. One significant advantage of this multilevel hierarchical K-Map is its computational efficiency. Unlike other statistical anomaly detection methods such as nearest neighbor approach, K-means clustering or probabilistic analysis that employ distance computation in the feature space to identify the outliers, our approach does not involve costly point-to-point computation in organizing the data into clusters. Another advantage is the reduced network size. We use the classification capability of the K-Map on selected dimensions of data set in detecting anomalies. Randomly selected subsets that contain both attacks and normal records from the KDD Cup 1999 benchmark data are used to train the hierarchical net. We use a confidence measure to label the clusters. Then we use the test set from the same KDD Cup 1999 benchmark to test the hierarchical net. We show that a hierarchical K-Map in which each layer operates on a small subset of the feature space is superior to a single-layer K-Map operating on the whole feature space in detecting a variety of attacks in terms of detection rate as well as false positive rate.

  20. Hybrid feature selection for supporting lightweight intrusion detection systems

    NASA Astrophysics Data System (ADS)

    Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin

    2017-08-01

    Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach combining wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which chi-square feature selection is utilized. The selected set of features from the previous phase is further refined in the second phase in a wrapper manner, in which the Random Forest (RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results in higher detection accuracy as well as faster training and testing processes.
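
    A compact approximation of the two-phase pipeline, a chi-square filter pass followed by a random-forest-guided refinement, is shown below using scikit-learn; the synthetic dataset, the number of filtered features and the importance cut-off are placeholder choices rather than the paper's NSL-KDD configuration.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.model_selection import cross_val_score

      # Synthetic, non-negative feature table standing in for pre-processed traffic records.
      X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)
      X = X - X.min(axis=0)                        # chi2 requires non-negative features

      # Phase 1 (filter): preliminary search with the chi-square statistic.
      filt = SelectKBest(chi2, k=20).fit(X, y)
      idx1 = np.flatnonzero(filt.get_support())

      # Phase 2 (wrapper-style refinement): keep features the random forest finds important.
      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:, idx1], y)
      idx2 = idx1[rf.feature_importances_ > np.median(rf.feature_importances_)]

      acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                            X[:, idx2], y, cv=5).mean()
      print(f"{len(idx2)} features retained, cross-validated accuracy {acc:.3f}")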

  1. Adaptive feature selection using v-shaped binary particle swarm optimization.

    PubMed

    Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers.

  2. Adaptive feature selection using v-shaped binary particle swarm optimization

    PubMed Central

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
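
    The defining detail of the method, mapping a particle's continuous velocity through a V-shaped transfer function to decide whether a feature bit flips, can be isolated in a few lines. The |tanh(v)| transfer function and the toy fitness below are illustrative assumptions; the paper's fitness is built from correlation information entropy and evaluated with classifiers.

      import numpy as np

      rng = np.random.default_rng(9)
      n_particles, n_features, n_iter = 10, 25, 50

      def v_transfer(v):
          """V-shaped transfer function: probability that a position bit is flipped."""
          return np.abs(np.tanh(v))

      def fitness(mask):
          """Toy fitness (assumption): reward hitting the first 5 features, penalise subset size."""
          return mask[:5].sum() - 0.1 * mask.sum()

      pos = (rng.random((n_particles, n_features)) < 0.5).astype(float)   # 0/1 feature masks
      vel = rng.normal(0.0, 1.0, (n_particles, n_features))
      pbest = pos.copy()
      pbest_fit = np.array([fitness(p) for p in pos])
      gbest = pbest[pbest_fit.argmax()].copy()

      for _ in range(n_iter):
          r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
          vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
          flip = rng.random(pos.shape) < v_transfer(vel)    # V-shaped rule: flip, don't overwrite
          pos = np.where(flip, 1.0 - pos, pos)
          fits = np.array([fitness(p) for p in pos])
          better = fits > pbest_fit
          pbest[better], pbest_fit[better] = pos[better], fits[better]
          gbest = pbest[pbest_fit.argmax()].copy()

      print("selected features:", np.flatnonzero(gbest), "fitness:", round(float(fitness(gbest)), 2))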

  3. Evidence for naive T-cell repopulation despite thymus irradiation after autologous transplantation in adults with multiple myeloma: role of ex vivo CD34+ selection and age.

    PubMed

    Malphettes, Marion; Carcelain, Guislaine; Saint-Mezard, Pierre; Leblond, Véronique; Altes, Hester Korthals; Marolleau, Jean-Pierre; Debré, Patrice; Brouet, Jean-Claude; Fermand, Jean-Paul; Autran, Brigitte

    2003-03-01

    Immunodeficiency following autologous CD34+-purified peripheral blood stem cell (PBSC) transplantation could be related to T-cell depletion of the graft or impaired T-cell reconstitution due to thymus irradiation. Aiming to assess the role of irradiated thymus in T-cell repopulation, we studied 32 adults with multiple myeloma, randomly assigned to receive high-dose therapy including total body irradiation (TBI) followed by autologous transplantation with either unselected or CD34+-selected PBSCs. The median number of reinfused CD3+ cells was lower in the selected group (0.03 versus 14 × 10^6/kg; P = .002). Lymphocyte subset counts were evaluated from month 3 to 24 after grafting. Naive CD4+ T cells were characterized both by phenotype and by quantification of T-cell receptor rearrangement excision circles (TRECs). The reconstitution of CD3+ and CD4+ T cells was significantly delayed in the CD34+-selected group, but eventually led to counts similar to those found in the unselected group after month 12. Mechanism of reconstitution differed, however, between both groups. Indeed, a marked increase in the naive CD62L+CD45RA+CD4+ subset was observed in the selected group, but not in the unselected group in which half of the CD45RA+CD4+ T cells appear to be CD62L-. Age was identified as an independent adverse factor for CD4+ and CD62L+CD45RA+CD4+ T-cell reconstitution. Our results provide evidence that infusing PBSCs depleted of T cells after TBI in adults delays T-cell reconstitution but accelerates thymic regeneration.

  4. Fizzy: feature subset selection for metagenomics.

    PubMed

    Ditzler, Gregory; Morrison, J Calvin; Lan, Yemin; Rosen, Gail L

    2015-11-04

    Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We have developed a new Python command-line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
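
    The kind of information-theoretic subset selection that Fizzy wraps can be approximated in a few lines with scikit-learn's mutual-information scorer applied to an OTU abundance table; the synthetic table and the use of mutual_info_classif (rather than the specific criteria Fizzy exposes) are assumptions made for illustration.

      import numpy as np
      from sklearn.feature_selection import mutual_info_classif

      rng = np.random.default_rng(10)

      # Synthetic OTU abundance table: 60 samples x 300 OTUs, two phenotype groups.
      phenotype = np.repeat([0, 1], 30)
      otus = rng.poisson(5.0, size=(60, 300)).astype(float)
      otus[phenotype == 1, :10] += rng.poisson(8.0, size=(30, 10))   # 10 discriminative OTUs

      mi = mutual_info_classif(otus, phenotype, discrete_features=False, random_state=0)
      top = np.argsort(mi)[::-1][:15]
      print("highest-information OTUs:", sorted(int(i) for i in top))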

  5. Fizzy. Feature subset selection for metagenomics

    DOE PAGES

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; ...

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command-line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  6. Anatomical constraints on attention: Hemifield independence is a signature of multifocal spatial selection

    PubMed Central

    Alvarez, George A; Gill, Jonathan; Cavanagh, Patrick

    2012-01-01

    Previous studies have shown independent attentional selection of targets in the left and right visual hemifields during attentional tracking (Alvarez & Cavanagh, 2005) but not during a visual search (Luck, Hillyard, Mangun, & Gazzaniga, 1989). Here we tested whether multifocal spatial attention is the critical process that operates independently in the two hemifields. It is explicitly required in tracking (attend to a subset of object locations, suppress the others) but not in the standard visual search task (where all items are potential targets). We used a modified visual search task in which observers searched for a target within a subset of display items, where the subset was selected based on location (Experiments 1 and 3A) or based on a salient feature difference (Experiments 2 and 3B). The results show hemifield independence in this subset visual search task with location-based selection but not with feature-based selection; this effect cannot be explained by general difficulty (Experiment 4). Combined, these findings suggest that hemifield independence is a signature of multifocal spatial attention and highlight the need for cognitive and neural theories of attention to account for anatomical constraints on selection mechanisms. PMID:22637710

  7. Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.

    PubMed

    Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J

    2015-01-01

    The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating the domain expert's knowledge in the selection process is needed to increase confidence in the final set of descriptors. In this paper, a software tool named Visual and Interactive DEscriptor ANalysis (VIDEAN), which combines statistical methods with interactive visualizations for choosing a set of descriptors to predict a target property, is proposed. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The competencies of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets, or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with low cognitive effort, in contrast with the alternative of an ad-hoc manual analysis of the selected descriptors. Graphical abstract: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.

  8. Controllability of social networks and the strategic use of random information.

    PubMed

    Cremonini, Marco; Casamassima, Francesca

    2017-01-01

    This work is aimed at studying realistic social control strategies for social networks based on the introduction of random information into the state of selected driver agents. Deliberately exposing selected agents to random information is a technique already experimented in recommender systems or search engines, and represents one of the few options for influencing the behavior of a social context that could be accepted as ethical, could be fully disclosed to members, and does not involve the use of force or of deception. Our research is based on a model of knowledge diffusion applied to a time-varying adaptive network and considers two well-known strategies for influencing social contexts: One is the selection of few influencers for manipulating their actions in order to drive the whole network to a certain behavior; the other, instead, drives the network behavior acting on the state of a large subset of ordinary, scarcely influencing users. The two approaches have been studied in terms of network and diffusion effects. The network effect is analyzed through the changes induced on network average degree and clustering coefficient, while the diffusion effect is based on two ad hoc metrics which are defined to measure the degree of knowledge diffusion and skill level, as well as the polarization of agent interests. The results, obtained through simulations on synthetic networks, show a rich dynamics and strong effects on the communication structure and on the distribution of knowledge and skills. These findings support our hypothesis that the strategic use of random information could represent a realistic approach to social network controllability, and that with both strategies, in principle, the control effect could be remarkable.

  9. Wavelet-based energy features for glaucomatous image classification.

    PubMed

    Dua, Sumeet; Acharya, U Rajendra; Chowriappa, Pradeep; Sree, S Vinitha

    2012-01-01

    Texture features within images are actively pursued for accurate and efficient glaucoma classification. Energy distribution over wavelet subbands is applied to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the Daubechies (db3), Symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures obtained using the 2-D discrete wavelet transform, and subject these signatures to different feature ranking and feature selection strategies. We gauged the effectiveness of the resultant ranked and selected subsets of features using support vector machine, sequential minimal optimization, random forest, and naïve Bayes classification strategies. We observed an accuracy of around 93% using tenfold cross-validation, demonstrating the effectiveness of these methods.
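
    For orientation, a minimal sketch of this kind of pipeline, assuming PyWavelets and scikit-learn are available; the single-level DWT, the normalized subband energy, the ANOVA-based ranking and the RBF-SVM are illustrative stand-ins for the paper's exact ranking and classification strategies, and `images`/`labels` are placeholder names.

```python
# Hypothetical sketch: wavelet-energy features, ranked and classified (not the authors' code).
import numpy as np
import pywt
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def wavelet_energy_features(image, wavelets=("db3", "sym3", "bior3.3", "bior3.5", "bior3.7")):
    """Energy of the detail subbands of a single-level 2-D DWT, per wavelet filter."""
    feats = []
    for w in wavelets:
        _, (cH, cV, cD) = pywt.dwt2(image, w)
        for band in (cH, cV, cD):
            feats.append(np.sum(band.astype(float) ** 2) / band.size)  # normalized energy
    return np.array(feats)

def evaluate(images, labels, k_best=10):
    """images: list of 2-D fundus arrays; labels: 0 = normal, 1 = glaucoma (placeholders)."""
    X = np.array([wavelet_energy_features(img) for img in images])
    clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k_best), SVC(kernel="rbf"))
    return cross_val_score(clf, X, labels, cv=10).mean()   # tenfold cross-validation accuracy
```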

  10. Retinal ganglion cells with distinct directional preferences differ in molecular identity, structure, and central projections.

    PubMed

    Kay, Jeremy N; De la Huerta, Irina; Kim, In-Jung; Zhang, Yifeng; Yamagata, Masahito; Chu, Monica W; Meister, Markus; Sanes, Joshua R

    2011-05-25

    The retina contains ganglion cells (RGCs) that respond selectively to objects moving in particular directions. Individual members of a group of ON-OFF direction-selective RGCs (ooDSGCs) detect stimuli moving in one of four directions: ventral, dorsal, nasal, or temporal. Despite this physiological diversity, little is known about subtype-specific differences in structure, molecular identity, and projections. To seek such differences, we characterized mouse transgenic lines that selectively mark ooDSGCs preferring ventral or nasal motion as well as a line that marks both ventral- and dorsal-preferring subsets. We then used the lines to identify cell surface molecules, including Cadherin 6, CollagenXXVα1, and Matrix metalloprotease 17, that are selectively expressed by distinct subsets of ooDSGCs. We also identify a neuropeptide, CART (cocaine- and amphetamine-regulated transcript), that distinguishes all ooDSGCs from other RGCs. Together, this panel of endogenous and transgenic markers distinguishes the four ooDSGC subsets. Patterns of molecular diversification occur before eye opening and are therefore experience independent. They may help to explain how the four subsets obtain distinct inputs. We also demonstrate differences among subsets in their dendritic patterns within the retina and their axonal projections to the brain. Differences in projections indicate that information about motion in different directions is sent to different destinations.

  11. Species undersampling in tropical bat surveys: effects on emerging biodiversity patterns.

    PubMed

    Meyer, Christoph F J; Aguiar, Ludmilla M S; Aguirre, Luis F; Baumgarten, Julio; Clarke, Frank M; Cosson, Jean-François; Estrada Villegas, Sergio; Fahr, Jakob; Faria, Deborah; Furey, Neil; Henry, Mickaël; Jenkins, Richard K B; Kunz, Thomas H; Cristina MacSwiney González, M; Moya, Isabel; Pons, Jean-Marc; Racey, Paul A; Rex, Katja; Sampaio, Erica M; Stoner, Kathryn E; Voigt, Christian C; von Staden, Dietrich; Weise, Christa D; Kalko, Elisabeth K V

    2015-01-01

    Undersampling is commonplace in biodiversity surveys of species-rich tropical assemblages in which rare taxa abound, with possible repercussions for our ability to implement surveys and monitoring programmes in a cost-effective way. We investigated the consequences of information loss due to species undersampling (missing subsets of species from the full species pool) in tropical bat surveys for the emerging patterns of species richness (SR) and compositional variation across sites. For 27 bat assemblage data sets from across the tropics, we used correlations between original data sets and subsets with different numbers of species deleted either at random, or according to their rarity in the assemblage, to assess to what extent patterns in SR and composition in data subsets are congruent with those in the initial data set. We then examined to what degree high sample representativeness (r ≥ 0·8) was influenced by biogeographic region, sampling method, sampling effort or structural assemblage characteristics. For SR, correlations between random subsets and original data sets were strong (r ≥ 0·8) with moderate (ca. 20%) species loss. Bias associated with information loss was greater for species composition; on average ca. 90% of species in random subsets had to be retained to adequately capture among-site variation. For nonrandom subsets, removing only the rarest species (on average c. 10% of the full data set) yielded strong correlations (r > 0·95) for both SR and composition. Eliminating greater proportions of rare species resulted in weaker correlations and large variation in the magnitude of observed correlations among data sets. Species subsets that comprised ca. 85% of the original set can be considered reliable surrogates, capable of adequately revealing patterns of SR and temporal or spatial turnover in many tropical bat assemblages. Our analyses thus demonstrate the potential as well as limitations for reducing survey effort and streamlining sampling protocols, and consequently for increasing the cost-effectiveness in tropical bat surveys or monitoring programmes. The dependence of the performance of species subsets on structural assemblage characteristics (total assemblage abundance, proportion of rare species), however, underscores the importance of adaptive monitoring schemes and of establishing surrogate performance on a site by site basis based on pilot surveys. © 2014 The Authors. Journal of Animal Ecology © 2014 British Ecological Society.

  12. A non-linear data mining parameter selection algorithm for continuous variables

    PubMed Central

    Razavi, Marianne; Brady, Sean

    2017-01-01

    In this article, we propose a new data mining algorithm by which one can both capture the non-linearity in the data and find the best subset model. To produce an enhanced subset of the original variables, a preferred selection method should have the potential of adding a supplementary level of regression analysis that captures complex relationships in the data via mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method presented here has the potential to produce an optimal subset of variables, rendering the overall process of model selection more efficient. The algorithm introduces interpretable parameters by transforming the original inputs and also provides a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least squares regression framework. This new automatic variable transformation and model selection method can offer an optimal and stable model that minimizes the mean square error and variability, while combining all-possible-subsets selection with the inclusion of variable transformations and interactions. Moreover, this method controls multicollinearity, leading to an optimal set of explanatory variables. PMID:29131829

  13. Two-stage atlas subset selection in multi-atlas based image segmentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu

    2015-06-15

    Purpose: Fast-growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, but also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs arising from large atlas collections of varied quality, so that high-accuracy segmentation can be achieved at low computational cost. Methods: An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. Results: The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates comparable end-to-end segmentation performance to the conventional single-stage selection method, but with a significant computation reduction. Compared with the alternative computation reduction method, the authors' scheme improves the mean and median Dice similarity coefficient values from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. Conclusions: The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.

  14. Minimally buffered data transfers between nodes in a data communications network

    DOEpatents

    Miller, Douglas R.

    2015-06-23

    Methods, apparatus, and products for minimally buffered data transfers between nodes in a data communications network are disclosed that include: receiving, by a messaging module on an origin node, a storage identifier, an origin data type, and a target data type, the storage identifier specifying application storage containing data, the origin data type describing a data subset contained in the origin application storage, the target data type describing an arrangement of the data subset in application storage on a target node; creating, by the messaging module, origin metadata describing the origin data type; selecting, by the messaging module from the origin application storage in dependence upon the origin metadata and the storage identifier, the data subset; and transmitting, by the messaging module to the target node, the selected data subset for storing in the target application storage in dependence upon the target data type without temporarily buffering the data subset.

  15. Development and Standardization of a Test for Pragmatic Language Skills in Egyptian Arabic: The Egyptian Arabic Pragmatic Language Test (EAPLT).

    PubMed

    Khodeir, Mona S; Hegazi, Mona A; Saleh, Marwa M

    2018-03-19

    The aim of this study was to standardize an Egyptian Arabic Pragmatic Language Test (EAPLT) using linguistically and socially suitable questions and pictures, in order to be able to address specific deficits in this language domain. Questions and pictures were designed for the EAPLT to assess 3 pragmatic language subsets: pragmatic skills, functions, and factors. Ten expert phoniatricians were asked to review the EAPLT and complete a questionnaire to assess the validity of the test items. The EAPLT was applied to 120 typically developing Arabic-speaking Egyptian children (64 females and 56 males), randomly selected according to inclusion and exclusion criteria, in the age range between 2 years, 1 month, 1 day and 9 years, 12 months, 31 days. Children's scores were used to calculate the means and standard deviations and the 5th and 95th percentiles to determine the age of pragmatic skills acquisition. All experts largely agreed that the EAPLT gives a general idea about children's pragmatic language development. Test-retest reliability analysis proved the high reliability and internal consistency of the EAPLT subsets. A statistically significant correlation was found between the test subsets and age. The EAPLT is a valid and reliable Egyptian Arabic test that can be applied in order to detect a pragmatic language delay. © 2018 S. Karger AG, Basel.

  16. A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.

    PubMed

    Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong

    2017-01-01

    To address the challenging problem of selecting distinguishing genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm first selects distinguishing genes using the K-S test and then uses CFS to select genes from those retained by the K-S test. We adopted support vector machines (SVM) as the classification tool and used accuracy as the criterion to evaluate the performance of the classifiers on the selected gene subsets. We compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevance (mRMR), and ReliefF algorithms. The average experimental results of these gene selection algorithms on 5 gene expression datasets demonstrate that, in terms of accuracy, the new K-S and CFS-based algorithm performs better than the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
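
    A compact sketch of the two-step idea, assuming SciPy and NumPy; the p-value cutoff, the Pearson-correlation-based CFS merit and the greedy forward search are illustrative choices rather than the authors' exact settings.

```python
# Illustrative two-step gene selection: K-S filtering followed by a greedy CFS-style search.
import numpy as np
from scipy.stats import ks_2samp

def ks_filter(X, y, alpha=0.01):
    """Keep genes whose expression distributions differ between the two classes (K-S test)."""
    return [j for j in range(X.shape[1])
            if ks_2samp(X[y == 0, j], X[y == 1, j]).pvalue < alpha]

def cfs_merit(X, y, subset):
    """CFS merit: average feature-class correlation over average feature-feature correlation."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def ks_cfs_select(X, y, alpha=0.01, max_genes=50):
    candidates = ks_filter(X, y, alpha)
    selected, best = [], -np.inf
    while candidates and len(selected) < max_genes:
        score, j = max((cfs_merit(X, y, selected + [c]), c) for c in candidates)
        if score <= best:            # stop when the merit no longer improves
            break
        best, selected = score, selected + [j]
        candidates.remove(j)
    return selected                  # indices of the selected genes
```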

  17. Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes

    NASA Astrophysics Data System (ADS)

    Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie

    2017-03-01

    Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) according to histologic subtype has been well documented over the past decade. In parallel, quantitative image biomarkers have recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADC and 110 SqCC patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers: Support Vector Machines with a Radial Basis Function kernel (RBF-SVM), Random Forest (RF), K-nearest neighbor (KNN), and RUSBoost. To improve the classifiers' performance, an optimal feature subset is selected from the original feature set using an iterative forward-inclusion and backward-elimination algorithm. Extensive experimental results demonstrate that the radiomics features achieve encouraging classification results on both the complete feature set (AUC = 0.89) and the optimal feature subset (AUC = 0.91).
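
    A hedged sketch of the feature-subset search wrapped around one of the classifiers named above (RBF-SVM), using scikit-learn's SequentialFeatureSelector as a stand-in for the authors' iterative forward-inclusion/backward-elimination procedure; `X`, `y` and the subset size are placeholders.

```python
# Sketch of forward feature-subset selection wrapped around an RBF-SVM, scored by AUC.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def select_and_score(X, y, n_keep=15):
    """X: (n_nodules, 52) radiomic features, y: 0 = ADC, 1 = SqCC (placeholder names)."""
    base = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    sfs = SequentialFeatureSelector(base, n_features_to_select=n_keep,
                                    direction="forward", scoring="roc_auc", cv=5)
    sfs.fit(X, y)
    X_sub = X[:, sfs.get_support()]                     # reduced feature matrix
    auc = cross_val_score(base, X_sub, y, scoring="roc_auc", cv=5).mean()
    return sfs.get_support(indices=True), auc
```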

  18. Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48

    NASA Astrophysics Data System (ADS)

    Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia

    2017-11-01

    The web proxy cache technique reduces response time by storing copies of pages between the client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Due to the limited size and excessive cost of cache compared to other storage, a cache replacement algorithm is used to determine which page to evict when the cache is full. However, conventional replacement algorithms such as Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), and Randomized Policy may discard important pages just before they are used. Furthermore, a conventional algorithm cannot be well optimized, since it requires some decision mechanism to intelligently evict a page before replacement. Hence, most researchers propose an integration between intelligent classifiers and the replacement algorithm to improve replacement performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence the classifiers' prediction accuracy. The results show that using wrapper feature selection methods, namely Best First (BFS), Incremental Wrapper Subset Selection (IWSS) embedded NB, and particle swarm optimization (PSO), reduces the number of features and has a good impact on reducing computation time. Using PSO enhances NB classifier accuracy by 1.1%, 0.43%, and 0.22% over using NB with all features, using BFS, and using IWSS-embedded NB, respectively. PSO raises J48 accuracy by 0.03%, 1.91%, and 0.04% over using the J48 classifier with all features, using IWSS-embedded NB, and using BFS, respectively. Meanwhile, IWSS-embedded NB speeds up the NB and J48 classifiers much more than BFS and PSO; it reduces the computation time of NB by 0.1383 and that of J48 by 2.998.

  19. The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.

    PubMed

    Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo

    2018-05-17

    Gene expression profiles are characterized by high dimensionality, small sample sizes, and continuous values, and it is a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are applied to the microarray data in order to obtain several pre-selected feature subsets with different classification abilities. The top N genes with the highest rank in each subset are integrated so as to form a new data set. Secondly, a cross-entropy algorithm is used to remove the redundant data in this data set. Finally, a wrapper method based on forward feature selection is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve higher classification accuracy with fewer characteristic genes.
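
    A schematic of the multi-filter ensemble idea, assuming scikit-learn; the three filter scores and the correlation threshold used in place of the cross-entropy redundancy step are stand-ins, not the CEMFE implementation.

```python
# Schematic multi-filter ensemble: union of top-N genes from several filters,
# followed by a crude redundancy-removal step (stand-in for the cross-entropy stage).
import numpy as np
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

def multi_filter_union(X, y, top_n=50):
    filters = (f_classif(X, y)[0],              # ANOVA F-score
               mutual_info_classif(X, y),       # mutual information
               chi2(np.abs(X), y)[0])           # chi-square (needs non-negative input)
    union = set()
    for scores in filters:
        union |= set(np.argsort(scores)[::-1][:top_n])   # top-N genes of each filter
    return sorted(union)

def drop_redundant(X, genes, max_corr=0.9):
    """Keep a gene only if it is not highly correlated with any gene already kept."""
    kept = []
    for g in genes:
        if all(abs(np.corrcoef(X[:, g], X[:, k])[0, 1]) < max_corr for k in kept):
            kept.append(g)
    return kept
```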

  20. Randomized Phase II, Double-Blind, Placebo-Controlled Study of Exemestane With or Without Entinostat in Postmenopausal Women With Locally Recurrent or Metastatic Estrogen Receptor-Positive Breast Cancer Progressing on Treatment With a Nonsteroidal Aromatase Inhibitor

    PubMed Central

    Yardley, Denise A.; Ismail-Khan, Roohi R.; Melichar, Bohuslav; Lichinitser, Mikhail; Munster, Pamela N.; Klein, Pamela M.; Cruickshank, Scott; Miller, Kathy D.; Lee, Min J.; Trepel, Jane B

    2013-01-01

    Purpose Entinostat is an oral isoform selective histone deacetylase inhibitor that targets resistance to hormonal therapies in estrogen receptor–positive (ER+) breast cancer. This randomized, placebo-controlled, phase II study evaluated entinostat combined with the aromatase inhibitor exemestane versus exemestane alone. Patients and Methods Postmenopausal women with ER+ advanced breast cancer progressing on a nonsteroidal aromatase inhibitor were randomly assigned to exemestane 25 mg daily plus entinostat 5 mg once per week (EE) or exemestane plus placebo (EP). The primary end point was progression-free survival (PFS). Blood was collected in a subset of patients for evaluation of protein lysine acetylation as a biomarker of entinostat activity. Results One hundred thirty patients were randomly assigned (EE group, n = 64; EP group, n = 66). Based on intent-to-treat analysis, treatment with EE improved median PFS to 4.3 months versus 2.3 months with EP (hazard ratio [HR], 0.73; 95% CI, 0.50 to 1.07; one-sided P = .055; two-sided P = .11 [predefined significance level of .10, one-sided]). Median overall survival was an exploratory end point and improved to 28.1 months with EE versus 19.8 months with EP (HR, 0.59; 95% CI, 0.36 to 0.97; P = .036). Fatigue and neutropenia were the most frequent grade 3/4 toxicities. Treatment discontinuation because of adverse events was higher in the EE group versus the EP group (11% v 2%). Protein lysine hyperacetylation in the EE biomarker subset was associated with prolonged PFS. Conclusion Entinostat added to exemestane is generally well tolerated and demonstrated activity in patients with ER+ advanced breast cancer in this signal-finding phase II study. Acetylation changes may provide an opportunity to maximize clinical benefit with entinostat. Plans for a confirmatory study are underway. PMID:23650416

  1. On the reliable and flexible solution of practical subset regression problems

    NASA Technical Reports Server (NTRS)

    Verhaegen, M. H.

    1987-01-01

    A new algorithm for solving subset regression problems is described. The algorithm performs a QR decomposition with a new column-pivoting strategy, which permits subset selection directly from the originally defined regression parameters. This, in combination with a number of extensions of the new technique, makes the method a very flexible tool for analyzing subset regression problems in which the parameters have a physical meaning.
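
    The column-pivoted QR at the heart of such approaches can be sketched with SciPy; note that this uses the standard pivoting strategy rather than the paper's new one, and the pivot order simply serves as a ranking of the regressors.

```python
# Column-pivoted QR as a subset-regression device (standard pivoting, for illustration only).
from scipy.linalg import qr, solve_triangular

def qr_subset_regression(A, b, k):
    """Rank regressors by QR column pivoting and fit the k best-conditioned ones."""
    Q, R, piv = qr(A, mode="economic", pivoting=True)     # A[:, piv] = Q @ R
    subset = piv[:k]                                       # first k pivoted columns
    coeffs = solve_triangular(R[:k, :k], Q[:, :k].T @ b)   # least squares on that subset
    return subset, coeffs
```

    Because A[:, piv[:k]] = Q[:, :k] R[:k, :k], the triangular solve above is exactly the least-squares fit restricted to the selected regressors.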

  2. Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

    PubMed

    Bennett, Bradley C; Husby, Chad E

    2008-03-28

    Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet the requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R² from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
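
    The per-family binomial test can be sketched as follows, assuming SciPy; the family counts and the flora-wide medicinal fraction in the usage line are placeholders.

```python
# Per-family binomial test of over-/under-representation of medicinal species.
from scipy.stats import binomtest

def family_departures(families, overall_rate, alpha=0.05):
    """families: dict mapping family name -> (medicinal species, total species)."""
    pvalues = {name: binomtest(medicinal, total, overall_rate, alternative="two-sided").pvalue
               for name, (medicinal, total) in families.items()}
    return {name: p for name, p in pvalues.items() if p < alpha}   # significant departures

# e.g. family_departures({"Asteraceae": (40, 120), "Poaceae": (5, 90)}, overall_rate=0.25)
```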

  3. Analysis of Information Content in High-Spectral Resolution Sounders using Subset Selection Analysis

    NASA Technical Reports Server (NTRS)

    Velez-Reyes, Miguel; Joiner, Joanna

    1998-01-01

    In this paper, we summarize the results of the sensitivity analysis and data reduction carried out to determine the information content of AIRS and IASI channels. The analysis and data reduction were based on subset selection techniques developed in the linear algebra and statistics communities to study linear dependencies in high-dimensional data sets. We applied the subset selection method to study dependencies among channels by studying the dependencies among their weighting functions. We also applied the technique to study the information provided by the different levels into which the atmosphere is discretized for retrievals and analysis. Results from the method correlate well with intuition in many respects and point to possible modifications of band selection in sensor design and of the number and location of levels in the analysis process.

  4. Random Partition Distribution Indexed by Pairwise Information

    PubMed Central

    Dahl, David B.; Day, Ryan; Tsai, Jerry W.

    2017-01-01

    We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318

  5. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study.

    PubMed

    Olivera, André Rodrigues; Roesler, Valter; Iochpe, Cirano; Schmidt, Maria Inês; Vigo, Álvaro; Barreto, Sandhi Maria; Duncan, Bruce Bartholow

    2017-01-01

    Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. The best models were created using artificial neural networks and logistic regression; these achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. Most of the predictive models produced similar results and demonstrated the feasibility of identifying the individuals with the highest probability of having undiagnosed diabetes from easily obtained clinical data.
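
    The algorithm-comparison step can be sketched with scikit-learn; the learners, the AUC metric and the repeated tenfold scheme follow the abstract, while the hyperparameters and the candidate-variable matrix `X` are placeholders (the forward-selection wrapper is omitted here).

```python
# Comparing candidate learners with repeated tenfold cross-validated AUC.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

def compare_learners(X, y):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
    learners = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "artificial neural network": MLPClassifier(max_iter=1000),
        "naive Bayes": GaussianNB(),
        "k-nearest neighbor": KNeighborsClassifier(),
        "random forest": RandomForestClassifier(),
    }
    return {name: cross_val_score(model, X, y, scoring="roc_auc", cv=cv).mean()
            for name, model in learners.items()}
```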

  6. Spatial Downscaling of Alien Species Presences using Machine Learning

    NASA Astrophysics Data System (ADS)

    Daliakopoulos, Ioannis N.; Katsanevakis, Stelios; Moustakas, Aristides

    2017-07-01

    Large-scale, high-resolution data on alien species distributions are essential for spatially explicit assessments of their environmental and socio-economic impacts, and for management interventions for mitigation. However, these data are often unavailable. This paper presents a method that relies on Random Forest (RF) models to distribute alien species presence counts over a finer-resolution grid, thus achieving spatial downscaling. A sufficiently large number of RF models are trained using random subsets of the dataset as predictors, in a bootstrapping approach that accounts for the uncertainty introduced by the subset selection. The method is tested with an approximately 8×8 km² grid containing floral alien species presences and several climatic, habitat and land-use covariates for the Mediterranean island of Crete, Greece. Alien species presence is aggregated at 16×16 km² and used as a predictor of presence at the original resolution, thus simulating spatial downscaling. Potential explanatory variables included habitat types, land cover richness, endemic species richness, soil type, temperature, precipitation, and freshwater availability. Uncertainty assessment of the spatial downscaling of alien species occurrences was also performed, and true/false presences and absences were quantified. The approach is promising for downscaling alien species datasets of larger spatial scale but coarse resolution, where the underlying environmental information is available at a finer resolution than the alien species data. Furthermore, the RF architecture allows for tuning towards operationally optimal sensitivity and specificity, thus providing a decision support tool for designing a resource-efficient alien species census.
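
    A condensed sketch of the bootstrapped Random Forest ensemble, assuming scikit-learn; grid handling and covariate preparation are omitted, and the array names, subset fraction and ensemble size are placeholders.

```python
# Bootstrapped ensemble of Random Forests: each model sees a random subset of the data,
# and the spread of predictions gives a simple uncertainty estimate for the downscaled counts.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def bootstrap_rf_ensemble(X_coarse, y_coarse, X_fine, n_models=100, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.choice(len(X_coarse), size=int(frac * len(X_coarse)), replace=True)
        rf = RandomForestRegressor(n_estimators=200, random_state=int(rng.integers(1_000_000)))
        rf.fit(X_coarse[idx], y_coarse[idx])
        preds.append(rf.predict(X_fine))            # presence counts on the fine grid
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)    # downscaled estimate and its uncertainty
```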

  7. Assessing the Influence of Precipitation Variability on the Vegetation Dynamics of the Mediterranean Rangelands using NDVI and Machine Learning

    NASA Astrophysics Data System (ADS)

    Daliakopoulos, Ioannis; Tsanis, Ioannis

    2017-04-01

    Mitigating the vulnerability of Mediterranean rangelands against degradation is limited by our ability to understand and accurately characterize those impacts in space and time. The Normalized Difference Vegetation Index (NDVI) is a radiometric measure of the photosynthetically active radiation absorbed by green vegetation canopy chlorophyll and is therefore a good surrogate measure of vegetation dynamics. On the other hand, meteorological indices such as the drought-assessing Standardised Precipitation Index (SPI) can be easily estimated from historical and projected datasets at the global scale. This work investigates the potential of driving Random Forest (RF) models with meteorological indices to approximate NDVI-based vegetation dynamics. A sufficiently large number of RF models are trained using random subsets of the dataset as predictors, in a bootstrapping approach that accounts for the uncertainty introduced by the subset selection. The updated E-OBS-v13.1 dataset of the ENSEMBLES EU FP6 program provides observed monthly meteorological input to estimate SPI over the Mediterranean rangelands. RF models are trained to depict vegetation dynamics using the latest version (3g.v1) of the third-generation GIMMS NDVI generated from NOAA's Advanced Very High Resolution Radiometer (AVHRR) sensors. The analysis is conducted for the period 1981-2015 at a gridded spatial resolution of 25 km. Preliminary results demonstrate the potential of machine learning algorithms to effectively mimic the underlying physical relationship between drought and Earth Observation vegetation indices and to provide estimates based on precipitation variability.

  8. Algorithm For Solution Of Subset-Regression Problems

    NASA Technical Reports Server (NTRS)

    Verhaegen, Michel

    1991-01-01

    A reliable and flexible algorithm for the solution of the subset-regression problem performs QR decomposition with a new column-pivoting strategy, enabling selection of the subset directly from the originally defined regression parameters. This feature, in combination with a number of extensions, makes the algorithm very flexible for use in the analysis of subset-regression problems in which the parameters have physical meanings. The algorithm is also extended to enable joint processing of columns contaminated by noise with those free of noise, without using scaling techniques.

  9. Lectin Ulex europaeus agglutinin I specifically labels a subset of primary afferent fibers which project selectively to the superficial dorsal horn of the spinal cord.

    PubMed

    Mori, K

    1986-02-19

    To examine differential carbohydrate expression among different subsets of primary afferent fibers, several fluorescein-isothiocyanate-conjugated lectins were used in a histochemical study of the dorsal root ganglion (DRG) and spinal cord of the rabbit. The lectin Ulex europaeus agglutinin I specifically labeled a subset of DRG cells and primary afferent fibers which projected to the superficial laminae of the dorsal horn. These results suggest that specific carbohydrates containing an L-fucosyl residue are expressed selectively in small-diameter primary afferent fibers which subserve nociception or thermoception.

  10. Image Correlation Pattern Optimization for Micro-Scale In-Situ Strain Measurements

    NASA Technical Reports Server (NTRS)

    Bomarito, G. F.; Hochhalter, J. D.; Cannon, A. H.

    2016-01-01

    The accuracy and precision of digital image correlation (DIC) is a function of three primary ingredients: image acquisition, image analysis, and the subject of the image. Development of the first two (i.e., image acquisition techniques and image correlation algorithms) has led to widespread use of DIC; however, fewer developments have been focused on the third ingredient. Typically, subjects of DIC images are mechanical specimens with either a natural surface pattern or a pattern applied to the surface. Research in the area of DIC patterns has primarily been aimed at identifying which surface patterns are best suited for DIC, by comparing patterns to each other. Because the easiest and most widespread methods of applying patterns have a high degree of randomness associated with them (e.g., airbrush, spray paint, particle decoration, etc.), less effort has been spent on the exact construction of ideal patterns. With the development of patterning techniques such as microstamping and lithography, patterns can be applied to a specimen pixel by pixel from a patterned image. In these cases, especially because the patterns are reused many times, an optimal pattern is sought such that the error introduced into DIC from the pattern is minimized. DIC consists of tracking the motion of an array of nodes from a reference image to a deformed image. Every pixel in the images has an associated intensity (grayscale) value, with discretization depending on the bit depth of the image. Because matching individual pixels by intensity value yields a non-unique, scale-dependent problem, subsets around each node are used for identification. A correlation criterion is used to find the best match of a particular subset of a reference image within a deformed image; the reader is referred to the references for enumerations of typical correlation criteria. As illustrated by Schreier and Sutton and by Lu and Cary, systematic errors can be introduced by representing the underlying deformation with under-matched shape functions. An important implication, as discussed by Sutton et al., is that in the presence of highly localized deformations (e.g., crack fronts), error can be reduced by minimizing the subset size. In other words, smaller subsets allow the more accurate resolution of localized deformations. Conversely, the choice of optimal subset size has been widely studied, and the general consensus is that larger subsets with more information content are less prone to random error. Thus, an optimal subset size balances the systematic error from under-matched deformations with the random error from measurement noise. The alternative approach pursued in the current work is to choose a small subset size and optimize the information content within it (i.e., to optimize an applied DIC pattern), rather than finding an optimal subset size. In the literature, many pattern quality metrics have been proposed, e.g., sum of square intensity gradient (SSSIG), mean subset fluctuation, gray level co-occurrence, autocorrelation-based metrics, and speckle-based metrics. The majority of these metrics were developed to quantify the quality of common pseudo-random patterns after they have been applied, and were not created with the intent of pattern generation. As such, it is found that none of the metrics examined in this study is fit to be the objective function of a pattern generation optimization. In some cases, such as with speckle-based metrics, application to pixel-by-pixel patterns is ill-conditioned and requires somewhat arbitrary extensions. In other cases, such as with the SSSIG, it is shown that trivial solutions exist for the optimum of the metric which are ill-suited for DIC (such as a checkerboard pattern). In the current work, a multi-metric optimization method is proposed whereby quality is viewed as a combination of individual quality metrics. Specifically, SSSIG and two autocorrelation metrics are used which have generally competing objectives. Thus, each metric can be viewed as a constraint imposed upon the others, thereby precluding the achievement of their trivial solutions. In this way, the optimization produces a pattern which balances the benefits of multiple quality metrics. The resulting pattern, along with randomly generated patterns, is subjected to numerical deformations and analyzed with DIC software. The optimal pattern is shown to outperform the randomly generated patterns.
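
    The SSSIG metric referred to above can be written down compactly; this is the standard definition from the DIC literature, computed with simple finite differences over a square subset, and is not the authors' optimizer.

```python
# Sum of squared subset intensity gradients (SSSIG), a standard DIC pattern-quality metric,
# evaluated over a square subset centered at (x0, y0) using central finite differences.
import numpy as np

def sssig(pattern, x0, y0, subset_size):
    half = subset_size // 2
    sub = pattern[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1].astype(float)
    gy, gx = np.gradient(sub)                 # intensity gradients inside the subset
    return np.sum(gx ** 2) + np.sum(gy ** 2)  # larger values -> lower random error

# A trivial maximizer of SSSIG alone is a checkerboard, which is why the abstract combines it
# with autocorrelation-based metrics in a multi-metric objective.
```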

  11. A reference dataset for deformable image registration spatial accuracy evaluation using the COPDgene study archive

    NASA Astrophysics Data System (ADS)

    Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas

    2013-05-01

    Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. 7298 anatomic landmark features were manually paired between the 10 sets of images. Quantity of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged in average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm for each case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.

  12. Approximate error conjugation gradient minimization methods

    DOEpatents

    Kallman, Jeffrey S

    2013-05-21

    In one embodiment, a method includes selecting a subset of rays from a set of all rays to use in an error calculation for a constrained conjugate gradient minimization problem, calculating an approximate error using the subset of rays, and calculating a minimum in a conjugate gradient direction based on the approximate error. In another embodiment, a system includes a processor for executing logic, logic for selecting a subset of rays from a set of all rays to use in an error calculation for a constrained conjugate gradient minimization problem, logic for calculating an approximate error using the subset of rays, and logic for calculating a minimum in a conjugate gradient direction based on the approximate error. In other embodiments, computer program products, methods, and systems are described capable of using approximate error in constrained conjugate gradient minimization problems.

  13. Estimation of the lower and upper bounds on the probability of failure using subset simulation and random set theory

    NASA Astrophysics Data System (ADS)

    Alvarez, Diego A.; Uribe, Felipe; Hurtado, Jorge E.

    2018-02-01

    Random set theory is a general framework which comprises uncertainty in the form of probability boxes, possibility distributions, cumulative distribution functions, Dempster-Shafer structures or intervals; in addition, the dependence between the input variables can be expressed using copulas. In this paper, the lower and upper bounds on the probability of failure are calculated by means of random set theory. In order to accelerate the calculation, a well-known and efficient probability-based reliability method known as subset simulation is employed. This method is especially useful for finding small failure probabilities in both low- and high-dimensional spaces, disjoint failure domains and nonlinear limit state functions. The proposed methodology represents a drastic reduction of the computational labor implied by plain Monte Carlo simulation for problems defined with a mixture of representations for the input variables, while delivering similar results. Numerical examples illustrate the efficiency of the proposed approach.

  14. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases, version 10 (ICD-10)] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall and F-measure, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures (approaching 85% to 90% for most metrics) with a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and the four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-coded cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
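
    The unigram-plus-subset pipeline can be sketched with scikit-learn; here a chi-square ranking stands in for the expert-driven feature selection, while the unigram features, the subset size of 30 and the random forest follow the abstract.

```python
# Unigram text classification of autopsy reports with a 30-feature subset and a random forest.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def build_cause_of_death_model(reports, labels):
    """reports: list of plaintext autopsy reports; labels: ICD-10 cause-of-death classes."""
    model = make_pipeline(
        CountVectorizer(lowercase=True, stop_words="english"),  # unigram master feature vector
        SelectKBest(chi2, k=30),                                 # feature subset of size 30
        RandomForestClassifier(n_estimators=300, random_state=0),
    )
    accuracy = cross_val_score(model, reports, labels, cv=10).mean()
    return model.fit(reports, labels), accuracy
```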

  15. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    PubMed Central

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases, version 10 (ICD-10)] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall and F-measure, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures (approaching 85% to 90% for most metrics) with a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and the four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-coded cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine the underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports. PMID:28166263

  16. Choice: 36 band feature selection software with applications to multispectral pattern recognition

    NASA Technical Reports Server (NTRS)

    Jones, W. C.

    1973-01-01

    Feature selection software was developed at the Earth Resources Laboratory that is capable of accepting up to 36 channels as input and selecting channel subsets according to several criteria based on divergence. One of the criteria used is compatible with the table look-up classifier requirements. The software indicates which channel subset best separates (based on average divergence) each class from all other classes. The software employs an exhaustive search technique, and computer time is not prohibitive. A typical task to select the best 4 of 22 channels for 12 classes takes 9 minutes on a Univac 1108 computer.
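
    The exhaustive best-4-of-N search can be sketched as follows; the symmetric Kullback-Leibler divergence between per-class Gaussians with diagonal covariance is a common simplification used here for illustration, not necessarily the Laboratory's exact divergence criterion.

```python
# Exhaustive search for the channel subset maximizing the average pairwise class divergence.
# Divergence: symmetric KL between diagonal-covariance Gaussians fitted to each class.
from itertools import combinations
import numpy as np

def class_divergence(mu1, var1, mu2, var2):
    return 0.5 * np.sum(var1 / var2 + var2 / var1 - 2.0
                        + (mu1 - mu2) ** 2 * (1.0 / var1 + 1.0 / var2))

def best_channel_subset(X, y, subset_size=4):
    classes = np.unique(y)
    stats = {c: (X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-9) for c in classes}
    best_score, best_subset = -np.inf, None
    for subset in combinations(range(X.shape[1]), subset_size):
        idx = list(subset)
        divs = [class_divergence(stats[a][0][idx], stats[a][1][idx],
                                 stats[b][0][idx], stats[b][1][idx])
                for a, b in combinations(classes, 2)]
        score = np.mean(divs)                     # average divergence over class pairs
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
```

    For 4 of 22 channels this loop visits only 7315 subsets, which is why an exhaustive search remains affordable.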

  17. Hints of correlation between broad-line and radio variations for 3C 120

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, H. T.; Bai, J. M.; Li, S. K.

    2014-01-01

    In this paper, we investigate the correlation between broad-line and radio variations for the broad-line radio galaxy 3C 120. Using the z-transformed discrete correlation function method and the model-independent flux randomization/random subset selection (FR/RSS) Monte Carlo method, we find that broad Hβ line variations lead the 15 GHz variations. The FR/RSS method shows that the Hβ line variations lead the radio variations by τ_ob = 0.34 ± 0.01 yr. This time lag can be used to locate the position of the emitting region of radio outbursts in the jet, on the order of ∼5 lt-yr from the central engine. This distance is much larger than the size of the broad-line region. The large separation of the radio outburst emitting region from the broad-line region will observably influence the gamma-ray emission in 3C 120.
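
    The FR/RSS idea can be sketched as follows, assuming NumPy; the interpolated cross-correlation and the lag grid are simplified stand-ins for the z-transformed discrete correlation function, and the light-curve arrays are placeholders.

```python
# FR/RSS Monte Carlo: perturb fluxes by their errors, bootstrap the epochs, and collect
# the lag of the cross-correlation peak to build a distribution of time lags.
import numpy as np

def ccf_peak_lag(t1, f1, t2, f2, lags):
    r = [np.corrcoef(f1, np.interp(t1 + lag, t2, f2))[0, 1] for lag in lags]
    return lags[int(np.argmax(r))]

def fr_rss_lags(t1, f1, e1, t2, f2, e2, lags, n_mc=2000, seed=0):
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_mc):
        i1 = np.unique(rng.integers(0, len(t1), len(t1)))    # random subset selection
        i2 = np.unique(rng.integers(0, len(t2), len(t2)))
        g1 = f1[i1] + rng.standard_normal(len(i1)) * e1[i1]  # flux randomization
        g2 = f2[i2] + rng.standard_normal(len(i2)) * e2[i2]
        out.append(ccf_peak_lag(t1[i1], g1, t2[i2], g2, lags))
    out = np.array(out)
    return np.median(out), np.percentile(out, [16, 84])      # lag estimate and 1σ interval

# e.g. lags = np.arange(-2.0, 2.0, 0.01) in years, with (t, flux, error) arrays for Hβ and 15 GHz
```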

  18. High-Dose Chemotherapy With Autologous Hematopoietic Stem-Cell Transplantation in Metastatic Breast Cancer: Overview of Six Randomized Trials

    PubMed Central

    Berry, Donald A.; Ueno, Naoto T.; Johnson, Marcella M.; Lei, Xiudong; Caputo, Jean; Smith, Dori A.; Yancey, Linda J.; Crump, Michael; Stadtmauer, Edward A.; Biron, Pierre; Crown, John P.; Schmid, Peter; Lotz, Jean-Pierre; Rosti, Giovanni; Bregni, Marco; Demirer, Taner

    2011-01-01

    Purpose High doses of effective chemotherapy are compelling if they can be delivered safely. Substantial interest in supporting high-dose chemotherapy with bone marrow or autologous hematopoietic stem-cell transplantation in the 1980s and 1990s led to the initiation of randomized trials to evaluate its effect in the treatment of metastatic breast cancer. Methods We identified six randomized trials in metastatic breast cancer that evaluated high doses of chemotherapy with transplant support versus a control regimen without stem-cell support. We assembled a single database containing individual patient information from these trials. The primary analysis of overall survival was a log-rank test comparing high dose versus control. We also used Cox proportional hazards regression, adjusting for known covariates. We addressed potential treatment differences within subsets of patients. Results The effect of high-dose chemotherapy on overall survival was not statistically different (median, 2.16 v 2.02 years; P = .08). A statistically significant advantage in progression-free survival (median, 0.91 v 0.69 years) did not translate into survival benefit. Subset analyses found little evidence that there are groups of patients who might benefit from high-dose chemotherapy with hematopoietic support. Conclusion Overall survival of patients with metastatic breast cancer in the six randomized trials was not significantly improved by high-dose chemotherapy; any benefit from high doses was small. No identifiable subset of patients seems to benefit from high-dose chemotherapy. PMID:21768454

  19. A data driven partial ambiguity resolution: Two step success rate criterion, and its simulation demonstration

    NASA Astrophysics Data System (ADS)

    Hou, Yanqing; Verhagen, Sandra; Wu, Jie

    2016-12-01

    Ambiguity Resolution (AR) is a key technique in GNSS precise positioning. In the case of weak models (i.e., low-precision data), however, the success rate of AR may be low, which may consequently introduce large errors into the baseline solution in cases of wrong fixing. Partial Ambiguity Resolution (PAR) has therefore been proposed so that the baseline precision can be improved by fixing only a subset of ambiguities with a high success rate. This contribution proposes a new PAR strategy that selects the subset such that the expected precision gain is maximized among a set of pre-selected subsets, while at the same time the failure rate is controlled. These pre-selected subsets are supposed to attain the highest success rate among those of the same subset size. The strategy is called the Two-step Success Rate Criterion (TSRC) because it first tries to fix a relatively large subset, with the fixed failure rate ratio test (FFRT) deciding on acceptance or rejection; in case of rejection, a smaller subset is fixed and validated by the ratio test so as to fulfill the overall failure rate criterion. It is shown how the method can be used in practice without introducing a large additional computational effort and, more importantly, how it can improve (or at least not deteriorate) the availability in terms of baseline precision compared to the classical Success Rate Criterion (SRC) PAR strategy, based on a simulation validation. In the simulation validation, significant improvements are obtained for single-GNSS on short baselines with dual-frequency observations. For dual-constellation GNSS, the improvement for single-frequency observations on short baselines is very significant, on average 68%. For medium to long baselines with dual-constellation GNSS, the average improvement is around 20-30%.

  20. Investigation of the FK5 system in the equatorial zone. Application to the instrumental system of the Second Quito astrolabe catalogue.

    NASA Astrophysics Data System (ADS)

    Kolesnik, Y. B.

    1995-12-01

    15 catalogues produced in the 1980s and 12 catalogues made from 1960 to 1978 have been used to assess the consistency of the FK5 system with observations in the declination zone from -30deg to 30deg. Classical δ-dependent and α-dependent systematic differences (Cat-FK5) have been formed for the individual instrumental systems of the catalogues. Weighted mean instrumental systems for two subsets of catalogues, centred at the epochs 1970 and 1987, have been constructed. The external systematic and random accuracy of the catalogues under analysis, and the errors of the mean instrumental systems for both selections of catalogues, have been estimated and presented in tables. The individual systematic differences of the catalogues and the mean instrumental systems are shown in figures. Numerical values of the total systematic deviations for both mean instrumental systems are given in tables. The results of the intercomparison are discussed to assess the actual systematic deviations of the FK5 at the respective epochs and its actual random accuracy. It has been found that the mutual consistency of the individual instrumental systems of the catalogues of the 1980s with respect to zonal systematic differences, in both right ascension and declination, is significantly better than that of the earlier catalogues. The two catalogue subsets are comparable in their consistency with respect to α-dependent systematic differences. It is shown that the claimed random errors of the FK5 positions and proper motions are rather realistic, while the deviations of the FK5 right ascension and declination system in the equatorial zone at both mean epochs exceed those expected from formal considerations. A rapid degradation of the FK5 system with time is detected in right ascension. The results in declination are recognized to be less reliable, owing to the larger inconsistency of the individual instrumental systems. The system of the Second Quito Astrolabe Catalogue (QAC 2) has been investigated by comparison with the two subsets of catalogues. It shows rather good consistency with both mean instrumental systems. Some conspicuous local deviations are outlined and discussed. We conclude that the QAC 2 might successfully be used in the compilation of the future second general catalogue of astrolabes as a link between northern and southern astrolabe catalogues.

  1. A Parameter Subset Selection Algorithm for Mixed-Effects Models

    DOE PAGES

    Schmidt, Kathleen L.; Smith, Ralph C.

    2016-01-01

    Mixed-effects models are commonly used to statistically model phenomena that include attributes associated with a population or a general underlying mechanism as well as effects specific to individuals or components of the general mechanism. This can include individual effects associated with data from multiple experiments. However, the parameterizations used to incorporate the population and individual effects are often unidentifiable in the sense that the parameters are not uniquely specified by the data. As a result, the current literature focuses on model selection, by which insensitive parameters are fixed or removed from the model. Model selection methods that employ information criteria are applicable to both linear and nonlinear mixed-effects models, but such techniques are limited in that they are computationally prohibitive for large problems due to the number of possible models that must be tested. To limit the scope of possible models for model selection via information criteria, we introduce a parameter subset selection (PSS) algorithm for mixed-effects models, which orders the parameters by their significance. Finally, we provide examples to verify the effectiveness of the PSS algorithm and to test the performance of mixed-effects model selection that makes use of parameter subset selection.

  2. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    PubMed

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
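    The filter-then-evaluate pattern described above can be sketched as follows, assuming scikit-learn is available; mutual information stands in for the paper's actual filters (Relief-F, CFS, FCBF), and the synthetic data merely mimics the 76-feature setting.

```python
# Sketch of classifier-independent (filter) feature selection followed by
# evaluation with several generic classifiers. Mutual information is a
# stand-in for Relief-F / CFS / FCBF; the data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=76, n_informative=10,
                           random_state=0)

# Rank all features with a classifier-independent score and keep the top k.
scores = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(scores)[::-1][:15]
X_subset = X[:, top_k]

# Evaluate the selected subset with several generic classifiers.
for name, clf in [("Naive Bayes", GaussianNB()),
                  ("SVM", SVC()),
                  ("Decision tree", DecisionTreeClassifier(random_state=0))]:
    acc = cross_val_score(clf, X_subset, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```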

  3. Feature Selection for Wearable Smartphone-Based Human Activity Recognition with Able bodied, Elderly, and Stroke Patients

    PubMed Central

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272

  4. Use of Hundreds of Electrocardiographic Biomarkers for Prediction of Mortality in Post-Menopausal Women: The Women’s Health Initiative

    PubMed Central

    Gorodeski, Eiran Z.; Ishwaran, Hemant; Kogalur, Udaya B.; Blackstone, Eugene H.; Hsich, Eileen; Zhang, Zhu-ming; Vitolins, Mara Z.; Manson, JoAnn E.; Curb, J. David; Martin, Lisa W.; Prineas, Ronald J.; Lauer, Michael S.

    2013-01-01

    Background Simultaneous contribution of hundreds of electrocardiographic biomarkers to prediction of long-term mortality in post-menopausal women with clinically normal resting electrocardiograms (ECGs) is unknown. Methods and Results We analyzed ECGs and all-cause mortality in 33,144 women enrolled in Women’s Health Initiative trials, who were without baseline cardiovascular disease or cancer, and had normal ECGs by Minnesota and Novacode criteria. Four hundred and seventy-seven ECG biomarkers, encompassing global and individual ECG findings, were measured using computer algorithms. During a median follow-up of 8.1 years (range for survivors 0.5–11.2 years), 1,229 women died. For analyses, the cohort was randomly split into derivation (n=22,096, deaths=819) and validation (n=11,048, deaths=410) subsets. ECG biomarkers, demographic, and clinical characteristics were simultaneously analyzed using both traditional Cox regression and Random Survival Forest (RSF), a novel algorithmic machine-learning approach. Regression modeling failed to converge. RSF variable selection yielded 20 variables that were independently predictive of long-term mortality, 14 of which were ECG biomarkers related to autonomic tone, atrial conduction, and ventricular depolarization and repolarization. Conclusions We identified 14 ECG biomarkers from amongst hundreds that were associated with long-term prognosis using a novel random forest variable selection methodology. These were related to autonomic tone, atrial conduction, ventricular depolarization, and ventricular repolarization. Quantitative ECG biomarkers have prognostic importance, and may be markers of subclinical disease in apparently healthy post-menopausal women. PMID:21862719
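    Random Survival Forests are not part of scikit-learn, so the sketch below substitutes a plain random forest on a binary outcome plus permutation importance, purely to illustrate forest-based ranking of many candidate predictors; it is not the RSF methodology used in the study, and the data are synthetic.

```python
# Simplified stand-in for forest-based variable selection (not RSF):
# train a random forest on a binary outcome and rank predictors by how
# much permuting each one degrades held-out accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=200, n_informative=15,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:20]
print("Top-ranked predictors:", top)
```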

  5. Selecting predictors for discriminant analysis of species performance: an example from an amphibious softwater plant.

    PubMed

    Vanderhaeghe, F; Smolders, A J P; Roelofs, J G M; Hoffmann, M

    2012-03-01

    Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often exists in obtaining general predictive capacity or in finding causal inferences from predictor variables. Because of a lack of solid knowledge on a studied phenomenon, scientists explore predictor variables in order to find the most meaningful (i.e. discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can be selected through comparison of several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some methods. We expected PCA to perform best. The selected methods were evaluated through fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset, at P < 0.05, followed by a step-wise sub-selection, gave the best results. In contrast to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, while probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset; the former because it does not take into account the response variable, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology. © 2011 German Botanical Society and The Royal Botanical Society of the Netherlands.

  6. Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images

    PubMed Central

    Zhou, Mingyuan; Chen, Haojun; Paisley, John; Ren, Lu; Li, Lingbo; Xing, Zhengming; Dunson, David; Sapiro, Guillermo; Carin, Lawrence

    2013-01-01

    Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature. PMID:21693421

  7. The quantitative structure-insecticidal activity relationships from plant derived compounds against chikungunya and zika Aedes aegypti (Diptera:Culicidae) vector.

    PubMed

    Saavedra, Laura M; Romanelli, Gustavo P; Rozo, Ciro E; Duchowicz, Pablo R

    2018-01-01

    The insecticidal activity of a series of 62 plant derived molecules against the chikungunya, dengue and zika vector, the Aedes aegypti (Diptera:Culicidae) mosquito, is subjected to a Quantitative Structure-Activity Relationships (QSAR) analysis. The Replacement Method (RM) variable subset selection technique based on Multivariable Linear Regression (MLR) proves to be successful for exploring 4885 molecular descriptors calculated with Dragon 6. The predictive capability of the obtained models is confirmed through an external test set of compounds, Leave-One-Out (LOO) cross-validation and Y-Randomization. The present study constitutes a first necessary computational step for designing less toxic insecticides. Copyright © 2017 Elsevier B.V. All rights reserved.
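    A greatly simplified, replacement-method-style descriptor search for an MLR model might look like the following sketch; the toy data, subset size, and sweep count are placeholders, and the actual RM technique used in the paper differs in its details.

```python
# Sketch of a replacement-method-style descriptor search for an MLR QSAR
# model: starting from a random subset of fixed size d, swap one descriptor
# at a time and keep swaps that lower the residual standard deviation.
import numpy as np

def fit_sd(X, y, cols):
    """Residual standard deviation of an ordinary least-squares fit."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return resid.std(ddof=len(cols) + 1)

def replacement_search(X, y, d=4, sweeps=5, seed=0):
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    subset = list(rng.choice(n_desc, size=d, replace=False))
    best = fit_sd(X, y, subset)
    for _ in range(sweeps):
        for pos in range(d):
            for cand in range(n_desc):
                if cand in subset:
                    continue
                trial = subset.copy()
                trial[pos] = cand
                sd = fit_sd(X, y, trial)
                if sd < best:
                    subset, best = trial, sd
    return subset, best

# Toy descriptor matrix standing in for 62 molecules x many descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(62, 300))
y = X[:, 5] - 2 * X[:, 42] + rng.normal(scale=0.3, size=62)
print(replacement_search(X, y))
```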

  8. Texture analysis based on the Hermite transform for image classification and segmentation

    NASA Astrophysics Data System (ADS)

    Estudillo-Romero, Alfonso; Escalante-Ramirez, Boris; Savage-Carmona, Jesus

    2012-06-01

    Texture analysis has become an important task in image processing because it is used as a preprocessing stage in different research areas including medical image analysis, industrial inspection, segmentation of remotely sensed imagery, multimedia indexing and retrieval. In order to extract visual texture features, a texture image analysis technique based on the Hermite transform is presented. Psychovisual evidence suggests that the Gaussian derivatives fit the receptive field profiles of mammalian visual systems. The Hermite transform describes locally basic texture features in terms of Gaussian derivatives. Multiresolution combined with several analysis orders provides detection of patterns that characterize every texture class. The analysis of the local maximum energy direction and steering of the transformation coefficients increases the method's robustness to texture orientation. This method presents an advantage over classical filter bank design because in the latter a fixed number of orientations for the analysis has to be selected. During the training stage, a subset of the Hermite analysis filters is chosen in order to improve the inter-class separability, reduce dimensionality of the feature vectors and computational cost during the classification stage. We exhaustively evaluated the correct classification rate of real randomly selected training and testing texture subsets using several kinds of commonly used texture features. A comparison between different distance measurements is also presented. Results of the unsupervised real texture segmentation using this approach and comparison with previous approaches showed the benefits of our proposal.
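    Because Hermite transform coefficients are closely related to Gaussian derivative responses, a rough sketch of building per-pixel texture features from several analysis orders and scales could look like this (SciPy assumed; the scales, orders, and synthetic image are illustrative only, not the paper's filter set).

```python
# Sketch of texture features from Gaussian derivative responses across
# several analysis orders and scales, as a rough analogue of Hermite
# transform coefficients.
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivative_features(image, sigmas=(1.0, 2.0), max_order=2):
    """Stack |responses| of Gaussian derivatives up to max_order per scale."""
    feats = []
    for sigma in sigmas:
        for ox in range(max_order + 1):
            for oy in range(max_order + 1 - ox):
                resp = gaussian_filter(image, sigma=sigma, order=(oy, ox))
                feats.append(np.abs(resp))
    return np.stack(feats, axis=-1)   # shape: (H, W, n_features)

rng = np.random.default_rng(0)
texture = rng.random((128, 128))      # synthetic texture patch
F = gaussian_derivative_features(texture)
print(F.shape)
```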

  9. Predicting degree of benefit from adjuvant trastuzumab in NSABP trial B-31.

    PubMed

    Pogue-Geile, Katherine L; Kim, Chungyeul; Jeong, Jong-Hyeon; Tanaka, Noriko; Bandos, Hanna; Gavin, Patrick G; Fumagalli, Debora; Goldstein, Lynn C; Sneige, Nour; Burandt, Eike; Taniyama, Yusuke; Bohn, Olga L; Lee, Ahwon; Kim, Seung-Il; Reilly, Megan L; Remillard, Matthew Y; Blackmon, Nicole L; Kim, Seong-Rim; Horne, Zachary D; Rastogi, Priya; Fehrenbacher, Louis; Romond, Edward H; Swain, Sandra M; Mamounas, Eleftherios P; Wickerham, D Lawrence; Geyer, Charles E; Costantino, Joseph P; Wolmark, Norman; Paik, Soonmyung

    2013-12-04

    National Surgical Adjuvant Breast and Bowel Project (NSABP) trial B-31 suggested the efficacy of adjuvant trastuzumab, even in HER2-negative breast cancer. This finding prompted us to develop a predictive model for degree of benefit from trastuzumab using archived tumor blocks from B-31. Case subjects with tumor blocks were randomly divided into discovery (n = 588) and confirmation cohorts (n = 991). A predictive model was built from the discovery cohort through gene expression profiling of 462 genes with nCounter assay. A predefined cut point for the predictive model was tested in the confirmation cohort. Gene-by-treatment interaction was tested with Cox models, and correlations between variables were assessed with Spearman correlation. Principal component analysis was performed on the final set of selected genes. All statistical tests were two-sided. Eight predictive genes associated with HER2 (ERBB2, c17orf37, GRB7) or ER (ESR1, NAT1, GATA3, CA12, IGF1R) were selected for model building. Three-dimensional subset treatment effect pattern plot using two principal components of these genes was used to identify a subset with no benefit from trastuzumab, characterized by intermediate-level ERBB2 and high-level ESR1 mRNA expression. In the confirmation set, the predefined cut points for this model classified patients into three subsets with differential benefit from trastuzumab with hazard ratios of 1.58 (95% confidence interval [CI] = 0.67 to 3.69; P = .29; n = 100), 0.60 (95% CI = 0.41 to 0.89; P = .01; n = 449), and 0.28 (95% CI = 0.20 to 0.41; P < .001; n = 442; P(interaction) between the model and trastuzumab < .001). We developed a gene expression-based predictive model for degree of benefit from trastuzumab and demonstrated that HER2-negative tumors belong to the moderate benefit group, thus providing justification for testing trastuzumab in HER2-negative patients (NSABP B-47).

  10. Predicting Degree of Benefit From Adjuvant Trastuzumab in NSABP Trial B-31

    PubMed Central

    Pogue-Geile, Katherine L.; Kim, Chungyeul; Jeong, Jong-Hyeon; Tanaka, Noriko; Bandos, Hanna; Gavin, Patrick G.; Fumagalli, Debora; Goldstein, Lynn C.; Sneige, Nour; Burandt, Eike; Taniyama, Yusuke; Bohn, Olga L.; Lee, Ahwon; Kim, Seung-Il; Reilly, Megan L.; Remillard, Matthew Y.; Blackmon, Nicole L.; Kim, Seong-Rim; Horne, Zachary D.; Rastogi, Priya; Fehrenbacher, Louis; Romond, Edward H.; Swain, Sandra M.; Mamounas, Eleftherios P.; Wickerham, D. Lawrence; Geyer, Charles E.; Costantino, Joseph P.; Wolmark, Norman

    2013-01-01

    Background National Surgical Adjuvant Breast and Bowel Project (NSABP) trial B-31 suggested the efficacy of adjuvant trastuzumab, even in HER2-negative breast cancer. This finding prompted us to develop a predictive model for degree of benefit from trastuzumab using archived tumor blocks from B-31. Methods Case subjects with tumor blocks were randomly divided into discovery (n = 588) and confirmation cohorts (n = 991). A predictive model was built from the discovery cohort through gene expression profiling of 462 genes with nCounter assay. A predefined cut point for the predictive model was tested in the confirmation cohort. Gene-by-treatment interaction was tested with Cox models, and correlations between variables were assessed with Spearman correlation. Principal component analysis was performed on the final set of selected genes. All statistical tests were two-sided. Results Eight predictive genes associated with HER2 (ERBB2, c17orf37, GRB7) or ER (ESR1, NAT1, GATA3, CA12, IGF1R) were selected for model building. Three-dimensional subset treatment effect pattern plot using two principal components of these genes was used to identify a subset with no benefit from trastuzumab, characterized by intermediate-level ERBB2 and high-level ESR1 mRNA expression. In the confirmation set, the predefined cut points for this model classified patients into three subsets with differential benefit from trastuzumab with hazard ratios of 1.58 (95% confidence interval [CI] = 0.67 to 3.69; P = .29; n = 100), 0.60 (95% CI = 0.41 to 0.89; P = .01; n = 449), and 0.28 (95% CI = 0.20 to 0.41; P < .001; n = 442; P interaction between the model and trastuzumab < .001). Conclusions We developed a gene expression–based predictive model for degree of benefit from trastuzumab and demonstrated that HER2-negative tumors belong to the moderate benefit group, thus providing justification for testing trastuzumab in HER2-negative patients (NSABP B-47). PMID:24262440

  11. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    NASA Astrophysics Data System (ADS)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem and difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps experts to diagnose precisely, accurately, and inexpensively. In this research, we use a data mining technique to develop a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages of filter and wrapper methods to select the optimal feature subset from the original feature set. Chi square is used as a filter method to remove redundant features and GA as a wrapper method to select the ideal feature subset, with SVM used as the classifier. Experiments were performed with 10-fold cross-validation on the erythemato-squamous disease dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed model based on multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.
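    A hybrid filter + wrapper selection in the spirit of ChiGA can be sketched with scikit-learn as below; the chi-square filter stage follows the idea above, but the wrapper stage is reduced to a simple random-mutation hill climber rather than a genetic algorithm, and the breast cancer dataset is only a stand-in for the UCI dermatology data.

```python
# Sketch of a chi-square filter followed by a (greatly simplified) wrapper
# search that maximizes SVM cross-validation accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)          # chi2 needs non-negative input

# Filter stage: keep the 15 features with the highest chi-square scores.
X_f = SelectKBest(chi2, k=15).fit_transform(X, y)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X_f[:, mask], y, cv=5).mean()

# Wrapper stage (simplified): flip one feature in/out, keep improvements.
rng = np.random.default_rng(0)
mask = rng.random(X_f.shape[1]) < 0.5
best = fitness(mask)
for _ in range(100):
    j = rng.integers(X_f.shape[1])
    trial = mask.copy()
    trial[j] = ~trial[j]
    score = fitness(trial)
    if score >= best:
        mask, best = trial, score
print(mask.sum(), "features selected, CV accuracy %.3f" % best)
```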

  12. Expansion of CD14+CD16+ monocytes producing TNF-α in complication-free diabetes type 1 juvenile onset patients.

    PubMed

    Myśliwska, Jolanta; Smardzewski, Marcin; Marek-Trzonkowska, Natalia; Myśliwiec, Małgorzata; Raczyńska, Krystyna

    2012-10-01

    We concentrated on the complication-free phase of juvenile onset type 1 diabetes mellitus (T1DM), searching for associations between the concentrations of the inflammatory factors TNF-α, CRP and VEGF and two monocyte subsets, the CD14(++)CD16(-) and the CD14(+)CD16(+). We analysed a randomly selected group of 150 patients without complications (disease duration 2.74 ± 2.51 years) at the start of the project and 5 years later. They were compared with 24 patients with retinopathy (6.53 ± 3.39 years of disease) and 30 healthy volunteers. Our results indicate that in the complication-free period the concentration of TNF-α significantly increased and continued to increase after retinopathy was established. After 5 years the percentage and absolute number of CD14(+)CD16(+) monocytes doubled in complication-free patients. Our study indicates that the size of the CD14(+)CD16(+) monocyte subset may be used as an alternative to CRP values as an indicator of inflammation grade. Our results imply the necessity of trials using anti-TNF-α therapy in the complication-free phase of the disease. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. An Iris Segmentation Algorithm based on Edge Orientation for Off-angle Iris Recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karakaya, Mahmut; Barstow, Del R; Santos-Villalobos, Hector J

    Iris recognition is known as one of the most accurate and reliable biometrics. However, the accuracy of iris recognition systems depends on the quality of data capture and is negatively affected by several factors such as angle, occlusion, and dilation. In this paper, we present a segmentation algorithm for off-angle iris images that uses edge detection, edge elimination, edge classification, and ellipse fitting techniques. In our approach, we first detect all candidate edges in the iris image by using the Canny edge detector; this collection contains edges from the iris and pupil boundaries as well as eyelashes, eyelids, iris texture, etc. Edge orientation is used to eliminate the edges that cannot be part of the iris or pupil. Then, we classify the remaining edge points into two sets as pupil edges and iris edges. Finally, we randomly generate subsets of iris and pupil edge points, fit ellipses for each subset, select ellipses with similar parameters, and average to form the resultant ellipses. Based on the results from real experiments, the proposed method shows effectiveness in segmentation for off-angle iris images.
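    The "random subsets of edge points, ellipse fits, keep similar fits" idea can be sketched with OpenCV as follows; the edge-orientation elimination and pupil/iris edge classification steps of the paper are omitted, and the synthetic image, thresholds, and tolerance are placeholders.

```python
# Sketch: detect edges, fit ellipses to random subsets of edge points, and
# average the fits whose parameters agree. Illustrative only.
import cv2
import numpy as np

def consensus_ellipse(edge_points, n_trials=200, sample_size=20, seed=0):
    """Fit ellipses to random point subsets and average similar parameters."""
    rng = np.random.default_rng(seed)
    centers, axes = [], []
    for _ in range(n_trials):
        idx = rng.choice(len(edge_points), size=sample_size, replace=False)
        (cx, cy), (major, minor), _ang = cv2.fitEllipse(edge_points[idx])
        centers.append((cx, cy))
        axes.append((major, minor))
    centers, axes = np.array(centers), np.array(axes)
    # Keep fits whose center lies near the median center, then average them.
    med = np.median(centers, axis=0)
    keep = np.linalg.norm(centers - med, axis=1) < 10.0
    return centers[keep].mean(axis=0), axes[keep].mean(axis=0)

# Synthetic stand-in for an iris boundary image.
img = np.zeros((200, 200), dtype=np.uint8)
cv2.ellipse(img, (100, 100), (60, 40), 30, 0, 360, 255, 2)
edges = cv2.Canny(img, 50, 150)
ys, xs = np.nonzero(edges)
points = np.column_stack([xs, ys]).astype(np.float32)
print(consensus_ellipse(points))
```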

  14. Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy.

    PubMed

    Katić, Darko; Schuck, Jürgen; Wekerle, Anna-Laura; Kenngott, Hannes; Müller-Stich, Beat Peter; Dillmann, Rüdiger; Speidel, Stefanie

    2016-06-01

    Computer assistance is increasingly common in surgery. However, the amount of information is bound to overload the processing abilities of surgeons. We propose methods to recognize the current phase of a surgery for context-aware information filtering. The purpose is to select the most suitable subset of information for surgical situations which require special assistance. We combine formal knowledge, represented by an ontology, and experience-based knowledge, represented by training samples, to recognize phases. For this purpose, we have developed two different methods. Firstly, we use formal knowledge about possible phase transitions to create a composition of random forests. Secondly, we propose a method based on cultural optimization to infer formal rules from experience to recognize phases. The proposed methods are compared with a purely formal knowledge-based approach using rules and a purely experience-based one using regular random forests. The comparative evaluation on laparoscopic pancreas resections and adrenalectomies employs a consistent set of quality criteria on clean and noisy input. The rule-based approaches proved best with noise-free data. The random forest-based ones were more robust in the presence of noise. Formal and experience-based knowledge can be successfully combined for robust phase recognition.
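    One simple way to combine formal transition knowledge with an experience-based classifier, sketched below with scikit-learn, is to mask a random forest's per-phase probabilities by the phases allowed to follow the previously recognized phase; the transition table, features, and phases are hypothetical, and this is not the authors' exact composition of random forests.

```python
# Sketch: formal knowledge (allowed phase transitions) constrains an
# experience-based random forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Allowed transitions between 4 hypothetical surgical phases (row -> column).
ALLOWED = np.array([[1, 1, 0, 0],
                    [0, 1, 1, 0],
                    [0, 0, 1, 1],
                    [0, 0, 0, 1]], dtype=bool)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 10))          # toy per-frame features
y_train = rng.integers(0, 4, size=400)        # toy phase labels
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def recognize(frame_features, prev_phase):
    proba = forest.predict_proba(frame_features.reshape(1, -1))[0]
    proba = np.where(ALLOWED[prev_phase], proba, 0.0)   # formal constraint
    return int(np.argmax(proba)) if proba.sum() > 0 else prev_phase

print(recognize(rng.normal(size=10), prev_phase=1))
```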

  15. Enhancing the Discrimination Ability of a Gas Sensor Array Based on a Novel Feature Selection and Fusion Framework.

    PubMed

    Deng, Changjian; Lv, Kun; Shi, Debo; Yang, Bo; Yu, Song; He, Zhiyi; Yan, Jia

    2018-06-12

    In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. Firstly, we put forward an efficient feature selection method based on the separability and the dissimilarity to determine the feature selection order for each type of feature when increasing the dimension of the selected feature subsets. Secondly, the K-nearest neighbor (KNN) classifier is applied to determine the dimensions of the optimal feature subsets for the different types of features. Finally, in the feature fusion stage, we propose a classification-dominance feature fusion strategy that constructs an effective basic feature. Experimental results on two datasets show that the recognition rates on Database I and Database II reach 97.5% and 80.11%, respectively, when k = 1 for the KNN classifier and the distance metric is correlation distance (COR), which demonstrates the superiority of the proposed feature selection and fusion framework in representing signal features. The novel feature selection method proposed in this paper can effectively select feature subsets that are conducive to classification, while the feature fusion framework can fuse various features that describe the different characteristics of the sensor signals, enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing the drift effect.

  16. An active learning representative subset selection method using net analyte signal.

    PubMed

    He, Zhonghai; Ma, Zhenhe; Luan, Jingmin; Cai, Xi

    2018-05-05

    To guarantee accurate predictions, representative samples are needed when building a calibration model for spectroscopic measurements. However, in general, it is not known whether a sample is representative prior to measuring its concentration, which is both time-consuming and expensive. In this paper, a method to determine whether a sample should be selected into a calibration set is presented. The selection is based on the difference of Euclidean norm of net analyte signal (NAS) vector between the candidate and existing samples. First, the concentrations and spectra of a group of samples are used to compute the projection matrix, NAS vector, and scalar values. Next, the NAS vectors of candidate samples are computed by multiplying projection matrix with spectra of samples. Scalar value of NAS is obtained by norm computation. The distance between the candidate set and the selected set is computed, and samples with the largest distance are added to selected set sequentially. Last, the concentration of the analyte is measured such that the sample can be used as a calibration sample. Using a validation test, it is shown that the presented method is more efficient than random selection. As a result, the amount of time and money spent on reference measurements is greatly reduced. Copyright © 2018 Elsevier B.V. All rights reserved.
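    The selection loop described above can be sketched with NumPy as follows; the projection matrix here is a generic orthogonal projection onto the complement of an assumed interferent subspace, whereas the paper derives it from calibration concentrations and spectra, so treat the NAS computation as a stand-in.

```python
# Sketch: scalar net-analyte-signal (NAS) values per sample, then greedy
# max-min-distance selection of calibration samples.
import numpy as np

def nas_norms(spectra, interferent_basis):
    """Euclidean norm of each spectrum's component orthogonal to interferents."""
    Q, _ = np.linalg.qr(interferent_basis)          # orthonormal interferent space
    P = np.eye(spectra.shape[1]) - Q @ Q.T          # projection matrix
    return np.linalg.norm(spectra @ P.T, axis=1)    # scalar NAS per sample

def greedy_select(nas, n_select, first=0):
    """Sequentially add the candidate farthest (in NAS value) from the set."""
    selected = [first]
    candidates = set(range(len(nas))) - {first}
    while len(selected) < n_select:
        dist = {c: min(abs(nas[c] - nas[s]) for s in selected) for c in candidates}
        nxt = max(dist, key=dist.get)
        selected.append(nxt)
        candidates.remove(nxt)
    return selected

rng = np.random.default_rng(0)
spectra = rng.normal(size=(100, 60))   # candidate sample spectra (toy)
interf = rng.normal(size=(60, 3))      # assumed interferent spectra (toy)
print(greedy_select(nas_norms(spectra, interf), n_select=10))
```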

  17. An active learning representative subset selection method using net analyte signal

    NASA Astrophysics Data System (ADS)

    He, Zhonghai; Ma, Zhenhe; Luan, Jingmin; Cai, Xi

    2018-05-01

    To guarantee accurate predictions, representative samples are needed when building a calibration model for spectroscopic measurements. However, in general, it is not known whether a sample is representative prior to measuring its concentration, which is both time-consuming and expensive. In this paper, a method to determine whether a sample should be selected into a calibration set is presented. The selection is based on the difference of Euclidean norm of net analyte signal (NAS) vector between the candidate and existing samples. First, the concentrations and spectra of a group of samples are used to compute the projection matrix, NAS vector, and scalar values. Next, the NAS vectors of candidate samples are computed by multiplying projection matrix with spectra of samples. Scalar value of NAS is obtained by norm computation. The distance between the candidate set and the selected set is computed, and samples with the largest distance are added to selected set sequentially. Last, the concentration of the analyte is measured such that the sample can be used as a calibration sample. Using a validation test, it is shown that the presented method is more efficient than random selection. As a result, the amount of time and money spent on reference measurements is greatly reduced.

  18. Replica amplification of nucleic acid arrays

    DOEpatents

    Church, George M.

    2002-01-01

    A method of producing a plurality of a nucleic acid array, comprising, in order, the steps of amplifying in situ nucleic acid molecules of a first randomly-patterned, immobilized nucleic acid array comprising a heterogeneous pool of nucleic acid molecules affixed to a support, transferring at least a subset of the nucleic acid molecules produced by such amplifying to a second support, and affixing the subset so transferred to the second support to form a second randomly-patterned, immobilized nucleic acid array, wherein the nucleic acid molecules of the second array occupy positions that correspond to those of the nucleic acid molecules from which they were amplified on the first array, so that the first array serves as a template to produce a plurality, is disclosed.

  19. High Dimensional Classification Using Features Annealed Independence Rules.

    PubMed

    Fan, Jianqing; Fan, Yingying

    2008-01-01

    Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
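    The two ingredients of FAIR, t-statistic screening and the independence (diagonal) rule, can be sketched as follows; the toy data and the choice of m are placeholders, whereas the paper chooses m from an upper bound on the classification error.

```python
# Sketch: rank features by the two-sample t-statistic, keep the m strongest,
# and classify with the independence (diagonal) rule on that subset.
import numpy as np

def two_sample_t(X, y):
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)
    return (m1 - m0) / np.sqrt(v0 / len(X0) + v1 / len(X1))

def fair_classifier(X, y, m):
    t = two_sample_t(X, y)
    keep = np.argsort(np.abs(t))[::-1][:m]          # m most significant features
    mu0 = X[y == 0][:, keep].mean(axis=0)
    mu1 = X[y == 1][:, keep].mean(axis=0)
    s2 = X[:, keep].var(axis=0, ddof=1)
    def predict(Xnew):
        d0 = ((Xnew[:, keep] - mu0) ** 2 / s2).sum(axis=1)
        d1 = ((Xnew[:, keep] - mu1) ** 2 / s2).sum(axis=1)
        return (d1 < d0).astype(int)                # independence-rule decision
    return predict

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 1000))
y = rng.integers(0, 2, size=80)
X[y == 1, :20] += 1.0                               # a few informative features
print(fair_classifier(X, y, m=20)(X).mean())        # training accuracy of class-1 calls
```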

  20. The Self-Adapting Focused Review System. Probability sampling of medical records to monitor utilization and quality of care.

    PubMed

    Ash, A; Schwartz, M; Payne, S M; Restuccia, J D

    1990-11-01

    Medical record review is increasing in importance as the need to identify and monitor utilization and quality of care problems grows. To conserve resources, reviews are usually performed on a subset of cases. If judgment is used to identify subgroups for review, this raises the following questions: How should subgroups be determined, particularly since the locus of problems can change over time? What standard of comparison should be used in interpreting rates of problems found in subgroups? How can population problem rates be estimated from observed subgroup rates? How can the bias be avoided that arises because reviewers know that selected cases are suspected of having problems? How can changes in problem rates over time be interpreted when evaluating intervention programs? Simple random sampling, an alternative to subgroup review, overcomes the problems implied by these questions but is inefficient. The Self-Adapting Focused Review System (SAFRS), introduced and described here, provides an adaptive approach to record selection that is based upon model-weighted probability sampling. It retains the desirable inferential properties of random sampling while allowing reviews to be concentrated on cases currently thought most likely to be problematic. Model development and evaluation are illustrated using hospital data to predict inappropriate admissions.
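    A simplified sketch (not SAFRS itself) of model-weighted probability sampling with inverse-probability estimation is shown below; the risk model, inclusion-probability formula, and data are hypothetical.

```python
# Sketch: review each record with a known probability tied to its predicted
# problem risk, then recover the population problem rate with
# inverse-probability (Horvitz-Thompson-style) weighting.
import numpy as np

rng = np.random.default_rng(0)
N = 5000
risk = rng.random(N)                      # model-predicted problem probability
true_problem = rng.random(N) < risk       # unknown until a record is reviewed

# Inclusion probabilities: concentrate review on high-risk records, but keep
# a floor so every record has some chance of selection (needed for unbiasedness).
p_incl = np.clip(0.5 * risk + 0.02, 0.02, 1.0)
sampled = rng.random(N) < p_incl          # independent (Poisson) sampling

# Horvitz-Thompson-style estimate of the population problem rate.
ht_rate = (true_problem[sampled] / p_incl[sampled]).sum() / N
print(f"reviewed {sampled.sum()} records, estimated rate {ht_rate:.3f}, "
      f"true rate {true_problem.mean():.3f}")
```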

  1. A weight-gain-for-gestational-age z score chart for the assessment of maternal weight gain in pregnancy.

    PubMed

    Hutcheon, Jennifer A; Platt, Robert W; Abrams, Barbara; Himes, Katherine P; Simhan, Hyagriv N; Bodnar, Lisa M

    2013-05-01

    To establish the unbiased relation between maternal weight gain in pregnancy and perinatal health, a classification for maternal weight gain is needed that is uncorrelated with gestational age. The goal of this study was to create a weight-gain-for-gestational-age percentile and z score chart to describe the mean, SD, and selected percentiles of maternal weight gain throughout pregnancy in a contemporary cohort of US women. The study population was drawn from normal-weight women with uncomplicated, singleton pregnancies who delivered at the Magee-Womens Hospital in Pittsburgh, PA, 1998-2008. Analyses were based on a randomly selected subset of 648 women for whom serial prenatal weight measurements were available through medical chart record abstraction (6727 weight measurements). The pattern of maternal weight gain throughout gestation was estimated by using a random-effects regression model. The estimates were used to create a chart with the smoothed means, percentiles, and SDs of gestational weight gain for each week of pregnancy. This chart allows researchers to express total weight gain as an age-standardized z score, which can be used in epidemiologic analyses to study the association between pregnancy weight gain and adverse or physiologic pregnancy outcomes independent of gestational age.
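    Using such a chart amounts to looking up the smoothed mean and SD for the gestational week and standardizing the observed gain, as in the small sketch below; the chart values shown are made up for illustration and are not the study's estimates.

```python
# Tiny illustration of applying a weight-gain-for-gestational-age chart:
# express an observed total gain as a z score for the week of delivery.
chart = {  # gestational week -> (mean gain in kg, SD in kg); illustrative only
    36: (12.0, 3.8),
    38: (13.0, 4.0),
    40: (14.0, 4.2),
}

def weight_gain_z(total_gain_kg, week):
    mean, sd = chart[week]
    return (total_gain_kg - mean) / sd

print(round(weight_gain_z(18.0, week=40), 2))   # gain of 18 kg at 40 weeks
```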

  2. Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users.

    PubMed

    Carugo, Oliviero; Djinović-Carugo, Kristina

    2016-01-01

    It is often necessary to build subsets of the Protein Data Bank to extract structural trends and average values. For this purpose it is mandatory that the subsets are non-redundant and of high quality. The first problem can be solved relatively easily at the sequence level or at the structural level. The second, on the contrary, needs special attention. It is not sufficient, in fact, to consider only the crystallographic resolution; other features must be taken into account: the absence of strings of residues from the electron density maps and from the files deposited in the Protein Data Bank; the B-factor values; the appropriate validation of the structural models; the quality of the electron density maps, which is not uniform; and the temperature of the diffraction experiments. More stringent criteria produce smaller subsets, which can be enlarged with more tolerant selection criteria. The incessant growth of the Protein Data Bank and especially of the number of high-resolution structures is allowing the use of more stringent selection criteria, with a consequent improvement of the quality of the subsets of the Protein Data Bank.

  3. The use of random forests in modelling short-term air pollution effects based on traffic and meteorological conditions: A case study in Wrocław.

    PubMed

    Kamińska, Joanna A

    2018-07-01

    Random forests, an advanced data mining method, are used here to model the regression relationships between concentrations of the pollutants NO2, NOx and PM2.5, and nine variables describing meteorological conditions, temporal conditions and traffic flow. The study was based on hourly values of wind speed, wind direction, temperature, air pressure and relative humidity, temporal variables, and finally traffic flow, in the two years 2015 and 2016. An air quality measurement station was selected on a main road, located a short distance (40 m) from a large intersection equipped with a traffic flow measurement system. Nine different time subsets were defined, based among other things on the climatic conditions in Wrocław. An analysis was made of the fit of models created for those subsets, and of the importance of the predictors. Both the fit and the importance of particular predictors were found to be dependent on season. The best fit was obtained for models created for the six-month warm season (April-September) and for the summer season (June-August). The most important explanatory variable in the models of concentrations of nitrogen oxides was traffic flow, while in the case of PM2.5 the most important were meteorological conditions, in particular temperature, wind speed and wind direction. Temporal variables (except for month in the case of PM2.5) were found to have no significant effect on the concentrations of the studied pollutants. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Metastability of Reversible Random Walks in Potential Fields

    NASA Astrophysics Data System (ADS)

    Landim, C.; Misturini, R.; Tsunoda, K.

    2015-09-01

    Let Ξ be an open and bounded subset of ℝ^d, and let F : Ξ → ℝ be a twice continuously differentiable function. Denote by Ξ_N the discretization of Ξ, and denote by X_N the continuous-time, nearest-neighbor random walk on Ξ_N which jumps from a site x to a neighboring site y at a rate determined by the potential difference F(y) − F(x). We examine in this article the metastable behavior of X_N among the wells of the potential F.

  5. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    PubMed

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.

  6. Comparison of Decisions Quality of Heuristic Methods with Limited Depth-First Search Techniques in the Graph Shortest Path Problem

    NASA Astrophysics Data System (ADS)

    Vatutin, Eduard

    2017-12-01

    The article analyses the effectiveness of heuristic methods that use limited depth-first search techniques for obtaining solutions to the test problem of finding the shortest path in a graph. It briefly describes the group of methods, based on limiting the number of branches of the combinatorial search tree and the depth of the analysed subtree, that are used to solve the problem. The methodology for comparing experimental data to estimate solution quality is considered; it is based on computational experiments, run on the BOINC platform, with samples of graphs of pseudo-random structure and selected numbers of vertices and arcs. The article also describes the experimental results, which identify the areas where the selected subset of heuristic methods is preferable depending on the size of the problem and the strength of the constraints. It is shown that the considered pair of methods is ineffective for the selected problem and is significantly inferior, in solution quality, to the ant colony optimization method and its modification with combinatorial returns.

  7. An improved wrapper-based feature selection method for machinery fault diagnosis

    PubMed Central

    2017-01-01

    A major issue of machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence the fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. However, in the wrapper-based feature selection (WFS) method, a trade-off between the capability to select the best feature subset and the computational effort is inevitable. This paper proposes an improved WFS technique before integration with a support vector machine (SVM) model classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was processed using the proposed WFS and its performance has been analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with a lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found to be capable and efficient in carrying out feature selection tasks. PMID:29261689

  8. Which products are available for subsetting?

    Atmospheric Science Data Center

    2014-12-08

    ... users to create smaller files (subsets) of the original data by selecting desired parameters, parameter criterion, or latitude and ... fluxes, where the net flux is constrained to the global heat storage in netCDF format. Single Scanner Footprint TOA/Surface Fluxes ...

  9. New TES Search and Subset Application

    Atmospheric Science Data Center

    2017-08-23

    ... Wednesday, September 19, 2012 The Atmospheric Science Data Center (ASDC) at NASA Langley Research Center in collaboration ... pleased to announce the release of the TES Search and Subset Web Application for select TES Level 2 products. Features of the Search and ...

  10. A Simple Joint Estimation Method of Residual Frequency Offset and Sampling Frequency Offset for DVB Systems

    NASA Astrophysics Data System (ADS)

    Kwon, Ki-Won; Cho, Yongsoo

    This letter presents a simple joint estimation method for residual frequency offset (RFO) and sampling frequency offset (SFO) in OFDM-based digital video broadcasting (DVB) systems. The proposed method selects a continual pilot (CP) subset from an asymmetrically and non-uniformly distributed CP set to obtain an unbiased estimator. Simulation results show that the proposed method using a properly selected CP subset is unbiased and performs robustly.

  11. SU-E-J-128: Two-Stage Atlas Selection in Multi-Atlas-Based Image Segmentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, T; Ruan, D

    2015-06-15

    Purpose: In the new era of big data, multi-atlas-based image segmentation is challenged by heterogeneous atlas quality and high computation burden from extensive atlas collection, demanding efficient identification of the most relevant atlases. This study aims to develop a two-stage atlas selection scheme to achieve computational economy with performance guarantee. Methods: We develop a low-cost fusion set selection scheme by introducing a preliminary selection to trim the full atlas collection into an augmented subset, alleviating the need for extensive full-fledged registrations. More specifically, fusion set selection is performed in two successive steps: preliminary selection and refinement. An augmented subset is first roughly selected from the whole atlas collection with a simple registration scheme and the corresponding preliminary relevance metric; the augmented subset is further refined into the desired fusion set size, using full-fledged registration and the associated relevance metric. The main novelty of this work is the introduction of an inference model to relate the preliminary and refined relevance metrics, based on which the augmented subset size is rigorously derived to ensure the desired atlases survive the preliminary selection with high probability. Results: The performance and complexity of the proposed two-stage atlas selection method were assessed using a collection of 30 prostate MR images. It achieved segmentation accuracy comparable to the conventional one-stage method with full-fledged registration, but significantly reduced computation time to 1/3 (from 30.82 to 11.04 min per segmentation). Compared with an alternative one-stage cost-saving approach, the proposed scheme yielded superior performance with mean and median DSC of (0.83, 0.85) compared to (0.74, 0.78). Conclusion: This work has developed a model-guided two-stage atlas selection scheme to achieve significant cost reduction while guaranteeing high segmentation accuracy. The benefit in both complexity and performance is expected to be most pronounced with large-scale heterogeneous data.

  12. Study Design and Rationale for a Randomized, Placebo-Controlled, Double-Blind Study to Assess the Efficacy and Safety of Selumetinib in Combination With Docetaxel as Second-Line Treatment in Patients With KRAS-Mutant Advanced Non-Small Cell Lung Cancer (SELECT-1).

    PubMed

    Jänne, Pasi A; Mann, Helen; Ghiorghiu, Dana

    2016-03-01

    Oncogenic KRAS mutations represent the largest genomically defined subset of lung cancer, and are associated with activation of the RAS/RAF/MEK/ERK pathway. There are currently no therapies specifically approved for patients with KRAS-mutant (KRASm) non-small-cell lung cancer (NSCLC), and these patients derive less clinical benefit from chemotherapy than the overall NSCLC population. In a recent phase II study, selumetinib (AZD6244, ARRY-142886), an oral, potent and selective, allosteric MEK1/2 inhibitor with a short half-life, combined with docetaxel, improved clinical outcome as second-line treatment for patients with KRASm NSCLC. This combination will be further evaluated in the phase III SELECT-1 study. SELECT-1 (NCT01933932) is a randomized, double-blind, placebo-controlled phase III study assessing the efficacy and safety of selumetinib plus docetaxel in patients with KRASm locally advanced or metastatic NSCLC, eligible for second-line treatment. The primary endpoint is progression-free survival (PFS); secondary endpoints include overall survival, objective response rate, duration of response, and safety and tolerability. Approximately 634 patients will be randomized 1:1 to receive selumetinib (75 mg twice daily on a continuous oral administration schedule) in combination with docetaxel (75 mg/m(2), intravenously on day 1 of every 21-day cycle) or placebo in combination with docetaxel (same schedule), until objective disease progression. Patients may continue to receive treatment after objective disease progression if deemed appropriate by the investigator. If the primary endpoint of PFS is met, selumetinib plus docetaxel would be the first targeted treatment for patients with KRASm advanced NSCLC who are eligible for second-line treatment. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets.

    PubMed

    Jeong, Seongmun; Kim, Jae-Yoon; Jeong, Soon-Chun; Kang, Sung-Taeg; Moon, Jung-Kyung; Kim, Namshin

    2017-01-01

    Selecting core subsets from plant genotype datasets is important for enhancing cost-effectiveness and shortening the time required for analyses in genome-wide association studies (GWAS), genomics-assisted breeding of crop species, etc. Recently, a large number of genetic markers (>100,000 single nucleotide polymorphisms) have been identified from high-density single nucleotide polymorphism (SNP) arrays and next-generation sequencing (NGS) data. However, there is no software available for picking out an efficient and consistent core subset from such a huge dataset. It is necessary to develop software that can coherently extract genetically important samples from a population. We here present a new program, GenoCore, which can quickly and efficiently find a core subset representing the entire population. We introduce simple measures of coverage and diversity scores, which reflect genotype errors and genetic variations, and can help to select a sample rapidly and accurately for a crop genotype dataset. Comparison of our method to other core collection software using example datasets is performed to validate the performance according to genetic distance, diversity, coverage, required system resources, and the number of selected samples. GenoCore selects the smallest, most consistent, and most representative core collection from all samples, using less memory with more efficient scores, and shows greater genetic coverage compared to the other software tested. GenoCore was written in the R language, and can be accessed online with an example dataset and test results at https://github.com/lovemun/Genocore.

  14. Performance Analysis of Relay Subset Selection for Amplify-and-Forward Cognitive Relay Networks

    PubMed Central

    Qureshi, Ijaz Mansoor; Malik, Aqdas Naveed; Zubair, Muhammad

    2014-01-01

    Cooperative communication is regarded as a key technology in wireless networks, including cognitive radio networks (CRNs); it increases the diversity order of the signal to combat the unfavorable effects of fading channels by allowing distributed terminals to collaborate through sophisticated signal processing. Underlay CRNs impose strict interference constraints on the secondary users (SUs) active in the frequency band of the primary users (PUs), which limits the SUs' transmit power and coverage area. Relay selection offers a potential solution to the challenges faced by underlay networks, by selecting either the single best relay or a subset of the potential relay set under different design requirements and assumptions. The best relay selection schemes proposed in the literature for amplify-and-forward (AF) based underlay cognitive relay networks have been very well studied in terms of outage probability (OP) and bit error rate (BER); such analysis is still lacking for multiple relay selection schemes. The novelty of this work is to study the outage behavior of multiple relay selection in the underlay CRN and derive closed-form expressions for the OP and BER through the cumulative distribution function (CDF) of the SNR received at the destination. The effectiveness of relay subset selection is shown through simulation results. PMID:24737980

  15. In Silico Syndrome Prediction for Coronary Artery Disease in Traditional Chinese Medicine

    PubMed Central

    Lu, Peng; Chen, Jianxin; Zhao, Huihui; Gao, Yibo; Luo, Liangtao; Zuo, Xiaohan; Shi, Qi; Yang, Yiping; Yi, Jianqiang; Wang, Wei

    2012-01-01

    Coronary artery disease (CAD) is the leading cause of death in the world. The differentiation of syndrome (ZHENG) is the criterion for diagnosis and therapy in TCM. Therefore, in silico syndrome prediction can improve the performance of treatment. In this paper, we present a Bayesian network framework to construct a high-confidence syndrome predictor based on the optimum symptom subset obtained by Support Vector Machine (SVM) feature selection. Syndromes of CAD can be divided into asthenia and sthenia syndromes. According to the hierarchical characteristics of syndrome, we first label every case with one of three syndrome types (asthenia, sthenia, or both) to handle patients presenting with several syndromes. On the basis of these three syndrome classes, we design SVM feature selection to obtain the optimum symptom subset and compare this subset with Markov blanket feature selection using ROC curves. Using this subset, six predictors of CAD syndromes are constructed by the Bayesian network technique. We also compare the Bayesian network with Naïve Bayes, C4.5, Logistic regression, and radial basis function (RBF) network classifiers. In conclusion, the Bayesian network method based on the optimum symptom subset provides a practical way to predict six syndromes of CAD in TCM. PMID:22567030

  16. Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.

    PubMed

    Li, Yan; Gu, Leon; Kanade, Takeo

    2011-09-01

    Precisely localizing in an image a set of feature points that form a shape of an object, such as car or face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points, and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach on a challenging data set of over 5,000 different-posed car images, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods on both accuracy and robustness.

  17. Canary TMA — EDRN Public Portal

    Cancer.gov

    This protocol describes a multi-center, retrospective, case-cohort tissue microarray (TMA) study to evaluate tissue biomarkers for their ability to predict recurrent prostate cancer at the time of radical prostatectomy (RP). Candidate biomarkers will be assessed by performing tissue localization studies on TMAs containing recurrent prostate cancer and non-recurrent prostate cancer. De-identified data will be transferred to a central repository for statistical analysis. Participating institutions will use a variation of case-cohort sampling to randomly select a subset of patients from a retrospectively constructed RP cohort and/or perform selected assays on the cohort. The study endpoint is time to recurrence; of primary interest is five year recurrence free survival. Recurrent prostate cancer is defined by 1) a single serum prostate-specific antigen (PSA) level greater than 0.2 ng/mL after RP and/or 2) receipt of salvage or secondary therapy after RP and/or 3) clinical or radiological evidence of metastatic disease. Non-recurrent prostate cancer is defined as disease with no evidence of recurrence.

  18. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization.

    PubMed

    Libbrecht, Maxwell W; Bilmes, Jeffrey A; Noble, William Stafford

    2018-04-01

    Selecting a non-redundant representative subset of sequences is a common step in many bioinformatics workflows, such as the creation of non-redundant training sets for sequence and structural models or selection of "operational taxonomic units" from metagenomics data. Previous methods for this task, such as CD-HIT, PISCES, and UCLUST, apply a heuristic threshold-based algorithm that has no theoretical guarantees. We propose a new approach based on submodular optimization. Submodular optimization, a discrete analogue to continuous convex optimization, has been used with great success for other representative set selection problems. We demonstrate that the submodular optimization approach results in representative protein sequence subsets with greater structural diversity than sets chosen by existing methods, using as a gold standard the SCOPe library of protein domain structures. In this setting, submodular optimization consistently yields protein sequence subsets that include more SCOPe domain families than sets of the same size selected by competing approaches. We also show how the optimization framework allows us to design a mixture objective function that performs well for both large and small representative sets. The framework we describe is the best possible in polynomial time (under some assumptions), and it is flexible and intuitive because it applies a suite of generic methods to optimize one of a variety of objective functions. © 2018 Wiley Periodicals, Inc.
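    A minimal sketch of greedy submodular selection with a facility-location objective over a pairwise similarity matrix is given below; the authors' actual objective functions and sequence similarities may differ, and the random embeddings are stand-ins for real sequence data.

```python
# Sketch: greedy maximization of a facility-location objective, a standard
# monotone submodular function for representative-subset selection. The
# greedy algorithm carries a (1 - 1/e) approximation guarantee.
import numpy as np

def facility_location_greedy(similarity, k):
    """Pick k items greedily maximizing sum_i max_{j in S} similarity[i, j]."""
    n = similarity.shape[0]
    selected = []
    coverage = np.zeros(n)                  # best similarity to the set so far
    for _ in range(k):
        # Marginal gain of each candidate column j.
        gains = np.maximum(similarity, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[selected] = -np.inf           # do not re-pick selected items
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, similarity[:, best])
    return selected

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))            # stand-in for sequence embeddings
sim = emb @ emb.T
sim -= sim.min()                            # facility location expects non-negative similarities
print(facility_location_greedy(sim, k=5))
```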

  19. Variable selection with stepwise and best subset approaches

    PubMed Central

    2016-01-01

    While purposeful selection is performed partly by software and partly by hand, the stepwise and best subset approaches are automatically performed by software. Two R functions stepAIC() and bestglm() are well designed for stepwise and best subset regression, respectively. The stepAIC() function begins with a full or null model, and methods for stepwise regression can be specified in the direction argument with character values “forward”, “backward” and “both”. The bestglm() function begins with a data frame containing explanatory variables and response variables. The response variable should be in the last column. Varieties of goodness-of-fit criteria can be specified in the IC argument. The Bayesian information criterion (BIC) usually results in a more parsimonious model than the Akaike information criterion. PMID:27162786

  20. Identifying highly connected counties compensates for resource limitations when evaluating national spread of an invasive pathogen.

    PubMed

    Sutrave, Sweta; Scoglio, Caterina; Isard, Scott A; Hutchinson, J M Shawn; Garrett, Karen A

    2012-01-01

    Surveying invasive species can be highly resource intensive, yet near-real-time evaluations of invasion progress are important resources for management planning. In the case of the soybean rust invasion of the United States, a linked monitoring, prediction, and communication network saved U.S. soybean growers approximately $200 M/yr. Modeling of future movement of the pathogen (Phakopsora pachyrhizi) was based on data about current disease locations from an extensive network of sentinel plots. We developed a dynamic network model for U.S. soybean rust epidemics, with counties as nodes and link weights a function of host hectarage and wind speed and direction. We used the network model to compare four strategies for selecting an optimal subset of sentinel plots, listed here in order of increasing performance: random selection, zonal selection (based on more heavily weighting regions nearer the south, where the pathogen overwinters), frequency-based selection (based on how frequently the county had been infected in the past), and frequency-based selection weighted by the node strength of the sentinel plot in the network model. When dynamic network properties such as node strength are characterized for invasive species, this information can be used to reduce the resources necessary to survey and predict invasion progress.
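    The node-strength weighting can be illustrated with a toy sketch (not the authors' epidemic model): compute weighted node strength on a small directed county network and multiply it by past-infection frequency to rank candidate sentinel counties. The county names, edge weights, and frequencies below are made-up placeholders.

```python
from collections import defaultdict

# hypothetical directed, weighted links: (source county, destination county, weight);
# in the study, weights were a function of host hectarage and wind speed/direction
links = [
    ("FL-1", "GA-1", 0.9), ("FL-1", "AL-1", 0.6), ("GA-1", "SC-1", 0.7),
    ("AL-1", "TN-1", 0.4), ("GA-1", "TN-1", 0.5), ("SC-1", "NC-1", 0.8),
]
# hypothetical fraction of past seasons in which each county was infected
infection_freq = {"FL-1": 0.9, "GA-1": 0.6, "AL-1": 0.5, "SC-1": 0.4,
                  "TN-1": 0.2, "NC-1": 0.1}

# node strength = sum of weights on incident edges (in + out)
strength = defaultdict(float)
for src, dst, w in links:
    strength[src] += w
    strength[dst] += w

# frequency-based score weighted by node strength, as in the best-performing strategy
score = {c: infection_freq.get(c, 0.0) * strength[c] for c in strength}
sentinels = sorted(score, key=score.get, reverse=True)[:3]
print("selected sentinel counties:", sentinels)
```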

  1. An enhancement of binary particle swarm optimization for gene selection in classifying cancer classes

    PubMed Central

    2013-01-01

    Background Gene expression data can substantially aid the development of effective cancer diagnosis and classification platforms. Many researchers analyze gene expression data with diverse computational intelligence methods to select a small subset of informative genes for cancer classification. Such methods face difficulties in selecting small subsets because of the small number of samples relative to the huge number of genes (high dimensionality), as well as irrelevant and noisy genes. Methods We propose an enhanced binary particle swarm optimization to select small subsets of informative genes that are significant for cancer classification. A particle speed, a new rule, and a modified sigmoid function are introduced in the proposed method to increase the probability that bits in a particle’s position are zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results The performance of the proposed method proved superior to previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also requires less computational time than BPSO. PMID:23617960
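    For orientation, the following is a compact sketch of the conventional binary PSO baseline (sigmoid transfer of velocities into bit-flip probabilities) applied to gene subset selection; the fitness function is a placeholder, and the paper's specific enhancements (particle speed rule, modified sigmoid) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(mask: np.ndarray) -> float:
    """Placeholder fitness: reward masks hitting a hidden 'informative' gene set, penalize size."""
    target = np.zeros_like(mask)
    target[:5] = 1                                       # pretend genes 0-4 are informative
    return float(np.sum(mask * target)) - 0.01 * mask.sum()

def binary_pso(n_genes=100, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    pos = rng.integers(0, 2, size=(n_particles, n_genes))
    vel = rng.normal(0, 1, size=(n_particles, n_genes))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))                # standard sigmoid transfer
        pos = (rng.random(pos.shape) < prob).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > fitness(gbest):
            gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest

if __name__ == "__main__":
    best = binary_pso()
    print("selected genes:", np.flatnonzero(best))
```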

  2. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
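    The aggregation logic itself is simple to illustrate; the sketch below is a plain-Python stand-in (not the patented system's implementation) in which a hypothetical job-leader node merges per-node exit statuses into one aggregated status.

```python
from dataclasses import dataclass

@dataclass
class ExitStatus:
    node_id: int
    return_code: int
    detail: str          # information describing the node's portion of the run

def aggregate(statuses: list[ExitStatus]) -> dict:
    """Job-leader-side merge: the worst return code wins, per-node details are collected."""
    worst = max(s.return_code for s in statuses)
    return {
        "nodes": [s.node_id for s in statuses],
        "aggregate_return_code": worst,
        "details": {s.node_id: s.detail for s in statuses},
    }

if __name__ == "__main__":
    # hypothetical subset of compute nodes; one node acts as job leader and gathers these
    reports = [ExitStatus(3, 0, "rank 0-15 ok"),
               ExitStatus(5, 0, "rank 16-31 ok"),
               ExitStatus(9, 1, "rank 32-47 I/O error")]
    print(aggregate(reports))
```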

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool for microbial ecologists, compatible with the widely adopted BIOM format, that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
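    As a generic stand-in for this kind of information-theoretic selection (not the Fizzy implementation itself), the sketch below ranks OTUs in a synthetic abundance table by mutual information with a two-class phenotype, using scikit-learn's mutual_info_classif.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# toy OTU abundance table: 60 samples x 200 OTUs, two phenotype groups
X = rng.poisson(2.0, size=(60, 200)).astype(float)
y = np.repeat([0, 1], 30)
X[y == 1, :5] += rng.poisson(6.0, size=(30, 5))   # OTUs 0-4 differ between phenotypes

# rank features by estimated mutual information with the phenotype label
mi = mutual_info_classif(X, y, discrete_features=False, random_state=0)
top = np.argsort(mi)[::-1][:10]
print("top OTUs by mutual information:", top)
```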

  4. Subset selective search on the basis of color and preview.

    PubMed

    Donk, Mieke

    2017-01-01

    In the preview paradigm observers are presented with one set of elements (the irrelevant set) followed by the addition of a second set among which the target is presented (the relevant set). Search efficiency in such a preview condition has been demonstrated to be higher than that in a full-baseline condition in which both sets are simultaneously presented, suggesting that a preview of the irrelevant set reduces its influence on the search process. However, numbers of irrelevant and relevant elements are typically not independently manipulated. Moreover, subset selective search also occurs when both sets are presented simultaneously but differ in color. The aim of the present study was to investigate how numbers of irrelevant and relevant elements contribute to preview search in the absence and presence of a color difference between subsets. In two experiments it was demonstrated that a preview reduced the influence of the number of irrelevant elements in the absence but not in the presence of a color difference between subsets. In the presence of a color difference, a preview lowered the effect of the number of relevant elements but only when the target was defined by a unique feature within the relevant set (Experiment 1); when the target was defined by a conjunction of features (Experiment 2), search efficiency as a function of the number of relevant elements was not modulated by a preview. Together the results are in line with the idea that subset selective search is based on different simultaneously operating mechanisms.

  5. Convergence in High Probability of the Quantum Diffusion in a Random Band Matrix Model

    NASA Astrophysics Data System (ADS)

    Margarint, Vlad

    2018-06-01

    We consider Hermitian random band matrices H in d ≥ 1 dimensions. The matrix elements H_{xy}, indexed by x, y ∈ Λ ⊂ Z^d, are independent, uniformly distributed random variables if |x − y| is less than the band width W, and zero otherwise. We strengthen previous results on quantum diffusion in this random band matrix model from convergence of the expectation to convergence in high probability. The result is uniform in the size |Λ| of the matrix.

  6. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification

    PubMed Central

    2012-01-01

    Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977
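    The frequency-based ranking and majority-vote ensemble described above can be sketched in a few lines; the gene subsets and per-classifier predictions below are synthetic placeholders rather than HBSA output.

```python
from collections import Counter
import random

random.seed(0)

# hypothetical collection of near-optimal gene subsets (e.g. produced by a subset search);
# genes G1-G3 are deliberately made to recur often
candidate_genes = [f"G{i}" for i in range(1, 51)]
subsets = [set(random.sample(candidate_genes, 5)) | {random.choice(["G1", "G2", "G3"])}
           for _ in range(200)]

# rank genes by how frequently they occur in the selected subsets
freq = Counter(g for s in subsets for g in s)
print("top-ranked genes (gene, occurrence count):", freq.most_common(10))

# majority-vote ensemble over per-subset classifiers (predictions are placeholders here)
predictions_per_classifier = [[random.choice(["tumor", "normal"]) for _ in range(8)]
                              for _ in subsets]
ensemble = [Counter(col).most_common(1)[0][0] for col in zip(*predictions_per_classifier)]
print("ensemble predictions:", ensemble)
```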

  7. Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

    PubMed Central

    Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish languages using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on the achieved results, a set of the most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686
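    A greedy forward search of the FSS-Forward kind can be sketched as follows: starting from the empty set, repeatedly add the single feature that most improves cross-validated accuracy of a base classifier. The synthetic dataset and the k-nearest-neighbours stand-in below are assumptions, not the RekEmozio data or the instance-based learner used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_selection(X, y, max_features=5):
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    clf = KNeighborsClassifier(n_neighbors=5)   # stand-in instance-based learner
    while remaining and len(selected) < max_features:
        # score every candidate feature added to the current subset
        scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
        if s_best <= best_score:                # stop when no single feature helps
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best
    return selected, best_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=4, random_state=0)
print(forward_selection(X, y))
```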

  8. Spatial-Temporal Data Collection with Compressive Sensing in Mobile Sensor Networks

    PubMed Central

    Li, Jiayin; Guo, Wenzhong; Chen, Zhonghui; Xiong, Neal

    2017-01-01

    Compressive sensing (CS) provides an energy-efficient paradigm for data gathering in wireless sensor networks (WSNs). However, the existing work on spatial-temporal data gathering using compressive sensing only considers either multi-hop relaying based or multiple random walks based approaches. In this paper, we exploit the mobility pattern for spatial-temporal data collection and propose a novel mobile data gathering scheme by employing the Metropolis-Hastings algorithm with delayed acceptance, an improved random walk algorithm for a mobile collector to collect data from a sensing field. The proposed scheme exploits Kronecker compressive sensing (KCS) for spatial-temporal correlation of sensory data by allowing the mobile collector to gather temporal compressive measurements from a small subset of randomly selected nodes along a random routing path. More importantly, from the theoretical perspective we prove that the equivalent sensing matrix constructed from the proposed scheme for spatial-temporal compressible signal can satisfy the property of KCS models. The simulation results demonstrate that the proposed scheme can not only significantly reduce communication cost but also improve recovery accuracy for mobile data gathering compared to the other existing schemes. In particular, we also show that the proposed scheme is robust in unreliable wireless environment under various packet losses. All this indicates that the proposed scheme can be an efficient alternative for data gathering application in WSNs. PMID:29117152

  9. Learning accurate and interpretable models based on regularized random forests regression

    PubMed Central

    2014-01-01

    Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120

  10. Spatial-Temporal Data Collection with Compressive Sensing in Mobile Sensor Networks.

    PubMed

    Zheng, Haifeng; Li, Jiayin; Feng, Xinxin; Guo, Wenzhong; Chen, Zhonghui; Xiong, Neal

    2017-11-08

    Compressive sensing (CS) provides an energy-efficient paradigm for data gathering in wireless sensor networks (WSNs). However, the existing work on spatial-temporal data gathering using compressive sensing only considers either multi-hop relaying based or multiple random walks based approaches. In this paper, we exploit the mobility pattern for spatial-temporal data collection and propose a novel mobile data gathering scheme by employing the Metropolis-Hastings algorithm with delayed acceptance, an improved random walk algorithm for a mobile collector to collect data from a sensing field. The proposed scheme exploits Kronecker compressive sensing (KCS) for spatial-temporal correlation of sensory data by allowing the mobile collector to gather temporal compressive measurements from a small subset of randomly selected nodes along a random routing path. More importantly, from the theoretical perspective we prove that the equivalent sensing matrix constructed from the proposed scheme for spatial-temporal compressible signal can satisfy the property of KCS models. The simulation results demonstrate that the proposed scheme can not only significantly reduce communication cost but also improve recovery accuracy for mobile data gathering compared to the other existing schemes. In particular, we also show that the proposed scheme is robust in unreliable wireless environment under various packet losses. All this indicates that the proposed scheme can be an efficient alternative for data gathering application in WSNs .

  11. Evolution of a Modified Binomial Random Graph by Agglomeration

    NASA Astrophysics Data System (ADS)

    Kang, Mihyun; Pachon, Angelica; Rodríguez, Pablo M.

    2018-02-01

    In the classical Erdős-Rényi random graph G(n, p) there are n vertices and each of the possible edges is independently present with probability p. The random graph G(n, p) is homogeneous in the sense that all vertices have the same characteristics. On the other hand, numerous real-world networks are inhomogeneous in this respect. Such an inhomogeneity of vertices may influence the connection probability between pairs of vertices. The purpose of this paper is to propose a new inhomogeneous random graph model which is obtained in a constructive way from the Erdős-Rényi random graph G(n, p). Given a configuration of n vertices arranged in N subsets of vertices (we call each subset a super-vertex), we define a random graph with N super-vertices by letting two super-vertices be connected if and only if there is at least one edge between them in G(n, p). Our main result concerns the threshold for connectedness. We also analyze the phase transition for the emergence of the giant component and the degree distribution. Even though our model begins with G(n, p), it assumes the existence of some community structure encoded in the configuration. Furthermore, under certain conditions it exhibits a power law degree distribution. Both properties are important for real-world applications.
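    The agglomeration construction can be illustrated directly: sample G(n, p), partition the n vertices into N super-vertices, and connect two super-vertices whenever at least one G(n, p) edge runs between them. A minimal pure-Python sketch with arbitrary sizes follows.

```python
import random
from itertools import combinations

random.seed(7)

n, p = 60, 0.03
# partition the n vertices into N = 12 super-vertices (the "configuration")
labels = [random.randrange(12) for _ in range(n)]

# sample the classical Erdos-Renyi graph G(n, p)
edges = {(u, v) for u, v in combinations(range(n), 2) if random.random() < p}

# connect two super-vertices iff at least one G(n, p) edge joins their vertex sets
super_edges = {tuple(sorted((labels[u], labels[v])))
               for u, v in edges if labels[u] != labels[v]}
print(f"{len(edges)} edges in G(n, p) -> {len(super_edges)} edges between super-vertices")
```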

  12. Identification of features in indexed data and equipment therefore

    DOEpatents

    Jarman, Kristin H [Richland, WA; Daly, Don Simone [Richland, WA; Anderson, Kevin K [Richland, WA; Wahl, Karen L [Richland, WA

    2002-04-02

    Embodiments of the present invention provide methods of identifying a feature in an indexed dataset. Such embodiments encompass selecting an initial subset of indices, the initial subset of indices being encompassed by an initial window-of-interest and comprising at least one beginning index and at least one ending index; computing an intensity weighted measure of dispersion for the subset of indices using a subset of responses corresponding to the subset of indices; and comparing the intensity weighted measure of dispersion to a dispersion critical value determined from an expected value of the intensity weighted measure of dispersion under a null hypothesis of no transient feature present. Embodiments of the present invention also encompass equipment configured to perform the methods of the present invention.

  13. Efficient feature subset selection with probabilistic distance criteria. [pattern recognition

    NASA Technical Reports Server (NTRS)

    Chittineni, C. B.

    1979-01-01

    Recursive expressions are derived for efficiently computing the commonly used probabilistic distance measures as a change in the criteria both when a feature is added to and when a feature is deleted from the current feature subset. A combinatorial algorithm is presented for generating all possible r-feature combinations from a given set of s features in (s choose r) steps, with a change of a single feature at each step. These expressions can also be used for both forward and backward sequential feature selection.

  14. The moderating effects of school climate on bullying prevention efforts.

    PubMed

    Low, Sabina; Van Ryzin, Mark

    2014-09-01

    Bullying prevention efforts have yielded mixed effects over the last 20 years. Program effectiveness is driven by a number of factors (e.g., program elements and implementation), but there remains a dearth of understanding regarding the role of school climate on the impact of bullying prevention programs. This gap is surprising, given research suggesting that bullying problems and climate are strongly related. The current study examines the moderating role of school climate on the impacts of a stand-alone bullying prevention curriculum. In addition, the current study examined 2 different dimensions of school climate across both student and staff perceptions. Data for this study were derived from a Steps to Respect (STR) randomized efficacy trial that was conducted in 33 elementary schools over a 1-year period. Schools were randomly assigned to intervention or wait-listed control condition. Outcome measures (pre-to-post) were obtained from (a) all school staff, (b) a randomly selected subset of 3rd-5th grade teachers in each school, and (c) all students in classrooms of selected teachers. Multilevel analyses revealed that psychosocial climate was strongly related to reductions in bullying-related attitudes and behaviors. Intervention status yielded only 1 significant main effect, although, STR schools with positive psychosocial climate at baseline had less victimization at posttest. Policies/administrative commitment to bullying were related to reduced perpetration among all schools. Findings suggest positive psychosocial climate (from both staff and student perspective) plays a foundational role in bullying prevention, and can optimize effects of stand-alone programs. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  15. Threshold quantum state sharing based on entanglement swapping

    NASA Astrophysics Data System (ADS)

    Qin, Huawang; Tso, Raylin

    2018-06-01

    A threshold quantum state sharing scheme is proposed. The dealer uses the quantum-controlled-not operations to expand the d-dimensional quantum state and then uses the entanglement swapping to distribute the state to a random subset of participants. The participants use the single-particle measurements and unitary operations to recover the initial quantum state. In our scheme, the dealer can share different quantum states among different subsets of participants simultaneously. So the scheme will be very flexible in practice.

  16. Variable screening via quantile partial correlation

    PubMed Central

    Ma, Shujie; Tsai, Chih-Ling

    2016-01-01

    In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we proposed using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683

  17. [Effect of Sijunzi Decoction and enteral nutrition on T-cell subsets and nutritional status in patients with gastric cancer after operation: a randomized controlled trial].

    PubMed

    Cai, Jun; Wang, Hua; Zhou, Sheng; Wu, Bin; Song, Hua-Rong; Xuan, Zheng-Rong

    2008-01-01

    To observe the effect of perioperative application of Sijunzi Decoction and enteral nutrition on T-cell subsets and nutritional status in patients with gastric cancer after operation. In this prospective, single-blinded, controlled clinical trial, fifty-nine patients with gastric cancer were randomly divided into three groups: control group (n=20) and two study groups (group A, n=21; group B, n=18). Sijunzi Decoction (100 ml) was administered via nasogastric tube to the patients in study group B from the second to the ninth postoperative day. Patients in the two study groups were given an isocaloric and isonitrogenous enteral diet, which was started on the second day after operation and continued for eight days. Patients in the control group were given an isocaloric and isonitrogenous parenteral diet for 9 days. All variables of nutritional status, such as serum albumin (ALB), prealbumin (PA) and transferrin (TRF), and T-cell subsets were measured one day before operation, and one day and 10 days after operation. All the nutritional variables and the levels of CD3(+), CD4(+) and CD4(+)/CD8(+) decreased significantly after operation. Ten days after operation, T-cell subsets and nutritional variables in the two study groups were increased compared with the control group. The levels of ALB, TRF and T-cell subsets in study group B were increased significantly as compared with study group A (P<0.05). Enteral nutrition assisted with Sijunzi Decoction can positively improve and optimize cellular immune function and nutritional status in patients with gastric cancer after operation.

  18. Estimating skin blood saturation by selecting a subset of hyperspectral imaging data

    NASA Astrophysics Data System (ADS)

    Ewerlöf, Maria; Salerud, E. Göran; Strömberg, Tomas; Larsson, Marcus

    2015-03-01

    Skin blood haemoglobin saturation (S_b) can be estimated with hyperspectral imaging using the wavelength (λ) range of 450-700 nm, where haemoglobin absorption displays distinct spectral characteristics. Depending on the image size and photon transport algorithm, computations may be demanding. Therefore, this work aims to evaluate subsets with a reduced number of wavelengths for S_b estimation. White Monte Carlo simulations are performed using a two-layered tissue model with discrete values for epidermal thickness (t_epi) and the reduced scattering coefficient (μ's), mimicking an imaging setup. A detected intensity look-up table is calculated for a range of model parameter values relevant to human skin, adding absorption effects in the post-processing. Skin model parameters, including absorbers, are: μ's(λ), t_epi, haemoglobin saturation (S_b), tissue fraction blood (f_blood) and tissue fraction melanin (f_mel). The skin model paired with the look-up table allows spectra to be calculated swiftly. Three inverse models with varying numbers of free parameters are evaluated: A(S_b, f_blood), B(S_b, f_blood, f_mel) and C (all parameters free). Fourteen wavelength candidates are selected by analysing the maximal spectral sensitivity to S_b while minimizing the sensitivity to f_blood. All possible combinations of these candidates with three, four and 14 wavelengths, as well as the full spectral range, are evaluated for estimating S_b for 1000 randomly generated evaluation spectra. The results show that the simplified models A and B estimated S_b accurately using four wavelengths (mean error 2.2% for model B). If the number of wavelengths is increased, the model complexity needs to be increased to avoid poor estimations.

  19. HGF/MET-directed therapeutics in gastroesophageal cancer: a review of clinical and biomarker development.

    PubMed

    Hack, Stephen P; Bruey, Jean-Marie; Koeppen, Hartmut

    2014-05-30

    Aberrant activation of the HGF/MET signaling axis has been strongly implicated in the malignant transformation and progression of gastroesophageal cancer (GEC). MET receptor overexpression in tumor samples from GEC patients has been consistently correlated with an aggressive metastatic phenotype and poor prognosis. In preclinical GEC models, abrogation of HGF/MET signaling has been shown to induce tumor regression as well as inhibition of metastatic dissemination. Promising clinical results in patient subsets in which MET is overexpressed have spurred several randomized studies of HGF/MET-directed agents, including two pivotal global Phase III trials. Available data highlight the need for predictive biomarkers in order to select patients most likely to benefit from HGF/MET inhibition. In this review, we discuss the current knowledge of mechanisms of MET activation in GEC, the current status of the clinical evaluation of MET-targeted therapies in GEC, characteristics of ongoing randomized GEC trials and the associated efforts to identify and validate biomarkers. We also discuss the considerations and challenges for HGF/MET inhibitor drug development in the GEC setting.

  20. HGF/MET-directed therapeutics in gastroesophageal cancer: a review of clinical and biomarker development

    PubMed Central

    Hack, Stephen P.; Bruey, Jean-Marie; Koeppen, Hartmut

    2014-01-01

    Aberrant activation of the HGF/MET signaling axis has been strongly implicated in the malignant transformation and progression of gastroesophageal cancer (GEC). MET receptor overexpression in tumor samples from GEC patients has been consistently correlated with an aggressive metastatic phenotype and poor prognosis. In preclinical GEC models, abrogation of HGF/MET signaling has been shown to induce tumor regression as well as inhibition of metastatic dissemination. Promising clinical results in patient subsets in which MET is overexpressed have spurred several randomized studies of HGF/MET-directed agents, including two pivotal global Phase III trials. Available data highlight the need for predictive biomarkers in order to select patients most likely to benefit from HGF/MET inhibition. In this review, we discuss the current knowledge of mechanisms of MET activation in GEC, the current status of the clinical evaluation of MET-targeted therapies in GEC, characteristics of ongoing randomized GEC trials and the associated efforts to identify and validate biomarkers. We also discuss the considerations and challenges for HGF/MET inhibitor drug development in the GEC setting. PMID:24930887

  1. Identification of petroleum hydrocarbons using a reduced number of PAHs selected by Procrustes rotation.

    PubMed

    Fernández-Varela, R; Andrade, J M; Muniategui, S; Prada, D; Ramírez-Villalobos, F

    2010-04-01

    Identifying petroleum-related products released into the environment is a complex and difficult task. To achieve this, polycyclic aromatic hydrocarbons (PAHs) are of outstanding importance nowadays. Although traditional quantitative fingerprinting uses straightforward univariate statistical analyses to differentiate among oils and to assess their sources, a multivariate strategy based on Procrustes rotation (PR) was applied in this paper. The aim of PR is to select a reduced subset of PAHs still capable of performing a satisfactory identification of petroleum-related hydrocarbons. PR selected two subsets of three (C(2)-naphthalene, C(2)-dibenzothiophene and C(2)-phenanthrene) and five (C(1)-decahydronaphthalene, naphthalene, C(2)-phenanthrene, C(3)-phenanthrene and C(2)-fluoranthene) PAHs for each of the two datasets studied here. The classification abilities of each subset of PAHs were tested using principal components analysis, hierarchical cluster analysis and Kohonen neural networks, and it was demonstrated that they unraveled the same patterns as the overall set of PAHs. (c) 2009 Elsevier Ltd. All rights reserved.

  2. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers

    PubMed Central

    2010-01-01

    Background At the current price, the use of high-density single nucleotide polymorphisms (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subset of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls. Conclusions Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assays containing ~ 3,000 to 5,000 evenly spaced SNP. PMID:20950478
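    The subset strategy of ranking markers by the absolute value of their estimated effects and re-fitting on the reduced panel can be sketched as below; synthetic 0/1/2 genotypes stand in for the real data, ridge regression stands in for RR/PLSR, and hold-out R² stands in for the reported accuracies.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# synthetic 0/1/2 genotypes: 800 animals x 5000 SNP, 40 of which carry true effects
X = rng.integers(0, 3, size=(800, 5000)).astype(float)
beta = np.zeros(5000)
beta[rng.choice(5000, 40, replace=False)] = rng.normal(0, 1, 40)
y = X @ beta + rng.normal(0, 4, size=800)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

full = Ridge(alpha=100.0).fit(X_tr, y_tr)
print("all SNP   R2:", round(r2_score(y_te, full.predict(X_te)), 3))

for k in (300, 1000, 3000):
    top = np.argsort(np.abs(full.coef_))[::-1][:k]       # highest-ranked SNP by |effect|
    sub = Ridge(alpha=100.0).fit(X_tr[:, top], y_tr)     # re-estimate on the subset
    print(f"top {k:>4} SNP R2:", round(r2_score(y_te, sub.predict(X_te[:, top])), 3))
```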

  3. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers.

    PubMed

    Weigel, K A; de los Campos, G; González-Recio, O; Naya, H; Wu, X L; Long, N; Rosa, G J M; Gianola, D

    2009-10-01

    The objective of the present study was to assess the predictive ability of subsets of single nucleotide polymorphism (SNP) markers for development of low-cost, low-density genotyping assays in dairy cattle. Dense SNP genotypes of 4,703 Holstein bulls were provided by the USDA Agricultural Research Service. A subset of 3,305 bulls born from 1952 to 1998 was used to fit various models (training set), and a subset of 1,398 bulls born from 1999 to 2002 was used to evaluate their predictive ability (testing set). After editing, data included genotypes for 32,518 SNP and August 2003 and April 2008 predicted transmitting abilities (PTA) for lifetime net merit (LNM$), the latter resulting from progeny testing. The Bayesian least absolute shrinkage and selection operator method was used to regress August 2003 PTA on marker covariates in the training set to arrive at estimates of marker effects and direct genomic PTA. The coefficient of determination (R(2)) from regressing the April 2008 progeny test PTA of bulls in the testing set on their August 2003 direct genomic PTA was 0.375. Subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP were created by choosing equally spaced and highly ranked SNP, with the latter based on the absolute value of their estimated effects obtained from the training set. The SNP effects were re-estimated from the training set for each subset of SNP, and the 2008 progeny test PTA of bulls in the testing set were regressed on corresponding direct genomic PTA. The R(2) values for subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP with largest effects (evenly spaced SNP) were 0.184 (0.064), 0.236 (0.111), 0.269 (0.190), 0.289 (0.179), 0.307 (0.228), 0.313 (0.268), and 0.322 (0.291), respectively. These results indicate that a low-density assay comprising selected SNP could be a cost-effective alternative for selection decisions and that significant gains in predictive ability may be achieved by increasing the number of SNP allocated to such an assay from 300 or fewer to 1,000 or more.

  4. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

    PubMed

    Janet, Jon Paul; Kulik, Heather J

    2017-11-22

    Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.

  5. Probabilistic streamflow forecasting for hydroelectricity production: A comparison of two non-parametric system identification algorithms

    NASA Astrophysics Data System (ADS)

    Pande, Saket; Sharma, Ashish

    2014-05-01

    This study is motivated by the need to robustly specify, identify, and forecast runoff generation processes for hydroelectricity production. At a minimum, this requires identifying significant predictors of runoff generation and the influence of each such predictor on runoff response. To this end, we compare two non-parametric algorithms for predictor subset selection. One is based on information theory and assesses predictor significance (and hence selection) using the Partial Information (PI) rationale of Sharma and Mehrotra (2014). The other is based on a frequentist approach that uses the bounds on probability of error concept of Pande (2005), assesses all possible predictor subsets on the go, and converges to a predictor subset in a computationally efficient manner. Both algorithms approximate the underlying system by locally constant functions and select predictor subsets corresponding to these functions. The performance of the two algorithms is compared on a set of synthetic case studies as well as a real-world case study of inflow forecasting. References: Sharma, A., and R. Mehrotra (2014), An information theoretic alternative to model a natural system using observational information alone, Water Resources Research, 49, doi:10.1002/2013WR013845. Pande, S. (2005), Generalized local learning in water resource management, PhD dissertation, Utah State University, UT-USA, 148p.

  6. Reengineered glucose oxidase for amperometric glucose determination in diabetes analytics.

    PubMed

    Arango Gutierrez, Erik; Mundhada, Hemanshu; Meier, Thomas; Duefel, Hartmut; Bocola, Marco; Schwaneberg, Ulrich

    2013-12-15

    Glucose oxidase (GOx) is an oxidoreductase exhibiting high β-D-glucose specificity and high stability, which renders it well suited for applications in diabetes care. Nevertheless, GOx activity is highly oxygen dependent, which can lead to inaccuracies in amperometric β-D-glucose determinations. Therefore a directed evolution campaign with two rounds of random mutagenesis (SeSaM followed by epPCR), site saturation mutagenesis studies on individual positions, and one simultaneous site saturation library (OmniChange; 4 positions) was performed. A mediator well suited to diabetes care (quinone diimine) was selected, and the GOx variant (T30V I94V) served as starting point. For directed GOx evolution a microtiter plate detection system based on the quinone diimine mediator was developed, and the well-known ABTS assay was applied in microtiter plate format to validate the oxygen independence of improved GOx variants. Two iterative rounds of random diversity generation and screening yielded two subsets of amino acid positions which mainly improved activity (A173, A332) and oxygen independence (F414, V560). Simultaneous site saturation of all four positions with a reduced subset of amino acids using the OmniChange method finally yielded variant V7 with a 37-fold decreased oxygen dependency (mediator activity: 7.4 U/mg WT, 47.5 U/mg V7; oxygen activity: 172.3 U/mg WT, 30.1 U/mg V7). V7 is still highly β-D-glucose specific, highly active with the quinone diimine mediator, and its thermal resistance is retained (a prerequisite for GOx coating of diabetes test strips). These properties and V7's oxygen insensitivity make V7 a very promising candidate to replace standard GOx in diabetes care applications. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Random sample consensus combined with partial least squares regression (RANSAC-PLS) for microbial metabolomics data mining and phenotype improvement.

    PubMed

    Teoh, Shao Thing; Kitamura, Miki; Nakayama, Yasumune; Putri, Sastia; Mukai, Yukio; Fukusaki, Eiichiro

    2016-08-01

    In recent years, the advent of high-throughput omics technology has made possible a new class of strain engineering approaches, based on identification of possible gene targets for phenotype improvement from omic-level comparison of different strains or growth conditions. Metabolomics, with its focus on the omic level closest to the phenotype, lends itself naturally to this semi-rational methodology. When a quantitative phenotype such as growth rate under stress is considered, regression modeling using multivariate techniques such as partial least squares (PLS) is often used to identify metabolites correlated with the target phenotype. However, linear modeling techniques such as PLS require a consistent metabolite-phenotype trend across the samples, which may not be the case when outliers or multiple conflicting trends are present in the data. To address this, we proposed a data-mining strategy that utilizes random sample consensus (RANSAC) to select subsets of samples with consistent trends for construction of better regression models. By applying a combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains), new metabolites were indicated to be correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance were then validated from single-deletion strains of corresponding metabolic genes. The results showed that RANSAC-PLS is a promising strategy to identify unique metabolites that provide additional hints for phenotype improvement, which could not be detected by traditional PLS modeling using the entire dataset. Copyright © 2016 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
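    A minimal sketch of the RANSAC-around-PLS idea follows: repeatedly fit PLS on small random sample subsets, keep the largest consensus set of samples whose residuals fall under a tolerance, and refit on that subset. The synthetic data, the thresholds, and scikit-learn's PLSRegression are assumptions of this sketch, not the authors' code.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# synthetic "metabolite" matrix with a consistent trend in 70% of samples and outliers in 30%
X = rng.normal(size=(60, 40))
y = X[:, 0] * 2.0 + rng.normal(0, 0.2, 60)
outliers = rng.choice(60, 18, replace=False)
y[outliers] += rng.normal(0, 5, 18)                          # conflicting / outlying samples

def ransac_pls(X, y, n_iter=200, subset_size=15, tol=1.0):
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iter):
        idx = rng.choice(len(y), subset_size, replace=False)  # random sample consensus step
        model = PLSRegression(n_components=2).fit(X[idx], y[idx])
        resid = np.abs(model.predict(X).ravel() - y)
        inliers = np.flatnonzero(resid < tol)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # refit on the consensus set of samples sharing a consistent trend
    final = PLSRegression(n_components=2).fit(X[best_inliers], y[best_inliers])
    return final, best_inliers

model, inliers = ransac_pls(X, y)
print("consensus samples kept:", len(inliers), "of", len(y))
```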

  8. Object-based random forest classification of Landsat ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities in Tasmania, Australia

    NASA Astrophysics Data System (ADS)

    Melville, Bethany; Lucieer, Arko; Aryal, Jagannath

    2018-04-01

    This paper presents a random forest classification approach for identifying and mapping three types of lowland native grassland communities found in the Tasmanian Midlands region. Due to the high conservation priority assigned to these communities, there has been an increasing need to identify appropriate datasets that can be used to derive accurate and frequently updateable maps of community extent. Therefore, this paper proposes a method employing repeat classification and statistical significance testing as a means of identifying the most appropriate dataset for mapping these communities. Two datasets were acquired and analysed: a Landsat ETM+ scene and a WorldView-2 scene, both from 2010. Training and validation data were randomly subset using a k-fold (k = 50) approach from a pre-existing field dataset. Poa labillardierei, Themeda triandra and lowland native grassland complex communities were identified in addition to dry woodland and agriculture. For each subset of randomly allocated points, a random forest model was trained on each dataset and then used to classify the corresponding imagery. Validation was performed using the reciprocal points from the independent subset that had not been used to train the model. Final training and classification accuracies were reported as per-class means for each satellite dataset. Analysis of Variance (ANOVA) was undertaken to determine whether classification accuracy differed between the two datasets, as well as between classifications. Results showed mean class accuracies between 54% and 87%. Class accuracy differed significantly between datasets only for the dry woodland and Themeda grassland classes, with the WorldView-2 dataset showing higher mean classification accuracies. The results of this study indicate that remote sensing is a viable method for the identification of lowland native grassland communities in the Tasmanian Midlands, and that repeat classification and statistical significance testing can be used to identify optimal datasets for vegetation community mapping.
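    The repeat-classification procedure can be sketched generically: draw many random training/validation subsets, fit a random forest on each draw, and compare the resulting accuracy distributions between datasets with ANOVA. The two synthetic feature sets below merely stand in for the Landsat ETM+ and WorldView-2 data.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def repeat_accuracy(X, y, repeats=50):
    """Fit a random forest on many random train/validation splits; return the accuracies."""
    accs = []
    for k in range(repeats):
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=k)
        rf = RandomForestClassifier(n_estimators=200, random_state=k).fit(X_tr, y_tr)
        accs.append(rf.score(X_va, y_va))
    return np.array(accs)

# two synthetic "datasets" standing in for the two satellite feature sets
X1, y1 = make_classification(n_samples=400, n_features=6, n_informative=4,
                             n_classes=3, n_clusters_per_class=1, random_state=1)
X2, y2 = make_classification(n_samples=400, n_features=8, n_informative=6,
                             n_classes=3, n_clusters_per_class=1, random_state=2)

acc1, acc2 = repeat_accuracy(X1, y1), repeat_accuracy(X2, y2)
print("mean accuracies:", acc1.mean().round(3), acc2.mean().round(3))
print("one-way ANOVA on repeat accuracies:", f_oneway(acc1, acc2))
```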

  9. Ordering Elements and Subsets: Examples for Student Understanding

    ERIC Educational Resources Information Center

    Mellinger, Keith E.

    2004-01-01

    Teaching the art of counting can be quite difficult. Many undergraduate students have difficulty separating the ideas of permutation, combination, repetition, etc. This article develops some examples to help explain some of the underlying theory while looking carefully at the selection of various subsets of objects from a larger collection. The…

  10. Diabetes Care Management Teams Did Not Reduce Utilization When Compared With Traditional Care: A Randomized Cluster Trial.

    PubMed

    Kearns, Patrick

    2017-10-01

    PURPOSE: Health services research evaluates redesign models for primary care. Care management is one alternative. Evaluation includes resource utilization as a criterion. This study compares the impact of care-manager teams on resource utilization. The comparison includes entire panels of patients and the subset of patients with diabetes. DESIGN: Randomized, prospective, cohort study comparing change in utilization rates between groups, pre- and post-intervention. METHODOLOGY: Ten primary care physician panels in a safety-net setting. Ten physicians were randomized to either a care-management approach (Group 1) or a traditional approach (Group 2). Care managers focused on diabetes and the cardiovascular cluster of diseases. Analysis compared rates of hospitalization, 30-day readmission, emergency room visits, and urgent care visits. Analysis compared baseline rates to annual rates after a yearlong run-in for entire panels and the subset of patients with diabetes. RESULTS: Resource utilization showed no statistically significant change between baseline and Year 3 (P=.79). Emergency room visits and hospital readmission increased for both groups (P=.90), while hospital admissions and urgent care visits decreased (P=.73). Similarly, utilization was not significantly different for patients with diabetes (P=.69). CONCLUSIONS: A care-management team approach failed to improve resource utilization rates by entire panels and the subset of diabetic patients compared to traditional care. This reinforces the need for further evidentiary support for the care-management model's hypothesis in the safety net.

  11. Dimethyl fumarate–induced lymphopenia in MS due to differential T-cell subset apoptosis

    PubMed Central

    Ghadiri, Mahtab; Rezk, Ayman; Li, Rui; Evans, Ashley; Luessi, Felix; Zipp, Frauke; Giacomini, Paul S.; Antel, Jack

    2017-01-01

    Objective: To examine the mechanism underlying the preferential CD8+ vs CD4+ T-cell lymphopenia induced by dimethyl fumarate (DMF) treatment of MS. Methods: Total lymphocyte counts and comprehensive T-cell subset analyses were performed in high-quality samples obtained from patients with MS prior to and serially following DMF treatment initiation. Random coefficient mixed-effects analysis was used to model the trajectory of T-cell subset losses in vivo. Survival and apoptosis of distinct T-cell subsets were assessed following in vitro exposure to DMF. Results: Best-fit modeling indicated that the DMF-induced preferential reductions in CD8+ vs CD4+ T-cell counts nonetheless followed similar depletion kinetics, suggesting a similar rather than distinct mechanism involved in losses of both the CD8+ and CD4+ T cells. In vitro, DMF exposure resulted in dose-dependent reductions in T-cell survival, which were found to reflect apoptotic cell death. This DMF-induced apoptosis was greater for CD8+ vs CD4+, as well as for memory vs naive, and conventional vs regulatory T-cell subsets, a pattern which mirrored preferential T-cell subset losses that we observed during in vivo treatment of patients. Conclusions: Differential apoptosis mediated by DMF may underlie the preferential lymphopenia of distinct T-cell subsets, including CD8+ and memory T-cell subsets, seen in treated patients with MS. This differential susceptibility of distinct T-cell subsets to DMF-induced apoptosis may contribute to both the safety and efficacy profiles of DMF in patients with MS. PMID:28377940

  12. Classification of coronary artery tissues using optical coherence tomography imaging in Kawasaki disease

    NASA Astrophysics Data System (ADS)

    Abdolmanafi, Atefeh; Prasad, Arpan Suravi; Duong, Luc; Dahdah, Nagib

    2016-03-01

    Intravascular imaging modalities such as Optical Coherence Tomography (OCT) nowadays allow improving the diagnosis, treatment, follow-up, and even prevention of coronary artery disease in adults. OCT has recently been used in children following Kawasaki disease (KD), the most prevalent acquired coronary artery disease during childhood, which has devastating complications. The assessment of coronary artery layers with OCT and early detection of coronary sequelae secondary to KD is a promising tool for preventing myocardial infarction in this population. More importantly, OCT is promising for tissue quantification of the inner vessel wall, including neo-intimal luminal myofibroblast proliferation, calcification, and fibrous scar deposits. The goal of this study is to classify the coronary artery layers in OCT imaging obtained from a series of KD patients. Our approach develops a robust Random Forest classifier, built on the idea of randomly selecting a subset of features at each node, using second- and higher-order statistical texture analysis, which estimates the gray-level spatial distribution of images by specifying the local features of each pixel and extracting statistics from their distribution. The average classification accuracies for intima and media are 76.36% and 73.72%, respectively. A random forest classifier with texture analysis shows promise for the classification of coronary artery tissue.

  13. Monocyte Subset Dynamics in Human Atherosclerosis Can Be Profiled with Magnetic Nano-Sensors

    PubMed Central

    Wildgruber, Moritz; Lee, Hakho; Chudnovskiy, Aleksey; Yoon, Tae-Jong; Etzrodt, Martin; Pittet, Mikael J.; Nahrendorf, Matthias; Croce, Kevin; Libby, Peter; Weissleder, Ralph; Swirski, Filip K.

    2009-01-01

    Monocytes are circulating macrophage and dendritic cell precursors that populate healthy and diseased tissue. In humans, monocytes consist of at least two subsets whose proportions in the blood fluctuate in response to coronary artery disease, sepsis, and viral infection. Animal studies have shown that specific shifts in the monocyte subset repertoire either exacerbate or attenuate disease, suggesting a role for monocyte subsets as biomarkers and therapeutic targets. Assays are therefore needed that can selectively and rapidly enumerate monocytes and their subsets. This study shows that two major human monocyte subsets express similar levels of the receptor for macrophage colony stimulating factor (MCSFR) but differ in their phagocytic capacity. We exploit these properties and custom-engineer magnetic nanoparticles for ex vivo sensing of monocytes and their subsets. We present a two-dimensional enumerative mathematical model that simultaneously reports number and proportion of monocyte subsets in a small volume of human blood. Using a recently described diagnostic magnetic resonance (DMR) chip with 1 µl sample size and high throughput capabilities, we then show that application of the model accurately quantifies subset fluctuations that occur in patients with atherosclerosis. PMID:19461894

  14. Prevalence of skeletal and eye malformations in frogs from north-central United States: estimations based on collections from randomly selected sites.

    PubMed

    Schoff, Patrick K; Johnson, Catherine M; Schotthoefer, Anna M; Murphy, Joseph E; Lieske, Camilla; Cole, Rebecca A; Johnson, Lucinda B; Beasley, Val R

    2003-07-01

    Skeletal malformation rates for several frog species were determined in a set of randomly selected wetlands in the north-central USA over three consecutive years. In 1998, 62 sites yielded 389 metamorphic frogs, nine (2.3%) of which had skeletal or eye malformations. A subset of the original sites was surveyed in the following 2 yr. In 1999, 1,085 metamorphic frogs were collected from 36 sites and 17 (1.6%) had skeletal or eye malformations, while in 2000, examination of 1,131 metamorphs yielded 16 (1.4%) with skeletal or eye malformations. Hindlimb malformations predominated in all three years, but other abnormalities, involving forelimb, eye, and pelvis were also found. Northern leopard frogs (Rana pipiens) constituted the majority of collected metamorphs as well as most of the malformed specimens. However, malformations were also noted in mink frogs (R. septentrionalis), wood frogs (R. sylvatica), and gray tree frogs (Hyla spp.). The malformed specimens were found in clustered sites in all three years but the cluster locations were not the same in any year. The malformation rates reported here are higher than the 0.3% rate determined for metamorphic frogs collected from similar sites in Minnesota in the 1960s, and thus, appear to represent an elevation of an earlier baseline malformation rate.

  15. Prevalence of skeletal and eye malformations in frogs from north-central United States: estimations based on collections from randomly selected sites

    USGS Publications Warehouse

    Schoff, P.K.; Johnson, C.M.; Schotthoefer, A.M.; Murphy, J.E.; Lieske, C.; Cole, Rebecca A.; Johnson, L.B.; Beasley, V.R.

    2003-01-01

    Skeletal malformation rates for several frog species were determined in a set of randomly selected wetlands in the north-central USA over three consecutive years. In 1998, 62 sites yielded 389 metamorphic frogs, nine (2.3%) of which had skeletal or eye malformations. A subset of the original sites was surveyed in the following 2 yr. In 1999, 1,085 metamorphic frogs were collected from 36 sites and 17 (1.6%) had skeletal or eye malformations, while in 2000, examination of 1,131 metamorphs yielded 16 (1.4%) with skeletal or eye malformations. Hindlimb malformations predominated in all three years, but other abnormalities, involving forelimb, eye, and pelvis were also found. Northern leopard frogs (Rana pipiens) constituted the majority of collected metamorphs as well as most of the malformed specimens. However, malformations were also noted in mink frogs (R. septentrionalis), wood frogs (R. sylvatica), and gray tree frogs (Hyla spp.). The malformed specimens were found in clustered sites in all three years but the cluster locations were not the same in any year. The malformation rates reported here are higher than the 0.3% rate determined for metamorphic frogs collected from similar sites in Minnesota in the 1960s, and thus, appear to represent an elevation of an earlier baseline malformation rate.

  16. Chronic lymphocytic leukemia antibodies with a common stereotypic rearrangement recognize nonmuscle myosin heavy chain IIA

    PubMed Central

    Catera, Rosa; Hatzi, Katerina; Yan, Xiao-Jie; Zhang, Lu; Wang, Xiao Bo; Fales, Henry M.; Allen, Steven L.; Kolitz, Jonathan E.; Rai, Kanti R.; Chiorazzi, Nicholas

    2008-01-01

    Leukemic B lymphocytes of a large group of unrelated chronic lymphocytic leukemia (CLL) patients express an unmutated heavy chain immunoglobulin variable (V) region encoded by IGHV1-69, IGHD3-16, and IGHJ3 with nearly identical heavy and light chain complementarity-determining region 3 sequences. The likelihood that these patients developed CLL clones with identical antibody V regions randomly is highly improbable and suggests selection by a common antigen. Monoclonal antibodies (mAbs) from this stereotypic subset strongly bind cytoplasmic structures in HEp-2 cells. Therefore, HEp-2 cell extracts were immunoprecipitated with recombinant stereotypic subset-specific CLL mAbs, revealing a major protein band at approximately 225 kDa that was identified by mass spectrometry as nonmuscle myosin heavy chain IIA (MYHIIA). Reactivity of the stereotypic mAbs with MYHIIA was confirmed by Western blot and immunofluorescence colocalization with anti-MYHIIA antibody. Treatments that alter MYHIIA amounts and cytoplasmic localization resulted in a corresponding change in binding to these mAbs. The appearance of MYHIIA on the surface of cells undergoing stress or apoptosis suggests that CLL mAb may generally bind molecules exposed as a consequence of these events. Binding of CLL mAb to MYHIIA could promote the development, survival, and expansion of these leukemic cells. PMID:18812466

  17. Selecting informative subsets of sparse supermatrices increases the chance to find correct trees.

    PubMed

    Misof, Bernhard; Meyer, Benjamin; von Reumont, Björn Marcus; Kück, Patrick; Misof, Katharina; Meusemann, Karen

    2013-12-03

    Character matrices with extensive missing data are frequently used in phylogenomics, with potentially detrimental effects on the accuracy and robustness of tree inference. Therefore, many investigators select taxa and genes with high data coverage. The drawback of these selections is their exclusive reliance on data coverage without consideration of the actual signal in the data, so they might not deliver optimal data matrices in terms of potential phylogenetic signal. In order to circumvent this problem, we developed a heuristic, implemented in a software tool called mare, which (1) assesses the information content of genes in supermatrices using a measure of potential signal combined with data coverage and (2) reduces supermatrices with a simple hill-climbing procedure to submatrices with high total information content. We conducted simulation studies using matrices of 50 taxa × 50 genes with heterogeneous phylogenetic signal among genes and data coverage between 10-30%. On these matrices, Maximum Likelihood (ML) tree reconstructions failed to recover correct trees. Selecting a data subset with the proposed approach increased the chance of recovering correct partial trees more than 10-fold. The selection of data subsets with the proposed simple hill-climbing procedure performed well whether the information content of genes or just simple presence/absence information was considered. We also applied our approach to an empirical data set addressing questions of vertebrate systematics. With this empirical dataset, selecting a data subset with high information content that supported a tree with high average bootstrap support was most successful when the information content of genes was considered. Our analyses of simulated and empirical data demonstrate that sparse supermatrices can be reduced on a formal basis, outperforming the usual simple selections of taxa and genes with high data coverage.
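
    The reduction idea lends itself to a compact illustration. Below is a minimal hill-climbing sketch, assuming a boolean taxa × genes coverage matrix and a simple score (mean coverage weighted by matrix size raised to a tunable exponent); the score and stopping thresholds are illustrative stand-ins, not the information-content measure actually implemented in mare.

        # Greedy hill climbing: repeatedly drop the taxon or gene whose removal
        # most increases a coverage-based score of the remaining submatrix.
        import numpy as np

        def score(cov, alpha=0.5):
            if cov.size == 0:
                return 0.0
            return cov.mean() * (cov.size ** alpha)   # coverage vs. size trade-off

        def reduce_supermatrix(cov, alpha=0.5):
            taxa, genes = list(range(cov.shape[0])), list(range(cov.shape[1]))
            best = score(cov, alpha)
            improved = True
            while improved and len(taxa) > 3 and len(genes) > 1:
                improved = False
                moves = [('taxon', t) for t in taxa] + [('gene', g) for g in genes]
                best_move, best_val = None, best
                for kind, idx in moves:
                    t = [x for x in taxa if not (kind == 'taxon' and x == idx)]
                    g = [x for x in genes if not (kind == 'gene' and x == idx)]
                    val = score(cov[np.ix_(t, g)], alpha)
                    if val > best_val:
                        best_move, best_val = (kind, idx), val
                if best_move is not None:
                    kind, idx = best_move
                    (taxa if kind == 'taxon' else genes).remove(idx)
                    best, improved = best_val, True
            return taxa, genes

        rng = np.random.default_rng(0)
        coverage = rng.random((50, 50)) < 0.2          # ~20% data coverage
        kept_taxa, kept_genes = reduce_supermatrix(coverage)
        print(len(kept_taxa), 'taxa and', len(kept_genes), 'genes retained')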

  18. Generating a Simulated Fluid Flow over a Surface Using Anisotropic Diffusion

    NASA Technical Reports Server (NTRS)

    Rodriguez, David L. (Inventor); Sturdza, Peter (Inventor)

    2016-01-01

    A fluid-flow simulation over a computer-generated surface is generated using a diffusion technique. The surface is comprised of a surface mesh of polygons. A boundary-layer fluid property is obtained for a subset of the polygons of the surface mesh. A gradient vector is determined for a selected polygon, the selected polygon belonging to the surface mesh but not one of the subset of polygons. A maximum and minimum diffusion rate is determined along directions determined using the gradient vector corresponding to the selected polygon. A diffusion-path vector is defined between a point in the selected polygon and a neighboring point in a neighboring polygon. An updated fluid property is determined for the selected polygon using a variable diffusion rate, the variable diffusion rate based on the minimum diffusion rate, maximum diffusion rate, and the gradient vector.
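
    The core of the update is the variable diffusion rate, which depends on the minimum rate, the maximum rate, and the relationship between the diffusion-path vector and the local gradient. The sketch below assumes a simple cos²-based blending between the two rates; the patent does not prescribe this exact weighting, so it only illustrates the mechanism.

        import numpy as np

        def variable_diffusion_rate(path_vec, grad_vec, d_min, d_max):
            # blend between d_min and d_max according to the angle between the
            # diffusion-path vector and the gradient vector (cos^2 weighting)
            p = path_vec / np.linalg.norm(path_vec)
            g = grad_vec / np.linalg.norm(grad_vec)
            c2 = float(np.dot(p, g)) ** 2        # 1 along the gradient, 0 across it
            return d_min + (d_max - d_min) * c2

        def diffuse_step(value_self, value_neighbor, path_vec, grad_vec,
                         d_min, d_max, dt=1.0):
            # explicit update of one boundary-layer property along one mesh edge
            rate = variable_diffusion_rate(path_vec, grad_vec, d_min, d_max)
            return value_self + dt * rate * (value_neighbor - value_self)

        print(diffuse_step(1.0, 2.0, np.array([1.0, 0.0]), np.array([0.7, 0.7]),
                           d_min=0.05, d_max=0.5))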

  19. Electrode channel selection based on backtracking search optimization in motor imagery brain-computer interfaces.

    PubMed

    Dai, Shengfa; Wei, Qingguo

    2017-01-01

    Common spatial pattern algorithm is widely used to estimate spatial filters in motor imagery based brain-computer interfaces. However, use of a large number of channels makes common spatial pattern prone to over-fitting and the classification of electroencephalographic signals time-consuming. To overcome these problems, it is necessary to choose an optimal subset of the whole channel set to save computational time and improve classification accuracy. In this paper, a novel method named backtracking search optimization algorithm is proposed to automatically select the optimal channel set for common spatial pattern. Each individual in the population is an N-dimensional vector, with each component representing one channel. A population of binary codes is generated randomly at the beginning, and channels are then selected according to the evolution of these codes. The number and positions of 1's in a code denote the number and positions of the chosen channels. The objective function of the backtracking search optimization algorithm is defined as the combination of classification error rate and relative number of channels. Experimental results suggest that higher classification accuracy can be achieved with far fewer channels compared to standard common spatial pattern with all channels.
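
    The objective function is easy to sketch: the cross-validated error of a classifier trained on the selected channels, plus a penalty proportional to the number of channels. In the sketch below a plain bit-flip evolutionary loop stands in for the full backtracking search algorithm, and a linear SVM on log channel variances stands in for the CSP-based pipeline; both substitutions, as well as the synthetic EEG data, are assumptions for illustration only.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_trials, n_channels, n_samples = 120, 22, 250
        X = rng.standard_normal((n_trials, n_channels, n_samples))
        y = rng.integers(0, 2, n_trials)
        X[y == 1, :5] *= 1.5                          # make a few channels informative

        def objective(mask, lam=0.1):
            if mask.sum() == 0:
                return 1.0 + lam
            feats = np.log(X[:, mask.astype(bool), :].var(axis=2))  # toy CSP stand-in
            err = 1.0 - cross_val_score(SVC(kernel='linear'), feats, y, cv=5).mean()
            return err + lam * mask.sum() / n_channels              # error + channel penalty

        pop = rng.integers(0, 2, (20, n_channels))
        scores = np.array([objective(m) for m in pop])
        for _ in range(15):
            flips = rng.random(pop.shape) < 0.1                     # bit-flip "mutation"
            trial = np.where(flips, 1 - pop, pop)
            trial_scores = np.array([objective(m) for m in trial])
            better = trial_scores < scores
            pop[better], scores[better] = trial[better], trial_scores[better]

        best = pop[scores.argmin()]
        print('selected channels:', np.flatnonzero(best), 'objective:', scores.min())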

  20. Diameter distribution in a Brazilian tropical dry forest domain: predictions for the stand and species.

    PubMed

    Lima, Robson B DE; Bufalino, Lina; Alves, Francisco T; Silva, José A A DA; Ferreira, Rinaldo L C

    2017-01-01

    Currently, there is a lack of studies on the correct utilization of continuous distributions for dry tropical forests. Therefore, this work aims to investigate the diameter structure of a Brazilian tropical dry forest and to select suitable continuous distributions by means of statistical tools for the stand and the main species. Two subsets were randomly selected from 40 plots. Diameter at base height was obtained. The following functions were tested: log-normal; gamma; Weibull 2P and Burr. The best fits were selected by Akaike's information criterion. Overall, the diameter distribution of the dry tropical forest was better described by negative exponential curves and positive skewness. The forest studied showed diameter distributions with decreasing probability for larger trees. This behavior was observed for both the main species and the stand. The generalization of the function fitted for the main species shows that the development of individual models is needed. The Burr function showed good flexibility to describe the diameter structure of the stand and the behavior of the Mimosa ophthalmocentra and Bauhinia cheilantha species. For Poincianella bracteosa, Aspidosperma pyrifolium and Myracrodruon urundeuva, better fitting was obtained with the log-normal function.
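
    The distribution-selection step can be reproduced in a few lines: fit each candidate by maximum likelihood and rank the fits by AIC. The sketch below uses scipy, takes burr12 as the Burr function, and generates synthetic diameters as a stand-in for the field measurements; these choices are assumptions for illustration.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        diameters = rng.weibull(1.3, 500) * 8 + 3      # synthetic diameters (cm)

        candidates = {
            'log-normal': stats.lognorm,
            'gamma': stats.gamma,
            'Weibull 2P': stats.weibull_min,
            'Burr': stats.burr12,
        }

        results = []
        for name, dist in candidates.items():
            params = dist.fit(diameters, floc=0)       # keep the location at zero
            loglik = dist.logpdf(diameters, *params).sum()
            k = len(params) - 1                        # floc was fixed, not estimated
            results.append((2 * k - 2 * loglik, name))

        for aic, name in sorted(results):              # smallest AIC first
            print(f'{name:11s} AIC = {aic:8.1f}')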

  1. Are mutagenic non D-loop direct repeat motifs in mitochondrial DNA under a negative selection pressure?

    PubMed Central

    Lakshmanan, Lakshmi Narayanan; Gruber, Jan; Halliwell, Barry; Gunawan, Rudiyanto

    2015-01-01

    Non D-loop direct repeats (DRs) in mitochondrial DNA (mtDNA) have been commonly implicated in the mutagenesis of mtDNA deletions associated with neuromuscular disease and ageing. Further, these DRs have been hypothesized to constrain the lifespan of mammals and to be under a negative selection pressure. Using a compendium of 294 mammalian mtDNA sequences, we re-examined the relationship between species lifespan and the mutagenicity of such DRs. Contradicting the prevailing hypotheses, we found no significant evidence that long-lived mammals possess fewer mutagenic DRs than short-lived mammals. By comparing DR counts in human mtDNA with those in selectively randomized sequences, we also showed that the number of DRs in human mtDNA is primarily determined by global mtDNA properties, such as the bias in synonymous codon usage (SCU) and nucleotide composition. We found that SCU bias in mtDNA positively correlates with DR counts, where repeated usage of a subset of codons leads to more frequent DR occurrences. While bias in SCU and nucleotide composition has been attributed to nucleotide mutational bias, mammalian mtDNA sequences still exhibit higher SCU bias and DR counts than expected from such mutational bias, suggesting a lack of negative selection against non D-loop DRs. PMID:25855815
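
    For readers who want to reproduce the counting step, the sketch below tallies exact direct repeats of a fixed length in a circular sequence by hashing k-mers. The 13-bp repeat length and the random sequence are assumptions used only to make the example self-contained.

        import random
        from collections import defaultdict

        def count_direct_repeats(seq, k=13):
            circ = seq + seq[:k - 1]                   # wrap around: mtDNA is circular
            positions = defaultdict(list)
            for i in range(len(seq)):
                positions[circ[i:i + k]].append(i)
            # a direct repeat is a k-mer occurring at two or more positions
            return sum(1 for p in positions.values() if len(p) > 1)

        random.seed(0)
        mtdna = ''.join(random.choice('ACGT') for _ in range(16569))
        print(count_direct_repeats(mtdna), 'repeated 13-mers in the random sequence')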

  2. On Subset Selection Procedures for Poisson Processes and Some Applications to the Binomial and Multinomial Problems

    DTIC Science & Technology

    1976-07-01

    PURDUE UNIVERSITY, DEPARTMENT OF STATISTICS, DIVISION OF MATHEMATICAL SCIENCES. On Subset Selection Procedures for Poisson Processes and Some Applications to the Binomial and Multinomial Problems. Mimeograph Series #457, July 1976. This research was supported by the Office of Naval Research under Contract N00014-75-C-0455 at Purdue University.

  3. Profiling dendritic cell subsets in head and neck squamous cell tonsillar cancer and benign tonsils.

    PubMed

    Abolhalaj, Milad; Askmyr, David; Sakellariou, Christina Alexandra; Lundberg, Kristina; Greiff, Lennart; Lindstedt, Malin

    2018-05-23

    Dendritic cells (DCs) have a key role in orchestrating immune responses and are considered important targets for immunotherapy against cancer. In order to develop effective cancer vaccines, detailed knowledge of the micromilieu in cancer lesions is warranted. In this study, flow cytometry and human transcriptome arrays were used to characterize subsets of DCs in head and neck squamous cell tonsillar cancer and compare them to their counterparts in benign tonsils to evaluate subset-selective biomarkers associated with tonsillar cancer. We describe, for the first time, four subsets of DCs in tonsillar cancer: CD123+ plasmacytoid DCs (pDCs), and CD1c+, CD141+, and CD1c-CD141- myeloid DCs (mDCs). An increased frequency of DCs and an elevated mDC/pDC ratio were shown in malignant compared to benign tonsillar tissue. The microarray data demonstrate characteristics specific for tonsil cancer DC subsets, including expression of immunosuppressive molecules and lower expression levels of genes involved in the development of effector immune responses in DCs in malignant tonsillar tissue, compared to their counterparts in benign tonsillar tissue. Finally, we present target candidates selectively expressed by different DC subsets in malignant tonsils and confirm expression of CD206/MRC1 and CD207/Langerin on CD1c+ DCs at the protein level. This study describes DC characteristics in the context of head and neck cancer and adds valuable steps towards future DC-based therapies against tonsillar cancer.

  4. Identifying developmental toxicity pathways for a subset of ToxCast chemicals using human embryonic stem cells and metabolomics

    EPA Science Inventory

    Metabolomics analysis was performed on the supernatant of human embryonic stem (hES) cell cultures exposed to a blinded subset of 11 chemicals selected from the chemical library of EPA's ToxCast™ chemical screening and prioritization research project. Metabolites from hES cultur...

  5. Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 3. Variable selection in classification.

    PubMed

    Ballabio, Davide; Consonni, Viviana; Mauri, Andrea; Todeschini, Roberto

    2010-01-11

    In multivariate regression and classification, variable selection is an important procedure used to select an optimal subset of variables with the aim of producing more parsimonious and eventually more predictive models. Variable selection is often necessary when dealing with methodologies that produce thousands of variables, such as Quantitative Structure-Activity Relationships (QSARs) and highly dimensional analytical procedures. In this paper a novel method for variable selection for classification purposes is introduced. This method exploits the recently proposed Canonical Measure of Correlation between two sets of variables (CMC index). The CMC index is in this case calculated for two specific sets of variables, the former comprising the independent variables and the latter the unfolded class matrix. The CMC values, calculated by considering one variable at a time, can be sorted to give a ranking of the variables on the basis of their class discrimination capabilities. Alternatively, the CMC index can be calculated for all the possible combinations of variables and the variable subset with the maximal CMC can be selected, but this procedure is computationally more demanding and the classification performance of the selected subset is not always the best one. The effectiveness of the CMC index in selecting variables with discriminative ability was compared with that of other well-known strategies for variable selection, such as Wilks' Lambda, the VIP index based on Partial Least Squares-Discriminant Analysis, and the selection provided by classification trees. A variable Forward Selection based on the CMC index was finally used in conjunction with Linear Discriminant Analysis. This approach was tested on several chemical data sets. The results obtained were encouraging.
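
    The one-variable-at-a-time ranking can be illustrated compactly. In the sketch below, each variable is scored by its squared multiple correlation with the one-hot (unfolded) class matrix, i.e. the between-class over total variance ratio; this score only stands in for the published CMC formula, and the synthetic data are placeholders.

        import numpy as np

        def class_correlation_scores(X, y):
            classes = np.unique(y)
            scores = np.empty(X.shape[1])
            for j in range(X.shape[1]):
                x = X[:, j]
                total = ((x - x.mean()) ** 2).sum()
                between = sum(len(x[y == c]) * (x[y == c].mean() - x.mean()) ** 2
                              for c in classes)
                scores[j] = between / total if total > 0 else 0.0
            return scores

        rng = np.random.default_rng(0)
        y = rng.integers(0, 3, 150)
        X = rng.standard_normal((150, 20))
        X[:, 4] += y                                  # make one variable discriminative
        ranking = np.argsort(class_correlation_scores(X, y))[::-1]
        print('variables ranked by class discrimination:', ranking[:5])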

  6. Mapping tropical rainforest canopies using multi-temporal spaceborne imaging spectroscopy

    NASA Astrophysics Data System (ADS)

    Somers, Ben; Asner, Gregory P.

    2013-10-01

    The use of imaging spectroscopy for floristic mapping of forests is complicated by the spectral similarity among coexisting species. Here we evaluated an alternative spectral unmixing strategy combining a time series of EO-1 Hyperion images and an automated feature selection strategy in MESMA. Instead of using the same spectral subset to unmix each image pixel, our modified approach allowed the spectral subsets to vary on a per-pixel basis such that each pixel is evaluated using a spectral subset tuned towards maximal separability of its specific endmember class combination or species mixture. The potential of the new approach for floristic mapping of tree species in Hawaiian rainforests was quantitatively demonstrated using both simulated and actual hyperspectral image time series. With a Cohen's Kappa coefficient of 0.65, our approach provided a more accurate tree species map compared to MESMA (Kappa = 0.54). In addition, through the selection of spectral subsets, our approach was about 90% faster than MESMA. The flexible or adaptive use of band sets in spectral unmixing as such provides an interesting avenue to address spectral similarities in complex vegetation canopies.

  7. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning on a microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to solve multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the features selected on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classification method to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
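
    The overall scheme can be sketched with standard tools: split the features into subsets, run RFE with a linear SVM inside each subset, train one classifier per subset on the surviving features, and combine predictions by majority vote. sklearn's RFE and SVC replace the TWSVM used in the paper, and the subset count, features kept per subset, and synthetic data are illustrative assumptions.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import RFE
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=500, n_informative=30,
                                   n_classes=4, n_clusters_per_class=1, random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

        n_subsets, keep = 5, 20
        feature_subsets = np.array_split(
            np.random.default_rng(0).permutation(X.shape[1]), n_subsets)

        preds = []
        for idx in feature_subsets:
            rfe = RFE(SVC(kernel='linear'), n_features_to_select=keep)
            rfe.fit(Xtr[:, idx], ytr)                       # SVM-RFE inside the subset
            cols = idx[rfe.support_]
            clf = SVC(kernel='linear').fit(Xtr[:, cols], ytr)
            preds.append(clf.predict(Xte[:, cols]))

        stacked = np.vstack(preds)                          # one row per subset classifier
        vote = np.array([np.bincount(col).argmax() for col in stacked.T])
        print('majority-vote accuracy:', (vote == yte).mean())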

  8. Analysing the integration of engineering in science lessons with the Engineering-Infused Lesson Rubric

    NASA Astrophysics Data System (ADS)

    Peterman, Karen; Daugherty, Jenny L.; Custer, Rodney L.; Ross, Julia M.

    2017-09-01

    Science teachers are being called on to incorporate engineering practices into their classrooms. This study explores whether the Engineering-Infused Lesson Rubric, a new rubric designed to target best practices in engineering education, could be used to evaluate the extent to which engineering is infused into online science lessons. Eighty lessons were selected at random from three online repositories, and coded with the rubric. Overall results documented the strengths of existing lessons, as well as many components that teachers might strengthen. In addition, a subset of characteristics was found to distinguish lessons with the highest level of engineering infusion. Findings are discussed in relation to the potential of the rubric to help teachers use research evidence-informed practice generally, and in relation to the new content demands of the U.S. Next Generation Science Standards, in particular.

  9. On the Hardness of Subset Sum Problem from Different Intervals

    NASA Astrophysics Data System (ADS)

    Kogure, Jun; Kunihiro, Noboru; Yamamoto, Hirosuke

    The subset sum problem, which is often called the knapsack problem, is known to be NP-hard, and there are several cryptosystems based on the problem. Assuming an oracle for the shortest vector problem of lattices, the low-density attack algorithm by Lagarias and Odlyzko and its variants solve the subset sum problem efficiently when the “density” of the given problem is smaller than some threshold. When we define the density in the context of knapsack-type cryptosystems, weights are usually assumed to be chosen uniformly at random from the same interval. In this paper, we focus on general subset sum problems, where this assumption may not hold. We assume that weights are chosen from different intervals, and analyze the effect on the success probability of the above algorithms both theoretically and experimentally. A possible application of our result in the context of knapsack cryptosystems is the security analysis when the data size of public keys is reduced.
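
    The density mentioned above is simple to compute. For an instance with n weights a_1, ..., a_n, the usual definition is d = n / log2(max_i a_i); the sketch below evaluates it for weights drawn uniformly from a single interval and from two different intervals, with the interval sizes chosen arbitrarily for illustration.

        import math
        import random

        def density(weights):
            # d = n / log2(max weight), the standard knapsack density
            return len(weights) / math.log2(max(weights))

        random.seed(0)
        n = 100
        same_interval = [random.randint(1, 2 ** 100) for _ in range(n)]
        mixed_intervals = ([random.randint(1, 2 ** 100) for _ in range(n // 2)] +
                           [random.randint(1, 2 ** 40) for _ in range(n // 2)])
        print('same interval  :', round(density(same_interval), 3))
        print('mixed intervals:', round(density(mixed_intervals), 3))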

  10. The effect of cellular isolation and cryopreservation on the expression of markers identifying subsets of regulatory T cells.

    PubMed

    Zhang, Weiying; Nilles, Tricia L; Johnson, Jacquett R; Margolick, Joseph B

    2016-04-01

    The role of CD4(+) regulatory T cells (Tregs) and their subsets during HIV infection is controversial. Cryopreserved peripheral blood mononuclear cells (PBMC) are an important source for assessing number and function of Tregs. However, it is unknown if PBMC isolation and cryopreservation affect the expression of CD120b and CD39, markers that identify specific subsets of Tregs. HIV-uninfected (HIV-) and -infected (HIV+) men were randomly selected from the Multicenter AIDS Cohort Study (MACS). Percentages of CD120b(+) and CD39(+) Tregs measured by flow cytometry in whole blood and in corresponding fresh and cryopreserved PBMC were compared. Percentages of CD120b(+) Tregs were significantly lower in a) fresh PBMC relative to whole blood, and b) freshly thawed frozen PBMC relative to fresh PBMC when the recovery of viable cryopreserved cells was low. When present, low expression of CD120b in frozen PBMC was reversible by 4h of in vitro culture. In contrast, expression of CD39 on Tregs was not affected by isolation and/or cryopreservation of PBMC, or by relative recovery of cryopreserved PBMC. These findings were unaffected by the HIV status of the donor. The data suggest that percentages of CD120b(+) Tregs and CD39(+) Tregs can be validly measured in either whole blood or PBMC (fresh and frozen) in HIV- and HIV+ men. However, for measurement of CD120b(+) Tregs one type of sample should be used consistently within a given study, and thawed frozen cells may require in vitro culture if recovery of viable cells is low. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods.

    PubMed

    Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming

    2014-12-01

    Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods, including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF), were utilized to select an important subset of features from a total of 16 clinical features. These feature selection methods were combined with a support vector machine (SVM) to develop predictive models with better performance. Five-fold cross-validation was used to train and test the SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation had average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve of 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM-derived predictive model can suggest high-risk patients for recurrence, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
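
    A heavily simplified version of the wrapper idea can be sketched as follows: random-forest importances rank the 16 clinical features, and 5-fold cross-validation of an SVM decides how many top-ranked features to keep. The genetic-algorithm and simulated-annealing searches used in the study are replaced here by a plain top-k sweep, and synthetic data stand in for the clinical dataset; both are assumptions for illustration.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=83, n_features=16, n_informative=6,
                                   weights=[0.6, 0.4], random_state=0)

        rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
        order = np.argsort(rf.feature_importances_)[::-1]   # best features first

        best_k, best_auc = None, -np.inf
        for k in range(2, X.shape[1] + 1):                  # sweep subset sizes
            auc = cross_val_score(SVC(), X[:, order[:k]], y,
                                  cv=5, scoring='roc_auc').mean()
            if auc > best_auc:
                best_k, best_auc = k, auc

        print(f'best subset: top {best_k} features, cross-validated AUC = {best_auc:.2f}')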

  12. On a phase diagram for random neural networks with embedded spike timing dependent plasticity.

    PubMed

    Turova, Tatyana S; Villa, Alessandro E P

    2007-01-01

    This paper presents an original mathematical framework based on graph theory which is a first attempt to investigate the dynamics of a model of neural networks with embedded spike timing dependent plasticity. The neurons correspond to integrate-and-fire units located at the vertices of a finite subset of a 2D lattice. There are two types of vertices, corresponding to the inhibitory and the excitatory neurons. The edges are directed and labelled by the discrete values of the synaptic strength. We assume that there is an initial firing pattern corresponding to a subset of units that generate a spike. The number of externally activated vertices is a small fraction of the entire network. The model presented here describes how such a pattern propagates throughout the network as a random walk on a graph. Several results are compared with computational simulations and new data are presented for identifying critical parameters of the model.

  13. Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets

    PubMed Central

    Gratzl, Samuel; Gehlenborg, Nils; Lex, Alexander; Pfister, Hanspeter; Streit, Marc

    2016-01-01

    Answering questions about complex issues often requires analysts to take into account information contained in multiple interconnected datasets. A common strategy in analyzing and visualizing large and heterogeneous data is dividing it into meaningful subsets. Interesting subsets can then be selected and the associated data and the relationships between the subsets visualized. However, neither the extraction and manipulation nor the comparison of subsets is well supported by state-of-the-art techniques. In this paper we present Domino, a novel multiform visualization technique for effectively representing subsets and the relationships between them. By providing comprehensive tools to arrange, combine, and extract subsets, Domino allows users to create both common visualization techniques and advanced visualizations tailored to specific use cases. In addition to the novel technique, we present an implementation that enables analysts to manage the wide range of options that our approach offers. Innovative interactive features such as placeholders and live previews support rapid creation of complex analysis setups. We introduce the technique and the implementation using a simple example and demonstrate scalability and effectiveness in a use case from the field of cancer genomics. PMID:26356916

  14. Intraclonal Cell Expansion and Selection Driven by B Cell Receptor in Chronic Lymphocytic Leukemia

    PubMed Central

    Colombo, Monica; Cutrona, Giovanna; Reverberi, Daniele; Fabris, Sonia; Neri, Antonino; Fabbi, Marina; Quintana, Giovanni; Quarta, Giovanni; Ghiotto, Fabio; Fais, Franco; Ferrarini, Manlio

    2011-01-01

    The mutational status of the immunoglobulin heavy-chain variable region (IGHV) genes utilized by chronic lymphocytic leukemia (CLL) clones defines two disease subgroups. Patients with unmutated IGHV have a more aggressive disease and a worse outcome than patients with cells having somatic IGHV gene mutations. Moreover, up to 30% of the unmutated CLL clones exhibit very similar or identical B cell receptors (BcR), often encoded by the same IG genes. These “stereotyped” BcRs have been classified into defined subsets. The presence of an IGHV gene somatic mutation and the utilization of a skewed gene repertoire compared with normal B cells together with the expression of stereotyped receptors by unmutated CLL clones may indicate stimulation/selection by antigenic epitopes. This antigenic stimulation may occur prior to or during neoplastic transformation, but it is unknown whether this stimulation/selection continues after leukemogenesis has ceased. In this study, we focused on seven CLL cases with stereotyped BcR Subset #8 found among a cohort of 700 patients; in six, the cells expressed IgG and utilized IGHV4-39 and IGKV1-39/IGKV1D-39 genes, as reported for Subset #8 BcR. One case exhibited special features, including expression of IgM or IgG by different subclones consequent to an isotype switch, allelic inclusion at the IGH locus in the IgM-expressing cells and a particular pattern of cytogenetic lesions. Collectively, the data indicate a process of antigenic stimulation/selection of the fully transformed CLL cells leading to the expansion of the Subset #8 IgG-bearing subclone. PMID:21541442

  15. Gene expression changes reflect clinical response in a placebo-controlled randomized trial of abatacept in patients with diffuse cutaneous systemic sclerosis.

    PubMed

    Chakravarty, Eliza F; Martyanov, Viktor; Fiorentino, David; Wood, Tammara A; Haddon, David James; Jarrell, Justin Ansel; Utz, Paul J; Genovese, Mark C; Whitfield, Michael L; Chung, Lorinda

    2015-06-13

    Systemic sclerosis is an autoimmune disease characterized by inflammation and fibrosis of the skin and internal organs. We sought to assess the clinical and molecular effects associated with response to intravenous abatacept in patients with diffuse cutaneous systemic sclerosis. Adult diffuse cutaneous systemic sclerosis patients were randomized in a 2:1 double-blinded fashion to receive abatacept or placebo over 24 weeks. Primary outcomes were safety and the change in modified Rodnan Skin Score (mRSS) at week 24 compared with baseline. Improvers were defined as patients with a decrease in mRSS of ≥30% post-treatment compared to baseline. Skin biopsies were obtained for differential gene expression and pathway enrichment analyses and intrinsic gene expression subset assignment. Ten subjects were randomized to abatacept (n = 7) or placebo (n = 3). Disease duration from first non-Raynaud's symptom was significantly longer (8.8 ± 3.8 years vs. 2.4 ± 1.6 years, p = 0.004) and median mRSS was higher (30 vs. 22, p = 0.05) in the placebo compared to the abatacept group. Adverse events were similar in the two groups. Five out of seven patients (71%) randomized to abatacept and one out of three patients (33%) randomized to placebo experienced ≥30% improvement in skin score. Subjects receiving abatacept showed a trend toward improvement in mRSS at week 24 (-8.6 ± 7.5, p = 0.0625) while those in the placebo group did not (-2.3 ± 15, p = 0.75). After adjusting for disease duration, mRSS significantly improved in the abatacept compared with the placebo group (abatacept vs. placebo mRSS decrease estimate -9.8, 95% confidence interval -16.7 to -3.0, p = 0.0114). In the abatacept group, the patients in the inflammatory intrinsic subset showed a trend toward greater improvement in skin score at 24 weeks compared with the patients in the normal-like intrinsic subset (-13.5 ± 3.1 vs. -4.5 ± 6.4, p = 0.067). Abatacept resulted in decreased CD28 co-stimulatory gene expression in improvers, consistent with its mechanism of action. Improvers mapped to the inflammatory intrinsic subset and showed decreased gene expression in inflammatory pathways, while non-improvers and placebo-treated patients showed stable or reversed gene expression over 24 weeks. Clinical improvement following abatacept therapy was associated with modulation of inflammatory pathways in skin. ClinicalTrials.gov NCT00442611. Registered 1 March 2007.

  16. There is a need for new systemic sclerosis subset criteria. A content analytic approach.

    PubMed

    Johnson, S R; Soowamber, M L; Fransen, J; Khanna, D; Van Den Hoogen, F; Baron, M; Matucci-Cerinic, M; Denton, C P; Medsger, T A; Carreira, P E; Riemekasten, G; Distler, J; Gabrielli, A; Steen, V; Chung, L; Silver, R; Varga, J; Müller-Ladner, U; Vonk, M C; Walker, U A; Wollheim, F A; Herrick, A; Furst, D E; Czirjak, L; Kowal-Bielecka, O; Del Galdo, F; Cutolo, M; Hunzelmann, N; Murray, C D; Foeldvari, I; Mouthon, L; Damjanov, N; Kahaleh, B; Frech, T; Assassi, S; Saketkoo, L A; Pope, J E

    2018-01-01

    Systemic sclerosis (SSc) is heterogeneous. The objectives of this study were to evaluate the purpose, strengths and limitations of existing SSc subset criteria, and to identify ideas among experts about subsets. We conducted semi-structured interviews with randomly sampled international SSc experts. The interview transcripts underwent an iterative process, with the text deconstructed into single thought units until a saturated conceptual framework with coding was achieved and respondent occurrence tabulated. Serial cross-referential analyses of clusters were developed. Thirty experts from 13 countries were included; 67% were male, 63% were from Europe and 37% from North America, with a median experience of 22.5 years and a median of 55 new SSc patients annually. Three thematic clusters regarding subsetting were identified: research and communication; management; and prognosis (prediction of internal organ involvement, survival). The strength of the limited/diffuse system was its ease of use; however, 10% stated this system had marginal value. Shortcomings of the diffuse/limited classification were the risk of misclassification, that predictions/generalizations did not always hold true, and that the elbow or knee threshold was arbitrary. Eighty-seven percent used more than two subsets, including: SSc sine scleroderma, overlap conditions, antibody-determined subsets, speed of progression, and age of onset (juvenile, elderly). We have synthesized an international view of the construct of SSc subsets in the modern era. We found a number of factors underlying the construct of SSc subsets. Considerations for the next phase include rate of change and hierarchical clustering (e.g. limited/diffuse, then by antibodies).

  17. A Letter on Ancaiani et al. "Evaluating Scientific Research in Italy: The 2004-10 Research Evaluation Exercise"

    ERIC Educational Resources Information Center

    Baccini, Alberto; De Nicolao, Giuseppe

    2017-01-01

    This letter documents some problems in Ancaiani et al. (2015). Namely the evaluation of concordance, based on Cohen's kappa, reported by Ancaiani et al. was not computed on the whole random sample of 9,199 articles, but on a subset of 7,597 articles. The kappas relative to the whole random sample were in the range 0.07-0.15, indicating an…

  18. How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space.

    PubMed

    Koutsoukas, Alexios; Paricharak, Shardul; Galloway, Warren R J D; Spring, David R; Ijzerman, Adriaan P; Glen, Robert C; Marcus, David; Bender, Andreas

    2014-01-27

    Chemical diversity is a widely applied approach to select structurally diverse subsets of molecules, often with the objective of maximizing the number of hits in biological screening. While many methods exist in the area, few systematic comparisons using current descriptors in particular with the objective of assessing diversity in bioactivity space have been published, and this shortage is what the current study is aiming to address. In this work, 13 widely used molecular descriptors were compared, including fingerprint-based descriptors (ECFP4, FCFP4, MACCS keys), pharmacophore-based descriptors (TAT, TAD, TGT, TGD, GpiDAPH3), shape-based descriptors (rapid overlay of chemical structures (ROCS) and principal moments of inertia (PMI)), a connectivity-matrix-based descriptor (BCUT), physicochemical-property-based descriptors (prop2D), and a more recently introduced molecular descriptor type (namely, "Bayes Affinity Fingerprints"). We assessed both the similar behavior of the descriptors in assessing the diversity of chemical libraries, and their ability to select compounds from libraries that are diverse in bioactivity space, which is a property of much practical relevance in screening library design. This is particularly evident, given that many future targets to be screened are not known in advance, but that the library should still maximize the likelihood of containing bioactive matter also for future screening campaigns. Overall, our results showed that descriptors based on atom topology (i.e., fingerprint-based descriptors and pharmacophore-based descriptors) correlate well in rank-ordering compounds, both within and between descriptor types. On the other hand, shape-based descriptors such as ROCS and PMI showed weak correlation with the other descriptors utilized in this study, demonstrating significantly different behavior. We then applied eight of the molecular descriptors compared in this study to sample a diverse subset of sample compounds (4%) from an initial population of 2587 compounds, covering the 25 largest human activity classes from ChEMBL and measured the coverage of activity classes by the subsets. Here, it was found that "Bayes Affinity Fingerprints" achieved an average coverage of 92% of activity classes. Using the descriptors ECFP4, GpiDAPH3, TGT, and random sampling, 91%, 84%, 84%, and 84% of the activity classes were represented in the selected compounds respectively, followed by BCUT, prop2D, MACCS, and PMI (in order of decreasing performance). In addition, we were able to show that there is no visible correlation between compound diversity in PMI space and in bioactivity space, despite frequent utilization of PMI plots to this end. To summarize, in this work, we assessed which descriptors select compounds with high coverage of bioactivity space, and can hence be used for diverse compound selection for biological screening. In cases where multiple descriptors are to be used for diversity selection, this work describes which descriptors behave complementarily, and can hence be used jointly to focus on different aspects of diversity in chemical space.
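
    For concreteness, one widely used diversity-selection routine in this area is MaxMin picking over binary fingerprints with Tanimoto distance; a self-contained sketch is given below. The random 1024-bit fingerprints are placeholders for real ECFP4/MACCS vectors, and the routine illustrates only the selection step, not the descriptor comparison performed in the study.

        import numpy as np

        def tanimoto_distances(fp, fps):
            # distance of one binary fingerprint to every fingerprint in fps
            inter = np.minimum(fps, fp).sum(axis=1).astype(float)
            union = np.maximum(fps, fp).sum(axis=1).astype(float)
            return 1.0 - np.divide(inter, union, out=np.zeros_like(inter),
                                   where=union > 0)

        def maxmin_pick(fps, n_pick, seed=0):
            rng = np.random.default_rng(seed)
            picked = [int(rng.integers(len(fps)))]
            min_dist = tanimoto_distances(fps[picked[0]], fps)
            while len(picked) < n_pick:
                nxt = int(min_dist.argmax())       # compound farthest from all picks
                picked.append(nxt)
                min_dist = np.minimum(min_dist, tanimoto_distances(fps[nxt], fps))
            return picked

        fps = (np.random.default_rng(1).random((2587, 1024)) < 0.05).astype(np.int8)
        subset = maxmin_pick(fps, n_pick=103)      # ~4% of the library, as above
        print(len(subset), 'diverse compounds selected')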

  19. Generating a Simulated Fluid Flow Over an Aircraft Surface Using Anisotropic Diffusion

    NASA Technical Reports Server (NTRS)

    Rodriguez, David L. (Inventor); Sturdza, Peter (Inventor)

    2013-01-01

    A fluid-flow simulation over a computer-generated aircraft surface is generated using a diffusion technique. The surface is comprised of a surface mesh of polygons. A boundary-layer fluid property is obtained for a subset of the polygons of the surface mesh. A pressure-gradient vector is determined for a selected polygon, the selected polygon belonging to the surface mesh but not one of the subset of polygons. A maximum and minimum diffusion rate is determined along directions determined using a pressure gradient vector corresponding to the selected polygon. A diffusion-path vector is defined between a point in the selected polygon and a neighboring point in a neighboring polygon. An updated fluid property is determined for the selected polygon using a variable diffusion rate, the variable diffusion rate based on the minimum diffusion rate, maximum diffusion rate, and angular difference between the diffusion-path vector and the pressure-gradient vector.

  20. System and method for progressive band selection for hyperspectral images

    NASA Technical Reports Server (NTRS)

    Fisher, Kevin (Inventor)

    2013-01-01

    Disclosed herein are systems, methods, and non-transitory computer-readable storage media for progressive band selection for hyperspectral images. A system having a module configured to control a processor to practice the method calculates the virtual dimensionality of a hyperspectral image having multiple bands to determine a quantity Q of how many bands are needed for a threshold level of information, ranks each band based on a statistical measure, selects Q bands from the multiple bands to generate a subset of bands based on the virtual dimensionality, and generates a reduced image based on the subset of bands. This approach can create reduced datasets of full hyperspectral images tailored to individual applications. The system uses a metric specific to a target application to rank the image bands, and then selects the most useful bands. The number of bands selected can be specified manually or calculated from the hyperspectral image's virtual dimensionality.
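
    The rank-and-select step reduces to a few lines once a band metric is chosen. The sketch below scores every band of a hyperspectral cube with a simple statistic (variance, standing in for an application-specific metric) and keeps the Q top-ranked bands, with Q supplied directly rather than estimated from the virtual dimensionality; these simplifications are assumptions for illustration.

        import numpy as np

        def progressive_band_selection(cube, q, metric=np.var):
            # cube: (rows, cols, bands) hyperspectral image
            pixels = cube.reshape(-1, cube.shape[2])
            scores = np.array([metric(pixels[:, b]) for b in range(pixels.shape[1])])
            keep = np.sort(np.argsort(scores)[::-1][:q])   # Q highest-scoring bands
            return keep, cube[:, :, keep]

        rng = np.random.default_rng(0)
        image = rng.standard_normal((64, 64, 200)) * np.linspace(0.5, 2.0, 200)
        selected, reduced = progressive_band_selection(image, q=30)
        print('selected bands:', selected[:10], '... reduced shape:', reduced.shape)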

  1. Efficient Simulation Budget Allocation for Selecting an Optimal Subset

    NASA Technical Reports Server (NTRS)

    Chen, Chun-Hung; He, Donghai; Fu, Michael; Lee, Loo Hay

    2008-01-01

    We consider a class of the subset selection problem in ranking and selection. The objective is to identify the top m out of k designs based on simulated output. Traditional procedures are conservative and inefficient. Using the optimal computing budget allocation framework, we formulate the problem as that of maximizing the probability of correctly selecting all of the top-m designs subject to a constraint on the total number of samples available. For an approximation of this correct selection probability, we derive an asymptotically optimal allocation and propose an easy-to-implement heuristic sequential allocation procedure. Numerical experiments indicate that the resulting allocations are superior to other methods in the literature that we tested, and the relative efficiency increases for larger problems. In addition, preliminary numerical results indicate that the proposed new procedure has the potential to enhance computational efficiency for simulation optimization.
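
    A deliberately simplified sequential-allocation sketch in the spirit of this procedure is shown below: after an initial batch, each extra replication goes to the design whose sample mean lies closest to the boundary separating the current top-m from the rest, weighted by its sample standard deviation. This heuristic only illustrates the idea of spending the budget where separation is hardest; it is not the asymptotically optimal allocation derived in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        k, m, n0, total_budget = 10, 3, 10, 600
        true_means = np.linspace(0.0, 1.0, k)            # unknown to the procedure

        samples = [list(true_means[i] + rng.standard_normal(n0)) for i in range(k)]
        while sum(len(s) for s in samples) < total_budget:
            means = np.array([np.mean(s) for s in samples])
            stds = np.array([np.std(s, ddof=1) + 1e-9 for s in samples])
            order = np.argsort(means)[::-1]
            boundary = (means[order[m - 1]] + means[order[m]]) / 2.0
            priority = stds / (np.abs(means - boundary) + 1e-9)  # hardest to separate
            i = int(priority.argmax())
            samples[i].append(true_means[i] + rng.standard_normal())

        final_means = np.array([np.mean(s) for s in samples])
        print('estimated top-m designs:', np.argsort(final_means)[::-1][:m])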

  2. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    NASA Astrophysics Data System (ADS)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection in medical data mining is valuable because the diagnosis of the disease can then be made in this patient-care activity with a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
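
    The ranking criterion itself is compact. The sketch below computes the classical two-class F-score for each feature (the kernel variant in the paper applies the same idea after a kernel mapping, which is omitted here), on a synthetic two-class dataset standing in for the diabetes data.

        import numpy as np

        def f_scores(X, y):
            pos, neg = X[y == 1], X[y == 0]
            num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
            den = (((pos - pos.mean(0)) ** 2).sum(0) / (len(pos) - 1) +
                   ((neg - neg.mean(0)) ** 2).sum(0) / (len(neg) - 1))
            return num / den                    # larger F-score = more discriminative

        rng = np.random.default_rng(0)
        y = rng.integers(0, 2, 768)
        X = rng.standard_normal((768, 8))
        X[:, 1] += 1.5 * y                      # make one feature informative
        print('feature ranking (best first):', np.argsort(f_scores(X, y))[::-1])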

  3. High Aldehyde Dehydrogenase Activity Identifies a Subset of Human Mesenchymal Stromal Cells with Vascular Regenerative Potential.

    PubMed

    Sherman, Stephen E; Kuljanin, Miljan; Cooper, Tyler T; Putman, David M; Lajoie, Gilles A; Hess, David A

    2017-06-01

    During culture expansion, multipotent mesenchymal stromal cells (MSCs) differentially express aldehyde dehydrogenase (ALDH), an intracellular detoxification enzyme that protects long-lived cells against oxidative stress. Thus, MSC selection based on ALDH activity may be used to reduce heterogeneity and distinguish MSC subsets with improved regenerative potency. After expansion of human bone marrow-derived MSCs, cell progeny was purified based on low (ALDHlo) versus high (ALDHhi) ALDH activity by fluorescence-activated cell sorting, and each subset was compared for multipotent stromal and provascular regenerative functions. Both ALDHlo and ALDHhi MSC subsets demonstrated similar expression of stromal cell (>95% CD73+, CD90+, CD105+) and pericyte (>95% CD146+) surface markers and showed multipotent differentiation into bone, cartilage, and adipose cells in vitro. Conditioned media (CDM) generated by ALDHhi MSCs demonstrated a potent proliferative and prosurvival effect on human microvascular endothelial cells (HMVECs) under serum-free conditions and augmented HMVEC tube-forming capacity in growth factor-reduced matrices. After subcutaneous transplantation within directed in vivo angiogenesis assay implants into immunodeficient mice, ALDHhi MSC or CDM produced by ALDHhi MSC significantly augmented murine vascular cell recruitment and perfused vessel infiltration compared with ALDHlo MSC. Although both subsets demonstrated strikingly similar mRNA expression patterns, quantitative proteomic analyses performed on subset-specific CDM revealed that the ALDHhi MSC subset uniquely secreted multiple proangiogenic cytokines (vascular endothelial growth factor beta, platelet derived growth factor alpha, and angiogenin) and actively produced multiple factors with chemoattractant (transforming growth factor-β, C-X-C motif chemokine ligands 1, 2, and 3 (GRO), C-C motif chemokine ligand 5 (RANTES), monocyte chemotactic protein 1 (MCP-1), interleukin [IL]-6, IL-8) and matrix-modifying functions (tissue inhibitors of metalloproteinase 1 & 2 (TIMP1/2)). Collectively, MSCs selected for high ALDH activity demonstrated enhanced proangiogenic secretory functions and represent a purified MSC subset amenable to vascular regenerative applications. Stem Cells 2017;35:1542-1553. © 2017 AlphaMed Press.

  4. Optimized hyperspectral band selection using hybrid genetic algorithm and gravitational search algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Aizhu; Sun, Genyun; Wang, Zhenjie

    2015-12-01

    The serious information redundancy in hyperspectral images (HIs) does not contribute to the accuracy of data analysis; instead, it requires expensive computational resources. Consequently, to identify the most useful and valuable information in HIs and thereby improve the accuracy of data analysis, this paper proposes a novel hyperspectral band selection method using a hybrid genetic algorithm and gravitational search algorithm (GA-GSA). In the proposed method, the GA-GSA is first mapped to the binary space. Then, the accuracy of a support vector machine (SVM) classifier and the number of selected spectral bands are used to measure the discriminative capability of the band subset. Finally, the band subset with the smallest number of spectral bands that still covers the most useful and valuable information is obtained. To verify the effectiveness of the proposed method, studies conducted on an AVIRIS image against two recently proposed state-of-the-art GSA variants are presented. The experimental results revealed the superiority of the proposed method and indicated that it can considerably reduce data storage costs and efficiently identify band subsets with stable and high classification precision.

  5. The Subset Sum game.

    PubMed

    Darmann, Andreas; Nicosia, Gaia; Pferschy, Ulrich; Schauer, Joachim

    2014-03-16

    In this work we address a game theoretic variant of the Subset Sum problem, in which two decision makers (agents/players) compete for the usage of a common resource represented by a knapsack capacity. Each agent owns a set of integer weighted items and wants to maximize the total weight of its own items included in the knapsack. The solution is built as follows: Each agent, in turn, selects one of its items (not previously selected) and includes it in the knapsack if there is enough capacity. The process ends when the remaining capacity is too small for including any item left. We look at the problem from a single agent point of view and show that finding an optimal sequence of items to select is an NP-hard problem. Therefore we propose two natural heuristic strategies and analyze their worst-case performance when (1) the opponent is able to play optimally and (2) the opponent adopts a greedy strategy. From a centralized perspective we observe that some known results on the approximation of the classical Subset Sum can be effectively adapted to the multi-agent version of the problem.

  6. The Subset Sum game☆

    PubMed Central

    Darmann, Andreas; Nicosia, Gaia; Pferschy, Ulrich; Schauer, Joachim

    2014-01-01

    In this work we address a game theoretic variant of the Subset Sum problem, in which two decision makers (agents/players) compete for the usage of a common resource represented by a knapsack capacity. Each agent owns a set of integer weighted items and wants to maximize the total weight of its own items included in the knapsack. The solution is built as follows: Each agent, in turn, selects one of its items (not previously selected) and includes it in the knapsack if there is enough capacity. The process ends when the remaining capacity is too small for including any item left. We look at the problem from a single agent point of view and show that finding an optimal sequence of items to select is an NP-hard problem. Therefore we propose two natural heuristic strategies and analyze their worst-case performance when (1) the opponent is able to play optimally and (2) the opponent adopts a greedy strategy. From a centralized perspective we observe that some known results on the approximation of the classical Subset Sum can be effectively adapted to the multi-agent version of the problem. PMID:25844012
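
    The sequential game is easy to simulate. In the sketch below both agents follow the greedy strategy analyzed in the paper (pick the largest own item that still fits); the example item sets and capacity are arbitrary, and optimal play is not reproduced.

        def greedy_move(items, capacity):
            # largest own item that still fits, or None if nothing fits
            fitting = [w for w in items if w <= capacity]
            return max(fitting) if fitting else None

        def play_subset_sum_game(items_a, items_b, capacity):
            items = {'A': list(items_a), 'B': list(items_b)}
            packed = {'A': 0, 'B': 0}
            turn = 'A'
            while True:
                other = 'B' if turn == 'A' else 'A'
                move = greedy_move(items[turn], capacity)
                if move is None:
                    # current agent cannot pack; stop when the other cannot either
                    if greedy_move(items[other], capacity) is None:
                        break
                else:
                    items[turn].remove(move)
                    packed[turn] += move
                    capacity -= move
                turn = other
            return packed, capacity

        print(play_subset_sum_game([8, 5, 4, 2], [7, 6, 3, 1], capacity=20))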

  7. The behavioral economics of consumer brand choice: patterns of reinforcement and utility maximization.

    PubMed

    Foxall, Gordon R; Oliveira-Castro, Jorge M; Schrezenmaier, Teresa C

    2004-06-30

    Purchasers of fast-moving consumer goods generally exhibit multi-brand choice, selecting apparently randomly among a small subset or "repertoire" of tried and trusted brands. Their behavior shows both matching and maximization, though it is not clear just what the majority of buyers are maximizing. Each brand attracts, however, a small percentage of consumers who are 100%-loyal to it during the period of observation. Some of these are exclusively buyers of premium-priced brands who are presumably maximizing informational reinforcement because their demand for the brand is relatively price-insensitive or inelastic. Others buy exclusively the cheapest brands available and can be assumed to maximize utilitarian reinforcement since their behavior is particularly price-sensitive or elastic. Between them are the majority of consumers whose multi-brand buying takes the form of selecting a mixture of economy- and premium-priced brands. Based on the analysis of buying patterns of 80 consumers for 9 product categories, the paper examines the continuum of consumers so defined and seeks to relate their buying behavior to the question of how and what consumers maximize.

  8. A catalog of galaxy morphology and photometric redshift

    NASA Astrophysics Data System (ADS)

    Paul, Nicholas; Shamir, Lior

    2018-01-01

    Morphology carries important information about the physical characteristics of a galaxy. Here we used machine learning to produce a catalog of ~3,000,000 SDSS galaxies classified by their broad morphology into spiral and elliptical galaxies. Comparison of the catalog to Galaxy Zoo shows that the catalog contains a subset of 1.7*10^6 galaxies classified with the same level of consistency as the debiased “superclean” sub-sample. In addition to the morphology, we also computed the photometric redshifts of the galaxies. Several pattern recognition algorithms and variable selection strategies were tested, and the best accuracy of mean absolute error of ~0.0062 was achieved by using random forest with a combination of manually and automatically selected variables. The catalog shows that for redshift lower than 0.085, galaxies that visually look spiral become more prevalent as the redshift gets higher. For redshift greater than 0.085, galaxies that visually look elliptical become more prevalent. The catalog as well as the source code used to produce it is publicly available at https://figshare.com/articles/Morphology_and_photometric_redshift_catalog/4833593.

  9. Using a genetic algorithm as an optimal band selector in the mid and thermal infrared (2.5-14 μm) to discriminate vegetation species.

    PubMed

    Ullah, Saleem; Groen, Thomas A; Schlerf, Martin; Skidmore, Andrew K; Nieuwenhuis, Willem; Vaiphasa, Chaichoke

    2012-01-01

    Genetic variation between various plant species determines differences in their physio-chemical makeup and ultimately in their hyperspectral emissivity signatures. The hyperspectral emissivity signatures, on the one hand, account for the subtle physio-chemical changes in the vegetation, but on the other hand, highlight the problem of high dimensionality. The aim of this paper is to investigate the performance of genetic algorithms coupled with the spectral angle mapper (SAM) to identify a meaningful subset of wavebands sensitive enough to discriminate thirteen broadleaved vegetation species from the laboratory measured hyperspectral emissivities. The performance was evaluated using an overall classification accuracy and Jeffries Matusita distance. For the multiple plant species, the targeted bands based on genetic algorithms resulted in a high overall classification accuracy (90%). Concentrating on the pairwise comparison results, the selected wavebands based on genetic algorithms resulted in higher Jeffries Matusita (J-M) distances than randomly selected wavebands did. This study concludes that targeted wavebands from leaf emissivity spectra are able to discriminate vegetation species.
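
    The separability measure inside the selector is the spectral angle between two emissivity spectra restricted to a candidate band subset, which takes only a few lines to compute. The random spectra and the 400-band grid below are placeholders for the laboratory-measured emissivities.

        import numpy as np

        def spectral_angle(a, b, bands=None):
            # SAM: angle between two spectra, optionally restricted to a band subset
            if bands is not None:
                a, b = a[bands], b[bands]
            cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            return float(np.arccos(np.clip(cos, -1.0, 1.0)))

        rng = np.random.default_rng(0)
        spectrum_1 = 0.95 + 0.02 * rng.random(400)        # emissivity on a 2.5-14 um grid
        spectrum_2 = 0.95 + 0.02 * rng.random(400)
        subset = rng.choice(400, size=40, replace=False)  # one candidate band subset
        print('full-spectrum angle:', spectral_angle(spectrum_1, spectrum_2))
        print('band-subset angle  :', spectral_angle(spectrum_1, spectrum_2, subset))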

  10. Nestedness of desert bat assemblages: species composition patterns in insular and terrestrial landscapes.

    PubMed

    Frick, Winifred F; Hayes, John P; Heady, Paul A

    2009-01-01

    Nested patterns of community composition exist when species at depauperate sites are subsets of those occurring at sites with more species. Nested subset analysis provides a framework for analyzing species occurrences to determine non-random patterns in community composition and potentially identify mechanisms that may shape faunal assemblages. We examined nested subset structure of desert bat assemblages on 20 islands in the southern Gulf of California and at 27 sites along the Baja California peninsula coast, the presumable source pool for the insular faunas. Nested structure was analyzed using a conservative null model that accounts for expected variation in species richness and species incidence across sites (fixed row and column totals). Associations of nestedness and island traits, such as size and isolation, as well as species traits related to mobility, were assessed to determine the potential role of differential extinction and immigration abilities as mechanisms of nestedness. Bat faunas were significantly nested in both the insular and terrestrial landscape and island size was significantly correlated with nested structure, such that species on smaller islands tended to be subsets of species on larger islands, suggesting that differential extinction vulnerabilities may be important in shaping insular bat faunas. The role of species mobility and immigration abilities is less clearly associated with nestedness in this system. Nestedness in the terrestrial landscape is likely due to stochastic processes related to random placement of individuals and this may also influence nested patterns on islands, but additional data on abundances will be necessary to distinguish among these potential mechanisms.

  11. Feature selection for elderly faller classification based on wearable sensors.

    PubMed

    Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D

    2017-05-30

    Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature sets and improved faller classification compared to faller classification without feature selection. CFS and FCBF provided the best feature subset (one posterior pelvis accelerometer feature) for faller classification. However, better sensitivity was achieved by the second best model based on a Relief-F feature subset with three pressure-sensing insole features and seven head accelerometer features. Feature selection should be considered as an important step in faller classification using wearable sensors.

  12. Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healthy elderly, MCI, cMCI and Alzheimer's disease patients: From the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

    PubMed

    Dimitriadis, S I; Liparas, Dimitris; Tsolaki, Magda N

    2018-05-15

    In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. Based on preprocessed MRI images from the organizers of a neuroimaging challenge, we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 AD subjects were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. On the second blind dataset, we achieved a four-class classification accuracy of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD. Copyright © 2017 Elsevier B.V. All rights reserved.
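
    A hedged sketch of the general idea (synthetic data and hypothetical feature subsets; not the authors' pipeline): one Random Forest is trained per feature subset, and their predictions are fused by majority voting.

    ```python
    # Hedged sketch: per-subset Random Forests combined by majority voting (toy data).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=240, n_features=90, n_informative=20,
                               n_classes=4, n_clusters_per_class=1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, stratify=y,
                                              random_state=0)

    # Hypothetical feature subsets standing in for "whole set", "left", "right".
    subsets = {"whole": slice(0, 90), "left": slice(0, 45), "right": slice(45, 90)}

    predictions = []
    for name, cols in subsets.items():
        rf = RandomForestClassifier(n_estimators=300, random_state=0)
        rf.fit(X_tr[:, cols], y_tr)
        predictions.append(rf.predict(X_te[:, cols]))

    # Ensemble by majority vote across the subset-specific forests.
    votes = np.vstack(predictions)
    ensemble_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
    print("ensemble accuracy:", np.mean(ensemble_pred == y_te))
    ```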

  13. The impact of missing sensor information on surgical workflow management.

    PubMed

    Liebmann, Philipp; Meixensberger, Jürgen; Wiedemann, Peter; Neumuth, Thomas

    2013-09-01

    Sensor systems in the operating room may encounter intermittent data losses that reduce the performance of surgical workflow management systems (SWFMS). Sensor data loss could impact SWFMS-based decision support, device parameterization, and information presentation. The purpose of this study was to understand the robustness of surgical process models when sensor information is partially missing. SWFMS changes caused by wrong or no data from the sensor system which tracks the progress of a surgical intervention were tested. The individual surgical process models (iSPMs) from 100 different cataract procedures of 3 ophthalmologic surgeons were used to select a randomized subset and create a generalized surgical process model (gSPM). A disjoint subset was selected from the iSPMs and used to simulate the surgical process against the gSPM. The loss of sensor data was simulated by removing some information from one task in the iSPM. The effect of missing sensor data was measured using several metrics: (a) successful relocation of the path in the gSPM, (b) the number of steps to find the converging point, and (c) the perspective with the highest occurrence of unsuccessful path findings. A gSPM built using 30% of the iSPMs successfully found the correct path in 90% of the cases. The most critical sensor data were the information regarding the instrument used by the surgeon. We found that use of a gSPM to provide input data for a SWFMS is robust and can be accurate despite missing sensor data. A surgical workflow management system can provide the surgeon with workflow guidance in the OR for most cases. Sensor systems for surgical process tracking can be evaluated based on the stability and accuracy of functional and spatial operative results.

  14. Genomic Prediction with Pedigree and Genotype × Environment Interaction in Spring Wheat Grown in South and West Asia, North Africa, and Mexico.

    PubMed

    Sukumaran, Sivakumar; Crossa, Jose; Jarquin, Diego; Lopes, Marta; Reynolds, Matthew P

    2017-02-09

    Developing genomic selection (GS) models is an important step in applying GS to accelerate the rate of genetic gain in grain yield in plant breeding. In this study, seven genomic prediction models under two cross-validation (CV) scenarios were tested on 287 advanced elite spring wheat lines phenotyped for grain yield (GY), thousand-grain weight (GW), grain number (GN), and thermal time for flowering (TTF) in 18 international environments (year-location combinations) in major wheat-producing countries in 2010 and 2011. Prediction models with genomic and pedigree information included main effects and interaction with environments. Two random CV schemes were applied to predict a subset of lines that were not observed in any of the 18 environments (CV1), and a subset of lines that were not observed in a set of the environments, but were observed in other environments (CV2). Genomic prediction models, including genotype × environment (G×E) interaction, had the highest average prediction ability under the CV1 scenario for GY (0.31), GN (0.32), GW (0.45), and TTF (0.27). For CV2, the average prediction ability of the model including the interaction terms was generally high for GY (0.38), GN (0.43), GW (0.63), and TTF (0.53). Wheat lines in site-year combinations in Mexico and India had relatively high prediction ability for GY and GW. Results indicated that prediction ability of lines not observed in certain environments could be relatively high for genomic selection when predicting G×E interaction in multi-environment trials. Copyright © 2017 Sukumaran et al.
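
    The two cross-validation schemes can be viewed as different masking patterns over the line x environment table; the snippet below uses hypothetical line and environment labels only to show how CV1 and CV2 training/test splits differ.

    ```python
    # Sketch of the CV1 and CV2 partitioning schemes (hypothetical data, no prediction model).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    lines = [f"L{i:03d}" for i in range(287)]
    envs = [f"E{j:02d}" for j in range(18)]
    df = pd.DataFrame([(l, e) for l in lines for e in envs], columns=["line", "env"])
    df["grain_yield"] = rng.normal(size=len(df))   # placeholder phenotypes

    # CV1: 20% of lines are completely unobserved in training.
    cv1_test_lines = rng.choice(lines, size=int(0.2 * len(lines)), replace=False)
    cv1_test = df["line"].isin(cv1_test_lines)

    # CV2: 20% of line-environment cells are unobserved, but the same line may
    # still be observed in other environments of the training set.
    cv2_test = np.zeros(len(df), dtype=bool)
    cv2_test[rng.choice(len(df), size=int(0.2 * len(df)), replace=False)] = True

    print("CV1 train/test cells:", (~cv1_test).sum(), cv1_test.sum())
    print("CV2 test lines also seen in training:",
          df.loc[cv2_test, "line"].isin(df.loc[~cv2_test, "line"]).mean())
    ```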

  15. ProSelection: A Novel Algorithm to Select Proper Protein Structure Subsets for in Silico Target Identification and Drug Discovery Research.

    PubMed

    Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun

    2017-11-27

    Molecular docking is widely applied to computer-aided drug design and has become relatively mature in the recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). By the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, the structure which has a good statistical performance of distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both t test and Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Food and Drug Administration approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
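
    The selection logic can be sketched as follows for a single structure (synthetic docking scores; the significance and AUC cutoffs are hypothetical, and Youden's J is just one of several ways to derive an SDA-style threshold from the ROC curve).

    ```python
    # Illustrative sketch of strong/weak selector classification for one structure.
    import numpy as np
    from scipy.stats import mannwhitneyu, shapiro, ttest_ind
    from sklearn.metrics import auc, roc_curve

    rng = np.random.default_rng(0)
    active_scores = rng.normal(7.5, 1.0, 80)      # hypothetical docking scores
    inactive_scores = rng.normal(5.5, 1.2, 200)

    # Choose the test based on normality of the score distributions, as described above.
    normal = shapiro(active_scores)[1] > 0.05 and shapiro(inactive_scores)[1] > 0.05
    if normal:
        stat, p = ttest_ind(active_scores, inactive_scores, equal_var=False)
    else:
        stat, p = mannwhitneyu(active_scores, inactive_scores, alternative="greater")

    scores = np.concatenate([active_scores, inactive_scores])
    labels = np.concatenate([np.ones_like(active_scores), np.zeros_like(inactive_scores)])
    fpr, tpr, thresholds = roc_curve(labels, scores)
    sda = thresholds[np.argmax(tpr - fpr)]        # Youden's J as one possible cutoff rule

    strong_selector = p < 0.05 and auc(fpr, tpr) > 0.7   # hypothetical criteria
    print(f"p = {p:.3g}, AUC = {auc(fpr, tpr):.2f}, SDA = {sda:.2f}, strong = {strong_selector}")
    ```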

  16. Infliximab for chronic cutaneous sarcoidosis: a subset analysis from a double-blind randomized clinical trial.

    PubMed

    Baughman, Robert P; Judson, Marc A; Lower, Elyse E; Drent, Marjolein; Costabel, Ulrich; Flavin, Susan; Lo, Kim Hung; Barnathan, Elliot S

    2016-01-15

    Limited evidence exists demonstrating an effective treatment for chronic cutaneous sarcoidosis. Our objective was to determine infliximab's effectiveness in this condition. We conducted a subset analysis from a randomized, double-blind, placebo-controlled trial of infliximab for chronic pulmonary sarcoidosis. Patients with chronic cutaneous sarcoidosis received infliximab (3 or 5 mg/kg) or placebo over 24 weeks. Of 138 patients, the subset analysis evaluated 17 patients with chronic facial and another 9 patients with nonfacial skin involvement. The Sarcoidosis Activity and Severity Index (SASI) evaluated lesions for degree of erythema, desquamation, induration, and percentage of area involved. Facial and nonfacial lesions were scored in a blinded manner. Among 5 placebo-treated and 12 infliximab-treated patients, an improvement was observed with infliximab versus placebo in change from baseline to weeks 12 and 24 in desquamation (P<0.005) and induration (P<0.01) at week 24. Erythema, percentage of area involved, and the evaluation of paired photographs did not reveal significant differences. Limitations include the small sample size, more extensive disease in placebo patients, chronic therapy upon enrollment, the lung as the primary organ of sarcoidosis involvement, and limited investigator experience with the SASI. Infliximab appears to be a beneficial treatment for chronic cutaneous sarcoidosis. The SASI scoring system demonstrated significant improvement versus placebo in lesion desquamation and induration.

  17. Delineation of soil temperature regimes from HCMM data

    NASA Technical Reports Server (NTRS)

    Day, R. L.; Petersen, G. W. (Principal Investigator)

    1981-01-01

    Supplementary data including photographs as well as topographic, geologic, and soil maps were obtained and evaluated for ground truth purposes and control point selection. A study area (approximately 450 by 450 pixels) was subset from LANDSAT scene No. 2477-17142. Geometric corrections and scaling were performed. Initial enhancement techniques were applied to aid control point selection and soils interpretation. The SUBSET program was modified to read HCMM tapes and HCMM data were reformatted so that they are compatible with the ORSER system. Initial NMAP products of geometrically corrected and scaled raw data tapes (unregistered) of the study area were produced.

  18. Decision Aids for Airborne Intercept Operations in Advanced Aircrafts

    NASA Technical Reports Server (NTRS)

    Madni, A.; Freedy, A.

    1981-01-01

    A tactical decision aid (TDA) for the F-14 aircrew, i.e., the naval flight officer and pilot, in conducting a multitarget attack during the performance of a Combat Air Patrol (CAP) role is presented. The TDA employs hierarchical multiattribute utility models for characterizing mission objectives in operationally measurable terms, rule-based AI models for tactical posture selection, and fast-time simulation for maneuver consequence prediction. The TDA makes aspect maneuver recommendations, selects and displays the optimum mission posture, evaluates attackable and potentially attackable subsets, and recommends the 'best' attackable subset along with the required course perturbation.

  19. Molecular descriptor subset selection in theoretical peptide quantitative structure-retention relationship model development using nature-inspired optimization algorithms.

    PubMed

    Žuvela, Petar; Liu, J Jay; Macur, Katarzyna; Bączek, Tomasz

    2015-10-06

    In this work, performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for development of quantitative structure-retention relationship (QSRR) models for 83 peptides that originate from eight model proteins. The matrix with 423 descriptors was used as input, and QSRR models based on selected descriptors were built using partial least squares (PLS), whereas root mean square error of prediction (RMSEP) was used as a fitness function for their selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, whereas GA is superior because of its lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was validated initially through Y-randomization. In addition, it was successfully validated with an external test set of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR exhibited strong robustness. All the sources of the model's error were identified, thus allowing for further application of the developed methodology in proteomics.
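
    A minimal GA-PLS sketch of the descriptor-selection idea, with synthetic peptides and descriptors and deliberately simple operators (truncation selection, one-point crossover, bit-flip mutation); as in the study, RMSEP on held-out samples serves as the fitness function.

    ```python
    # Minimal GA-PLS sketch (synthetic data, simplified operators; not the authors' code).
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(83, 120))                     # 83 peptides, 120 toy descriptors
    y = X[:, :5].sum(axis=1) + rng.normal(0, 0.5, 83)  # toy retention times
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    def rmsep(mask):
        if mask.sum() < 2:
            return np.inf
        pls = PLSRegression(n_components=int(min(2, mask.sum()))).fit(X_tr[:, mask], y_tr)
        pred = pls.predict(X_te[:, mask]).ravel()
        return np.sqrt(np.mean((pred - y_te) ** 2))

    pop = rng.random((30, X.shape[1])) < 0.1           # initial population of descriptor masks
    for generation in range(40):
        fitness = np.array([rmsep(ind) for ind in pop])
        parents = pop[np.argsort(fitness)[:10]]        # truncation selection
        children = []
        while len(children) < len(pop):
            a, b = parents[rng.choice(len(parents), 2, replace=False)]
            cut = rng.integers(1, X.shape[1])          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(X.shape[1]) < 0.01     # bit-flip mutation
            children.append(child)
        pop = np.array(children)

    best = pop[np.argmin([rmsep(ind) for ind in pop])]
    print("selected descriptors:", np.flatnonzero(best), "RMSEP:", round(rmsep(best), 3))
    ```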

  20. Automatic learning-based beam angle selection for thoracic IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amit, Guy; Marshall, Andrea; Purdie, Thomas G., E-mail: tom.purdie@rmp.uhn.ca

    Purpose: The treatment of thoracic cancer using external beam radiation requires an optimal selection of the radiation beam directions to ensure effective coverage of the target volume and to avoid unnecessary treatment of normal healthy tissues. Intensity modulated radiation therapy (IMRT) planning is a lengthy process, which requires the planner to iterate between choosing beam angles, specifying dose–volume objectives and executing IMRT optimization. In thorax treatment planning, where there are no class solutions for beam placement, beam angle selection is performed manually, based on the planner’s clinical experience. The purpose of this work is to propose and study a computationally efficient framework that utilizes machine learning to automatically select treatment beam angles. Such a framework may be helpful for reducing the overall planning workload. Methods: The authors introduce an automated beam selection method, based on learning the relationships between beam angles and anatomical features. Using a large set of clinically approved IMRT plans, a random forest regression algorithm is trained to map a multitude of anatomical features into an individual beam score. An optimization scheme is then built to select and adjust the beam angles, considering the learned interbeam dependencies. The validity and quality of the automatically selected beams were evaluated using the manually selected beams from the corresponding clinical plans as the ground truth. Results: The analysis included 149 clinically approved thoracic IMRT plans. For a randomly selected test subset of 27 plans, IMRT plans were generated using automatically selected beams and compared to the clinical plans. The comparison of the predicted and the clinical beam angles demonstrated a good average correspondence between the two (angular distance 16.8° ± 10°, correlation 0.75 ± 0.2). The dose distributions of the semiautomatic and clinical plans were equivalent in terms of primary target volume coverage and organ at risk sparing and were superior to plans produced with fixed sets of common beam angles. The great majority of the automatic plans (93%) were approved as clinically acceptable by three radiation therapy specialists. Conclusions: The results demonstrated the feasibility of utilizing a learning-based approach for automatic selection of beam angles in thoracic IMRT planning. The proposed method may assist in reducing the manual planning workload, while sustaining plan quality.
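
    A hedged sketch of the learning-based idea (random placeholder data; not the authors' implementation): a random forest regressor maps per-angle anatomical features to a beam score, and the highest-scoring candidate angles are kept subject to a simple minimum angular-separation constraint.

    ```python
    # Hedged sketch: learned beam scoring followed by greedy angle selection (toy data).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Training set: anatomical features per (plan, candidate angle) with a beam score
    # derived from clinically approved plans (random placeholders here).
    X_train = rng.normal(size=(5000, 12))
    y_train = rng.random(5000)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

    # New patient: score each candidate gantry angle, then keep the best few angles
    # while enforcing a minimum angular separation between selected beams.
    candidate_angles = np.arange(0, 360, 10)
    features = rng.normal(size=(len(candidate_angles), 12))  # per-angle anatomy features
    scores = model.predict(features)

    selected, min_sep = [], 30
    for angle in candidate_angles[np.argsort(-scores)]:
        if all(min(abs(angle - s), 360 - abs(angle - s)) >= min_sep for s in selected):
            selected.append(int(angle))
        if len(selected) == 5:
            break
    print("selected beam angles:", sorted(selected))
    ```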

  1. Association analysis using USDA diverse rice (Oryza sativa L.) germplasm collections to identify loci influencing grain quality traits

    USDA-ARS?s Scientific Manuscript database

    The USDA rice (Oryza sativa L.) core subset (RCS) was assembled to represent the genetic diversity of the entire USDA-ARS National Small Grains Collection and consists of 1,794 accessions from 114 countries. The USDA rice mini-core (MC) is a subset of 217 accessions from the RCS and was selected to ...

  2. Targeting Stereotyped B Cell Receptors from Chronic Lymphocytic Leukemia Patients with Synthetic Antigen Surrogates.

    PubMed

    Sarkar, Mohosin; Liu, Yun; Qi, Junpeng; Peng, Haiyong; Morimoto, Jumpei; Rader, Christoph; Chiorazzi, Nicholas; Kodadek, Thomas

    2016-04-01

    Chronic lymphocytic leukemia (CLL) is a disease in which a single B-cell clone proliferates relentlessly in peripheral lymphoid organs, bone marrow, and blood. DNA sequencing experiments have shown that about 30% of CLL patients have stereotyped antigen-specific B-cell receptors (BCRs) with a high level of sequence homology in the variable domains of the heavy and light chains. These include many of the most aggressive cases that have IGHV-unmutated BCRs whose sequences have not diverged significantly from the germ line. This suggests a personalized therapy strategy in which a toxin or immune effector function is delivered selectively to the pathogenic B-cells but not to healthy B-cells. To execute this strategy, serum-stable, drug-like compounds able to target the antigen-binding sites of most or all patients in a stereotyped subset are required. We demonstrate here the feasibility of this approach with the discovery of selective, high affinity ligands for CLL BCRs of the aggressive, stereotyped subset 7p that cross-react with the BCRs of several CLL patients in subset 7p, but not with BCRs from patients outside this subset. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Learning what matters: A neural explanation for the sparsity bias.

    PubMed

    Hassall, Cameron D; Connor, Patrick C; Trappenberg, Thomas P; McDonald, John J; Krigolson, Olave E

    2018-05-01

    The visual environment is filled with complex, multi-dimensional objects that vary in their value to an observer's current goals. When faced with multi-dimensional stimuli, humans may rely on biases to learn to select those objects that are most valuable to the task at hand. Here, we show that decision making in a complex task is guided by the sparsity bias: the focusing of attention on a subset of available features. Participants completed a gambling task in which they selected complex stimuli that varied randomly along three dimensions: shape, color, and texture. Each dimension comprised three features (e.g., color: red, green, yellow). Only one dimension was relevant in each block (e.g., color), and a randomly-chosen value ranking determined outcome probabilities (e.g., green > yellow > red). Participants were faster to respond to infrequent probe stimuli that appeared unexpectedly within stimuli that possessed a more valuable feature than to probes appearing within stimuli possessing a less valuable feature. Event-related brain potentials recorded during the task provided a neurophysiological explanation for sparsity as a learning-dependent increase in optimal attentional performance (as measured by the N2pc component of the human event-related potential) and a concomitant learning-dependent decrease in prediction errors (as measured by the feedback-elicited reward positivity). Together, our results suggest that the sparsity bias guides human reinforcement learning in complex environments. Copyright © 2018 Elsevier B.V. All rights reserved.

  4. Planckian Information (Ip): A New Measure of Order in Atoms, Enzymes, Cells, Brains, Human Societies, and the Cosmos

    NASA Astrophysics Data System (ADS)

    Ji, Sungchul

    A new mathematical formula referred to as the Planckian distribution equation (PDE) has been found to fit long-tailed histograms generated in various fields of studies, ranging from atomic physics to single-molecule enzymology, cell biology, brain neurobiology, glottometrics, econophysics, and to cosmology. PDE can be derived from a Gaussian-like equation (GLE) by non-linearly transforming its variable, x, while keeping the y coordinate constant. Assuming that GLE represents a random distribution (due to its symmetry), it is possible to define a binary logarithm of the ratio between the areas under the curves of PDE and GLE as a measure of the non-randomness (or order) underlying the biophysicochemical processes generating long-tailed histograms that fit PDE. This new function has been named the Planckian information, IP, which (i) may be a new measure of order that can be applied widely to both natural and human sciences and (ii) can serve as the opposite of the Boltzmann-Gibbs entropy, S, which is a measure of disorder. The possible rationales for the universality of PDE may include (i) the universality of the wave-particle duality embedded in PDE, (ii) the selection of subsets of random processes (thereby breaking the symmetry of GLE) as the basic mechanism of generating order, organization, and function, and (iii) the quantity-quality complementarity as the connection between PDE and Peircean semiotics.

  5. Lunar crater arcs. [origins, distribution and age classification of Pre-Imbrian families

    NASA Technical Reports Server (NTRS)

    Jaffe, L. D.; Bulkley, E. O.

    1976-01-01

    An analysis has been made of the tendency of large lunar craters to lie along circles. A catalog of the craters at least 50 km in diameter was prepared first, noting position, diameter, rim sharpness and completion, nature of underlying surface, and geological age. The subset of those craters 50-400 km in diameter was then used as input to computer programs which identified each 'family' of four or more craters of selected geological age lying on a circular arc. For comparison, families were also identified for randomized crater models in which the crater spatial density was matched to that on the moon, either overall or separately for mare and highland areas. The observed frequency of lunar arcuate families was statistically highly significantly greater than for the randomized models, for craters classified as either late-pre-Imbrian (Nectarian), middle pre-Imbrian, or early pre-Imbrian, as well as for a number of larger age-classes. The lunar families tend to center in specific areas of the moon; these lie in highlands rather than maria and are different for families of Nectarian craters than for pre-Nectarian. The origin of the arcuate crater groupings is not understood.

  6. Automatic detection of atrial fibrillation in cardiac vibration signals.

    PubMed

    Brueser, C; Diesel, J; Zink, M D H; Winter, S; Schauerte, P; Leonhardt, S

    2013-01-01

    We present a study on the feasibility of the automatic detection of atrial fibrillation (AF) from cardiac vibration signals (ballistocardiograms/BCGs) recorded by unobtrusive bed-mounted sensors. The proposed system is intended as a screening and monitoring tool in home-healthcare applications and not as a replacement for ECG-based methods used in clinical environments. Based on BCG data recorded in a study with 10 AF patients, we evaluate and rank seven popular machine learning algorithms (naive Bayes, linear and quadratic discriminant analysis, support vector machines, random forests as well as bagged and boosted trees) for their performance in separating 30 s long BCG epochs into one of three classes: sinus rhythm, atrial fibrillation, and artifact. For each algorithm, feature subsets of a set of statistical time-frequency-domain and time-domain features were selected based on the mutual information between features and class labels as well as first- and second-order interactions among features. The classifiers were evaluated on a set of 856 epochs by means of 10-fold cross-validation. The best algorithm (random forests) achieved a Matthews correlation coefficient, mean sensitivity, and mean specificity of 0.921, 0.938, and 0.982, respectively.
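
    The evaluation loop can be sketched roughly as below, with synthetic features standing in for the BCG epochs: mutual-information feature ranking, a random forest, and 10-fold cross-validation scored with the Matthews correlation coefficient.

    ```python
    # Rough sketch of the evaluation protocol (synthetic stand-in for the BCG features).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.metrics import make_scorer, matthews_corrcoef
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline

    # 856 epochs, 3 classes (sinus rhythm / AF / artifact), toy features
    X, y = make_classification(n_samples=856, n_features=40, n_informative=12,
                               n_classes=3, n_clusters_per_class=1, random_state=0)

    clf = make_pipeline(SelectKBest(mutual_info_classif, k=15),
                        RandomForestClassifier(n_estimators=300, random_state=0))
    mcc = cross_val_score(clf, X, y,
                          cv=StratifiedKFold(10, shuffle=True, random_state=0),
                          scoring=make_scorer(matthews_corrcoef))
    print("MCC per fold:", np.round(mcc, 3), "mean:", round(mcc.mean(), 3))
    ```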

  7. Evaluation of the effect of selective serotonin-reuptake inhibitors on lymphocyte subsets in patients with a major depressive disorder.

    PubMed

    Hernandez, Maria Eugenia; Martinez-Fong, Daniel; Perez-Tapia, Mayra; Estrada-Garcia, Iris; Estrada-Parra, Sergio; Pavón, Lenin

    2010-02-01

    To date, only the effect of a short-term antidepressant treatment (<12 weeks) on neuroendocrinoimmune alterations in patients with a major depressive disorder has been evaluated. Our objective was to determine the effect of a 52-week long treatment with selective serotonin-reuptake inhibitors on lymphocyte subsets. The participants were thirty-one patients and twenty-two healthy volunteers. The final number of patients (10) resulted from selection and course, as detailed in the enrollment scheme. Methods used to psychiatrically analyze the participants included the Mini-International Neuropsychiatric Interview, Hamilton Depression Scale and Beck Depression Inventory. The peripheral lymphocyte subsets were measured in peripheral blood using flow cytometry. Before treatment, increased counts of natural killer (NK) cells in patients were statistically significant when compared with those of healthy volunteers (312+/-29 versus 158+/-30; cells/mL), but no differences in the populations of T and B cells were found. The patients showed remission of depressive episodes after 20 weeks of treatment along with an increase in NK cell and B cell populations, which remained increased until the end of the study. At the 52nd week of treatment, patients showed an increase in the counts of NK cells (396+/-101 cells/mL) and B cells (268+/-64 cells/mL) compared to healthy volunteers (NK, 159+/-30 cells/mL; B cells, 179+/-37 cells/mL). We conclude that long-term treatment with selective serotonin-reuptake inhibitors not only causes remission of depressive symptoms, but also affects lymphocyte subset populations. The physiopathological consequence of these changes remains to be determined.

  8. Updating estimates of low streamflow statistics to account for possible trends

    NASA Astrophysics Data System (ADS)

    Blum, A. G.; Archfield, S. A.; Hirsch, R. M.; Vogel, R. M.; Kiang, J. E.; Dudley, R. W.

    2017-12-01

    Given evidence of both increasing and decreasing trends in low flows in many streams, methods are needed to update estimators of low flow statistics used in water resources management. One such metric is the 10-year annual low-flow statistic (7Q10), calculated as the annual minimum seven-day streamflow which is exceeded in nine out of ten years on average. Historical streamflow records may not be representative of current conditions at a site if environmental conditions are changing. We present a new approach to frequency estimation under nonstationary conditions that applies a stationary nonparametric quantile estimator to a subset of the annual minimum flow record. Monte Carlo simulation experiments were used to evaluate this approach across a range of trend and no trend scenarios. Relative to the standard practice of using the entire available streamflow record, use of a nonparametric quantile estimator combined with selection of the most recent 30 or 50 years for 7Q10 estimation was found to improve accuracy and reduce bias. Benefits of data subset selection approaches were greater for higher magnitude trends and for annual minimum flow records with lower coefficients of variation. A nonparametric trend test approach for subset selection did not significantly improve upon always selecting the last 30 years of record. At 174 stream gages in the Chesapeake Bay region, 7Q10 estimators based on the most recent 30 years of flow record were compared to estimators based on the entire period of record. Given the availability of long records of low streamflow, a subset of the flow record (approximately 30 years) can be used to update 7Q10 estimators to better reflect current streamflow conditions.
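
    A minimal sketch of the estimator on a synthetic daily-flow record: compute annual 7-day minimum flows, then take the nonparametric 10% quantile of only the most recent 30 annual minima and compare it with the full-record estimate.

    ```python
    # Minimal sketch: 7Q10 from a recent subset of annual 7-day minimum flows (toy record).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    dates = pd.date_range("1950-01-01", "2016-12-31", freq="D")
    # toy daily flows with a mild downward trend in the later decades
    flow = rng.gamma(shape=2.0, scale=10.0, size=len(dates)) * \
           np.linspace(1.0, 0.8, len(dates))
    q = pd.Series(flow, index=dates)

    seven_day = q.rolling(7).mean()
    annual_min = seven_day.groupby(seven_day.index.year).min().dropna()

    def q7_10(annual_minima):
        # nonparametric 10% nonexceedance quantile of the annual 7-day minima
        return np.quantile(annual_minima, 0.10)

    print("7Q10, full record      :", round(q7_10(annual_min), 2))
    print("7Q10, most recent 30 yr:", round(q7_10(annual_min.tail(30)), 2))
    ```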

  9. Collectively loading an application in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
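
    The pattern can be sketched with mpi4py (a hedged illustration, not the system's code; the application path is a placeholder): one rank in the job's subset acts as leader, reads the application image, and broadcasts it to the rest of the subset.

    ```python
    # Hedged mpi4py sketch of leader-based collective application loading.
    from mpi4py import MPI

    world = MPI.COMM_WORLD

    # Hypothetical subset of compute nodes assigned to this job.
    job_ranks = [r for r in range(world.Get_size()) if r % 2 == 0]
    job_comm = world.Create(world.Get_group().Incl(job_ranks))

    if job_comm != MPI.COMM_NULL:
        leader = 0                                    # lowest rank in the subset leads
        payload = None
        if job_comm.Get_rank() == leader:
            with open("application.bin", "rb") as f:  # placeholder path; leader alone reads storage
                payload = f.read()
        payload = job_comm.bcast(payload, root=leader)
        # ...each rank in the subset would now load/execute the broadcast application...
        print(f"rank {world.Get_rank()} received {len(payload)} bytes")
    ```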

  10. Non-native Speech Perception Training Using Vowel Subsets: Effects of Vowels in Sets and Order of Training

    PubMed Central

    Nishi, Kanae; Kewley-Port, Diane

    2008-01-01

    Purpose Nishi and Kewley-Port (2007) trained Japanese listeners to perceive nine American English monophthongs and showed that a protocol using all nine vowels (fullset) produced better results than the one using only the three more difficult vowels (subset). The present study extended the target population to Koreans and examined whether protocols combining the two stimulus sets would provide more effective training. Method Three groups of five Korean listeners were trained on American English vowels for nine days using one of the three protocols: fullset only, first three days on subset then six days on fullset, or first six days on fullset then three days on subset. Participants' performance was assessed by pre- and post-training tests, as well as by a mid-training test. Results 1) Fullset training was also effective for Koreans; 2) no advantage was found for the two combined protocols over the fullset only protocol, and 3) sustained “non-improvement” was observed for training using one of the combined protocols. Conclusions In using subsets for training American English vowels, care should be taken not only in the selection of subset vowels, but also for the training orders of subsets. PMID:18664694

  11. Whole grain intake, determined by dietary records and plasma alkylresorcinol concentrations, is low among pregnant women in Singapore.

    PubMed

    Ross, Alastair B; Colega, Marjorelee T; Lim, Ai Lin; Silva-Zolezzi, Irma; Macé, Katherine; Saw, Seang Mei; Kwek, Kenneth; Gluckman, Peter; Godfrey, Keith M; Chong, Yap-Seng; Chong, Mary F F

    2015-01-01

    To quantify whole grain intake in pregnant women in Singapore in order to provide the first detailed analysis of whole grain intake in an Asian country and in pregnant women. Analysis of 24-h diet recalls in a cross-sectional cohort study and analysis of a biomarker of whole grain intake (plasma alkylresorcinols) in a subset of subjects. The Growing Up in Singapore Towards healthy Outcomes mother-offspring cohort study based in Singapore. 998 pregnant mothers with complete 24-h recalls taken during their 26-28th week of gestation. Plasma samples from a randomly selected subset of 100 subjects were analysed for plasma alkylresorcinols. Median (IQR) whole grain intake for the cohort and the 30% who reported eating whole grains were 0 (IQR 0, 9) and 23.6 (IQR 14.6, 44.2) g/day respectively. Plasma alkylresorcinol concentrations were very low [median (IQR)=9 (3, 15) nmol/L], suggesting low intake of whole grain wheat in this population. Plasma alkylresorcinols were correlated with whole grain wheat intake (Spearman's r=0.35; p<0.01). Whole grain intake among pregnant mothers in Singapore was well below the 2-3 (60-95 g) servings of whole grains per day recommended by the Singapore Health Promotion Board. Efforts to increase whole grain intake should be supported to encourage people to choose whole grains over refined grains in their diet.

  12. Rough sets and Laplacian score based cost-sensitive feature selection

    PubMed Central

    Yu, Shenglong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms. PMID:29912884

  13. Rough sets and Laplacian score based cost-sensitive feature selection.

    PubMed

    Yu, Shenglong; Zhao, Hong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
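
    The Laplacian-score half of the method can be sketched as below (the rough-set component is omitted); features that best preserve local neighborhood structure receive lower scores, which can then be traded off against hypothetical per-feature costs.

    ```python
    # Sketch of Laplacian score computation plus a toy cost trade-off.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.neighbors import kneighbors_graph

    X = load_iris().data
    n, d = X.shape

    # k-NN affinity graph with heat-kernel weights
    W = kneighbors_graph(X, n_neighbors=5, mode="distance").toarray()
    W = np.where(W > 0, np.exp(-W**2 / 1.0), 0.0)
    W = np.maximum(W, W.T)                       # symmetrize
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian

    ones = np.ones(n)
    scores = []
    for j in range(d):
        f = X[:, j]
        f_tilde = f - (f @ D @ ones) / (ones @ D @ ones) * ones   # remove trivial component
        scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ D @ f_tilde))

    costs = np.array([1.0, 3.0, 1.0, 2.0])       # hypothetical per-feature measurement costs
    print("Laplacian scores:", np.round(scores, 4))
    print("cheapest low-score feature:", int(np.argmin(np.array(scores) * costs)))
    ```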

  14. 2008 Niday Perinatal Database quality audit: report of a quality assurance project.

    PubMed

    Dunn, S; Bottomley, J; Ali, A; Walker, M

    2011-12-01

    This quality assurance project was designed to determine the reliability, completeness and comprehensiveness of the data entered into Niday Perinatal Database. Quality of the data was measured by comparing data re-abstracted from the patient record to the original data entered into the Niday Perinatal Database. A representative sample of hospitals in Ontario was selected and a random sample of 100 linked mother and newborn charts were audited for each site. A subset of 33 variables (representing 96 data fields) from the Niday dataset was chosen for re-abstraction. Of the data fields for which Cohen's kappa statistic or intraclass correlation coefficient (ICC) was calculated, 44% showed substantial or almost perfect agreement (beyond chance). However, about 17% showed less than 95% agreement and a kappa or ICC value of less than 60% indicating only slight, fair or moderate agreement (beyond chance). Recommendations to improve the quality of these data fields are presented.

  15. Practical characterization of quantum devices without tomography

    NASA Astrophysics Data System (ADS)

    Landon-Cardinal, Olivier; Flammia, Steven; Silva, Marcus; Liu, Yi-Kai; Poulin, David

    2012-02-01

    Quantum tomography is the main method used to assess the quality of quantum information processing devices, but its complexity presents a major obstacle for the characterization of even moderately large systems. Part of the reason for this complexity is that tomography generates much more information than is usually sought. Taking a more targeted approach, we develop schemes that enable (i) estimating the fidelity of an experiment to a theoretical ideal description, (ii) learning which description within a reduced subset best matches the experimental data. Both these approaches yield a significant reduction in resources compared to tomography. In particular, we show how to estimate the fidelity between a predicted pure state and an arbitrary experimental state using only a constant number of Pauli expectation values selected at random according to an importance-weighting rule. In addition, we propose methods for certifying quantum circuits and learning continuous-time quantum dynamics that are described by local Hamiltonians or Lindbladians.
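
    A toy numerical sketch of the importance-weighted Pauli sampling idea for two qubits (the "experimental" state here is just a depolarized copy of the target, so the estimate can be checked against the exact fidelity):

    ```python
    # Toy sketch of direct fidelity estimation via importance-sampled Pauli expectations.
    import itertools
    import numpy as np

    I2 = np.eye(2); X = np.array([[0, 1], [1, 0]])
    Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1, -1])
    paulis = [I2, X, Y, Z]

    rng = np.random.default_rng(0)
    n = 2; dim = 2 ** n

    # target pure state: |psi> = (|00> + |11>)/sqrt(2)
    psi = np.zeros(dim); psi[0] = psi[3] = 1 / np.sqrt(2)
    rho_ideal = np.outer(psi, psi.conj())
    rho_exp = 0.9 * rho_ideal + 0.1 * np.eye(dim) / dim      # noisy "lab" state

    # characteristic functions chi(k) = Tr(rho W_k)/sqrt(dim) over all Pauli strings
    W = [np.kron(a, b) for a, b in itertools.product(paulis, repeat=n)]
    chi_ideal = np.array([np.real(np.trace(rho_ideal @ w)) / np.sqrt(dim) for w in W])
    chi_exp = np.array([np.real(np.trace(rho_exp @ w)) / np.sqrt(dim) for w in W])

    # importance sampling: draw Pauli indices with probability chi_ideal(k)^2
    probs = chi_ideal ** 2
    samples = rng.choice(len(W), size=50, p=probs)
    fidelity_estimate = np.mean(chi_exp[samples] / chi_ideal[samples])

    print("estimated fidelity:", round(fidelity_estimate, 4))
    print("exact fidelity    :", round(float(np.real(psi.conj() @ rho_exp @ psi)), 4))
    ```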

  16. Ultrastructural and functional fate of recycled vesicles in hippocampal synapses

    PubMed Central

    Rey, Stephanie A.; Smith, Catherine A.; Fowler, Milena W.; Crawford, Freya; Burden, Jemima J.; Staras, Kevin

    2015-01-01

    Efficient recycling of synaptic vesicles is thought to be critical for sustained information transfer at central terminals. However, the specific contribution that retrieved vesicles make to future transmission events remains unclear. Here we exploit fluorescence and time-stamped electron microscopy to track the functional and positional fate of vesicles endocytosed after readily releasable pool (RRP) stimulation in rat hippocampal synapses. We show that most vesicles are recovered near the active zone but subsequently take up random positions in the cluster, without preferential bias for future use. These vesicles non-selectively queue, advancing towards the release site with further stimulation in an actin-dependent manner. Nonetheless, the small subset of vesicles retrieved recently in the stimulus train persist nearer the active zone and exhibit more privileged use in the next RRP. Our findings reveal heterogeneity in vesicle fate based on nanoscale position and timing rules, providing new insights into the origins of future pool constitution. PMID:26292808

  17. Hematology and biochemistry reference intervals for Ontario commercial nursing pigs close to the time of weaning

    PubMed Central

    Perri, Amanda M.; O’Sullivan, Terri L.; Harding, John C.S.; Wood, R. Darren; Friendship, Robert M.

    2017-01-01

    The evaluation of pig hematology and biochemistry parameters is rarely done largely due to the costs associated with laboratory testing and labor, and the limited availability of reference intervals needed for interpretation. Within-herd and between-herd biological variation of these values also make it difficult to establish reference intervals. Regardless, baseline reference intervals are important to aid veterinarians in the interpretation of blood parameters for the diagnosis and treatment of diseased swine. The objective of this research was to provide reference intervals for hematology and biochemistry parameters of 3-week-old commercial nursing piglets in Ontario. A total of 1032 pigs lacking clinical signs of disease from 20 swine farms were sampled for hematology and iron panel evaluation, with biochemistry analysis performed on a subset of 189 randomly selected pigs. The 95% reference interval, mean, median, range, and 90% confidence intervals were calculated for each parameter. PMID:28373729

  18. Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor

    PubMed Central

    Alamedine, D.; Khalil, M.; Marque, C.

    2013-01-01

    Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536
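
    Of the three approaches, sequential forward selection is the simplest to sketch; the snippet below wraps scikit-learn's SequentialFeatureSelector around a linear SVM on synthetic stand-ins for the EHG contraction features.

    ```python
    # Sketch of sequential forward selection for contraction classification (toy data).
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                               random_state=0)   # labor vs pregnancy contractions (toy)

    svm = SVC(kernel="linear")
    sfs = SequentialFeatureSelector(svm, n_features_to_select=6,
                                    direction="forward", cv=5)
    sfs.fit(X, y)
    selected = sfs.get_support(indices=True)
    score = cross_val_score(svm, X[:, selected], y, cv=5).mean()
    print("selected features:", selected, "CV accuracy:", round(score, 3))
    ```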

  19. HIV-1 protease cleavage site prediction based on two-stage feature selection method.

    PubMed

    Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong

    2013-03-01

    Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with Genetic Algorithms method. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.

  20. Mediators of effects of a selective family-focused violence prevention approach for middle school students.

    PubMed

    2012-02-01

    This study examined how parenting and family characteristics targeted in a selective prevention program mediated effects on key youth proximal outcomes related to violence perpetration. The selective intervention was evaluated within the context of a multi-site trial involving random assignment of 37 schools to four conditions: a universal intervention composed of a student social-cognitive curriculum and teacher training, a selective family-focused intervention with a subset of high-risk students, a condition combining these two interventions, and a no-intervention control condition. Two cohorts of sixth-grade students (total N = 1,062) exhibiting high levels of aggression and social influence were the sample for this study. Analyses of pre-post change compared to controls using intent-to-treat analyses found no significant effects. However, estimates incorporating participation of those assigned to the intervention and predicted participation among those not assigned revealed significant positive effects on student aggression, use of aggressive strategies for conflict management, and parental estimation of student's valuing of achievement. Findings also indicated intervention effects on two targeted family processes: discipline practices and family cohesion. Mediation analyses found evidence that change in these processes mediated effects on some outcomes, notably aggressive behavior and valuing of school achievement. Results support the notion that changing parenting practices and the quality of family relationships can prevent the escalation in aggression and maintain positive school engagement for high-risk youth.

  1. Selective principal component regression analysis of fluorescence hyperspectral image to assess aflatoxin contamination in corn

    USDA-ARS?s Scientific Manuscript database

    Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...

  2. Marginal semi-supervised sub-manifold projections with informative constraints for dimensionality reduction and recognition.

    PubMed

    Zhang, Zhao; Zhao, Mingbo; Chow, Tommy W S

    2012-12-01

    In this work, sub-manifold projections based semi-supervised dimensionality reduction (DR) problem learning from partial constrained data is discussed. Two semi-supervised DR algorithms termed Marginal Semi-Supervised Sub-Manifold Projections (MS³MP) and orthogonal MS³MP (OMS³MP) are proposed. MS³MP in the singular case is also discussed. We also present the weighted least squares view of MS³MP. Based on specifying the types of neighborhoods with pairwise constraints (PC) and the defined manifold scatters, our methods can preserve the local properties of all points and discriminant structures embedded in the localized PC. The sub-manifolds of different classes can also be separated. In PC guided methods, exploring and selecting the informative constraints is challenging and random constraint subsets significantly affect the performance of algorithms. This paper also introduces an effective technique to select the informative constraints for DR with consistent constraints. The analytic form of the projection axes can be obtained by eigen-decomposition. The connections between this work and other related work are also elaborated. The validity of the proposed constraint selection approach and DR algorithms are evaluated by benchmark problems. Extensive simulations show that our algorithms can deliver promising results over some widely used state-of-the-art semi-supervised DR techniques. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. Selecting the optimum plot size for a California design-based stream and wetland mapping program.

    PubMed

    Lackey, Leila G; Stein, Eric D

    2014-04-01

    Accurate estimates of the extent and distribution of wetlands and streams are the foundation of wetland monitoring, management, restoration, and regulatory programs. Traditionally, these estimates have relied on comprehensive mapping. However, this approach is prohibitively resource-intensive over large areas, making it both impractical and statistically unreliable. Probabilistic (design-based) approaches to evaluating status and trends provide a more cost-effective alternative because, compared with comprehensive mapping, overall extent is inferred from mapping a statistically representative, randomly selected subset of the target area. In this type of design, the size of sample plots has a significant impact on program costs and on statistical precision and accuracy; however, no consensus exists on the appropriate plot size for remote monitoring of stream and wetland extent. This study utilized simulated sampling to assess the performance of four plot sizes (1, 4, 9, and 16 km²) for three geographic regions of California. Simulation results showed smaller plot sizes (1 and 4 km²) were most efficient for achieving desired levels of statistical accuracy and precision. However, larger plot sizes were more likely to contain rare and spatially limited wetland subtypes. Balancing these considerations led to selection of 4 km² for the California status and trends program.
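
    The design-based logic can be sketched as follows with synthetic numbers: total stream length in a region is expanded from the mean length mapped in a random sample of plots, with a standard error that shrinks as more plots are mapped (the finite-population correction is ignored for brevity).

    ```python
    # Sketch of design-based extent estimation from randomly selected plots (toy numbers).
    import numpy as np

    rng = np.random.default_rng(0)

    region_area_km2 = 10_000
    plot_area_km2 = 4                                # one of the candidate plot sizes
    n_plots_total = region_area_km2 // plot_area_km2

    # hypothetical "true" mapped stream length (km) in every possible plot
    true_density = rng.gamma(shape=1.5, scale=2.0, size=n_plots_total)

    # map a random subset of plots and expand to the whole region
    n_sampled = 100
    sample = rng.choice(true_density, size=n_sampled, replace=False)
    estimate = sample.mean() * n_plots_total
    se = sample.std(ddof=1) / np.sqrt(n_sampled) * n_plots_total

    print(f"estimated total stream length: {estimate:,.0f} km +/- {1.96 * se:,.0f} km (95% CI)")
    print(f"true total                   : {true_density.sum():,.0f} km")
    ```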

  4. An efficient voting algorithm for finding additive biclusters with random background.

    PubMed

    Xiao, Jing; Wang, Lusheng; Liu, Xiaowen; Jiang, Tao

    2008-12-01

    The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an n × m matrix A (n ≥ m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of A) is optimized. The problem has been proved or conjectured to be NP-hard for various objective functions. In this article, we study a probabilistic model for the implanted additive bicluster problem, where each element in the n × m background matrix is a random integer from [0, L - 1] for some integer L, and a k × k implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L - 1] with probability θ. We propose an O(n²m) time algorithm based on voting to solve the problem. We show that when k ≥ Ω(√(n log n)), the voting algorithm can correctly find the implanted bicluster with probability at least 1 - 9/n². We also implement our algorithm as a C++ program named VOTE. The implementation incorporates several ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with overlapping implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with a high accuracy and speed.
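
    A greatly simplified sketch of the voting idea (not the VOTE program; it assumes the bicluster size k and one member row are known): member rows differ from each other by a constant on the bicluster columns, so each row that shows such a constant difference against the seed row votes for those columns.

    ```python
    # Greatly simplified voting sketch for recovering an implanted additive bicluster.
    import numpy as np

    rng = np.random.default_rng(0)
    L, n, m, k = 10, 60, 40, 15
    A = rng.integers(0, L, size=(n, m))                 # random background matrix

    rows = rng.choice(n, k, replace=False)              # implant an additive bicluster
    cols = rng.choice(m, k, replace=False)
    base = rng.integers(0, L, size=k)
    for offset, r in zip(rng.integers(0, L, size=k), rows):
        A[r, cols] = (base + offset) % L

    seed = rows[0]                                      # assume one member row is known
    col_votes = np.zeros(m)
    for r in range(n):
        if r == seed:
            continue
        diff = (A[r] - A[seed]) % L
        values, counts = np.unique(diff, return_counts=True)
        if counts.max() >= k:                           # row looks like a member: it votes
            col_votes[diff == values[np.argmax(counts)]] += 1

    found_cols = np.argsort(-col_votes)[:k]
    print("recovered columns:", np.intersect1d(found_cols, cols).size, "of", k)
    ```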

  5. Decomposition of complex microbial behaviors into resource-based stress responses

    PubMed Central

    Carlson, Ross P.

    2009-01-01

    Motivation: Highly redundant metabolic networks and experimental data from cultures likely adapting simultaneously to multiple stresses can complicate the analysis of cellular behaviors. It is proposed that the explicit consideration of these factors is critical to understanding the competitive basis of microbial strategies. Results: Wide ranging, seemingly unrelated Escherichia coli physiological fluxes can be simply and accurately described as linear combinations of a few ecologically relevant stress adaptations. These strategies were identified by decomposing the central metabolism of E. coli into elementary modes (mathematically defined biochemical pathways) and assessing the resource investment cost–benefit properties for each pathway. The approach capitalizes on the inherent tradeoffs related to investing finite resources like nitrogen into different pathway enzymes when the pathways have varying metabolic efficiencies. The subset of ecologically competitive pathways represented 0.02% of the total permissible pathways. The biological relevance of the assembled strategies was tested against 10 000 randomly constructed pathway subsets. None of the randomly assembled collections were able to describe all of the considered experimental data as accurately as the cost-based subset. The results suggest these metabolic strategies are biologically significant. The current descriptions were compared with linear programming (LP)-based flux descriptions using the Euclidean distance metric. The current study's pathway subset described the experimental fluxes with better accuracy than the LP results without having to test multiple objective functions or constraints and while providing additional ecological insight into microbial behavior. The assembled pathways seem to represent a generalized set of strategies that can describe a wide range of microbial responses and hint at evolutionary processes where a handful of successful metabolic strategies are utilized simultaneously in different combinations to adapt to diverse conditions. PMID:19008248

  6. What MISR data are available for field experiments?

    Atmospheric Science Data Center

    2014-12-08

    MISR data and imagery are available for many field campaigns. Select data products are subset for the region and dates of interest. Special gridded regional products may be available as well as Local Mode data for select targets...

  7. Post-traumatic stress disorder in older adults: a systematic review of the psychotherapy treatment literature.

    PubMed

    Dinnen, Stephanie; Simiola, Vanessa; Cook, Joan M

    2015-01-01

    Older adults represent the fastest growing segment of the US and industrialized populations. However, older adults have generally not been included in randomized clinical trials of psychotherapy for post-traumatic stress disorder (PTSD). This review examined reports of psychological treatment for trauma-related problems, primarily PTSD, in studies with samples of at least 50% adults aged 55 and older using standardized measures. A systematic review of the literature was conducted on psychotherapy for PTSD with older adults using PubMed, Medline, PsychInfo, CINAHL, PILOTS, and Google Scholar. A total of 42 studies were retrieved for full review; 22 were excluded because they did not provide at least one outcome measure or results were not reported by age in the case of mixed-age samples. Of the 20 studies that met review criteria, there were: 13 case studies or series, three uncontrolled pilot studies, two randomized clinical trials, one non-randomized concurrent control study and one post hoc effectiveness study. Significant methodological limitations in the current older adult PTSD treatment outcome literature were found reducing its internal validity and generalizability, including non-randomized research designs, lack of comparison conditions and small sample sizes. Select evidence-based interventions validated in younger and middle-aged populations appear acceptable and efficacious with older adults. There are few treatment studies on subsets of the older adult population including cultural and ethnic minorities, women, the oldest old (over 85), and those who are cognitively impaired. Implications for clinical practice and future research directions are discussed.

  8. Combined prolonged exposure therapy and paroxetine for PTSD related to the World Trade Center attack: a randomized controlled trial.

    PubMed

    Schneier, Franklin R; Neria, Yuval; Pavlicova, Martina; Hembree, Elizabeth; Suh, Eun Jung; Amsel, Lawrence; Marshall, Randall D

    2012-01-01

    Selective serotonin reuptake inhibitors (SSRIs) are often recommended in combination with established cognitive-behavioral therapies (CBTs) for posttraumatic stress disorder (PTSD), but combined initial treatment of PTSD has not been studied under controlled conditions. There are also few studies of either SSRIs or CBT in treating PTSD related to terrorism. The authors compared prolonged exposure therapy (a CBT) plus paroxetine (an SSRI) with prolonged exposure plus placebo in the treatment of terrorism-related PTSD. Adult survivors of the World Trade Center attack of September 11, 2001, with PTSD were randomly assigned to 10 weeks of treatment with prolonged exposure (10 sessions) plus paroxetine (N=19) or prolonged exposure plus placebo (N=18). After week 10, patients discontinued prolonged exposure and were offered 12 additional weeks of continued randomized treatment. Patients treated with prolonged exposure plus paroxetine experienced significantly greater improvement in PTSD symptoms (incidence rate ratio=0.50, 95% CI=0.30-0.85) and remission status (odds ratio=12.6, 95% CI=1.23-129) during 10 weeks of combined treatment than patients treated with prolonged exposure plus placebo. Response rate and quality of life were also significantly more improved with combined treatment. The subset of patients who continued randomized treatment for 12 additional weeks showed no group differences. Initial treatment with paroxetine plus prolonged exposure was more efficacious than prolonged exposure plus placebo for PTSD related to the World Trade Center attack. Combined treatment with medication and prolonged exposure therapy deserves further study in larger samples with diverse forms of PTSD and over longer follow-up periods.

  9. Feature selection with harmony search.

    PubMed

    Diao, Ren; Shen, Qiang

    2012-12-01

    Many search strategies have been exploited for the task of feature selection (FS), in an effort to identify more compact and better quality subsets. Such work typically involves the use of greedy hill climbing (HC), or nature-inspired heuristics, in order to discover the optimal solution without going through exhaustive search. In this paper, a novel FS approach based on harmony search (HS) is presented. It is a general approach that can be used in conjunction with many subset evaluation techniques. The simplicity of HS is exploited to reduce the overall complexity of the search process. The proposed approach is able to escape from local solutions and identify multiple solutions owing to the stochastic nature of HS. Additional parameter control schemes are introduced to reduce the effort and impact of parameter configuration. These can be further combined with the iterative refinement strategy, tailored to enforce the discovery of quality subsets. The resulting approach is compared with those that rely on HC, genetic algorithms, and particle swarm optimization, accompanied by in-depth studies of the suggested improvements.
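
    A minimal binary harmony search sketch for feature selection (synthetic data, simplified operators; not the authors' implementation): each harmony is a feature mask scored by cross-validated accuracy of a k-NN classifier, with memory consideration, pitch adjustment as a bit flip, and random consideration.

    ```python
    # Minimal binary harmony search for feature subset selection (toy data).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                               random_state=0)
    clf = KNeighborsClassifier(5)

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    HMS, HMCR, PAR, iters = 10, 0.9, 0.3, 200            # harmony search parameters
    memory = rng.random((HMS, X.shape[1])) < 0.2         # harmony memory of feature masks
    scores = np.array([fitness(h) for h in memory])

    for _ in range(iters):
        new = np.empty(X.shape[1], dtype=bool)
        for j in range(X.shape[1]):
            if rng.random() < HMCR:                      # memory consideration
                new[j] = memory[rng.integers(HMS), j]
                if rng.random() < PAR:                   # pitch adjustment = bit flip
                    new[j] = ~new[j]
            else:                                        # random consideration
                new[j] = rng.random() < 0.5
        s = fitness(new)
        worst = np.argmin(scores)
        if s > scores[worst]:                            # replace the worst harmony
            memory[worst], scores[worst] = new, s

    best = memory[np.argmax(scores)]
    print("best subset size:", best.sum(), "CV accuracy:", round(scores.max(), 3))
    ```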

  10. Adoptive therapy with chimeric antigen receptor-modified T cells of defined subset composition.

    PubMed

    Riddell, Stanley R; Sommermeyer, Daniel; Berger, Carolina; Liu, Lingfeng Steven; Balakrishnan, Ashwini; Salter, Alex; Hudecek, Michael; Maloney, David G; Turtle, Cameron J

    2014-01-01

    The ability to engineer T cells to recognize tumor cells through genetic modification with a synthetic chimeric antigen receptor has ushered in a new era in cancer immunotherapy. The most advanced clinical applications are in targeting CD19 on B-cell malignancies. The clinical trials of CD19 chimeric antigen receptor therapy have thus far not attempted to select defined subsets before transduction or imposed uniformity of the CD4 and CD8 cell composition of the cell products. This review will discuss the rationale for and challenges to using adoptive therapy with genetically modified T cells of defined subset and phenotypic composition.

  11. Safety and pharmacodynamics of venetoclax (ABT-199) in a randomized single and multiple ascending dose study in women with systemic lupus erythematosus.

    PubMed

    Lu, P; Fleischmann, R; Curtis, C; Ignatenko, S; Clarke, S H; Desai, M; Wong, S L; Grebe, K M; Black, K; Zeng, J; Stolzenbach, J; Medema, J K

    2018-02-01

    Objective The anti-apoptotic protein B-cell lymphoma 2 (Bcl-2) may contribute to the pathogenesis of systemic lupus erythematosus. The safety, tolerability, and pharmacodynamics of the selective Bcl-2 inhibitor venetoclax (ABT-199) were assessed in women with systemic lupus erythematosus. Methods A phase 1, double-blind, randomized, placebo controlled study evaluated single ascending doses (10, 30, 90, 180, 300, and 500 mg) and multiple ascending doses (2 cycles; 30, 60, 120, 240, 400, and 600 mg for 1 week, and then 3 weeks off per cycle) of orally administered venetoclax. Eligible participants were aged 18-65 years with a diagnosis of systemic lupus erythematosus for 6 months or more receiving stable therapy for systemic lupus erythematosus (which could have included corticosteroids and/or stable antimalarials). Results All patients (48/48) completed the single ascending dose, 25 continued into the multiple ascending dose, and 44/50 completed the multiple ascending dose; two of the withdrawals (venetoclax 60 mg and 600 mg cohorts) were due to adverse events. Adverse event incidences were slightly higher in the venetoclax groups compared with the placebo groups, with no dose dependence. There were no serious adverse events with venetoclax. The most common adverse events were headache, nausea, and fatigue. Venetoclax 600 mg multiple ascending dose treatment depleted total lymphocytes and B cells by approximately 50% and 80%, respectively. Naive, switched memory, and memory B-cell subsets enriched in autoreactive B cells exhibited dose-dependent reduction of up to approximately 80%. There were no consistent or marked changes in neutrophils, natural killer cells, hemoglobin, or platelets. Conclusions Venetoclax was generally well tolerated in women with systemic lupus erythematosus and reduced total lymphocytes and disease-relevant subsets of antigen-experienced B cells. Registration ClinicalTrials.gov: NCT01686555.

  12. Pre-cART Elevation of CRP and CD4+ T-Cell Immune Activation Associated With HIV Clinical Progression in a Multinational Case-Cohort Study.

    PubMed

    Balagopal, Ashwin; Asmuth, David M; Yang, Wei-Teng; Campbell, Thomas B; Gupte, Nikhil; Smeaton, Laura; Kanyama, Cecilia; Grinsztejn, Beatriz; Santos, Breno; Supparatpinyo, Khuanchai; Badal-Faesen, Sharlaa; Lama, Javier R; Lalloo, Umesh G; Zulu, Fatima; Pawar, Jyoti S; Riviere, Cynthia; Kumarasamy, Nagalingeswaran; Hakim, James; Li, Xiao-Dong; Pollard, Richard B; Semba, Richard D; Thomas, David L; Bollinger, Robert C; Gupta, Amita

    2015-10-01

    Despite the success of combination antiretroviral therapy (cART), a subset of HIV-infected patients who initiate cART develop early clinical progression to AIDS; therefore, some cART initiators are not fully benefitted by cART. Immune activation pre-cART may predict clinical progression in cART initiators. A case-cohort study (n = 470) within the multinational Prospective Evaluation of Antiretrovirals in Resource-Limited Settings clinical trial (1571 HIV treatment-naive adults who initiated cART; CD4 T-cell count <300 cells/mm³; 9 countries) was conducted. A subcohort of 30 participants per country was randomly selected; additional cases were added from the main cohort. Cases [n = 236 (random subcohort 36; main cohort 200)] had clinical progression (incident WHO stage 3/4 event or death) within 96 weeks after cART initiation. Immune activation biomarkers were quantified pre-cART. Associations between biomarkers and clinical progression were examined using weighted multivariable Cox-proportional hazards models. Median age was 35 years, 45% were women, 49% black, 31% Asian, and 9% white. Median CD4 T-cell count was 167 cells per cubic millimeter. In multivariate analysis, highest quartile C-reactive protein concentration [adjusted hazard ratio (aHR), 2.53; 95% confidence interval (CI): 1.02 to 6.28] and CD4 T-cell activation (aHR, 5.18; 95% CI: 1.09 to 24.47) were associated with primary outcomes, compared with lowest quartiles. sCD14 had a trend toward association with clinical failure (aHR, 2.24; 95% CI: 0.96 to 5.21). Measuring C-reactive protein and CD4 T-cell activation may identify patients with CD4 T-cell counts <300 cells per cubic millimeter at risk for early clinical progression when initiating cART. Additional vigilance and symptom-based screening may be required in this subset of patients even after beginning cART.

  13. Pre-cART Elevation of CRP and CD4+ T-cell Immune Activation Associated with HIV Clinical Progression in a Multinational Case-Cohort Study

    PubMed Central

    Balagopal, Ashwin; Asmuth, David M.; Yang, Wei-Teng; Campbell, Thomas B.; Gupte, Nikhil; Smeaton, Laura; Kanyama, Cecilia; Grinsztejn, Beatriz; Santos, Breno; Supparatpinyo, Khuanchai; Badal-Faesen, Sharlaa; Lama, Javier R.; Lalloo, Umesh G.; Zulu, Fatima; Pawar, Jyoti S; Riviere, Cynthia; Kumarasamy, Nagalingeswaran; Hakim, James; Li, Xiao-Dong; Pollard, Richard B.; Semba, Richard D.; Thomas, David L.; Bollinger, Robert C.; Gupta, Amita

    2015-01-01

    Background Despite the success of combination antiretroviral therapy (cART), a subset of HIV-infected patients who initiate cART develop early clinical progression to AIDS; therefore, some cART initiators are not fully benefitted by cART. Immune activation pre-cART may predict clinical progression in cART initiators. Methods A case-cohort study (n=470) within the multinational Prospective Evaluation of Antiretrovirals in Resource-Limited Settings (PEARLS) clinical trial (1571 HIV treatment-naïve adults who initiated cART; CD4+ T cell count <300 cells/mm3; nine countries) was conducted. A subcohort of 30 participants/country was randomly selected; additional cases were added from the main cohort. Cases (n=236 [random subcohort–36; main cohort–200]) had clinical progression (incident WHO Stage 3/4 event or death) within 96 weeks following cART initiation. Immune activation biomarkers were quantified pre-cART. Associations between biomarkers and clinical progression were examined using weighted multivariable Cox-proportional hazards models. Results Median age was 35 years, 45% were women, 49% black, 31% Asian, and 9% white. Median CD4+ T-cell count was 167 cells/mm3. In multivariate analysis, highest quartile CRP concentration (adjusted hazards ratio [aHR] 2.53, 95%CI 1.02-6.28) and CD4+ T-cell activation (aHR 5.18, 95%CI 1.09-24.47) were associated with primary outcomes, compared to lowest quartiles. sCD14 had a trend towards association with clinical failure (aHR 2.24, 95%CI 0.96–5.21). Conclusions Measuring CRP and CD4+ T-cell activation may identify patients with CD4+ T cell counts < 300 cells/mm3 at risk for early clinical progression when initiating cART. Additional vigilance and symptom-based screening may be required in this subset of patients even after beginning cART. PMID:26017661

  14. Selection-Fusion Approach for Classification of Datasets with Missing Values

    PubMed Central

    Ghannad-Rezaie, Mostafa; Soltanian-Zadeh, Hamid; Ying, Hao; Dong, Ming

    2010-01-01

    This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values, where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputation method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. PMID:20212921
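
    As a rough sketch of the idea described above (not the paper's clustering-based formulation), the following fragment groups training samples by their missing-value pattern, trains one classifier per sufficiently large group on the features observed in that pattern, and fuses predictions by majority vote over the classifiers whose feature subsets are fully observed in the test sample; all names and thresholds are illustrative.

```python
import numpy as np
from collections import defaultdict
from sklearn.linear_model import LogisticRegression

def pattern_key(row):
    """Missing-value pattern of a sample: tuple of observed-feature indices."""
    return tuple(np.flatnonzero(~np.isnan(row)))

def fit_pattern_classifiers(X, y, min_samples=10):
    """Train one classifier per missing-value pattern that has enough samples."""
    groups = defaultdict(list)
    for i, row in enumerate(X):
        groups[pattern_key(row)].append(i)
    models = {}
    for key, idx in groups.items():
        if len(key) > 0 and len(idx) >= min_samples:
            models[key] = LogisticRegression(max_iter=1000).fit(X[np.ix_(idx, key)], y[idx])
    return models

def predict(models, x):
    """Fuse the outputs of all classifiers whose feature subset is fully observed in x."""
    votes = []
    observed = set(np.flatnonzero(~np.isnan(x)))
    for key, clf in models.items():
        if set(key) <= observed:                 # this subset's features are available
            votes.append(clf.predict(x[list(key)].reshape(1, -1))[0])
    return max(set(votes), key=votes.count) if votes else None
```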

  15. Evaluation of a developmental hierarchy for breast cancer cells to assess risk-based patient selection for targeted treatment.

    PubMed

    Bliss, Sarah A; Paul, Sunirmal; Pobiarzyn, Piotr W; Ayer, Seda; Sinha, Garima; Pant, Saumya; Hilton, Holly; Sharma, Neha; Cunha, Maria F; Engelberth, Daniel J; Greco, Steven J; Bryan, Margarette; Kucia, Magdalena J; Kakar, Sham S; Ratajczak, Mariusz Z; Rameshwar, Pranela

    2018-01-10

    This study proposes that a novel developmental hierarchy of breast cancer (BC) cells (BCCs) could predict treatment response and outcome. The continued challenge to treat BC requires stratification of BCCs into distinct subsets. This would provide insights into how BCCs evade treatment and adapt dormancy for decades. We selected three subsets, based on the relative expression of octamer-binding transcription factor 4 A (Oct4A), and then analysed each with an Affymetrix gene chip. Oct4A is a stem cell gene and would separate subsets based on maturation. Data analyses and gene validation identified three membrane proteins, TMEM98, GPR64 and FAT4. BCCs from cell lines and blood from BC patients were analysed for these three membrane proteins by flow cytometry, along with known markers of cancer stem cells (CSCs), CD44, CD24 and Oct4, aldehyde dehydrogenase 1 (ALDH1) activity and telomere length. A novel working hierarchy of BCCs was established with the most immature subset as CSCs. This group was further subdivided into long- and short-term CSCs. Analyses of 20 post-treatment blood samples indicated that circulating CSCs and early BC progenitors may be associated with recurrence or early death. These results suggest that the novel hierarchy may predict treatment response and prognosis.

  16. Microbiota of the Small Intestine Is Selectively Engulfed by Phagocytes of the Lamina Propria and Peyer’s Patches

    PubMed Central

    Morikawa, Masatoshi; Tsujibe, Satoshi; Kiyoshima-Shibata, Junko; Watanabe, Yohei; Kato-Nagaoka, Noriko; Shida, Kan; Matsumoto, Satoshi

    2016-01-01

    Phagocytes such as dendritic cells and macrophages, which are distributed in the small intestinal mucosa, play a crucial role in maintaining mucosal homeostasis by sampling the luminal gut microbiota. However, there is limited information regarding microbial uptake in a steady state. We investigated the composition of murine gut microbiota that is engulfed by phagocytes of specific subsets in the small intestinal lamina propria (SILP) and Peyer’s patches (PP). Analysis of bacterial 16S rRNA gene amplicon sequences revealed that: 1) all the phagocyte subsets in the SILP primarily engulfed Lactobacillus (the most abundant microbe in the small intestine), whereas CD11bhi and CD11bhiCD11chi cell subsets in PP mostly engulfed segmented filamentous bacteria (indigenous bacteria in rodents that are reported to adhere to intestinal epithelial cells); and 2) among the Lactobacillus species engulfed by the SILP cell subsets, L. murinus was engulfed more frequently than L. taiwanensis, although both these Lactobacillus species were abundant in the small intestine under physiological conditions. These results suggest that small intestinal microbiota is selectively engulfed by phagocytes that localize in the adjacent intestinal mucosa in a steady state. These observations may provide insight into the crucial role of phagocytes in immune surveillance of the small intestinal mucosa. PMID:27701454

  17. Microbiota of the Small Intestine Is Selectively Engulfed by Phagocytes of the Lamina Propria and Peyer's Patches.

    PubMed

    Morikawa, Masatoshi; Tsujibe, Satoshi; Kiyoshima-Shibata, Junko; Watanabe, Yohei; Kato-Nagaoka, Noriko; Shida, Kan; Matsumoto, Satoshi

    2016-01-01

    Phagocytes such as dendritic cells and macrophages, which are distributed in the small intestinal mucosa, play a crucial role in maintaining mucosal homeostasis by sampling the luminal gut microbiota. However, there is limited information regarding microbial uptake in a steady state. We investigated the composition of murine gut microbiota that is engulfed by phagocytes of specific subsets in the small intestinal lamina propria (SILP) and Peyer's patches (PP). Analysis of bacterial 16S rRNA gene amplicon sequences revealed that: 1) all the phagocyte subsets in the SILP primarily engulfed Lactobacillus (the most abundant microbe in the small intestine), whereas CD11bhi and CD11bhiCD11chi cell subsets in PP mostly engulfed segmented filamentous bacteria (indigenous bacteria in rodents that are reported to adhere to intestinal epithelial cells); and 2) among the Lactobacillus species engulfed by the SILP cell subsets, L. murinus was engulfed more frequently than L. taiwanensis, although both these Lactobacillus species were abundant in the small intestine under physiological conditions. These results suggest that small intestinal microbiota is selectively engulfed by phagocytes that localize in the adjacent intestinal mucosa in a steady state. These observations may provide insight into the crucial role of phagocytes in immune surveillance of the small intestinal mucosa.

  18. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    PubMed

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.
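
    The abstract frames attribute selection as a two-objective problem: minimize the dissolution-profile prediction error while keeping the attribute count small. A minimal sketch of such a fitness function for a binary feature mask, as a metaheuristic such as antlion or grey wolf optimization might evaluate it, is shown below; the random-forest regressor, the 5-fold cross-validation, and the penalty weight alpha are assumptions for illustration, not the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def nrmse(y_true, y_pred):
    """Normalized root mean square error, in percent of the observed range."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (y_true.max() - y_true.min())

def fitness(mask, X, y, alpha=0.1):
    """Score a binary feature mask: prediction error plus a penalty on subset size.
    A metaheuristic would minimize this value over candidate masks."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    pred = cross_val_predict(model, X[:, cols], y, cv=5)
    return nrmse(y, pred) + alpha * cols.size
```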

  19. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    PubMed Central

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  20. Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.).

    PubMed

    Rincent, R; Laloë, D; Nicolas, S; Altmann, T; Brunel, D; Revilla, P; Rodríguez, V M; Moreno-Gonzalez, J; Melchinger, A; Bauer, E; Schoen, C-C; Meyer, N; Giauffret, C; Bauland, C; Jamin, P; Laborde, J; Monod, H; Flament, P; Charcosset, A; Moreau, L

    2012-10-01

    Genomic selection refers to the use of genotypic information for predicting breeding values of selection candidates. A prediction formula is calibrated with the genotypes and phenotypes of reference individuals constituting the calibration set. The size and the composition of this set are essential parameters affecting the prediction reliabilities. The objective of this study was to maximize reliabilities by optimizing the calibration set. Different criteria based on the diversity or on the prediction error variance (PEV) derived from the realized additive relationship matrix-best linear unbiased predictions model (RA-BLUP) were used to select the reference individuals. For the latter, we considered the mean of the PEV of the contrasts between each selection candidate and the mean of the population (PEVmean) and the mean of the expected reliabilities of the same contrasts (CDmean). These criteria were tested with phenotypic data collected on two diversity panels of maize (Zea mays L.) genotyped with a 50k SNPs array. In the two panels, samples chosen based on CDmean gave higher reliabilities than random samples for various calibration set sizes. CDmean also appeared superior to PEVmean, which can be explained by the fact that it takes into account the reduction of variance due to the relatedness between individuals. Selected samples were close to optimality for a wide range of trait heritabilities, which suggests that the strategy presented here can efficiently sample subsets in panels of inbred lines. A script to optimize reference samples based on CDmean is available on request.
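
    As an illustration of the kind of criterion involved, the sketch below greedily builds a calibration set that minimizes the mean prediction error variance (PEV) of the individuals left out, using a simplified GBLUP model in which the fixed intercept is ignored; the relationship matrix K, the variance ratio lam, and the forward-selection scheme are assumptions, not the exact RA-BLUP/CDmean computation of the paper.

```python
import numpy as np

def pev(K, calib, lam):
    """PEV (in units of sigma_g^2) of GBLUP values for all individuals, given a
    calibration set. Simplified sketch: the fixed intercept is ignored.
    lam = sigma_e^2 / sigma_g^2; K is the (n x n) additive relationship matrix."""
    n = K.shape[0]
    Z = np.zeros((len(calib), n))
    Z[np.arange(len(calib)), calib] = 1.0
    C = Z.T @ Z + lam * np.linalg.inv(K + 1e-6 * np.eye(n))   # MME coefficient matrix
    return lam * np.diag(np.linalg.inv(C))

def greedy_calibration_set(K, size, lam=1.0):
    """Forward selection of reference individuals minimizing the mean PEV of the
    individuals left out (a proxy for maximizing CDmean-style reliability)."""
    n = K.shape[0]
    calib = []
    for _ in range(size):
        rest = [i for i in range(n) if i not in calib]
        scores = []
        for cand in rest:
            trial = np.array(calib + [cand])
            others = np.setdiff1d(np.arange(n), trial)
            scores.append(pev(K, trial, lam)[others].mean())
        calib.append(rest[int(np.argmin(scores))])
    return calib
```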

  1. Prospects for Genomic Selection in Cassava Breeding.

    PubMed

    Wolfe, Marnin D; Del Carpio, Dunia Pino; Alabi, Olumide; Ezenwaka, Lydia C; Ikeogu, Ugochukwu N; Kayondo, Ismail S; Lozano, Roberto; Okeke, Uche G; Ozimati, Alfred A; Williams, Esuma; Egesi, Chiedozie; Kawuki, Robert S; Kulakow, Peter; Rabbi, Ismail Y; Jannink, Jean-Luc

    2017-11-01

    Cassava (Manihot esculenta Crantz) is a clonally propagated staple food crop in the tropics. Genomic selection (GS) has been implemented at three breeding institutions in Africa to reduce cycle times. Initial studies provided promising estimates of predictive abilities. Here, we expand on previous analyses by assessing the accuracy of seven prediction models for seven traits in three prediction scenarios: cross-validation within populations, cross-population prediction and cross-generation prediction. We also evaluated the impact of increasing the training population (TP) size by phenotyping progenies selected either at random or with a genetic algorithm. Cross-validation results were mostly consistent across programs, with nonadditive models predicting about 10% better on average. Cross-population accuracy was generally low (mean = 0.18), but prediction of cassava mosaic disease increased up to 57% in one Nigerian population when data from another related population were combined. Accuracy across generations was poorer than within-generation accuracy, as expected, but accuracy for dry matter content and mosaic disease severity should be sufficient for rapid-cycling GS. Selection of a prediction model made some difference across generations, but increasing TP size was more important. With a genetic algorithm, selection of one-third of progeny could achieve an accuracy equivalent to phenotyping all progeny. We are in the early stages of GS for this crop but the results are promising for some traits. General guidelines that are emerging are that TPs need to continue to grow but phenotyping can be done on a cleverly selected subset of individuals, reducing the overall phenotyping burden. Copyright © 2017 Crop Science Society of America.

  2. Nanolaminate microfluidic device for mobility selection of particles

    DOEpatents

    Surh, Michael P [Livermore, CA; Wilson, William D [Pleasanton, CA; Barbee, Jr., Troy W.; Lane, Stephen M [Oakland, CA

    2006-10-10

    A microfluidic device made from nanolaminate materials that are capable of electrophoretic selection of particles on the basis of their mobility. Nanolaminate materials are generally alternating layers of two materials (one conducting, one insulating) that are made by sputter coating a flat substrate with a large number of layers. Specific subsets of the conducting layers are coupled together to form a single, extended electrode, interleaved with other similar electrodes. Thereby, the subsets of conducting layers may be dynamically charged to create time-dependent potential fields that can trap or transport charged colloidal particles. The addition of time-dependence is applicable to all geometries of nanolaminate electrophoretic and electrochemical designs, from sinusoidal to nearly step-like.

  3. Selective outcome reporting and sponsorship in randomized controlled trials in IVF and ICSI.

    PubMed

    Braakhekke, M; Scholten, I; Mol, F; Limpens, J; Mol, B W; van der Veen, F

    2017-10-01

    Are randomized controlled trials (RCTs) on IVF and ICSI subject to selective outcome reporting and is this related to sponsorship? There are inconsistencies, independent from sponsorship, in the reporting of primary outcome measures in the majority of IVF and ICSI trials, indicating selective outcome reporting. RCTs are subject to bias at various levels. Of these biases, selective outcome reporting is particularly relevant to IVF and ICSI trials since there is a wide variety of outcome measures to choose from. An established cause of reporting bias is sponsorship. It is, at present, unknown whether RCTs in IVF/ICSI are subject to selective outcome reporting and whether this is related to sponsorship. We systematically searched RCTs on IVF and ICSI published between January 2009 and March 2016 in MEDLINE, EMBASE, the Cochrane Central Register of Controlled Trials and the publisher subset of PubMed. We analysed 415 RCTs. Per included RCT, we extracted data on impact factor of the journal, sample size, power calculation, and trial registry and thereafter data on primary outcome measure, the direction of trial results and sponsorship. Of the 415 identified RCTs, 235 were excluded for our primary analysis, because the sponsorship was not reported. Of the 180 RCTs included in our analysis, 7 trials did not report on any primary outcome measure and 107 of the remaining 173 trials (62%) reported on surrogate primary outcome measures. Of the 114 registered trials, 21 trials (18%) provided primary outcomes in their manuscript that were different from those in the trial registry. This indicates selective outcome reporting. We found no association between selective outcome reporting and sponsorship. We ran additional analyses to include the trials that had not reported sponsorship and found no outcomes that differed from our primary analysis. Since the majority of the trials did not report on sponsorship, there is a risk of sampling bias. IVF and ICSI trials are subject, to a large extent, to selective outcome reporting. Readers should be aware of this to avoid implementation of false or misleading results in clinical practice. No funding received and there are no conflicts of interest. N/A. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  4. Regularity of random attractors for fractional stochastic reaction-diffusion equations on Rn

    NASA Astrophysics Data System (ADS)

    Gu, Anhui; Li, Dingshi; Wang, Bixiang; Yang, Han

    2018-06-01

    We investigate the regularity of random attractors for the non-autonomous non-local fractional stochastic reaction-diffusion equations in H^s(R^n) with s ∈ (0, 1). We prove the existence and uniqueness of the tempered random attractor that is compact in H^s(R^n) and attracts all tempered random subsets of L^2(R^n) with respect to the norm of H^s(R^n). The main difficulty is to show the pullback asymptotic compactness of solutions in H^s(R^n) due to the noncompactness of Sobolev embeddings on unbounded domains and the almost sure nondifferentiability of the sample paths of the Wiener process. We establish such compactness by the ideas of uniform tail-estimates and the spectral decomposition of solutions in bounded domains.

  5. Variable Selection for Road Segmentation in Aerial Images

    NASA Astrophysics Data System (ADS)

    Warnke, S.; Bulatov, D.

    2017-05-01

    For extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classification of superpixels into road and non-road, after which the classification results were refined using minimum-cost paths and non-local optimization methods. We believed that the variable set used for classification was to a certain extent suboptimal: many variables were redundant, while several features known to be useful in photogrammetry and remote sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; however, we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but we extended them by additional, mostly higher-level features. Applying these superior features, removing the redundant ones, as well as using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, on a challenging dataset.
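
    The abstract outlines the selection loop only verbally. A minimal sketch of one such wrapper, which ranks variables by Random Forest importance and then grows the feature set in rank order until cross-validated accuracy stops improving, is given below; the classifier, the tolerance, and the stopping rule are illustrative choices rather than the authors' exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def rank_and_prune(X, y, names, tol=0.002, cv=3):
    """Rank features by Random Forest importance, then grow the feature set in
    rank order and stop once CV accuracy no longer improves by at least `tol`."""
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]         # most important first
    selected, best = [], -np.inf
    for idx in order:
        trial = selected + [idx]
        score = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0),
                                X[:, trial], y, cv=cv).mean()
        if score > best + tol:
            selected, best = trial, score                      # keep the feature
        else:
            break                                              # stopping criterion met
    return [names[i] for i in selected], best
```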

  6. Accuracy of references and quotations in veterinary journals.

    PubMed

    Hinchcliff, K W; Bruce, N J; Powers, J D; Kipp, M L

    1993-02-01

    The accuracy of references and quotations used to substantiate statements of fact in articles published in 6 frequently cited veterinary journals was examined. Three hundred references were randomly selected, and the accuracy of each citation was examined. A subset of 100 references was examined for quotational accuracy; ie, the accuracy with which authors represented the work or assertions of the author being cited. Of the 300 references selected, 295 were located, and 125 major errors were found in 88 (29.8%) of them. Sixty-seven (53.6%) major errors were found involving authors, 12 (9.6%) involved the article title, 14 (11.2%) involved the book or journal title, and 32 (25.6%) involved the volume number, date, or page numbers. Sixty-eight minor errors were detected. The accuracy of 111 quotations from 95 citations in 65 articles was examined. Nine quotations were technical and not classified, 86 (84.3%) were classified as correct, 2 (1.9%) contained minor misquotations, and 14 (13.7%) contained major misquotations. We concluded that misquotations and errors in citations occur frequently in veterinary journals, but at a rate similar to that reported for other biomedical journals.

  7. The Differential Effects of Reward on Space- and Object-Based Attentional Allocation

    PubMed Central

    Shomstein, Sarah

    2013-01-01

    Estimating reward contingencies and allocating attentional resources to a subset of relevant information are the most important contributors to increasing adaptability of an organism. Although recent evidence suggests that reward- and attention-based guidance recruits overlapping cortical regions and has similar effects on sensory responses, the exact nature of the relationship between the two remains elusive. Here, using event-related fMRI on human participants, we contrasted the effects of reward on space- and object-based selection in the same experimental setting. Reward was either distributed randomly or biased a particular object. Behavioral and neuroimaging results show that space- and object-based attention is influenced by reward differentially. Space-based attentional allocation is mandatory, integrating reward information over time, whereas object-based attentional allocation is a default setting that is completely replaced by the reward signal. Nonadditivity of the effects of reward and object-based attention was observed consistently at multiple levels of analysis in early visual areas as well as in control regions. These results provide strong evidence that space- and object-based allocation are two independent attentional mechanisms, and suggest that reward serves to constrain attentional selection. PMID:23804086

  8. Identification of novel inhibitors of DNA methylation by screening of a chemical library.

    PubMed

    Ceccaldi, Alexandre; Rajavelu, Arumugam; Ragozin, Sergey; Sénamaud-Beaufort, Catherine; Bashtrykov, Pavel; Testa, Noé; Dali-Ali, Hana; Maulay-Bailly, Christine; Amand, Séverine; Guianvarc'h, Dominique; Jeltsch, Albert; Arimondo, Paola B

    2013-03-15

    In order to discover new inhibitors of the DNA methyltransferase 3A/3L complex, we used a medium-throughput nonradioactive screen on a random collection of 1120 small organic compounds. After primary hit detection against the DNA methylation activity of the murine Dnmt3A/3L catalytic complex, we further evaluated the EC50 of the 12 most potent hits as well as their cytotoxicity on DU145 prostate cancer cultured cells. Interestingly, most of the inhibitors showed low micromolar activities and little cytotoxicity. Dichlone, a small halogenated naphthoquinone, classically used as a pesticide and fungicide, showed the lowest EC50 at 460 nM. We briefly assessed the selectivity of a subset of our new inhibitors against hDNMT1, bacterial Dnmts (including M.SssI and EcoDam), and the protein lysine methyltransferase (PKMT) G9a, as well as their mode of inhibition. Globally, the tested molecules showed a clear preference for the DNA methyltransferases, but poor selectivity among them. Two molecules, including Dichlone, efficiently reactivated YFP gene expression in a stable HEK293 cell line by promoter demethylation. Their efficacy was comparable to that of the reference DNMT inhibitor 5-azacytidine.

  9. Cultural Participation in the Philadelphia Area.

    ERIC Educational Resources Information Center

    Hanford, Terry; And Others

    Described are the results of two separate surveys of cross-sections of the Philadelphia public concerning their cultural behavior, attitudes, and perceptions. The more recent survey was conducted in 1984 with a random telephone sample of 404 Philadelphia residents. The other survey consisted of a subset of approximately 400 Philadelphia area…

  10. Variability in prefrontal hemodynamic response during exposure to repeated self-selected music excerpts, a near-infrared spectroscopy study.

    PubMed

    Moghimi, Saba; Schudlo, Larissa; Chau, Tom; Guerguerian, Anne-Marie

    2015-01-01

    Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes.

  11. Recent wetland land loss due to hurricanes: improved estimates based upon multiple source images

    USGS Publications Warehouse

    Kranenburg, Christine J.; Palaseanu-Lovejoy, Monica; Barras, John A.; Brock, John C.; Wang, Ping; Rosati, Julie D.; Roberts, Tiffany M.

    2011-01-01

    The objective of this study was to provide a moderate-resolution (30-m) fractional water map of the Chenier Plain for 2003, 2006 and 2009 by using information contained in high-resolution satellite imagery of a subset of the study area. Indices and transforms pertaining to vegetation and water were created using the high-resolution imagery, and a threshold was applied to obtain a categorical land/water map. The high-resolution data were used to train a decision-tree classifier to estimate percent water in a lower resolution (Landsat) image. Two new water indices based on the tasseled cap transformation were proposed for IKONOS imagery in wetland environments, and more than 700 input parameter combinations were considered for each Landsat image classified. Final selection and thresholding of the resulting percent water maps involved over 5,000 unambiguously classified random points, using corresponding 1-m resolution aerial photographs, and a statistical optimization procedure to determine the threshold at which the maximum Kappa coefficient occurs. Each selected dataset has a Kappa coefficient and percent correctly classified (PCC) for water, land, and total greater than 90%. An accuracy assessment using 1,000 independent random points was performed. Using the validation points, the PCC values decreased to around 90%. The time-series change analysis indicated that due to Hurricane Rita, the study area lost 6.5% of marsh area, and transient changes were less than 3% for either land or water. Hurricane Ike resulted in an additional 8% land loss, although not enough time has passed to discriminate between persistent and transient changes.
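
    The thresholding step described above (choosing the cut-off on the fractional-water map that maximizes the Kappa coefficient against reference points) can be sketched in a few lines; the function and argument names below are illustrative, not from the study.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def best_water_threshold(percent_water, reference_is_water, step=0.01):
    """Scan thresholds on a fractional-water map and return the one that maximizes
    the Kappa coefficient against reference land/water points."""
    thresholds = np.arange(0.0, 1.0 + step, step)
    kappas = [cohen_kappa_score(reference_is_water, percent_water >= t) for t in thresholds]
    i = int(np.argmax(kappas))
    return thresholds[i], kappas[i]
```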

  12. Variability in Prefrontal Hemodynamic Response during Exposure to Repeated Self-Selected Music Excerpts, a Near-Infrared Spectroscopy Study

    PubMed Central

    Moghimi, Saba; Schudlo, Larissa; Chau, Tom; Guerguerian, Anne-Marie

    2015-01-01

    Music-induced brain activity modulations in areas involved in emotion regulation may be useful in achieving therapeutic outcomes. Clinical applications of music may involve prolonged or repeated exposures to music. However, the variability of the observed brain activity patterns in repeated exposures to music is not well understood. We hypothesized that multiple exposures to the same music would elicit more consistent activity patterns than exposure to different music. In this study, the temporal and spatial variability of cerebral prefrontal hemodynamic response was investigated across multiple exposures to self-selected musical excerpts in 10 healthy adults. The hemodynamic changes were measured using prefrontal cortex near infrared spectroscopy and represented by instantaneous phase values. Based on spatial and temporal characteristics of these observed hemodynamic changes, we defined a consistency index to represent variability across these domains. The consistency index across repeated exposures to the same piece of music was compared to the consistency index corresponding to prefrontal activity from randomly matched non-identical musical excerpts. Consistency indexes were significantly different for identical versus non-identical musical excerpts when comparing a subset of repetitions. When all four exposures were compared, no significant difference was observed between the consistency indexes of randomly matched non-identical musical excerpts and the consistency index corresponding to repetitions of the same musical excerpts. This observation suggests the existence of only partial consistency between repeated exposures to the same musical excerpt, which may stem from the role of the prefrontal cortex in regulating other cognitive and emotional processes. PMID:25837268

  13. Seeking mathematics success for college students: a randomized field trial of an adapted approach

    NASA Astrophysics Data System (ADS)

    Gula, Taras; Hoessler, Carolyn; Maciejewski, Wes

    2015-11-01

    Many students enter the Canadian college system with insufficient mathematical ability and leave the system with little improvement. Those students who enter with poor mathematics ability typically take a developmental mathematics course as their first and possibly only mathematics course. The educational experiences that comprise a developmental mathematics course vary widely and are, too often, ineffective at improving students' ability. This trend is concerning, since low mathematics ability is known to be related to lower rates of success in subsequent courses. To date, little attention has been paid to the selection of an instructional approach to consistently apply across developmental mathematics courses. Prior research suggests that an appropriate instructional method would involve explicit instruction and practising mathematical procedures linked to a mathematical concept. This study reports on a randomized field trial of a developmental mathematics approach at a college in Ontario, Canada. The new approach is an adaptation of the JUMP Math program, an explicit instruction method designed for primary and secondary school curricula, to the college learning environment. In this study, a subset of courses was assigned to JUMP Math and the remainder was taught in the same style as in the previous years. We found consistent, modest improvement in the JUMP Math sections compared to the non-JUMP sections, after accounting for potential covariates. The findings from this randomized field trial, along with prior research on effective education for developmental mathematics students, suggest that JUMP Math is a promising way to improve college student outcomes.

  14. Selecting sequence variants to improve genomic predictions for dairy cattle

    USDA-ARS?s Scientific Manuscript database

    Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...

  15. Quality-of-life outcomes with a disodium EDTA chelation regimen for coronary disease: results from the trial to assess chelation therapy randomized trial.

    PubMed

    Mark, Daniel B; Anstrom, Kevin J; Clapp-Channing, Nancy E; Knight, J David; Boineau, Robin; Goertz, Christine; Rozema, Theodore C; Liu, Diane M; Nahin, Richard L; Rosenberg, Yves; Drisko, Jeanne; Lee, Kerry L; Lamas, Gervasio A

    2014-07-01

    The National Institutes of Health-funded Trial to Assess Chelation Therapy (TACT) randomized 1708 stable coronary disease patients aged 50 years or older who were 6 months or more post-myocardial infarction (2003-2010) to 40 infusions of a multicomponent EDTA chelation solution or placebo. Chelation reduced the primary composite end point of mortality, recurrent myocardial infarction, stroke, coronary revascularization, or hospitalization for angina (hazard ratio, 0.82; 95% confidence interval, 0.69-0.99; P=0.035). In a randomly selected subset of 911 patients, we prospectively collected a battery of quality-of-life (QOL) instruments at baseline and at 6, 12, and 24 months after randomization. The prespecified primary QOL measures were the Duke Activity Status Index (Table I in the Data Supplement) and the Medical Outcomes Study Short-Form 36 Mental Health Inventory-5. All comparisons were by intention to treat. Baseline clinical and QOL variables were well balanced in the 451 patients randomized to chelation and in the 460 patients randomized to placebo. The Duke Activity Status Index improved in both groups during the first 6 months of therapy, but we found no evidence for a treatment-related difference (mean difference [chelation minus placebo] during follow-up, 0.9 [95% confidence interval, -0.7 to 2.6; P=0.27]). There was no statistically significant evidence of a treatment-related difference in the Mental Health Inventory-5 during follow-up (mean difference, 1.0; 95% confidence interval, -0.1 to 2.0; P=0.08). None of the secondary QOL measures showed a consistent treatment-related difference. In stable, predominantly asymptomatic coronary disease patients with a history of myocardial infarction, EDTA chelation therapy did not have a detectable effect on QOL during 2 years of follow-up. URL: http://clinicaltrials.gov. Unique identifier: NCT00044213.

  16. Minimal ensemble based on subset selection using ECG to diagnose categories of CAN.

    PubMed

    Abawajy, Jemal; Kelarev, Andrei; Yi, Xun; Jelinek, Herbert F

    2018-07-01

    Early diagnosis of cardiac autonomic neuropathy (CAN) is critical for reversing or decreasing its progression and preventing complications. Diagnostic accuracy or precision is one of the core requirements of CAN detection. As the standard Ewing battery tests suffer from a number of shortcomings, research in automating and improving the early detection of CAN has recently received serious attention in identifying additional clinical variables and designing advanced ensembles of classifiers to improve the accuracy or precision of CAN diagnostics. Although large ensembles are commonly proposed for the automated diagnosis of CAN, they are characterized by slow processing speed and high computational complexity. This paper applies ECG features and proposes a new ensemble-based approach for diagnosis of CAN progression. We introduce a Minimal Ensemble Based On Subset Selection (MEBOSS) for the diagnosis of all categories of CAN, including early, definite and atypical CAN. MEBOSS is based on a novel multi-tier architecture applying classifier subset selection as well as training subset selection during several steps of its operation. Our experiments determined the diagnostic accuracy or precision obtained in 5 × 2 cross-validation for various options employed in MEBOSS and other classification systems. The experiments demonstrate the operation of the MEBOSS procedure invoking the most effective classifiers available in the open source software environment SageMath. The results of our experiments show that for the large DiabHealth database of CAN-related parameters, MEBOSS outperformed other classification systems available in SageMath and achieved 94% to 97% precision in 5 × 2 cross-validation, correctly distinguishing any two of up to five CAN categories: control, early, definite, severe and atypical CAN. These results show that the MEBOSS architecture is effective and can be recommended for practical implementations in systems for the diagnosis of CAN progression. Copyright © 2018 Elsevier B.V. All rights reserved.
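
    The abstract does not spell out the multi-tier architecture, so the fragment below only sketches the general idea of a minimal ensemble built by classifier subset selection: candidate classifiers are ranked by 5 × 2 cross-validated precision and only the top few are combined by voting. The candidate list, the scoring metric, and n_keep are illustrative assumptions, not the MEBOSS implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def minimal_ensemble(X, y, n_keep=3):
    """Rank candidate classifiers by 5x2 cross-validated precision and keep
    only the top few in a voting ensemble."""
    candidates = {
        "lr": LogisticRegression(max_iter=1000),
        "rf": RandomForestClassifier(n_estimators=200, random_state=0),
        "gnb": GaussianNB(),
        "svm": SVC(probability=True, random_state=0),
    }
    cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)   # 5x2 CV
    scores = {name: cross_val_score(m, X, y, cv=cv, scoring="precision_macro").mean()
              for name, m in candidates.items()}
    keep = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    ensemble = VotingClassifier([(n, candidates[n]) for n in keep], voting="soft")
    return ensemble.fit(X, y), scores
```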

  17. WE-DE-201-04: Cross Validation of Knowledge-Based Treatment Planning for Prostate LDR Brachytherapy Using Principal Component Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roper, J; Ghavidel, B; Godette, K

    Purpose: To validate a knowledge-based algorithm for prostate LDR brachytherapy treatment planning. Methods: A dataset of 100 cases was compiled from an active prostate seed implant service. Cases were randomized into 10 subsets. For each subset, the 90 remaining library cases were registered to a common reference frame and then characterized on a point-by-point basis using principal component analysis (PCA). Each test case was converted to PCA vectors using the same process and compared with each library case using a Mahalanobis distance to evaluate similarity. Rank order PCA scores were used to select the best-matched library case. The seed arrangement was extracted from the best-matched case and used as a starting point for planning the test case. Any subsequent modifications were recorded that required input from a treatment planner to achieve V100>95%, V150<60%, V200<20%. To simulate operating-room planning constraints, seed activity was held constant, and the seed count could not increase. Results: The computational time required to register test-case contours and evaluate PCA similarity across the library was 10s. Preliminary analysis of 2 subsets shows that 9 of 20 test cases did not require any seed modifications to obtain an acceptable plan. Five test cases required fewer than 10 seed modifications or a grid shift. Another 5 test cases required approximately 20 seed modifications. An acceptable plan was not achieved for 1 outlier, which was substantially larger than its best match. Modifications took between 5s and 6min. Conclusion: A knowledge-based treatment planning algorithm for prostate LDR brachytherapy is being cross-validated using 100 prior cases. Preliminary results suggest that for this size library, acceptable plans can be achieved without planner input in about half of the cases, while varying amounts of planner input are needed in the remaining cases. Computational time and planning time are compatible with clinical practice.
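
    A minimal sketch of the matching step described above (library cases reduced by PCA, test case ranked against them by Mahalanobis distance in score space) might look as follows; the shape representation and the number of components are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def best_matched_case(library_shapes, test_shape, n_components=10):
    """library_shapes: (n_cases, n_points) registered contour representations.
    Returns the index of the library case closest to the test case by
    Mahalanobis distance in PCA-score space."""
    pca = PCA(n_components=n_components).fit(library_shapes)
    lib_scores = pca.transform(library_shapes)
    test_scores = pca.transform(test_shape.reshape(1, -1))[0]
    cov_inv = np.linalg.pinv(np.cov(lib_scores, rowvar=False))
    d = lib_scores - test_scores
    maha = np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))   # one distance per case
    return int(np.argmin(maha))
```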

  18. A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.

    PubMed

    Griffon, Nicolas; Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-12-01

    PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.

  19. A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French

    PubMed Central

    Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-01-01

    Background PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese. PMID:25448528

  20. Development and validation of classifiers and variable subsets for predicting nursing home admission.

    PubMed

    Nuutinen, Mikko; Leskelä, Riikka-Leena; Suojalehto, Ella; Tirronen, Anniina; Komssi, Vesa

    2017-04-13

    In previous years, a substantial number of studies have identified statistically important predictors of nursing home admission (NHA). However, as far as we know, the analyses have been done at the population level. No prior research has analysed the prediction accuracy of an NHA model for individuals. This study is an analysis of 3056 longer-term home care customers in the city of Tampere, Finland. Data were collected from the records of social and health service usage and the RAI-HC (Resident Assessment Instrument - Home Care) assessment system between January 2011 and September 2015. The aim was to find out the most efficient variable subsets to predict NHA for individuals and validate the accuracy. The variable subsets for predicting NHA were searched using the sequential forward selection (SFS) method, a variable ranking metric, and the classifiers logistic regression (LR), support vector machine (SVM) and Gaussian naive Bayes (GNB). The validation of the results was ensured using randomly balanced data sets and cross-validation. The primary performance metrics for the classifiers were the prediction accuracy and AUC (average area under the curve). The LR and GNB classifiers achieved 78% accuracy for predicting NHA. The most important variables were RAI MAPLE (Method for Assigning Priority Levels), functional impairment (RAI IADL, Instrumental Activities of Daily Living), cognitive impairment (RAI CPS, Cognitive Performance Scale), memory disorders (diagnoses G30-G32 and F00-F03) and the use of community-based health services and prior hospital use (emergency visits and periods of care). The accuracy of the classifier for individuals was high enough to convince the officials of the city of Tampere to integrate a predictive model based on the findings of this study into the home care information system. Further work needs to be done to evaluate variables that are modifiable and responsive to interventions.
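
    A rough sketch of the reported pipeline (class balancing, sequential forward selection with logistic regression, AUC by cross-validation) using scikit-learn is shown below; the undersampling scheme, the number of selected features, and all names are illustrative, not the study's exact protocol, and the balancing step assumes the negative class is the majority.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample

def balance(X, y, seed=0):
    """Random undersampling of the (assumed majority) negative class to a 1:1 ratio."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    keep_neg = resample(neg, n_samples=len(pos), replace=False, random_state=seed)
    idx = np.concatenate([pos, keep_neg])
    return X[idx], y[idx]

def select_predictors(X, y, n_features=5):
    """Forward SFS with logistic regression, scored by ROC AUC on balanced data."""
    Xb, yb = balance(X, y)
    clf = LogisticRegression(max_iter=1000)
    sfs = SequentialFeatureSelector(clf, n_features_to_select=n_features,
                                    direction="forward", scoring="roc_auc", cv=5)
    sfs.fit(Xb, yb)
    mask = sfs.get_support()
    auc = cross_val_score(clf, Xb[:, mask], yb, cv=5, scoring="roc_auc").mean()
    return mask, auc
```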

  1. Optimization of image reconstruction method for SPECT studies performed using [⁹⁹mTc-EDDA/HYNIC] octreotate in patients with neuroendocrine tumors.

    PubMed

    Sowa-Staszczak, Anna; Lenda-Tracz, Wioletta; Tomaszuk, Monika; Głowa, Bogusław; Hubalewska-Dydejczyk, Alicja

    2013-01-01

    Somatostatin receptor scintigraphy (SRS) is a useful tool in the assessment of GEP-NET (gastroenteropancreatic neuroendocrine tumor) patients. The choice of appropriate settings of image reconstruction parameters is crucial in interpretation of these images. The aim of the study was to investigate how the GEP-NET lesion signal-to-noise ratio (TCS/TCB) depends on different reconstruction settings for Flash 3D software (Siemens). SRS results of 76 randomly selected patients with confirmed GEP-NET were analyzed. For SPECT studies, the data were acquired using standard clinical settings 3-4 h after the injection of 740 MBq 99mTc-[EDDA/HYNIC] octreotate. To obtain the final images, OSEM 3D Flash reconstruction with different settings and FBP reconstruction were used. First, the TCS/TCB ratio in voxels was analyzed for different combinations of the number of subsets and the number of iterations of the OSEM 3D Flash reconstruction. Secondly, the same ratio was analyzed for different parameters of the Gaussian filter (with FWHM 2-4 times the pixel size). The influence of scatter correction on the TCS/TCB ratio was also investigated. With an increasing number of subsets and iterations, an increase of the TCS/TCB ratio was observed. With increasing Gaussian filter width (FWHM), a decrease of the TCS/TCB ratio was observed. The use of scatter correction slightly decreases the values of this ratio. The OSEM algorithm provides a meaningfully better reconstruction of the SRS SPECT study as compared to the FBP technique. A high number of subsets improves image quality (images are smoother). Increasing the number of iterations gives better contrast, and the shapes of lesions and organs are sharper. The choice of reconstruction parameters is a compromise between the qualitative appearance of the image and its quantitative accuracy, and should not be modified when comparing multiple studies of the same patient.
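
    Two quantitative ingredients of the analysis are easy to state in code: the lesion-to-background count ratio computed from voxel masks, and a Gaussian post-filter whose width is specified as a FWHM in pixel units (sigma = FWHM / 2.355). The sketch below assumes NumPy/SciPy arrays and illustrative mask names; it is not the vendor's Flash 3D implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tcs_tcb(volume, lesion_mask, background_mask):
    """Mean lesion counts per voxel divided by mean background counts per voxel."""
    return volume[lesion_mask].mean() / volume[background_mask].mean()

def postfilter(volume, fwhm_pixels):
    """Gaussian post-filter with FWHM expressed in pixel units (e.g. 2-4 x pixel size)."""
    sigma = fwhm_pixels / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM -> sigma, ~FWHM/2.355
    return gaussian_filter(volume, sigma=sigma)
```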

  2. Limited plasticity in the phenotypic variance-covariance matrix for male advertisement calls in the black field cricket, Teleogryllus commodus

    PubMed Central

    Pitchers, W. R.; Brooks, R.; Jennions, M. D.; Tregenza, T.; Dworkin, I.; Hunt, J.

    2013-01-01

    Phenotypic integration and plasticity are central to our understanding of how complex phenotypic traits evolve. Evolutionary change in complex quantitative traits can be predicted using the multivariate breeders’ equation, but such predictions are only accurate if the matrices involved are stable over evolutionary time. Recent work, however, suggests that these matrices are temporally plastic, spatially variable and themselves evolvable. The data available on phenotypic variance-covariance matrix (P) stability is sparse, and largely focused on morphological traits. Here we compared P for the structure of the complex sexual advertisement call of six divergent allopatric populations of the Australian black field cricket, Teleogryllus commodus. We measured a subset of calls from wild-caught crickets from each of the populations and then a second subset after rearing crickets under common-garden conditions for three generations. In a second experiment, crickets from each population were reared in the laboratory on high- and low-nutrient diets and their calls recorded. In both experiments, we estimated P for call traits and used multiple methods to compare them statistically (Flury hierarchy, geometric subspace comparisons and random skewers). Despite considerable variation in means and variances of individual call traits, the structure of P was largely conserved among populations, across generations and between our rearing diets. Our finding that P remains largely stable, among populations and between environmental conditions, suggests that selection has preserved the structure of call traits in order that they can function as an integrated unit. PMID:23530814
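
    The random-skewers comparison mentioned above can be sketched in a few lines: random unit-length selection gradients are applied to two covariance matrices and the similarity of the predicted responses is averaged. The matrices below are arbitrary stand-ins, not the cricket call data.

```python
# Minimal random-skewers sketch: mean vector correlation of responses to the
# same random selection gradients applied to two covariance matrices.
import numpy as np

def random_skewers(P1, P2, n_skewers=1000, rng=None):
    rng = np.random.default_rng(rng)
    k = P1.shape[0]
    sims = []
    for _ in range(n_skewers):
        b = rng.normal(size=k)
        b /= np.linalg.norm(b)             # random unit-length selection vector
        z1, z2 = P1 @ b, P2 @ b            # predicted responses
        sims.append(z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2)))
    return float(np.mean(sims))

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); P1 = A @ A.T  # arbitrary positive semi-definite matrix
P2 = P1 + 0.1 * np.eye(5)                  # a slightly perturbed copy
print(random_skewers(P1, P2, rng=1))       # close to 1 for similar matrices
```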

  3. Universality Classes of Interaction Structures for NK Fitness Landscapes

    NASA Astrophysics Data System (ADS)

    Hwang, Sungmin; Schmiegelt, Benjamin; Ferretti, Luca; Krug, Joachim

    2018-07-01

    Kauffman's NK-model is a paradigmatic example of a class of stochastic models of genotypic fitness landscapes that aim to capture generic features of epistatic interactions in multilocus systems. Genotypes are represented as sequences of L binary loci. The fitness assigned to a genotype is a sum of contributions, each of which is a random function defined on a subset of k ≤ L loci. These subsets or neighborhoods determine the genetic interactions of the model. Whereas earlier work on the NK model suggested that most of its properties are robust with regard to the choice of neighborhoods, recent work has revealed an important and sometimes counter-intuitive influence of the interaction structure on the properties of NK fitness landscapes. Here we review these developments and present new results concerning the number of local fitness maxima and the statistics of selectively accessible (that is, fitness-monotonic) mutational pathways. In particular, we develop a unified framework for computing the exponential growth rate of the expected number of local fitness maxima as a function of L, and identify two different universality classes of interaction structures that display different asymptotics of this quantity for large k. Moreover, we show that the probability that the fitness landscape can be traversed along an accessible path decreases exponentially in L for a large class of interaction structures that we characterize as locally bounded. Finally, we discuss the impact of the NK interaction structures on the dynamics of evolution using adaptive walk models.
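
    For readers unfamiliar with the model, a brute-force sketch of an NK landscape with adjacent (cyclic) interaction neighborhoods and a count of its local fitness maxima is given below; it is only feasible for small L and is not the analytical framework developed in the paper.

```python
# Brute-force NK landscape with adjacent neighborhoods and a count of local maxima.
import itertools
import numpy as np

def nk_landscape(L, k, rng):
    # contributions[i] is a random lookup table over the 2**k states of locus i's neighborhood
    return [rng.random(2 ** k) for _ in range(L)]

def fitness(genotype, tables, L, k):
    total = 0.0
    for i in range(L):
        idx = 0
        for j in range(k):                       # adjacent neighborhood: loci i..i+k-1 (cyclic)
            idx = (idx << 1) | genotype[(i + j) % L]
        total += tables[i][idx]
    return total / L

def count_local_maxima(L, k, seed=0):
    rng = np.random.default_rng(seed)
    tables = nk_landscape(L, k, rng)
    count = 0
    for g in itertools.product((0, 1), repeat=L):
        f = fitness(g, tables, L, k)
        neighbours = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(L))
        if all(fitness(n, tables, L, k) < f for n in neighbours):
            count += 1
    return count

print(count_local_maxima(L=10, k=3))
```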

  4. Universality Classes of Interaction Structures for NK Fitness Landscapes

    NASA Astrophysics Data System (ADS)

    Hwang, Sungmin; Schmiegelt, Benjamin; Ferretti, Luca; Krug, Joachim

    2018-02-01

    Kauffman's NK-model is a paradigmatic example of a class of stochastic models of genotypic fitness landscapes that aim to capture generic features of epistatic interactions in multilocus systems. Genotypes are represented as sequences of L binary loci. The fitness assigned to a genotype is a sum of contributions, each of which is a random function defined on a subset of k ≤ L loci. These subsets or neighborhoods determine the genetic interactions of the model. Whereas earlier work on the NK model suggested that most of its properties are robust with regard to the choice of neighborhoods, recent work has revealed an important and sometimes counter-intuitive influence of the interaction structure on the properties of NK fitness landscapes. Here we review these developments and present new results concerning the number of local fitness maxima and the statistics of selectively accessible (that is, fitness-monotonic) mutational pathways. In particular, we develop a unified framework for computing the exponential growth rate of the expected number of local fitness maxima as a function of L, and identify two different universality classes of interaction structures that display different asymptotics of this quantity for large k. Moreover, we show that the probability that the fitness landscape can be traversed along an accessible path decreases exponentially in L for a large class of interaction structures that we characterize as locally bounded. Finally, we discuss the impact of the NK interaction structures on the dynamics of evolution using adaptive walk models.

  5. Information Entropy of Influenza A Segment 7

    NASA Astrophysics Data System (ADS)

    Thompson, William A.; Fan, Shaohua; Weltman, Joel K.

    2008-12-01

    Information entropy (H) is a measure of uncertainty at each position within a sequence of nucleotides. H was used to characterize a set of influenza A segment 7 nucleotide sequences. Nucleotide locations of high entropy were identified near the 5' start of all of the sequences, and the sequences were assigned to subsets according to synonymous nucleotide variants at those positions: uracil (U6) or cytosine (C6) at position six, adenine (A12) or guanine (G12) at position 12, and adenine (A15) or cytosine (C15) at position 15. H values were found to be correlated (Kendall tau) along the lengths of the nucleotide segments of the subset pairs at each position. However, the H values of each subset of sequences were statistically distinguishable from those of the other member of the pair (Kolmogorov-Smirnov test). The joint probability of uncorrelated distributions of U6 and C6 sequences to viral subtypes and to viral host species was 34 times greater than for the A12:G12 subset pair and 214 times greater than for the A15:C15 pair. This result indicates that the high-entropy position six of segment 7 is either a reporter or a sentinel location. The fact that not one of the H5N1 sequences in the dataset was a member of the C6 subset, while all 125 H5N1 sequences were members of the U6 subset, suggests a non-random sentinel function.
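
    The positional entropy H used here is ordinary Shannon entropy computed column by column over aligned sequences; a minimal sketch (with toy sequences, not influenza data) follows.

```python
# Per-position Shannon entropy for a set of aligned nucleotide sequences.
from collections import Counter
from math import log2

def positional_entropy(sequences):
    length = min(len(s) for s in sequences)
    H = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        n = sum(counts.values())
        H.append(-sum((c / n) * log2(c / n) for c in counts.values()))
    return H

seqs = ["ATGAGTCAAG", "ATGAGCCAAG", "ATGAGTCAAA", "ATGAGCCAAG"]  # toy alignment
print([round(h, 3) for h in positional_entropy(seqs)])
```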

  6. Selection of reliable reference genes for quantitative real-time PCR gene expression analysis in Jute (Corchorus capsularis) under stress treatments

    PubMed Central

    Niu, Xiaoping; Qi, Jianmin; Zhang, Gaoyang; Xu, Jiantang; Tao, Aifen; Fang, Pingping; Su, Jianguang

    2015-01-01

    To accurately measure gene expression using quantitative reverse transcription PCR (qRT-PCR), reliable reference gene(s) are required for data normalization. Corchorus capsularis, an annual herbaceous fiber crop with predominant biodegradability and renewability, has not been investigated for the stability of reference genes with qRT-PCR. In this study, 11 candidate reference genes were selected and their expression levels were assessed using qRT-PCR. To account for the influence of experimental approach and tissue type, 22 different jute samples were selected from abiotic and biotic stress conditions as well as three different tissue types. The stability of the candidate reference genes was evaluated using geNorm, NormFinder, and BestKeeper programs, and the comprehensive rankings of gene stability were generated by aggregate analysis. For the biotic stress and NaCl stress subsets, ACT7 and RAN were suitable as stable reference genes for gene expression normalization. For the PEG stress subset, UBC and DnaJ were sufficient for accurate normalization. For the tissues subset, four reference genes (TUBβ, UBI, EF1α, and RAN) were sufficient for accurate normalization. The selected genes were further validated by comparing expression profiles of WRKY15 in various samples, and two stable reference genes were recommended for accurate normalization of qRT-PCR data. Our results provide researchers with appropriate reference genes for qRT-PCR in C. capsularis, and will facilitate gene expression studies under these conditions. PMID:26528312

  7. Mining nutrigenetics patterns related to obesity: use of parallel multifactor dimensionality reduction.

    PubMed

    Karayianni, Katerina N; Grimaldi, Keith A; Nikita, Konstantina S; Valavanis, Ioannis K

    2015-01-01

    This paper aims to elucidate the complex etiology underlying obesity by analysing data from a large nutrigenetics study, in which nutritional and genetic factors associated with obesity were recorded for around two thousand individuals. In our previous work, these data were analysed using artificial neural network methods, which identified optimised subsets of factors to predict one's obesity status. However, these methods did not reveal how the selected factors interact with each other in the obtained predictive models. For that reason, parallel Multifactor Dimensionality Reduction (pMDR) was used here to further analyse the pre-selected subsets of nutrigenetic factors. Within pMDR, predictive models using up to eight factors were constructed, further reducing the input dimensionality, while rules describing the interactive effects of the selected factors were derived. In this way, it was possible to identify specific genetic variations and their interactive effects with particular nutritional factors, which are now under further study.
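
    The core MDR step, pooling each multifactor genotype cell into "high risk" or "low risk" by its case/control ratio, can be sketched as below; the data are synthetic and this is not the parallel pMDR implementation used in the study.

```python
# Minimal MDR-style sketch: collapse a factor pair into one binary attribute by
# labelling each combination of levels high- or low-risk from its case/control ratio.
from collections import defaultdict
import numpy as np

def mdr_cells(factor_a, factor_b, case):
    cases, controls = defaultdict(int), defaultdict(int)
    for a, b, y in zip(factor_a, factor_b, case):
        (cases if y else controls)[(a, b)] += 1
    overall = sum(case) / (len(case) - sum(case))    # overall case:control ratio
    high_risk = set()
    for cell in set(cases) | set(controls):
        ratio = cases[cell] / max(controls[cell], 1)
        if ratio > overall:
            high_risk.add(cell)
    return high_risk

rng = np.random.default_rng(0)
a = rng.integers(0, 3, 500)        # e.g. a genotype coded 0/1/2 (hypothetical)
b = rng.integers(0, 2, 500)        # e.g. a dichotomized nutritional factor (hypothetical)
y = ((a == 2) & (b == 1)).astype(int) | (rng.random(500) < 0.1).astype(int)
print(sorted(mdr_cells(a, b, y)))  # interacting cell(s) flagged as high risk
```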

  8. Optimisation algorithms for ECG data compression.

    PubMed

    Haugland, D; Heber, J G; Husøy, J H

    1997-07-01

    The use of exact optimisation algorithms for compressing digital electrocardiograms (ECGs) is demonstrated. As opposed to traditional time-domain methods, which use heuristics to select a small subset of representative signal samples, the problem of selecting the subset is formulated in rigorous mathematical terms. This approach makes it possible to derive algorithms guaranteeing the smallest possible reconstruction error when a bounded selection of signal samples is interpolated. The proposed model resembles well-known network models and is solved by a cubic dynamic programming algorithm. When applied to standard test problems, the algorithm produces a compressed representation for which the distortion is about one-half of that obtained by traditional time-domain compression techniques at reasonable compression ratios. This illustrates that, in terms of the accuracy of decoded signals, existing time-domain heuristics for ECG compression may be far from what is theoretically achievable. The paper is an attempt to bridge this gap.
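
    To make the idea of exact sample-subset selection concrete, the sketch below uses a straightforward dynamic program that picks a fixed number of knots (always keeping the endpoints) to minimize the squared error of linear interpolation; it illustrates the principle only and is not the paper's network-model formulation.

```python
# Exact knot selection by dynamic programming for piecewise-linear reconstruction.
import numpy as np

def segment_error(x, i, j):
    # squared error of linearly interpolating x[i..j] from its endpoints
    t = np.arange(i, j + 1)
    interp = np.interp(t, [i, j], [x[i], x[j]])
    return float(np.sum((x[i:j + 1] - interp) ** 2))

def optimal_knots(x, m):
    n = len(x)
    cost = [[segment_error(x, i, j) if i < j else 0.0 for j in range(n)] for i in range(n)]
    INF = float("inf")
    E = [[INF] * (m + 1) for _ in range(n)]        # E[j][r]: best error ending at sample j with r knots
    back = [[-1] * (m + 1) for _ in range(n)]
    E[0][1] = 0.0                                   # the first knot is always sample 0
    for j in range(1, n):
        for r in range(2, m + 1):
            for i in range(j):
                if E[i][r - 1] + cost[i][j] < E[j][r]:
                    E[j][r] = E[i][r - 1] + cost[i][j]
                    back[j][r] = i
    knots, j, r = [n - 1], n - 1, m                 # the last knot is always sample n-1
    while r > 1:
        j = back[j][r]
        knots.append(j)
        r -= 1
    return sorted(knots), E[n - 1][m]

signal = np.sin(np.linspace(0, 3 * np.pi, 60)) + 0.05 * np.random.default_rng(0).normal(size=60)
print(optimal_knots(signal, m=8))
```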

  9. Mediators of Effects of a Selective Family-Focused Violence Prevention Approach for Middle School Students

    PubMed Central

    2013-01-01

    This study examined how parenting and family characteristics targeted in a selective prevention program mediated effects on key youth proximal outcomes related to violence perpetration. The selective intervention was evaluated within the context of a multi-site trial involving random assignment of 37 schools to four conditions: a universal intervention composed of a student social-cognitive curriculum and teacher training, a selective family-focused intervention with a subset of high-risk students, a condition combining these two interventions, and a no-intervention control condition. Two cohorts of sixth-grade students (total N=1,062) exhibiting high levels of aggression and social influence were the sample for this study. Analyses of pre-post change compared to controls using intent-to-treat analyses found no significant effects. However, estimates incorporating participation of those assigned to the intervention and predicted participation among those not assigned revealed significant positive effects on student aggression, use of aggressive strategies for conflict management, and parental estimation of student’s valuing of achievement. Findings also indicated intervention effects on two targeted family processes: discipline practices and family cohesion. Mediation analyses found evidence that change in these processes mediated effects on some outcomes, notably aggressive behavior and valuing of school achievement. Results support the notion that changing parenting practices and the quality of family relationships can prevent the escalation in aggression and maintain positive school engagement for high-risk youth. PMID:21932067

  10. Heteroresistance at the single-cell level: adapting to antibiotic stress through a population-based strategy and growth-controlled interphenotypic coordination.

    PubMed

    Wang, Xiaorong; Kang, Yu; Luo, Chunxiong; Zhao, Tong; Liu, Lin; Jiang, Xiangdan; Fu, Rongrong; An, Shuchang; Chen, Jichao; Jiang, Ning; Ren, Lufeng; Wang, Qi; Baillie, J Kenneth; Gao, Zhancheng; Yu, Jun

    2014-02-11

    Heteroresistance refers to phenotypic heterogeneity of microbial clonal populations under antibiotic stress, and it has been thought to be an allocation of a subset of "resistant" cells for surviving in higher concentrations of antibiotic. The assumption fits the so-called bet-hedging strategy, where a bacterial population "hedges" its "bet" on different phenotypes to be selected by unpredicted environment stresses. To test this hypothesis, we constructed a heteroresistance model by introducing a blaCTX-M-14 gene (coding for a cephalosporin hydrolase) into a sensitive Escherichia coli strain. We confirmed heteroresistance in this clone and that a subset of the cells expressed more hydrolase and formed more colonies in the presence of ceftriaxone (exhibited stronger "resistance"). However, subsequent single-cell-level investigation by using a microfluidic device showed that a subset of cells with a distinguishable phenotype of slowed growth and intensified hydrolase expression emerged, and they were not positively selected but increased their proportion in the population with ascending antibiotic concentrations. Therefore, heteroresistance--the gradually decreased colony-forming capability in the presence of antibiotic--was a result of a decreased growth rate rather than of selection for resistant cells. Using a mock strain without the resistance gene, we further demonstrated the existence of two nested growth-centric feedback loops that control the expression of the hydrolase and maximize population growth in various antibiotic concentrations. In conclusion, phenotypic heterogeneity is a population-based strategy beneficial for bacterial survival and propagation through task allocation and interphenotypic collaboration, and the growth rate provides a critical control for the expression of stress-related genes and an essential mechanism in responding to environmental stresses. Heteroresistance is essentially phenotypic heterogeneity, where a population-based strategy is thought to be at work, being assumed to be variable cell-to-cell resistance to be selected under antibiotic stress. Exact mechanisms of heteroresistance and its roles in adaptation to antibiotic stress have yet to be fully understood at the molecular and single-cell levels. In our study, we have not been able to detect any apparent subset of "resistant" cells selected by antibiotics; on the contrary, cell populations differentiate into phenotypic subsets with variable growth statuses and hydrolase expression. The growth rate appears to be sensitive to stress intensity and plays a key role in controlling hydrolase expression at both the bulk population and single-cell levels. We have shown here, for the first time, that phenotypic heterogeneity can be beneficial to a growing bacterial population through task allocation and interphenotypic collaboration other than partitioning cells into different categories of selective advantage.

  11. Assessment of fracture risk: value of random population-based samples--the Geelong Osteoporosis Study.

    PubMed

    Henry, M J; Pasco, J A; Seeman, E; Nicholson, G C; Sanders, K M; Kotowicz, M A

    2001-01-01

    Fracture risk is determined by bone mineral density (BMD). The T-score, a measure of fracture risk, is the position of an individual's BMD in relation to a reference range. The aim of this study was to determine the magnitude of change in the T-score when different sampling techniques were used to produce the reference range. Reference ranges were derived from three samples, drawn from the same region: (1) an age-stratified population-based random sample, (2) unselected volunteers, and (3) a selected healthy subset of the population-based sample with no diseases or drugs known to affect bone. T-scores were calculated using the three reference ranges for a cohort of women who had sustained a fracture and as a group had a low mean BMD (ages 35-72 yr; n = 484). For most comparisons, the T-scores for the fracture cohort were more negative using the population reference range. The difference in T-scores reached 1.0 SD. The proportion of the fracture cohort classified as having osteoporosis at the spine was 26, 14, and 23% when the population, volunteer, and healthy reference ranges were applied, respectively. The use of inappropriate reference ranges results in substantial changes to T-scores and may lead to inappropriate management.
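
    The dependence of the T-score on the reference range follows directly from its standard definition (generic symbols, not taken from the paper): an individual's BMD is expressed in standard deviations of a young-adult reference sample, so changing the reference mean or SD shifts every T-score.

```latex
T = \frac{\mathrm{BMD}_{\text{individual}} - \overline{\mathrm{BMD}}_{\text{reference}}}{\mathrm{SD}_{\text{reference}}}
```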

  12. Automatic Target Recognition: Statistical Feature Selection of Non-Gaussian Distributed Target Classes

    DTIC Science & Technology

    2011-06-01

    …implementing, and evaluating many feature selection algorithms. Mucciardi and Gose compared seven different techniques for choosing subsets of pattern…

  13. Characterization of the CD4+ and CD8+ tumor infiltrating lymphocytes propagated with bispecific monoclonal antibodies.

    PubMed

    Wong, J T; Pinto, C E; Gifford, J D; Kurnick, J T; Kradin, R L

    1989-11-15

    To study the CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) in the antitumor response, we propagated these subsets directly from tumor tissues with anti-CD3:anti-CD8 (CD3,8) and anti-CD3:anti-CD4 (CD3,4) bispecific mAb (BSMAB). CD3,8 BSMAB cause selective cytolysis of CD8+ lymphocytes by bridging the CD8 molecules of target lymphocytes to the CD3 molecular complex of cytolytic T lymphocytes with concurrent activation and proliferation of residual CD3+CD4+ T lymphocytes. Similarly, CD3,4 BSMAB cause selective lysis of CD4+ lymphocytes while concurrently activating the residual CD3+CD8+ T cells. Small tumor fragments from four malignant melanoma and three renal cell carcinoma patients were cultured in medium containing CD3,8 + IL-2, CD3,4 + IL-2, or IL-2 alone. CD3,8 led to selective propagation of the CD4+ TIL whereas CD3,4 led to selective propagation of the CD8+ TIL from each of the tumors. The phenotypes of the TIL subset cultures were generally stable when assayed over a 1- to 3-month period and after further expansion with anti-CD3 mAb or lectins. Specific 51Cr release of labeled target cells that were bridged to the CD3 molecular complexes of TIL suggested that both CD4+ and CD8+ TIL cultures have the capacity to mediate cytolysis via their Ti/CD3 TCR complexes. In addition, both CD4+ and CD8+ TIL cultures from most patients caused substantial (greater than 20%) lysis of the NK-sensitive K562 cell line. The majority of CD4+ but not CD8+ TIL cultures also produced substantial lysis of the NK-resistant Daudi cell line. Lysis of the autologous tumor by the TIL subsets was assessed in two patients with malignant melanoma. The CD8+ TIL from one tumor demonstrated cytotoxic activity against the autologous tumor but negligible lysis of allogeneic melanoma targets. In conclusion, immunocompetent CD4+ and CD8+ TIL subsets can be isolated and expanded directly from small tumor fragments of malignant melanoma and renal cell carcinoma using BSMAB. The resultant TIL subsets can be further expanded for detailed studies or for adoptive immunotherapy.

  14. A regulator's view of comparative effectiveness research.

    PubMed

    Temple, Robert

    2012-02-01

    'Comparative effectiveness' is the current enthusiasm, and for good reason. After knowing a treatment works, the most critical question is how it compares with alternatives. Comparative studies are not commonly conducted by drug companies and they represent a significant methodological challenge. Comparative data could include evidence of overall superiority to an alternative or advantages in identifiable subsets (for example, people who do not respond to or tolerate alternatives, or members of a genetic subset), and could also include convincing evidence that there is little difference between two treatments. To describe regulations, guidance, and Food and Drug Administration experience related to studies of comparative effectiveness, including approaches to showing superiority and problems encountered in showing similarity. Review of Food and Drug Administration regulations and guidance and experience with showing superiority and similarity, particularly related to randomized trials and epidemiologic studies. Methods exist, and they have been successful for showing overall superiority of one drug over another or advantages in specific population subsets. Efforts to show true equivalence face problems of definition and very large sample sizes needed to rule out small differences. There is a need for further discussion of what is meant by similarity or equivalence of two treatments. Comparative studies are challenging because differences between effective therapies are likely to be small and can be detected reliably only in randomized trials, often large ones. Despite the difficulties, comparative trials have been successful and we clearly would like to see more of them.

  15. Spatial and Functional Selectivity of Peripheral Nerve Signal Recording With the Transversal Intrafascicular Multichannel Electrode (TIME).

    PubMed

    Badia, Jordi; Raspopovic, Stanisa; Carpaneto, Jacopo; Micera, Silvestro; Navarro, Xavier

    2016-01-01

    The selection of suitable peripheral nerve electrodes for biomedical applications implies a trade-off between invasiveness and selectivity. The optimal design should provide the highest selectivity for targeting a large number of nerve fascicles with the least invasiveness and potential damage to the nerve. The transverse intrafascicular multichannel electrode (TIME), transversally inserted in the peripheral nerve, has been shown to be useful for the selective activation of subsets of axons, both at inter- and intra-fascicular levels, in the small sciatic nerve of the rat. In this study we assessed the capabilities of TIME for the selective recording of neural activity, considering the topographical selectivity and the distinction of neural signals corresponding to different sensory types. Topographical recording selectivity was proved by the differential recording of CNAPs from different subsets of nerve fibers, such as those innervating toes 2 and 4 of the hindpaw of the rat. Neural signals elicited by sensory stimuli applied to the rat paw were successfully recorded. Signal processing allowed distinguishing three different types of sensory stimuli such as tactile, proprioceptive and nociceptive ones with high performance. These findings further support the suitability of TIMEs for neuroprosthetic applications, by exploiting the transversal topographical structure of the peripheral nerves.

  16. Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

    PubMed

    Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact in improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI dataset that consists of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
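
    A minimal sketch of such a pipeline, assuming scikit-learn and synthetic multi-class data (standardization, ANOVA selection, PLS scores as reduced features, one-vs-rest random forests), is shown below; it is not the SiPBA-UGR submission itself.

```python
# Sketch: standardization -> ANOVA feature selection -> PLS scores -> OvR random forests.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.base import BaseEstimator, TransformerMixin

class PLSFeatures(BaseEstimator, TransformerMixin):
    """Use PLS scores as reduced features (one-hot targets for multi-class)."""
    def __init__(self, n_components=5):
        self.n_components = n_components
    def fit(self, X, y):
        Y = np.eye(len(np.unique(y)))[y]            # one-hot encode class labels
        self.pls_ = PLSRegression(n_components=self.n_components).fit(X, Y)
        return self
    def transform(self, X):
        return self.pls_.transform(X)

X, y = make_classification(n_samples=400, n_features=200, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=50),
    PLSFeatures(n_components=5),
    OneVsRestClassifier(RandomForestClassifier(n_estimators=200, random_state=0)),
)
print(cross_val_score(model, X, y, cv=5).mean())
```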

  17. Better physical activity classification using smartphone acceleration sensor.

    PubMed

    Arif, Muhammad; Bilal, Mohsin; Kattan, Ahmed; Ahamed, S Iqbal

    2014-09-01

    Obesity is becoming one of the serious problems for the health of the worldwide population. Social interactions on mobile phones and computers via the internet through social e-networks are one of the major causes of a lack of physical activity. For the health specialist, it is important to track the physical activity record of obese or overweight patients to supervise weight loss control. In this study, the acceleration sensor present in the smartphone is used to monitor the physical activity of the user. Physical activities including Walking, Jogging, Sitting, Standing, Walking upstairs and Walking downstairs are classified. Time domain features are extracted from the acceleration data recorded by the smartphone during different physical activities. The time and space complexity of the whole framework is reduced by optimal feature subset selection and pruning of instances. Classification results for six physical activities are reported in this paper. Using simple time domain features, 99 % classification accuracy is achieved. Furthermore, attribute subset selection is used to remove redundant features and to minimize the time complexity of the algorithm. A subset of 30 features produced more than 98 % classification accuracy for the six physical activities.
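
    A minimal sketch of time-domain feature extraction from windowed tri-axial acceleration data is shown below; window length and the feature list are illustrative choices, not the paper's exact set.

```python
# Time-domain features per window of tri-axial acceleration data.
import numpy as np

def window_features(window):
    """window: array of shape (n_samples, 3) for x, y, z acceleration."""
    feats = []
    for axis in range(3):
        a = window[:, axis]
        feats += [a.mean(), a.std(), a.min(), a.max(),
                  np.mean(np.abs(a - a.mean()))]           # mean absolute deviation
    sm = np.mean(np.sqrt(np.sum(window ** 2, axis=1)))      # mean signal magnitude
    return np.array(feats + [sm])

rng = np.random.default_rng(0)
stream = rng.normal(size=(5000, 3))             # stand-in for 50 Hz accelerometer data
win, step = 250, 125                            # 5 s windows with 50 % overlap (assumed)
X = np.array([window_features(stream[s:s + win])
              for s in range(0, len(stream) - win + 1, step)])
print(X.shape)                                   # (n_windows, 16)
```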

  18. Occurrence and distribution of methyl tert-butyl ether and other volatile organic compounds in drinking water in the Northeast and Mid-Atlantic regions of the United States, 1993-98

    USGS Publications Warehouse

    Grady, S.J.; Casey, G.D.

    2001-01-01

    Data on volatile organic compounds (VOCs) in drinking water supplied by 2,110 randomly selected community water systems (CWSs) in 12 Northeast and Mid-Atlantic States indicate 64 VOC analytes were detected at least once during 1993-98. Selection of the 2,110 CWSs inventoried for this study targeted 20 percent of the 10,479 active CWSs in the region and represented a random subset of the total distribution by State, source of water, and size of system. The data include 21,635 analyses of drinking water collected for compliance monitoring under the Safe Drinking Water Act; the data mostly represent finished drinking water collected at the point-of-entry to, or at more distal locations within, each CWS's distribution system following any water-treatment processes. VOC detections were more common in drinking water supplied by large systems (serving more than 3,300 people) that tap surface-water sources or both surface- and groundwater sources than in small systems supplied exclusively by ground-water sources. Trihalomethane (THM) compounds, which are potentially formed during the process of disinfecting drinking water with chlorine, were detected in 45 percent of the randomly selected CWSs. Chloroform was the most frequently detected THM, reported in 39 percent of the CWSs. The gasoline additive methyl tert-butyl ether (MTBE) was the most frequently detected VOC in drinking water after the THMs. MTBE was detected in 8.9 percent of the 1,194 randomly selected CWSs that analyzed samples for MTBE at any reporting level, and it was detected in 7.8 percent of the 1,074 CWSs that provided MTBE data at the 1.0-µg/L (microgram per liter) reporting level. As with other VOCs reported in drinking water, most MTBE concentrations were less than 5.0 µg/L, and less than 1 percent of CWSs reported MTBE concentrations at or above the 20.0-µg/L lower limit recommended by the U.S. Environmental Protection Agency's Drinking-Water Advisory. The frequency of MTBE detections in drinking water is significantly related to high-MTBE-use patterns. Detections are five times more likely in areas where MTBE is or has been used in gasoline at greater than 5 percent by volume as part of the oxygenated or reformulated (OXY/RFG) fuels program. Detection frequencies of the individual gasoline compounds (benzene, toluene, ethylbenzene, and xylenes (BTEX)) were mostly less than 3 percent of the randomly selected CWSs, but collectively, BTEX compounds were detected in 8.4 percent of CWSs. BTEX concentrations also were low and just three drinking-water samples contained BTEX at concentrations exceeding 20 µg/L. Co-occurrence of MTBE and BTEX was rare, and only 0.8 percent of CWSs reported simultaneous detections of MTBE and BTEX compounds. Low concentrations and co-occurrence of MTBE and BTEX indicate most gasoline contaminants in drinking water probably represent nonpoint sources. Solvents were frequently detected in drinking water in the 12-State area. One or more of 27 individual solvent VOCs were detected at any reporting level in 3,080 drinking-water samples from 304 randomly selected CWSs (14 percent) and in 206 CWSs (9.8 percent) at concentrations at or above 1.0 µg/L. High co-occurrence among solvents probably reflects common sources and the presence of transformation by-products. Other VOCs were relatively rarely detected in drinking water in the 12-State area. Six percent (127) of the 2,110 randomly selected CWSs reported concentrations of 16 VOCs at or above drinking-water criteria. The 127 CWSs collectively serve 2.6 million people. The occurrence of VOCs in drinking water was significantly associated (p<0.0001) with high population-density urban areas. New Jersey, Massachusetts, and Rhode Island, States with substantial urbanization and high population density, had the highest frequency of VOC detections among the 12 States. More than two-thirds of the randomly selected CWSs in New Jersey reported detecting VOC concentrations in drinking water at or above 1

  19. On Some Multiple Decision Problems

    DTIC Science & Technology

    1976-08-01

    …parameter space. Some recent results in the area of subset selection formulation are Gnanadesikan and Gupta [28], Gupta and Studden [43], Gupta and…

  20. A computer program for fast and easy typing of partial endoglucanase gene sequence into phylotypes and sequevars 1&2 (select agents) of Ralstonia solanacearum

    USDA-ARS?s Scientific Manuscript database

    The phytopathogen Ralstonia solanacearum is a species complex that contains a subset of strains that are quarantined or select agent pathogens. An unidentified R. solanacearum strain is considered a select agent in the US until proven otherwise, which can be done by phylogenetic analysis of a partia...

  1. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data.

    PubMed

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large amount of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of the binary optimization algorithm in terms of classification accuracy and the number of selected genes.

  2. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    PubMed Central

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large amount of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of the binary optimization algorithm in terms of classification accuracy and the number of selected genes. PMID:27579323

  3. Choosing "Something Else" as a Sexual Identity: Evaluating Response Options on the National Health Interview Survey.

    PubMed

    Eliason, Michele J; Streed, Carl G

    2017-10-01

    Researchers struggle to find effective ways to measure sexual and gender identities to determine whether there are health differences among subsets of the LGBTQ+ population. This study examines responses on the National Health Interview Survey (NHIS) sexual identity questions among 277 LGBTQ+ healthcare providers. Eighteen percent indicated that their sexual identity was "something else" on the first question, and 57% of those also selected "something else" on the second question. Half of the genderqueer/gender variant participants and 100% of transgender-identified participants selected "something else" as their sexual identity. The NHIS question does not allow all respondents in LGBTQ+ populations to be categorized, thus we are potentially missing vital health disparity information about subsets of the LGBTQ+ population.

  4. Hippocampus shape analysis for temporal lobe epilepsy detection in magnetic resonance imaging

    NASA Astrophysics Data System (ADS)

    Kohan, Zohreh; Azmi, Reza

    2016-03-01

    There is evidence in the literature that Temporal Lobe Epilepsy (TLE) causes lateralized atrophy and deformation of the hippocampus and other substructures of the brain. Magnetic Resonance Imaging (MRI), due to its high-contrast soft tissue imaging, is one of the most popular imaging modalities used in TLE diagnosis and treatment procedures. An algorithm that helps clinicians perform better and more effective shape deformation analysis could improve the diagnosis and treatment of the disease. In this project our purpose was to design, implement and test a classification algorithm for MRIs based on hippocampal asymmetry detection using shape- and size-based features. Our method consisted of two main parts: (1) shape feature extraction, and (2) image classification. We tested 11 different shape and size features and selected four of them that detect hippocampal asymmetry significantly in a randomly selected subset of the dataset. Then, we employed a support vector machine (SVM) classifier to classify the remaining images of the dataset into normal and epileptic images using our selected features. The dataset contains 25 patient images, of which 12 cases were used as a training set and the remaining 13 cases for testing the performance of the classifier. We measured accuracy, specificity and sensitivity of 76%, 100%, and 70%, respectively, for our algorithm. The preliminary results show that using shape and size features for detecting hippocampal asymmetry could be helpful in TLE diagnosis in MRI.
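
    The classification stage can be sketched as below: an SVM trained on a handful of asymmetry features and scored with accuracy, sensitivity and specificity. Features and data are synthetic stand-ins for the hippocampal measurements.

```python
# SVM classification on a few shape/size features with accuracy, sensitivity, specificity.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
# four hypothetical asymmetry features (e.g. left-right volume and shape differences)
normals  = rng.normal(0.0, 1.0, size=(40, 4))
patients = rng.normal(1.2, 1.0, size=(40, 4))
X = np.vstack([normals, patients])
y = np.array([0] * 40 + [1] * 40)

idx = rng.permutation(len(y))
train, test = idx[:60], idx[60:]
clf = SVC(kernel="rbf").fit(X[train], y[train])
tn, fp, fn, tp = confusion_matrix(y[test], clf.predict(X[test])).ravel()
print("accuracy", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity", tp / (tp + fn), "specificity", tn / (tn + fp))
```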

  5. A data science approach to candidate gene selection of pain regarded as a process of learning and neural plasticity.

    PubMed

    Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn

    2016-12-01

    The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
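
    Overrepresentation of an annotation in a gene set is typically assessed with a hypergeometric tail probability; the sketch below uses made-up counts purely for illustration.

```python
# Hypergeometric overrepresentation test: P(X >= k) that at least k of the n
# genes in the set carry an annotation held by M of the N genes overall.
from scipy.stats import hypergeom

N = 20000   # genes in the annotation universe (made-up number)
M = 800     # genes annotated with the term of interest (made-up number)
n = 535     # genes in the candidate set
k = 90      # candidate genes carrying the annotation (made-up number)

p_value = hypergeom.sf(k - 1, N, M, n)   # survival function gives the upper tail
print(f"{p_value:.2e}")
```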

  6. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds.

    PubMed

    Mounet, Nicolas; Gibertini, Marco; Schwaller, Philippe; Campi, Davide; Merkys, Andrius; Marrazzo, Antimo; Sohier, Thibault; Castelli, Ivano Eligio; Cepellotti, Andrea; Pizzi, Giovanni; Marzari, Nicola

    2018-03-01

    Two-dimensional (2D) materials have emerged as promising candidates for next-generation electronic and optoelectronic applications. Yet, only a few dozen 2D materials have been successfully synthesized or exfoliated. Here, we search for 2D materials that can be easily exfoliated from their parent compounds. Starting from 108,423 unique, experimentally known 3D compounds, we identify a subset of 5,619 compounds that appear layered according to robust geometric and bonding criteria. High-throughput calculations using van der Waals density functional theory, validated against experimental structural data and calculated random phase approximation binding energies, further allowed the identification of 1,825 compounds that are either easily or potentially exfoliable. In particular, the subset of 1,036 easily exfoliable cases provides novel structural prototypes and simple ternary compounds as well as a large portfolio of materials to search from for optimal properties. For a subset of 258 compounds, we explore vibrational, electronic, magnetic and topological properties, identifying 56 ferromagnetic and antiferromagnetic systems, including half-metals and half-semiconductors.

  7. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds

    NASA Astrophysics Data System (ADS)

    Mounet, Nicolas; Gibertini, Marco; Schwaller, Philippe; Campi, Davide; Merkys, Andrius; Marrazzo, Antimo; Sohier, Thibault; Castelli, Ivano Eligio; Cepellotti, Andrea; Pizzi, Giovanni; Marzari, Nicola

    2018-02-01

    Two-dimensional (2D) materials have emerged as promising candidates for next-generation electronic and optoelectronic applications. Yet, only a few dozen 2D materials have been successfully synthesized or exfoliated. Here, we search for 2D materials that can be easily exfoliated from their parent compounds. Starting from 108,423 unique, experimentally known 3D compounds, we identify a subset of 5,619 compounds that appear layered according to robust geometric and bonding criteria. High-throughput calculations using van der Waals density functional theory, validated against experimental structural data and calculated random phase approximation binding energies, further allowed the identification of 1,825 compounds that are either easily or potentially exfoliable. In particular, the subset of 1,036 easily exfoliable cases provides novel structural prototypes and simple ternary compounds as well as a large portfolio of materials to search from for optimal properties. For a subset of 258 compounds, we explore vibrational, electronic, magnetic and topological properties, identifying 56 ferromagnetic and antiferromagnetic systems, including half-metals and half-semiconductors.

  8. Planned Missing Data Designs for Spline Growth Models in Salivary Cortisol Research

    ERIC Educational Resources Information Center

    Hogue, Candace M.; Pornprasertmanit, Sunthud; Fry, Mary D.; Rhemtulla, Mijke; Little, Todd D.

    2013-01-01

    Salivary cortisol is often used as an index of physiological and psychological stress in exercise science and psychoneuroendocrine research. A primary concern when designing research studies examining cortisol stems from the high cost of analysis. Planned missing data designs involve intentionally omitting a random subset of observations from data…

  9. Comparison of RAPD Linkage Maps Constructed For a Single Longleaf Pine From Both Haploid and Diploid Mapping Populations

    Treesearch

    Thomas L. Kubisiak; C.Dana Nelson; W.L. Name; M. Stine

    1996-01-01

    Considerable concern has been voiced regarding the reproducibility/transferability of RAPD markers across different genetic backgrounds in genetic mapping experiments. Therefore, separate gametic subsets (mapping populations) were used to construct individual random amplified polymorphic DNA (RAPD) linkage maps for a single longleaf pine (Pinus palustris...

  10. Distinguishing Different Strategies of Across-Dimension Attentional Selection

    ERIC Educational Resources Information Center

    Huang, Liqiang; Pashler, Harold

    2012-01-01

    Selective attention in multidimensional displays has usually been examined using search tasks requiring the detection of a single target. We examined the ability to perceive a spatial structure in multi-item subsets of a display that were defined either conjunctively or disjunctively. Observers saw two adjacent displays and indicated whether the…

  11. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  12. Feasibility of the "Bring Your Own Device" Model in Clinical Research: Results from a Randomized Controlled Pilot Study of a Mobile Patient Engagement Tool.

    PubMed

    Pugliese, Laura; Woodriff, Molly; Crowley, Olga; Lam, Vivian; Sohn, Jeremy; Bradley, Scott

    2016-03-16

    Rising rates of smartphone ownership highlight opportunities for improved mobile application usage in clinical trials. While current methods call for device provisioning, the "bring your own device" (BYOD) model permits participants to use personal phones, allowing for improved patient engagement and lower operational costs. However, more evidence is needed to demonstrate the BYOD model's feasibility in research settings. To assess whether CentrosHealth, a mobile application designed to support trial compliance, produces different outcomes in medication adherence and application engagement when distributed through study-provisioned devices compared to the BYOD model. 87 participants were randomly assigned to use the mobile application or to no intervention for a 28-day pilot study at a 2:1 randomization ratio (2 intervention: 1 control) and asked to consume a twice-daily probiotic supplement. The application users were further randomized into two groups: receiving the application on a personal "BYOD" or study-provided smartphone. In-depth interviews were performed in a randomly-selected subset of the intervention group (five BYOD and five study-provided smartphone users). The BYOD subgroup showed significantly greater engagement than study-provided phone users, as shown by higher application use frequency and duration over the study period. The BYOD subgroup also demonstrated a significant effect of engagement on medication adherence for number of application sessions (unstandardized regression coefficient beta=0.0006, p=0.02) and time spent therein (beta=0.00001, p=0.03). Study-provided phone users showed higher initial adherence rates, but greater decline (5.7%) than BYOD users (0.9%) over the study period. In-depth interviews revealed that participants preferred the BYOD model over using study-provided devices. Results indicate that the BYOD model is feasible in health research settings and improves participant experience, calling for further BYOD model validity assessment. Although group differences in medication adherence decline were not statistically significant, the greater trend of decline in provisioned device users warrants further investigation to determine if trends reach significance over time. Significantly higher application engagement rates and effect of engagement on medication adherence in the BYOD subgroup similarly imply that greater application engagement may correlate with better medication adherence over time.

  13. Efficacy and safety of flavocoxid compared with naproxen in subjects with osteoarthritis of the knee - a subset analysis.

    PubMed

    Levy, Robert; Khokhlov, Alexander; Kopenkin, Sergey; Bart, Boris; Ermolova, Tatiana; Kantemirova, Raiasa; Mazurov, Vadim; Bell, Marjorie; Caldron, Paul; Pillai, Lakshmi; Burnett, Bruce

    2010-12-01

    Twice-daily flavocoxid, a cyclooxygenase and 5-lipoxygenase inhibitor with potent antioxidant activity of botanical origin, was evaluated for 12 weeks in a randomized, double-blind, active-comparator study against naproxen in 220 subjects with moderate-severe osteoarthritis (OA) of the knee. As previously reported, both groups noted a significant reduction in the signs and symptoms of OA with no detectable differences in efficacy between the groups when the entire intent-to-treat population was considered. This post-hoc analysis compares the efficacy of flavocoxid to naproxen in different subsets of patients, specifically those related to age, gender, and disease severity as reported at baseline for individual response parameters. In the original randomized, double-blind study, 220 subjects were assigned to receive either flavocoxid (500 mg twice daily) or naproxen (500 mg twice daily) for 12 weeks. In this subgroup analysis, primary outcome measures including the Western Ontario and McMaster Universities OA index and subscales, timed walk, and secondary efficacy variables, including investigator global assessment for disease and global response to treatment, subject visual analog scale for discomfort, overall disease activity, global response to treatment, index joint tenderness and mobility, were evaluated for differing trends between the study groups. Subset analyses revealed some statistically significant differences and some notable trends in favor of the flavocoxid group. These trends became stronger the longer the subjects continued on therapy. These observations were specifically noted in older subjects (>60 years), males and in subjects with milder disease, particularly those with lower subject global assessment of disease activity and investigator global assessment for disease and faster walking times at baseline. Initial analysis of the entire intent-to-treat population revealed that flavocoxid was as effective as naproxen in managing the signs and symptoms of OA of the knee. Detailed analyses of subject subsets demonstrated distinct trends in favor of flavocoxid for specific groups of subjects.

  14. Entropy-based gene ranking without selection bias for the predictive classification of microarray data.

    PubMed

    Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe

    2003-11-06

    We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
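
    A simplified sketch of entropy-guided chunk elimination in RFE is given below (synthetic data, linear SVM); it illustrates the idea of dropping larger chunks when the weight distribution has low entropy and is not the authors' E-RFE implementation.

```python
# Simplified entropy-guided RFE: when most SVM weights are near zero (low
# entropy of the |weight| histogram), discard a larger chunk of low-weight genes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

def weight_entropy(w, bins=20):
    hist, _ = np.histogram(np.abs(w), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(bins)    # normalized to [0, 1]

def entropy_rfe(X, y, n_keep=20):
    features = np.arange(X.shape[1])
    while len(features) > n_keep:
        w = LinearSVC(dual=False, max_iter=5000).fit(X[:, features], y).coef_.ravel()
        h = weight_entropy(w)
        # low entropy -> many near-zero weights -> drop a bigger chunk at once
        chunk = max(1, int((1.0 - h) * 0.5 * (len(features) - n_keep)))
        order = np.argsort(np.abs(w))                  # ascending by importance
        features = features[order[chunk:]]
    return np.sort(features)

X, y = make_classification(n_samples=100, n_features=500, n_informative=10, random_state=0)
print(entropy_rfe(X, y))
```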

  15. Application of validation data for assessing spatial interpolation methods for 8-h ozone or other sparsely monitored constituents.

    PubMed

    Joseph, John; Sharif, Hatim O; Sunil, Thankam; Alamgir, Hasanat

    2013-07-01

    The adverse health effects of high concentrations of ground-level ozone are well-known, but estimating exposure is difficult due to the sparseness of urban monitoring networks. This sparseness discourages the reservation of a portion of the monitoring stations for validation of interpolation techniques precisely when the risk of overfitting is greatest. In this study, we test a variety of simple spatial interpolation techniques for 8-h ozone with thousands of randomly selected subsets of data from two urban areas with monitoring stations sufficiently numerous to allow for true validation. Results indicate that ordinary kriging with only the range parameter calibrated in an exponential variogram is the generally superior method, and yields reliable confidence intervals. Sparse data sets may contain sufficient information for calibration of the range parameter even if the Moran I p-value is close to unity. R script is made available to apply the methodology to other sparsely monitored constituents. Copyright © 2013 Elsevier Ltd. All rights reserved.
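
    Ordinary kriging with an exponential variogram in which only the range is tuned can be sketched compactly; below, the sill is fixed at the sample variance, the nugget at zero, and the range is chosen by leave-one-out cross-validation on synthetic station data (not the ozone network).

```python
# Ordinary kriging with an exponential variogram; only the range is calibrated.
import numpy as np

def variogram(h, rng_param, sill):
    return sill * (1.0 - np.exp(-h / rng_param))       # exponential model, zero nugget

def krige(xy_obs, z_obs, xy_new, rng_param, sill):
    d_obs = np.linalg.norm(xy_obs[:, None] - xy_obs[None, :], axis=-1)
    n = len(z_obs)
    A = np.ones((n + 1, n + 1))                         # ordinary kriging system
    A[:n, :n] = variogram(d_obs, rng_param, sill)
    A[-1, -1] = 0.0
    d_new = np.linalg.norm(xy_obs - xy_new, axis=-1)
    b = np.append(variogram(d_new, rng_param, sill), 1.0)
    w = np.linalg.solve(A, b)[:n]                       # kriging weights (sum to 1)
    return float(w @ z_obs)

def loo_rmse(xy, z, rng_param):
    sill = z.var()
    errs = [krige(np.delete(xy, i, 0), np.delete(z, i), xy[i], rng_param, sill) - z[i]
            for i in range(len(z))]
    return float(np.sqrt(np.mean(np.square(errs))))

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(25, 2))                  # station coordinates (arbitrary units)
z = np.sin(xy[:, 0] / 20) + 0.1 * rng.normal(size=25)   # synthetic stand-in for ozone values
ranges = [5, 10, 20, 40, 80]
best = min(ranges, key=lambda r: loo_rmse(xy, z, r))
print("calibrated range:", best, "LOO RMSE:", round(loo_rmse(xy, z, best), 3))
```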

  16. Evolution of households’ responses to the groundwater arsenic crisis in Bangladesh: information on environmental health risks can have increasing behavioral impact over time

    PubMed Central

    Balasubramanya, Soumya; Pfaff, Alexander; Bennear, Lori; Tarozzi, Alessandro; Ahmed, Kazi Matin; Schoenfeld, Amy; van Geen, Alexander

    2014-01-01

    A national campaign of well testing through 2003 enabled households in rural Bangladesh to switch, at least for drinking, from high-arsenic wells to neighboring lower-arsenic wells. We study the well-switching dynamics over time by re-interviewing, in 2008, a randomly selected subset of households in the Araihazar region who had been interviewed in 2005. Contrary to concerns that the impact of arsenic information on switching behavior would erode over time, we find that not only was 2003–2005 switching highly persistent but also new switching by 2008 doubled the share of households at unsafe wells who had switched. The passage of time also had a cost: 22% of households did not recall test results by 2008. The loss of arsenic knowledge led to staying at unsafe wells and switching from safe wells. Our results support ongoing well testing for arsenic to reinforce this beneficial information. PMID:25383015

  17. Adjuvant intraperitoneal chromic phosphate therapy for women with apparent early ovarian carcinoma who have not undergone comprehensive surgical staging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soper, J.T.; Berchuck, A.; Clarke-Pearson, D.L.

    1991-08-15

    Forty-nine women with apparent Stage 1 and 2 ovarian carcinoma received intraperitoneal phosphate 32 as the only adjuvant therapy after primary surgery. In addition to bilateral salpingo-oophorectomy, 40 (82%) had analysis of peritoneal cytology, and 35 (71%) underwent omentectomy. Random peritoneal biopsies and retroperitoneal lymph node sampling were not done in any of these patients. The overall and disease-free survival rates were 86% and 75%, respectively, with no significant differences by stage, histologic grade, histologic type, or low-risk versus high-risk subsets recognized in patients who received comprehensive surgical staging. Seven (58%) of 12 patients had lymph node metastasis as the first site of recurrence, including two of three with late recurrences. Significant morbidity related to intraperitoneal chromic phosphate (32P) occurred in one (2%) woman. These results emphasize the need for comprehensive surgical staging of women with apparent early ovarian carcinoma to aid in the selection of appropriate initial adjuvant therapy.

  18. Using the gini coefficient to measure the chemical diversity of small-molecule libraries.

    PubMed

    Weidlich, Iwona E; Filippov, Igor V

    2016-08-15

    Modern databases of small organic molecules contain tens of millions of structures. The size of theoretically available chemistry is even larger. However, despite the large amount of chemical information, the "big data" moment for chemistry has not yet provided the corresponding payoff of cheaper computer-predicted medicine or robust machine-learning models for the determination of efficacy and toxicity. Here, we present a study of the diversity of chemical datasets using a measure that is commonly used in socioeconomic studies. We demonstrate the use of this diversity measure on several datasets that were constructed to contain various congeneric subsets of molecules as well as randomly selected molecules. We also apply our method to a number of well-known databases that are frequently used for structure-activity relationship modeling. Our results show the poor diversity of the common sources of potential lead compounds compared to actual known drugs. © 2016 Wiley Periodicals, Inc.
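
    The diversity measure borrowed from socioeconomics is the Gini coefficient of how compounds are distributed over structural classes. A minimal sketch of that computation follows; the per-scaffold counts are hypothetical, and the binning of a real library (for example, by scaffold or cluster) is an assumption left to the reader.

      import numpy as np

      def gini(counts):
          # Gini coefficient of a non-negative count distribution:
          # 0 = perfectly even (maximally diverse), 1 = all mass in one bin.
          x = np.sort(np.asarray(counts, dtype=float))
          if x.sum() == 0:
              return 0.0
          n = len(x)
          total = x.sum()
          # Standard rank-based formula on the sorted values.
          return float((2.0 * np.sum(np.arange(1, n + 1) * x) - (n + 1) * total) / (n * total))

      # Example: compounds per scaffold (hypothetical counts).
      print(gini([1, 1, 1, 1]))    # 0.0  -> evenly spread library
      print(gini([97, 1, 1, 1]))   # 0.72 -> most compounds share one scaffold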

  19. Graph Learning in Knowledge Bases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldberg, Sean; Wang, Daisy Zhe

    The amount of text data has been growing exponentially in recent years, giving rise to automatic information extraction methods that store text annotations in a database. The current state-of-the-art structured prediction methods, however, are likely to contain errors and it’s important to be able to manage the overall uncertainty of the database. On the other hand, the advent of crowdsourcing has enabled humans to aid machine algorithms at scale. As part of this project we introduced pi-CASTLE, a system that optimizes and integrates human and machine computing as applied to a complex structured prediction problem involving conditional random fields (CRFs). We proposed strategies grounded in information theory to select a token subset, formulate questions for the crowd to label, and integrate these labelings back into the database using a method of constrained inference. On both a text segmentation task over academic citations and a named entity recognition task over tweets we showed an order of magnitude improvement in accuracy gain over baseline methods.
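
    One common information-theoretic way to pick which tokens to send to the crowd is to rank them by the entropy of their per-token label marginals from the CRF. The sketch below illustrates only that generic idea; the marginals, label set, and budget are hypothetical, and this is not pi-CASTLE's actual selection objective or its constrained-inference step.

      import numpy as np

      def token_entropy(marginals):
          # Shannon entropy (nats) of each token's label marginal distribution.
          p = np.clip(np.asarray(marginals, dtype=float), 1e-12, 1.0)
          return -(p * np.log(p)).sum(axis=1)

      def select_tokens_for_crowd(marginals, budget):
          # Indices of the `budget` most uncertain tokens.
          return np.argsort(-token_entropy(marginals))[:budget]

      # Example: CRF marginals over 3 labels for 5 tokens (hypothetical).
      marginals = np.array([
          [0.98, 0.01, 0.01],
          [0.40, 0.35, 0.25],   # very uncertain -> good crowd question
          [0.70, 0.20, 0.10],
          [0.33, 0.33, 0.34],   # very uncertain
          [0.90, 0.05, 0.05],
      ])
      print(select_tokens_for_crowd(marginals, budget=2))  # -> [3 1]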

  20. 47 CFR 1.1602 - Designation for random selection.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Designation for random selection. 1.1602 Section 1.1602 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1602 Designation for random selection...

  1. 47 CFR 1.1602 - Designation for random selection.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 47 Telecommunication 1 2011-10-01 2011-10-01 false Designation for random selection. 1.1602 Section 1.1602 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1602 Designation for random selection...

  2. [The study on the changes of serum IL- 6, TNF-α and peripheral blood T lymphocyte subsets in the pregnant women during perinatal period].

    PubMed

    Li, Juan

    2011-03-01

    To study the pattern of change of serum IL-6, TNF-α and peripheral blood T lymphocyte subsets in pregnant women during the perinatal period. 100 pregnant women in our hospital from November 2009 to October 2010 were selected as research subjects, and serum IL-6, TNF-α and peripheral blood T lymphocyte subsets were analyzed and compared before and at the onset of labor and on the first and third day after delivery. According to the study, serum IL-6 and TNF-α at labor onset were higher than before labor onset and on the first and third day after delivery, while CD3(+), CD4(+), CD8(+) and CD4/CD8 first decreased and then increased (all P < 0.05, significant differences). The changes of serum IL-6, TNF-α and peripheral blood T lymphocyte subsets in pregnant women during the perinatal period follow a regular pattern and merit clinical attention.

  3. MODIS Interactive Subsetting Tool (MIST)

    NASA Astrophysics Data System (ADS)

    McAllister, M.; Duerr, R.; Haran, T.; Khalsa, S. S.; Miller, D.

    2008-12-01

    In response to requests from the user community, NSIDC has teamed with the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) and the Moderate Resolution Data Center (MrDC) to provide time series subsets of satellite data covering stations in the Greenland Climate Network (GC-NET) and the International Arctic Systems for Observing the Atmosphere (IASOA) network. To serve these data, NSIDC created the MODIS Interactive Subsetting Tool (MIST). MIST works with 7 km by 7 km subset time series of certain Version 5 (V005) MODIS products over GC-Net and IASOA stations. User-selected data are delivered in a text Comma Separated Value (CSV) file format. MIST also provides online analysis capabilities that include generating time series and scatter plots. Currently, MIST is a Beta prototype and NSIDC intends that user requests will drive future development of the tool. The intent of this poster is to introduce MIST to the MODIS data user audience and illustrate some of the online analysis capabilities.

  4. Fish swarm intelligent to optimize real time monitoring of chips drying using machine vision

    NASA Astrophysics Data System (ADS)

    Hendrawan, Y.; Hawa, L. C.; Damayanti, R.

    2018-03-01

    This study applied a machine vision-based chip-drying monitoring system able to optimise the drying process of cassava chips. The objective of this study is to propose a fish swarm intelligence (FSI) optimization algorithm to find the most significant set of image features for predicting the water content of cassava chips during the drying process using an artificial neural network (ANN) model. Feature selection entails choosing the feature subset that maximizes the prediction accuracy of the ANN. Multi-Objective Optimization (MOO) was used in this study, consisting of prediction-accuracy maximization and feature-subset size minimization. The results showed that the best feature subset comprised grey mean, L(Lab) mean, a(Lab) energy, red entropy, hue contrast, and grey homogeneity. The best feature subset was tested successfully in the ANN model to describe the relationship between image features and the water content of cassava chips during drying, with an R2 between measured and predicted data of 0.9.
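
    The wrapper in such a study scores a candidate feature subset by how well an ANN predicts water content from it. Below is a minimal sketch of that scoring step using a small scikit-learn network and held-out R2; the synthetic feature matrix, target, and network size are illustrative stand-ins, and the FSI search over subsets is not shown.

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import r2_score

      def subset_r2(features, target, cols):
          # Fit a small ANN on a candidate feature subset and return held-out R2.
          X_tr, X_te, y_tr, y_te = train_test_split(
              features[:, cols], target, test_size=0.3, random_state=0)
          model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
          model.fit(X_tr, y_tr)
          return r2_score(y_te, model.predict(X_te))

      # Synthetic stand-ins for image features and moisture content.
      rng = np.random.default_rng(0)
      features = rng.normal(size=(150, 12))
      target = 2.0 * features[:, 0] - 1.5 * features[:, 3] + rng.normal(0, 0.1, 150)
      print(round(subset_r2(features, target, cols=[0, 3]), 3))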

  5. The transcription factor NRSF contributes to epileptogenesis by selective repression of a subset of target genes

    PubMed Central

    McClelland, Shawn; Brennan, Gary P; Dubé, Celine; Rajpara, Seeta; Iyer, Shruti; Richichi, Cristina; Bernard, Christophe; Baram, Tallie Z

    2014-01-01

    The mechanisms generating epileptic neuronal networks following insults such as severe seizures are unknown. We have previously shown that interfering with the function of the neuron-restrictive silencer factor (NRSF/REST), an important transcription factor that influences neuronal phenotype, attenuated development of this disorder. In this study, we found that epilepsy-provoking seizures increased the low NRSF levels in mature hippocampus several fold yet surprisingly, provoked repression of only a subset (∼10%) of potential NRSF target genes. Accordingly, the repressed gene-set was rescued when NRSF binding to chromatin was blocked. Unexpectedly, genes selectively repressed by NRSF had mid-range binding frequencies to the repressor, a property that rendered them sensitive to moderate fluctuations of NRSF levels. Genes selectively regulated by NRSF during epileptogenesis coded for ion channels, receptors, and other crucial contributors to neuronal function. Thus, dynamic, selective regulation of NRSF target genes may play a role in influencing neuronal properties in pathological and physiological contexts. DOI: http://dx.doi.org/10.7554/eLife.01267.001 PMID:25117540

  6. Dense mesh sampling for video-based facial animation

    NASA Astrophysics Data System (ADS)

    Peszor, Damian; Wojciechowska, Marzena

    2016-06-01

    The paper describes an approach for selecting feature points on a three-dimensional triangle mesh obtained using various techniques from several video footages. This approach has a dual purpose. First, it minimizes the data stored for facial animation: instead of storing the position of each vertex in each frame, one can store only a small subset of vertices per frame and calculate the positions of the others from that subset. Second, it selects feature points that can be used for anthropometry-based retargeting of recorded mimicry to another model, with a sampling density beyond that achievable with marker-based performance capture techniques. The developed approach was successfully tested on artificial models, models constructed using a structured light scanner, and models constructed from video footage using stereophotogrammetry.

  7. Fuzzy Subspace Clustering

    NASA Astrophysics Data System (ADS)

    Borgelt, Christian

    In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea of transferring an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).

  8. Hypergraph Based Feature Selection Technique for Medical Diagnosis.

    PubMed

    Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar

    2016-11-01

    The impact of the internet and information systems across various domains has resulted in the substantial generation of multidimensional datasets. Data mining and knowledge discovery techniques that extract the information contained in these multidimensional datasets play a significant role in exploiting their full benefit. The presence of a large number of features in high-dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection techniques are commonly used to build robust machine learning models by selecting a subset of relevant features that retains the maximal information content of the original dataset. In this paper, a novel Rough Set based K-Helly feature selection technique (RSKHT), which hybridizes Rough Set Theory (RST) and the K-Helly property of hypergraph representation, is designed to identify the optimal feature subset or reduct for medical diagnostic applications. Experiments carried out using medical datasets from the UCI repository demonstrate the superiority of RSKHT over other feature selection techniques with respect to reduct size, classification accuracy and time complexity. The performance of RSKHT was validated using the WEKA tool, showing that RSKHT is computationally attractive and flexible over massive datasets.

  9. Terahertz imaging with compressed sensing and phase retrieval.

    PubMed

    Chan, Wai Lam; Moravec, Matthew L; Baraniuk, Richard G; Mittleman, Daniel M

    2008-05-01

    We describe a novel, high-speed pulsed terahertz (THz) Fourier imaging system based on compressed sensing (CS), a new signal processing theory, which allows image reconstruction with fewer samples than traditionally required. Using CS, we successfully reconstruct a 64 x 64 image of an object with pixel size 1.4 mm using a randomly chosen subset of the 4096 pixels, which defines the image in the Fourier plane, and observe improved reconstruction quality when we apply phase correction. For our chosen image, only about 12% of the pixels are required for reassembling the image. In combination with phase retrieval, our system has the capability to reconstruct images with only a small subset of Fourier amplitude measurements and thus has potential application in THz imaging with cw sources.
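
    The generic ingredient here is recovering an image from a small random subset of Fourier-plane samples by exploiting sparsity. The sketch below reconstructs a synthetic sparse 64 x 64 image from roughly 12% of its Fourier coefficients with plain iterative soft-thresholding (ISTA); it illustrates only that principle, not the authors' THz hardware, measurement model, or phase-retrieval step, and the regularization weight and iteration count are arbitrary choices.

      import numpy as np

      gen = np.random.default_rng(0)

      # Hypothetical 64 x 64 sparse test image.
      n = 64
      x_true = np.zeros((n, n))
      x_true[gen.integers(0, n, 60), gen.integers(0, n, 60)] = 1.0

      # Keep a random ~12% subset of Fourier-plane samples.
      mask = gen.random((n, n)) < 0.12
      y = np.fft.fft2(x_true) * mask

      def A(x):
          # Forward operator: masked 2-D Fourier transform.
          return np.fft.fft2(x) * mask

      def At(k):
          # Adjoint of A (real part, since we seek a real-valued image).
          return np.real(np.fft.ifft2(k * mask)) * n * n

      # ISTA: x <- soft_threshold(x + step * At(y - A(x)), lam * step)
      x = np.zeros((n, n))
      step, lam = 1.0 / (n * n), 5.0
      for _ in range(200):
          x = x + step * At(y - A(x))
          x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)

      print("recovered support overlap:",
            int(np.sum((x > 0.1) & (x_true > 0))), "of", int(x_true.sum()))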

  10. The proposal of architecture for chemical splitting to optimize QSAR models for aquatic toxicity.

    PubMed

    Colombo, Andrea; Benfenati, Emilio; Karelson, Mati; Maran, Uko

    2008-06-01

    One of the challenges in the field of quantitative structure-activity relationship (QSAR) analysis is the correct assignment of a chemical compound to an appropriate model for the prediction of activity. Thus, in previous studies, compounds have been divided into distinct groups according to their mode of action or chemical class. In the current study, theoretical molecular descriptors were used to divide 568 organic substances, with toxicity measured as the 96-h median lethal concentration for the fathead minnow (Pimephales promelas), into subsets. Simple constitutional descriptors, such as the number of aliphatic and aromatic rings, and a quantum chemical descriptor, the maximum bond order of a carbon atom, divide the compounds into nine subsets. For each subset of compounds, automatic forward selection of descriptors was applied to construct QSAR models. Significant correlations were achieved for each subset of chemicals, and all models were validated with the leave-one-out internal validation procedure (R(2)(cv) approximately 0.80). The results encourage consideration of this alternative approach to predicting toxicity with QSAR subset models, without direct reference to the mechanism of toxic action or the traditional chemical classification.

  11. 47 CFR 1.1603 - Conduct of random selection.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Conduct of random selection. 1.1603 Section 1.1603 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1603 Conduct of random selection. The...

  12. 47 CFR 1.1603 - Conduct of random selection.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 47 Telecommunication 1 2011-10-01 2011-10-01 false Conduct of random selection. 1.1603 Section 1.1603 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1603 Conduct of random selection. The...

  13. Toward optimal feature and time segment selection by divergence method for EEG signals classification.

    PubMed

    Wang, Jie; Feng, Zuren; Lu, Na; Luo, Jing

    2018-06-01

    Feature selection plays an important role in the field of EEG signals based motor imagery pattern classification. It is a process that aims to select an optimal feature subset from the original set. Two significant advantages involved are: lowering the computational burden so as to speed up the learning procedure and removing redundant and irrelevant features so as to improve the classification performance. Therefore, feature selection is widely employed in the classification of EEG signals in practical brain-computer interface systems. In this paper, we present a novel statistical model to select the optimal feature subset based on the Kullback-Leibler divergence measure, and automatically select the optimal subject-specific time segment. The proposed method comprises four successive stages: broad frequency-band filtering and common spatial pattern enhancement as preprocessing, feature extraction by an autoregressive model and log-variance, Kullback-Leibler divergence based optimal feature and time segment selection, and linear discriminant analysis classification. More importantly, this paper provides a potential framework for combining other feature extraction models and classification algorithms with the proposed method for EEG signals classification. Experiments on single-trial EEG signals from two public competition datasets not only demonstrate that the proposed method is effective in selecting discriminative features and time segment, but also show that the proposed method yields relatively better classification results in comparison with other competitive methods. Copyright © 2018 Elsevier Ltd. All rights reserved.
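
    The core of divergence-based selection is scoring each feature by how far apart its class-conditional distributions are. A minimal sketch under a Gaussian assumption with a symmetrised KL score follows; this simplification, and the synthetic band-power-like features, are illustrative and do not reproduce the paper's exact statistical model or its time-segment search.

      import numpy as np

      def gaussian_kl(mu0, var0, mu1, var1):
          # KL( N(mu0, var0) || N(mu1, var1) ) for univariate Gaussians.
          return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

      def rank_features_by_kl(X, y):
          # Symmetrised per-feature KL divergence between the two classes in y.
          X0, X1 = X[y == 0], X[y == 1]
          mu0, var0 = X0.mean(axis=0), X0.var(axis=0) + 1e-12
          mu1, var1 = X1.mean(axis=0), X1.var(axis=0) + 1e-12
          score = gaussian_kl(mu0, var0, mu1, var1) + gaussian_kl(mu1, var1, mu0, var0)
          return np.argsort(-score), score

      # Example with synthetic "band-power" features: feature 0 is discriminative.
      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 5))
      y = rng.integers(0, 2, 200)
      X[y == 1, 0] += 2.0
      order, score = rank_features_by_kl(X, y)
      print(order[:3])   # feature 0 should come first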

  14. Enantioselectivity in Candida antarctica lipase B: A molecular dynamics study

    PubMed Central

    Raza, Sami; Fransson, Linda; Hult, Karl

    2001-01-01

    A major problem in predicting the enantioselectivity of an enzyme toward substrate molecules is that even high selectivity toward one substrate enantiomer over the other corresponds to a very small difference in free energy. However, total free energies in enzyme-substrate systems are very large and fluctuate significantly because of general protein motion. Candida antarctica lipase B (CALB), a serine hydrolase, displays enantioselectivity toward secondary alcohols. Here, we present a modeling study where the aim has been to develop a molecular dynamics-based methodology for the prediction of enantioselectivity in CALB. The substrates modeled (seven in total) were 3-methyl-2-butanol with various aliphatic carboxylic acids and also 2-butanol, as well as 3,3-dimethyl-2-butanol with octanoic acid. The tetrahedral reaction intermediate was used as a model of the transition state. Investigative analyses were performed on ensembles of nonminimized structures and focused on the potential energies of a number of subsets within the modeled systems to determine which specific regions are important for the prediction of enantioselectivity. One category of subset was based on atoms that make up the core structural elements of the transition state. We considered that a more favorable energetic conformation of such a subset should relate to a greater likelihood for catalysis to occur, thus reflecting higher selectivity. The results of this study conveyed that the use of this type of subset was viable for the analysis of structural ensembles and yielded good predictions of enantioselectivity. PMID:11266619

  15. Creating a non-linear total sediment load formula using polynomial best subset regression model

    NASA Astrophysics Data System (ADS)

    Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali

    2016-08-01

    The aim of this study is to derive a new total sediment load formula that is more accurate and has fewer application constraints than the well-known formulae in the literature. The five best-known stream-power-concept sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called Polynomial Best Subset Regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and select the best subset. All input variables, together with their second and third powers, are included in the regression to test the possible relation between the explanatory variables and the dependent variable. While selecting the best subset, a multistep approach is used that depends on significance values and also on the degree of multicollinearity among inputs. The new formula is compared to the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab subsets within this holdout data. Different goodness-of-fit statistics are used as they represent different perspectives on model accuracy. After the detailed comparisons, we identified the most accurate equation that is also applicable to both flume and river data. In particular, on the field dataset the prediction performance of the proposed formula outperformed the benchmark formulations.
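
    The PBSR idea, stripped to its essentials, is to expand every input with its second and third powers and then search subsets of the expanded pool. The sketch below does this exhaustively for small subsets and ranks them by adjusted R2; the selection criterion is simplified relative to the paper's multistep significance and multicollinearity screening, and the data are synthetic.

      import itertools
      import numpy as np

      def expand_powers(X):
          # Append squares and cubes of every column.
          return np.hstack([X, X ** 2, X ** 3])

      def adjusted_r2(y, yhat, p):
          n = len(y)
          ss_res = np.sum((y - yhat) ** 2)
          ss_tot = np.sum((y - y.mean()) ** 2)
          r2 = 1.0 - ss_res / ss_tot
          return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

      def best_subset(X, y, max_size=3):
          # Exhaustively fit OLS on every subset up to max_size, keep the best adjusted R2.
          Xp = expand_powers(X)
          best = (-np.inf, None)
          for k in range(1, max_size + 1):
              for cols in itertools.combinations(range(Xp.shape[1]), k):
                  A = np.column_stack([np.ones(len(y)), Xp[:, cols]])
                  beta, *_ = np.linalg.lstsq(A, y, rcond=None)
                  score = adjusted_r2(y, A @ beta, p=k)
                  if score > best[0]:
                      best = (score, cols)
          return best

      # Example: y depends on x0 and x1**2 (synthetic data).
      rng = np.random.default_rng(0)
      X = rng.uniform(0.1, 2.0, size=(120, 3))
      y = 1.5 * X[:, 0] + 0.8 * X[:, 1] ** 2 + rng.normal(0, 0.05, 120)
      print(best_subset(X, y, max_size=2))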

  16. Estimation of reference intervals from small samples: an example using canine plasma creatinine.

    PubMed

    Geffré, A; Braun, J P; Trumel, C; Concordet, D

    2009-12-01

    According to international recommendations, reference intervals should be determined from at least 120 reference individuals, a number that is often impossible to achieve in veterinary clinical pathology, especially for wild animals. When only a small number of reference subjects is available, the possible bias cannot be known and the normality of the distribution cannot be evaluated. A comparison of reference intervals estimated by different methods could be helpful. The purpose of this study was to compare reference limits determined from a large set of canine plasma creatinine reference values, and large subsets of these data, with estimates obtained from small samples selected randomly. Twenty sets each of 120 and 27 samples were randomly selected from a set of 1439 plasma creatinine results obtained from healthy dogs in another study. Reference intervals for the whole sample and for the large samples were determined by a nonparametric method. The estimated reference limits for the small samples were minimum and maximum, mean +/- 2 SD of native and Box-Cox-transformed values, 2.5th and 97.5th percentiles by a robust method on native and Box-Cox-transformed values, and estimates from diagrams of cumulative distribution functions. The whole sample had a heavily skewed distribution, which approached Gaussian after Box-Cox transformation. The reference limits estimated from small samples were highly variable. The closest estimates to the 1439-result reference interval for 27-result subsamples were obtained by both parametric and robust methods after Box-Cox transformation but were grossly erroneous in some cases. For small samples, it is recommended that all values be reported graphically in a dot plot or histogram and that estimates of the reference limits be compared using different methods.
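
    Two of the estimators compared in the study are easy to reproduce: the nonparametric 2.5th/97.5th percentiles and mean +/- 2 SD on Box-Cox-transformed values, back-transformed to the original scale. A minimal sketch follows; the simulated skewed sample stands in for the canine creatinine values, and the robust and graphical estimators are not shown.

      import numpy as np
      from scipy import stats
      from scipy.special import inv_boxcox

      def nonparametric_ri(x):
          # 2.5th and 97.5th percentiles of the observed values.
          return np.percentile(x, [2.5, 97.5])

      def boxcox_parametric_ri(x):
          # Mean +/- 2 SD on Box-Cox-transformed values, then back-transformed.
          z, lam = stats.boxcox(np.asarray(x, dtype=float))   # values must be > 0
          lo = z.mean() - 2 * z.std(ddof=1)
          hi = z.mean() + 2 * z.std(ddof=1)
          return inv_boxcox(np.array([lo, hi]), lam)

      # Example: a skewed "creatinine-like" sample of 27 values (simulated).
      rng = np.random.default_rng(3)
      sample = rng.lognormal(mean=4.5, sigma=0.25, size=27)
      print(nonparametric_ri(sample))
      print(boxcox_parametric_ri(sample))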

  17. Antifibrinolytic agents and desmopressin as hemostatic agents in cardiac surgery.

    PubMed

    Erstad, B L

    2001-09-01

    To review the use of systemic hemostatic medications for reducing bleeding and transfusion requirements with cardiac surgery. Articles were obtained through computerized searches involving MEDLINE (from 1966 to September 2000). Additionally, several textbooks containing information on the diagnosis and management of bleeding associated with cardiac surgery were reviewed. The bibliographies of retrieved publications and textbooks were reviewed for additional references. Due to the large number of randomized investigations involving systemic hemostatic medications for reducing bleeding associated with cardiac surgery, the article selection process focused on recent randomized controlled trials, metaanalyses and pharmacoeconomic evaluations. The primary outcomes extracted from the literature were blood loss and associated transfusion requirements, although other outcome measures such as mortality were extracted when available. Although the majority of investigations for reducing cardiac bleeding and transfusion requirements have involved aprotinin, evidence from recent meta-analyses and randomized trials indicates that the synthetic antifibrinolytic agents, aminocaproic acid and tranexamic acid, have similar clinical efficacy. Additionally, aminocaproic acid (and to a lesser extent tranexamic acid) is much less costly. More comparative information on hemostatic agents is needed relative to other outcomes (e.g., reoperation rates, myocardial infarction, stroke). There is insufficient evidence to recommend the use of desmopressin for reducing bleeding and transfusion requirements in cardiac surgery, although certain subsets of patients may benefit from its use. Of the medications that have been used to reduce bleeding and transfusion requirements with cardiac surgery, the antifibrinolytic agents have the best evidence supporting their use. Aminocaproic acid is the least costly therapy based on medication costs and transfusion requirements.

  18. 77 FR 43879 - Self-Regulatory Organizations; NYSE Arca, Inc.; Notice of Designation of a Longer Period for...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-07-26

    ... Proposed Rule Change Amending NYSE Arca Equities Rule 7.31(h) To Add a PL Select Order Type July 20, 2012...(h) to add a PL Select Order type. The proposed rule change was published for comment in the Federal... security at a specified, undisplayed price. The PL Select Order would be a subset of the PL Order that...

  19. Gene selection for microarray data classification via subspace learning and manifold regularization.

    PubMed

    Tang, Chang; Cao, Lijuan; Zheng, Xiao; Wang, Minhui

    2017-12-19

    With the rapid development of DNA microarray technology, a large amount of genomic data has been generated. Classification of these microarray data is a challenging task since gene expression data often contain thousands of genes but only a small number of samples. In this paper, an effective gene selection method is proposed to select the best subset of genes for microarray data, with irrelevant and redundant genes removed. Compared with the original data, the selected gene subset can benefit the classification task. We formulate the gene selection task as a manifold regularized subspace learning problem. In detail, a projection matrix is used to project the original high-dimensional microarray data into a lower-dimensional subspace, with the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of the original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix can serve as an importance indicator of the different genes. An iterative update algorithm is developed for solving the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method performs better than other state-of-the-art methods in terms of microarray data classification.

  20. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization.

    PubMed

    Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein

    2016-06-01

    This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter stage using the Fisher criterion, which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach based on cellular learning automata (CLA) optimized with the ant colony method (ACO) is used to find the set of features that improves the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The features selected in the last phase are evaluated using the ROC curve, and the most effective yet smallest feature subset is determined. The classifiers evaluated in the proposed framework are K-nearest neighbor, support vector machine, and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
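
    The primary filter stage ranks genes with the Fisher criterion before the CLA/ACO wrapper is run. A minimal sketch of that filter for a two-class problem is given below; the expression matrix is synthetic, the number of genes kept is arbitrary, and the wrapper stage itself is not shown.

      import numpy as np

      def fisher_scores(X, y):
          # Fisher criterion per gene for two classes: (mu0 - mu1)^2 / (var0 + var1).
          X0, X1 = X[y == 0], X[y == 1]
          num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
          den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
          return num / den

      def primary_filter(X, y, n_keep=200):
          # Keep the n_keep top-ranked genes before the wrapper stage.
          return np.argsort(-fisher_scores(X, y))[:n_keep]

      # Example: 500 genes, gene 7 strongly differential (synthetic).
      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 500))
      y = rng.integers(0, 2, 60)
      X[y == 1, 7] += 3.0
      print(primary_filter(X, y, n_keep=5))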

  1. Nest-site selection and nest success of an Arctic-breeding passerine, Smith's Longspur, in a changing climate

    USGS Publications Warehouse

    McFarland, Heather R.; Kendall, Steve J.; Powell, Abby

    2017-01-01

    Despite changes in shrub cover and weather patterns associated with climate change in the Arctic, little is known about the breeding requirements of most passerines tied to northern regions. We investigated the nesting biology and nest habitat characteristics of Smith's Longspurs (Calcarius pictus) in 2 study areas in the Brooks Range of Alaska, USA. First, we examined variation in nesting phenology in relation to local temperatures. We then characterized nesting habitat and analyzed nest-site selection for a subset of nests (n = 86) in comparison with paired random points. Finally, we estimated the daily survival rate of 257 nests found in 2007–2013 with respect to both habitat characteristics and weather variables. Nest initiation was delayed in years with snow events, heavy rain, and freezing temperatures early in the breeding season. Nests were typically found in open, low-shrub tundra, and never among tall shrubs (mean shrub height at nests = 26.8 ± 6.7 cm). We observed weak nest-site selection patterns. Considering the similarity between nest sites and paired random points, coupled with the unique social mating system of Smith's Longspurs, we suggest that habitat selection may occur at the neighborhood scale and not at the nest-site scale. The best approximating model explaining nest survival suggested a positive relationship with the numbers of days above 21°C that an individual nest experienced; there was little support for models containing habitat variables. The daily nest survival rate was high (0.972–0.982) compared with that of most passerines in forested or grassland habitats, but similar to that of passerines nesting on tundra. Considering their high nesting success and ability to delay nest initiation during inclement weather, Smith's Longspurs may be resilient to predicted changes in weather regimes on the breeding grounds. Thus, the greatest threat to breeding Smith's Longspurs associated with climate change may be the loss of low-shrub habitat types, which could significantly change the characteristics of breeding areas.

  2. A meta-analysis of the association between gestational diabetes mellitus and chronic hepatitis B infection during pregnancy

    PubMed Central

    2014-01-01

    Background Chronic hepatitis B (CHB) infection during pregnancy is associated with insulin resistance. A meta-analytic technique was used to quantify the evidence of an association between CHB infection and the risk of gestational diabetes (GDM) among pregnant women. Methods We searched PubMed for studies up to September 5th 2013. Additional studies were obtained from other sources. We selected studies that used a cohort-study design and reported a quantitative association between CHB infection during pregnancy and risk of GDM. A total of 280 articles were identified, of which fourteen publications involving 439,514 subjects met the inclusion criteria. A sequential algorithm was used to reduce between-study heterogeneity, and further meta-analysis was conducted using a random-effects model. Results Ten out of the fourteen studies were highly homogeneous, indicating a pooled adjusted odds ratio of 1.11 (95% confidence interval 0.96-1.28) for the association between CHB infection during pregnancy and the risk of developing GDM. The heterogeneity of the additional four studies may be due to selection bias or possible aetiological differences for special subsets of pregnant women. Conclusions These results indicate that CHB infection during pregnancy is not associated with an increased risk of developing GDM among pregnant women except those from Iran. PMID:24618120
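
    The pooled estimate quoted above comes from a random-effects model over study-level (adjusted) odds ratios. As a generic illustration, the sketch below implements standard DerSimonian-Laird pooling of log odds ratios; the four study inputs are hypothetical, not the values extracted in this meta-analysis.

      import numpy as np

      def random_effects_pool(log_or, se):
          # DerSimonian-Laird random-effects pooling of log odds ratios.
          log_or, se = np.asarray(log_or, float), np.asarray(se, float)
          w = 1.0 / se ** 2                              # fixed-effect weights
          fixed = np.sum(w * log_or) / np.sum(w)
          q = np.sum(w * (log_or - fixed) ** 2)          # Cochran's Q
          df = len(log_or) - 1
          c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
          tau2 = max(0.0, (q - df) / c)                  # between-study variance
          w_star = 1.0 / (se ** 2 + tau2)
          pooled = np.sum(w_star * log_or) / np.sum(w_star)
          se_pooled = np.sqrt(1.0 / np.sum(w_star))
          ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
          return np.exp(pooled), tuple(np.exp(ci))

      # Example with four hypothetical study-level odds ratios and standard errors.
      log_ors = np.log([1.05, 1.20, 0.95, 1.15])
      ses = np.array([0.10, 0.15, 0.12, 0.20])
      print(random_effects_pool(log_ors, ses))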

  3. Design and Application of Drought Indexes in Highly Regulated Mediterranean Water Systems

    NASA Astrophysics Data System (ADS)

    Castelletti, A.; Zaniolo, M.; Giuliani, M.

    2017-12-01

    Costs of drought are progressively increasing due to the ongoing alteration of hydro-meteorological regimes induced by climate change. Although drought management is widely studied in the literature, most traditional drought indexes fail to detect critical events in highly regulated systems, which generally rely on ad-hoc formulations that cannot be generalized to different contexts. In this study, we contribute a novel framework for the design of a basin-customized drought index. This index is a surrogate of the state of the basin, computed by combining the available information about water in the system to reproduce a representative target variable for the drought condition of the basin (e.g., water deficit). To select the relevant variables and combinations thereof, we use an advanced feature extraction algorithm called the Wrapper for Quasi Equally Informative Subset Selection (W-QEISS). W-QEISS relies on a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables, and optimizing relevance and redundancy of the subset. The accuracy objective is evaluated through the calibration of an extreme learning machine of the water deficit for each candidate subset of variables, and the index is selected from the resulting solutions as a suitable compromise between accuracy, cardinality, relevance, and redundancy. The approach is tested on Lake Como, Italy, a regulated lake mainly operated for irrigation supply. In the absence of an institutional drought monitoring system, we constructed the combined index using all the hydrological variables from the existing monitoring system as well as common drought indicators at multiple time aggregations. The soil moisture deficit in the root zone, computed by a distributed-parameter water balance model of the agricultural districts, is used as the target variable. Numerical results show that our combined drought index successfully reproduces the deficit. The index provides valuable information for supporting appropriate drought management strategies, including the possibility of directly informing the lake operations about drought conditions and improving the overall reliability of the irrigation supply system.
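
    The wrapper accuracy in W-QEISS is obtained by calibrating an extreme learning machine (ELM) on each candidate variable subset. Below is a minimal sketch of an ELM regressor with a ridge readout and of scoring a subset by held-out R2; the hidden-layer size, split, and example data are illustrative assumptions, and the multi-objective search over subsets is not shown.

      import numpy as np

      class ELMRegressor:
          # Minimal extreme learning machine: random hidden layer + ridge readout.
          def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
              self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

          def fit(self, X, y):
              rng = np.random.default_rng(self.seed)
              self.W = rng.normal(size=(X.shape[1], self.n_hidden))
              self.b = rng.normal(size=self.n_hidden)
              H = np.tanh(X @ self.W + self.b)
              A = H.T @ H + self.ridge * np.eye(self.n_hidden)
              self.beta = np.linalg.solve(A, H.T @ y)
              return self

          def predict(self, X):
              return np.tanh(X @ self.W + self.b) @ self.beta

      def subset_accuracy(X, y, cols, test_frac=0.3, seed=0):
          # Wrapper score of a candidate variable subset: R2 on a held-out split.
          rng = np.random.default_rng(seed)
          idx = rng.permutation(len(y))
          n_test = int(test_frac * len(y))
          test, train = idx[:n_test], idx[n_test:]
          model = ELMRegressor().fit(X[np.ix_(train, cols)], y[train])
          pred = model.predict(X[np.ix_(test, cols)])
          ss_res = np.sum((y[test] - pred) ** 2)
          ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
          return 1.0 - ss_res / ss_tot

      # Example: score the subset {0, 3} on synthetic data (illustrative only).
      rng = np.random.default_rng(1)
      X = rng.normal(size=(300, 8))
      y = 0.7 * X[:, 0] - 1.2 * X[:, 3] + rng.normal(0, 0.1, 300)
      print(round(subset_accuracy(X, y, cols=[0, 3]), 3))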

  4. Frequency and distribution of incidental findings deemed appropriate for S modifier designation on low-dose CT in a lung cancer screening program.

    PubMed

    Reiter, Michael J; Nemesure, Allison; Madu, Ezemonye; Reagan, Lisa; Plank, April

    2018-06-01

    To describe the frequency, distribution and reporting patterns of incidental findings receiving the Lung-RADS S modifier on low-dose chest computed tomography (CT) among lung cancer screening participants. This retrospective investigation included 581 individuals who received baseline low-dose chest CT for lung cancer screening between October 2013 and June 2017 at a single center. Incidental findings resulting in assignment of Lung-RADS S modifier were recorded as were incidental abnormalities detailed within the body of the radiology report only. A subset of 60 randomly selected CTs was reviewed by a second (blinded) radiologist to evaluate inter-rater variability of Lung-RADS reporting. A total of 261 (45%) participants received the Lung-RADS S modifier on baseline CT with 369 incidental findings indicated as potentially clinically significant. Coronary artery calcification was most commonly reported, accounting for 182 of the 369 (49%) findings. An additional 141 incidentalomas of the same types as these 369 findings were described in reports but were not labelled with the S modifier. Therefore, as high as 69% (402 of 581) of participants could have received the S modifier if reporting was uniform. Inter-radiologist concordance of S modifier reporting in a subset of 60 participants was poor (42% agreement, kappa = 0.2). Incidental findings are commonly identified on chest CT for lung cancer screening, yet reporting of the S modifier within Lung-RADS is inconsistent. Specific guidelines are necessary to better define potentially clinically significant abnormalities and to improve reporting uniformity. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    PubMed

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
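
    The fitness evaluation in GA-EoC is the 10-fold cross-validated accuracy of the majority vote of the base classifiers selected by a candidate bit mask. A minimal sketch of that evaluation with scikit-learn is shown below; the classifier pool, mask, and dataset are illustrative, class labels are assumed to be non-negative integers, and the genetic search itself is not shown.

      import numpy as np
      from sklearn.base import clone
      from sklearn.model_selection import StratifiedKFold

      def ensemble_cv_score(candidates, mask, X, y, folds=10):
          # Fitness of a candidate combination (bit mask over a classifier pool):
          # cross-validated accuracy of the majority vote of the selected classifiers.
          chosen = [clf for clf, bit in zip(candidates, mask) if bit]
          if not chosen:
              return 0.0
          skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
          accs = []
          for tr, te in skf.split(X, y):
              preds = [clone(clf).fit(X[tr], y[tr]).predict(X[te]) for clf in chosen]
              votes = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, np.vstack(preds))
              accs.append(np.mean(votes == y[te]))
          return float(np.mean(accs))

      # Example pool (hypothetical): three scikit-learn classifiers and a bit mask.
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.naive_bayes import GaussianNB
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=200, random_state=0)
      pool = [DecisionTreeClassifier(random_state=0), GaussianNB(), KNeighborsClassifier()]
      print(ensemble_cv_score(pool, mask=[1, 0, 1], X=X, y=y))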

  6. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification

    PubMed Central

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers’ decisions into the ensemble’s output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911

  7. Automatic design of basin-specific drought indexes for highly regulated water systems

    NASA Astrophysics Data System (ADS)

    Zaniolo, Marta; Giuliani, Matteo; Castelletti, Andrea Francesco; Pulido-Velazquez, Manuel

    2018-04-01

    Socio-economic costs of drought are progressively increasing worldwide due to undergoing alterations of hydro-meteorological regimes induced by climate change. Although drought management is largely studied in the literature, traditional drought indexes often fail at detecting critical events in highly regulated systems, where natural water availability is conditioned by the operation of water infrastructures such as dams, diversions, and pumping wells. Here, ad hoc index formulations are usually adopted based on empirical combinations of several, supposed-to-be significant, hydro-meteorological variables. These customized formulations, however, while effective in the design basin, can hardly be generalized and transferred to different contexts. In this study, we contribute FRIDA (FRamework for Index-based Drought Analysis), a novel framework for the automatic design of basin-customized drought indexes. In contrast to ad hoc empirical approaches, FRIDA is fully automated, generalizable, and portable across different basins. FRIDA builds an index representing a surrogate of the drought conditions of the basin, computed by combining all the relevant available information about the water circulating in the system identified by means of a feature extraction algorithm. We used the Wrapper for Quasi-Equally Informative Subset Selection (W-QEISS), which features a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables, and optimizing relevance and redundancy of the subset. The preferred variable subset is selected among the efficient solutions and used to formulate the final index according to alternative model structures. We apply FRIDA to the case study of the Jucar river basin (Spain), a drought-prone and highly regulated Mediterranean water resource system, where an advanced drought management plan relying on the formulation of an ad hoc state index is used for triggering drought management measures. The state index was constructed empirically with a trial-and-error process begun in the 1980s and finalized in 2007, guided by the experts from the Confederación Hidrográfica del Júcar (CHJ). Our results show that the automated variable selection outcomes align with CHJ's 25-year-long empirical refinement. In addition, the resultant FRIDA index outperforms the official State Index in terms of accuracy in reproducing the target variable and cardinality of the selected inputs set.

  8. A Simple Approach to Account for Climate Model Interdependence in Multi-Model Ensembles

    NASA Astrophysics Data System (ADS)

    Herger, N.; Abramowitz, G.; Angelil, O. M.; Knutti, R.; Sanderson, B.

    2016-12-01

    Multi-model ensembles are an indispensable tool for future climate projection and its uncertainty quantification. Ensembles containing multiple climate models generally have increased skill, consistency and reliability. Due to the lack of agreed-on alternatives, most scientists use the equally-weighted multi-model mean as they subscribe to model democracy ("one model, one vote"). Different research groups are known to share sections of code, parameterizations in their model, literature, or even whole model components. Therefore, individual model runs do not represent truly independent estimates. Ignoring this dependence structure might lead to a false model consensus, wrong estimation of uncertainty, and a wrong effective number of independent models. Here, we present a way to partially address this problem by selecting a subset of CMIP5 model runs so that its climatological mean minimizes the RMSE compared to a given observation product. Due to the cancelling out of errors, regional biases in the ensemble mean are reduced significantly. Using a model-as-truth experiment we demonstrate that those regional biases persist into the future and we are not fitting noise, thus providing improved observationally-constrained projections of the 21st century. The optimally selected ensemble shows significantly higher global mean surface temperature projections than the original ensemble, where all the model runs are considered. Moreover, the spread is decreased well beyond that expected from the decreased ensemble size. Several previous studies have recommended an ensemble selection approach based on performance ranking of the model runs. Here, we show that this approach can perform even worse than randomly selecting ensemble members and can thus be harmful. We suggest that accounting for interdependence in the ensemble selection process is a necessary step for robust projections for use in impact assessments, adaptation and mitigation of climate change.
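
    The selection step described above chooses ensemble members so that their climatological mean best matches an observation product. The sketch below uses a simple greedy heuristic to minimize the RMSE of the subset mean; the synthetic member and observation fields are illustrative, and the authors' exact search procedure and model-as-truth protocol are not reproduced.

      import numpy as np

      def greedy_subset(ensemble, obs, k):
          # Greedily pick k members whose mean minimizes RMSE against obs.
          # ensemble: array (n_members, n_gridcells); obs: array (n_gridcells,)
          chosen, remaining = [], list(range(ensemble.shape[0]))
          best_rmse = np.inf
          while len(chosen) < k:
              best_m, best_rmse = None, np.inf
              for m in remaining:
                  mean = ensemble[chosen + [m]].mean(axis=0)
                  rmse = np.sqrt(np.mean((mean - obs) ** 2))
                  if rmse < best_rmse:
                      best_m, best_rmse = m, rmse
              chosen.append(best_m)
              remaining.remove(best_m)
          return chosen, best_rmse

      # Example with synthetic "climatologies" for 20 members over 500 grid cells.
      rng = np.random.default_rng(42)
      obs = rng.normal(size=500)
      members = obs + rng.normal(scale=1.0, size=(20, 500))   # shared-error structure omitted
      subset, rmse = greedy_subset(members, obs, k=5)
      print(subset, round(rmse, 3))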

  9. Estimation of Symptom Severity During Chemotherapy From Passively Sensed Data: Exploratory Study

    PubMed Central

    Dey, Anind K; Ferreira, Denzil; Kamarck, Thomas; Sun, Weijing; Bae, Sangwon; Doryab, Afsaneh

    2017-01-01

    Background Physical and psychological symptoms are common during chemotherapy in cancer patients, and real-time monitoring of these symptoms can improve patient outcomes. Sensors embedded in mobile phones and wearable activity trackers could be potentially useful in monitoring symptoms passively, with minimal patient burden. Objective The aim of this study was to explore whether passively sensed mobile phone and Fitbit data could be used to estimate daily symptom burden during chemotherapy. Methods A total of 14 patients undergoing chemotherapy for gastrointestinal cancer participated in the 4-week study. Participants carried an Android phone and wore a Fitbit device for the duration of the study and also completed daily severity ratings of 12 common symptoms. Symptom severity ratings were summed to create a total symptom burden score for each day, and ratings were centered on individual patient means and categorized into low, average, and high symptom burden days. Day-level features were extracted from raw mobile phone sensor and Fitbit data and included features reflecting mobility and activity, sleep, phone usage (eg, duration of interaction with phone and apps), and communication (eg, number of incoming and outgoing calls and messages). We used a rotation random forests classifier with cross-validation and resampling with replacement to evaluate population and individual model performance and correlation-based feature subset selection to select nonredundant features with the best predictive ability. Results Across 295 days of data with both symptom and sensor data, a number of mobile phone and Fitbit features were correlated with patient-reported symptom burden scores. We achieved an accuracy of 88.1% for our population model. The subset of features with the best accuracy included sedentary behavior as the most frequent activity, fewer minutes in light physical activity, less variable and average acceleration of the phone, and longer screen-on time and interactions with apps on the phone. Mobile phone features had better predictive ability than Fitbit features. Accuracy of individual models ranged from 78.1% to 100% (mean 88.4%), and subsets of relevant features varied across participants. Conclusions Passive sensor data, including mobile phone accelerometer and usage and Fitbit-assessed activity and sleep, were related to daily symptom burden during chemotherapy. These findings highlight opportunities for long-term monitoring of cancer patients during chemotherapy with minimal patient burden as well as real-time adaptive interventions aimed at early management of worsening or severe symptoms. PMID:29258977
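
    The feature-selection step named above is correlation-based feature subset selection, which scores a subset by a merit combining feature-class and feature-feature correlations. A minimal sketch of that merit with a greedy forward search follows; it is a generic CFS illustration on synthetic data, and the rotation-forest classifier and resampling protocol of the study are not shown.

      import numpy as np

      def cfs_merit(X, y, cols):
          # CFS merit: k*r_cf / sqrt(k + k(k-1)*r_ff), using absolute Pearson correlations.
          k = len(cols)
          r_cf = np.mean([abs(np.corrcoef(X[:, c], y)[0, 1]) for c in cols])
          if k == 1:
              return r_cf
          r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                          for i, a in enumerate(cols) for b in cols[i + 1:]])
          return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

      def forward_cfs(X, y, max_features=5):
          # Greedy forward search maximizing the CFS merit.
          selected, remaining = [], list(range(X.shape[1]))
          while remaining and len(selected) < max_features:
              best_merit, best_c = max((cfs_merit(X, y, selected + [c]), c) for c in remaining)
              if selected and best_merit <= cfs_merit(X, y, selected):
                  break
              selected.append(best_c)
              remaining.remove(best_c)
          return selected

      # Example: features 0 and 2 predict the class; feature 5 duplicates feature 0.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 8))
      y = (X[:, 0] + X[:, 2] > 0).astype(float)
      X[:, 5] = X[:, 0] + rng.normal(0, 0.01, 300)
      print(forward_cfs(X, y))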

  10. Estimation of Symptom Severity During Chemotherapy From Passively Sensed Data: Exploratory Study.

    PubMed

    Low, Carissa A; Dey, Anind K; Ferreira, Denzil; Kamarck, Thomas; Sun, Weijing; Bae, Sangwon; Doryab, Afsaneh

    2017-12-19

    Physical and psychological symptoms are common during chemotherapy in cancer patients, and real-time monitoring of these symptoms can improve patient outcomes. Sensors embedded in mobile phones and wearable activity trackers could be potentially useful in monitoring symptoms passively, with minimal patient burden. The aim of this study was to explore whether passively sensed mobile phone and Fitbit data could be used to estimate daily symptom burden during chemotherapy. A total of 14 patients undergoing chemotherapy for gastrointestinal cancer participated in the 4-week study. Participants carried an Android phone and wore a Fitbit device for the duration of the study and also completed daily severity ratings of 12 common symptoms. Symptom severity ratings were summed to create a total symptom burden score for each day, and ratings were centered on individual patient means and categorized into low, average, and high symptom burden days. Day-level features were extracted from raw mobile phone sensor and Fitbit data and included features reflecting mobility and activity, sleep, phone usage (eg, duration of interaction with phone and apps), and communication (eg, number of incoming and outgoing calls and messages). We used a rotation random forests classifier with cross-validation and resampling with replacement to evaluate population and individual model performance and correlation-based feature subset selection to select nonredundant features with the best predictive ability. Across 295 days of data with both symptom and sensor data, a number of mobile phone and Fitbit features were correlated with patient-reported symptom burden scores. We achieved an accuracy of 88.1% for our population model. The subset of features with the best accuracy included sedentary behavior as the most frequent activity, fewer minutes in light physical activity, less variable and average acceleration of the phone, and longer screen-on time and interactions with apps on the phone. Mobile phone features had better predictive ability than Fitbit features. Accuracy of individual models ranged from 78.1% to 100% (mean 88.4%), and subsets of relevant features varied across participants. Passive sensor data, including mobile phone accelerometer and usage and Fitbit-assessed activity and sleep, were related to daily symptom burden during chemotherapy. These findings highlight opportunities for long-term monitoring of cancer patients during chemotherapy with minimal patient burden as well as real-time adaptive interventions aimed at early management of worsening or severe symptoms. ©Carissa A Low, Anind K Dey, Denzil Ferreira, Thomas Kamarck, Weijing Sun, Sangwon Bae, Afsaneh Doryab. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 19.12.2017.

  11. Selecting and Validating Tasks from a Kindergarten Screening Battery that Best Predict Third Grade Educational Placement

    ERIC Educational Resources Information Center

    Scott, Marcia Strong; Delgado, Christine F.; Tu, Shihfen; Fletcher, Kathryn L.

    2005-01-01

    In this study, predictive classification accuracy was used to select those tasks from a kindergarten screening battery that best identified children who, three years later, were classified as educable mentally handicapped or as having a specific learning disability. A subset of measures enabled correct classification of 91% of the children in…

  12. Selecting climate change scenarios using impact-relevant sensitivities

    Treesearch

    Julie A. Vano; John B. Kim; David E. Rupp; Philip W. Mote

    2015-01-01

    Climate impact studies often require the selection of a small number of climate scenarios. Ideally, a subset would have simulations that both (1) appropriately represent the range of possible futures for the variable/s most important to the impact under investigation and (2) come from global climate models (GCMs) that provide plausible results for future climate in the...

  13. [Varicocele and coincidental abacterial prostato-vesiculitis: negative role about the sperm output].

    PubMed

    Vicari, Enzo; La Vignera, Sandro; Tracia, Angelo; Cardì, Francesco; Donati, Angelo

    2003-03-01

    To evaluate the frequency and the role of a coincidentally expressed abacterial prostato-vesiculitis (PV) on sperm output in patients with left varicocele (Vr). We evaluated 143 selected infertile patients (mean age 27 years, range 21-43) with oligo- and/or astheno- and/or teratozoospermia (OAT), subdivided into two groups. Group A included 76 patients with previous varicocelectomy and persistent OAT. Group B included 67 infertile patients (mean age 26 years, range 21-37) with OAT who had not undergone varicocelectomy. Patients with Vr and coincidental didymo-epididymal ultrasound (US) abnormalities were excluded from the study. Following rectal prostato-vesicular ultrasonography, each group was subdivided into two subsets on the basis of the absence (group A: subset Vr-/PV-; group B: subset Vr+/PV-) or the presence of an abacterial PV (group A: subset Vr-/PV+; group B: subset Vr+/PV+). PV was present in 47.4% and 41.8% of patients in groups A and B, respectively. This coincidental pathology was ipsilateral with Vr in 61% of the cases. Semen analysis was performed in all patients. Patients of group A showed a total sperm number significantly higher than that found in group B. In the presence of PV, sperm parameters were not significantly different between matched subsets (Vr-/PV+ vs. Vr+/PV+). In the absence of PV, the sperm density, the total sperm number and the percentage of forward motility in the subset with previous varicocelectomy (Vr-/PV-) exhibited values significantly higher than those found in the matched subset (Vr+/PV-). Sperm analysis alone performed in patients with left Vr is not a useful prognostic post-varicocelectomy marker. Since, following varicocelectomy, a lack of sperm response could mask another coincidental pathology, identification of a possible PV through US scans may be mandatory. On the other hand, an integrated uro-andrological approach, including US scans, makes it possible to identify the subset of patients with Vr alone, who can be expected to show a better sperm response following Vr repair.

  14. CD127 and CD25 expression defines CD4+ T cell subsets that are differentially depleted during HIV infection.

    PubMed

    Dunham, Richard M; Cervasi, Barbara; Brenchley, Jason M; Albrecht, Helmut; Weintrob, Amy; Sumpter, Beth; Engram, Jessica; Gordon, Shari; Klatt, Nichole R; Frank, Ian; Sodora, Donald L; Douek, Daniel C; Paiardini, Mirko; Silvestri, Guido

    2008-04-15

    Decreased CD4(+) T cell counts are the best marker of disease progression during HIV infection. However, CD4(+) T cells are heterogeneous in phenotype and function, and it is unknown how preferential depletion of specific CD4(+) T cell subsets influences disease severity. CD4(+) T cells can be classified into three subsets by the expression of receptors for two T cell-tropic cytokines, IL-2 (CD25) and IL-7 (CD127). The CD127(+)CD25(low/-) subset includes IL-2-producing naive and central memory T cells; the CD127(-)CD25(-) subset includes mainly effector T cells expressing perforin and IFN-gamma; and the CD127(low)CD25(high) subset includes FoxP3-expressing regulatory T cells. Herein we investigated how the proportions of these T cell subsets are changed during HIV infection. When compared with healthy controls, HIV-infected patients show a relative increase in CD4(+)CD127(-)CD25(-) T cells that is related to an absolute decline of CD4(+)CD127(+)CD25(low/-) T cells. Interestingly, this expansion of CD4(+)CD127(-) T cells was not observed in naturally SIV-infected sooty mangabeys. The relative expansion of CD4(+)CD127(-)CD25(-) T cells correlated directly with the levels of total CD4(+) T cell depletion and immune activation. CD4(+)CD127(-)CD25(-) T cells were not selectively resistant to HIV infection as levels of cell-associated virus were similar in all non-naive CD4(+) T cell subsets. These data indicate that, during HIV infection, specific changes in the fraction of CD4(+) T cells expressing CD25 and/or CD127 are associated with disease progression. Further studies will determine whether monitoring the three subsets of CD4(+) T cells defined based on the expression of CD25 and CD127 should be used in the clinical management of HIV-infected individuals.

  15. Contextual cueing in multiconjunction visual search is dependent on color- and configuration-based intertrial contingencies.

    PubMed

    Geyer, Thomas; Shi, Zhuanghua; Müller, Hermann J

    2010-06-01

    Three experiments examined memory-based guidance of visual search using a modified version of the contextual-cueing paradigm (Jiang & Chun, 2001). The target, if present, was a conjunction of color and orientation, with target (and distractor) features randomly varying across trials (multiconjunction search). Under these conditions, reaction times (RTs) were faster when all items in the display appeared at predictive ("old") relative to nonpredictive ("new") locations. However, this RT benefit was smaller than when only one set of items, namely that sharing the target's color (but not that in the alternative color), appeared in a predictive arrangement. In all conditions, contextual cueing was reliable on both target-present and -absent trials and was enhanced if a predictive display was preceded by a predictive (though differently arranged) display, rather than a nonpredictive display. These results suggest that (1) contextual cueing is confined to color subsets of items, (2) retrieving contextual associations for one color subset of items can be impeded by associations formed within the alternative subset ("contextual interference"), and (3) contextual cueing is modulated by intertrial priming.

  16. A hybrid Bayesian hierarchical model combining cohort and case-control studies for meta-analysis of diagnostic tests: Accounting for partial verification bias.

    PubMed

    Ma, Xiaoye; Chen, Yong; Cole, Stephen R; Chu, Haitao

    2016-12-01

    To account for between-study heterogeneity in meta-analysis of diagnostic accuracy studies, bivariate random effects models have been recommended to jointly model the sensitivities and specificities. As study design and population vary, the definition of disease status or severity could differ across studies. Consequently, sensitivity and specificity may be correlated with disease prevalence. To account for this dependence, a trivariate random effects model had been proposed. However, the proposed approach can only include cohort studies with information estimating study-specific disease prevalence. In addition, some diagnostic accuracy studies only select a subset of samples to be verified by the reference test. It is known that ignoring unverified subjects may lead to partial verification bias in the estimation of prevalence, sensitivities, and specificities in a single study. However, the impact of this bias on a meta-analysis has not been investigated. In this paper, we propose a novel hybrid Bayesian hierarchical model combining cohort and case-control studies and correcting partial verification bias at the same time. We investigate the performance of the proposed methods through a set of simulation studies. Two case studies on assessing the diagnostic accuracy of gadolinium-enhanced magnetic resonance imaging in detecting lymph node metastases and of adrenal fluorine-18 fluorodeoxyglucose positron emission tomography in characterizing adrenal masses are presented. © The Author(s) 2014.

  17. A Hybrid Bayesian Hierarchical Model Combining Cohort and Case-control Studies for Meta-analysis of Diagnostic Tests: Accounting for Partial Verification Bias

    PubMed Central

    Ma, Xiaoye; Chen, Yong; Cole, Stephen R.; Chu, Haitao

    2014-01-01

    To account for between-study heterogeneity in meta-analysis of diagnostic accuracy studies, bivariate random effects models have been recommended to jointly model the sensitivities and specificities. As study design and population vary, the definition of disease status or severity could differ across studies. Consequently, sensitivity and specificity may be correlated with disease prevalence. To account for this dependence, a trivariate random effects model had been proposed. However, the proposed approach can only include cohort studies with information estimating study-specific disease prevalence. In addition, some diagnostic accuracy studies only select a subset of samples to be verified by the reference test. It is known that ignoring unverified subjects may lead to partial verification bias in the estimation of prevalence, sensitivities and specificities in a single study. However, the impact of this bias on a meta-analysis has not been investigated. In this paper, we propose a novel hybrid Bayesian hierarchical model combining cohort and case-control studies and correcting partial verification bias at the same time. We investigate the performance of the proposed methods through a set of simulation studies. Two case studies on assessing the diagnostic accuracy of gadolinium-enhanced magnetic resonance imaging in detecting lymph node metastases and of adrenal fluorine-18 fluorodeoxyglucose positron emission tomography in characterizing adrenal masses are presented. PMID:24862512

  18. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods

    PubMed Central

    Shara, Nawar; Yassin, Sayf A.; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V.; Wang, Wenyu; Lee, Elisa T.; Umans, Jason G.

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989–1991), 2 (1993–1995), and 3 (1998–1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results. PMID:26414328

  19. Randomly and Non-Randomly Missing Renal Function Data in the Strong Heart Study: A Comparison of Imputation Methods.

    PubMed

    Shara, Nawar; Yassin, Sayf A; Valaitis, Eduardas; Wang, Hong; Howard, Barbara V; Wang, Wenyu; Lee, Elisa T; Umans, Jason G

    2015-01-01

    Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
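
    A minimal toy simulation (not the SHS data, models, or imputation code) can illustrate why non-randomly missing follow-up values bias simple strategies such as listwise deletion, which motivates the comparison above. All numbers and the dropout rule below are invented for illustration.

    ```python
    # Toy illustration of non-random (MNAR-like) dropout in longitudinal data.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    exam1 = rng.normal(90, 15, n)                    # baseline renal-function-like value
    exam2 = exam1 - 5 + rng.normal(0, 5, n)          # true follow-up value (declines over time)

    # Non-random missingness: subjects with worse follow-up values drop out more often
    p_missing = 1 / (1 + np.exp((exam2 - 75) / 5))
    missing = rng.random(n) < p_missing

    listwise = exam2[~missing].mean()                 # listwise deletion
    adjacent = np.where(missing, exam1, exam2).mean() # "adjacent value": carry baseline forward

    print(f"true mean {exam2.mean():.1f} | listwise {listwise:.1f} | adjacent {adjacent:.1f}")
    ```

    Both simple rules remain biased under this kind of missingness, which is why model-based approaches such as pattern-mixture models are considered in the study.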

  20. Inherited Disorders as a Risk Factor and Predictor of Neurodevelopmental Outcome in Pediatric Cancer

    ERIC Educational Resources Information Center

    Ullrich, Nicole J.

    2008-01-01

    Each year in the United States, an average of one to two children per 10,000 develop cancer. The etiology of most childhood cancer remains largely unknown but is likely attributable to random or induced genetic aberrations in somatic tissue. However, a subset of children develops cancer in the setting of an underlying inheritable condition…

  1. Getting Lucky: How Guessing Threatens the Validity of Performance Classifications

    ERIC Educational Resources Information Center

    Foley, Brett P.

    2016-01-01

    There is always a chance that examinees will answer multiple choice (MC) items correctly by guessing. Design choices in some modern exams have created situations where guessing at random through the full exam--rather than only for a subset of items where the examinee does not know the answer--can be an effective strategy to pass the exam. This…
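
    A short worked example makes the guessing risk concrete. The exam length, number of options, and cut score below are hypothetical and are not taken from the article; the calculation simply shows how the probability of passing by guessing is a binomial tail probability.

    ```python
    # Probability of reaching a passing score purely by random guessing
    # on a multiple-choice exam (hypothetical numbers).
    from scipy.stats import binom

    n_items = 60          # exam length
    p_guess = 1 / 4       # chance of guessing a 4-option item correctly
    cut_score = 36        # items needed to pass (60% of 60)

    # P(X >= cut_score) for X ~ Binomial(n_items, p_guess)
    p_pass_by_guessing = binom.sf(cut_score - 1, n_items, p_guess)
    print(f"P(pass by guessing alone)   = {p_pass_by_guessing:.2e}")

    # If partial knowledge already secures 25 items and the examinee guesses
    # on the remaining 35, the chance of reaching the cut score rises sharply:
    p_pass_partial = binom.sf(cut_score - 25 - 1, n_items - 25, p_guess)
    print(f"P(pass with 25 known items) = {p_pass_partial:.2e}")
    ```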

  2. Segregation by onset asynchrony.

    PubMed

    Hancock, P J B; Walton, L; Mitchell, G; Plenderleith, Y; Phillips, W A

    2008-08-05

    We describe a simple psychophysical paradigm for studying figure-ground segregation by onset asynchrony. Two pseudorandom arrays of Gabor patches are displayed, to left and right of fixation. Within one array, a subset of elements form a figure, such as a randomly curving path, that can only be reliably detected when their onset is not synchronized with that of the background elements. Several findings are reported. First, for most participants, segregation required an onset asynchrony of 20-40 ms. Second, detection was no better when the figure was presented first, and thus by itself, than when the background elements were presented first, even though in the latter case the figure could not be detected in either of the two successive displays alone. Third, asynchrony segregated subsets of randomly oriented elements equally well. Fourth, asynchronous onsets aligned with the path could be discriminated from those lying on the path but not aligned with it. Fifth, both transient and sustained neural activity contribute to detection. We argue that these findings are compatible with neural signaling by synchronized rate codes. Finally, schizophrenic disorganization is associated with reduced sensitivity. Thus, in addition to bearing upon basic theoretical issues, this paradigm may have clinical utility.

  3. Experimental investigation of clogging dynamics in homogeneous porous medium

    NASA Astrophysics Data System (ADS)

    Shen, Jikang; Ni, Rui

    2017-03-01

    A 3-D refractive-index matching Lagrangian particle tracking (3D-RIM-LPT) system was developed to study the filtration and the clogging process inside a homogeneous porous medium. A small subset of particles flowing through the porous medium was dyed and tracked. As this subset was randomly chosen, its dynamics is representative of all the rest. The statistics of particle locations, number, and velocity were obtained as functions of different volumetric concentrations. It is found that in our system the clogging time decays with the particle concentration following a power law relationship. As the concentration increases, there is a transition from depth filtration to cake filtration. At high concentration, more clogged pores lead to frequent flow redirections and more transverse migrations of particles. In addition, the velocity distribution in the transverse direction is symmetrical around zero, and it is slightly more intermittent than the random Gaussian curve due to particle-particle and particle-grain interactions. In contrast, as clogging develops, the longitudinal velocity of particles along the mean flow direction peaks near zero because of many trapped particles. But at the same time, the remaining open pores will experience larger pressure and, as a result, particles through those pores tend to have larger longitudinal velocities.

  4. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem: most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus only on the binary-class case. In this paper, we address the multiclass imbalanced classification problem, as encountered in cancer DNA microarrays, using ensemble learning. We used a one-against-all coding strategy to transform the multiclass problem into multiple binary problems, each of which applied feature subspace, an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction techniques, decision threshold adjustment or random undersampling, into each training subset to mitigate the effect of class imbalance. A support vector machine was used as the base classifier, and a novel voting rule called counter voting was proposed for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
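
    The following is a minimal sketch of the one-against-all idea combined with random feature subspaces, random undersampling, and voting. It uses scikit-learn SVMs on synthetic data; the subspace size, ensemble size, simplified voting rule, and all data are assumptions for illustration, not the paper's implementation.

    ```python
    # Sketch: one-vs-all ensembles with feature subspaces and undersampling.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 500))                       # toy "microarray": 120 samples x 500 genes
    y = rng.choice(3, size=120, p=[0.6, 0.3, 0.1])        # 3 imbalanced cancer types (synthetic)

    def train_one_vs_all(X, y, positive_class, n_members=5, subspace=50):
        members = []
        pos, neg = np.where(y == positive_class)[0], np.where(y != positive_class)[0]
        for _ in range(n_members):
            feats = rng.choice(X.shape[1], subspace, replace=False)   # random feature subspace
            neg_s = rng.choice(neg, size=len(pos), replace=False)     # random undersampling
            idx = np.concatenate([pos, neg_s])
            clf = SVC(kernel="linear").fit(X[idx][:, feats],
                                           (y[idx] == positive_class).astype(int))
            members.append((feats, clf))
        return members

    ensembles = {c: train_one_vs_all(X, y, c) for c in np.unique(y)}

    def predict(x):
        # simplified voting: the class whose ensemble casts the most positive votes wins
        votes = {c: sum(clf.predict(x[feats][None, :])[0] for feats, clf in members)
                 for c, members in ensembles.items()}
        return max(votes, key=votes.get)

    print(predict(X[0]), "vs true", y[0])
    ```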

  5. Multimodal biometric approach for cancelable face template generation

    NASA Astrophysics Data System (ADS)

    Paul, Padma Polash; Gavrilova, Marina

    2012-06-01

    With the rapid growth of biometric technology, template protection has become crucial to securing the integrity of a biometric security system and preventing unauthorized access. Cancelable biometrics is emerging as one of the best solutions for securing biometric identification and verification systems. We present a novel, robust cancelable template generation algorithm that takes advantage of multimodal biometrics through feature-level fusion. Feature-level fusion of different facial features is applied to generate the cancelable template, using a proposed algorithm based on multi-fold random projection and a fuzzy communication scheme. In cancelable template generation, one of the main difficulties is preserving the interclass variance of the features. We have found that interclass variations lost during multi-fold random projection can be recovered by fusing different feature subsets and projecting into a new feature domain. By applying the multimodal technique at the feature level, we enhance interclass variability and thereby improve the performance of the system. We tested the system with classifier fusion for different feature subsets and with fusion of different cancelable templates. Experiments show that the cancelable template improves the performance of the biometric system compared with the original template.
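
    As a rough illustration of the random-projection part of such schemes, the sketch below derives several projection matrices from a revocable user key and concatenates the projected folds into a template. The dimensions, fold count, and normalization are assumptions, and the fuzzy communication step is omitted entirely.

    ```python
    # Illustrative key-seeded multi-fold random projection for a cancelable template.
    import numpy as np

    def cancelable_template(feature_vec, user_key, n_folds=4, out_dim=64):
        """Project a biometric feature vector through several random matrices
        derived from a revocable user key; the concatenation forms the template."""
        rng = np.random.default_rng(user_key)        # reissuing a new key revokes the template
        folds = []
        for _ in range(n_folds):
            R = rng.normal(size=(out_dim, feature_vec.size)) / np.sqrt(out_dim)
            folds.append(R @ feature_vec)
        return np.concatenate(folds)

    face_features = np.random.default_rng(1).normal(size=256)  # stand-in for fused facial features
    t1 = cancelable_template(face_features, user_key=12345)
    t2 = cancelable_template(face_features, user_key=99999)    # "cancelled" and reissued
    print(t1.shape, np.corrcoef(t1, t2)[0, 1])                 # different keys -> dissimilar templates
    ```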

  6. Increased appropriateness of customized alert acknowledgement reasons for overridden medication alerts in a computerized provider order entry system.

    PubMed

    Dekarske, Brian M; Zimmerman, Christopher R; Chang, Robert; Grant, Paul J; Chaffee, Bruce W

    2015-12-01

    Computerized provider order entry systems commonly contain alerting mechanisms for patient allergies, incorrect doses, or drug-drug interactions when ordering medications. Providers have the option to override (bypass) these alerts and continue with the order unchanged. This study examines the effect of customizing medication alert override options on the appropriateness of override selection related to patient allergies, drug dosing, and drug-drug interactions when ordering medications in an electronic medical record. In this prospective, randomized crossover study, providers were randomized into cohorts that required a reason for overriding a medication alert from a customized or non-customized list of override reasons and/or by free-text entry. The primary outcome was to compare override responses that appropriately correlate with the alert type between the customized and non-customized configurations. The appropriateness of a subset of free-text responses that represented an affirmative and active acknowledgement of the alert without further explanation was classified as "indeterminate." Results were analyzed in three different ways by classifying indeterminate answers as either appropriate, inappropriate, or excluded entirely. Secondary outcomes included the appropriateness of override reasons when comparing cohorts and individual providers, reason selection based on order within the override list, and the determination of the frequency of free-text use, nonsensical responses, and multiple selection responses. Twenty-two clinicians were randomized into 2 cohorts and a total of 1829 alerts with a required response were generated during the study period. The customized configuration had a higher rate of appropriateness when compared to the non-customized configuration regardless of how indeterminate responses were classified (p<0.001). When comparing cohorts, appropriateness was significantly higher in the customized configuration regardless of the classification of indeterminate responses (p<0.001) with one exception: when indeterminate responses were considered inappropriate for the cohort of providers that were first exposed to the non-customized list (p=0.103). Free-text use was higher in the customized configuration overall (p<0.001), and there was no difference in nonsensical response between configurations (p=0.39). There is a benefit realized by using a customized list for medication override reasons. Poor application design or configuration can negatively affect provider behavior when responding to important medication alerts. Copyright © 2015. Published by Elsevier Ireland Ltd.

  7. AOIPS data base management systems support for GARP data sets

    NASA Technical Reports Server (NTRS)

    Gary, J. P.

    1977-01-01

    A data base management system is identified, developed to provide flexible access to data sets produced by GARP during its data systems tests. The content and coverage of the data base are defined and a computer-aided, interactive information storage and retrieval system, implemented to facilitate access to user specified data subsets, is described. The computer programs developed to provide the capability were implemented on the highly interactive, minicomputer-based AOIPS and are referred to as the data retrieval system (DRS). Implemented as a user interactive but menu guided system, the DRS permits users to inventory the data tape library and create duplicate or subset data sets based on a user selected window defined by time and latitude/longitude boundaries. The DRS permits users to select, display, or produce formatted hard copy of individual data items contained within the data records.

  8. Selective norepinephrine reuptake inhibition as a human model of orthostatic intolerance

    NASA Technical Reports Server (NTRS)

    Schroeder, Christoph; Tank, Jens; Boschmann, Michael; Diedrich, Andre; Sharma, Arya M.; Biaggioni, Italo; Luft, Friedrich C.; Jordan, Jens; Robertson, D. (Principal Investigator)

    2002-01-01

    BACKGROUND: Observations in patients with functional mutations of the norepinephrine transporter (NET) gene suggest that impaired norepinephrine uptake may contribute to idiopathic orthostatic intolerance. METHODS AND RESULTS: We studied the effect of the selective NET blocker reboxetine and placebo in a randomized, double-blind, crossover fashion on cardiovascular responses to cold pressor testing, handgrip testing, and a graded head-up tilt test (HUT) in 18 healthy subjects. In a subset, we determined isoproterenol and phenylephrine sensitivities. Subjects ingested 8 mg reboxetine or placebo 12 hours and 1 hour before testing. In the supine position, heart rate was 65+/-2 bpm with placebo and 71+/-3 bpm with reboxetine. At 75 degrees HUT, heart rate was 84+/-3 and 119+/-4 bpm with placebo and with reboxetine (P<0.0001). Mean arterial pressure was 85+/-2 with placebo and 91+/-2 mm Hg with reboxetine while supine (P<0.01) and 88+/-2 mm Hg and 90+/-3 mm Hg at 75 degrees HUT. Blood pressure responses to cold pressor and handgrip testing were attenuated with reboxetine. Reboxetine increased the sensitivity to the chronotropic effect of isoproterenol and the pressor effect of phenylephrine. Vasovagal reactions occurred in 9 subjects on placebo and in 1 subject on reboxetine. CONCLUSIONS: Selective NET blockade creates a phenotype that resembles idiopathic orthostatic intolerance. This observation supports the hypothesis that disordered norepinephrine uptake mechanisms can contribute to human cardiovascular disease. Our study also suggests that NET inhibition might be useful in preventing vasovagal reactions.

  9. Identifying Patients with Atrioventricular Septal Defect in Down Syndrome Populations by Using Self-Normalizing Neural Networks and Feature Selection.

    PubMed

    Pan, Xiaoyong; Hu, Xiaohua; Zhang, Yu Hang; Feng, Kaiyan; Wang, Shao Peng; Chen, Lei; Huang, Tao; Cai, Yu Dong

    2018-04-12

    Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies during birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes in DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by using the copy number of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers to identify optimal features. Results show that 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on these features yielded a Matthews correlation coefficient (MCC) value of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features. This method achieved an optimal MCC value of 0.582 when the top 132 features were used. Finally, we analyzed some key features among the optimal features identified by the SNN and found literature support that further reveals their essential roles.
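
    A hedged sketch of the rank-then-grow feature selection loop is given below. Mutual information stands in for the MCFS ranking and a scikit-learn MLP stands in for the self-normalizing network; the data, step sizes, and cross-validation settings are all assumptions made for illustration.

    ```python
    # Sketch: ranked feature list + incremental feature selection scored by MCC.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import matthews_corrcoef, make_scorer

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 300))                                    # toy copy-number features
    y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 200) > 0).astype(int) # synthetic AVSD label

    rank = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1] # stand-in for MCFS ranking
    mcc = make_scorer(matthews_corrcoef)

    best = (-np.inf, 0)
    for k in range(10, 110, 10):                                       # incremental feature subsets
        feats = rank[:k]
        clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
        score = cross_val_score(clf, X[:, feats], y, cv=3, scoring=mcc).mean()
        best = max(best, (score, k))

    print(f"best MCC {best[0]:.3f} with top {best[1]} features")
    ```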

  10. Generative Topographic Mapping of Conformational Space.

    PubMed

    Horvath, Dragos; Baskin, Igor; Marcou, Gilles; Varnek, Alexandre

    2017-10-01

    Herein, Generative Topographic Mapping (GTM) was challenged to produce planar projections of the high-dimensional conformational space of complex molecules (the 1LE1 peptide). GTM is a probability-based mapping strategy, and its capacity to support property prediction models serves to objectively assess map quality (in terms of regression statistics). The properties to predict were total, non-bonded and contact energies, surface area and fingerprint darkness. Map building and selection was controlled by a previously introduced evolutionary strategy allowed to choose the best-suited conformational descriptors, options including classical terms and novel atom-centric autocorrellograms. The latter condensate interatomic distance patterns into descriptors of rather low dimensionality, yet precise enough to differentiate between close favorable contacts and atom clashes. A subset of 20 K conformers of the 1LE1 peptide, randomly selected from a pool of 2 M geometries (generated by the S4MPLE tool) was employed for map building and cross-validation of property regression models. The GTM build-up challenge reached robust three-fold cross-validated determination coefficients of Q 2 =0.7…0.8, for all modeled properties. Mapping of the full 2 M conformer set produced intuitive and information-rich property landscapes. Functional and folding subspaces appear as well-separated zones, even though RMSD with respect to the PDB structure was never used as a selection criterion of the maps. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Genomic selection for slaughter age in pigs using the Cox frailty model.

    PubMed

    Santos, V S; Martins Filho, S; Resende, M D V; Azevedo, C F; Lopes, P S; Guimarães, S E F; Glória, L S; Silva, F F

    2015-10-19

    The aim of this study was to compare genomic selection methodologies using a linear mixed model and the Cox survival model. We used data from an F2 population of pigs, in which the response variable was the time in days from birth to the culling of the animal and the covariates were 238 markers [237 single nucleotide polymorphisms (SNPs) plus the halothane gene]. The data were corrected for fixed effects, and the accuracy of each method was determined from the correlation between the ranks of the predicted genomic breeding values (GBVs) of the two models and the corrected phenotypic values. The analysis was repeated with a subset of the SNP markers with the largest absolute effects. The two models agreed in GBV prediction and in the estimation of marker effects for uncensored data and under normality. However, when considering censored data, the Cox model with a normal random effect (S1) was more appropriate. Since the linear mixed model with imputed data (L2) did not agree with it in the prediction of genomic values and the estimation of marker effects, model S1 was considered superior, as it takes the latent variable and the censoring into account. Marker selection increased the correlations between the ranks of GBVs predicted by the linear and Cox frailty models and the corrected phenotypic values, and 120 markers were required to increase the predictive ability for the trait analyzed.
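
    For illustration only, the sketch below fits a plain Cox proportional hazards model to synthetic marker data and measures rank agreement with the phenotype. The lifelines package is used as an assumed stand-in (the study's own software is not named here), the frailty/random-effect term is omitted, and all data and parameters are invented.

    ```python
    # Sketch: rank animals by a Cox model fitted to SNP covariates.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)
    n, m = 300, 40                                   # animals x SNP markers (toy scale)
    G = rng.integers(0, 3, size=(n, m))              # genotypes coded 0/1/2
    risk = G[:, :5] @ rng.normal(0.3, 0.1, 5)        # a few markers affect slaughter age
    time = rng.exponential(150 * np.exp(-risk))      # days from birth to culling
    event = (rng.random(n) > 0.2).astype(int)        # ~20% censored records

    df = pd.DataFrame(G, columns=[f"snp{j}" for j in range(m)])
    df["time"], df["event"] = time, event
    cph = CoxPHFitter(penalizer=0.1).fit(df, duration_col="time", event_col="event")

    covariates = df.drop(columns=["time", "event"])
    gbv_rank = -cph.predict_partial_hazard(covariates)   # higher hazard = earlier culling
    print(spearmanr(gbv_rank, time).correlation)         # rank agreement with the phenotype
    ```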

  12. Clinical-scale selection and viral transduction of human naïve and central memory CD8+ T cells for adoptive cell therapy of cancer patients.

    PubMed

    Casati, Anna; Varghaei-Nahvi, Azam; Feldman, Steven Alexander; Assenmacher, Mario; Rosenberg, Steven Aaron; Dudley, Mark Edward; Scheffold, Alexander

    2013-10-01

    The adoptive transfer of lymphocytes genetically engineered to express tumor-specific antigen receptors is a potent strategy to treat cancer patients. T lymphocyte subsets, such as naïve or central memory T cells, selected in vitro prior to genetic engineering have been extensively investigated in preclinical mouse models, where they demonstrated improved therapeutic efficacy. However, so far, this is challenging to realize in the clinical setting, since good manufacturing practices (GMP) procedures for complex cell sorting and genetic manipulation are limited. To be able to directly compare the immunological attributes and therapeutic efficacy of naïve (T(N)) and central memory (T(CM)) CD8(+) T cells, we investigated clinical-scale procedures for their parallel selection and in vitro manipulation. We also evaluated currently available GMP-grade reagents for stimulation of T cell subsets, including a new type of anti-CD3/anti-CD28 nanomatrix. An optimized protocol was established for the isolation of both CD8(+) T(N) cells (CD4(-)CD62L(+)CD45RA(+)) and CD8(+) T(CM) (CD4(-)CD62L(+)CD45RA(-)) from a single patient. The highly enriched T cell subsets can be efficiently transduced and expanded to large cell numbers, sufficient for clinical applications and equivalent to or better than current cell and gene therapy approaches with unselected lymphocyte populations. The GMP protocols for selection of T(N) and T(CM) we reported here will be the basis for clinical trials analyzing safety, in vivo persistence and clinical efficacy in cancer patients and will help to generate a more reliable and efficacious cellular product.

  13. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Jing; Li, Yuan-Yuan; Shanghai Center for Bioinformation Technology, Shanghai 200235

    2012-03-02

    Highlights: • Proper dataset partition can improve the prediction of deleterious nsSNPs. • Partition according to the original residue type at the nsSNP site is a good criterion. • A similar strategy is expected to be promising in other machine learning problems. -- Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulating nsSNP data allow us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either the original or the substituted amino acid type at the nsSNP site. Using a support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9%, depending on which of the two partition criteria was used, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, when the dataset was instead randomly divided into 20 subsets, the corresponding accuracy was only 73.2%. Our results demonstrate that properly partitioning the whole training dataset into subsets, i.e., according to the residue type at the nsSNP site, significantly improves the performance of the trained classifiers, which should be valuable in developing better tools for predicting the disease association of nsSNPs.
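
    A minimal sketch of the partition-then-train idea follows: one SVM per original residue type, with prediction dispatched by the residue at the nsSNP site. The features, labels, minimum-subset rule, and missing fallback model are all assumptions for illustration, not the authors' pipeline.

    ```python
    # Sketch: train one SVM per amino-acid partition of the training data.
    import numpy as np
    from sklearn.svm import SVC

    AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
    rng = np.random.default_rng(0)

    n = 2000
    residues = rng.choice(AMINO_ACIDS, size=n)       # original residue at each nsSNP site
    X = rng.normal(size=(n, 30))                     # toy feature vectors
    y = rng.integers(0, 2, size=n)                   # disease-associated vs neutral (synthetic)

    models = {}
    for aa in AMINO_ACIDS:
        mask = residues == aa
        if mask.sum() >= 20 and len(np.unique(y[mask])) == 2:   # need both classes present
            models[aa] = SVC(kernel="rbf").fit(X[mask], y[mask])

    def predict(x, residue):
        # a pooled fallback model would be used in practice; omitted here for brevity
        return models[residue].predict(x[None, :])[0] if residue in models else None

    print(predict(X[0], residues[0]))
    ```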

  14. The Isolation and Enrichment of Large Numbers of Highly Purified Mouse Spleen Dendritic Cell Populations and Their In Vitro Equivalents.

    PubMed

    Vremec, David

    2016-01-01

    Dendritic cells (DCs) form a complex network of cells that initiate and orchestrate immune responses against a vast array of pathogenic challenges. Developmentally and functionally distinct DC subtypes differentially regulate T-cell function. Importantly it is the ability of DC to capture and process antigen, whether from pathogens, vaccines, or self-components, and present it to naive T cells that is the key to their ability to initiate an immune response. Our typical isolation procedure for DC from murine spleen was designed to efficiently extract all DC subtypes, without bias and without alteration to their in vivo phenotype, and involves a short collagenase digestion of the tissue, followed by selection for cells of light density and finally negative selection for DC. The isolation procedure can accommodate DC numbers that have been artificially increased via administration of fms-like tyrosine kinase 3 ligand (Flt3L), either directly through a series of subcutaneous injections or by seeding with an Flt3L secreting murine melanoma. Flt3L may also be added to bone marrow cultures to produce large numbers of in vitro equivalents of the spleen DC subsets. Total DC, or their subsets, may be further purified using immunofluorescent labeling and flow cytometric cell sorting. Cell sorting may be completely bypassed by separating DC subsets using a combination of fluorescent antibody labeling and anti-fluorochrome magnetic beads. Our procedure enables efficient separation of the distinct DC subsets, even in cases where mouse numbers or flow cytometric cell sorting time is limiting.

  15. G-STRATEGY: Optimal Selection of Individuals for Sequencing in Genetic Association Studies

    PubMed Central

    Wang, Miaoyan; Jakobsdottir, Johanna; Smith, Albert V.; McPeek, Mary Sara

    2017-01-01

    In a large-scale genetic association study, the number of phenotyped individuals available for sequencing may, in some cases, be greater than the study’s sequencing budget will allow. In that case, it can be important to prioritize individuals for sequencing in a way that optimizes power for association with the trait. Suppose a cohort of phenotyped individuals is available, with some subset of them possibly already sequenced, and one wants to choose an additional fixed-size subset of individuals to sequence in such a way that the power to detect association is maximized. When the phenotyped sample includes related individuals, power for association can be gained by including partial information, such as phenotype data of ungenotyped relatives, in the analysis, and this should be taken into account when assessing whom to sequence. We propose G-STRATEGY, which uses simulated annealing to choose a subset of individuals for sequencing that maximizes the expected power for association. In simulations, G-STRATEGY performs extremely well for a range of complex disease models and outperforms other strategies with, in many cases, relative power increases of 20–40% over the next best strategy, while maintaining correct type 1 error. G-STRATEGY is computationally feasible even for large datasets and complex pedigrees. We apply G-STRATEGY to data on HDL and LDL from the AGES-Reykjavik and REFINE-Reykjavik studies, in which G-STRATEGY is able to closely approximate the power of sequencing the full sample by selecting only a small subset of the individuals for sequencing. PMID:27256766
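
    The sketch below shows simulated annealing over fixed-size subsets in the spirit described above, with a placeholder scoring function standing in for the expected association power (which in the real method depends on phenotypes, pedigree structure, and existing genotypes). All sizes and schedule parameters are assumptions.

    ```python
    # Sketch: simulated-annealing selection of a fixed-size subset to sequence.
    import numpy as np

    rng = np.random.default_rng(0)
    n_individuals, budget = 500, 50
    info = rng.gamma(2.0, 1.0, n_individuals)        # per-person "information" (toy surrogate)

    def expected_power(subset):
        # placeholder: the real score would use phenotypes, relatedness and prior genotypes
        return info[list(subset)].sum()

    current = set(rng.choice(n_individuals, budget, replace=False))
    best, best_score, temp = set(current), expected_power(current), 1.0
    for step in range(5000):
        candidate = set(current)
        candidate.remove(rng.choice(list(candidate)))                       # swap one individual out
        candidate.add(rng.choice(list(set(range(n_individuals)) - candidate)))
        delta = expected_power(candidate) - expected_power(current)
        if delta > 0 or rng.random() < np.exp(delta / temp):                # accept worse moves early on
            current = candidate
            if expected_power(current) > best_score:
                best, best_score = set(current), expected_power(current)
        temp *= 0.999                                                        # cooling schedule

    print(f"best surrogate power: {best_score:.1f}")
    ```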

  16. Multi atlas based segmentation: Should we prefer the best atlas group over the group of best atlases?

    PubMed

    Zaffino, Paolo; Ciardo, Delia; Raudaschl, Patrik; Fritscher, Karl; Ricotti, Rosalinda; Alterio, Daniela; Marvaso, Giulia; Fodor, Cristiana; Baroni, Guido; Amato, Francesco; Orecchia, Roberto; Jereczek-Fossa, Barbara Alicja; Sharp, Gregory C; Spadea, Maria Francesca

    2018-05-22

    Multi Atlas Based Segmentation (MABS) uses a database of atlas images, and an atlas selection process is used to choose an atlas subset for registration and voting. In the current state of the art, atlases are chosen according to a similarity criterion between the target subject and each atlas in the database. In this paper, we propose a new concept for atlas selection that relies on selecting the best performing group of atlases rather than the group of highest scoring individual atlases. Experiments were performed using CT images of 50 patients, with contours of the brainstem and parotid glands. The dataset was randomly split into 2 groups: 20 volumes were used as an atlas database and 30 served as target subjects for testing. Classic oracle selection, where atlases are chosen individually by the highest Dice Similarity Coefficient (DSC) with the target, was performed. This was compared to oracle group selection, where all combinations of atlas subgroups were considered and scored by computing the DSC with the target subject. Subsequently, Convolutional Neural Networks (CNNs) were designed to predict the best group of atlases. The results were also compared with a selection strategy based on Normalized Mutual Information (NMI). Oracle group selection proved to be significantly better than classic oracle selection (p<10^-5). Atlas group selection led to a median±interquartile DSC of 0.740±0.084, 0.718±0.086 and 0.670±0.097 for the brainstem and the left/right parotid glands, respectively, outperforming NMI selection (0.676±0.113, 0.632±0.104 and 0.606±0.118; p<0.001) as well as classic oracle selection. The implemented methodology is a proof of principle that selecting the atlases by considering the performance of the entire group of atlases instead of each single atlas leads to higher segmentation accuracy, being even better than the current oracle strategy. This finding opens a new discussion about the most appropriate atlas selection criterion for MABS. © 2018 Institute of Physics and Engineering in Medicine.
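
    The contrast between scoring atlases individually and scoring whole groups can be illustrated with toy binary label masks. The majority-vote fusion rule, group size, and synthetic masks below are assumptions for illustration, not the paper's registration-and-voting pipeline.

    ```python
    # Sketch: best top-k individual atlases vs. best-performing group of atlases.
    import numpy as np
    from itertools import combinations

    def dice(a, b):
        inter = np.logical_and(a, b).sum()
        return 2 * inter / (a.sum() + b.sum())

    rng = np.random.default_rng(0)
    target = rng.random((32, 32)) < 0.3                          # toy ground-truth contour
    atlases = [np.logical_xor(target, rng.random((32, 32)) < p)  # registered atlas labels with noise
               for p in rng.uniform(0.02, 0.2, size=8)]

    def group_score(group):
        fused = np.mean([atlases[i] for i in group], axis=0) >= 0.5   # majority-vote fusion
        return dice(fused, target)

    # "classic oracle": keep the k atlases with the highest individual DSC
    top3_individual = sorted(range(8), key=lambda i: dice(atlases[i], target), reverse=True)[:3]
    # "oracle group": exhaustively score every combination of k atlases
    best_group = max(combinations(range(8), 3), key=group_score)

    print("top-3 individual:", group_score(tuple(top3_individual)))
    print("best group      :", group_score(best_group))
    ```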

  17. Circulating B cells in type 1 diabetics exhibit fewer maturation-associated phenotypes.

    PubMed

    Hanley, Patrick; Sutter, Jennifer A; Goodman, Noah G; Du, Yangzhu; Sekiguchi, Debora R; Meng, Wenzhao; Rickels, Michael R; Naji, Ali; Luning Prak, Eline T

    2017-10-01

    Although autoantibodies have been used for decades as diagnostic and prognostic markers in type 1 diabetes (T1D), further analysis of developmental abnormalities in B cells could reveal tolerance checkpoint defects that could improve individualized therapy. To evaluate B cell developmental progression in T1D, immunophenotyping was used to classify circulating B cells into transitional, mature naïve, mature activated, and resting memory subsets. Then each subset was analyzed for the expression of additional maturation-associated markers. While the frequencies of B cell subsets did not differ significantly between patients and controls, some T1D subjects exhibited reduced proportions of B cells that expressed transmembrane activator and CAML interactor (TACI) and Fas receptor (FasR). Furthermore, some T1D subjects had B cell subsets with lower frequencies of class switching. These results suggest circulating B cells exhibit variable maturation phenotypes in T1D. These phenotypic variations may correlate with differences in B cell selection in individual T1D patients. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  18. Feature Selection and Pedestrian Detection Based on Sparse Representation.

    PubMed

    Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei

    2015-01-01

    Pedestrian detection research has largely been devoted to the extraction of effective pedestrian features, yet the variety of candidate features and their high dimensionality have become obstacles to practical pedestrian detection. Based on a theoretical analysis of six frequently used features (SIFT, SURF, Haar, HOG, LBP and LSS) and their comparison with experimental results, this paper screens out sparse feature subsets via sparse representation to investigate whether the sparse subsets have the same descriptive ability and which features are most stable. When any two of the six features are fused, the fused feature is sparsely represented to obtain its important components. Sparse subsets of the fused features can be generated rapidly because the corresponding dimension indices of these feature descriptors need not be computed; thus, the speed of feature dimension reduction is improved and pedestrian detection time is reduced. Experimental results show that sparse feature subsets retain the important components of the six feature descriptors. The sparse features of HOG and LSS possess the same descriptive ability as, and consume less time than, their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six, so these two features best describe pedestrian characteristics, and the sparse feature subsets of the HOG-LSS combination show better discriminative ability and parsimony.

  19. Evaluating physical habitat and water chemistry data from statewide stream monitoring programs to establish least-impacted conditions in Washington State

    USGS Publications Warehouse

    Wilmoth, Siri K.; Irvine, Kathryn M.; Larson, Chad

    2015-01-01

    Various GIS-generated land-use predictor variables, physical habitat metrics, and water chemistry variables from 75 reference streams and 351 randomly sampled sites throughout Washington State were evaluated for effectiveness at discriminating reference from random sites within level III ecoregions. A combination of multivariate clustering and ordination techniques were used. We describe average observed conditions for a subset of predictor variables as well as proposing statistical criteria for establishing reference conditions for stream habitat in Washington. Using these criteria, we determined whether any of the random sites met expectations for reference condition and whether any of the established reference sites failed to meet expectations for reference condition. Establishing these criteria will set a benchmark from which future data will be compared.

  20. VizieR Online Data Catalog: RR Lyraes in SDSS stripe 82 (Watkins+, 2009)

    NASA Astrophysics Data System (ADS)

    Watkins, L. L.; Evans, N. W.; Belokurov, V.; Smith, M. C.; Hewett, P. C.; Bramich, D. M.; Gilmore, G. F.; Irwin, M. J.; Vidrih, S.; Wyrzykowski, L.; Zucker, D. B.

    2015-10-01

    In this paper, we first select the variable objects in Stripe 82 and then the subset of RR Lyraes, using the Bramich et al. (2008MNRAS.386..887B, Cat. V/141) light-motion curve catalogue (LMCC) and HLC. We describe the selection of the variable objects and the identification of RR Lyrae stars. (2 data files).

  1. Effect of a Surprising Downward Shift in Reinforcer Value on Stimulus Over-Selectivity in a Simultaneous Discrimination Procedure

    ERIC Educational Resources Information Center

    Reynolds, Gemma; Reed, Phil

    2013-01-01

    Stimulus over-selectivity refers to the phenomenon whereby behavior is controlled by a subset of elements in the environment at the expense of other equally salient aspects of the environment. The experiments explored whether this cue interference effect was reduced following a surprising downward shift in reinforcer value. Experiment 1 revealed…

  2. Development and Validation of a Predictive Model to Identify Individuals Likely to Have Undiagnosed Chronic Obstructive Pulmonary Disease Using an Administrative Claims Database.

    PubMed

    Moretz, Chad; Zhou, Yunping; Dhamane, Amol D; Burslem, Kate; Saverno, Kim; Jain, Gagan; Devercelli, Giovanna; Kaila, Shuchita; Ellis, Jeffrey J; Hernandez, Gemzel; Renda, Andrew

    2015-12-01

    Despite the importance of early detection, delayed diagnosis of chronic obstructive pulmonary disease (COPD) is relatively common. Approximately 12 million people in the United States have undiagnosed COPD. Diagnosis of COPD is essential for the timely implementation of interventions, such as smoking cessation programs, drug therapies, and pulmonary rehabilitation, which are aimed at improving outcomes and slowing disease progression. To develop and validate a predictive model to identify patients likely to have undiagnosed COPD using administrative claims data. A predictive model was developed and validated utilizing a retrospective cohort of patients with and without a COPD diagnosis (cases and controls), aged 40-89, with a minimum of 24 months of continuous health plan enrollment (Medicare Advantage Prescription Drug [MAPD] and commercial plans), and identified between January 1, 2009, and December 31, 2012, using Humana's claims database. Stratified random sampling based on plan type (commercial or MAPD) and index year was performed to ensure that cases and controls had a similar distribution of these variables. Cases and controls were compared to identify demographic, clinical, and health care resource utilization (HCRU) characteristics associated with a COPD diagnosis. Stepwise logistic regression (SLR), neural networks, and decision trees were used to develop a series of models. The models were trained, validated, and tested on randomly partitioned subsets of the sample (Training, Validation, and Test data subsets). Measures used to evaluate and compare the models included the area under the curve (AUC) index of the receiver operating characteristic (ROC) curve; sensitivity; specificity; positive predictive value (PPV); and negative predictive value (NPV). The optimal model was selected based on the AUC index on the Test data subset. A total of 50,880 cases and 50,880 controls were included, with MAPD patients comprising 92% of the study population. Compared with controls, cases had a statistically significantly higher comorbidity burden and HCRU (including hospitalizations, emergency room visits, and medical procedures). The optimal predictive model was generated using SLR, which included 34 variables that were statistically significantly associated with a COPD diagnosis. After adjusting for covariates, anticholinergic bronchodilators (OR = 3.336) and tobacco cessation counseling (OR = 2.871) were found to have a large influence on the model. The final predictive model had an AUC of 0.754, sensitivity of 60%, specificity of 78%, PPV of 73%, and an NPV of 66%. This claims-based predictive model provides an acceptable level of accuracy in identifying patients likely to have undiagnosed COPD in a large national health plan. Identification of patients with undiagnosed COPD may enable timely management and lead to improved health outcomes and reduced COPD-related health care expenditures.
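
    A hedged sketch of the general modelling pipeline (stratified partitioning, logistic regression, and the reported evaluation metrics) is shown below on synthetic data. Scikit-learn's plain logistic regression stands in for the stepwise SLR procedure, and the number of predictors, split sizes, and outcome definition are assumptions, not the study's.

    ```python
    # Sketch: stratified train/validation/test split, logistic model, and metrics.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score, confusion_matrix

    rng = np.random.default_rng(0)
    n = 4000
    X = rng.normal(size=(n, 34))                     # 34 candidate claims-derived predictors
    y = (X[:, 0] * 1.2 + X[:, 1] + rng.normal(0, 2, n) > 0).astype(int)   # synthetic COPD label

    # Training / Validation / Test partitions, stratified on the outcome
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, stratify=y, random_state=1)
    X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp,
                                              random_state=1)

    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()

    print(f"AUC {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")
    print(f"sensitivity {tp/(tp+fn):.2f}  specificity {tn/(tn+fp):.2f}")
    print(f"PPV {tp/(tp+fp):.2f}  NPV {tn/(tn+fn):.2f}")
    ```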

  3. Cellular evidence for selfish spermatogonial selection in aged human testes.

    PubMed

    Maher, G J; Goriely, A; Wilkie, A O M

    2014-05-01

    Owing to a recent trend for delayed paternity, the genomic integrity of spermatozoa of older men has become a focus of increased interest. Older fathers are at higher risk for their children to be born with several monogenic conditions collectively termed paternal age effect (PAE) disorders, which include achondroplasia, Apert syndrome and Costello syndrome. These disorders are caused by specific mutations originating almost exclusively from the male germline, in genes encoding components of the tyrosine kinase receptor/RAS/MAPK signalling pathway. These particular mutations, occurring randomly during mitotic divisions of spermatogonial stem cells (SSCs), are predicted to confer a selective/growth advantage on the mutant SSC. This selective advantage leads to a clonal expansion of the mutant cells over time, which generates mutant spermatozoa at levels significantly above the background mutation rate. This phenomenon, termed selfish spermatogonial selection, is likely to occur in all men. In rare cases, probably because of additional mutational events, selfish spermatogonial selection may lead to spermatocytic seminoma. The studies that initially predicted the clonal nature of selfish spermatogonial selection were based on DNA analysis, rather than the visualization of mutant clones in intact testes. In a recent study that aimed to identify these clones directly, we stained serial sections of fixed testes for expression of melanoma antigen family A4 (MAGEA4), a marker of spermatogonia. A subset of seminiferous tubules with an appearance and distribution compatible with the predicted mutant clones were identified. In these tubules, termed 'immunopositive tubules', there is an increased density of spermatogonia positive for markers related to selfish selection (FGFR3) and SSC self-renewal (phosphorylated AKT). Here we detail the properties of the immunopositive tubules and how they relate to the predicted mutant clones, as well as discussing the utility of identifying the potential cellular source of PAE mutations. © 2013 American Society of Andrology and European Academy of Andrology.

  4. Selective Non-contact Field Radiofrequency Extended Treatment Protocol: Evaluation of Safety and Efficacy.

    PubMed

    Moradi, Amir; Palm, Melanie

    2015-09-01

    Currently there are many non-invasive radiofrequency (RF) devices on the market that are utilized in the field of aesthetic medicine. At this time, there is only one FDA cleared device on the market that emits RF energy using a non-contact delivery system for circumferential reduction by means of adipocyte disruption. Innovation of treatment protocols is an integral part of aesthetic device development. However, when protocol modifications are made it is important to look at the safety as well as the potential for improved efficacy before initiating change. The purpose of this study was to evaluate the safety and efficacy of a newly designed extended treatment protocol using an operator independent selective non-contact RF device for the improvement in the contour and circumferential reduction of the abdomen and flanks (love handles). Twenty-five subjects enrolled in the IRB approved multi-center study to receive four weekly 45-minute RF treatments to the abdomen and love handles. Standardized digital photographs and circumference measurements were taken at baseline and at the 1- and 3-month follow-up visits. Biometric measurements including weight, hydration and body fat were obtained at baseline and each study visit. A subset of 4 subjects were randomly selected to undergo baseline serum lipid and liver-related blood tests with follow-up labs taken: 1 day post-treatment 1, 1 day post-treatment 4, and at the 1- and 3-month follow-up visits. Twenty-four subjects (22 female, 2 male), average age of 47.9 years (30-69 years), completed the study. The data of the twenty-four subjects revealed a statistically significant change in circumference P<.001 with an average decrease in circumference of 4.22cm at the 3-month follow-up visit. Lab values for the subset of 4 subjects remained relatively unchanged with only minor fluctuations noted in the serum lipid values in two of the subjects. Three independent evaluators viewed pre-treatment and 3-month post treatment photographs to determine which photo was the after photo. The evaluators were able to correctly identify the post treatment photos with an 88% accuracy rate. Treatments were well tolerated by all subjects. No study related adverse events were reported. This study found that an extended treatment protocol using a selective RF device is a safe and effective method for the reduction of circumference and improved contouring of the abdomen and love handles.

  5. Fourier dimension of random images

    NASA Astrophysics Data System (ADS)

    Ekström, Fredrik

    2016-10-01

    Given a compact set of real numbers, a random C^{m+α}-diffeomorphism is constructed such that the image of any measure concentrated on the set and satisfying a certain condition involving a real number s almost surely has Fourier dimension greater than or equal to s/(m+α). This is used to show that every Borel subset of the real numbers of Hausdorff dimension s is C^{m+α}-equivalent to a set of Fourier dimension greater than or equal to s/(m+α). In particular, every Borel set is diffeomorphic to a Salem set, and the Fourier dimension is not invariant under C^m-diffeomorphisms for any m.

  6. Sparse sampling and reconstruction for electron and scanning probe microscope imaging

    DOEpatents

    Anderson, Hyrum; Helms, Jovana; Wheeler, Jason W.; Larson, Kurt W.; Rohrer, Brandon R.

    2015-07-28

    Systems and methods for conducting electron or scanning probe microscopy are provided herein. In a general embodiment, the systems and methods for conducting electron or scanning probe microscopy with an undersampled data set include: driving an electron beam or probe to scan across a sample and visit a subset of pixel locations of the sample that are randomly or pseudo-randomly designated; determining actual pixel locations on the sample that are visited by the electron beam or probe; and processing data collected by detectors from the visits of the electron beam or probe at the actual pixel locations and recovering a reconstructed image of the sample.
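
    The core idea of the undersampled scan can be illustrated with a short script that draws a pseudo-random subset of pixel locations to visit and records the positions actually reached. The grid size, sampling fraction, positioning noise, and measurements below are all invented for illustration.

    ```python
    # Toy sketch: pseudo-random visit plan over a subset of pixel locations.
    import numpy as np

    rng = np.random.default_rng(seed=42)             # pseudo-random, reproducible scan plan
    h, w, fraction = 256, 256, 0.15                  # image grid and undersampling ratio

    n_visit = int(h * w * fraction)
    flat = rng.choice(h * w, size=n_visit, replace=False)
    planned = np.stack(np.unravel_index(flat, (h, w)), axis=1)   # (row, col) pixel targets

    # In hardware the beam lands with some positioning error; reconstruction
    # should use the *actual* visited locations together with the detector data.
    actual = planned + rng.normal(0, 0.3, planned.shape)
    measurements = rng.random(n_visit)               # stand-in for detector readings

    print(planned.shape, actual[:3])
    ```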

  7. An Efficient Voting Algorithm for Finding Additive Biclusters with Random Background

    PubMed Central

    Xiao, Jing; Wang, Lusheng; Liu, Xiaowen

    2008-01-01

    The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an n × m matrix A (n ≥ m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of A) is optimized. The problem has been proved or conjectured to be NP-hard for various objective functions. In this article, we study a probabilistic model for the implanted additive bicluster problem, where each element in the n × m background matrix is a random integer from [0, L − 1] for some integer L, and a k × k implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L − 1] with probability θ. We propose an O(n²m) time algorithm based on voting to solve the problem. We show that when k ≥ Ω(√(n log n)), the voting algorithm can correctly find the implanted bicluster with probability at least 1 − 9/n². We also implement our algorithm as a C++ program named VOTE. The implementation incorporates several ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with overlapping implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with a high accuracy and speed. PMID:19040364

  8. 3-D inversion of airborne electromagnetic data parallelized and accelerated by local mesh and adaptive soundings

    NASA Astrophysics Data System (ADS)

    Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad

    2014-03-01

    Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures in a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes the rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small with fine cells near the sounding location and coarse cells far away in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells on a global mesh that are far away from the sounding location. Since local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration only uses a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered to be too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution. Any existing modelling technique can be included into our framework of mesh decoupling and adaptive sampling to accelerate large-scale 3-D EM inversions.
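
    The random, dynamic down-sampling idea can be sketched as a loop that draws a fresh sounding subset each iteration, with the subset size tied to the regularization weight. The gradient computation is a placeholder for the per-sounding local-mesh forward modelling, and every value below (counts, cooling rate, size rule) is an assumption for illustration.

    ```python
    # Sketch: per-iteration random subset of soundings, size tied to regularization.
    import numpy as np

    rng = np.random.default_rng(0)
    n_soundings = 10_000
    beta = 1.0                                   # model-regularization weight, cooled over iterations

    def misfit_gradient(subset):
        # placeholder for forward modelling/sensitivities computed on local meshes
        return rng.normal(size=len(subset)).sum()

    for iteration in range(10):
        # stronger regularization -> smoother model -> fewer soundings needed
        n_subset = int(np.clip(n_soundings * 0.02 / beta, 200, n_soundings))
        subset = rng.choice(n_soundings, size=n_subset, replace=False)   # reselected every iteration
        g = misfit_gradient(subset)              # (g would drive the model update here)
        beta *= 0.7                              # cool the regularization
        print(f"iter {iteration}: beta={beta:.3f}, soundings used={n_subset}")
    ```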

  9. Medical students perceive better group learning processes when large classes are made to seem small.

    PubMed

    Hommes, Juliette; Arah, Onyebuchi A; de Grave, Willem; Schuwirth, Lambert W T; Scherpbier, Albert J J A; Bos, Gerard M J

    2014-01-01

    Medical schools struggle with large classes, which might interfere with the effectiveness of learning within small groups due to students being unfamiliar to fellow students. The aim of this study was to assess the effects of making a large class seem small on the students' collaborative learning processes. A randomised controlled intervention study was undertaken to make a large class seem small, without the need to reduce the number of students enrolling in the medical programme. The class was divided into subsets: two small subsets (n=50) as the intervention groups; a control group (n=102) was mixed with the remaining students (the non-randomised group n∼100) to create one large subset. The undergraduate curriculum of the Maastricht Medical School, applying the Problem-Based Learning principles. In this learning context, students learn mainly in tutorial groups, composed randomly from a large class every 6-10 weeks. The formal group learning activities were organised within the subsets. Students from the intervention groups met frequently within the formal groups, in contrast to the students from the large subset who hardly enrolled with the same students in formal activities. Three outcome measures assessed students' group learning processes over time: learning within formally organised small groups, learning with other students in the informal context and perceptions of the intervention. Formal group learning processes were perceived more positive in the intervention groups from the second study year on, with a mean increase of β=0.48. Informal group learning activities occurred almost exclusively within the subsets as defined by the intervention from the first week involved in the medical curriculum (E-I indexes>-0.69). Interviews tapped mainly positive effects and negligible negative side effects of the intervention. Better group learning processes can be achieved in large medical schools by making large classes seem small.

  10. Medical Students Perceive Better Group Learning Processes when Large Classes Are Made to Seem Small

    PubMed Central

    Hommes, Juliette; Arah, Onyebuchi A.; de Grave, Willem; Schuwirth, Lambert W. T.; Scherpbier, Albert J. J. A.; Bos, Gerard M. J.

    2014-01-01

    Objective Medical schools struggle with large classes, which might interfere with the effectiveness of learning within small groups due to students being unfamiliar with fellow students. The aim of this study was to assess the effects of making a large class seem small on the students' collaborative learning processes. Design A randomised controlled intervention study was undertaken to make a large class seem small, without the need to reduce the number of students enrolling in the medical programme. The class was divided into subsets: two small subsets (n = 50) as the intervention groups; a control group (n = 102) was mixed with the remaining students (the non-randomised group n∼100) to create one large subset. Setting The undergraduate curriculum of the Maastricht Medical School, applying the Problem-Based Learning principles. In this learning context, students learn mainly in tutorial groups, composed randomly from a large class every 6–10 weeks. Intervention The formal group learning activities were organised within the subsets. Students from the intervention groups met frequently within the formal groups, in contrast to the students from the large subset, who rarely enrolled in formal activities with the same students. Main Outcome Measures Three outcome measures assessed students' group learning processes over time: learning within formally organised small groups, learning with other students in the informal context and perceptions of the intervention. Results Formal group learning processes were perceived more positively in the intervention groups from the second study year onwards, with a mean increase of β = 0.48. Informal group learning activities occurred almost exclusively within the subsets defined by the intervention from the first week of the medical curriculum (E-I indexes>−0.69). Interviews revealed mainly positive effects and negligible negative side effects of the intervention. Conclusion Better group learning processes can be achieved in large medical schools by making large classes seem small. PMID:24736272

  11. Strategies to Improve Activity Recognition Based on Skeletal Tracking: Applying Restrictions Regarding Body Parts and Similarity Boundaries †

    PubMed Central

    Gutiérrez-López-Franca, Carlos; Hervás, Ramón; Johnson, Esperanza

    2018-01-01

    This paper aims to improve activity recognition systems based on skeletal tracking through the study of two different strategies (and their combination): (a) specialized body-part analysis and (b) stricter restrictions for the most easily detectable activities. The study was performed using the Extended Body-Angles Algorithm, which is able to analyze activities using only a single key sample. This system allows the relevant joints to be selected for each considered activity, making it possible to monitor the user's body through only a subset of the joints. This feature has both advantages and disadvantages; in the past it caused difficulties in recognizing activities for which only a small subset of body joints is relevant. The goal of this work, therefore, is to analyze the effect of applying several strategies to an activity recognition system based on skeletal-tracking devices, with the aim of improving the recognition rates of activities that have only a small subset of relevant joints. Through these results, we aim to give the scientific community initial indications of which strategy performs better. PMID:29789478

  12. A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction

    PubMed Central

    Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2015-01-01

    The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play an important role in BBB penetration. Among these properties, lipophilicity enhances BBB penetration while all the others are negatively correlated with it. PMID:26504797
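
    A minimal sketch of the kind of joint optimization this abstract describes: a genetic algorithm whose chromosome carries a binary feature mask plus two real-valued genes mapped to the SVM's C and gamma, with cross-validated R^2 as the fitness. It uses scikit-learn's SVR; the chromosome layout, parameter ranges, selection scheme and GA settings are illustrative assumptions rather than the authors' configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

def decode(chrom, n_features):
    # First n_features entries are the 0/1 feature mask; the last two genes
    # (floats in [0, 1]) map to log-spaced C and gamma.
    mask = chrom[:n_features].astype(bool)
    C = 10 ** (chrom[n_features] * 4 - 2)          # C in [1e-2, 1e2]
    gamma = 10 ** (chrom[n_features + 1] * 4 - 3)  # gamma in [1e-3, 1e1]
    return mask, C, gamma

def fitness(chrom, X, y):
    mask, C, gamma = decode(chrom, X.shape[1])
    if mask.sum() == 0:
        return -np.inf
    model = SVR(kernel="rbf", C=C, gamma=gamma)
    return cross_val_score(model, X[:, mask], y, cv=5, scoring="r2").mean()

def ga_svm(X, y, pop_size=30, generations=20, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n + 2))
    pop[:, :n] = (pop[:, :n] > 0.5)                    # binary feature-mask part
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
        children = parents.copy()
        cuts = rng.integers(1, n + 1, size=len(children))
        for i, c in enumerate(cuts):                   # one-point crossover with a random mate
            mate = parents[rng.integers(len(parents))]
            children[i, c:] = mate[c:]
        mut = rng.random(children.shape) < p_mut
        children[:, :n] = np.where(mut[:, :n], 1 - children[:, :n], children[:, :n])
        children[:, n:] = np.where(mut[:, n:], rng.random(children[:, n:].shape), children[:, n:])
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return decode(best, n)                             # (feature mask, C, gamma)
```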

  13. Quantitative impact of thymic selection on Foxp3+ and Foxp3- subsets of self-peptide/MHC class II-specific CD4+ T cells.

    PubMed

    Moon, James J; Dash, Pradyot; Oguin, Thomas H; McClaren, Jennifer L; Chu, H Hamlet; Thomas, Paul G; Jenkins, Marc K

    2011-08-30

    It is currently thought that T cells with specificity for self-peptide/MHC (pMHC) ligands are deleted during thymic development, thereby preventing autoimmunity. In the case of CD4(+) T cells, what is unclear is the extent to which self-peptide/MHC class II (pMHCII)-specific T cells are deleted or become Foxp3(+) regulatory T cells. We addressed this issue by characterizing a natural polyclonal pMHCII-specific CD4(+) T-cell population in mice that either lacked or expressed the relevant antigen in a ubiquitous pattern. Mice expressing the antigen contained one-third the number of pMHCII-specific T cells as mice lacking the antigen, and the remaining cells exhibited low TCR avidity. In mice lacking the antigen, the pMHCII-specific T-cell population was dominated by phenotypically naive Foxp3(-) cells, but also contained a subset of Foxp3(+) regulatory cells. Both Foxp3(-) and Foxp3(+) pMHCII-specific T-cell numbers were reduced in mice expressing the antigen, but the Foxp3(+) subset was more resistant to changes in number and TCR repertoire. Therefore, thymic selection of self-pMHCII-specific CD4(+) T cells results in incomplete deletion within the normal polyclonal repertoire, especially among regulatory T cells.

  14. A tool for selecting SNPs for association studies based on observed linkage disequilibrium patterns.

    PubMed

    De La Vega, Francisco M; Isaac, Hadar I; Scafe, Charles R

    2006-01-01

    The design of genetic association studies using single-nucleotide polymorphisms (SNPs) requires the selection of subsets of the variants providing high statistical power at a reasonable cost. SNPs must be selected to maximize the probability that a causative mutation is in linkage disequilibrium (LD) with at least one marker genotyped in the study. The HapMap project performed a genome-wide survey of genetic variation with about a million SNPs typed in four populations, providing a rich resource to inform the design of association studies. A number of strategies have been proposed for the selection of SNPs based on observed LD, including construction of metric LD maps and the selection of haplotype tagging SNPs. Power calculations are important at the study design stage to ensure successful results. Integrating these methods and annotations can be challenging: the algorithms required to implement these methods are complex to deploy, and all the necessary data and annotations are deposited in disparate databases. Here, we present the SNPbrowser Software, a freely available tool to assist in the LD-based selection of markers for association studies. This stand-alone application provides fast query capabilities and swift visualization of SNPs, gene annotations, power, haplotype blocks, and LD map coordinates. Wizards implement several common SNP selection workflows including the selection of optimal subsets of SNPs (e.g. tagging SNPs). Selected SNPs are screened for their conversion potential to either TaqMan SNP Genotyping Assays or the SNPlex Genotyping System, two commercially available genotyping platforms, expediting the set-up of genetic studies with an increased probability of success.

  15. Channel and feature selection in multifunction myoelectric control.

    PubMed

    Khushaba, Rami N; Al-Jumaily, Adel

    2007-01-01

    Real-time control of devices based on myoelectric signals (MES) is a challenging research problem. This paper presents a new approach to reduce the computational cost of real-time systems driven by myoelectric signals (also known as electromyography, EMG). The new approach evaluates the significance of feature/channel selection for MES pattern recognition. Particle Swarm Optimization (PSO), an evolutionary computational technique, is employed to search the feature/channel space for important subsets. These important subsets are then evaluated using a multilayer perceptron trained with back-propagation (BPNN). Practical results are presented from tests on six subjects' datasets of MES signals measured noninvasively using surface electrodes. The results show that minimal error rates can be achieved by considering the correct combination of features/channels, thus providing a feasible system for practical implementation in patient rehabilitation.
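
    The subset search itself can be sketched as a binary PSO over channel/feature masks, with a sigmoid transfer function turning velocities into bit probabilities. The fitness would normally be the cross-validated accuracy of the BPNN on the selected channels; here it is passed in as a callable, and the swarm hyperparameters and toy fitness are assumptions for illustration only.

```python
import numpy as np

def binary_pso(n_bits, fitness, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature/channel masks.  `fitness` maps a 0/1 vector to a
    score to maximise (e.g. cross-validated accuracy of a BPNN on the selected
    channels); the hyperparameters are illustrative defaults, not the paper's."""
    rng = np.random.default_rng(seed)
    pos = (rng.random((n_particles, n_bits)) > 0.5).astype(float)
    vel = rng.normal(0.0, 1.0, (n_particles, n_bits))
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, n_bits))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))                 # sigmoid transfer function
        pos = (rng.random((n_particles, n_bits)) < prob).astype(float)
        vals = np.array([fitness(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest.astype(bool), pbest_val.max()

# Toy usage: the "true" informative subset is the first 4 of 16 channels.
target = np.zeros(16, dtype=bool); target[:4] = True
mask, score = binary_pso(16, lambda m: (m.astype(bool) == target).mean())
```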

  16. Neural networks for vertical microcode compaction

    NASA Astrophysics Data System (ADS)

    Chu, Pong P.

    1992-09-01

    Neural networks provide an alternative way to solve complex optimization problems. Instead of performing a program of instructions sequentially as in a traditional computer, a neural network model explores many competing hypotheses simultaneously using its massively parallel net. The paper shows how to use the neural network approach to perform vertical microcode compaction for a microprogrammed control unit. The compaction procedure includes two basic steps. The first step determines the compatibility classes and the second step selects a minimal subset to cover the control signals. Since the selection process is an NP-complete problem, finding an optimal solution is impractical. In this study, we employ a customized neural network to obtain the minimal subset. We first formalize this problem, then define an `energy function' and map it to a two-layer fully connected neural network. The modified network has two types of neurons and can always obtain a valid solution.
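
    The second step can be written as the minimization of an energy that trades off uncovered control signals against the number of selected compatibility classes. The paper maps such an energy onto a two-layer network; the sketch below minimizes an energy of the same form with simulated annealing instead, purely to illustrate the formulation, and the penalty weights, cooling schedule and toy cover matrix are assumptions.

```python
import numpy as np

def cover_energy(x, cover, A=10.0, B=1.0):
    """Energy for the minimal-cover step: the A term penalises control signals
    left uncovered by the selected compatibility classes, the B term penalises
    the number of classes selected.  cover[i, j] = 1 when class i covers signal j.
    The weights (A >> B) and the schedule below are illustrative choices."""
    covered = cover[x.astype(bool)].sum(axis=0) > 0
    return A * np.count_nonzero(~covered) + B * x.sum()

def anneal_minimal_cover(cover, iters=5000, T0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n_classes = cover.shape[0]
    x = np.ones(n_classes)                     # start with every class selected (a valid cover)
    e = cover_energy(x, cover)
    for k in range(iters):
        T = T0 * (1.0 - k / iters) + 1e-3      # linear cooling
        cand = x.copy()
        flip = rng.integers(n_classes)
        cand[flip] = 1.0 - cand[flip]          # flip one selection "neuron"
        e_cand = cover_energy(cand, cover)
        if e_cand < e or rng.random() < np.exp((e - e_cand) / T):
            x, e = cand, e_cand
    return x.astype(bool), e

# Toy usage: 6 compatibility classes covering 8 control signals.
rng = np.random.default_rng(1)
cover = (rng.random((6, 8)) > 0.5).astype(int)
selected, energy = anneal_minimal_cover(cover)
```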

  17. Surveillance system and method having parameter estimation and operating mode partitioning

    NASA Technical Reports Server (NTRS)

    Bickford, Randall L. (Inventor)

    2003-01-01

    A system and method for monitoring an apparatus or process asset including partitioning an unpartitioned training data set into a plurality of training data subsets each having an operating mode associated thereto; creating a process model comprised of a plurality of process submodels each trained as a function of at least one of the training data subsets; acquiring a current set of observed signal data values from the asset; determining an operating mode of the asset for the current set of observed signal data values; selecting a process submodel from the process model as a function of the determined operating mode of the asset; calculating a current set of estimated signal data values from the selected process submodel for the determined operating mode; and outputting the calculated current set of estimated signal data values for providing asset surveillance and/or control.

  18. Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.

    2004-01-01

    A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.

  19. Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.

    2005-01-01

    A genetic algorithm approach suitable for solving multi-objective problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding Pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the Pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide Pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.

  20. Scalable amplification of strand subsets from chip-synthesized oligonucleotide libraries

    PubMed Central

    Schmidt, Thorsten L.; Beliveau, Brian J.; Uca, Yavuz O.; Theilmann, Mark; Da Cruz, Felipe; Wu, Chao-Ting; Shih, William M.

    2015-01-01

    Synthetic oligonucleotides are the main cost factor for studies in DNA nanotechnology, genetics and synthetic biology, which all require thousands of these at high quality. Inexpensive chip-synthesized oligonucleotide libraries can contain hundreds of thousands of distinct sequences, however only at sub-femtomole quantities per strand. Here we present a selective oligonucleotide amplification method, based on three rounds of rolling-circle amplification, that produces nanomole amounts of single-stranded oligonucleotides per millilitre reaction. In a multistep one-pot procedure, subsets of hundreds or thousands of single-stranded DNAs with different lengths can selectively be amplified and purified together. These oligonucleotides are used to fold several DNA nanostructures and as primary fluorescence in situ hybridization probes. The amplification cost is lower than other reported methods (typically around US$ 20 per nanomole total oligonucleotides produced) and is dominated by the use of commercial enzymes. PMID:26567534

  1. Optimizing an Actuator Array for the Control of Multi-Frequency Noise in Aircraft Interiors

    NASA Technical Reports Server (NTRS)

    Palumbo, D. L.; Padula, S. L.

    1997-01-01

    Techniques developed for selecting an optimized actuator array for interior noise reduction at a single frequency are extended to the multi-frequency case. Transfer functions for 64 actuators were obtained at 5 frequencies from ground testing the rear section of a fully trimmed DC-9 fuselage. A single loudspeaker facing the left side of the aircraft was the primary source. A combinatorial search procedure (tabu search) was employed to find optimum actuator subsets of from 2 to 16 actuators. Noise reduction predictions derived from the transfer functions were used as a basis for evaluating actuator subsets during optimization. Results indicate that it is necessary to constrain actuator forces during optimization. Unconstrained optimizations selected actuators which require unrealistically large forces. Two methods of constraint are evaluated. It is shown that a fast, but approximate, method yields results equivalent to an accurate, but computationally expensive, method.
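
    A sketch of the combinatorial search layer: tabu search over fixed-size actuator subsets using swap moves and a short tabu memory. The `predicted_reduction` callable stands in for the noise-reduction prediction derived from the measured transfer functions (with any force constraint applied inside it); the move rules, tabu length and toy scoring function are illustrative assumptions.

```python
import numpy as np

def tabu_select(n_actuators, k, predicted_reduction, iters=200, tabu_len=10, seed=0):
    """Tabu search for a k-actuator subset.  `predicted_reduction` maps a tuple of
    actuator indices to a predicted noise reduction; this sketch only illustrates
    the swap-move / tabu-list bookkeeping, not the acoustic model itself."""
    rng = np.random.default_rng(seed)
    current = set(rng.choice(n_actuators, size=k, replace=False).tolist())
    best, best_val = set(current), predicted_reduction(tuple(sorted(current)))
    tabu = []                                             # recently swapped actuators
    for _ in range(iters):
        candidates = []
        for out in current:
            for inn in set(range(n_actuators)) - current:
                if out in tabu or inn in tabu:
                    continue
                trial = (current - {out}) | {inn}
                candidates.append((predicted_reduction(tuple(sorted(trial))), trial, out, inn))
        if not candidates:
            break
        val, trial, out, inn = max(candidates, key=lambda c: c[0])
        current = trial
        tabu = (tabu + [out, inn])[-tabu_len:]            # fixed-length tabu memory
        if val > best_val:
            best, best_val = set(trial), val
    return sorted(best), best_val

# Toy usage: pretend the reduction grows with the sum of a fixed "effectiveness" score.
eff = np.random.default_rng(2).random(64)
subset, value = tabu_select(64, 8, lambda idx: eff[list(idx)].sum())
```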

  2. The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases.

    PubMed

    Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M

    2006-04-21

    Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.

  3. From assessment to improvement of elderly care in general practice using decision support to increase adherence to ACOVE quality indicators: study protocol for randomized control trial

    PubMed Central

    2014-01-01

    Background Previous efforts such as Assessing Care of Vulnerable Elders (ACOVE) provide quality indicators for assessing the care of elderly patients, but thus far little has been done to leverage this knowledge to improve care for these patients. We describe a clinical decision support system to improve general practitioner (GP) adherence to ACOVE quality indicators and a protocol for investigating impact on GPs’ adherence to the rules. Design We propose two randomized controlled trials among a group of Dutch GP teams on adherence to ACOVE quality indicators. In both trials a clinical decision support system provides un-intrusive feedback appearing as a color-coded, dynamically updated, list of items needing attention. The first trial pertains to real-time automatically verifiable rules. The second trial concerns non-automatically verifiable rules (adherence cannot be established by the clinical decision support system itself, but the GPs report whether they will adhere to the rules). In both trials we will randomize teams of GPs caring for the same patients into two groups, A and B. For the automatically verifiable rules, group A GPs receive support only for a specific inter-related subset of rules, and group B GPs receive support only for the remainder of the rules. For non-automatically verifiable rules, group A GPs receive feedback framed as actions with positive consequences, and group B GPs receive feedback framed as inaction with negative consequences. GPs indicate whether they adhere to non-automatically verifiable rules. In both trials, the main outcome measure is mean adherence, automatically derived or self-reported, to the rules. Discussion We relied on active end-user involvement in selecting the rules to support, and on a model for providing feedback displayed as color-coded real-time messages concerning the patient visiting the GP at that time, without interrupting the GP’s workflow with pop-ups. While these aspects are believed to increase clinical decision support system acceptance and its impact on adherence to the selected clinical rules, systems with these properties have not yet been evaluated. Trial registration Controlled Trials NTR3566 PMID:24642339

  4. Incidence of tuberculosis among school-going adolescents in South India.

    PubMed

    Uppada, Dharma Rao; Selvam, Sumithra; Jesuraj, Nelson; Lau, Esther L; Doherty, T Mark; Grewal, Harleen M S; Vaz, Mario; Lindtjørn, Bernt

    2016-07-26

    Tuberculosis (TB) incidence data in vaccine target populations, particularly adolescents, are important for designing and powering vaccine clinical trials. Little is known about the incidence of tuberculosis among adolescents in India. The objective of the current study was to estimate the incidence of pulmonary tuberculosis (PTB) disease among adolescents attending school in South India using two different surveillance methods (active and passive) and to compare the incidence between the two groups. The study was a prospective cohort study with a 2-year follow-up period. The study was conducted in Palamaner, Chittoor District of Andhra Pradesh, South India from February 2007 to July 2010. A random sampling procedure was used to select a subset of schools to enable approximately 8000 subjects to be available for randomization in the study. A stratified randomization procedure was used to assign the selected schools to either active or passive surveillance. Participants who met the criteria for being exposed to TB were referred to the diagnostic ward for pulmonary tuberculosis confirmation. A total of 3441 males and 3202 females aged between 11 and less than 18 years were enrolled in the study. Of the 3102 participants in the active surveillance group, four subjects were diagnosed with definite tuberculosis, four subjects with probable tuberculosis, and 71 subjects had non-tuberculous Mycobacteria (NTM) isolated from their sputum. Of the 3541 participants in the passive surveillance group, four subjects were diagnosed with definite tuberculosis, two subjects with probable tuberculosis, and 48 subjects had non-tuberculous Mycobacteria isolated from their sputum. The incidence of definite + probable TB was 147.60/100,000 person-years in the active surveillance group and 87/100,000 person-years in the passive surveillance group. The incidence of pulmonary tuberculosis among adolescents in our study is lower than in similar studies conducted in South Africa and Eastern Uganda, countries with a higher incidence of tuberculosis and human immunodeficiency virus (HIV) than India. The study data will inform sample design for vaccine efficacy trials among adolescents in India.

  5. Identity and Diversity of Human Peripheral Th and T Regulatory Cells Defined by Single-Cell Mass Cytometry.

    PubMed

    Kunicki, Matthew A; Amaya Hernandez, Laura C; Davis, Kara L; Bacchetta, Rosa; Roncarolo, Maria-Grazia

    2018-01-01

    Human CD3+CD4+ Th cells, FOXP3+ T regulatory (Treg) cells, and T regulatory type 1 (Tr1) cells are essential for ensuring peripheral immune response and tolerance, but the diversity of Th, Treg, and Tr1 cell subsets has not been fully characterized. Independent functional characterization of human Th1, Th2, Th17, T follicular helper (Tfh), Treg, and Tr1 cells has helped to define unique surface molecules, transcription factors, and signaling profiles for each subset. However, the adequacy of these markers to recapitulate the whole CD3+CD4+ T cell compartment remains questionable. In this study, we examined CD3+CD4+ T cell populations by single-cell mass cytometry. We characterize the CD3+CD4+ Th, Treg, and Tr1 cell populations simultaneously across 23 memory T cell-associated surface and intracellular molecules. High-dimensional analysis identified several new subsets, in addition to the already defined CD3+CD4+ Th, Treg, and Tr1 cell populations, for a total of 11 Th cell, 4 Treg, and 1 Tr1 cell subsets. Some of these subsets share markers previously thought to be selective for Treg, Th1, Th2, Th17, and Tfh cells, including CD194 (CCR4)+FOXP3+ Treg and CD183 (CXCR3)+T-bet+ Th17 cell subsets. Unsupervised clustering displayed a phenotypic organization of CD3+CD4+ T cells that confirmed their diversity but showed interrelation between the different subsets, including similarity between Th1-Th2-Tfh cell populations and Th17 cells, as well as similarity of Th2 cells with Treg cells. In conclusion, the use of single-cell mass cytometry provides a systems-level characterization of CD3+CD4+ T cells in healthy human blood, which represents an important baseline reference to investigate abnormalities of different subsets in immune-mediated pathologies. Copyright © 2017 by The American Association of Immunologists, Inc.

  6. Human attention filters for single colors.

    PubMed

    Sun, Peng; Chubb, Charles; Wright, Charles E; Sperling, George

    2016-10-25

    The visual images in the eyes contain much more information than the brain can process. An important selection mechanism is feature-based attention (FBA). FBA is best described by attention filters that specify precisely the extent to which items containing attended features are selectively processed and the extent to which items that do not contain the attended features are attenuated. The centroid-judgment paradigm enables quick, precise measurements of such human perceptual attention filters, analogous to transmission measurements of photographic color filters. Subjects use a mouse to locate the centroid (the center of gravity) of a briefly displayed cloud of dots and receive precise feedback. A subset of dots is distinguished by some characteristic, such as a different color, and subjects judge the centroid of only the distinguished subset (e.g., dots of a particular color). The analysis efficiently determines the precise weight in the judged centroid of dots of every color in the display (i.e., the attention filter for the particular attended color in that context). We report 32 attention filters for single colors. Attention filters that discriminate one saturated hue from among seven other equiluminant distractor hues are extraordinarily selective, achieving attended/unattended weight ratios >20:1. Attention filters for selecting a color that differs in saturation or lightness from distractors are much less selective than attention filters for hue (given equal discriminability of the colors), and their filter selectivities are proportional to the discriminability distance of neighboring colors, whereas in the same range hue attention-filter selectivity is virtually independent of discriminability.

  7. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    PubMed

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operating Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset due to their lack of effective means to reduce redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by measuring the distances between misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that classifiers built on the feature subset selected by our approach can achieve the minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
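
    One way to read the complementarity trick is sketched below: for each misclassified instance, its nearest miss (the closest sample of a different class) is found along each single feature, and a feature pair is scored by how far apart that instance and its nearest miss sit on the other feature. Which instances count as misclassified, and the simple averaging used here, are assumptions for illustration; the paper's heuristic search over these scores is not reproduced.

```python
import numpy as np

def complementarity_matrix(X, y, misclassified):
    """Pairwise feature complementarity in the spirit of the abstract: for each
    misclassified instance, find its nearest miss along each single feature, then
    score feature pairs by the distance between the instance and that nearest miss
    on the *other* feature.  The choice of misclassified instances and the
    averaging are assumptions made only for this sketch."""
    n, d = X.shape
    comp = np.zeros((d, d))
    for i in misclassified:
        others = np.flatnonzero(y != y[i])             # candidate "misses"
        for f in range(d):
            nm = others[np.argmin(np.abs(X[others, f] - X[i, f]))]   # nearest miss on f
            for g in range(d):
                if g != f:
                    comp[f, g] += abs(X[i, g] - X[nm, g])
    return comp / max(len(misclassified), 1)

# Toy usage: two jointly informative features plus one noise feature.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = ((X[:, 0] + X[:, 1]) > 0).astype(int)
comp = complementarity_matrix(X, y, misclassified=np.arange(20))
```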

  8. Serum Uromodulin: A Biomarker of Long-Term Kidney Allograft Failure.

    PubMed

    Bostom, Andrew; Steubl, Dominik; Garimella, Pranav S; Franceschini, Nora; Roberts, Mary B; Pasch, Andreas; Ix, Joachim H; Tuttle, Katherine R; Ivanova, Anastasia; Shireman, Theresa; Kim, S Joseph; Gohh, Reginald; Weiner, Daniel E; Levey, Andrew S; Hsu, Chi-Yuan; Kusek, John W; Eaton, Charles B

    2018-01-01

    Uromodulin is a kidney-derived glycoprotein and putative tubular function index. Lower serum uromodulin was recently associated with increased risk for kidney allograft failure in a preliminary, longitudinal single-center European study involving 91 kidney transplant recipients (KTRs). The Folic Acid for Vascular Outcome Reduction in Transplantation (FAVORIT) trial is a completed, large, multiethnic controlled clinical trial cohort, which studied chronic, stable KTRs. We conducted a case cohort analysis using a randomly selected subset of patients (random subcohort, n = 433), and all individuals who developed kidney allograft failure (cases, n = 226) during follow-up. Serum uromodulin was determined in this total of n = 613 FAVORIT trial participants at randomization. Death-censored kidney allograft failure was the study outcome. The 226 kidney allograft failures occurred during a median surveillance of 3.2 years. Unadjusted, weighted Cox proportional hazards modeling revealed that lower serum uromodulin, tertile 1 vs. tertile 3, was associated with a threefold greater risk for kidney allograft failure (hazard ratio [HR] 3.20, 95% CI 2.05-5.01). This association was attenuated but persisted at a twofold greater risk for allograft failure after adjustment for age, sex, smoking, allograft type and vintage, prevalent diabetes mellitus and cardiovascular disease (CVD), total/high-density lipoprotein cholesterol ratio, systolic blood pressure, estimated glomerular filtration rate, and natural log urinary albumin/creatinine: HR 2.00, 95% CI 1.06-3.77. Lower serum uromodulin, a possible indicator of less well-preserved renal tubular function, remained associated with greater risk for kidney allograft failure, after adjustment for major, established clinical kidney allograft failure and CVD risk factors, in a large, multiethnic cohort of long-term, stable KTRs. © 2018 S. Karger AG, Basel.

  9. Selection of Polynomial Chaos Bases via Bayesian Model Uncertainty Methods with Applications to Sparse Approximation of PDEs with Stochastic Inputs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karagiannis, Georgios; Lin, Guang

    2014-02-15

    Generalized polynomial chaos (gPC) expansions allow the representation of the solution of a stochastic system as a series of polynomial terms. The number of gPC terms increases dramatically with the dimension of the random input variables. When the number of gPC terms is larger than that of the available samples, a scenario that often occurs if the evaluations of the system are expensive, the evaluation of the gPC expansion can be inaccurate due to over-fitting. We propose a fully Bayesian approach that allows for global recovery of the stochastic solution, in both the spatial and random domains, by coupling Bayesian model uncertainty and regularization regression methods. It allows the evaluation of the PC coefficients on a grid of spatial points via (1) Bayesian model averaging or (2) the median probability model, and their construction as functions on the spatial domain via spline interpolation. The former accounts for model uncertainty and provides Bayes-optimal predictions, while the latter additionally provides a sparse representation of the solution by evaluating the expansion on a subset of dominating gPC bases. Moreover, the method quantifies the importance of the gPC bases through inclusion probabilities. We design an MCMC sampler that evaluates all the unknown quantities without the need for ad-hoc techniques. The proposed method is suitable for, but not restricted to, problems whose stochastic solution is sparse at the stochastic level with respect to the gPC bases while the deterministic solver involved is expensive. We demonstrate the good performance of the proposed method and make comparisons with others on elliptic stochastic partial differential equations with 1D, 14D and 40D random spaces.

  10. Optimal auxiliary-covariate-based two-phase sampling design for semiparametric efficient estimation of a mean or mean difference, with application to clinical trials.

    PubMed

    Gilbert, Peter B; Yu, Xuesong; Rotnitzky, Andrea

    2014-03-15

    To address the objective in a clinical trial to estimate the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semiparametric efficient estimator is applied. This approach is made efficient by specifying the phase two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. We perform simulations to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. We provide proofs and R code. The optimality results are developed to design an HIV vaccine trial, with objective to compare the mean 'importance-weighted' breadth (Y) of the T-cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere between absent and large efficiency gain (up to 24 % in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y | W] is important for realizing the efficiency gain, which is aided by an ample phase two sample and by using a robust fitting method. Copyright © 2013 John Wiley & Sons, Ltd.
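
    The optimal phase-two selection probabilities have a familiar Neyman-allocation flavour: sample where Y is most variable given W and where measurement is cheap. The sketch below computes probabilities proportional to SD(Y | W)/sqrt(cost), scaled to a budget and capped at one; this is the textbook form of the idea, not the authors' exact estimator or notation.

```python
import numpy as np

def phase_two_probabilities(sd_y_given_w, cost, budget):
    """Neyman-style allocation for the phase-two subsample: sampling probability
    proportional to SD(Y | W) / sqrt(cost), scaled so the expected phase-two cost
    equals the budget and capped at 1.  An illustrative sketch, not the paper's
    optimality formula."""
    raw = sd_y_given_w / np.sqrt(cost)
    pi = np.zeros_like(raw)
    capped = np.zeros_like(raw, dtype=bool)
    for _ in range(len(raw)):                  # at most n rounds of capping at 1
        if capped.all():
            break
        scale = (budget - cost[capped].sum()) / (raw[~capped] * cost[~capped]).sum()
        pi[~capped] = scale * raw[~capped]
        newly = (pi > 1.0) & ~capped
        if not newly.any():
            break
        pi[newly] = 1.0
        capped |= newly
    return np.clip(pi, 0.0, 1.0)

# Toy usage: three subjects (or strata of W) with different conditional SDs and costs.
sd = np.array([1.0, 2.0, 4.0])
cost = np.array([10.0, 10.0, 40.0])
pi = phase_two_probabilities(sd, cost, budget=30.0)   # expected spend equals the budget
```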

  11. Optimal Auxiliary-Covariate Based Two-Phase Sampling Design for Semiparametric Efficient Estimation of a Mean or Mean Difference, with Application to Clinical Trials

    PubMed Central

    Gilbert, Peter B.; Yu, Xuesong; Rotnitzky, Andrea

    2014-01-01

    To address the objective in a clinical trial to estimate the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semi-parametric efficient estimator is applied. This approach is made efficient by specifying the phase-two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. Simulations are performed to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. Proofs and R code are provided. The optimality results are developed to design an HIV vaccine trial, with objective to compare the mean “importance-weighted” breadth (Y) of the T cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y, and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere between absent and large efficiency gain (up to 24% in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y∣W] is important for realizing the efficiency gain, which is aided by an ample phase-two sample and by using a robust fitting method. PMID:24123289

  12. C2 Nerve Field Stimulation for the Treatment of Fibromyalgia: A Prospective, Double-blind, Randomized, Controlled Cross-over Study.

    PubMed

    Plazier, Mark; Ost, Jan; Stassijns, Gaëtane; De Ridder, Dirk; Vanneste, Sven

    2015-01-01

    Fibromyalgia is a condition characterized by widespread chronic pain. Due to its high prevalence and high costs, it places a substantial burden on society. Treatment results are diverse and help only a small subset of patients. C2 nerve field stimulation, also known as occipital nerve stimulation, is a helpful, minimally invasive treatment for primary headache syndromes, and small pilot studies suggest it is beneficial in fibromyalgia. Forty patients were implanted with a subcutaneous electrode in the C2 dermatome as part of a prospective, double-blind, randomized, controlled cross-over study followed by an open-label follow-up period of 6 months. The patients underwent 2-week periods of different doses of stimulation consisting of minimal (.1 mA), subthreshold, and suprathreshold (for paresthesias) stimulation in a randomized order. Twenty-seven patients received a permanent implant and 25 completed the 6-month open-label follow-up period. During the 6-week trial phase of the study, patients had an overall decrease of 36% on the fibromyalgia impact questionnaire (FIQ), a 33% decrease in fibromyalgia pain and a 42% improvement in the impact on daily life activities and quality. These results imply an overall improvement in the disease burden, maintained at 6 months of follow-up, as well as a 50% improvement in quality of life. Seventy-six percent of patients were satisfied or very satisfied with their treatment. There seems to be a dose-response curve, with increasing amplitudes leading to better clinical outcomes. Subcutaneous C2 nerve field stimulation seems to offer a safe and effective treatment option for selected medically intractable patients with fibromyalgia. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Gene selection heuristic algorithm for nutrigenomics studies.

    PubMed

    Valour, D; Hue, I; Grimard, B; Valour, B

    2013-07-15

    Large datasets from -omics studies need to be deeply investigated. The aim of this paper is to provide a new method (the LEM method) for the search of transcriptome and metabolome connections. The heuristic algorithm described here extends classical canonical correlation analysis (CCA) to a high number of variables (without regularization) and combines well-conditioning and fast computation in R. Reduced CCA models are summarized in PageRank matrices, the product of which gives a stochastic matrix that summarizes the self-avoiding walk covered by the algorithm. Then, a homogeneous Markov process applied to this stochastic matrix converges to the probabilities of interconnection between genes, providing a selection of disjoint subsets of genes. This is an alternative to regularized generalized CCA for the determination of blocks within the structure matrix. Each gene subset is thus linked to the whole metabolic or clinical dataset that represents the biological phenotype of interest. Moreover, this selection process meets the needs of biologists, who often require small sets of genes for further validation or extended phenotyping. The algorithm is shown to work efficiently on three published datasets, resulting in meaningfully broadened gene networks.

  14. An Adaptive Genetic Association Test Using Double Kernel Machines.

    PubMed

    Zhan, Xiang; Epstein, Michael P; Ghosh, Debashis

    2015-10-01

    Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study.

  15. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  16. Interhemispheric Effective and Functional Cortical Connectivity Signatures of Spina Bifida Are Consistent with Callosal Anomaly

    PubMed Central

    Malekpour, Sheida; Li, Zhimin; Cheung, Bing Leung Patrick; Castillo, Eduardo M.; Papanicolaou, Andrew C.; Kramer, Larry A.; Fletcher, Jack M.

    2012-01-01

    The impact of the posterior callosal anomalies associated with spina bifida on interhemispheric cortical connectivity is studied using a method for estimating cortical multivariable autoregressive models from scalp magnetoencephalography data. Interhemispheric effective and functional connectivity, measured using conditional Granger causality and coherence, respectively, is determined for the anterior and posterior cortical regions in a population of five spina bifida and five control subjects during a resting eyes-closed state. The estimated connectivity is shown to be consistent over the randomly selected subsets of the data for each subject. The posterior interhemispheric effective and functional connectivity and cortical power are significantly lower in the spina bifida group, a result that is consistent with posterior callosal anomalies. The anterior interhemispheric effective and functional connectivity are elevated in the spina bifida group, a result that may reflect compensatory mechanisms. In contrast, the intrahemispheric effective connectivity is comparable in the two groups. The differences between the spina bifida and control groups are most significant in the θ and α bands. PMID:22571349

  17. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to work on the Support Vector Machine (SVM) or Least Squares SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established based on a rigorous proof of universal learning involving a reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear function accurately under the condition of support vector sufficiency. Experimental results on a wide variety of real-world small and large instance size applications in the context of binary classification, multi-class problems and regression are then reported to show that RKELM can perform at a level of generalization performance competitive with the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.
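
    The core of this approach is compact enough to sketch: pick a random subset of the training samples as kernel mapping points, build the reduced kernel matrix, and solve one regularized least-squares problem for the output weights. The RBF kernel, subset size and regularization constant below are illustrative choices, and the class is a sketch of the idea rather than the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class RKELMSketch:
    """Reduced kernel ELM in the spirit of the abstract: a random subset of the
    training samples serves as the kernel 'support' points, and the output
    weights come from a single regularized least-squares solve (no iteration)."""
    def __init__(self, n_support=50, C=1.0, gamma=1.0, seed=0):
        self.n_support, self.C, self.gamma = n_support, C, gamma
        self.rng = np.random.default_rng(seed)

    def fit(self, X, T):
        idx = self.rng.choice(len(X), size=min(self.n_support, len(X)), replace=False)
        self.support = X[idx]                                # randomly selected mapping samples
        K = rbf_kernel(X, self.support, self.gamma)           # n x n_support
        reg = np.eye(K.shape[1]) / self.C
        self.beta = np.linalg.solve(K.T @ K + reg, K.T @ T)   # closed-form output weights
        return self

    def predict(self, X):
        return rbf_kernel(X, self.support, self.gamma) @ self.beta

# Toy regression usage.
rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
model = RKELMSketch(n_support=40, C=10.0, gamma=0.5).fit(X, y)
pred = model.predict(X)
```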

  18. Fundamental Movement Skills among Iranian Primary School Children.

    PubMed

    Aalizadeh, Bahman; Mohamadzadeh, Hassan; Hosseini, Fatemeh Sadat

    2014-12-01

    To determine the relationship between anthropometric indicators, physical activity (PA) and socioeconomic status (SES) and fundamental movement skills (FMS) among Iranian male students. In this descriptive study, 241 students (7-10 years) were randomly selected based on SES scores and classified into high, medium and low groups. All children were assessed on eight anthropometric measures. To examine a subset of manipulative skills and to measure physical activity and socioeconomic status, the Test of Gross Motor Development (TGMD2) and interviewer-administered questionnaires were used, respectively. The data were analyzed using Pearson correlation and multiple regression. There was a significant positive correlation between SES and body mass index (BMI), while a significant negative correlation existed between PA and BMI. Object control skills were significantly correlated with height, foot length, forearm length, hand length and physical activity. Students with low socioeconomic status were more proficient in these movements than students with medium and high socioeconomic status. Therefore, parents need to encourage students to be more active in order to prevent obesity and to facilitate the development of object control skills in high socioeconomic status groups.

  19. Sparse imaging for fast electron microscopy

    NASA Astrophysics Data System (ADS)

    Anderson, Hyrum S.; Ilic-Helms, Jovana; Rohrer, Brandon; Wheeler, Jason; Larson, Kurt

    2013-02-01

    Scanning electron microscopes (SEMs) are used in neuroscience and materials science to image centimeters of sample area at nanometer scales. Since imaging rates are in large part SNR-limited, large collections can lead to weeks of around-the-clock imaging time. To increase data collection speed, we propose and demonstrate on an operational SEM a fast method to sparsely sample and reconstruct smooth images. To accurately localize the electron probe position at fast scan rates, we model the dynamics of the scan coils, and use the model to rapidly and accurately visit a randomly selected subset of pixel locations. Images are reconstructed from the undersampled data by compressed sensing inversion using image smoothness as a prior. We report image fidelity as a function of acquisition speed by comparing traditional raster to sparse imaging modes. Our approach is equally applicable to other domains of nanometer microscopy in which the time to position a probe is a limiting factor (e.g., atomic force microscopy), or in which excessive electron doses might otherwise alter the sample being observed (e.g., scanning transmission electron microscopy).
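
    A toy version of the reconstruction step, assuming the probe positions are already known: recover an image from a random subset of pixels by least squares with a finite-difference smoothness penalty. This Tikhonov-style smoothness prior is a stand-in for the compressed-sensing inversion described in the abstract, and the image size, sampling fraction and penalty weight are illustrative.

```python
import numpy as np

def smooth_reconstruct(shape, sample_idx, sample_vals, lam=2.0):
    """Recover a smooth image from a sparse random subset of pixels by solving
    min ||x[sampled] - values||^2 + lam * ||grad x||^2 with dense least squares.
    Kept tiny so it runs with plain NumPy; a sketch, not the paper's solver."""
    h, w = shape
    n = h * w
    D = []                                   # horizontal and vertical finite differences
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                row = np.zeros(n); row[i], row[i + 1] = -1.0, 1.0; D.append(row)
            if r + 1 < h:
                row = np.zeros(n); row[i], row[i + w] = -1.0, 1.0; D.append(row)
    S = np.zeros((len(sample_idx), n))
    S[np.arange(len(sample_idx)), sample_idx] = 1.0          # sampling operator
    A = np.vstack([S, np.sqrt(lam) * np.array(D)])
    b = np.concatenate([sample_vals, np.zeros(len(D))])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape(shape)

# Toy usage: sample 20% of a smooth 24x24 image at random and reconstruct it.
h, w = 24, 24
yy, xx = np.mgrid[0:h, 0:w]
truth = np.sin(yy / 5.0) + np.cos(xx / 7.0)
rng = np.random.default_rng(5)
idx = rng.choice(h * w, size=int(0.2 * h * w), replace=False)
recon = smooth_reconstruct((h, w), idx, truth.ravel()[idx])
```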

  20. Risk assessment of nonylphenol and its ethoxylates in U.S. river water and sediment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weeks, J.A.; Adams, W.J.; Guiney, P.D.

    1994-12-31

    A comprehensive program addressing the risks of nonylphenol (NP) and its ethoxylates (NPE) in aquatic environments of the United States has been undertaken by the Alkyl Phenol Ethoxylates Panel of the Chemical Manufacturers Association cooperating with EPA. Several hundred million pounds of NPE surfactants are used in the US each year. Nonylphenol can be an intermediate product of degradation of nonylphenol ethoxylates. A survey of those river reaches most likely to contain NPE and NP residues was conducted based on a random sample of a subset of the EPA River Reach File defined by certain selection criteria. Applying enhanced analyticalmore » techniques, little or no NP and NPE were found in river water at most locations, while low levels were usually detected in sediment. Acute and chronic toxicity tests using a variety of organisms have also been completed. New results are presented for shrimp, fish, tadpoles, midges, and algae. The risk of NP to the aquatic environment is examined by comparison of observed levels with toxicity benchmarks, and by application of equilibrium partitioning theory to calculate sediment interstitial chemical concentrations.« less

  1. Accessing Information in Working Memory: Can the Focus of Attention Grasp Two Elements at the Same Time?

    ERIC Educational Resources Information Center

    Oberauer, Klaus; Bialkova, Svetlana

    2009-01-01

    Processing information in working memory requires selective access to a subset of working-memory contents by a focus of attention. Complex cognition often requires joint access to 2 items in working memory. How does the focus select 2 items? Two experiments with an arithmetic task and 1 with a spatial task investigate time demands for successive…

  2. The XC chemokine receptor 1 is a conserved selective marker of mammalian cells homologous to mouse CD8α+ dendritic cells

    PubMed Central

    Crozat, Karine; Guiton, Rachel; Contreras, Vanessa; Feuillet, Vincent; Dutertre, Charles-Antoine; Ventre, Erwan; Vu Manh, Thien-Phong; Baranek, Thomas; Storset, Anne K.; Marvel, Jacqueline; Boudinot, Pierre; Hosmalin, Anne; Schwartz-Cornil, Isabelle

    2010-01-01

    Human BDCA3+ dendritic cells (DCs) were suggested to be homologous to mouse CD8α+ DCs. We demonstrate that human BDCA3+ DCs are more efficient than their BDCA1+ counterparts or plasmacytoid DCs (pDCs) in cross-presenting antigen and activating CD8+ T cells, which is similar to mouse CD8α+ DCs as compared with CD11b+ DCs or pDCs, although with more moderate differences between human DC subsets. Yet, no specific marker was known to be shared between homologous DC subsets across species. We found that XC chemokine receptor 1 (XCR1) is specifically expressed and active in mouse CD8α+, human BDCA3+, and sheep CD26+ DCs and is conserved across species. The mRNA encoding the XCR1 ligand chemokine (C motif) ligand 1 (XCL1) is selectively expressed in natural killer (NK) and CD8+ T lymphocytes at steady-state and is enhanced upon activation. Moreover, the Xcl1 mRNA is selectively expressed at high levels in central memory compared with naive CD8+ T lymphocytes. Finally, XCR1−/− mice have decreased early CD8+ T cell responses to Listeria monocytogenes infection, which is associated with higher bacterial loads early in infection. Therefore, XCR1 constitutes the first conserved specific marker for cell subsets homologous to mouse CD8α+ DCs in higher vertebrates and promotes their ability to activate early CD8+ T cell defenses against an intracellular pathogenic bacteria. PMID:20479118

  3. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

    PubMed Central

    Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor

    2015-01-01

    In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757
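
    The subset-selection layer of this paradigm can be illustrated with scikit-learn: candidate base classifiers feed a stacked meta-classifier, and a search keeps only the base classifiers that improve cross-validated accuracy. The paper selects the subset with an estimation of distribution algorithm; the greedy forward search below is a simple stand-in, and the dataset, base learners and meta-classifier are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
base = {
    "knn": KNeighborsClassifier(),
    "nb": GaussianNB(),
    "svc": SVC(probability=True),
    "tree": DecisionTreeClassifier(random_state=0),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
}

def stacking_score(names):
    # Cross-validated accuracy of a stacked classifier built on the named subset.
    clf = StackingClassifier(estimators=[(n, base[n]) for n in names],
                             final_estimator=LogisticRegression(max_iter=1000), cv=5)
    return cross_val_score(clf, X, y, cv=5).mean()

# Greedy forward selection of the base-classifier subset (a simple stand-in for
# the estimation of distribution algorithm used in the paper).
selected, remaining, best = [], set(base), -np.inf
while remaining:
    cand, val = max(((n, stacking_score(selected + [n])) for n in remaining),
                    key=lambda t: t[1])
    if val <= best:
        break
    selected.append(cand); remaining.remove(cand); best = val
print(selected, best)
```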

  4. Peritoneal Dialysate Glucose Load and Systemic Glucose Metabolism in Non-Diabetics: Results from the GLOBAL Fluid Cohort Study

    PubMed Central

    Chess, James; Do, Jun-Young; Noh, Hyunjin; Lee, Hi-Bahl; Kim, Yong-Lim; Summers, Angela; Williams, Paul Ford; Davison, Sara; Dorval, Marc

    2016-01-01

    Background and Objectives Glucose control is a significant predictor of mortality in diabetic peritoneal dialysis (PD) patients. During PD, the local toxic effects of intra-peritoneal glucose are well recognized, but despite large amounts of glucose being absorbed, the systemic effects of this in non-diabetic patients are not clear. We sought to clarify whether dialysate glucose has an effect upon systemic glucose metabolism. Methods and Materials We analysed the Global Fluid Study cohort, a prospective, observational cohort study initiated in 2002. A subset of 10 centres from 3 countries with high data quality were selected (368 incident and 272 prevalent non-diabetic patients), with multilevel, multivariable analysis of the reciprocal of random glucose levels, and a stratified-by-centre Cox survival analysis. Results The median follow up was 5.6 and 6.4 years respectively in incident and prevalent patients. On multivariate analysis, serum glucose increased with age (β = -0.007, 95%CI -0.010, -0.004) and decreased with higher serum sodium (β = 0.002, 95%CI 0.0005, 0.003) in incident patients and increased with dialysate glucose (β = -0.0002, 95%CI -0.0004, -0.00006) in prevalent patients. Levels suggested undiagnosed diabetes in 5.4% of prevalent patients. Glucose levels predicted death in unadjusted analyses of both incident and prevalent groups but in an adjusted survival analysis they did not (for random glucose 6–10 compared with <6, Incident group HR 0.92, 95%CI 0.58, 1.46, Prevalent group HR 1.42, 95%CI 0.86, 2.34). Conclusions In prevalent non-diabetic patients, random glucose levels at a diabetic level are under-recognised and increase with dialysate glucose load. Random glucose levels predict mortality in unadjusted analyses, but this association has not been proven in adjusted analyses. PMID:27249020

  5. Pathways to Disease: The Biological Consequences of Social Adversity on Asthma in Minority Youth

    DTIC Science & Technology

    2016-10-01

    the microbiome, and telomere length and relate these biomarkers to the measured exposures to adversity and stress. The selection of and methods to...granted approval from the HRPO at the end of December 2015. After selecting a subset of our study population for evaluation, we experienced a second delay...of almost three months in setting up our account for Clinical lab testing. Since we were able to prepare the selected samples for biomarker testing

  6. Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos

    2017-04-13

    Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition - factors that influence pain perception. We selected 52 adult subjects for the study, divided into three groups: control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance imaging with diffusion tensor to assess white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gathered data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce the features of the first dataset, and into classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to classify the migraine group. Moreover, we implemented a committee method to improve the classification accuracy based on feature selection algorithms. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% with the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, and from 93 to 94% with boosting. The features determined to be most useful for classification were related to pain, analgesics and the left uncinate region of the brain (connected with pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.
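
    A hedged sketch of a committee of feature selectors of the kind described above (the dataset, selector settings and voting threshold are illustrative assumptions, not the study's pipeline): each selector votes for a feature mask, and only features chosen by a majority are passed to the classifier.

      # Feature-selection committee: combine several selectors by voting.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
      from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB

      X, y = make_classification(n_samples=200, n_features=60, n_informative=8,
                                 random_state=1)

      selectors = [
          SelectFromModel(GradientBoostingClassifier(random_state=1)),            # gradient tree boosting
          SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear")),  # L1-based
          SelectFromModel(RandomForestClassifier(random_state=1)),                # random forest
          SelectKBest(f_classif, k=15),                                           # univariate
      ]

      votes = np.zeros(X.shape[1])
      for sel in selectors:
          votes += sel.fit(X, y).get_support()         # one boolean vote per selector

      committee_mask = votes >= 2                      # keep features with at least two votes
      acc = cross_val_score(GaussianNB(), X[:, committee_mask], y, cv=5).mean()
      print(f"{int(committee_mask.sum())} features kept, CV accuracy = {acc:.2f}")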

  7. Upper bounds on sequential decoding performance parameters

    NASA Technical Reports Server (NTRS)

    Jelinek, F.

    1974-01-01

    This paper presents the best obtainable random coding and expurgated upper bounds on the probabilities of undetectable error, of t-order failure (advance to depth t into an incorrect subset), and of likelihood rise in the incorrect subset, applicable to sequential decoding when the metric bias G is arbitrary. Upper bounds on the Pareto exponent are also presented. The G-values optimizing each of the parameters of interest are determined, and are shown to lie in intervals that in general have nonzero widths. The G-optimal expurgated bound on undetectable error is shown to agree with that for maximum likelihood decoding of convolutional codes, and that on failure agrees with the block code expurgated bound. Included are curves evaluating the bounds for interesting choices of G and SNR for a binary-input quantized-output Gaussian additive noise channel.

  8. CD127 and CD25 Expression Defines CD4+ T Cell Subsets That Are Differentially Depleted during HIV Infection1

    PubMed Central

    Dunham, Richard M.; Cervasi, Barbara; Brenchley, Jason M.; Albrecht, Helmut; Weintrob, Amy; Sumpter, Beth; Engram, Jessica; Gordon, Shari; Klatt, Nichole R.; Frank, Ian; Sodora, Donald L.; Douek, Daniel C.; Paiardini, Mirko; Silvestri, Guido

    2009-01-01

    Decreased CD4+ T cell counts are the best marker of disease progression during HIV infection. However, CD4+ T cells are heterogeneous in phenotype and function, and it is unknown how preferential depletion of specific CD4+ T cell subsets influences disease severity. CD4+ T cells can be classified into three subsets by the expression of receptors for two T cell-tropic cytokines, IL-2 (CD25) and IL-7 (CD127). The CD127+CD25low/− subset includes IL-2-producing naive and central memory T cells; the CD127−CD25− subset includes mainly effector T cells expressing perforin and IFN-γ; and the CD127lowCD25high subset includes FoxP3-expressing regulatory T cells. Herein we investigated how the proportions of these T cell subsets are changed during HIV infection. When compared with healthy controls, HIV-infected patients show a relative increase in CD4+CD127−CD25− T cells that is related to an absolute decline of CD4+CD127+CD25low/− T cells. Interestingly, this expansion of CD4+CD127− T cells was not observed in naturally SIV-infected sooty mangabeys. The relative expansion of CD4+CD127−CD25− T cells correlated directly with the levels of total CD4+ T cell depletion and immune activation. CD4+CD127−CD25− T cells were not selectively resistant to HIV infection as levels of cell-associated virus were similar in all non-naive CD4+ T cell subsets. These data indicate that, during HIV infection, specific changes in the fraction of CD4+ T cells expressing CD25 and/or CD127 are associated with disease progression. Further studies will determine whether monitoring the three subsets of CD4+ T cells defined based on the expression of CD25 and CD127 should be used in the clinical management of HIV-infected individuals. PMID:18390743

  9. Optimal design of compact and connected nature reserves for multiple species.

    PubMed

    Wang, Yicheng; Önal, Hayri

    2016-04-01

    When designing a conservation reserve system for multiple species, spatial attributes of the reserves must be taken into account at species level. The existing optimal reserve design literature considers either a single spatial attribute or, when multiple attributes are considered, restricts the analysis to a single species. We built a linear integer programming model that incorporates compactness and connectivity of the landscape reserved for multiple species. The model identifies multiple reserves that each serve a subset of target species with a specified coverage probability threshold to ensure the species' long-term survival in the reserve, and each target species is covered (protected) with another probability threshold at the reserve system level. We modeled compactness by minimizing the total distance between selected sites and central sites, and we modeled connectivity of a selected site to its designated central site by selecting at least one of its adjacent sites that has a nearer distance to the central site. We considered structural distance and functional distances that incorporated site quality between sites. We tested the model using randomly generated data on 2 species, one ground species that required structural connectivity and the other an avian species that required functional connectivity. We applied the model to 10 bird species listed as endangered by the state of Illinois (U.S.A.). Spatial coherence and selection cost of the reserves differed substantially depending on the weights assigned to these 2 criteria. The model can be used to design a reserve system for multiple species, especially species whose habitats are far apart in which case multiple disjunct but compact and connected reserves are advantageous. The model can be modified to increase or decrease the distance between reserves to reduce or promote population connectivity. © 2015 Society for Conservation Biology.
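
    For illustration only, the compactness and species-coverage pieces of such a formulation can be written as a small integer program (the connectivity constraints and the reserve-system-level thresholds are omitted, the data are random toy values, and the PuLP library is an assumed dependency):

      # Toy compact-reserve selection: minimize total distance of selected sites to a
      # fixed central site, subject to a coverage-probability threshold per species
      # (linearized with logarithms).
      import math
      import random
      import pulp

      random.seed(0)
      n_sites, n_species, alpha = 25, 2, 0.9
      coords = [(random.random(), random.random()) for _ in range(n_sites)]
      p = [[random.uniform(0.05, 0.4) for _ in range(n_sites)] for _ in range(n_species)]
      centre = (0.5, 0.5)
      dist = [math.dist(c, centre) for c in coords]

      prob = pulp.LpProblem("compact_reserve", pulp.LpMinimize)
      x = pulp.LpVariable.dicts("x", range(n_sites), cat="Binary")

      # Compactness: total distance of selected sites to the central site.
      prob += pulp.lpSum(dist[i] * x[i] for i in range(n_sites))

      # Coverage: P(species s occurs in the reserve) >= alpha, i.e.
      # sum_i x_i * log(1 - p_si) <= log(1 - alpha).
      for s in range(n_species):
          prob += (pulp.lpSum(math.log(1 - p[s][i]) * x[i] for i in range(n_sites))
                   <= math.log(1 - alpha))

      prob.solve(pulp.PULP_CBC_CMD(msg=False))
      print("selected sites:", [i for i in range(n_sites) if x[i].value() == 1])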

  10. Compositional pressure and translational selection determine codon usage in the extremely GC-poor unicellular eukaryote Entamoeba histolytica.

    PubMed

    Romero, H; Zavala, A; Musto, H

    2000-01-25

    It is widely accepted that the compositional pressure is the only factor shaping codon usage in unicellular species displaying extremely biased genomic compositions. This seems to be the case in the prokaryotes Mycoplasma capricolum, Rickettsia prowazekii and Borrelia burgdorferi (GC-poor), and in Micrococcus luteus (GC-rich). However, in the GC-poor unicellular eukaryotes Dictyostelium discoideum and Plasmodium falciparum, there is evidence that selection, acting at the level of translation, influences codon choices. This finding is intriguing for two reasons: (1) the genomic GC levels of the above-mentioned eukaryotes are lower than the GC% of any studied bacteria, and (2) bacteria usually have larger effective population sizes than eukaryotes, and hence natural selection is expected to overcome more efficiently the randomizing effects of genetic drift among prokaryotes than among eukaryotes. In order to gain new insight into this problem, we analysed the patterns of codon preferences of the nuclear genes of Entamoeba histolytica, a unicellular eukaryote characterised by an extremely AT-rich genome (GC = 25%). The overall codon usage is strongly biased towards A and T in the third codon positions, and among the presumed highly expressed sequences, there is an increased relative usage of a subset of codons, many of which are C-ending. Since an increase in C in third codon positions is 'against' the compositional bias, we conclude that codon usage in E. histolytica, as happens in D. discoideum and P. falciparum, is the result of an equilibrium between compositional pressure and selection. These findings raise the question of why strongly compositionally biased eukaryotic cells may be more sensitive to the (presumed) slight differences among synonymous codons than compositionally biased bacteria.

  11. Maclisp extensions

    NASA Technical Reports Server (NTRS)

    Bawden, A.; Burke, G. S.; Hoffman, C. W.

    1981-01-01

    A common subset of selected facilities available in Maclisp and its derivatives (PDP-10 and Multics Maclisp, Lisp Machine Lisp (Zetalisp), and NIL) is described. The object is to aid in writing code which can run compatibly in more than one of these environments.

  12. Consequences of plant invasions on compartmentalization and species’ roles in plant–pollinator networks

    PubMed Central

    Albrecht, Matthias; Padrón, Benigno; Bartomeus, Ignasi; Traveset, Anna

    2014-01-01

    Compartmentalization—the organization of ecological interaction networks into subsets of species that do not interact with other subsets (true compartments) or interact more frequently among themselves than with other species (modules)—has been identified as a key property for the functioning, stability and evolution of ecological communities. Invasions by entomophilous invasive plants may profoundly alter the way interaction networks are compartmentalized. We analysed a comprehensive dataset of 40 paired plant–pollinator networks (invaded versus uninvaded) to test this hypothesis. We show that invasive plants have higher generalization levels with respect to their pollinators than natives. The consequences for network topology are that—rather than displacing native species from the network—plant invaders attracting pollinators into invaded modules tend to play new important topological roles (i.e. network hubs, module hubs and connectors) and cause role shifts in native species, creating larger modules that are more connected among each other. While the number of true compartments was lower in invaded compared with uninvaded networks, the effect of invasion on modularity was contingent on the study system. Interestingly, the generalization level of the invasive plants partially explains this pattern, with more generalized invaders contributing to a lower modularity. Our findings indicate that the altered interaction structure of invaded networks makes them more robust against simulated random secondary species extinctions, but more vulnerable when the typically highly connected invasive plants go extinct first. The consequences and pathways by which biological invasions alter the interaction structure of plant–pollinator communities highlighted in this study may have important dynamical and functional implications, for example, by influencing multi-species reciprocal selection regimes and coevolutionary processes. PMID:24943368

  13. Epidemiology of upper urinary tract stone disease in a Taiwanese population: a nationwide, population based study.

    PubMed

    Huang, Wei-Yi; Chen, Yu-Fen; Carter, Stacey; Chang, Hong-Chiang; Lan, Chung-Fu; Huang, Kuo-How

    2013-06-01

    We investigated the epidemiology of upper urinary tract stone disease in Taiwan using a nationwide, population based database. This study was based on the National Health Insurance Research Database of Taiwan, which contains data on all medical beneficiary claims from 22.72 million enrollees, accounting for almost 99% of the Taiwanese population. The Longitudinal Health Insurance Database 2005, a subset of the National Health Insurance Research Database, contains data on all medical benefit claims from 1997 through 2010 for a subset of 1 million beneficiaries randomly sampled from the 2005 enrollment file. For epidemiological analysis we selected subjects whose claims records included the diagnosis of upper urinary tract urolithiasis. The age adjusted rate of medical care visits for upper urinary tract urolithiasis decreased by 6.5% from 1,367/100,000 subjects in 1998 to 1,278/100,000 in 2010. There was a significantly decreasing trend during the 13-year period in visits from female and all subjects (r(2) = 0.86, p = 0.001 and r(2) = 0.52, p = 0.005, respectively). In contrast, an increasing trend was noted for male subjects (r(2) = 0.45, p = 0.012). The age adjusted prevalence in 2010 was 9.01%, 5.79% and 7.38% in male, female and all subjects, respectively. The overall recurrence rate at 1 and 5 years was 6.12% and 34.71%, respectively. Male subjects had a higher recurrence rate than female subjects. Our study provides important information on the epidemiology of upper urinary tract stone disease in Taiwan, helping to quantify the burden of urolithiasis and establish strategies to decrease the risk of urolithiasis. Copyright © 2013 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  14. Evaluation of risk and protective factors for work-related bite injuries to veterinary technicians certified in Minnesota.

    PubMed

    Nordgren, Leslie D; Gerberich, Susan G; Alexander, Bruce H; Church, Timothy R; Bender, Jeff B; Ryan, Andrew D

    2014-08-15

    To identify risk and protective factors for work-related bite injuries among veterinary technicians certified in Minnesota. Nested case-control study. 868 certified veterinary technicians (CVTs). A questionnaire was mailed to CVTs who previously participated in a survey regarding work-related injuries and did (cases; 301 surveys sent) or did not (controls; 567) report qualifying work-related animal bite injuries in the preceding 12 months. Descriptive statistics were summarized. Demographic and work-related variables for the month preceding the bite injury (for cases) or a randomly selected month (controls) were assessed with univariate analysis (489 CVTs) and multivariate analysis of a subset of 337 CVTs who worked in small or mixed mostly small animal facilities. Responses were received from 176 case and 313 control CVTs. For the subset of 337 CVTs, risk of bite injury was higher for those < 25 years of age (OR, 3.82; 95% confidence interval [CI], 1.84 to 7.94) than for those ≥ 35 years of age, for those who had worked < 5 years (OR, 3.24; 95% CI, 1.63 to 6.45) versus ≥ 10 years in any veterinary facility, and for those who handled ≥ 5 species/d (OR, 1.99; 95% CI, 1.06 to 3.74) versus < 3 species/d. Risk was lower for CVTs who handled < 10 versus ≥ 20 animals/d (OR, 0.23; 95% CI, 0.08 to 0.71). Several work-related factors were associated with the risk of work-related bite injury to CVTs. These findings may serve as a basis for development of intervention efforts and future research regarding work-related injuries among veterinary staff.

  15. Estimation of fat-free mass in Asian neonates using bioelectrical impedance analysis

    PubMed Central

    Tint, Mya-Thway; Ward, Leigh C; Soh, Shu E; Aris, Izzuddin M; Chinnadurai, Amutha; Saw, Seang Mei; Gluckman, Peter D; Godfrey, Keith M; Chong, Yap-Seng; Kramer, Michael S; Yap, Fabian; Lingwood, Barbara; Lee, Yung Seng

    2016-01-01

    The aims of this study were to develop and validate a prediction equation of fat-free mass (FFM) based on bioelectrical impedance analysis (BIA) and anthropometry using air displacement plethysmography (ADP) as a reference in Asian neonates and to test the applicability of the prediction equations in an independent Western cohort. A total of 173 neonates at birth and 140 at week 2 of age were included. Multiple linear regression analysis was performed to develop the prediction equations in a two-thirds randomly selected subset and validated on the remaining one-third subset at each time point and in an independent Queensland cohort. FFM measured by ADP was the dependent variable and anthropometric measures, sex and impedance quotient (L2/R50) were independent variables in the model. Accuracy of prediction equations was assessed using intra-class correlation and Bland-Altman analyses. L2/R50 was the significant predictor of FFM at week 2 but not at birth. Compared to the model using weight, sex and length, including L2/R50 slightly improved the prediction with a bias of 0.01 kg with 2SD limits of agreement (LOA) (0.18, −0.20). Prediction explained 88.9% of variation but not beyond that of anthropometry. Applying these equations to the Queensland cohort provided similar performance at the appropriate age. However, when the Queensland equations were applied to our cohort, the bias increased slightly but with similar LOA. BIA appears to have limited use in predicting FFM in the first few weeks of life compared to simple anthropometry in Asian populations. There is a need for population- and age-appropriate FFM prediction equations. PMID:26856420
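
    The development/validation scheme described above can be sketched as follows (synthetic numbers stand in for the cohort measurements, and the simple linear model is only an illustration of the procedure, not the published equation):

      # Two-thirds development / one-third validation split with Bland-Altman agreement.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 173
      weight = rng.normal(3.1, 0.4, n)                   # kg
      length = rng.normal(49.0, 2.0, n)                  # cm
      sex = rng.integers(0, 2, n)
      imp_quot = length**2 / rng.normal(450.0, 40.0, n)  # impedance quotient L^2/R50
      ffm = (0.7 * weight + 0.01 * length + 0.05 * sex + 0.002 * imp_quot
             + rng.normal(0, 0.08, n))                   # stand-in for ADP-measured FFM

      X = np.column_stack([weight, length, sex, imp_quot])
      X_dev, X_val, y_dev, y_val = train_test_split(X, ffm, test_size=1/3, random_state=0)

      model = LinearRegression().fit(X_dev, y_dev)
      diff = model.predict(X_val) - y_val
      bias, loa = diff.mean(), 1.96 * diff.std(ddof=1)
      print(f"bias = {bias:.3f} kg, 2SD limits of agreement = ({bias - loa:.2f}, {bias + loa:.2f})")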

  16. Frequency, diagnostic performance of coproantigen detection and genotyping of the Giardia among patients referred to a multi-level teaching hospital in northern India.

    PubMed

    Ghoshal, Ujjala; Shukla, Ratnakar; Pant, Priyannk; Ghoshal, Uday C

    Giardiasis, a common gastrointestinal parasitic infection in the tropics, is diagnosed on stool microscopy (gold standard); however, its sensitivity is low due to intermittent fecal shedding. Coproantigen detection (ELISA) is useful but requires further evaluation. We aimed to study: (a) detection of Giardia by stool microscopy and/or coproantigen, (b) diagnostic performance of fecal antigen detection and microscopy, and (c) genotypic characterization of G. lamblia using PCR specific for the triose phosphate isomerase (tpi) gene. Stool samples from 2992 patients were examined by microscopy from March 2013 to March 2015 in a multi-level teaching hospital in northern India. Giardia coproantigen detection was performed by ELISA in a subset of patients. Genetic characterization of G. lamblia was performed by PCR targeting the tpi gene in a subset of microscopy-positive stool samples. Of 2992 patients, 132 (4.4%) had Giardia by microscopy (cyst/trophozoite) and/or ELISA. ELISA was performed in 264 patients; of them, 127 were positive by microscopy. Sensitivity, specificity, positive and negative predictive values of ELISA were 91, 91, 94, and 91%, respectively, using microscopy as a gold standard. PCR targeting the tpi gene was performed in 116 randomly selected samples having Giardia. Assemblages A and B were found among 44 (38%) and 72 (62%) patients, respectively. Assemblage B was more often associated with malnutrition and loss of appetite than A (48/72 [67%] vs. 21/44 [48%], P = 0.044 and 17/72 [24%] vs. 14/44 [32%], P = 0.019). We conclude that 4.4% of the studied population had giardiasis. Fecal antigen detection is a useful method for diagnosis and assemblage B is the most common genotype.

  17. Comorbidities contribute to the risk of cancer death among Aboriginal and non-Aboriginal South Australians: Analysis of a matched cohort study.

    PubMed

    Banham, David; Roder, David; Brown, Alex

    2018-02-01

    Aboriginal Australians have poorer cancer survival than other Australians. Diagnoses at later stages and correlates of remote area living influence, but do not fully explain, these disparities. Little is known of the prevalence and influence of comorbid conditions experienced by Aboriginal people, including their effect on cancer survival. This study quantifies hospital recorded comorbidities using the Elixhauser Comorbidity Index (ECI), examines their influence on risk of cancer death, then considers effect variation by Aboriginality. Cancers diagnosed among Aboriginal South Australians in 1990-2010 (N = 777) were matched with randomly selected non-Aboriginal cases by birth year, diagnostic year, sex, and primary site, then linked to administrative hospital records to the time of diagnosis. Competing risk regression summarised associations of Aboriginal status, stage, geographic attributes and comorbidities with risk of cancer death. A threshold of four or more ECI conditions was associated with increased risk of cancer death (sub-hazard ratio SHR 1.66, 95%CI 1.11-2.46). Alternatively, the presence of any one of a subset of ECI conditions was associated with similarly increased risk (SHR = 1.62, 95%CI 1.23-2.14). The observed effects did not differ between Aboriginal and matched non-Aboriginal cases. However, Aboriginal cases experienced three times higher exposure than non-Aboriginal to four or more ECI conditions (14.2% versus 4.5%) and greater exposure to the subset of ECI conditions (20.7% versus 8.0%). Comorbidities at diagnosis increased the risk of cancer death in addition to risks associated with Aboriginality, remoteness of residence and disease stage at diagnosis. The Aboriginal cohort experienced comparatively greater exposure to comorbidities which adds to disparities in cancer outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs.

    PubMed

    Cui, Licong; Bodenreider, Olivier; Shi, Jay; Zhang, Guo-Qiang

    2018-02-01

    We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations. Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT's IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor concepts within the non-lattice subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared to existing IS-A relations in SNOMED CT. For concept pairs within each non-lattice subgraph, if a subset relation is identified but an IS-A relation is not present in SNOMED CT IS-A transitive closure, then a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation. A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the "Clinical Finding" and "Procedure" sub-hierarchies. Two domain experts confirmed 185 (among 223) suggested missing IS-A relations, a precision of 82.96%. Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT. Copyright © 2017 Elsevier Inc. All rights reserved.
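
    A toy illustration of stage 3 (the concept names, attribute sets and IS-A pairs below are invented examples, not SNOMED CT content): if the enriched lexical attribute set of one concept strictly contains that of another concept in the same non-lattice subgraph, but the corresponding IS-A relation is absent from the transitive closure, the pair is reported as a candidate missing relation.

      # Subset-inclusion check between enriched lexical attribute sets.
      attrs = {
          "disorder of hip joint":        {"disorder", "hip", "joint"},
          "inflammatory disorder of hip": {"disorder", "hip", "inflammatory"},
          "arthritis of hip":             {"disorder", "hip", "joint", "inflammatory"},
      }
      is_a_closure = {("arthritis of hip", "disorder of hip joint")}  # existing IS-A pairs

      candidates = []
      for child, child_attrs in attrs.items():
          for parent, parent_attrs in attrs.items():
              if (child != parent and parent_attrs < child_attrs
                      and (child, parent) not in is_a_closure):
                  candidates.append((child, parent))

      print("suggested missing IS-A relations:", candidates)
      # -> [('arthritis of hip', 'inflammatory disorder of hip')]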

  19. A robust method using propensity score stratification for correcting verification bias for binary tests

    PubMed Central

    He, Hua; McDermott, Michael P.

    2012-01-01

    Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified. PMID:21856650
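
    A hedged sketch of the stratification idea (simulated data, an assumed verification mechanism, and a Begg-Greenes-style reconstruction of sensitivity and specificity; not the authors' estimator in full detail): the probability of verification is modelled from the test result and covariates, subjects are grouped into propensity-score strata, disease prevalence given the test result is estimated from verified subjects within each stratum, and the accuracy measures are rebuilt from those estimates.

      # Propensity-score stratification to correct verification bias.
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(1)
      n = 20000
      x = rng.normal(size=n)                                      # covariate
      disease = rng.random(n) < 1 / (1 + np.exp(-(x - 1.0)))      # true status
      test = np.where(disease, rng.random(n) < 0.85, rng.random(n) < 0.10).astype(int)
      p_verify = 1 / (1 + np.exp(-(2.0 * test + 0.7 * x - 1.0)))  # MAR mechanism
      verified = rng.random(n) < p_verify

      feats = np.column_stack([test, x])
      ps = LogisticRegression().fit(feats, verified).predict_proba(feats)[:, 1]

      p_disease = np.empty(n)
      for t in (0, 1):                             # stratify separately per test result
          sel = test == t
          edges = np.quantile(ps[sel], [0.2, 0.4, 0.6, 0.8])
          strata = np.digitize(ps[sel], edges)
          est = np.empty(sel.sum())
          for k in np.unique(strata):
              in_k = strata == k
              ver_k = in_k & verified[sel]
              est[in_k] = disease[sel][ver_k].mean()   # P(D=1 | T=t, stratum k)
          p_disease[sel] = est

      sensitivity = np.sum((test == 1) * p_disease) / np.sum(p_disease)
      specificity = np.sum((test == 0) * (1 - p_disease)) / np.sum(1 - p_disease)
      print(f"corrected sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")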

  20. Estimation of fat-free mass in Asian neonates using bioelectrical impedance analysis.

    PubMed

    Tint, Mya-Thway; Ward, Leigh C; Soh, Shu E; Aris, Izzuddin M; Chinnadurai, Amutha; Saw, Seang Mei; Gluckman, Peter D; Godfrey, Keith M; Chong, Yap-Seng; Kramer, Michael S; Yap, Fabian; Lingwood, Barbara; Lee, Yung Seng

    2016-03-28

    The aims of this study were to develop and validate a prediction equation of fat-free mass (FFM) based on bioelectrical impedance analysis (BIA) and anthropometry using air-displacement plethysmography (ADP) as a reference in Asian neonates and to test the applicability of the prediction equations in an independent Western cohort. A total of 173 neonates at birth and 140 at two weeks of age were included. Multiple linear regression analysis was performed to develop the prediction equations in a two-third randomly selected subset and validated on the remaining one-third subset at each time point and in an independent Queensland cohort. FFM measured by ADP was the dependent variable, and anthropometric measures, sex and impedance quotient (L2/R50) were independent variables in the model. Accuracy of prediction equations was assessed using intra-class correlation and Bland-Altman analyses. L2/R50 was the significant predictor of FFM at week two but not at birth. Compared with the model using weight, sex and length, including L2/R50 slightly improved the prediction with a bias of 0·01 kg with 2 sd limits of agreement (LOA) (0·18, -0·20). Prediction explained 88·9 % of variation but not beyond that of anthropometry. Applying these equations to the Queensland cohort provided similar performance at the appropriate age. However, when the Queensland equations were applied to our cohort, the bias increased slightly but with similar LOA. BIA appears to have limited use in predicting FFM in the first few weeks of life compared with simple anthropometry in Asian populations. There is a need for population- and age-appropriate FFM prediction equations.

  1. Designing basin-customized combined drought indices via feature extraction

    NASA Astrophysics Data System (ADS)

    Zaniolo, Marta; Giuliani, Matteo; Castelletti, Andrea

    2017-04-01

    The socio-economic costs of drought are progressively increasing worldwide due to the ongoing alteration of hydro-meteorological regimes induced by climate change. Although drought management is largely studied in the literature, most of the traditional drought indexes fail to detect critical events in highly regulated systems, which generally rely on ad hoc formulations that cannot be generalized to different contexts. In this study, we contribute a novel framework for the design of a basin-customized drought index. This index represents a surrogate of the state of the basin and is computed by combining the available information about the water in the system to reproduce a representative target variable for the drought condition of the basin (e.g., water deficit). To select the relevant variables and how to combine them, we use an advanced feature extraction algorithm called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS). The W-QEISS algorithm relies on a multi-objective evolutionary algorithm to find Pareto-efficient subsets of variables by maximizing the wrapper accuracy, minimizing the number of selected variables (cardinality) and optimizing relevance and redundancy of the subset. The accuracy objective is evaluated through the calibration of a pre-defined model (i.e., an extreme learning machine) of the water deficit for each candidate subset of variables, with the index selected from the resulting solutions identifying a suitable compromise between accuracy, cardinality, relevance, and redundancy. The proposed methodology is tested in the case study of Lake Como in northern Italy, a regulated lake mainly operated for irrigation supply to four downstream agricultural districts. In the absence of an institutional drought monitoring system, we constructed the combined index using all the hydrological variables from the existing monitoring system as well as the most common drought indicators at multiple time aggregations. The soil moisture deficit in the root zone computed by a distributed-parameter water balance model of the agricultural districts is used as target variable. Numerical results show that our framework succeeds in constructing a combined drought index that reproduces the soil moisture deficit. Moreover, this index provides valuable information for supporting appropriate drought management strategies, including the possibility of directly informing the lake operations about drought conditions and improving the overall reliability of the irrigation supply system.
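
    The wrapper evaluation at the heart of W-QEISS can be sketched as follows (synthetic data, a scikit-learn MLP standing in for the extreme learning machine, and random candidate subsets standing in for the evolutionary search; all settings are illustrative assumptions): each candidate subset is scored on accuracy, cardinality, relevance and redundancy.

      # Multi-objective scoring of candidate variable subsets (wrapper style).
      import numpy as np
      from itertools import combinations
      from sklearn.feature_selection import mutual_info_regression
      from sklearn.model_selection import cross_val_score
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(2)
      X = rng.normal(size=(400, 8))                       # candidate hydrological inputs
      y = X[:, 0] + 0.5 * X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=400)  # target deficit

      def objectives(cols):
          model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
          accuracy = cross_val_score(model, X[:, cols], y, cv=3, scoring="r2").mean()
          relevance = mutual_info_regression(X[:, cols], y, random_state=0).sum()
          redundancy = sum(mutual_info_regression(X[:, [i]], X[:, j], random_state=0)[0]
                           for i, j in combinations(cols, 2))
          return accuracy, len(cols), relevance, redundancy

      for _ in range(5):                                  # stand-in for the MOEA search
          mask = rng.random(8) < 0.4
          if mask.any():
              cols = np.flatnonzero(mask)
              acc, card, rel, red = objectives(cols)
              print(f"subset {cols}: accuracy={acc:.2f}, cardinality={card}, "
                    f"relevance={rel:.2f}, redundancy={red:.2f}")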

  2. A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.

    PubMed

    Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho

    2014-10-01

    Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is its high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they consider only the case where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs tend to be ranked low by traditional feature selection methods and are removed most of the time. Given the limited number of miRNAs, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with using only high-ranking miRNAs from traditional feature selection methods. Our results demonstrate that the m:n feature subset reveals the positive effect of low-ranking miRNAs in cancer classification.
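
    One way to read the m:n idea, sketched under illustrative assumptions (synthetic data in place of miRNA profiles, and a union of top-ranked features from several criteria in place of the exact procedure): features ranked low by a single criterion can re-enter the subset when another criterion ranks them highly, and the classifier built on the combined subset is compared with one built on a single ranking.

      # Compare a single-criterion top-k subset with a multi-criterion union.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=150, n_features=100, n_informative=12,
                                 random_state=3)
      X = X - X.min()                       # chi2 requires non-negative values

      top_single = SelectKBest(f_classif, k=10).fit(X, y).get_support()
      union = np.zeros(X.shape[1], dtype=bool)
      for criterion in (chi2, f_classif, mutual_info_classif):
          union |= SelectKBest(criterion, k=10).fit(X, y).get_support()

      acc_single = cross_val_score(SVC(), X[:, top_single], y, cv=5).mean()
      acc_union = cross_val_score(SVC(), X[:, union], y, cv=5).mean()
      print(f"single criterion: {acc_single:.2f} with {int(top_single.sum())} features; "
            f"multi-criterion union: {acc_union:.2f} with {int(union.sum())} features")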

  3. On the selection of gantry and collimator angles for isocenter localization using Winston-Lutz tests.

    PubMed

    Du, Weiliang; Johnson, Jennifer L; Jiang, Wei; Kudchadker, Rajat J

    2016-01-08

    In Winston-Lutz (WL) tests, the isocenter of a linear accelerator (linac) is determined as the intersection of radiation central axes (CAX) from multiple gantry, collimator, and couch angles. It is well known that the CAX can wobble due to mechanical imperfections of the linac. Previous studies suggested that the wobble varies with gantry and collimator angles. Therefore, the isocenter determined in the WL tests has a profound dependence on the gantry and collimator angles at which CAX are sampled. In this study, we evaluated the systematic and random errors in the isocenters determined with different CAX sampling schemes. Digital WL tests were performed on six linacs. For each WL test, 63 CAX were sampled at nine gantry angles and seven collimator angles. Subsets of these data were used to simulate the effects of various CAX sampling schemes. An isocenter was calculated from each subset of CAX and compared against the reference isocenter, which was calculated from 48 opposing CAX. The differences between the calculated isocenters and the reference isocenters ranged from 0 to 0.8 mm. The differences diminished to less than 0.2 mm when 24 or more CAX were sampled. Isocenters determined with collimator 0° were vertically lower than those determined with collimator 90° and 270°. Isocenter localization errors in the longitudinal direction (along the axis of gantry rotation) showed a strong dependence on the collimator angle selected. The errors in all directions were significantly reduced when opposing collimator angles and opposing gantry angles were employed. The isocenter localization errors were less than 0.2 mm with the common CAX sampling scheme, which used four cardinal gantry angles and two opposing collimator angles. Reproducibility studies on one linac showed that the mean and maximum variations of CAX during the WL tests were 0.053 mm and 0.30 mm, respectively. The maximal variation in the resulting isocenters was 0.068 mm if 48 CAX were used, or 0.13 mm if four CAX were used. Quantitative results from this study are useful for understanding and minimizing the isocenter uncertainty in WL tests.
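
    As a geometric aside, an isocenter estimate from a sampled set of CAX can be computed as the least-squares point closest to the sampled beam axes; the short sketch below uses invented wobble values and is an illustration of the geometry, not of any linac QA software.

      # Least-squares intersection of a set of 3-D lines (point p_i, direction d_i).
      import numpy as np

      def isocenter(points, directions):
          A = np.zeros((3, 3))
          b = np.zeros(3)
          for p, d in zip(points, directions):
              d = d / np.linalg.norm(d)
              proj = np.eye(3) - np.outer(d, d)    # projector orthogonal to the line
              A += proj
              b += proj @ p
          return np.linalg.solve(A, b)

      rng = np.random.default_rng(4)
      gantry = np.deg2rad([0, 45, 90, 135, 180, 225, 270, 315])
      dirs = np.column_stack([np.cos(gantry), np.zeros_like(gantry), np.sin(gantry)])
      pts = rng.normal(scale=0.1, size=(len(gantry), 3))   # ~0.1 mm wobble per CAX
      print("estimated isocenter (mm):", np.round(isocenter(pts, dirs), 3))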

  4. Immune Reactions against Gene Gun Vaccines Are Differentially Modulated by Distinct Dendritic Cell Subsets in the Skin

    PubMed Central

    Deressa, Tekalign; Strandt, Helen; Florindo Pinheiro, Douglas; Mittermair, Roberta; Pizarro Pesado, Jennifer; Thalhamer, Josef; Hammerl, Peter; Stoecklinger, Angelika

    2015-01-01

    The skin accommodates multiple dendritic cell (DC) subsets with remarkable functional diversity. Immune reactions are initiated and modulated by the triggering of DC by pathogen-associated or endogenous danger signals. In contrast to these processes, the influence of intrinsic features of protein antigens on the strength and type of immune responses is much less understood. Therefore, we investigated the involvement of distinct DC subsets in immune reactions against two structurally different model antigens, E. coli beta-galactosidase (betaGal) and chicken ovalbumin (OVA) under otherwise identical conditions. After epicutaneous administration of the respective DNA vaccines with a gene gun, wild type mice induced robust immune responses against both antigens. However, ablation of langerin+ DC almost abolished IgG1 and cytotoxic T lymphocytes against betaGal but enhanced T cell and antibody responses against OVA. We identified epidermal Langerhans cells (LC) as the subset responsible for the suppression of anti-OVA reactions and found regulatory T cells critically involved in this process. In contrast, reactions against betaGal were not affected by the selective elimination of LC, indicating that this antigen required a different langerin+ DC subset. The opposing findings obtained with OVA and betaGal vaccines were not due to immune-modulating activities of either the plasmid DNA or the antigen gene products, nor did the differential cellular localization, size or dose of the two proteins account for the opposite effects. Thus, skin-borne protein antigens may be differentially handled by distinct DC subsets, and, in this way, intrinsic features of the antigen can participate in immune modulation. PMID:26030383

  5. The emerging role of ALK inhibitors in the treatment of advanced non-small cell lung cancer.

    PubMed

    Galetta, Domenico; Rossi, Antonio; Pisconti, Salvatore; Colucci, Giuseppe

    2012-04-01

    Most NSCLC patients are diagnosed in the advanced stage of the disease. Recently, chemotherapeutic agents have reached a plateau of effectiveness. Increased understanding of cancer biology has revealed several potential therapeutic strategies that have led to marketing of new biologic agents. The echinoderm microtubule-associated protein like-4-anaplastic lymphoma kinase (EML4-ALK) fusion oncogene represents one of the newest molecular targets in NSCLC, identifying a subset of NSCLC patients characterized by distinct clinicopathological features. This article reviews the available results concerning ALK inhibitors for the treatment of advanced NSCLC patients; an electronic search was used to retrieve the articles addressing this topic. In a pivotal Phase I clinical trial, crizotinib (PF-02341066), a small-molecule ALK inhibitor, demonstrated impressive antitumor activity in the majority of NSCLC patients with ALK fusions. Phase III randomized trials investigating crizotinib in this subgroup of patients are ongoing. If the results from these large international trials confirm the efficacy of crizotinib in this subset of patients, the next few years could see the targeted treatment of advanced NSCLC patients with ALK fusions. Specific inhibitors would realize so-called personalized medicine in subsets of this disease.

  6. Existence of CD8α-like dendritic cells with a conserved functional specialization and a common molecular signature in distant mammalian species.

    PubMed

    Contreras, Vanessa; Urien, Céline; Guiton, Rachel; Alexandre, Yannick; Vu Manh, Thien-Phong; Andrieu, Thibault; Crozat, Karine; Jouneau, Luc; Bertho, Nicolas; Epardaud, Mathieu; Hope, Jayne; Savina, Ariel; Amigorena, Sebastian; Bonneau, Michel; Dalod, Marc; Schwartz-Cornil, Isabelle

    2010-09-15

    The mouse lymphoid organ-resident CD8alpha(+) dendritic cell (DC) subset is specialized in Ag presentation to CD8(+) T cells. Recent evidence shows that mouse nonlymphoid tissue CD103(+) DCs and human blood DC Ag 3(+) DCs share similarities with CD8alpha(+) DCs. We address here whether the organization of DC subsets is conserved across mammals in terms of gene expression signatures, phenotypic characteristics, and functional specialization, independently of the tissue of origin. We study the DC subsets that migrate from the skin in the ovine species that, like all domestic animals, belongs to the Laurasiatheria, a distinct phylogenetic clade from the supraprimates (human/mouse). We demonstrate that the minor sheep CD26(+) skin lymph DC subset shares significant transcriptomic similarities with mouse CD8alpha(+) and human blood DC Ag 3(+) DCs. This allowed the identification of a common set of phenotypic characteristics for CD8alpha-like DCs in the three mammalian species (i.e., SIRP(lo), CADM1(hi), CLEC9A(hi), CD205(hi), XCR1(hi)). Compared to CD26(-) DCs, the sheep CD26(+) DCs show 1) potent stimulation of allogeneic naive CD8(+) T cells with high selective induction of the Ifngamma and Il22 genes; 2) dominant efficacy in activating specific CD8(+) T cells against exogenous soluble Ag; and 3) selective expression of functional pathways associated with high capacity for Ag cross-presentation. Our results unravel a unifying definition of the CD8alpha(+)-like DCs across mammalian species and identify molecular candidates that could be used for the design of vaccines applying to mammals in general.

  7. Langerin+ dermal dendritic cells are critical for CD8+ T cell activation and IgH γ-1 class switching in response to gene gun vaccines.

    PubMed

    Stoecklinger, Angelika; Eticha, Tekalign D; Mesdaghi, Mehrnaz; Kissenpfennig, Adrien; Malissen, Bernard; Thalhamer, Josef; Hammerl, Peter

    2011-02-01

    The C-type lectin langerin/CD207 was originally discovered as a specific marker for epidermal Langerhans cells (LC). Recently, additional and distinct subsets of langerin(+) dendritic cells (DC) have been identified in lymph nodes and peripheral tissues of mice. Although the role of LC for immune activation or modulation is now being discussed controversially, other langerin(+) DC appear crucial for protective immunity in a growing set of infection and vaccination models. In knock-in mice that express the human diphtheria toxin receptor under control of the langerin promoter, injection of diphtheria toxin ablates LC for several weeks whereas other langerin(+) DC subsets are replenished within just a few days. Thus, by careful timing of diphtheria toxin injections selective states of deficiency in either LC only or all langerin(+) cells can be established. Taking advantage of this system, we found that, unlike selective LC deficiency, ablation of all langerin(+) DC abrogated the activation of IFN-γ-producing and cytolytic CD8(+) T cells after gene gun vaccination. Moreover, we identified migratory langerin(+) dermal DC as the subset that directly activated CD8(+) T cells in lymph nodes. Langerin(+) DC were also critical for IgG1 but not IgG2a Ab induction, suggesting differential polarization of CD4(+) T helper cells by langerin(+) or langerin-negative DC, respectively. In contrast, protein vaccines administered with various adjuvants induced IgG1 independently of langerin(+) DC. Taken together, these findings reflect a highly specialized division of labor between different DC subsets both with respect to Ag encounter as well as downstream processes of immune activation.

  8. Cerebellins are differentially expressed in selective subsets of neurons throughout the brain.

    PubMed

    Seigneur, Erica; Südhof, Thomas C

    2017-10-15

    Cerebellins are secreted hexameric proteins that form tripartite complexes with the presynaptic cell-adhesion molecules neurexins or 'deleted-in-colorectal-cancer', and the postsynaptic glutamate-receptor-related proteins GluD1 and GluD2. These tripartite complexes are thought to regulate synapses. However, cerebellins are expressed in multiple isoforms whose relative distributions and overall functions are not understood. Three of the four cerebellins, Cbln1, Cbln2, and Cbln4, autonomously assemble into homohexamers, whereas the Cbln3 requires Cbln1 for assembly and secretion. Here, we show that Cbln1, Cbln2, and Cbln4 are abundantly expressed in nearly all brain regions, but exhibit strikingly different expression patterns and developmental dynamics. Using newly generated knockin reporter mice for Cbln2 and Cbln4, we find that Cbln2 and Cbln4 are not universally expressed in all neurons, but only in specific subsets of neurons. For example, Cbln2 and Cbln4 are broadly expressed in largely non-overlapping subpopulations of excitatory cortical neurons, but only sparse expression was observed in excitatory hippocampal neurons of the CA1- or CA3-region. Similarly, Cbln2 and Cbln4 are selectively expressed, respectively, in inhibitory interneurons and excitatory mitral projection neurons of the main olfactory bulb; here, these two classes of neurons form dendrodendritic reciprocal synapses with each other. A few brain regions, such as the nucleus of the lateral olfactory tract, exhibit astoundingly high Cbln2 expression levels. Viewed together, our data show that cerebellins are abundantly expressed in relatively small subsets of neurons, suggesting specific roles restricted to subsets of synapses. © 2017 Wiley Periodicals, Inc.

  9. Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Golay, Jean; Kanevski, Mikhail

    2017-04-01

    Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and a high computational burden for data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that all the information content of a data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158-171, 2000. [2] J. Golay, M. Kanevski, A new estimator of intrinsic dimension based on the multipoint Morisita index, Pattern Recognition 48(12), pp. 4070-4081, 2015. [3] J. Golay, M. Kanevski, Unsupervised feature selection based on the Morisita estimator of intrinsic dimension, arXiv:1608.05581, 2016.
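
    The selection loop behind this idea can be sketched as follows; note that the Morisita estimator of intrinsic dimension is replaced here by a simple PCA-based placeholder, so the snippet illustrates the greedy retain-if-ID-increases rule rather than the published estimator, and the data cube is synthetic.

      # Greedy band selection: keep a band only if it raises the estimated ID.
      import numpy as np

      def effective_dimension(X, var_threshold=0.99):
          """Placeholder ID: number of principal components explaining 99% of variance."""
          Xc = X - X.mean(axis=0)
          s = np.linalg.svd(Xc, compute_uv=False) ** 2
          return int(np.searchsorted(np.cumsum(s) / s.sum(), var_threshold) + 1)

      rng = np.random.default_rng(5)
      n_pixels, n_bands = 500, 12
      latent = rng.normal(size=(n_pixels, 4))              # four underlying sources
      mixing = rng.normal(size=(4, n_bands))
      cube = latent @ mixing + 0.01 * rng.normal(size=(n_pixels, n_bands))

      selected, current_id = [], 0
      for band in range(n_bands):
          trial = selected + [band]
          trial_id = effective_dimension(cube[:, trial])
          if trial_id > current_id:                        # the band adds new information
              selected, current_id = trial, trial_id
      print("retained bands:", selected, "estimated ID:", current_id)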

  10. A network-based, integrative study to identify core biological pathways that drive breast cancer clinical subtypes

    PubMed Central

    Dutta, B; Pusztai, L; Qi, Y; André, F; Lazar, V; Bianchini, G; Ueno, N; Agarwal, R; Wang, B; Shiang, C Y; Hortobagyi, G N; Mills, G B; Symmans, W F; Balázsi, G

    2012-01-01

    Background: The rapid collection of diverse genome-scale data raises the urgent need to integrate and utilise these resources for biological discovery or biomedical applications. For example, diverse transcriptomic and gene copy number variation data are currently collected for various cancers, but relatively few current methods are capable of utilising the emerging information. Methods: We developed and tested a data-integration method to identify gene networks that drive the biology of breast cancer clinical subtypes. The method simultaneously overlays gene expression and gene copy number data on protein–protein interaction, transcriptional-regulatory and signalling networks by identifying coincident genomic and transcriptional disturbances in local network neighborhoods. Results: We identified distinct driver-networks for each of the three common clinical breast cancer subtypes: oestrogen receptor (ER)+, human epidermal growth factor receptor 2 (HER2)+, and triple receptor-negative breast cancers (TNBC) from patient and cell line data sets. Driver-networks inferred from independent datasets were significantly reproducible. We also confirmed the functional relevance of a subset of randomly selected driver-network members for TNBC in gene knockdown experiments in vitro. We found that TNBC driver-network member genes have increased functional specificity to TNBC cell lines and higher functional sensitivity compared with genes selected by differential expression alone. Conclusion: Clinical subtype-specific driver-networks identified through data integration are reproducible and functionally important. PMID:22343619

  11. Sequence-related amplified polymorphism (SRAP) markers: A potential resource for studies in plant molecular biology(1.).

    PubMed

    Robarts, Daniel W H; Wolfe, Andrea D

    2014-07-01

    In the past few decades, many investigations in the field of plant biology have employed selectively neutral, multilocus, dominant markers such as inter-simple sequence repeat (ISSR), random-amplified polymorphic DNA (RAPD), and amplified fragment length polymorphism (AFLP) to address hypotheses at lower taxonomic levels. More recently, sequence-related amplified polymorphism (SRAP) markers have been developed, which are used to amplify coding regions of DNA with primers targeting open reading frames. These markers have proven to be robust and highly variable, on par with AFLP, and are attained through a significantly less technically demanding process. SRAP markers have been used primarily for agronomic and horticultural purposes, developing quantitative trait loci in advanced hybrids and assessing genetic diversity of large germplasm collections. Here, we suggest that SRAP markers should be employed for research addressing hypotheses in plant systematics, biogeography, conservation, ecology, and beyond. We provide an overview of the SRAP literature to date, review descriptive statistics of SRAP markers in a subset of 171 publications, and present relevant case studies to demonstrate the applicability of SRAP markers to the diverse field of plant biology. Results of these selected works indicate that SRAP markers have the potential to enhance the current suite of molecular tools in a diversity of fields by providing an easy-to-use, highly variable marker with inherent biological significance.

  12. Sequence-related amplified polymorphism (SRAP) markers: A potential resource for studies in plant molecular biology1

    PubMed Central

    Robarts, Daniel W. H.; Wolfe, Andrea D.

    2014-01-01

    In the past few decades, many investigations in the field of plant biology have employed selectively neutral, multilocus, dominant markers such as inter-simple sequence repeat (ISSR), random-amplified polymorphic DNA (RAPD), and amplified fragment length polymorphism (AFLP) to address hypotheses at lower taxonomic levels. More recently, sequence-related amplified polymorphism (SRAP) markers have been developed, which are used to amplify coding regions of DNA with primers targeting open reading frames. These markers have proven to be robust and highly variable, on par with AFLP, and are attained through a significantly less technically demanding process. SRAP markers have been used primarily for agronomic and horticultural purposes, developing quantitative trait loci in advanced hybrids and assessing genetic diversity of large germplasm collections. Here, we suggest that SRAP markers should be employed for research addressing hypotheses in plant systematics, biogeography, conservation, ecology, and beyond. We provide an overview of the SRAP literature to date, review descriptive statistics of SRAP markers in a subset of 171 publications, and present relevant case studies to demonstrate the applicability of SRAP markers to the diverse field of plant biology. Results of these selected works indicate that SRAP markers have the potential to enhance the current suite of molecular tools in a diversity of fields by providing an easy-to-use, highly variable marker with inherent biological significance. PMID:25202637

  13. Translational initiation in Leishmania tarentolae and Phytomonas serpens (Kinetoplastida) is strongly influenced by pre-ATG triplet and its 5' sequence context.

    PubMed

    Lukes, Julius; Paris, Zdenek; Regmi, Sandesh; Breitling, Reinhard; Mureev, Sergey; Kushnir, Susanna; Pyatkov, Konstantin; Jirků, Milan; Alexandrov, Kirill A

    2006-08-01

    To investigate the influence of sequence context of translation initiation codon on translation efficiency in Kinetoplastida, we constructed a library of expression plasmids randomized in the three nucleotides prefacing ATG of a reporter gene encoding enhanced green fluorescent protein (EGFP). All 64 possible combinations of pre-ATG triplets were individually stably integrated into the rDNA locus of Leishmania tarentolae and the resulting cell lines were assessed for EGFP expression. The expression levels were quantified directly by measuring the fluorescence of EGFP protein in living cells and confirmed by Western blotting. We observed a strong influence of the pre-ATG triplet on the level of protein expression over a 20-fold range. To understand the degree of evolutionary conservation of the observed effect, we transformed Phytomonas serpens, a trypanosomatid parasite of plants, with a subset of the constructs. The pattern of translational efficiency mediated by individual pre-ATG triplets in this species was similar to that observed in L. tarentolae. However, the pattern of translational efficiency of two other proteins (red fluorescent protein and tetracycline repressor) containing selected pre-ATG triplets did not correlate with either EGFP or each other. Thus, we conclude that a conserved mechanism of translation initiation site selection exists in kinetoplastids that is strongly influenced not only by the pre-ATG sequences but also by the coding region of the gene.

  14. Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics.

    PubMed

    Keich, Uri; Kertesz-Farkas, Attila; Noble, William Stafford

    2015-08-07

    Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.
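
    For orientation, the standard target-decoy competition estimate discussed above (not the proposed mix-max procedure) can be illustrated numerically with simulated scores:

      # Target-decoy competition (TDC) FDR estimate on simulated PSM scores.
      import numpy as np

      rng = np.random.default_rng(6)
      n_spectra = 10000
      target = np.where(rng.random(n_spectra) < 0.4,       # ~40% spectra from present peptides
                        rng.normal(3.0, 1.0, n_spectra),   # correct matches score higher
                        rng.normal(0.0, 1.0, n_spectra))
      decoy = rng.normal(0.0, 1.0, n_spectra)

      winner_is_target = target >= decoy                   # per-spectrum competition
      winner_score = np.maximum(target, decoy)

      threshold = 2.0
      accepted = winner_score >= threshold
      n_target_wins = int(np.sum(accepted & winner_is_target))
      n_decoy_wins = int(np.sum(accepted & ~winner_is_target))
      fdr = (n_decoy_wins + 1) / max(n_target_wins, 1)     # decoy wins estimate false targets
      print(f"accepted target PSMs: {n_target_wins}, estimated FDR = {fdr:.4f}")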

  15. Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics

    PubMed Central

    2016-01-01

    Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications. PMID:26152888

  16. Modeling Grade IV Gas Emboli using a Limited Failure Population Model with Random Effects

    NASA Technical Reports Server (NTRS)

    Thompson, Laura A.; Conkin, Johnny; Chhikara, Raj S.; Powell, Michael R.

    2002-01-01

    Venous gas emboli (VGE; gas bubbles in venous blood) are associated with an increased risk of decompression sickness (DCS) in hypobaric environments. A high grade of VGE can be a precursor to serious DCS. In this paper, we model time to Grade IV VGE considering a subset of individuals assumed to be immune from experiencing VGE. Our data contain monitoring test results from subjects undergoing up to 13 denitrogenation test procedures prior to exposure to a hypobaric environment. The onset time of Grade IV VGE is recorded as contained within certain time intervals. We fit a parametric (lognormal) mixture survival model to the interval- and right-censored data to account for the possibility of a subset of "cured" individuals who are immune to the event. Our model contains random subject effects to account for correlations between repeated measurements on a single individual. Model assessments and cross-validation indicate that this limited failure population mixture model is an improvement over a model that does not account for the potential of a fraction of cured individuals. We also evaluated some alternative mixture models. Predictions from the best-fitting mixture model indicate that the actual process is reasonably approximated by a limited failure population model.
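
    As a hedged sketch of the statistical idea behind a limited failure population (cure) model, the following Python snippet writes the interval- and right-censored log-likelihood for a lognormal onset-time distribution with a cured fraction; the parameterization and data are illustrative, and the random subject effects used in the paper are omitted for brevity.

```python
import numpy as np
from scipy.stats import lognorm

def cure_mixture_loglik(params, intervals):
    """Log-likelihood of a lognormal cure-fraction model for censored onset times.

    params   : (p_cured, mu, sigma), where log(T) ~ Normal(mu, sigma) for
               susceptible subjects and p_cured is the immune fraction.
    intervals: list of (left, right) onset-time intervals; right = np.inf marks a
               right-censored subject whose event was never observed.
    """
    p_cured, mu, sigma = params
    scale = np.exp(mu)
    loglik = 0.0
    for left, right in intervals:
        if np.isinf(right):
            # Censored subject: either cured, or susceptible with onset after `left`.
            surv = 1.0 - lognorm.cdf(left, s=sigma, scale=scale)
            loglik += np.log(p_cured + (1.0 - p_cured) * surv)
        else:
            # Onset observed within (left, right]: only susceptible subjects contribute.
            mass = lognorm.cdf(right, s=sigma, scale=scale) - lognorm.cdf(left, s=sigma, scale=scale)
            loglik += np.log((1.0 - p_cured) * mass)
    return loglik

# Toy data: two interval-censored onsets and one right-censored subject.
data = [(10.0, 20.0), (5.0, 15.0), (30.0, np.inf)]
print(cure_mixture_loglik((0.3, np.log(15.0), 0.5), data))
```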

  17. COMPARISON OF RANDOM AND SYSTEMATIC SITE SELECTION FOR ASSESSING ATTAINMENT OF AQUATIC LIFE USES IN SEGMENTS OF THE OHIO RIVER

    EPA Science Inventory

    This report describes field work and data analysis results comparing a design comparable to systematic site selection with one based on random selection of sites. The report is expected to validate the use of random site selection in the bioassessment program for the O...
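
    As a minimal, hypothetical illustration of the two designs being compared (not the report's actual procedure), the following Python sketch draws a simple random sample and a systematic sample of site indices along a river reach.

```python
import random

def random_sites(n_sites, k, seed=1):
    """Simple random sample of k site indices out of n_sites candidate locations."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_sites), k))

def systematic_sites(n_sites, k, seed=1):
    """Systematic sample: a random start, then one site every n_sites/k locations."""
    rng = random.Random(seed)
    step = n_sites / k
    start = rng.uniform(0, step)
    return [int(start + i * step) for i in range(k)]

print(random_sites(500, 10))       # scattered, irregularly spaced sites
print(systematic_sites(500, 10))   # evenly spaced sites from a random start
```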

  18. MISR Regional SAMUM Imagery Overview

    Atmospheric Science Data Center

    2016-08-24

    Visualizations of select MISR Level 3 data for special regional ... regional version used in support of the SAMUM Campaign. More information about the Level 1 and Level 2 products subsetted for the SAMUM ...

  19. MISR Regional VBBE Imagery Overview

    Atmospheric Science Data Center

    2016-08-24

    Visualizations of select MISR Level 3 data for special regional ... regional version used in support of the VBBE Campaign. More information about the Level 1 and Level 2 products subsetted for the VBBE ...

  20. Randomized trials published in some Chinese journals: how many are randomized?

    PubMed

    Wu, Taixiang; Li, Youping; Bian, Zhaoxiang; Liu, Guanjian; Moher, David

    2009-07-02

    The approximately 1100 medical journals now active in China are publishing a rapidly increasing number of research reports, including many studies identified by their authors as randomized controlled trials. It has been noticed that these reports mostly present positive results, and their quality and authenticity have consequently been called into question. We investigated the adequacy of randomization of clinical trials published in recent years in China to determine how many of them met acceptable standards for allocating participants to treatment groups. The China National Knowledge Infrastructure electronic database was searched for reports of randomized controlled trials on 20 common diseases published from January 1994 to June 2005. From this sample, a subset of trials that appeared to have used randomization methods was selected. Twenty-one investigators trained in the relevant knowledge, communication skills, and quality-control issues interviewed the original authors of these trials about the participant randomization methods and related quality-control features of their trials. From an initial sample of 37,313 articles identified in the China National Knowledge Infrastructure database, we found 3137 apparent randomized controlled trials. Of these, 1452 were studies of conventional medicine (published in 411 journals) and 1685 were studies of traditional Chinese medicine (published in 352 journals). Interviews with the authors of 2235 of these reports revealed that only 207 studies adhered to accepted methodology for randomization and could on those grounds be deemed authentic randomized controlled trials (6.8%, 95% confidence interval 5.9-7.7). There was no statistically significant difference in the rate of authenticity between randomized controlled trials of traditional interventions and those of conventional interventions. Randomized controlled trials conducted at hospitals affiliated to medical universities were more likely to be authentic than trials conducted at level 3 and level 2 hospitals (relative risk 1.58, 95% confidence interval 1.18-2.13, and relative risk 14.42, 95% confidence interval 9.40-22.10, respectively). The likelihood of authenticity was higher in level 3 hospitals than in level 2 hospitals (relative risk 9.32, 95% confidence interval 5.83-14.89). All randomized controlled trials of pre-market drugs were authentic by our criteria. Of the trials conducted at university-affiliated hospitals, 56.3% were authentic (95% confidence interval 32.0-81.0). Most reports of randomized controlled trials published in some Chinese journals lacked an adequate description of randomization. Similarly, most so-called 'randomized controlled trials' were not real randomized controlled trials owing to the authors' inadequate understanding of rigorous clinical trial design. All randomized controlled trials of pre-market drugs included in this research were authentic. Randomized controlled trials conducted in high-level hospitals, especially hospitals affiliated to medical universities, had a higher rate of authenticity. That so many non-randomized trials were published as randomized controlled trials indicates that peer review needs to be improved, and that a good-practice guide for peer review, including guidance on how to verify the authenticity of a study, urgently needs to be developed.
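
    The relative risks and 95% confidence intervals quoted above come from comparing authenticity rates between groups of hospitals; as a generic illustration with made-up counts (not the study's data), a Wald confidence interval for a relative risk can be computed on the log scale as follows.

```python
import math

def relative_risk_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A vs group B with a Wald 95% CI on the log scale."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt((1 - risk_a) / events_a + (1 - risk_b) / events_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Made-up counts: 90 of 160 trials authentic at university-affiliated hospitals
# versus 60 of 400 at other hospitals.
print(relative_risk_ci(90, 160, 60, 400))
```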
