Delamater, Paul L; Shortridge, Ashton M; Messina, Joseph P
2013-08-22
Background: Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state's Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. Methods: The clustering methodology employs a 2-step K-means + Ward's clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistics-based measure of cluster fit and characteristics of the resulting Hospital Groups. Results: Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Conclusions: Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units. PMID:23964905
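A minimal sketch of the two-step idea described above, assuming synthetic data in place of the Michigan utilization and location features: K-means first condenses facilities into many small homogeneous groups, Ward's method then merges their centroids, and a silhouette-based heuristic stands in for the paper's cluster-fit measure when picking the final number of groups. All sizes and parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))  # stand-in for per-hospital utilization + location features

# Step 1: K-means with a generous k to form many small, homogeneous groups.
km = KMeans(n_clusters=60, n_init=10, random_state=0).fit(X)

# Step 2: Ward's hierarchical clustering on the K-means centroids.
Z = linkage(km.cluster_centers_, method="ward")

# Heuristic selection of the final cluster count via a statistical fit measure.
best_k, best_s = None, -1.0
for k in range(10, 41):
    centroid_labels = fcluster(Z, t=k, criterion="maxclust")
    labels = centroid_labels[km.labels_]  # map each hospital through its centroid
    s = silhouette_score(X, labels)
    if s > best_s:
        best_k, best_s = k, s
print(best_k, round(best_s, 3))
```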
A hierarchical clustering methodology for the estimation of toxicity.
Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M
2008-01-01
A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster at each step in the hierarchical clustering, assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.
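A hedged sketch of cluster-wise QSAR fitting under stated assumptions: Ward's clustering partitions a descriptor matrix, and one regression model is fitted per cluster. The descriptors, endpoint values and fixed cluster count are synthetic stand-ins, and plain ridge regression replaces the paper's genetic-algorithm descriptor selection.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(644, 20))  # stand-in physicochemical descriptors
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=644)  # stand-in toxicity endpoint

# Ward's method splits the training set into structurally similar clusters.
labels = fcluster(linkage(X, method="ward"), t=8, criterion="maxclust")

# One regression model per cluster (ridge here; the paper uses GA-selected QSARs).
models = {c: Ridge().fit(X[labels == c], y[labels == c]) for c in np.unique(labels)}

def predict(x, cluster_id):
    """Predict the endpoint for a query assigned to a given cluster."""
    return models[cluster_id].predict(x.reshape(1, -1))[0]

print(round(predict(X[0], labels[0]), 3))
```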
Craig, Hugh; Berretta, Regina; Moscato, Pablo
2016-01-01
In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in information theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
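The pipeline above can be approximated in a few lines, with two substitutions that should be noted: networkx's greedy modularity maximization stands in for the iMA-Net memetic algorithm, and the word-frequency profiles are random stand-ins for the 168 plays.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(2)
freqs = rng.dirichlet(np.ones(500), size=168)  # stand-in word distributions per play

# Jensen-Shannon distance matrix between the frequency profiles.
n, k = len(freqs), 8
D = np.array([[jensenshannon(p, q) for q in freqs] for p in freqs])

# k-nearest-neighbour proximity graph, edges weighted by similarity.
G = nx.Graph()
for i in range(n):
    for j in np.argsort(D[i])[1 : k + 1]:  # k nearest neighbours, skipping self
        G.add_edge(i, int(j), weight=1.0 - D[i, j])

# Modularity-based community detection on the proximity graph.
communities = greedy_modularity_communities(G, weight="weight")
print([len(c) for c in communities])
```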
Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell
2009-02-01
Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, diagnosis needs to consider groups of developmental disorders. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data from 1175 patients at a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.
Ortiz-Rosario, Alexis; Adeli, Hojjat; Buford, John A
2017-01-15
Researchers often rely on simple methods to identify involvement of neurons in a particular motor task. The historical approach has been to inspect large groups of neurons and subjectively separate neurons into groups based on the expertise of the investigator. In cases where neuron populations are small it is reasonable to inspect these neuronal recordings and their firing rates carefully to avoid data omissions. In this paper, a new methodology is presented for automatic objective classification of neurons recorded in association with behavioral tasks into groups. By identifying characteristics of neurons in a particular group, the investigator can then identify functional classes of neurons based on their relationship to the task. The methodology is based on integration of a multiple signal classification (MUSIC) algorithm to extract relevant features from the firing rate and an expectation-maximization Gaussian mixture algorithm (EM-GMM) to cluster the extracted features. The methodology is capable of identifying and clustering similar firing rate profiles automatically based on specific signal features. An empirical wavelet transform (EWT) was used to validate the features found in the MUSIC pseudospectrum and the resulting signal features captured by the methodology. Additionally, this methodology was used to inspect behavioral elements of neurons to physiologically validate the model. This methodology was tested using a set of data collected from awake behaving non-human primates.
Tsatsoulis, C; Amthauer, H
2003-01-01
A novel methodological approach for identifying clusters of similar medical incidents by analyzing large databases of incident reports is described. The discovery of similar events allows the identification of patterns and trends, and makes possible the prediction of future events and the establishment of barriers and best practices. Two techniques from the fields of information science and artificial intelligence, namely case-based reasoning and information retrieval, have been integrated, and very good clustering accuracies have been achieved on a test data set of incident reports from transfusion medicine. This work suggests that clustering should integrate the features of an incident captured in traditional form-based records together with the detailed information found in the narrative included in event reports. PMID:14645892
A new approach for the assessment of temporal clustering of extratropical wind storms
NASA Astrophysics Data System (ADS)
Schuster, Mareike; Eddounia, Fadoua; Kuhnel, Ivan; Ulbrich, Uwe
2017-04-01
A widely used methodology to assess the clustering of storms in a region is based on the dispersion statistics of a simple homogeneous Poisson process. This clustering measure is the ratio of the variance to the mean of the local storm counts per grid point. Values larger than 1, i.e., when the variance exceeds the mean, indicate clustering, while values lower than 1 indicate a sequencing of storms that is more regular than a random process. However, a disadvantage of this methodology is that the characteristics are valid only for a pre-defined climatological period, so temporal variability in clustering cannot be identified. Also, the absolute value of the dispersion statistic is not particularly intuitive. We have developed an approach to describe the temporal clustering of storms which offers a more intuitive interpretation and at the same time allows temporal variations to be assessed. The approach is based on the local distribution of waiting times between the occurrence of two individual storm events, the events being computed through post-processing of individual windstorm tracks, which in turn are obtained by an objective tracking algorithm. Based on this distribution, a threshold can be set, either at the waiting time expected from a random process or at a quantile of the observed distribution. Thus, it can be determined whether two consecutive wind storm events count as part of a (temporal) cluster. We analyze extratropical wind storms in a reanalysis dataset and compare the results of the traditional clustering measure with our new methodology. We assess what range of clustering events (in terms of duration and frequency) is covered and identify whether the historically known clustered seasons are detectable by the new clustering measure in the reanalysis.
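A small sketch contrasting the two measures on synthetic data: the classical dispersion statistic (variance-to-mean ratio of seasonal storm counts) and the waiting-time view, where consecutive storms separated by less than a threshold are counted as part of a temporal cluster. Counts, arrival times and the quantile threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Classical measure: dispersion of storm counts per winter at one grid point.
counts = rng.poisson(3.0, size=40)
dispersion = counts.var(ddof=1) / counts.mean()  # > 1 suggests clustering

# Waiting-time measure: gaps between consecutive storm arrival times (days).
event_times = np.sort(rng.uniform(0, 365, size=60))
waits = np.diff(event_times)
threshold = np.quantile(waits, 0.25)  # or the waiting time expected at random
in_cluster = waits < threshold       # consecutive pairs counted as clustered
print(round(dispersion, 2), int(in_cluster.sum()))
```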
Gibert, Karina; García-Rudolph, Alejandro; Curcoll, Lluïsa; Soler, Dolors; Pla, Laura; Tormos, José María
2009-01-01
In this paper, an integral Knowledge Discovery Methodology, named Clustering based on rules by States, which incorporates artificial intelligence (AI) and statistical methods as well as interpretation-oriented tools, is used for extracting knowledge patterns about the evolution over time of the Quality of Life (QoL) of patients with Spinal Cord Injury. The methodology incorporates the interaction with experts as a crucial element with the clustering methodology to guarantee usefulness of the results. Four typical patterns are discovered by taking into account prior expert knowledge. Several hypotheses are elaborated about the reasons for psychological distress or decreases in QoL of patients over time. The knowledge discovery from data (KDD) approach turns out, once again, to be a suitable formal framework for handling multidimensional complexity of the health domains.
Stability-based validation of dietary patterns obtained by cluster analysis.
Sauvageot, Nicolas; Schritz, Anna; Leite, Sonia; Alkerwi, Ala'a; Stranges, Saverio; Zannad, Faiez; Streel, Sylvie; Hoge, Axelle; Donneau, Anne-Françoise; Albert, Adelin; Guillaume, Michèle
2017-01-14
Cluster analysis is a data-driven method used to create clusters of individuals sharing similar dietary habits. However, this method requires specific choices from the user which have an influence on the results. There is therefore a need for an objective methodology to help researchers in their decisions during cluster analysis. The objective of this study was to use such a methodology, based on the stability of clustering solutions, to select the most appropriate clustering method and number of clusters for describing dietary patterns in the NESCAV study (Nutrition, Environment and Cardiovascular Health), a large population-based cross-sectional study in the Greater Region (N = 2298). Clustering solutions were obtained with the K-means, K-medians and Ward's methods and a number of clusters varying from 2 to 6. Their stability was assessed with three indices: the adjusted Rand index, Cramer's V and the misclassification rate. The most stable solution was obtained with the K-means method and a number of clusters equal to 3. The "Convenient" cluster, characterized by the consumption of convenient foods, was the most prevalent, with 46% of the population having this dietary behaviour. In addition, "Prudent" and "Non-Prudent" patterns, associated respectively with healthy and unhealthy dietary habits, were adopted by 25% and 29% of the population. The "Convenient" and "Non-Prudent" clusters were associated with higher cardiovascular risk, whereas the "Prudent" pattern was associated with decreased cardiovascular risk. Associations with other factors showed that the choice of a specific dietary pattern is part of a wider lifestyle profile. This study is of interest for both researchers and public health professionals. From a methodological standpoint, we showed that using the stability of clustering solutions can help researchers in their choices. From a public health perspective, this study showed the need for targeted health promotion campaigns describing the benefits of healthy dietary patterns.
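A hedged sketch of stability-based selection of the number of clusters, shown for K-means only: the data are split repeatedly, both halves are clustered, and one solution is projected onto the other half so the agreement can be scored with the adjusted Rand index. The dietary variables are replaced by synthetic data, and this split-and-compare scheme is one reasonable reading of the stability assessment, not the paper's exact protocol.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(4)
X = rng.normal(size=(2298, 15))  # stand-in for the dietary variables

def stability(X, k, n_rep=20):
    scores = []
    for r in range(n_rep):
        idx = rng.permutation(len(X))
        a, b = idx[: len(X) // 2], idx[len(X) // 2 :]
        km_a = KMeans(n_clusters=k, n_init=10, random_state=r).fit(X[a])
        km_b = KMeans(n_clusters=k, n_init=10, random_state=r).fit(X[b])
        # Score agreement on the shared half: b's own labels vs a's model applied to b.
        scores.append(adjusted_rand_score(km_b.labels_, km_a.predict(X[b])))
    return float(np.mean(scores))

print({k: round(stability(X, k), 3) for k in range(2, 7)})
```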
NASA Astrophysics Data System (ADS)
Wang, Lynn T.-N.; Madhavan, Sriram
2018-03-01
A pattern-matching and rule-based polygon clustering methodology with DFM scoring is proposed to detect decomposition-induced manufacturability detractors and fix the layout designs prior to manufacturing. A pattern matcher scans the layout for pre-characterized patterns from a library. If a pattern is detected, rule-based clustering identifies the neighboring polygons that interact with those captured by the pattern. Then, DFM scores are computed for the possible layout fixes: the fix with the best score is applied. The proposed methodology was applied to two 20nm products with a chip area of 11 mm² on the metal 2 layer. All the hotspots were resolved. The number of DFM spacing violations decreased by 7-15%.
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters
NASA Astrophysics Data System (ADS)
Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.
2018-06-01
We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid
2015-01-01
In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict severity of condition for Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based, feature extraction framework using disease diagnostic information. To reduce the dimensionality, we identify disease clusters using co-occurrence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records with ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in the predictability of the severity of condition.
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table-grape industry. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods.
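A minimal OpenCV sketch of the berry-detection step described above. The file name and every threshold are hypothetical and would need tuning per image set; note that cv2.HOUGH_GRADIENT runs a Canny edge detector internally, so the paper's explicit edge step is folded into param1 here.

```python
import cv2

img = cv2.imread("cluster.jpg")  # hypothetical cluster photograph
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)   # suppress speckle before circle detection

# HOUGH_GRADIENT applies Canny internally; param1 is the upper Canny threshold,
# param2 the accumulator threshold for accepting circle centres.
circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
    param1=150, param2=30, minRadius=8, maxRadius=40,
)
n_berries = 0 if circles is None else circles.shape[1]
print("detected berries:", n_berries)
```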
Using scan statistics for congenital anomalies surveillance: the EUROCAT methodology.
Teljeur, Conor; Kelly, Alan; Loane, Maria; Densem, James; Dolk, Helen
2015-11-01
Scan statistics have been used extensively to identify temporal clusters of health events. We describe the temporal cluster detection methodology adopted by the EUROCAT (European Surveillance of Congenital Anomalies) monitoring system. Since 2001, EUROCAT has implemented a variable-window-width scan statistic for detecting unusual temporal aggregations of congenital anomaly cases. The scan windows are based on numbers of cases rather than being defined by time. The methodology is embedded in the EUROCAT Central Database for annual application to centrally held registry data. The methodology was incrementally adapted to improve its utility and to address statistical issues. Simulation exercises were used to determine the power of the methodology to identify periods of raised risk (of 1-18 months). In order to operationalize the scan methodology, a number of adaptations were needed, including: estimating date of conception as the unit of time; deciding the maximum length (in time) and recency of clusters of interest; reporting of multiple and overlapping significant clusters; replacing the Monte Carlo simulation with a lookup table to reduce computation time; and placing a threshold on underlying population change and estimating the false positive rate by simulation. Exploration of power found that raised-risk periods lasting 1 month are unlikely to be detected except when the relative risk and case counts are high. The variable-window-width scan statistic is a useful tool for the surveillance of congenital anomalies. Numerous adaptations have improved the utility of the original methodology in the context of temporal cluster detection in congenital anomalies.
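A toy version of a case-count-based scan, under stated assumptions: windows are defined by a fixed number of consecutive cases, the shortest observed time span for such a window is compared against spans from uniformly re-scattered cases, and Monte Carlo simulation stands in for EUROCAT's lookup table. All counts and dates are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
dates = np.sort(rng.uniform(0, 3650, size=200))  # estimated conception dates (days)

def min_span(times, w):
    """Shortest time span covering any w consecutive cases."""
    return np.min(times[w - 1 :] - times[: 1 - w])

w = 8
observed = min_span(dates, w)

# Reference distribution: the same statistic under a uniform scatter of cases.
sims = [min_span(np.sort(rng.uniform(0, 3650, size=len(dates))), w) for _ in range(999)]
p_value = (1 + sum(s <= observed for s in sims)) / 1000
print(round(observed, 1), p_value)
```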
Enrichment 2.0 Gifted and Talented Education for the 21st Century
ERIC Educational Resources Information Center
Eckstein, Michelle
2009-01-01
Enrichment clusters, a component of the Schoolwide Enrichment Model, are multigrade investigative groups based on constructivist learning methodology. Enrichment clusters are organized around major disciplines, interdisciplinary themes, or cross-disciplinary topics. Within clusters, students are grouped across grade levels by interests and focused…
Distributive Education Competency-Based Curriculum Models by Occupational Clusters. Final Report.
ERIC Educational Resources Information Center
Davis, Rodney E.; Husted, Stewart W.
To meet the needs of distributive education teachers and students, a project was initiated to develop competency-based curriculum models for marketing and distributive education clusters. The models which were developed incorporate competencies, materials and resources, teaching methodologies/learning activities, and evaluative criteria for the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spinella, Corrado; Bongiorno, Corrado; Nicotra, Giuseppe
2005-07-25
We present an analytical methodology, based on electron energy loss spectroscopy (EELS) and energy-filtered transmission electron microscopy, which allows us to quantify the clustered silicon concentration in annealed substoichiometric silicon oxide layers, deposited by plasma-enhanced chemical vapor deposition. The clustered Si volume fraction was deduced from a fit to the experimental EELS spectrum using a theoretical description proposed to calculate the dielectric function of a system of spherical particles of equal radii, located at random in a host material. The methodology allowed us to demonstrate that the clustered Si concentration is only one half of the excess Si concentration dissolved in the layer.
NASA Astrophysics Data System (ADS)
Eliçabe, Guillermo E.
2013-09-01
In this work, an exact scattering model for a system of clusters of spherical particles, based on the Rayleigh-Gans approximation, has been parameterized in such a way that it can be solved in inverse form using Tikhonov regularization to obtain the morphological parameters of the clusters: that is, the average number of particles per cluster, the size of the primary spherical units that form the cluster, and the discrete distance distribution function from which the z-average square radius of gyration of the system of clusters is obtained. The methodology is validated through a series of simulated and experimental examples of x-ray and light scattering, which show that the proposed methodology works satisfactorily in non-ideal situations such as the presence of error in the measurements, the presence of error in the model, and the several types of non-idealities present in the experimental cases.
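A hedged sketch of the inversion step only: Tikhonov regularization applied to a generic linear scattering model I = A p, with a Debye-type sin(qr)/(qr) kernel standing in for the parameterized Rayleigh-Gans cluster model, and a fixed regularization strength in place of a principled choice such as the L-curve. Grids, units and noise level are hypothetical.

```python
import numpy as np

q = np.linspace(0.01, 0.5, 80)        # scattering vector grid (hypothetical units)
r = np.linspace(1.0, 100.0, 60)       # discrete distances
A = np.sinc(np.outer(q, r) / np.pi)   # np.sinc(x) = sin(pi x)/(pi x), so this is sin(qr)/(qr)

rng = np.random.default_rng(6)
p_true = np.exp(-0.5 * ((r - 40.0) / 10.0) ** 2)       # stand-in distance distribution
I = A @ p_true + rng.normal(scale=0.01, size=q.size)   # noisy synthetic intensities

# Tikhonov-regularized least squares: (A^T A + lam I) p = A^T I.
lam = 1e-2  # regularization strength; in practice chosen e.g. via the L-curve
p_hat = np.linalg.solve(A.T @ A + lam * np.eye(r.size), A.T @ I)
print(round(float(np.abs(p_hat - p_true).max()), 3))
```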
Booth, Andrew; Harris, Janet; Croot, Elizabeth; Springett, Jane; Campbell, Fiona; Wilkins, Emma
2013-09-28
Systematic review methodologies can be harnessed to help researchers to understand and explain how complex interventions may work. Typically, when reviewing complex interventions, a review team will seek to understand the theories that underpin an intervention and the specific context for that intervention. A single published report from a research project does not typically contain this required level of detail. A review team may find it more useful to examine a "study cluster": a group of related papers that explore and explain various features of a single project and thus supply necessary detail relating to theory and/or context. We sought to conduct a preliminary investigation, from a single case study review, of techniques required to identify a cluster of related research reports, to document the yield from such methods, and to outline a systematic methodology for cluster searching. In a systematic review of community engagement we identified a relevant project, the Gay Men's Task Force. From a single "key pearl citation" we conducted a series of related searches to find contextually or theoretically proximate documents. We followed up Citations, traced Lead authors, identified Unpublished materials, searched Google Scholar, tracked Theories, undertook ancestry searching for Early examples and followed up Related projects (embodied in the CLUSTER mnemonic). Our structured, formalised procedure for cluster searching identified useful reports that are not typically identified from topic-based searches on bibliographic databases. Items previously rejected by an initial sift were subsequently found to inform our understanding of underpinning theory (for example Diffusion of Innovations Theory), context or both. Relevant material included book chapters, a Web-based process evaluation, and peer-reviewed reports of projects sharing a common ancestry. We used these reports to understand the context for the intervention and to explore explanations for its relative lack of success. Additional data helped us to challenge simplistic assumptions on the homogeneity of the target population. A single case study suggests the potential utility of cluster searching, particularly for reviews that depend on an understanding of context, e.g. realist synthesis. The methodology is transparent, explicit and reproducible. There is no reason to believe that cluster searching is not generalizable to other review topics. Further research should examine the contribution of the methodology beyond improved yield, to the final synthesis and interpretation, possibly by utilizing qualitative sensitivity analysis.
PuReD-MCL: a graph-based PubMed document clustering methodology.
Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A
2008-09-01
Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (PubMed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while at the same time providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
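For readers unfamiliar with MCL, the core iteration is short enough to show directly: alternate expansion (a matrix power) and inflation (an element-wise power followed by column normalization) on a column-stochastic matrix until clusters emerge as attractor rows. The adjacency matrix below is a toy stand-in for the PubMed related-documents graph.

```python
import numpy as np

def mcl(adj, inflation=2.0, n_iter=50):
    M = adj + np.eye(len(adj))            # add self-loops
    M = M / M.sum(axis=0)                 # make columns stochastic
    for _ in range(n_iter):
        M = np.linalg.matrix_power(M, 2)  # expansion: flow spreads along paths
        M = M ** inflation                # inflation: strengthen strong flows
        M = M / M.sum(axis=0)
    # Clusters are read off the attractor rows (near-zero rows are dropped;
    # duplicates may appear when a cluster has several attractors).
    return [np.flatnonzero(row > 1e-6) for row in M if row.max() > 1e-6]

# Toy graph: two triangles joined by one weak bridge edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
print(mcl(adj))
```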
NASA Astrophysics Data System (ADS)
Sierra-Pérez, Julián; Torres-Arredondo, M.-A.; Alvarez-Montoya, Joham
2018-01-01
Structural health monitoring consists of using sensors integrated within structures, together with algorithms, to perform load monitoring, damage detection, damage location, damage size and severity assessment, and prognosis. One possibility is to use strain sensors to infer structural integrity by comparing patterns in the strain field between the pristine and damaged conditions. In previous works, the authors have demonstrated that it is possible to detect small defects based on strain field pattern recognition by using robust machine learning techniques. They have focused on methodologies based on principal component analysis (PCA) and on the development of several unfolding and standardization techniques, which allow dealing with multiple load conditions. However, before a real implementation of this approach in engineering structures, changes in the strain field due to conditions other than damage occurrence need to be isolated. Since load conditions may vary in most engineering structures and promote significant changes in the strain field, it is necessary to implement novel techniques for uncoupling such changes from those produced by damage occurrence. A damage detection methodology based on optimal baseline selection (OBS) by means of clustering techniques is presented. The methodology includes the use of hierarchical nonlinear PCA as a nonlinear modeling technique in conjunction with Q and nonlinear-T² damage indices. The methodology is experimentally validated using strain measurements obtained by 32 fiber Bragg grating sensors bonded to an aluminum beam under dynamic bending loads and simultaneously submitted to variations in its pitch angle. The results demonstrated the capability of the methodology for clustering data according to 13 different load conditions (pitch angles), performing the OBS and detecting six different damage states induced in a cumulative way. The proposed methodology showed a true positive rate of 100% and a false positive rate of 1.28% at a 99% confidence level.
A Wavelet-Based Methodology for Grinding Wheel Condition Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, T. W.; Ting, C.F.; Qu, Jun
2007-01-01
Grinding wheel surface condition changes as more material is removed. This paper presents a wavelet-based methodology for grinding wheel condition monitoring based on acoustic emission (AE) signals. Grinding experiments in creep feed mode were conducted to grind alumina specimens with a resinoid-bonded diamond wheel using two different conditions. During the experiments, AE signals were collected when the wheel was 'sharp' and when the wheel was 'dull'. Discriminant features were then extracted from each raw AE signal segment using the discrete wavelet decomposition procedure. An adaptive genetic clustering algorithm was finally applied to the extracted features in order to distinguish different states of grinding wheel condition. The test results indicate that the proposed methodology can achieve 97% clustering accuracy for the high material removal rate condition, 86.7% for the low material removal rate condition, and 76.7% for the combined grinding conditions if the base wavelet, the decomposition level, and the GA parameters are properly selected.
Modeling and clustering water demand patterns from real-world smart meter data
NASA Astrophysics Data System (ADS)
Cheifetz, Nicolas; Noumir, Zineb; Samé, Allou; Sandraz, Anne-Claire; Féliers, Cédric; Heim, Véronique
2017-08-01
Nowadays, drinking water utilities need an accurate understanding of the water demand on their distribution network in order to operate efficiently, optimize resources, manage billing and propose new customer services. With the emergence of smart grids based on automated meter reading (AMR), a finer-grained understanding of consumption modes is now accessible for smart cities. In this context, this paper evaluates a novel methodology for identifying relevant usage profiles from the water consumption data produced by smart meters. The methodology is fully data-driven, using the consumption time series, which are seen as functions or curves observed with an hourly time step. First, a Fourier-based additive time series decomposition model is introduced to extract seasonal patterns from the time series. These patterns are intended to represent customer habits in terms of water consumption. Two functional clustering approaches are then used to classify the extracted seasonal patterns: the functional version of K-means, and the Fourier REgression Mixture (FReMix) model. The K-means approach produces a hard segmentation and K representative prototypes. On the other hand, FReMix is a generative model and also produces K profiles, as well as a soft segmentation based on the posterior probabilities. The proposed approach is applied to a smart grid deployed on the largest water distribution network (WDN) in France. The two clustering strategies are evaluated and compared. Finally, a realistic interpretation of the consumption habits is given for each cluster. The extensive experiments and the qualitative interpretation of the resulting clusters highlight the effectiveness of the proposed methodology.
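A compact sketch of the two-stage idea on synthetic meter data: a small Fourier basis is fitted to each hourly series to extract a smooth daily pattern, and the fitted coefficients are clustered. Plain K-means on coefficients stands in for both the functional K-means and the FReMix mixture model; the harmonics, horizon and cluster count are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
t = np.arange(24 * 7 * 4)  # four weeks of hourly readings

# Fourier design matrix: intercept plus three daily harmonics.
harmonics = [1, 2, 3]
basis = np.column_stack(
    [np.ones_like(t, dtype=float)]
    + [f(2 * np.pi * h * t / 24.0) for h in harmonics for f in (np.sin, np.cos)]
)

# Synthetic consumption series for 300 meters sharing a daily cycle plus noise.
series = rng.normal(size=(300, t.size)) + np.sin(2 * np.pi * t / 24.0)

# Least-squares fit gives one coefficient vector (seasonal pattern) per meter.
coeffs, *_ = np.linalg.lstsq(basis, series.T, rcond=None)

# Cluster the seasonal patterns rather than the raw series.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coeffs.T)
print(np.bincount(labels))
```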
Marzaro, Giovanni; Ferrarese, Alessandro; Chilin, Adriana
2014-08-01
The selection of the most appropriate protein conformation is a crucial aspect in molecular docking experiments. In order to reduce the errors arising from the use of a single protein conformation, several authors suggest the use of several tridimensional structures for the target. However, the selection of the most appropriate protein conformations still remains a challenging goal. The selection of protein 3D structures is mainly performed based on the computation of pairwise root-mean-square deviation (RMSD) values, followed by hierarchical clustering. Herein we report an alternative strategy, based on the computation of only two atom affinity maps for each protein conformation, followed by multivariate analysis and hierarchical clustering. This methodology was applied to seven different kinases of pharmaceutical interest. The comparison with the classical RMSD-based strategy was based on cross-docking of co-crystallized ligands. In the case of the epidermal growth factor receptor kinase, the docking performance on 220 known ligands was also evaluated, followed by 3D-QSAR studies. In all cases, the methodology proposed herein outperformed the RMSD-based one.
Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane
2015-06-01
Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. However, in practice, the signals collected during plant operation are very often "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the evolution of the measured signals. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also proposed, based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
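A hedged sketch pairing a spectral embedding with a small hand-rolled fuzzy C-means, mirroring the spectral-plus-FCM combination described above. The transient features are synthetic stand-ins, and the FCM update shown is the textbook one rather than the paper's specific variant.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(8)
F = rng.normal(size=(148, 12))  # stand-in features of 148 shut-down transients

# Spectral step: embed the transients using a nearest-neighbour affinity graph.
E = SpectralEmbedding(n_components=3, random_state=0).fit_transform(F)

def fuzzy_cmeans(X, c, m=2.0, n_iter=100):
    """Textbook FCM: alternate center and membership updates."""
    U = rng.dirichlet(np.ones(c), size=len(X))  # fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

U, centers = fuzzy_cmeans(E, c=4)
hard_labels = U.argmax(axis=1)  # defuzzified assignment for inspection
print(np.bincount(hard_labels))
```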
Spatial pattern recognition of seismic events in South West Colombia
NASA Astrophysics Data System (ADS)
Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber
2013-09-01
Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well-founded means to obtain relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data from the South West Colombian seismic database. In this research, clustering tendency analysis validates whether the seismic database possesses a clustering structure. A non-supervised fuzzy clustering algorithm creates groups of seismic events. Given the sensitivity of fuzzy clustering algorithms to the initial positions of centroids, we propose a methodology to initialize centroids that generates stable partitions with respect to centroid initialization. As a result of this work, a public software tool provides the user with the routines developed for the clustering methodology. The analysis of the seismogenic zones obtained reveals meaningful spatial patterns in South-West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretations of seismic activity in South West Colombia.
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques, utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community-based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was applied to the community-based population National Health and Nutrition Examination Study (2009-2010) (N = 3,922). A self-organizing map (SOM) ML algorithm, combined with hierarchical clustering, was used to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was then used to identify the key clusters of participants with higher levels of depression (PHQ-9 ≥ 10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified as having significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters. This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
Gibert, Karina; García-Rudolph, Alejandro; García-Molina, Alberto; Roig-Rovira, Teresa; Bernabeu, Montse; Tormos, José María
2008-01-01
The objective was to develop a classificatory tool to identify different populations of patients with Traumatic Brain Injury based on the characteristics of deficit and response to treatment. A KDD framework was used in which, first, descriptive statistics of every variable, data cleaning and selection of relevant variables were carried out. Data were then mined using a generalization of Clustering Based on Rules (CIBR), a hybrid AI-and-statistics technique which combines inductive learning (AI) and clustering (statistics). A prior Knowledge Base (KB) is considered to properly bias the clustering; semantic constraints implied by the KB hold in the final clusters, guaranteeing interpretability of the results. A generalization (Exogenous Clustering Based on Rules, ECIBR) is presented, allowing the KB to be defined in terms of variables which are not considered in the clustering process itself, to gain more flexibility. Several tools, such as the class panel graph, are introduced in the methodology to assist final interpretation. A set of 5 classes was recommended by the system, and interpretation permitted the labeling of profiles. From the medical point of view, the composition of the classes corresponds well with different patterns of increasing response to rehabilitation treatments. All the patients initially assessable form a single group. Severely impaired patients are subdivided into four profiles with clearly distinct response patterns. Particularly interesting is the partial-response profile, in which patients could not improve executive functions. Meaningful classes were obtained and, from a semantic point of view, the results were considerably improved with respect to classical clustering, supporting our opinion that hybrid AI-and-statistics techniques are more powerful for KDD than pure ones.
Nguyen, Hien D; Ullmann, Jeremy F P; McLachlan, Geoffrey J; Voleti, Venkatakaushik; Li, Wenze; Hillman, Elizabeth M C; Reutens, David C; Janke, Andrew L
2018-02-01
Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques are enabling visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via Gaussian mixtures is proposed for the clustering of data from such visualizations. The methodology is theoretically justified, and a computationally efficient approach to estimation is suggested. An example analysis of a zebrafish imaging experiment is presented.
Automatic Clustering of Rolling Element Bearings Defects with Artificial Neural Network
NASA Astrophysics Data System (ADS)
Antonini, M.; Faglia, R.; Pedersoli, M.; Tiboni, M.
2006-06-01
The paper presents the optimization of a methodology for automatic clustering, based on Artificial Neural Networks, to detect the presence of defects in rolling bearings. The research activity was developed in co-operation with an Italian company specializing in the production of water pumps for automotive use (Industrie Saleri Italo). The final goal of the work is to develop a system for the automatic control of the pumps at the end of the production line. From this viewpoint, we are gradually considering the main elements of the water pump which can cause malfunctioning. The first elements we have considered are the rolling bearings, a very critical component of the system. The experimental activity is based on vibration measurements of deliberately damaged rolling bearings; in the second phase the vibration signals are processed; the third and last phase is automatic clustering. Different signal processing techniques are compared to optimize the methodology.
A flexible data-driven comorbidity feature extraction framework.
Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid
2016-06-01
Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset, which includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks, including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to yield significant gains of 10.7-22.1% in predictive accuracy for CHF severity-of-condition prediction and 4.65-5.75% in diabetes readmission prediction.
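A minimal sketch of the co-occurrence feature idea, assuming synthetic admissions: build a code-by-code co-occurrence matrix, turn it into a distance, cluster the codes hierarchically, and encode each admission as counts per code cluster. The code counts, cluster number and distance transform are all hypothetical choices, not the paper's exact ones.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(9)
n_codes, n_admissions, n_clusters = 60, 5000, 10
B = (rng.random((n_admissions, n_codes)) < 0.08).astype(float)  # admission x code indicators

# Code-by-code co-occurrence counts, converted to a crude distance.
C = B.T @ B
np.fill_diagonal(C, 0.0)
D = 1.0 - C / C.max()
np.fill_diagonal(D, 0.0)

Z = linkage(squareform(D, checks=False), method="average")
code_cluster = fcluster(Z, t=n_clusters, criterion="maxclust") - 1

# Patient-level features: number of recorded codes falling in each code cluster.
features = np.stack([B[:, code_cluster == c].sum(axis=1) for c in range(n_clusters)], axis=1)
print(features.shape)
```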
Clustering molecular dynamics trajectories for optimizing docking experiments.
De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D; Norberto de Souza, Osmar; Barros, Rodrigo C
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in the virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand. PMID:25873944
Clustering of financial time series with application to index and enhanced index tracking portfolio
NASA Astrophysics Data System (ADS)
Dose, Christian; Cincotti, Silvano
2005-09-01
A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as the result of an optimization process (asset allocation). The present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, while not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.
Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O
2015-01-01
Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes, and to evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
NASA Astrophysics Data System (ADS)
Li, Xiwang
Buildings consume about 41.1% of primary energy and 74% of the electricity in the U.S. Moreover, the National Energy Technology Laboratory estimates that more than 1/4 of the 713 GW of U.S. electricity demand in 2010 could be dispatchable if only buildings could respond to that dispatch through advanced building energy control and operation strategies and smart grid infrastructure. In this study, it is envisioned that neighboring buildings will have the tendency to form a cluster, an open cyber-physical system, to exploit the economic opportunities provided by a smart grid, distributed power generation, and storage devices. Through optimized demand management, these building clusters will then reduce overall primary energy consumption and peak-time electricity consumption, and be more resilient to power disruptions. Therefore, this project seeks to develop a net-zero building cluster simulation testbed and high-fidelity energy forecasting models for adaptive and real-time control and decision-making strategy development that can be used in a net-zero building cluster. The following research activities are summarized in this thesis: 1) development of a building cluster emulator for building cluster control and operation strategy assessment; 2) development of a novel building energy forecasting methodology using active system identification and data fusion techniques, including a systematic approach for building energy system characteristic evaluation, system excitation and model adaptation, with the developed methodology compared against other literature-reported building energy forecasting methods; 3) development of high-fidelity on-line building cluster energy forecasting models, including energy forecasting models for buildings, PV panels, batteries and ice-tank thermal storage systems; and 4) a small-scale real-building validation study to verify the performance of the developed building energy forecasting methodology. The outcomes of this thesis can be used for building cluster energy forecasting model development and model-based control and operation optimization. The thesis concludes with a summary of the key outcomes of this research, as well as a list of recommendations for future work.
Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M
2016-01-01
Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential.
Eyler, Lauren; Hubbard, Alan; Juillard, Catherine
2016-10-01
Low- and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma-system-strengthening interventions targeted towards vulnerable populations, but such data are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited-country trauma registries, where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster-analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximizes average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries.
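A hedged sketch of the selection loop described above: for every combination of g asset variables, run a simple k-medoids (a Voronoi-iteration variant, unweighted here for brevity, whereas the paper weights households by survey weights) on a simple matching distance, and keep the combination and cluster count with the highest average silhouette width. Household data, g and the cluster-count range are synthetic placeholders.

```python
import itertools
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(10)
assets = rng.integers(0, 2, size=(400, 6))  # binary household asset indicators

def kmedoids(D, k, n_iter=30):
    """Voronoi-iteration k-medoids on a precomputed distance matrix."""
    medoids = rng.choice(len(D), size=k, replace=False)
    labels = D[:, medoids].argmin(axis=1)
    for _ in range(n_iter):
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # keep old medoid if a cluster empties out
                within = D[np.ix_(members, members)].sum(axis=1)
                medoids[c] = members[within.argmin()]
        labels = D[:, medoids].argmin(axis=1)
    return labels

best = (-1.0, None, None)
for combo in itertools.combinations(range(assets.shape[1]), 3):  # g = 3 variables
    X = assets[:, combo]
    D = (X[:, None, :] != X[None, :, :]).mean(axis=2)  # simple matching distance
    for k in range(2, 6):
        labels = kmedoids(D, k)
        if len(np.unique(labels)) < 2:
            continue
        asw = silhouette_score(D, labels, metric="precomputed")
        if asw > best[0]:
            best = (asw, combo, k)
print(best)  # (best ASW, variable combination, cluster count)
```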
Application of Classification Methods for Forecasting Mid-Term Power Load Patterns
NASA Astrophysics Data System (ADS)
Piao, Minghao; Lee, Heon Gyu; Park, Jin Hyoung; Ryu, Keun Ho
An automated methodology based on data mining techniques is presented for the prediction of customer load patterns from long-duration load profiles. The proposed approach consists of three stages: (i) data preprocessing: noise and outliers are removed, and continuous attribute-valued features are transformed to discrete values; (ii) cluster analysis: k-means clustering is used to create load pattern classes and the representative load profiles for each class; and (iii) classification: several supervised learning methods are evaluated in order to select a suitable prediction method. Following the proposed methodology, power load measured by an AMR (automatic meter reading) system, as well as customer indexes, were used as inputs for clustering. The output of clustering was the classification of representative load profiles (or classes). To evaluate the forecasting of load patterns, the classification methods were applied to a set of high-voltage customers of the Korean power system, with class labels derived from clustering and other features used as inputs to build the classifiers. Lastly, the results of our experiments are presented.
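A minimal sketch of the three-stage flow on synthetic data appears below; the profile length, feature names, and cluster count are illustrative assumptions, and a random forest stands in for whichever supervised learner the evaluation would select.

```python
# Minimal sketch of the three stages on synthetic data: (i) crude outlier
# removal, (ii) k-means to derive load-pattern classes, (iii) a supervised
# model (random forest, as a stand-in) predicting the class from customer
# features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
profiles = rng.random((300, 24))       # daily load profiles (24 hourly values)
customer_feats = rng.random((300, 5))  # e.g. contract type, voltage level, ...

# (i) drop profiles with extreme total consumption
total = profiles.sum(axis=1)
keep = np.abs(total - total.mean()) < 3 * total.std()
profiles, customer_feats = profiles[keep], customer_feats[keep]

# (ii) cluster profiles into pattern classes
classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(profiles)

# (iii) learn to predict the pattern class from customer features alone
Xtr, Xte, ytr, yte = train_test_split(customer_feats, classes, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("holdout accuracy: %.2f" % clf.score(Xte, yte))
```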
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Gurmeet; Nandi, Apurba; Gadre, Shridhar R., E-mail: gadre@iitk.ac.in
2016-03-14
A pragmatic method based on the molecular tailoring approach (MTA) for accurately estimating the complete basis set (CBS) limit at the second-order Møller-Plesset perturbation theory (MP2) level for large molecular clusters with limited computational resources is developed. It is applied to water clusters, (H2O)n (n = 7, 8, 10, 16, 17, and 25), optimized employing the aug-cc-pVDZ (aVDZ) basis set. Binding energies (BEs) of these clusters are estimated at the MP2/aug-cc-pVNZ (aVNZ) [N = T, Q, and 5 (whenever possible)] levels of theory employing the grafted MTA (GMTA) methodology and are found to lie within 0.2 kcal/mol of the corresponding full-calculation MP2 BE, wherever available. The results are extrapolated to the CBS limit using a three-point formula. The GMTA-MP2 calculations are feasible on off-the-shelf hardware and show around 50%-65% savings of computational time. The methodology has potential for application to molecular clusters containing ∼100 atoms.
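For the extrapolation step, a common three-point formula assumes exponential convergence of the energy with the basis-set cardinal number N; the abstract does not state which formula the authors use, and the energies below are invented placeholders.

```python
# Three-point extrapolation to the CBS limit, assuming exponential convergence
# E(N) = E_CBS + A*B**N in the cardinal number N. Eliminating A and B from the
# three equations for N, N+1, N+2 gives the closed form below.
e3, e4, e5 = -36.20, -36.95, -37.22  # aVTZ, aVQZ, aV5Z binding energies (kcal/mol)

e_cbs = (e3 * e5 - e4**2) / (e3 + e5 - 2 * e4)
print("CBS-extrapolated binding energy: %.2f kcal/mol" % e_cbs)
```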
Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.
2015-01-01
Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
Population Structure With Localized Haplotype Clusters
Browning, Sharon R.; Weir, Bruce S.
2010-01-01
We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877
Multi-viewpoint clustering analysis
NASA Technical Reports Server (NTRS)
Mehrotra, Mala; Wild, Chris
1993-01-01
In this paper, we address the feasibility of partitioning rule-based systems into a number of meaningful units to enhance the comprehensibility, maintainability and reliability of expert systems software. Preliminary results have shown that no single structuring principle or abstraction hierarchy is sufficient to understand complex knowledge bases. We therefore propose the Multi View Point - Clustering Analysis (MVP-CA) methodology to provide multiple views of the same expert system. We present the results of using this approach to partition a deployed knowledge-based system that navigates the Space Shuttle's entry. We also discuss the impact of this approach on verification and validation of knowledge-based systems.
Vautier, S; Jmel, S; Fourio, C; Moncany, D
2007-09-01
The present study investigates the heterogeneity of the population of young adult drinkers with respect to alcohol consumption and Positive Alcohol Expectancies (PAEs). Based on the positive relationship between both kinds of variables, PAE is commonly viewed as a potential motivational factor in alcohol addiction. However, empirical analyses based on the regression of alcohol consumption on PAEs assume that the observations are statistically homogeneous with respect to the level of alcohol consumption. We explored the existence of moderate drinkers with a high PAE profile, and abusive drinkers with a low PAE profile. 1,017 young adult drinkers, mean age = 23 +/- 2.84, with various educational levels, comprising 506 males and 511 females, were recruited as voluntary participants in a survey by undergraduate psychology students from the University of Toulouse Le Mirail. They completed a French version of the Alcohol Use Disorders Identification Test (AUDIT) and a French adaptation of the Alcohol Expectancy Questionnaire (AEQ). Three levels of alcohol consumption were defined using the AUDIT score, and six composite scores were obtained by averaging the relevant item-scores from the AEQ. The AEQ scores were interpreted as measurements of six kinds of PAEs, namely Global positive change, Sexual enhancement, Social and physical pleasure, Social assertiveness, Relaxation, and Arousal/Power. The TwoStep cluster methodology was used to explore the data. This methodology is convenient for dealing with a mix of quantitative and qualitative variables, and it provides a classification model which is optimized through the use of an information criterion such as Schwarz's Bayesian Information Criterion (BIC). The automatic clustering suggested five clusters, whose stability was ascertained down to 75% of the sample size. Low drinkers (n=527) were split into one cluster of low PAEs (I1) and, interestingly, one cluster of high PAEs (I3, 46%). High drinkers (n=344) were split into one cluster of intermediate PAEs (II4) and one cluster of high PAEs (II5, 52%). Interestingly again, abusive drinkers (n=146) remained a single group (III2), exhibiting high PAEs. Clusters I3 and III2 comprised a significant proportion of males. Constraining the algorithm to find 6 clusters did not affect class III2, but split low drinkers into three clusters. Although the present results should be considered cautiously because of the novelty of the TwoStep cluster methodology, they suggest a group of moderate drinkers with high PAEs. Also, abusive drinkers express high PAEs (except for 2 cases). Statistical homogeneity of moderate drinkers with respect to PAE variables appears to be a dubious assumption.
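SPSS's TwoStep procedure itself is proprietary, but its BIC-guided choice of the number of clusters can be approximated in open-source form with Gaussian mixtures; the sketch below uses a synthetic stand-in for the AUDIT level plus the six PAE composite scores.

```python
# Open-source analogue of BIC-guided cluster-number selection: fit Gaussian
# mixtures for k = 1..8 and keep the k with the lowest Bayesian Information
# Criterion. Data are synthetic placeholders, not the survey data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=m, scale=0.7, size=(200, 7)) for m in (0, 2, 4)])

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 9)}
print("BIC-selected number of clusters:", min(bics, key=bics.get))
```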
NASA Astrophysics Data System (ADS)
Audebert, M.; Clément, R.; Touze-Foltz, N.; Günther, T.; Moreau, S.; Duquennoi, C.
2014-12-01
Leachate recirculation is a key process in municipal waste landfills functioning as bioreactors. To quantify the water content and to assess the leachate injection system, in-situ methods are required to obtain spatially distributed information, usually electrical resistivity tomography (ERT). This geophysical method is based on the inversion process, which presents two major problems in terms of delimiting the infiltration area. First, it is difficult for ERT users to choose an appropriate inversion parameter set. Indeed, it might not be sufficient to interpret only the optimum model (i.e. the model with the chosen regularisation strength) because it is not necessarily the model which best represents the physical process studied. Second, it is difficult to delineate the infiltration front based on resistivity models because of the smoothness of the inversion results. This paper proposes a new methodology called MICS (multiple inversions and clustering strategy), which allows ERT users to improve the delimitation of the infiltration area in leachate injection monitoring. The MICS methodology is based on (i) a multiple inversion step by varying the inversion parameter values to take a wide range of resistivity models into account and (ii) a clustering strategy to improve the delineation of the infiltration front. In this paper, MICS was assessed on two types of data. First, a numerical assessment allows us to optimise and test MICS for different infiltration area sizes, contrasts and shapes. Second, MICS was applied to a field data set gathered during leachate recirculation on a bioreactor.
Carrol, N V; Gagon, J P
1983-01-01
Because of increasing competition, it is becoming more important that health care providers pursue consumer-based market segmentation strategies. This paper presents a methodology for identifying and describing consumer segments in health service markets, and demonstrates the use of the methodology by presenting a study of consumer segments in the ambulatory care pharmacy market.
Vranish, James N.; Russell, William K.; Yu, Lusa E.; ...
2014-12-05
Iron–sulfur (Fe–S) clusters are protein cofactors that are constructed and delivered to target proteins by elaborate biosynthetic machinery. Mechanistic insights into these processes have been limited by the lack of sensitive probes for tracking Fe–S cluster synthesis and transfer reactions. Here we present fusion protein- and intein-based fluorescent labeling strategies that can probe Fe–S cluster binding. The fluorescence is sensitive to different cluster types ([2Fe–2S] and [4Fe–4S] clusters), ligand environments ([2Fe–2S] clusters on Rieske, ferredoxin (Fdx), and glutaredoxin), and cluster oxidation states. The power of this approach is highlighted with an extreme example in which the kinetics of Fe–S cluster transfer reactions are monitored between two Fdx molecules that have identical Fe–S spectroscopic properties. This exchange reaction between labeled and unlabeled Fdx is catalyzed by dithiothreitol (DTT), a result that was confirmed by mass spectrometry. DTT likely functions in a ligand substitution reaction that generates a [2Fe–2S]–DTT species, which can transfer the cluster to either labeled or unlabeled Fdx. The ability to monitor this challenging cluster exchange reaction indicates that real-time Fe–S cluster incorporation can be tracked for a specific labeled protein in multicomponent assays that include several unlabeled Fe–S binding proteins or other chromophores. Such advanced kinetic experiments are required to untangle the intricate networks of transfer pathways and the factors affecting flux through branch points. High sensitivity and suitability with high-throughput methodology are additional benefits of this approach. Lastly, we anticipate that this cluster detection methodology will transform the study of Fe–S cluster pathways and potentially other metal cofactor biosynthetic pathways.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carmichael, Joshua Daniel; Carr, Christina; Pettit, Erin C.
We apply a fully autonomous icequake detection methodology to a single day of high-sample-rate (200 Hz) seismic network data recorded from the terminus of Taylor Glacier, Antarctica, that temporally coincided with a brine release episode near Blood Falls (May 13, 2014). We demonstrate a statistically validated procedure to assemble waveforms triggered by icequakes into populations of clusters linked by intra-event waveform similarity. Our processing methodology implements a noise-adaptive power detector coupled with a complete-linkage clustering algorithm and noise-adaptive correlation detector. This detector-chain reveals a population of 20 multiplet sequences that includes ~150 icequakes and produces zero false alarms on the concurrent, diurnally variable noise. Our results are very promising for identifying changes in background seismicity associated with the presence or absence of brine release episodes. We thereby suggest that our methodology could be applied to longer time periods to establish a brine-release monitoring program for Blood Falls that is based on icequake detections.
The use of hierarchical clustering for the design of optimized monitoring networks
NASA Astrophysics Data System (ADS)
Soares, Joana; Makar, Paul Andrew; Aklilu, Yayne; Akingunola, Ayodeji
2018-05-01
Associativity analysis is a powerful tool to deal with large-scale datasets by clustering the data on the basis of (dis)similarity and can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1 - R, R being the Pearson correlation coefficient, and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter timescales (hourly to daily) due to short-term variation of concentrations and that longer-term averages in data collection may lose the information needed to identify local sources. However, the methodology identifies stations mainly influenced by seasonality, if larger timescales (weekly to monthly) are considered. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. We have also shown that our approach is capable of identifying different sampling methodologies as well as outliers (stations' time series which are markedly different from all others in a given dataset).
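Both dissimilarity metrics named above are directly available in SciPy; the sketch below (synthetic station series, arbitrary cluster count) feeds each condensed distance matrix into hierarchical clustering, with the built-in "correlation" metric supplying exactly 1 - R.

```python
# Hierarchical clustering of station time series under the two metrics used
# in the paper: 1 - Pearson R (SciPy's "correlation") and Euclidean distance.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
series = rng.random((20, 8760))  # 20 stations, one year of hourly data

for name, metric in (("1 - R", "correlation"), ("Euclidean", "euclidean")):
    d = pdist(series, metric=metric)
    labels = fcluster(linkage(d, method="average"), t=4, criterion="maxclust")
    print("%-9s cluster sizes: %s" % (name, np.bincount(labels)[1:]))
```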
Phenetic Comparison of Prokaryotic Genomes Using k-mers
Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques
2017-01-01
Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
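A toy illustration of k-mer-content comparison (Ray Surveyor itself operates on full genomes at scale): extract each sequence's k-mer set and compare the sets pairwise, here with Jaccard similarity as an assumed measure.

```python
# Toy k-mer comparison: build k-mer sets per sequence and score all pairs
# with Jaccard similarity. Sequences and k are illustrative only.
from itertools import combinations

def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

genomes = {
    "g1": "ATGCGTACGTTAGCATGCGT",
    "g2": "ATGCGTACGTTAGCATGCAA",
    "g3": "TTTTGGGGCCCCAAAATTTT",
}

sets = {name: kmers(s) for name, s in genomes.items()}
for a, b in combinations(genomes, 2):
    jaccard = len(sets[a] & sets[b]) / len(sets[a] | sets[b])
    print(a, b, round(jaccard, 2))
```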
MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.
Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S
2014-01-01
A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA.
Clustering-Based Ensemble Learning for Activity Recognition in Smart Homes
Jurek, Anna; Nugent, Chris; Bi, Yaxin; Wu, Shengli
2014-01-01
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks. PMID:25014095
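A minimal sketch of the cluster-based ensemble idea on synthetic data: one collection of clusters per feature subset, each cluster labelled with the majority class of its training members, and prediction by majority vote over the nearest cluster in each collection. The subsets and cluster counts are illustrative assumptions, not the paper's settings.

```python
# Cluster-based ensemble classification sketch on synthetic blobs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=400, centers=3, n_features=6, random_state=0)
subsets = [(0, 1), (2, 3), (4, 5)]  # one ensemble member per feature subset

members = []
for cols in subsets:
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X[:, cols])
    # label each cluster with the majority class of its training members
    cluster_class = {c: np.bincount(y[km.labels_ == c]).argmax()
                     for c in range(km.n_clusters)}
    members.append((cols, km, cluster_class))

def predict(x):
    votes = [cc[km.predict(x[list(cols)].reshape(1, -1))[0]]
             for cols, km, cc in members]
    return np.bincount(votes).argmax()

print("predicted:", predict(X[0]), "actual:", y[0])
```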
Who should be undertaking population-based surveys in humanitarian emergencies?
Spiegel, Paul B
2007-01-01
Background Timely and accurate data are necessary to prioritise and effectively respond to humanitarian emergencies. 30-by-30 cluster surveys are commonly used in humanitarian emergencies because of their purported simplicity and reasonable validity and precision. Agencies have increasingly used 30-by-30 cluster surveys to undertake measurements beyond immunisation coverage and nutritional status. Methodological errors in cluster surveys have likely occurred for decades in humanitarian emergencies, often with unknown or unevaluated consequences. Discussion Most surveys in humanitarian emergencies are done by non-governmental organisations (NGOs). Some undertake good quality surveys while others have an already overburdened staff with limited epidemiological skills. Manuals explaining cluster survey methodology are available and in use. However, it is debatable as to whether using standardised, 'cookbook' survey methodologies is appropriate. Coordination of surveys is often lacking. If a coordinating body is established, as recommended, it is questionable whether it should have sole authority to release surveys due to insufficient independence. Donors should provide sufficient funding for personnel, training, and survey implementation, and not solely for direct programme implementation. Summary A dedicated corps of trained epidemiologists needs to be identified and made available to undertake surveys in humanitarian emergencies. NGOs in the field may need to form an alliance with certain specialised agencies or pool technically capable personnel. If NGOs continue to do surveys by themselves, a simple training manual with sample survey questionnaires, methodology, standardised files for data entry and analysis, and manual for interpretation should be developed and modified locally for each situation. At the beginning of an emergency, a central coordinating body should be established that has sufficient authority to set survey standards, coordinate when and where surveys should be undertaken and act as a survey repository. Technical expertise is expensive and donors must pay for it. As donors increasingly demand evidence-based programming, they have an obligation to ensure that sufficient funds are provided so organisations have adequate technical staff. PMID:17543107
A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition
Austin, Elena; Coull, Brent A.; Zanobetti, Antonella; Koutrakis, Petros
2013-01-01
Background Heterogeneity in the response to PM2.5 is hypothesized to be related to differences in particle composition across monitoring sites which reflect differences in source types as well as climatic and topographic conditions impacting different geographic locations. Identifying spatial patterns in particle composition is a multivariate problem that requires novel methodologies. Objectives Use cluster analysis methods to identify spatial patterns in PM2.5 composition. Verify that the resulting clusters are distinct and informative. Methods 109 monitoring sites with 75% reported speciation data during the period 2003–2008 were selected. These sites were categorized based on their average PM2.5 composition over the study period using k-means cluster analysis. The obtained clusters were validated and characterized based on their physico-chemical characteristics, geographic locations, emissions profiles, population density and proximity to major emission sources. Results Overall 31 clusters were identified. These include 21 clusters with 2 or more sites which were further grouped into 4 main types using hierarchical clustering. The resulting groupings are chemically meaningful and represent broad differences in emissions. The remaining clusters, encompassing single sites, were characterized based on their particle composition and geographic location. Conclusions The framework presented here provides a novel tool which can be used to identify and further classify sites based on their PM2.5 composition. The solution presented is fairly robust and yielded groupings that were meaningful in the context of air-pollution research. PMID:23850585
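The two-stage grouping (k-means on per-site composition, then hierarchical clustering of the resulting centroids into a few broad types) can be sketched as follows; the species counts and values are synthetic, with the cluster numbers 31 and 4 echoing the abstract.

```python
# Two-stage grouping sketch: k-means over site composition, then Ward
# hierarchical clustering of the centroids into a few main types.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
sites = rng.random((109, 15))  # mean mass fractions of 15 species per site

km = KMeans(n_clusters=31, n_init=10, random_state=0).fit(sites)
types = fcluster(linkage(km.cluster_centers_, method="ward"),
                 t=4, criterion="maxclust")
print("centroids per type:", np.bincount(types)[1:])
```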
Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal
2013-07-01
Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. The Analytical Hierarchy Process provides a methodology for both a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding weights assigned to the different criteria. Implementing this methodology using 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinions in this field beyond professional divisions. The use of Agglomerative Hierarchical Clustering made it possible to identify clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of natural conditions in allocating green landscape corridors.
Initial Analysis of and Predictive Model Development for Weather Reroute Advisory Use
NASA Technical Reports Server (NTRS)
Arneson, Heather M.
2016-01-01
In response to severe weather conditions, traffic management coordinators specify reroutes to route air traffic around affected regions of airspace. Providing analysis and recommendations of available reroute options would assist the traffic management coordinators in making more efficient rerouting decisions. These recommendations can be developed by examining historical data to determine which previous reroute options were used in similar weather and traffic conditions, essentially using previous information to inform future decisions. This paper describes the initial steps and methodology used towards this goal. A method to extract relevant features from the large volume of weather data to quantify the convective weather scenario during a particular time range is presented. Similar routes are clustered. An algorithm to identify which clusters of reroute advisories were actually followed by pilots is described. Models built for fifteen of the top twenty most frequently used reroute clusters correctly predict the use of the cluster for over 60% of the test examples. Results are preliminary but indicate that the methodology is worth pursuing with modifications based on insight gained from this analysis.
MVP-CA Methodology for the Expert System Advocate's Advisor (ESAA)
DOT National Transportation Integrated Search
1997-11-01
The Multi-Viewpoint Clustering Analysis (MVP-CA) tool is a semi-automated tool to provide a valuable aid for comprehension, verification, validation, maintenance, integration, and evolution of complex knowledge-based software systems. In this report,...
Models of Educational Attainment: A Theoretical and Methodological Critique
ERIC Educational Resources Information Center
Byrne, D. S.; And Others
1973-01-01
Uses cluster analysis techniques to show that egalitarian policies in secondary education coupled with high financial inputs have measurable payoffs in higher attainment rates, based on Max Weber's notion of "power" within a community. (Author/JM)
Fuzzy Set Methods for Object Recognition in Space Applications
NASA Technical Reports Server (NTRS)
Keller, James M. (Editor)
1992-01-01
Progress on the following four tasks is described: (1) fuzzy set based decision methodologies; (2) membership calculation; (3) clustering methods (including derivation of pose estimation parameters), and (4) acquisition of images and testing of algorithms.
Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets
2017-07-01
Fragments recovered from the report abstract (table-of-contents residue removed): a principled methodology for two-sample graph testing; a provably almost-surely perfect vertex clustering algorithm for block model graphs; a semi-supervised clustering methodology; robust hypothesis testing; and a representation in finite-dimensional Euclidean space that allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed.
Creating peer groups for assessing and comparing nursing home performance.
Byrne, Margaret M; Daw, Christina; Pietz, Ken; Reis, Brian; Petersen, Laura A
2013-11-01
Publicly reported performance data for hospitals and nursing homes are becoming ubiquitous. For such comparisons to be fair, facilities must be compared with their peers. To adapt a previously published methodology for developing hospital peer groupings so that it is applicable to nursing homes and to explore the characteristics of "nearest-neighbor" peer groupings. Analysis of Department of Veterans Affairs administrative databases and nursing home facility characteristics. The nearest-neighbor methodology for developing peer groupings involves calculating the Euclidean distance between facilities based on facility characteristics. We describe our steps in selection of facility characteristics, describe the characteristics of nearest-neighbor peer groups, and compare them with peer groups derived through classical cluster analysis. The facility characteristics most pertinent to nursing home groupings were found to be different from those that were most relevant for hospitals. Unlike classical cluster groups, nearest neighbor groups are not mutually exclusive, and the nearest-neighbor methodology resulted in nursing home peer groupings that were substantially less diffuse than nursing home peer groups created using traditional cluster analysis. It is essential that healthcare policy makers and administrators have a means of fairly grouping facilities for the purposes of quality, cost, or efficiency comparisons. In this research, we show that a previously published methodology can be successfully applied to a nursing home setting. The same approach could be applied in other clinical settings such as primary care.
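A minimal sketch of the nearest-neighbour grouping idea, assuming standardized facility characteristics and Euclidean distance as described; the characteristics, facility count, and peer-group size m are arbitrary placeholders.

```python
# Nearest-neighbour peer groups: standardize the characteristics, compute
# pairwise Euclidean distances, and take each facility's m closest
# facilities as its (non-mutually-exclusive) peer group.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(6)
chars = rng.random((50, 4))                 # e.g. size, case mix, staffing
z = (chars - chars.mean(0)) / chars.std(0)  # standardize each characteristic

D = cdist(z, z)
np.fill_diagonal(D, np.inf)                 # a facility is not its own peer
m = 10
peers = np.argsort(D, axis=1)[:, :m]        # indices of the m nearest peers
print("peer group of facility 0:", peers[0])
```

Note that, unlike a partition from classical cluster analysis, the resulting groups overlap: each facility sits at the centre of its own group.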
Samuels, Aaron M; Awino, Nobert; Odongo, Wycliffe; Abong'o, Benard; Gimnig, John; Otieno, Kephas; Shi, Ya Ping; Were, Vincent; Allen, Denise Roth; Were, Florence; Sang, Tony; Obor, David; Williamson, John; Hamel, Mary J; Patrick Kachur, S; Slutsker, Laurence; Lindblade, Kim A; Kariuki, Simon; Desai, Meghna
2017-06-07
Most human Plasmodium infections in western Kenya are asymptomatic and are believed to contribute importantly to malaria transmission. Elimination of asymptomatic infections requires active treatment approaches, such as mass testing and treatment (MTaT) or mass drug administration (MDA), as infected persons do not seek care for their infection. Evaluations of community-based approaches that are designed to reduce malaria transmission require careful attention to study design to ensure that important effects can be measured accurately. This manuscript describes the study design and methodology of a cluster-randomized controlled trial to evaluate a MTaT approach for malaria transmission reduction in an area of high malaria transmission. Ten health facilities in western Kenya were purposively selected for inclusion. The communities within 3 km of each health facility were divided into three clusters of approximately equal population size. Two clusters around each health facility were randomly assigned to the control arm, and one to the intervention arm. Three times per year for 2 years, after the long and short rains, and again before the long rains, teams of community health volunteers visited every household within the intervention arm, tested all consenting individuals with malaria rapid diagnostic tests, and treated all positive individuals with an effective anti-malarial. The effect of mass testing and treatment on malaria transmission was measured through population-based longitudinal cohorts, outpatient visits for clinical malaria, periodic population-based cross-sectional surveys, and entomological indices.
Construction of a catalog of colliding galaxy clusters
NASA Astrophysics Data System (ADS)
de los Ríos, M.; Domínguez, M. J.; Paz, D.
2015-08-01
In this work we present the first results of the identification of colliding galaxy clusters in galaxy catalogs with redshift measurements (SDSS, 2DF), and introduce the methodology. We calibrated the method by studying the merger trees of clusters in a mock catalog based on a full-blown semi-analytic model of galaxy formation on top of the Millennium cosmological simulation. We also discuss future actions for studying our sample of colliding galaxy clusters, including X-ray observations and mass reconstruction using weak gravitational lensing.
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: Selecting the correct statistical test and data mining method depends strongly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two ordinal-variable clustering examples, ordinal variables being among the more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: Using a clustering algorithm appropriate to the measurement scale of the variables in a study yields high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565
NASA Astrophysics Data System (ADS)
Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.
2016-04-01
It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, are clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best-ranked groups, obtained by the algorithm, using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, given that it does not find all possible subsets, we compared its results against a full search, to determine the number of good co-regulated sets not detected.
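The group-quality idea can be sketched by scoring each candidate cluster with the mean silhouette of its members across several hierarchical-linkage variants; the data, linkage choices, and cut level below are illustrative assumptions, and no exhaustive subset search is attempted.

```python
# Rank individual candidate gene groups by their own mean silhouette,
# generating candidates from several hierarchical-linkage variants.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(7)
expr = rng.random((60, 10))  # 60 genes x 10 experiments

candidates = []
for method in ("average", "complete", "ward"):
    labels = fcluster(linkage(expr, method=method), t=5, criterion="maxclust")
    sil = silhouette_samples(expr, labels)
    for c in np.unique(labels):
        genes = np.where(labels == c)[0]
        candidates.append((float(sil[genes].mean()), method,
                           tuple(int(g) for g in genes)))

for score, method, genes in sorted(candidates, reverse=True)[:3]:
    print("%.2f %-8s genes %s" % (score, method, genes[:6]))
```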
NASA Astrophysics Data System (ADS)
Guo, Yang; Becker, Ute; Neese, Frank
2018-03-01
Local correlation theories have been developed in two main flavors: (1) "direct" local correlation methods apply local approximation to the canonical equations and (2) fragment based methods reconstruct the correlation energy from a series of smaller calculations on subsystems. The present work serves two purposes. First, we investigate the relative efficiencies of the two approaches using the domain-based local pair natural orbital (DLPNO) approach as the "direct" method and the cluster in molecule (CIM) approach as the fragment based approach. Both approaches are applied in conjunction with second-order many-body perturbation theory (MP2) as well as coupled-cluster theory with single-, double- and perturbative triple excitations [CCSD(T)]. Second, we have investigated the possible merits of combining the two approaches by performing CIM calculations with DLPNO methods serving as the method of choice for performing the subsystem calculations. Our cluster-in-molecule approach is closely related to but slightly deviates from approaches in the literature since we have avoided real space cutoffs. Moreover, the neglected distant pair correlations in the previous CIM approach are considered approximately. Six very large molecules (503-2380 atoms) were studied. At both MP2 and CCSD(T) levels of theory, the CIM and DLPNO methods show similar efficiency. However, DLPNO methods are more accurate for 3-dimensional systems. While we have found only little incentive for the combination of CIM with DLPNO-MP2, the situation is different for CIM-DLPNO-CCSD(T). This combination is attractive because (1) the better parallelization opportunities offered by CIM; (2) the methodology is less memory intensive than the genuine DLPNO-CCSD(T) method and, hence, allows for large calculations on more modest hardware; and (3) the methodology is applicable and efficient in the frequently met cases, where the largest subsystem calculation is too large for the canonical CCSD(T) method.
NASA Astrophysics Data System (ADS)
Lerner, Michael G.; Meagher, Kristin L.; Carlson, Heather A.
2008-10-01
Use of solvent mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick (JP) method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein "hot spots." Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages.
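A compact Jarvis-Patrick sketch (not the authors' implementation): points are joined when each lies in the other's k-nearest-neighbour list and they share at least kmin neighbours, and clusters are the connected components of the resulting graph; k and kmin are arbitrary choices here.

```python
# Jarvis-Patrick clustering via shared-nearest-neighbour links and union-find.
import numpy as np
from scipy.spatial.distance import cdist

def jarvis_patrick(X, k=6, kmin=3):
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]      # k nearest neighbours per point
    nnsets = [set(map(int, row)) for row in nn]
    parent = list(range(len(X)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(X)):
        for j in map(int, nn[i]):
            if i in nnsets[j] and len(nnsets[i] & nnsets[j]) >= kmin:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(len(X))])

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(m, 0.3, (30, 3)) for m in (0.0, 3.0)])
labels = jarvis_patrick(X)
print("cluster sizes:", np.unique(labels, return_counts=True)[1])
```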
Generalized quantum kinetic expansion: Higher-order corrections to multichromophoric Förster theory
NASA Astrophysics Data System (ADS)
Wu, Jianlan; Gong, Zhihao; Tang, Zhoufei
2015-08-01
For a general two-cluster energy transfer network, a new methodology of the generalized quantum kinetic expansion (GQKE) method is developed, which predicts an exact time-convolution equation for the cluster population evolution under the initial condition of the local cluster equilibrium state. The cluster-to-cluster rate kernel is expanded over the inter-cluster couplings. The lowest second-order GQKE rate recovers the multichromophoric Förster theory (MCFT) rate. The higher-order corrections to the MCFT rate are systematically included using the continued fraction resummation form, resulting in the resummed GQKE method. The reliability of the GQKE methodology is verified in two model systems, revealing the relevance of higher-order corrections.
Anders, Katherine L; Cutcher, Zoe; Kleinschmidt, Immo; Donnelly, Christl A; Ferguson, Neil M; Indriani, Citra; O'Neill, Scott L; Jewell, Nicholas P; Simmons, Cameron P
2018-05-07
Cluster randomized trials are the gold standard for assessing efficacy of community-level interventions, such as vector control strategies against dengue. We describe a novel cluster randomized trial methodology with a test-negative design, which offers advantages over traditional approaches. It utilizes outcome-based sampling of patients presenting with a syndrome consistent with the disease of interest, who are subsequently classified as test-positive cases or test-negative controls on the basis of diagnostic testing. We use simulations of a cluster trial to demonstrate validity of efficacy estimates under the test-negative approach. This demonstrates that, provided study arms are balanced for both test-negative and test-positive illness at baseline and that other test-negative design assumptions are met, the efficacy estimates closely match true efficacy. We also briefly discuss analytical considerations for an odds ratio-based effect estimate arising from clustered data, and outline potential approaches to analysis. We conclude that application of the test-negative design to certain cluster randomized trials could increase their efficiency and ease of implementation.
Alvarez, Stéphanie; Timler, Carl J.; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A.; Groot, Jeroen C. J.
2018-01-01
Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia’s Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies. PMID:29763422
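The PCA-then-HCA recipe used for these typologies can be sketched in a few lines; the synthetic "farm" variables, component count, and number of farm types are assumptions for illustration only.

```python
# PCA-then-HCA typology sketch: standardize, reduce to a few principal
# components, and cut a Ward dendrogram into farm types.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
farms = rng.random((200, 12))  # land, livestock, labour, income shares, ...

scores = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(farms))
types = fcluster(linkage(scores, method="ward"), t=5, criterion="maxclust")
print("farms per type:", np.bincount(types)[1:])
```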
Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z
2006-08-01
This paper presents a novel segmentation methodology for automated classification and differentiation of soft tissues using multiband data obtained with the newly developed system of high-resolution ultrasonic transmission tomography (HUTT) for imaging biological organs. This methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical κ-means approach for unsupervised clustering (UC). To prevent the trapping of the current iterative minimization AC algorithm in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm that seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. The presented results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means for soft tissue differentiation in HUTT images.
Sánchez, Ariel G.; Grieb, Jan Niklas; Salazar-Albornoz, Salvador; ...
2016-09-30
The cosmological information contained in anisotropic galaxy clustering measurements can often be compressed into a small number of parameters whose posterior distribution is well described by a Gaussian. Here, we present a general methodology to combine these estimates into a single set of consensus constraints that encode the total information of the individual measurements, taking into account the full covariance between the different methods. We also illustrate this technique by applying it to combine the results obtained from different clustering analyses, including measurements of the signature of baryon acoustic oscillations and redshift-space distortions, based on a set of mock catalogues of the final SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS). Our results show that the region of the parameter space allowed by the consensus constraints is smaller than that of the individual methods, highlighting the importance of performing multiple analyses on galaxy surveys even when the measurements are highly correlated. Our paper is part of a set that analyses the final galaxy clustering data set from BOSS. The methodology presented here is used in Alam et al. to produce the final cosmological constraints from BOSS.
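The combination step can be illustrated with generalized least squares: given the stacked means and the full joint covariance of the individual Gaussian estimates, the consensus mean and covariance follow in closed form. All numbers below are invented, and the paper's exact compression scheme may differ.

```python
# GLS combination of two correlated estimates of the same two parameters.
import numpy as np

m = np.array([1.00, 0.50, 1.10, 0.45])  # method 1 (p1, p2), method 2 (p1, p2)
C = np.array([[0.040, 0.000, 0.020, 0.000],
              [0.000, 0.010, 0.000, 0.005],
              [0.020, 0.000, 0.050, 0.000],
              [0.000, 0.005, 0.000, 0.020]])  # full joint covariance

A = np.vstack([np.eye(2), np.eye(2)])   # both methods measure both parameters
Cinv = np.linalg.inv(C)
cons_cov = np.linalg.inv(A.T @ Cinv @ A)
cons_mean = cons_cov @ (A.T @ Cinv @ m)
print("consensus mean:", cons_mean.round(3))
print("consensus sigmas:", np.sqrt(np.diag(cons_cov)).round(3))
```

As the abstract notes, the consensus uncertainties come out smaller than either method's own, even with strong correlation between methods.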
Copula based flexible modeling of associations between clustered event times.
Geerdens, Candida; Claeskens, Gerda; Janssen, Paul
2016-07-01
Multivariate survival data are characterized by the presence of correlation between event times within the same cluster. First, we build multi-dimensional copulas with flexible and possibly symmetric dependence structures for such data. In particular, clustered right-censored survival data are modeled using mixtures of max-infinitely divisible bivariate copulas. Second, these copulas are fit by a likelihood approach where the vast amount of copula derivatives present in the likelihood is approximated by finite differences. Third, we formulate conditions for clustered right-censored survival data under which an information criterion for model selection is either weakly consistent or consistent. Several of the familiar selection criteria are included. A set of four-dimensional data on time-to-mastitis is used to demonstrate the developed methodology.
A self-consistent density based embedding scheme applied to the adsorption of CO on Pd(111)
NASA Astrophysics Data System (ADS)
Lahav, D.; Klüner, T.
2007-06-01
We derive a variant of a density based embedded cluster approach as an improvement to a recently proposed embedding theory for metallic substrates (Govind et al 1999 J. Chem. Phys. 110 7677; Klüner et al 2001 Phys. Rev. Lett. 86 5954). In this scheme, a local region in space is represented by a small cluster which is treated by accurate quantum chemical methodology. The interaction of the cluster with the infinite solid is taken into account by an effective one-electron embedding operator representing the surrounding region. We propose a self-consistent embedding scheme which resolves intrinsic problems of the former theory, in particular a violation of strict density conservation. The proposed scheme is applied to the well-known benchmark system CO/Pd(111).
DOE Office of Scientific and Technical Information (OSTI.GOV)
García-Sánchez, Tania; Gómez-Lázaro, Emilio; Muljadi, E.
An alternative approach to characterise real voltage dips is proposed and evaluated in this study. The proposed methodology is based on voltage-space vector solutions, identifying parameters for ellipse trajectories by applying the least-squares algorithm on a sliding window along the disturbance. The most likely patterns are then estimated through a clustering process based on the k-means algorithm. The objective is to offer an efficient and easily implemented alternative to characterise faults and visualise the most likely instantaneous phase-voltage evolution during events through their corresponding voltage-space vector trajectories. This novel solution minimises the data to be stored but maintains extensive information about the dips, including starting and ending transients. The proposed methodology has been applied satisfactorily to real voltage dips obtained from intensive field-measurement campaigns carried out in a Spanish wind power plant over a period of several years. A comparison to traditional minimum root-mean-square-voltage and time-duration classifications is also included in this study.
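The windowed least-squares ellipse fit can be sketched with an algebraic conic fit, solving a·x² + b·xy + c·y² + d·x + e·y = 1 by ordinary least squares; this is one standard formulation, not necessarily the authors' exact parametrization.

```python
# Algebraic least-squares ellipse fit on a noisy voltage space-vector loop;
# a negative conic discriminant confirms the trajectory is an ellipse.
import numpy as np

rng = np.random.default_rng(10)
t = np.linspace(0, 2 * np.pi, 200)
x = 1.0 * np.cos(t) + 0.01 * rng.normal(size=t.size)  # during-dip trajectory
y = 0.6 * np.sin(t) + 0.01 * rng.normal(size=t.size)

M = np.column_stack([x**2, x * y, y**2, x, y])
(a, b, c, d, e), *_ = np.linalg.lstsq(M, np.ones_like(x), rcond=None)
print("discriminant b^2 - 4ac = %.3f (negative -> ellipse)" % (b**2 - 4 * a * c))
```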
Bitter, Neis A; Roeg, Diana P K; van Nieuwenhuizen, Chijs; van Weeghel, Jaap
2015-07-22
There is an increasing amount of evidence for the effectiveness of rehabilitation interventions for people with severe mental illness (SMI). In the Netherlands, a rehabilitation methodology that is well known and often applied is the Comprehensive Approach to Rehabilitation (CARe) methodology. The overall goal of the CARe methodology is to improve the client's quality of life by supporting the client in realizing his/her goals and wishes, handling his/her vulnerability and improving the quality of his/her social environment. The methodology is strongly influenced by the concept of 'personal recovery' and the 'strengths case management model'. No controlled effect studies have hitherto been conducted on the CARe methodology. This study is a two-armed cluster randomized controlled trial (RCT) that will be executed in teams from three organizations for sheltered and supported housing, which provide services to people with long-term severe mental illness. Teams in the intervention group will receive the multiple-day CARe methodology training from a specialized institute and start working according to the CARe methodology guideline. Teams in the control group will continue working in their usual way. Standardized questionnaires will be completed at baseline (T0), and 10 (T1) and 20 months (T2) post baseline. Primary outcomes are recovery, social functioning and quality of life. The model fidelity of the CARe methodology will be assessed at T1 and T2. This study is the first controlled effect study on the CARe methodology and one of the few RCTs on a broad rehabilitation method or strengths-based approach. This study is relevant because mental health care organizations have become increasingly interested in recovery and rehabilitation-oriented care. The trial registration number is ISRCTN77355880.
Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi
2014-09-01
In China, local governments in many areas give priority to the development of heavy industrial clusters in pursuit of high gross domestic product (GDP) growth to secure political achievements, which usually results in high costs from ecological degradation and environmental pollution. Therefore, effective methods and a reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. The emergy method links economic and ecological systems together and can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many ecosystem case studies but seldom in industrial clusters. This study applied emergy analysis to evaluate the efficiency of industrial clusters through a series of emergy-based indices as well as newly proposed indicators. A case study of the Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters and inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than those of the other industrial clusters.
Arnold, Matthias
2017-12-02
The economic evaluation of stratified breast cancer screening is gaining momentum but produces very diverse results. Systematic reviews have so far focused on modeling techniques and epidemiologic assumptions; cost and utility parameters have received little attention. This systematic review assesses simulation models for stratified breast cancer screening based on their cost and utility parameters in each phase of breast cancer screening and care. A literature review was conducted to compare economic evaluations with simulation models of personalized breast cancer screening. Study quality was assessed using reporting guidelines. Cost and utility inputs were extracted, standardized and structured using a care delivery framework. Studies were then clustered according to their study aim and parameters were compared within the clusters. Eighteen studies were identified within three study clusters. Reporting quality was very diverse in all three clusters. Only two studies in cluster 1, four studies in cluster 2 and one study in cluster 3 scored high in the quality appraisal. In addition to the quality appraisal, this review assessed whether the simulation models were consistent in integrating all relevant phases of care, whether utility parameters were consistent and methodologically sound, and whether costs were compatible and consistent in the actual parameters used for screening, diagnostic work-up and treatment. Of 18 studies, only three did not show signs of potential bias. This systematic review shows that a closer look into the cost and utility parameters can help identify potential bias. Future simulation models should focus on integrating all relevant phases of care, using methodologically sound utility parameters and avoiding inconsistent cost parameters.
NASA Astrophysics Data System (ADS)
Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.
2018-04-01
Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.
Modern proposal of methodology for retrieval of characteristic synthetic rainfall hyetographs
NASA Astrophysics Data System (ADS)
Licznar, Paweł; Burszta-Adamiak, Ewa; Łomotowski, Janusz; Stańczyk, Justyna
2017-11-01
The modern engineering practice of designing and modelling complex drainage systems is based on hydrodynamic modelling and has a probabilistic character. Its practical application requires a change in the rainfall models accepted as input. Previously used artificial rainfall models of simplified form, e.g. block precipitation or Euler's type II model rainfall, are no longer sufficient. Urgent clarification is needed regarding the methodology of standardized rainfall hyetographs that take into consideration the specifics of local storm rainfall temporal dynamics. The aim of the paper is to present a proposal for an innovative methodology for determining standardized rainfall hyetographs, based on statistical processing of a collection of actual local precipitation records. The proposed methodology is based on the classification of standardized rainfall hyetographs with the use of cluster analysis. Its application is presented using the example of selected rain gauges located in Poland. The synthetic rainfall hyetographs achieved as a final result may be used for hydrodynamic modelling of sewerage systems, including probabilistic determination of the necessary capacity of retention reservoirs.
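A minimal sketch of the classification idea: storms are standardized into dimensionless cumulative mass curves and grouped with Ward's hierarchical clustering, with the group-average curve standing in for a synthetic hyetograph. The data and cluster count are illustrative assumptions.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Illustrative storms: each row is rainfall depth in successive intervals.
    rng = np.random.default_rng(1)
    storms = rng.gamma(shape=2.0, scale=1.0, size=(60, 12))

    # Standardize each storm into a dimensionless cumulative mass curve.
    cum = np.cumsum(storms, axis=1)
    mass_curves = cum / cum[:, -1:]

    # Ward's hierarchical clustering of the standardized hyetographs.
    Z = linkage(mass_curves, method="ward")
    groups = fcluster(Z, t=4, criterion="maxclust")

    # One synthetic hyetograph per group: the group-average mass curve.
    for g in np.unique(groups):
        print(g, mass_curves[groups == g].mean(axis=0).round(2))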
ERIC Educational Resources Information Center
Wang, Dongxu; Stewart, Donald; Chang, Chun
2016-01-01
Purpose: The purpose of this paper is to examine the effectiveness of a holistic school-based nutrition programme using the health-promoting school (HPS) approach, on teachers' knowledge, attitudes and behaviour in relation to nutrition in rural China. Design/methodology/approach: A cluster-randomised intervention trial design was employed. Two…
Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy
2011-01-01
Background We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. Conclusions PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291
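As a sketch of one of the nine approaches above (tf-idf cosine on titles and abstracts) together with the top-n similarity filtering step, on a toy corpus; n and the corpus are illustrative, and the graph layout and average-link clustering stages are omitted.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["hierarchical clustering of toxicity data",
            "galaxy clustering with halo models",
            "toxicity prediction with QSAR models",
            "halo occupation of luminous galaxies"]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)

    # Keep only the top-n highest similarities per document (n=2 for this toy corpus).
    n = 2
    top_n = np.zeros_like(sim)
    for i, row in enumerate(sim):
        keep = np.argsort(row)[-n:]
        top_n[i, keep] = row[keep]
    print(top_n.round(2))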
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.
Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain
2015-10-01
Clustering is a set of statistical learning techniques aimed at partitioning heterogeneous data into homogeneous groups called clusters. There are several fields in which clustering has been successfully applied, such as medicine, biology, finance, and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on Principal Component Analysis (PCA), the statistical tool most commonly used in factorial analysis. However, problems in nature, especially in medicine, are often based on heterogeneous qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing quantitative and qualitative data to be handled simultaneously. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
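A minimal sketch of the projection idea: PCA on the quantitative variables, then regression of a binary-coded qualitative variable onto the resulting subspace to obtain a quantitative surrogate. Data shapes and variable names are illustrative assumptions, not from the study.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(2)
    X_quant = rng.normal(size=(813, 6))                    # quantitative measurements
    x_qual = (X_quant[:, 0] + rng.normal(size=813) > 0)    # a binary-coded qualitative response

    # Subspace spanned by the quantitative variables.
    scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X_quant))

    # Project the qualitative variable onto that subspace by regression.
    reg = LinearRegression().fit(scores, x_qual.astype(float))
    x_qual_proj = reg.predict(scores)  # quantitative surrogate usable alongside the PCs
    print(reg.coef_.round(3))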
Ebenryter-Olbińska, Katarzyna; Kaniowski, Damian; Sobczak, Milena; Wojtczak, Błażej A; Janczak, Sławomir; Wielgus, Ewelina; Nawrot, Barbara; Leśnikowski, Zbigniew J
2017-11-21
A general and convenient approach for the incorporation of different types of boron clusters into specific locations of the DNA-oligonucleotide chain based on the automated phosphoramidite method of oligonucleotide synthesis and post-synthetic "click chemistry" modification has been developed. Pronounced effects of boron-cluster modification on the physico- and biochemical properties of the antisense oligonucleotides were observed. The silencing activity of antisense oligonucleotides bearing a single boron cluster modification in the middle of the oligonucleotide chain was substantially higher than that of unmodified oligonucleotides. This finding may be of importance for the design of therapeutic nucleic acids with improved properties. The proposed synthetic methodology broadens the availability of nucleic acid-boron cluster conjugates and opens up new avenues for their potential practical use. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.
Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong
2015-01-01
Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, including the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were identified in Aspergillus for the first time that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli such as A. niger and A. oryzae. These findings will help in understanding the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic. PMID:25706180
Fast gene ontology based clustering for microarray experiments.
Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa
2008-11-21
Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology (GO) annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity of Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R-based open-source semantic similarity package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram, genes sharing a GO term can be identified, and their differences in gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
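A minimal sketch of the clustering stage, assuming the gene-by-gene GO semantic-similarity matrix (the expensive step the package accelerates) has already been computed; the matrix here is random stand-in data.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Assume S is a symmetric GO semantic-similarity matrix in [0, 1] (precomputed).
    rng = np.random.default_rng(3)
    S = rng.uniform(0.2, 1.0, size=(30, 30))
    S = (S + S.T) / 2
    np.fill_diagonal(S, 1.0)

    D = 1.0 - S                                           # similarity -> distance
    Z = linkage(squareform(D, checks=False), method="average")
    go_clusters = fcluster(Z, t=0.5, criterion="distance")
    print(go_clusters)  # genes sharing a label share a GO cluster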
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.
Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J
2008-06-18
Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.
Chen, Yun; Yang, Hui
2016-01-01
In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information-theoretic perspective that does not require assumptions about the data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
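A sketch of the mutual-information measure on discretized variables. Note the paper's Dirichlet process clustering is replaced here by plain average-linkage agglomerative clustering on an MI-derived distance, purely as a simplification; a recent scikit-learn API is assumed.

    import numpy as np
    from sklearn.metrics import mutual_info_score
    from sklearn.cluster import AgglomerativeClustering

    def mi_matrix(X, bins=10):
        """Pairwise mutual information between columns of X, via histogram binning."""
        p = X.shape[1]
        M = np.zeros((p, p))
        for i in range(p):
            for j in range(p):
                xi = np.digitize(X[:, i], np.histogram_bin_edges(X[:, i], bins))
                xj = np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], bins))
                M[i, j] = mutual_info_score(xi, xj)
        return M

    rng = np.random.default_rng(4)
    X = rng.normal(size=(500, 8))
    X[:, 1] = np.sin(X[:, 0])          # nonlinear dependence, invisible to Pearson r
    M = mi_matrix(X)
    D = 1.0 - M / M.max()              # crude MI -> distance conversion
    labels = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                     linkage="average").fit_predict(D)
    print(labels)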
NASA Astrophysics Data System (ADS)
Zhang, Weipeng
2017-06-01
The relationship between the medical characteristics of lung cancers and computed tomography (CT) images is explored so as to improve the early diagnosis rate of lung cancers. This research collected CT images of patients with solitary pulmonary nodule lung cancer and used a gradual clustering methodology to classify them. Preliminary classifications were made, followed by continuous modification and iteration to determine the optimal condensation points, until iteration stability was achieved. Reasonable classification results were obtained, falling into three categories. The first type of patient was mostly female, aged between 50 and 65 years; their CT images of solitary pulmonary nodule lung cancer contain complete lobulation and burr, with pleural indentation. The second type was mostly male, aged between 50 and 80 years; their CT images contain complete lobulation and burr, but no pleural indentation. The third type was also mostly male, aged between 50 and 80 years; their CT images showed no abnormalities. The application of the gradual clustering methodology can scientifically classify CT image features of patients with lung cancer in the initial lesion stage. These findings provide a basis for early detection and treatment of malignant lesions in patients with lung cancer.
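The gradual clustering procedure described, picking condensation points, assigning, recomputing and iterating until stability, is essentially a k-means-style loop. A sketch on hypothetically encoded CT/demographic features (the feature encoding is an assumption, not from the study):

    import numpy as np

    def gradual_clustering(X, k, tol=1e-6, max_iter=100):
        """Iteratively refine condensation points until assignments stabilize."""
        centers = X[np.random.default_rng(5).choice(len(X), k, replace=False)]
        for _ in range(max_iter):
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.linalg.norm(new_centers - centers) < tol:   # iteration stability
                break
            centers = new_centers
        return labels, centers

    # Hypothetical encoded features: [sex, age, lobulation, burr, pleural indentation]
    rng = np.random.default_rng(6)
    X = np.column_stack([rng.integers(0, 2, 90), rng.uniform(50, 80, 90),
                         rng.integers(0, 2, 90), rng.integers(0, 2, 90),
                         rng.integers(0, 2, 90)]).astype(float)
    labels, centers = gradual_clustering(X, k=3)
    print(np.bincount(labels))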
Jolani, Shahab
2018-03-01
In health and medical sciences, multiple imputation (MI) is now becoming popular to obtain valid inferences in the presence of missing data. However, MI of clustered data, such as multicenter studies and individual participant data meta-analysis, requires advanced imputation routines that preserve the hierarchical structure of the data. In clustered data, a specific challenge is the presence of systematically missing data, when a variable is completely missing in some clusters, and sporadically missing data, when it is partly missing in some clusters. Unfortunately, little is known about how to perform MI when both types of missing data occur simultaneously. We develop a new class of hierarchical imputation approaches based on chained equations methodology that simultaneously impute systematically and sporadically missing data while allowing for arbitrary patterns of missingness among them. Here, we use a random-effect imputation model and adopt a simplification over fully Bayesian techniques such as the Gibbs sampler to directly obtain draws of parameters within each step of the chained equations. We justify through theoretical arguments and extensive simulation studies that the proposed imputation methodology has good statistical properties in terms of bias and coverage rates of parameter estimates. An illustration is given in a case study with eight individual participant datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
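A minimal single-level chained-equations loop using scikit-learn's IterativeImputer; this omits the paper's random-effects step and hierarchical structure and shows only the basic ingredient. Running it with several random seeds yields the multiple imputations.

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 4))
    X[:, 1] += 0.8 * X[:, 0]
    X[rng.uniform(size=X.shape) < 0.2] = np.nan     # sporadically missing values

    # Chained equations: each variable is regressed on the others in turn.
    imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
    X_imp = imputer.fit_transform(X)
    print(np.isnan(X_imp).sum())  # 0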
Eye-gaze determination of user intent at the computer interface
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goldberg, J.H.; Schryver, J.C.
1993-12-31
Determination of user intent at the computer interface through eye-gaze monitoring can significantly aid applications for the disabled, as well as telerobotics and process control interfaces. Whereas current eye-gaze control applications are limited to object selection and x/y gazepoint tracking, a methodology was developed here to discriminate a more abstract interface operation: zooming in or out. This methodology first collects samples of eye-gaze location looking at controlled stimuli, at 30 Hz, just prior to a user's decision to zoom. The sample is broken into data frames, or temporal snapshots. Within a data frame, all spatial samples are connected into a minimum spanning tree, then clustered, according to user-defined parameters. Each cluster is mapped to one in the prior data frame, and statistics are computed from each cluster. These characteristics include cluster size, position, and pupil size. A multiple discriminant analysis uses these statistics both within and between data frames to formulate optimal rules for assigning the observations into zoom-in, zoom-out, or no-zoom conditions. The statistical procedure effectively generates heuristics for future assignments, based upon these variables. Future work will enhance the accuracy and precision of the modeling technique, and will empirically test users in controlled experiments.
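A sketch of the within-frame step: connect gaze samples into a minimum spanning tree and cluster by cutting edges longer than a user-defined threshold. The coordinates and threshold are illustrative assumptions.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(8)
    gaze = np.vstack([rng.normal([100, 100], 5, (15, 2)),   # one fixation cluster
                      rng.normal([300, 250], 5, (15, 2))])  # another

    mst = minimum_spanning_tree(squareform(pdist(gaze))).toarray()
    mst[mst > 40.0] = 0.0           # cut edges longer than the threshold (pixels)

    # Remaining connected components are the gaze clusters.
    n_clusters, labels = connected_components(mst, directed=False)
    print(n_clusters, np.bincount(labels))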
Maljovec, D.; Liu, S.; Wang, B.; ...
2015-07-14
Here, dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP and MELCOR) with simulation controller codes (e.g., RAVEN and ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic and operating procedures) and stochastic (e.g., component failures and parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other for enhanced structural understanding of the data.
Conjunctive Conceptual Clustering: A Methodology and Experimentation.
1987-09-01
Support Vector Data Descriptions and k-Means Clustering: One Class?
Gornitz, Nico; Lima, Luiz Alberto; Muller, Klaus-Robert; Kloft, Marius; Nakajima, Shinichi
2017-09-27
We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another, i.e., by adding flexibility using multiple spheres for SVDDs and increasing anomaly resistance and flexibility through kernels to k-means. In particular, our approach leads to a new interpretation of k-means as a regularized mode seeking algorithm. The unifying formulation further allows for deriving new algorithms by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data, as well as on real-world problems, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.
The XMM Cluster Survey: X-ray analysis methodology
NASA Astrophysics Data System (ADS)
Lloyd-Davies, E. J.; Romer, A. Kathy; Mehrtens, Nicola; Hosmer, Mark; Davidson, Michael; Sabirli, Kivanc; Mann, Robert G.; Hilton, Matt; Liddle, Andrew R.; Viana, Pedro T. P.; Campbell, Heather C.; Collins, Chris A.; Dubois, E. Naomi; Freeman, Peter; Harrison, Craig D.; Hoyle, Ben; Kay, Scott T.; Kuwertz, Emma; Miller, Christopher J.; Nichol, Robert C.; Sahlén, Martin; Stanford, S. A.; Stott, John P.
2011-11-01
The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using all publicly available data in the XMM-Newton Science Archive. Its main aims are to measure cosmological parameters and trace the evolution of X-ray scaling relations. In this paper we describe the data processing methodology applied to the 5776 XMM observations used to construct the current XCS source catalogue. A total of 3675 >4σ cluster candidates with >50 background-subtracted X-ray counts are extracted from a total non-overlapping area suitable for cluster searching of 410 deg2. Of these, 993 candidates are detected with >300 background-subtracted X-ray photon counts, and we demonstrate that robust temperature measurements can be obtained down to this count limit. We describe in detail the automated pipelines used to perform the spectral and surface brightness fitting for these candidates, as well as to estimate redshifts from the X-ray data alone. A total of 587 (122) X-ray temperatures to a typical accuracy of <40 (<10) per cent have been measured to date. We also present the methodology adopted for determining the selection function of the survey, and show that the extended source detection algorithm is robust to a range of cluster morphologies by inserting mock clusters derived from hydrodynamical simulations into real XMM images. These tests show that a simple isothermal β-profile is sufficient to capture the essential details of the cluster population detected in the archival XMM observations. The redshift follow-up of the XCS cluster sample is presented in a companion paper, together with a first data release of 503 optically confirmed clusters.
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.; Henson, Robert
2012-01-01
A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…
ERIC Educational Resources Information Center
Wang, Dongxu; Stewart, Donald; Chang, Chun
2016-01-01
Purpose: The purpose of this paper is to assess whether the school-based nutrition programme using the health-promoting school (HPS) framework was effective to improve parents' knowledge, attitudes and behaviour (KAB) in relation to nutrition in rural Mi Yun County, Beijing. Design/methodology/approach: A cluster-randomised intervention trial…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kong, Bo; Fox, Rodney O.; Feng, Heng
An Euler–Euler anisotropic Gaussian approach (EE-AG) for simulating gas–particle flows, in which particle velocities are assumed to follow a multivariate anisotropic Gaussian distribution, is used to perform mesoscale simulations of homogeneous cluster-induced turbulence (CIT). A three-dimensional Gauss–Hermite quadrature formulation is used to calculate the kinetic flux for 10 velocity moments in a finite-volume framework. The particle-phase volume-fraction and momentum equations are coupled with the Eulerian solver for the gas phase. This approach is implemented in an open-source CFD package, OpenFOAM, and detailed simulation results are compared with previous Euler–Lagrange simulations in a domain size study of CIT. These results demonstrate that the proposed EE-AG methodology is able to produce comparable results to EL simulations, and this moment-based methodology can be used to perform accurate mesoscale simulations of dilute gas–particle flows.
Tisettanta case study: the interoperation of furniture production companies
NASA Astrophysics Data System (ADS)
Amarilli, Fabrizio; Spreafico, Alberto
This chapter presents the Tisettanta case study, focusing on the definition of the possible innovations that ICT technologies can bring to the Italian wood-furniture industry. This sector is characterized by industrial clusters composed mainly of a few large companies with international brand reputations and a large base of SMEs that manufacture finished products or are specialized in the production of single components/processes (such as the Brianza cluster, where Tisettanta operates). In this particular business ecosystem, ICT technologies can bring relevant support and improvements to the supply chain process, where collaborations between enterprises are put into action through the exchange of business documents such as orders, order confirmation, bills of lading, invoices, etc. The analysis methodology adopted in the Tisettanta case study refers to the TEKNE Methodology of Change (see Chapter 2), which defines a framework for supporting firms in the adoption of the Internetworked Enterprise organizational paradigm.
Local Higher-Order Graph Clustering
Yin, Hao; Benson, Austin R.; Leskovec, Jure; Gleich, David F.
2018-01-01
Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology. PMID:29770258
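A sketch of the two quantities MAPPR works with: the triangle-motif adjacency matrix and the motif conductance of a node set. The graph and the set are illustrative, and the personalized PageRank machinery is omitted.

    import numpy as np

    def motif_conductance(A, S):
        """Triangle-motif conductance of node set S in an undirected 0/1 graph A."""
        W = A * (A @ A)                 # W[i, j] = number of triangles through edge (i, j)
        inside = np.zeros(len(A), dtype=bool)
        inside[list(S)] = True
        cut = W[inside][:, ~inside].sum()
        vol = min(W[inside].sum(), W[~inside].sum())
        return cut / vol if vol else np.inf

    # Two triangles joined by a single bridge edge.
    A = np.zeros((6, 6), dtype=int)
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        A[i, j] = A[j, i] = 1
    print(motif_conductance(A, {0, 1, 2}))  # 0.0: no triangle is cut by this split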
NASA Astrophysics Data System (ADS)
Schaefer, A. M.; Daniell, J. E.; Wenzel, F.
2014-12-01
Earthquake clustering is an increasingly important part of general earthquake research, especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is carried out using a point-based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have mostly been applied and developed in mining projects during recent decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are interpolated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data from New Zealand and the United States are used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.
Intuitive color-based visualization of multimedia content as large graphs
NASA Astrophysics Data System (ADS)
Delest, Maylis; Don, Anthony; Benois-Pineau, Jenny
2004-06-01
Data visualization techniques are penetrating various technological areas. In the field of multimedia, such as information search and retrieval in multimedia archives or digital media production and post-production, data visualization methodologies based on large graphs offer an exciting alternative to conventional storyboard visualization. In this paper we develop a new approach to the visualization of multimedia (video) documents based on both large graph clustering and preliminary video segmenting and indexing.
Proof test methodology for composites
NASA Technical Reports Server (NTRS)
Wu, Edward M.; Bell, David K.
1992-01-01
The special requirements for proof test of composites are identified based on the underlying failure process of composites. Two proof test methods are developed to eliminate the inevitable weak fiber sites without also causing flaw clustering which weakens the post-proof-test composite. Significant reliability enhancement by these proof test methods has been experimentally demonstrated for composite strength and composite life in tension. This basic proof test methodology is relevant to the certification and acceptance of critical composite structures. It can also be applied to the manufacturing process development to achieve zero-reject for very large composite structures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ward, Lee H.; Laros, James H., III
This paper describes a methodology for implementing disk-less cluster systems using the Network File System (NFS) that scales to thousands of nodes. This method has been successfully deployed and is currently in use on several production systems at Sandia National Labs. This paper will outline our methodology and implementation, discuss hardware and software considerations in detail and present cluster configurations with performance numbers for various management operations like booting.
Roushangar, Kiyoumars; Alizadeh, Farhad; Adamowski, Jan
2018-08-01
Understanding precipitation on a regional basis is an important component of water resources planning and management. The present study outlines a methodology based on continuous wavelet transform (CWT) and multiscale entropy (CWME), combined with self-organizing map (SOM) and k-means clustering techniques, to measure and analyze the complexity of precipitation. Historical monthly precipitation data from 1960 to 2010 at 31 rain gauges across Iran were preprocessed by CWT. The multi-resolution CWT approach segregated the major features of the original precipitation series by unfolding the structure of the time series, which was often ambiguous. The entropy concept was then applied to components obtained from CWT to measure dispersion, uncertainty, disorder, and diversification of subcomponents. Based on different validity indices, k-means clustering captured homogeneous areas more accurately, and additional analysis was performed based on the outcome of this approach. The 31 rain gauges in this study were clustered into 6 groups, each one having a unique CWME pattern across different time scales. The results of clustering showed that hydrologic similarity (multiscale variation of precipitation) was not based on geographic contiguity. According to the pattern of entropy across the scales, each cluster was assigned an entropy signature that provided an estimation of the entropy pattern of precipitation data in each cluster. Based on the pattern of mean CWME for each cluster, a characteristic signature was assigned, which provided an estimation of the CWME of a cluster across scales of 1-2, 3-8, and 9-13 months relative to other stations. The validity of the homogeneous clusters demonstrated the usefulness of the proposed approach to regionalize precipitation. Further analysis based on wavelet coherence (WTC) was performed by selecting central rain gauges in each cluster and analyzing them against temperature, wind, the Multivariate ENSO Index (MEI), the East Atlantic (EA) index, and the North Atlantic Oscillation (NAO) index. The results revealed that all climatic features except the NAO influenced precipitation in Iran during the 1960-2010 period. Copyright © 2018 Elsevier Inc. All rights reserved.
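A sketch of one plausible CWT-plus-entropy signature per station, followed by k-means on the signatures. The wavelet, scales, entropy definition and synthetic series are illustrative simplifications of the paper's CWME, and PyWavelets is assumed available.

    import numpy as np
    import pywt
    from sklearn.cluster import KMeans

    def wavelet_entropy_signature(series, scales):
        """Shannon entropy of the normalized wavelet energy at each scale."""
        coef, _ = pywt.cwt(series, scales, "morl")
        energy = coef**2
        p = energy / energy.sum(axis=1, keepdims=True)   # distribution over time, per scale
        return -(p * np.log(p + 1e-12)).sum(axis=1)

    rng = np.random.default_rng(9)
    months = np.arange(612)                              # 1960-2010, monthly
    stations = [np.sin(2*np.pi*months/12) + 0.3*rng.normal(size=612) for _ in range(31)]

    scales = np.arange(1, 14)                            # roughly 1-13 month scales
    signatures = np.array([wavelet_entropy_signature(s, scales) for s in stations])
    groups = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(signatures)
    print(groups)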
Design of double fuzzy clustering-driven context neural networks.
Kim, Eun-Hu; Oh, Sung-Kwun; Pedrycz, Witold
2018-08-01
In this study, we introduce a novel category of double fuzzy clustering-driven context neural networks (DFCCNNs). The study is focused on the development of advanced design methodologies for redesigning the structure of conventional fuzzy clustering-based neural networks. Conventional fuzzy clustering-based neural networks typically focus on dividing the input space into several local spaces (implied by clusters). In contrast, the proposed DFCCNNs take into account two distinct local spaces, called cluster and context spaces, respectively. Cluster space refers to a local space positioned in the input space, whereas context space concerns a local space formed in the output space. Through partitioning the output space into several local spaces, each context space is used as the desired (target) local output to construct local models. To accomplish this, the proposed network includes a new context layer for reasoning about context space in the output space. Fuzzy C-Means (FCM) clustering is used to form local spaces in both the input and output spaces: the first partition forms clusters and trains the weights positioned between the input and hidden layers, whereas the second is applied to the output space to form context spaces. The key features of the proposed DFCCNNs can be enumerated as follows: (i) the parameters between the input layer and hidden layer are built through FCM clustering. The connections (weights) are specified as constant terms, being in fact the centers of the clusters. The membership functions (represented through the partition matrix) produced by FCM are used as activation functions located at the hidden layer of the "conventional" neural network. (ii) Following the hidden layer, a context layer is formed to approximate the context space of the output variable, and each node in the context layer represents an individual local model. The outputs of the context layer are specified as a combination of weights formed as linear functions and the outputs of the hidden layer. The weights are updated using the least square estimation (LSE)-based method. (iii) At the output layer, the outputs of the context layer are decoded to produce the corresponding numeric output. Here the weighted average is used, and the weights are also adjusted with the use of the LSE scheme. From the viewpoint of performance improvement, the proposed design methodologies are discussed and experimented with using benchmark machine learning datasets. Through the experiments, it is shown that the generalization abilities of the proposed DFCCNNs are better than those of the conventional FCNNs reported in the literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
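A minimal NumPy implementation of the FCM step used to form both partitions (clusters in the input space, contexts in the output space); the fuzzifier m = 2 and the data are illustrative assumptions.

    import numpy as np

    def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
        """Fuzzy C-Means: returns membership matrix U (n x c) and centers V (c x d)."""
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=len(X))          # rows sum to 1
        for _ in range(max_iter):
            Um = U**m
            V = (Um.T @ X) / Um.sum(axis=0)[:, None]        # weighted cluster centers
            d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
            dp = d**(2.0/(m - 1.0))
            U_new = 1.0 / (dp * (1.0/dp).sum(axis=1, keepdims=True))
            if np.abs(U_new - U).max() < tol:
                return U_new, V
            U = U_new
        return U, V

    X = np.random.default_rng(10).normal(size=(150, 2))
    U, V = fcm(X, c=3)
    print(U.sum(axis=1)[:3])   # memberships sum to 1; U would feed the hidden layer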
Holmes, Sean T; Alkan, Fahri; Iuliucci, Robbie J; Mueller, Karl T; Dybowski, Cecil
2016-07-05
29Si and 31P magnetic-shielding tensors in covalent network solids have been evaluated using periodic and cluster-based calculations. The cluster-based computational methodology employs pseudoatoms to reduce the net charge (resulting from missing co-ordination on the terminal atoms) through valence modification of terminal atoms using bond-valence theory (VMTA/BV). The magnetic-shielding tensors computed with the VMTA/BV method are compared to magnetic-shielding tensors determined with the periodic GIPAW approach. The cluster-based all-electron calculations agree with experiment better than the GIPAW calculations, particularly for predicting absolute magnetic shielding and for predicting chemical shifts. The performance of the DFT functionals CA-PZ, PW91, PBE, rPBE, PBEsol, WC, and PBE0 is assessed for the prediction of 29Si and 31P magnetic-shielding constants. Calculations using the hybrid functional PBE0, in combination with the VMTA/BV approach, result in excellent agreement with experiment. © 2016 Wiley Periodicals, Inc.
Use of regionalisation approach to develop fire frequency curves for Victoria, Australia
NASA Astrophysics Data System (ADS)
Khastagir, Anirban; Jayasuriya, Niranjali; Bhuyian, Muhammed A.
2017-11-01
It is important to perform fire frequency analysis to obtain fire frequency curves (FFCs) based on fire intensity in different parts of Victoria. In this paper, FFCs were derived based on the forest fire danger index (FFDI), a measure related to fire initiation, spreading speed and containment difficulty. The mean temperature (T), relative humidity (RH) and areal extent of open water (LC2) during the summer months (Dec-Feb) were identified as the most important parameters for assessing the risk of bushfire occurrence. Based on these parameters, the Andrews' curve equation was applied to 40 selected meteorological stations to identify homogeneous stations forming unique clusters. A methodology using peak FFDI from cluster-averaged FFDIs was developed by applying the Log Pearson Type III (LPIII) distribution to generate FFCs. A total of nine homogeneous clusters across Victoria were identified, and subsequently their FFCs were developed in order to estimate the regionalised fire occurrence characteristics.
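A sketch of the LPIII step: fit a Pearson Type III distribution to log-transformed cluster-averaged peak FFDI and read off return levels. The annual maxima here are synthetic stand-ins, not the study's data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    annual_peak_ffdi = rng.gumbel(loc=60, scale=15, size=40)   # illustrative annual maxima

    log_peaks = np.log10(annual_peak_ffdi)
    skew, loc, scale = stats.pearson3.fit(log_peaks)           # LPIII = Pearson III on log data

    for T in (2, 10, 50, 100):                                 # return periods in years
        q = stats.pearson3.ppf(1 - 1/T, skew, loc=loc, scale=scale)
        print(f"{T:>4}-yr FFDI: {10**q:.1f}")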
Generalization of Clustering Coefficients to Signed Correlation Networks
Costantini, Giulio; Perugini, Marco
2014-01-01
The recent interest in network analysis applications in personality psychology and psychopathology has put forward new methodological challenges. Personality and psychopathology networks are typically based on correlation matrices and therefore include both positive and negative edge signs. However, some applications of network analysis disregard negative edges, such as computing clustering coefficients. In this contribution, we illustrate the importance of the distinction between positive and negative edges in networks based on correlation matrices. The clustering coefficient is generalized to signed correlation networks: three new indices are introduced that take edge signs into account, each derived from an existing and widely used formula. The performances of the new indices are illustrated and compared with the performances of the unsigned indices, both on a signed simulated network and on a signed network based on actual personality psychology data. The results show that the new indices are more resistant to sample variations in correlation networks and therefore have higher convergence compared with the unsigned indices both in simulated networks and with real data. PMID:24586367
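As a sketch of the pattern such signed indices follow, here is a Zhang-type clustering coefficient that keeps edge signs in the triangle (numerator) term; this is an assumption-labeled paraphrase of the general idea, not the paper's exact formulas.

    import numpy as np

    def signed_clustering_zhang(W):
        """Zhang-type clustering coefficient keeping edge signs (W: correlations, zero diagonal)."""
        num = np.diag(W @ W @ W)                    # signed triangle weight around each node
        s = np.abs(W).sum(axis=1)
        denom = s**2 - (W**2).sum(axis=1)           # sum of |w_ij||w_ik| over j != k
        return np.where(denom > 0, num / denom, 0.0)

    rng = np.random.default_rng(12)
    X = rng.normal(size=(200, 8))
    X[:, 1] = X[:, 0] + 0.5*rng.normal(size=200)
    X[:, 2] = -X[:, 0] + 0.5*rng.normal(size=200)   # negative edges matter here
    W = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(W, 0.0)
    print(signed_clustering_zhang(W).round(3))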
In general, the accuracy of a predicted toxicity value increases with increase in similarity between the query chemical and the chemicals used to develop a QSAR model. A toxicity estimation methodology employing this finding has been developed. A hierarchical based clustering t...
Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.
2015-01-01
This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zone. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375
Zhang, Shu; Taft, Cyrus W; Bentsman, Joseph; Hussey, Aaron; Petrus, Bryan
2012-09-01
Tuning a complex multi-loop PID based control system requires considerable experience. In today's power industry the number of available qualified tuners is dwindling and there is a great need for better tuning tools to maintain and improve the performance of complex multivariable processes. Multi-loop PID tuning is the procedure for the online tuning of a cluster of PID controllers operating in a closed loop with a multivariable process. This paper presents the first application of the simultaneous tuning technique to the multi-input-multi-output (MIMO) PID based nonlinear controller in the power plant control context, with the closed-loop system consisting of a MIMO nonlinear boiler/turbine model and a nonlinear cluster of six PID-type controllers. Although simplified, the dynamics and cross-coupling of the process and the PID cluster are similar to those used in a real power plant. The particular technique selected, iterative feedback tuning (IFT), utilizes the linearized version of the PID cluster for signal conditioning, but the data collection and tuning is carried out on the full nonlinear closed-loop system. Based on the figure of merit for the control system performance, the IFT is shown to deliver performance favorably comparable to that attained through the empirical tuning carried out by an experienced control engineer. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Pathak, Maharshi
City administrators and real-estate developers have been setting rather aggressive energy efficiency targets. This, in turn, has led building science research groups across the globe to focus on urban-scale building performance studies and the level of abstraction associated with such simulations. The increasing maturity of stakeholders toward energy efficiency and comfortable working environments has led researchers to develop methodologies and tools for addressing policy-driven interventions, whether for urban-level energy systems, buildings' operational optimization, or retrofit guidelines. Typically, these large-scale simulations are carried out by grouping buildings based on their design similarities, i.e., standardization of the buildings. Such an approach does not necessarily yield working inputs that make decision-making effective. To address this, a novel approach is proposed in the present study. The principal objective of this study is to propose, define, and evaluate a methodology that uses machine learning algorithms to define representative building archetypes for Stock-level Building Energy Modeling (SBEM), based on an operational parameter database. The study uses the "Phoenix-climate" subset of the CBECS-2012 survey microdata for analysis and validation. Using this database, parameter correlations are studied to understand the relation between input parameters and energy performance. Contrary to precedent, the study establishes that energy performance is better explained by non-linear models, and this non-linear behavior is captured by advanced learning algorithms. Based on these algorithms, the buildings under study are grouped into meaningful clusters. The cluster "medoids" (the observed buildings that best represent the centers of their clusters) are established statistically to identify the level of abstraction that is acceptable for whole-building energy simulations and, subsequently, for retrofit decision-making. Further, the methodology is validated by conducting Monte-Carlo simulations on 13 key input simulation parameters. The sensitivity analysis of these 13 parameters is used to identify the optimum retrofits. From the sample analysis, the envelope parameters are found to be the most sensitive with respect to the EUI of the building, and retrofit packages should therefore be directed at maximizing energy-use reduction.
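A minimal sketch of extracting cluster medoids after a K-means grouping, the step that yields one representative observed building per cluster; the feature matrix and cluster count are hypothetical stand-ins:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

# X: hypothetical buildings x operational-parameters matrix (already scaled).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))

labels = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(X)

# The medoid is the actual building minimizing total distance to the other
# members of its cluster (unlike a centroid, it is an observed record).
medoids = {}
for k in np.unique(labels):
    members = np.where(labels == k)[0]
    D = cdist(X[members], X[members])
    medoids[int(k)] = int(members[D.sum(axis=1).argmin()])
print(medoids)   # index of the representative building per cluster
```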
Real Time Intelligent Target Detection and Analysis with Machine Vision
NASA Technical Reports Server (NTRS)
Howard, Ayanna; Padgett, Curtis; Brown, Kenneth
2000-01-01
We present an algorithm for detecting a specified set of targets for an Automatic Target Recognition (ATR) application. ATR involves processing images for detecting, classifying, and tracking targets embedded in a background scene. We address the problem of discriminating between targets and nontarget objects in a scene by evaluating 40x40 image blocks belonging to an image. Each image block is first projected onto a set of templates specifically designed to separate images of targets embedded in a typical background scene from those background images without targets. These filters are found using directed principal component analysis which maximally separates the two groups. The projected images are then clustered into one of n classes based on a minimum distance to a set of n cluster prototypes. These cluster prototypes have previously been identified using a modified clustering algorithm based on prior sensed data. Each projected image pattern is then fed into the associated cluster's trained neural network for classification. A detailed description of our algorithm will be given in this paper. We outline our methodology for designing the templates, describe our modified clustering algorithm, and provide details on the neural network classifiers. Evaluation of the overall algorithm demonstrates that our detection rates approach 96% with a false positive rate of less than 0.03%.
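A minimal sketch of the project-then-route stage described above (template projection followed by minimum-distance assignment to a cluster prototype); the template and prototype matrices below are random stand-ins for the learned ones:

```python
import numpy as np

# Hypothetical shapes: T holds m discriminative templates (each a flattened
# 40x40 filter), P holds n cluster prototypes in the projected space.
rng = np.random.default_rng(2)
T = rng.normal(size=(12, 1600))    # stand-in for directed-PCA templates
P = rng.normal(size=(5, 12))       # stand-in for learned cluster prototypes

def route_block(block):
    """Project a 40x40 image block onto the templates, then route it to the
    nearest cluster prototype; the returned index selects which cluster's
    trained neural-network classifier sees the pattern."""
    v = T @ block.ravel()                               # projection step
    return int(np.argmin(np.linalg.norm(P - v, axis=1)))

block = rng.normal(size=(40, 40))
print(route_block(block))
```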
NASA Astrophysics Data System (ADS)
Chené, A.-N.; Borissova, J.; Clarke, J. R. A.; Bonatto, C.; Majaess, D. J.; Moni Bidin, C.; Sale, S. E.; Mauro, F.; Kurtev, R.; Baume, G.; Feinstein, C.; Ivanov, V. D.; Geisler, D.; Catelan, M.; Minniti, D.; Lucas, P.; de Grijs, R.; Kumar, M. S. N.
2012-09-01
Context. The ESO Public Survey "VISTA Variables in the Vía Láctea" (VVV) provides deep multi-epoch infrared observations for an unprecedented 562 sq. degrees of the Galactic bulge, and adjacent regions of the disk. Aims: The VVV observations will foster the construction of a sample of Galactic star clusters with reliable and homogeneously derived physical parameters (e.g., age, distance, and mass). In this first paper in a series, the methodology employed to establish cluster parameters for the envisioned database is elaborated upon by analysing four known young open clusters: Danks 1, Danks 2, RCW 79, and DBS 132. The analysis offers a first glimpse of the information that can be gleaned from the VVV observations for clusters in the final database. Methods: Wide-field, deep JHKs VVV observations, combined with new infrared spectroscopy, are employed to constrain fundamental parameters for a subset of clusters. Results: Results are inferred from VVV near-infrared photometry and numerous low-resolution spectra (typically more than 10 per cluster). The high quality of the spectra and the deep wide-field VVV photometry enables us to precisely and independently determine the characteristics of the clusters studied, which we compare to previous determinations. An anomalous reddening law in the direction of the Danks clusters is found, specifically E(J - H)/E(H - Ks) = 2.20 ± 0.06, which exceeds published values for the inner Galaxy. The G305 star-forming complex, which includes the Danks clusters, lies beyond the Sagittarius-Carina spiral arm and occupies the Centaurus arm. Finally, the first deep infrared colour-magnitude diagram of RCW 79 is presented, which reveals a sizeable pre-main-sequence population. A list of candidate variable stars in the G305 region is reported. Conclusions: This study demonstrates the strength of the dataset and methodology employed, and constitutes the first step of a broader study which shall include reliable parameters for a sizeable number of poorly characterised and/or newly discovered clusters. Based on observations made with the NTT telescope at the La Silla Observatory, ESO, under programme ID 087.D-0490A, and with the Clay telescope at the Las Campanas Observatory under programme CN2011A-086. Also based on data from the VVV survey observed under programme ID 172.B-2002. Tables 1, 5 and 6 are available in electronic form at http://www.aanda.org
A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis
ERIC Educational Resources Information Center
DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim
2004-01-01
This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…
Cluster Stability Estimation Based on a Minimal Spanning Trees Approach
NASA Astrophysics Data System (ADS)
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora
2009-08-01
Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges in the clusters' minimal spanning trees that connect points from different samples; in effect, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotic normal distribution of the considered statistic. Resting upon this fact, the standard score of the mentioned edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments, presented in the paper, demonstrate the ability of the approach to detect the true number of clusters.
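The edge-count statistic at the core of this procedure can be computed directly; a minimal sketch using SciPy's minimum spanning tree (the two-sample layout below is illustrative):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def friedman_rafsky_edges(X, sample_id):
    """Number of MST edges joining points from different samples: the raw
    count behind the two-sample statistic used here to score how well two
    clustered samples are mingled."""
    mst = minimum_spanning_tree(squareform(pdist(X)))
    i, j = mst.nonzero()
    return int((sample_id[i] != sample_id[j]).sum())

# Two samples drawn from the same distribution (the well-mingled case).
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
sample_id = np.repeat([0, 1], 30)
print(friedman_rafsky_edges(X, sample_id))
```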
Temporal Methods to Detect Content-Based Anomalies in Social Media
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skryzalin, Jacek; Field, Jr., Richard; Fisher, Andrew N.
Here, we develop a method for time-dependent topic tracking and meme trending in social media. Our objective is to identify time periods whose content differs significantly from normal, and we utilize two techniques to do so. The first is an information-theoretic analysis of the distributions of terms emitted during different periods of time. In the second, we cluster documents from each time period and analyze the tightness of each clustering. We also discuss a method of combining the scores created by each technique, and we provide ample empirical analysis of our methodology on various Twitter datasets.
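As a sketch of the first, information-theoretic technique, one could score a time window by the Jensen-Shannon distance between its term distribution and a baseline window's; the tokenization and smoothing below are assumptions, not the paper's exact formulation:

```python
import numpy as np
from collections import Counter
from scipy.spatial.distance import jensenshannon

def term_divergence(docs_now, docs_baseline):
    """Jensen-Shannon distance between the term distributions of the current
    time window and a baseline window; large values flag periods whose
    content differs from normal."""
    c1, c2 = Counter(), Counter()
    for d in docs_now:
        c1.update(d.lower().split())
    for d in docs_baseline:
        c2.update(d.lower().split())
    vocab = sorted(set(c1) | set(c2))
    p = np.array([c1[w] for w in vocab], float) + 1.0   # add-one smoothing
    q = np.array([c2[w] for w in vocab], float) + 1.0
    return jensenshannon(p / p.sum(), q / q.sum())

print(term_divergence(["free crypto airdrop now"], ["lunch was great today"]))
```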
Cluster Randomised Trials in Cochrane Reviews: Evaluation of Methodological and Reporting Practice.
Richardson, Marty; Garner, Paul; Donegan, Sarah
2016-01-01
Systematic reviews can include cluster-randomised controlled trials (C-RCTs), which require different analysis compared with standard individual-randomised controlled trials. However, it is not known whether review authors follow the methodological and reporting guidance when including these trials. The aim of this study was to assess the methodological and reporting practice of Cochrane reviews that included C-RCTs against criteria developed from existing guidance. Criteria were developed, based on methodological literature and personal experience supervising review production and quality. Criteria were grouped into four themes: identifying, reporting, assessing risk of bias, and analysing C-RCTs. The Cochrane Database of Systematic Reviews was searched (2nd December 2013), and the 50 most recent reviews that included C-RCTs were retrieved. Each review was then assessed using the criteria. The 50 reviews we identified were published by 26 Cochrane Review Groups between June 2013 and November 2013. For identifying C-RCTs, only 56% identified that C-RCTs were eligible for inclusion in the review in the eligibility criteria. For reporting C-RCTs, only eight (24%) of the 33 reviews reported the method of cluster adjustment for their included C-RCTs. For assessing risk of bias, only one review assessed all five C-RCT-specific risk-of-bias criteria. For analysing C-RCTs, of the 27 reviews that presented unadjusted data, only nine (33%) provided a warning that confidence intervals may be artificially narrow. Of the 34 reviews that reported data from unadjusted C-RCTs, only 13 (38%) excluded the unadjusted results from the meta-analyses. The methodological and reporting practices in Cochrane reviews incorporating C-RCTs could be greatly improved, particularly with regard to analyses. Criteria developed as part of the current study could be used by review authors or editors to identify errors and improve the quality of published systematic reviews incorporating C-RCTs.
NASA Technical Reports Server (NTRS)
Yedavalli, R. K.
1992-01-01
The aspect of controller design for improving the ride quality of aircraft in terms of damping ratio and natural frequency specifications on the short-period dynamics is addressed. The controller is designed to be robust with respect to uncertainties in the real parameters of the control design model, such as uncertainties in the dimensional stability derivatives, imperfections in actuator/sensor locations, and possible variations in flight conditions. The design is based on a new robust root clustering theory developed by the author by extending the nominal root clustering theory of Gutman and Jury to perturbed matrices. The proposed methodology yields an explicit relationship between the parameters of the root clustering region and the uncertainty radius of the parameter space. The current literature available for robust stability becomes a special case of this unified theory. The bounds derived on the parameter perturbation for robust root clustering are then used in selecting the robust controller.
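A minimal Monte-Carlo illustration of checking eigenvalue membership in a damping-ratio/natural-frequency root-clustering region under bounded parameter perturbations; the matrices and region specifications are hypothetical, and the paper's analytical bounds replace this sampling:

```python
import numpy as np

def in_region(A, zeta_min=0.5, wn_max=8.0):
    """Check that every eigenvalue of A lies in a root-clustering region
    defined by a minimum damping ratio and a maximum natural frequency
    (illustrative specs, not the paper's)."""
    lam = np.linalg.eigvals(A)
    wn = np.abs(lam)
    zeta = -lam.real / np.where(wn > 0, wn, 1.0)
    return bool(np.all(zeta >= zeta_min) and np.all(wn <= wn_max))

# Perturb a hypothetical nominal short-period model within a given
# uncertainty radius and test whether the poles stay clustered.
A0 = np.array([[-2.0, 1.0], [-4.0, -2.0]])
rng = np.random.default_rng(4)
radius = 0.3
ok = all(in_region(A0 + rng.uniform(-radius, radius, A0.shape))
         for _ in range(1000))
print(ok)
```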
Spatial clustering of average risks and risk trends in Bayesian disease mapping.
Anderson, Craig; Lee, Duncan; Dean, Nema
2017-01-01
Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks
NASA Astrophysics Data System (ADS)
Johnson, Joseph
2016-03-01
We have designed and built an optimal numerical standardization algorithm that links numerical values with their associated units, error level, and defining metadata, thus supporting automated data exchange and new levels of artificial intelligence (AI). The software manages all dimensional and error analysis and computational tracing. Tables of entities versus properties of these generalized numbers (called "metanumbers") support a transformation of each table into a network among the entities and another network among their properties, where the network connection matrix is based upon a proximity metric between the two items. We previously proved that every network is isomorphic to the Lie algebra that generates continuous Markov transformations. We have also shown that the eigenvectors of these Markov matrices provide an agnostic clustering of the underlying patterns. We will present this methodology and show how our new work on conversion of scientific numerical data through this process can reveal underlying information clusters ordered by the eigenvalues. We will also show how the linking of clusters from different tables can be used to form a "supernet" of all numerical information supporting new initiatives in AI.
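A generic sketch of the eigenvector-based clustering step (build a proximity network from an entity-property table, row-normalize it into a Markov matrix, and cluster on the leading non-trivial eigenvectors); this is standard spectral clustering, not the authors' metanumber software:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

# X: hypothetical entities x properties table (two obvious groups built in).
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (20, 4)), rng.normal(3, 0.3, (20, 4))])

# Proximity-based connection matrix, row-normalized into a Markov matrix.
S = np.exp(-squareform(pdist(X)) ** 2)
np.fill_diagonal(S, 0.0)
M = S / S.sum(axis=1, keepdims=True)

# Eigenvectors of the Markov matrix expose block (cluster) structure;
# grouping rows of the leading non-trivial eigenvectors recovers clusters.
w, V = np.linalg.eig(M)
order = np.argsort(-w.real)
emb = V[:, order[1:3]].real          # skip the trivial leading eigenvector
print(KMeans(n_clusters=2, n_init=10).fit_predict(emb))
```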
Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina
2015-03-01
Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data, and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster and the difficulties they face concerning parameter tuning. This fact was experimentally validated and, moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques. Copyright © 2015 Elsevier B.V. All rights reserved.
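For orientation, a plain Markov clustering (MCL) sketch, the baseline algorithm family that EE-MC enhances; EE-MC's evolutionary parameter adaptation is not reproduced here:

```python
import numpy as np

def mcl(A, expansion=2, inflation=2.0, iters=50):
    """Plain Markov clustering on a weighted adjacency matrix: alternate
    expansion (matrix power) and inflation (elementwise power followed by
    renormalization) until the flow matrix converges."""
    M = A + np.eye(len(A))                   # self-loops stabilize the flow
    M = M / M.sum(axis=0, keepdims=True)     # column-stochastic
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)
        M = M ** inflation
        M = M / M.sum(axis=0, keepdims=True)
    # Rows with surviving mass act as attractors; their nonzero columns
    # list the members of each cluster.
    return {tuple(np.nonzero(row > 1e-6)[0]) for row in M if row.max() > 1e-6}

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], float)      # two obvious components
print(mcl(A))
```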
Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033
NASA Astrophysics Data System (ADS)
Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.
2018-04-01
Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables tests of the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. A homogeneous analysis of these systems is therefore necessary. Aims: Based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these catalogued systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak-lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure, together with a dynamical study based on a two-body model. We also describe the gas and mass distributions in the field through a lensing and an X-ray analysis. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: More constraints must be included in order to improve the methodology for classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.
Promoting psychology to students: embracing the multiplicity of research foci and method
Rees, Clare S.
2013-01-01
In order for the discipline of psychology to continue to thrive it is imperative that future students are effectively recruited into the field. Research is an important part of the discipline and it is argued that the nature of psychological research is naturally one of multiplicity in topic and methodology and that promoting and highlighting this should be considered as a potentially effective recruitment strategy. In this study, a snap-shot of current research topics and methodologies was collected based on published papers from one typical academic psychology department in Australia. Fifty articles published in the period 2010–2013 were randomly selected and then grouped using content analysis to form topic clusters. Five main clusters were identified and included: Grief and Loss; Psychopathology; Sociocultural Studies; Attachment and Parenting; and Developmental Disorders. The studies spanned the full spectrum of research methodologies from quantitative to qualitative and had implications for assessment practices, diagnosis, prevention and treatment, education, and policy. The findings are discussed in terms of the unique characteristics of psychology as a discipline and how this diversity ought to be utilized as the main selling point of the discipline to future students. PMID:24155737
Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady
2017-01-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
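As a rough illustration of multiscale mutation clustering, one can scan a gene's mutation positions with kernel density estimates at several bandwidths and read off contiguous high-density regions; the positions and thresholds below are synthetic, and the actual M2C algorithm (linked above) is more involved:

```python
import numpy as np
from scipy.stats import gaussian_kde

# positions: hypothetical amino-acid positions of somatic mutations in one
# gene, pooled across tumors (one tight and one broad hotspot built in).
rng = np.random.default_rng(13)
positions = np.concatenate([rng.normal(120, 3, 80), rng.normal(400, 25, 60)])

grid = np.arange(1, 501)
for bw in (0.02, 0.1):                  # two scales (KDE bandwidth factors)
    density = gaussian_kde(positions, bw_method=bw)(grid)
    hot = density > density.mean() + density.std()
    # contiguous runs of 'hot' positions form variable-length clusters
    edges = np.flatnonzero(np.diff(hot.astype(int)))
    spans = [(int(grid[a] + 1), int(grid[b]))
             for a, b in zip(edges[::2], edges[1::2])]
    print(bw, spans)
```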
Prosperi, Mattia C F; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco
2010-10-25
Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences, that does not require a phylogenetic tree estimation. The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of human immunodeficiency virus type 1 and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasispecies in a broad context.
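A much-simplified sketch of threshold-driven incremental partitioning of aligned sequences; the real TBC adds bootstrap resampling, models of sequence evolution, and cluster-reliability scoring, none of which are reproduced here:

```python
def hamming(a, b):
    """Proportion of mismatched positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def threshold_cluster(seqs, threshold=0.05):
    """Incremental crisp partition: each sequence joins the first cluster
    whose members are all within the distance threshold, otherwise it
    opens a new cluster."""
    clusters = []
    for s in seqs:
        for c in clusters:
            if all(hamming(s, t) <= threshold for t in c):
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

seqs = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTTA", "TTTTACGGCC", "TTTTACGGCG"]
print([len(c) for c in threshold_cluster(seqs, threshold=0.2)])
```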
R, GeethaRamani; Balasubramanian, Lakshmi
2018-07-01
Macula segmentation and fovea localization are among the primary tasks in retinal analysis, as these structures are responsible for detailed vision. Existing approaches required segmentation of other retinal structures, viz. the optic disc and blood vessels, for this purpose. This work avoids reliance on other retinal structures and applies data mining techniques to segment the macula. An unsupervised clustering algorithm is exploited for this purpose. Selection of initial cluster centres has a great impact on the performance of clustering algorithms; a heuristic-based clustering, in which initial centres are selected based on measures defining the statistical distribution of the data, is incorporated in the proposed methodology. The initial phase of the proposed framework includes image cropping, green channel extraction, contrast enhancement and application of mathematical closing. The pre-processed image is then subjected to heuristic-based clustering, yielding a binary map. The binary image is post-processed to eliminate unwanted components. Finally, the component with the minimum intensity is selected as the macula, and its centre constitutes the fovea. The proposed approach outperforms existing works by reporting that 100% of HRF, 100% of DRIVE, 96.92% of DIARETDB0, 97.75% of DIARETDB1, 98.81% of HEI-MED, 90% of STARE and 99.33% of MESSIDOR images satisfy the 1R criterion, a standard adopted for evaluating performance of macula and fovea identification. The proposed system thus helps ophthalmologists identify the macula, thereby facilitating identification of any abnormality present within the macula region. Copyright © 2018 Elsevier B.V. All rights reserved.
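A minimal sketch of the pre-processing and clustering pipeline in OpenCV; the file path, CLAHE enhancement, kernel size, and the plain K-means stand-in for the paper's heuristic-seeded clustering are all assumptions:

```python
import cv2
import numpy as np

# "fundus.png": hypothetical path to a fundus photograph.
img = cv2.imread("fundus.png")
green = img[:, :, 1]                               # green channel (BGR order)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(green)                      # contrast enhancement
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
closed = cv2.morphologyEx(enhanced, cv2.MORPH_CLOSE, kernel)  # math. closing

# Intensity clustering: the darkest cluster is kept as the candidate macula.
Z = closed.reshape(-1, 1).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(Z, 3, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
mask = (labels.reshape(closed.shape) == centers.argmin()).astype(np.uint8) * 255
cv2.imwrite("macula_mask.png", mask)
```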
Non-DSB clustered DNA lesions. Does theory colocalize with the experiment?
NASA Astrophysics Data System (ADS)
Nikitaki, Zacharenia; Nikolov, Vladimir; Mavragani, Ifigeneia V.; Plante, Ianik; Emfietzoglou, Dimitris; Iliakis, George; Georgakilas, Alexandros G.
2016-11-01
Ionizing radiation results in various kinds of DNA lesions such as double strand breaks (DSBs) and other non-DSB base lesions. These lesions may be formed in close proximity (i.e., within a few nanometers), resulting in clustered types of DNA lesions. These damage clusters are considered the fingerprint of ionizing radiation, notably charged particles of high linear energy transfer (LET). Accumulating theoretical and experimental evidence suggests that these clustered lesions are induced under various irradiation conditions but also as a result of high levels of oxidative stress. The biological significance of these clustered DNA lesions pertains to the inability of cells to process them efficiently compared to isolated DNA lesions. The consequences of unsuccessful or erroneous repair can vary from mutations up to chromosomal instability. In this mini-review, we discuss several Monte Carlo simulation codes and experimental evidence regarding the induction and repair of radiation-induced non-DSB complex DNA lesions. We also critically present the most widely used methodologies (i.e., gel electrophoresis and fluorescence microscopy [in situ colocalization assays]). Based on the comparison of different approaches, we provide examples and suggestions for the improved detection of these lesions in situ. Based on the current status of knowledge, we conclude that there is a great need for improvement of the detection techniques at the cellular or tissue level, which will provide valuable information for understanding the mechanisms used by the cell to process clustered DNA lesions.
Ocké, Marga C
2013-05-01
This paper aims to describe different approaches for studying the overall diet with advantages and limitations. Studies of the overall diet have emerged because the relationship between dietary intake and health is very complex with all kinds of interactions. These cannot be captured well by studying single dietary components. Three main approaches to study the overall diet can be distinguished. The first method is researcher-defined scores or indices of diet quality. These are usually based on guidelines for a healthy diet or on diets known to be healthy. The second approach, using principal component or cluster analysis, is driven by the underlying dietary data. In principal component analysis, scales are derived based on the underlying relationships between food groups, whereas in cluster analysis, subgroups of the population are created with people that cluster together based on their dietary intake. A third approach includes methods that are driven by a combination of biological pathways and the underlying dietary data. Reduced rank regression defines linear combinations of food intakes that maximally explain nutrient intakes or intermediate markers of disease. Decision tree analysis identifies subgroups of a population whose members share dietary characteristics that influence (intermediate markers of) disease. It is concluded that all approaches have advantages and limitations and essentially answer different questions. The third approach is still more in an exploration phase, but seems to have great potential with complementary value. More insight into the utility of conducting studies on the overall diet can be gained if more attention is given to methodological issues.
Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System
NASA Astrophysics Data System (ADS)
Demirdjian, L.; Zhou, Y.; Huffman, G. J.
2016-12-01
This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peak-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized-Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
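A compact sketch of the POT step: fit a Generalized-Pareto distribution to threshold exceedances and convert the fit into an average recurrence interval; the rainfall record and threshold choice below are synthetic stand-ins for one cluster's pooled data:

```python
import numpy as np
from scipy import stats

# daily: hypothetical stand-in for one cluster's pooled daily rainfall (mm).
rng = np.random.default_rng(6)
daily = rng.gamma(shape=0.4, scale=8.0, size=17 * 365)

u = np.quantile(daily, 0.95)                 # pre-specified POT threshold
exceed = daily[daily > u] - u
shape, _, scale = stats.genpareto.fit(exceed, floc=0.0)

# Average recurrence interval (years) for a given daily total, combining
# the GP tail fit with the empirical exceedance rate.
rate = len(exceed) / (len(daily) / 365.25)   # exceedances per year
def ari_years(x):
    p_exc = stats.genpareto.sf(x - u, shape, loc=0.0, scale=scale)
    return 1.0 / (rate * p_exc)

print(ari_years(u + 50.0))
```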
NASA Astrophysics Data System (ADS)
Lima, Carlos H. R.; AghaKouchak, Amir; Lall, Upmanu
2017-12-01
Floods are the main natural disaster in Brazil, causing substantial economic damage and loss of life. Studies suggest that some extreme floods result from a causal climate chain. Exceptional rain and floods are determined by large-scale anomalies and persistent patterns in the atmospheric and oceanic circulations, which influence the magnitude, extent, and duration of these extremes. Moreover, floods can result from different generating mechanisms. These factors contradict the assumptions of homogeneity, and often stationarity, in flood frequency analysis. Here we outline a methodological framework based on clustering using self-organizing maps (SOMs) that allows the linkage of large-scale processes to local-scale observations. The methodology is applied to flood data from several sites in the flood-prone Upper Paraná River basin (UPRB) in southern Brazil. The SOM clustering approach is employed to classify the 6-day rainfall field over the UPRB into four categories, which are then used to classify floods into four types based on the spatiotemporal dynamics of the rainfall field prior to the observed flood events. An analysis of the vertically integrated moisture fluxes, vorticity, and high-level atmospheric circulation revealed that these four clusters are related to known tropical and extratropical processes, including the South American low-level jet (SALLJ); extratropical cyclones; and the South Atlantic Convergence Zone (SACZ). Persistent anomalies in the sea surface temperature fields in the Pacific and Atlantic oceans are also found to be associated with these processes. Floods associated with each cluster present different patterns in terms of frequency, magnitude, spatial variability, scaling, and synchronization of events across the sites and subbasins. These insights suggest new directions for flood risk assessment, forecasting, and management.
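A minimal self-organizing map sketch with four nodes, mirroring the four rainfall categories; the training schedule and synthetic data are assumptions, not the study's configuration:

```python
import numpy as np

def train_som(X, rows=2, cols=2, iters=2000, lr0=0.5, sigma0=1.0):
    """Minimal SOM: grid nodes compete for each rainfall-field sample; the
    best-matching unit and its grid neighbours move toward the sample."""
    rng = np.random.default_rng(7)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(1))          # best matching unit
        lr = lr0 * (1 - t / iters)
        sigma = sigma0 * (1 - t / iters) + 1e-3
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                  # neighbourhood update
    return W

# X: hypothetical flattened 6-day rainfall fields (samples x grid cells).
X = np.random.default_rng(8).normal(size=(500, 64))
W = train_som(X)
labels = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
print(np.bincount(labels))                              # samples per category
```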
Servien, Rémi; Mamy, Laure; Li, Ziang; Rossard, Virginie; Latrille, Eric; Bessac, Fabienne; Patureau, Dominique; Benoit, Pierre
2014-09-01
Following legislation, the assessment of the environmental risks of 30000-100000 chemical substances is required for their registration dossiers. However, their behavior in the environment and their transfer to environmental components such as water or the atmosphere are studied for only a very small proportion of these chemicals in laboratory tests or monitoring studies, because such work is time-consuming and/or cost-prohibitive. Therefore, the objective of this work was to develop a new methodology, TyPol, to classify organic compounds, and their degradation products, according to both their behavior in the environment and their molecular properties. The strategy relies on partial least squares analysis and hierarchical clustering. The calculation of molecular descriptors is based on an in silico approach, and the environmental endpoints (i.e. environmental parameters) are extracted from several available databases and the literature. The classification of 215 organic compounds inputted in TyPol for this proof-of-concept study showed that the combination of some specific molecular descriptors could be related to a particular behavior in the environment. TyPol also provided an analysis of similarities (or dissimilarities) between organic compounds and their degradation products. Among the 24 degradation products that were inputted, 58% were found in the same cluster as their parents. The robustness of the method was tested and shown to be good. TyPol could help to predict the environmental behavior of a "new" compound (parent compound or degradation product) from its affiliation to one cluster, but also to select representative substances from a large data set in order to answer some specific questions regarding their behavior in the environment. Copyright © 2014 Elsevier Ltd. All rights reserved.
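A minimal sketch of the PLS-plus-hierarchical-clustering strategy (descriptors linked to endpoints, then Ward clustering in the latent space); the descriptor and endpoint matrices below are synthetic stand-ins:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cross_decomposition import PLSRegression

# Hypothetical stand-ins: D holds molecular descriptors of the compounds,
# Y their environmental endpoints (e.g. half-life, Koc, BCF).
rng = np.random.default_rng(9)
D = rng.normal(size=(215, 40))
Y = D[:, :3] @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(215, 4))

# PLS links descriptors to endpoints; compounds are then clustered in the
# latent-component space.
pls = PLSRegression(n_components=3).fit(D, Y)
scores = pls.transform(D)
clusters = fcluster(linkage(scores, method="ward"), t=5, criterion="maxclust")
print(np.bincount(clusters)[1:])         # compound count per cluster
```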
Mining the modular structure of protein interaction networks.
Berenstein, Ariel José; Piñero, Janet; Furlong, Laura Inés; Chernomoretz, Ariel
2015-01-01
Cluster-based descriptions of biological networks have received much attention in recent years, fostered by accumulated evidence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, due to their respective technical idiosyncrasies they might produce dissimilar modular decompositions of a given network. In this contribution, we aimed to analyze how alternative modular descriptions could condition the outcome of follow-up network biology analysis. We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, namely the Clauset-Newman-Moore and the infomap procedures. We analyzed to what extent both methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimera's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology impinged on the ability to highlight relevant network meso-scale connectivity patterns. As a case study we considered a set of aging-related proteins and showed that only the high-resolution modular description provided by infomap could unveil statistically significant associations between them and inter/intra-modular cartographic features. Besides reporting novel biological insights gained from the discovered associations, our contribution warns of possible technical concerns that might affect the tools used to mine for interaction patterns in network biology studies. In particular, our results suggest that partitions that are sub-optimal from the strict point of view of their modularity levels may still be worth analyzing when meso-scale features are explored in connection with external sources of biological knowledge.
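One of the two partitioning procedures compared, the Clauset-Newman-Moore greedy modularity optimizer, is available off the shelf; a minimal example on a toy graph (the paper's protein interaction network is, of course, far larger):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy stand-in for a protein interaction network; the paper contrasts this
# greedy modularity optimizer with infomap, which yields finer-grained
# modules on the real PPI data.
G = nx.karate_club_graph()
communities = greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(i, sorted(c))
```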
Smith, Jennifer L; Sturrock, Hugh J W; Olives, Casey; Solomon, Anthony W; Brooker, Simon J
2013-01-01
Implementation of trachoma control strategies requires reliable district-level estimates of trachomatous inflammation-follicular (TF), generally collected using the recommended gold-standard cluster randomized surveys (CRS). Integrated Threshold Mapping (ITM) has been proposed as an integrated and cost-effective means of rapidly surveying trachoma in order to classify districts according to treatment thresholds. ITM differs from CRS in a number of important ways, including the use of a school-based sampling platform for children aged 1-9 and a different age distribution of participants. This study uses computerised sampling simulations to compare the performance of these survey designs and evaluate the impact of varying key parameters. Realistic pseudo gold-standard data for 100 districts were generated that maintained the relative risk of disease between important sub-groups and incorporated empirical estimates of disease clustering at the household, village and district level. To simulate the different sampling approaches, 20 clusters were selected from each district, with individuals sampled according to the protocol for ITM and CRS. Results showed that ITM generally under-estimated the true prevalence of TF over a range of epidemiological settings and introduced more district misclassification according to treatment thresholds than did CRS. However, the extent of underestimation and resulting misclassification was found to be dependent on three main factors: (i) the district prevalence of TF; (ii) the relative risk of TF between enrolled and non-enrolled children within clusters; and (iii) the enrollment rate in schools. Although in some contexts the two methodologies may be equivalent, ITM can introduce a prevalence-dependent bias as TF prevalence increases, resulting in a greater risk of misclassification around treatment thresholds. In addition to strengthening the evidence base around the choice of trachoma survey methodologies, this study illustrates the use of a simulation-based approach in addressing operational research questions for trachoma as well as other NTDs.
Cluster-based control of a separating flow over a smoothly contoured ramp
NASA Astrophysics Data System (ADS)
Kaiser, Eurika; Noack, Bernd R.; Spohn, Andreas; Cattafesta, Louis N.; Morzyński, Marek
2017-12-01
The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. The proposed closed-loop control framework addresses a key issue of model-based control: The actuation effect often results from slow dynamics of strongly nonlinear interactions which the flow reveals at timescales much longer than the prediction horizon of any model. Hence, we employ a probabilistic approach based on a cluster-based discretization of the Liouville equation for the evolution of the probability distribution. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a control-dependent Markov model. This Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is determined. We examine how the approach can be used to improve the open-loop actuation in a separating flow dominated by Kelvin-Helmholtz shedding. For this purpose, the feature space, in which the model is learned, and the admissible control inputs are tailored to strongly oscillatory flows.
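A minimal sketch of the data-driven backbone: cluster the flow snapshots, estimate the cluster-to-cluster transition (Markov) matrix, and read off the ergodic distribution; the snapshot matrix is a random stand-in, and the control dependence is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans

# snapshots: hypothetical time-ordered flow states (e.g. POD coefficients).
rng = np.random.default_rng(10)
snapshots = rng.normal(size=(5000, 6))

k = 10
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(snapshots)

# Row-stochastic transition matrix between clusters: the backbone of the
# control-dependent Markov model. Its stationary distribution approximates
# the long-term behaviour against which a control law is judged.
P = np.zeros((k, k))
for a, b in zip(labels[:-1], labels[1:]):
    P[a, b] += 1
P /= np.maximum(P.sum(1, keepdims=True), 1)

w, V = np.linalg.eig(P.T)
pi = np.abs(V[:, np.argmax(w.real)].real)
print(pi / pi.sum())                        # ergodic cluster probabilities
```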
Rogiers, Bart; Mallants, Dirk; Batelaan, Okke; Gedeon, Matej; Huysmans, Marijke; Dassargues, Alain
2017-01-01
Cone penetration testing (CPT) is one of the most efficient and versatile methods currently available for geotechnical, lithostratigraphic and hydrogeological site characterization. Currently available methods for soil behaviour type classification (SBT) of CPT data however have severe limitations, often restricting their application to a local scale. For parameterization of regional groundwater flow or geotechnical models, and delineation of regional hydro- or lithostratigraphy, regional SBT classification would be very useful. This paper investigates the use of model-based clustering for SBT classification, and the influence of different clustering approaches on the properties and spatial distribution of the obtained soil classes. We additionally propose a methodology for automated lithostratigraphic mapping of regionally occurring sedimentary units using SBT classification. The methodology is applied to a large CPT dataset, covering a groundwater basin of ~60 km2 with predominantly unconsolidated sandy sediments in northern Belgium. Results show that the model-based approach is superior in detecting the true lithological classes when compared to more frequently applied unsupervised classification approaches or literature classification diagrams. We demonstrate that automated mapping of lithostratigraphic units using advanced SBT classification techniques can provide a large gain in efficiency, compared to more time-consuming manual approaches and yields at least equally accurate results. PMID:28467468
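A minimal model-based clustering sketch using a Gaussian mixture with BIC model selection, the general approach referred to above; the CPT-derived features and class count are synthetic stand-ins:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X: hypothetical CPT features per depth interval, e.g. log cone resistance
# and log friction ratio (three soil classes built in).
rng = np.random.default_rng(14)
X = np.vstack([rng.multivariate_normal(m, 0.05 * np.eye(2), 300)
               for m in ([1.0, 0.2], [1.8, 0.8], [0.6, 1.1])])

# Select the number of soil classes by BIC, then assign each interval.
models = [GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in range(1, 7)]
best = min(models, key=lambda m: m.bic(X))
print(best.n_components, np.bincount(best.predict(X)))
```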
Vieira, Verónica; Webster, Thomas; Weinberg, Janice; Aschengrau, Ann; Ozonoff, David
2005-01-01
Background The availability of geographic information from cancer and birth defect registries has increased public demands for investigation of perceived disease clusters. Many neighborhood-level cluster investigations are methodologically problematic, while maps made from registry data often ignore latency and many known risk factors. Population-based case-control and cohort studies provide a stronger foundation for spatial epidemiology because potential confounders and disease latency can be addressed. Methods We investigated the association between residence and colorectal, lung, and breast cancer on upper Cape Cod, Massachusetts (USA) using extensive data on covariates and residential history from two case-control studies for 1983–1993. We generated maps using generalized additive models, smoothing on longitude and latitude while adjusting for covariates. The resulting continuous surface estimates disease rates relative to the whole study area. We used permutation tests to examine the overall importance of location in the model and identify areas of increased and decreased risk. Results Maps of colorectal cancer were relatively flat. Assuming 15 years of latency, lung cancer was significantly elevated just northeast of the Massachusetts Military Reservation, although the result did not hold when we restricted to residences of longest duration. Earlier non-spatial epidemiology had found a weak association between lung cancer and proximity to gun and mortar positions on the reservation. Breast cancer hot spots tended to increase in magnitude as we increased latency and adjusted for covariates, indicating that confounders were partly hiding these areas. Significant breast cancer hot spots were located near known groundwater plumes and the Massachusetts Military Reservation. Discussion Spatial epidemiology of population-based case-control studies addresses many methodological criticisms of cluster studies and generates new exposure hypotheses. Our results provide evidence for spatial clustering of breast cancer on upper Cape Cod. The analysis suggests further investigation of the potential association between breast cancer and pollution plumes based on detailed exposure modeling. PMID:15955253
Engels, Michael F M; Gibbs, Alan C; Jaeger, Edward P; Verbinnen, Danny; Lobanov, Victor S; Agrafiotis, Dimitris K
2006-01-01
We report on the structural comparison of the corporate collections of Johnson & Johnson Pharmaceutical Research & Development (JNJPRD) and 3-Dimensional Pharmaceuticals (3DP), performed in the context of the recent acquisition of 3DP by JNJPRD. The main objective of the study was to assess the druglikeness of the 3DP library and the extent to which it enriched the chemical diversity of the JNJPRD corporate collection. The two databases, at the time of acquisition, collectively contained more than 1.1 million compounds with a clearly defined structural description. The analysis was based on a clustering approach and aimed at providing an intuitive quantitative estimate and visual representation of this enrichment. A novel hierarchical clustering algorithm called divisive k-means was employed in combination with Kelley's cluster-level selection method to partition the combined data set into clusters, and the diversity contribution of each library was evaluated as a function of the relative occupancy of these clusters. Typical 3DP chemotypes enriching the diversity of the JNJPRD collection were catalogued and visualized using a modified maximum common substructure algorithm. The joint collection of JNJPRD and 3DP compounds was also compared to other databases of known medicinally active or druglike compounds. The potential of the methodology for the analysis of very large chemical databases is discussed.
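A generic sketch of divisive (bisecting) k-means, the algorithm class named above; Kelley's cluster-level selection step is not reproduced, and the fingerprint matrix is a random stand-in:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_kmeans(X, max_leaves=8, min_size=5):
    """Divisive k-means: repeatedly bisect the cluster with the largest
    within-cluster scatter until the leaf budget is spent."""
    leaves = [np.arange(len(X))]
    while len(leaves) < max_leaves:
        sse = [((X[idx] - X[idx].mean(0)) ** 2).sum() for idx in leaves]
        idx = leaves.pop(int(np.argmax(sse)))
        if len(idx) < 2 * min_size:          # too small to split further
            leaves.append(idx)
            break
        split = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        leaves += [idx[split == 0], idx[split == 1]]
    return leaves

X = np.random.default_rng(11).normal(size=(300, 16))  # stand-in fingerprints
print([len(leaf) for leaf in divisive_kmeans(X)])
```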
Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A
2011-07-01
Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred, and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incompletely overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.
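A skeletal version of the MCAM sweep: enumerate combinations of transformations, distance metrics, and set sizes, and score each resulting clustering; silhouette is used below only as a stand-in for the paper's metadata-enrichment scoring, and the data matrix is synthetic:

```python
import itertools
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score

# X: hypothetical phosphosite x condition matrix.
X = np.random.default_rng(12).normal(size=(120, 8))

transforms = {
    "raw": lambda a: a,
    "zscore": lambda a: (a - a.mean(1, keepdims=True)) / a.std(1, keepdims=True),
}
metrics = ["euclidean", "correlation"]
ks = [4, 8, 12]

# Combinatorial sweep over parameter choices, scoring each clustering set.
results = {}
for (tname, tf), m, k in itertools.product(transforms.items(), metrics, ks):
    Z = linkage(tf(X), method="average", metric=m)
    labels = fcluster(Z, t=k, criterion="maxclust")
    results[(tname, m, k)] = silhouette_score(X, labels)

for key, score in sorted(results.items(), key=lambda kv: -kv[1])[:3]:
    print(key, round(score, 3))
```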
Compulsive buying disorder clustering based on sex, age, onset and personality traits.
Granero, Roser; Fernández-Aranda, Fernando; Baño, Marta; Steward, Trevor; Mestre-Bach, Gemma; Del Pino-Gutiérrez, Amparo; Moragas, Laura; Mallorquí-Bagué, Núria; Aymamí, Neus; Goméz-Peña, Mónica; Tárrega, Salomé; Menchón, José M; Jiménez-Murcia, Susana
2016-07-01
In spite of the revived interest in compulsive buying disorder (CBD), its classification in contemporary nosologic systems continues to be debated, and few studies have addressed heterogeneity in the clinical phenotype through methodologies based on a person-centered approach. Our aim was to identify empirical clusters of CBD employing personality traits, as well as patients' sex, age and age of CBD onset, as indicators. An agglomerative hierarchical clustering method, defined by a combination of the Schwarz Bayesian Information Criterion and the log-likelihood, was used. Three clusters were identified in a sample of n=110 patients attending a specialized CBD unit: (a) "male compulsive buyers" reported the highest prevalence of comorbid gambling disorder and the lowest levels of reward dependence; (b) "female low-dysfunctional" mainly included employed women, with the highest level of education, the oldest age of onset, the lowest scores in harm avoidance and the highest levels of persistence, self-directedness and cooperativeness; and (c) "female highly-dysfunctional", with the youngest age of onset, the highest levels of comorbid psychopathology and harm avoidance, and the lowest score in self-directedness. Sociodemographic characteristics and personality traits can be used to determine CBD clusters representing different clinical subtypes. These subtypes should be considered when developing assessment instruments, preventive programs and treatment interventions. Copyright © 2016 Elsevier Inc. All rights reserved.
Handley, Margaret A; Schillinger, Dean; Shiboski, Stephen
2011-01-01
Although randomized controlled trials are often a gold standard for determining intervention effects, in the area of practice-based research (PBR) there are many situations in which individual randomization is not possible. Alternative approaches to evaluating interventions have received increased attention, particularly those that can retain elements of randomization such that they can be considered "controlled" trials. Methodological design elements and practical implementation considerations are presented for two quasi-experimental design approaches that hold considerable promise in PBR settings, the stepped-wedge design and a variant of this design, the wait-list cross-over design, along with a case study from a recent PBR intervention for patients with diabetes. PBR-relevant design features include: creation of a cohort over time that collects control data but allows all participants (clusters or patients) to receive the intervention; staggered introduction of clusters; multiple data collection points; and one-way cross-over into the intervention arm. Practical considerations include: randomization versus stratification; training run-in phases; and an extended time period for overall study completion. Several design features of practice-based research studies can be adapted to local circumstances yet retain elements that improve methodological rigor. Studies that utilize these methods, such as the stepped-wedge design and the wait-list cross-over design, can increase the evidence base for controlled studies conducted within the complex environment of PBR.
Using Public Data for Comparative Proteome Analysis in Precision Medicine Programs.
Hughes, Christopher S; Morin, Gregg B
2018-03-01
Maximizing the clinical utility of information obtained in longitudinal precision medicine programs would benefit from robust comparative analyses against known information to assess biological features of patient material toward identifying the underlying features driving their disease phenotype. Herein, the potential for utilizing publicly deposited mass-spectrometry-based proteomics data to perform inter-study comparisons of cell-line or tumor-tissue materials is investigated. To investigate the robustness of comparison between MS-based proteomics studies carried out with different methodologies, deposited data representative of label-free (MS1) and isobaric tagging (MS2 and MS3) quantification are utilized. In-depth quantitative proteomics data acquired from analysis of ovarian cancer cell lines revealed the robust recapitulation of observable gene expression dynamics between individual studies carried out using significantly different methodologies. The observed signatures enable robust inter-study clustering of cell line samples. In addition, the ability to classify and cluster tumor samples based on observed gene expression trends when using a single patient sample is established. With this analysis, relevant gene expression dynamics are obtained from a single patient tumor, in the context of a precision medicine analysis, by leveraging a large cohort of repository data as a comparator. Together, these data establish the potential for state-of-the-art MS-based proteomics data to serve as resources for robust comparative analyses in precision medicine applications. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Modulated Modularity Clustering as an Exploratory Tool for Functional Genomic Inference
Stone, Eric A.; Ayroles, Julien F.
2009-01-01
In recent years, the advent of high-throughput assays, coupled with their diminishing cost, has facilitated a systems approach to biology. As a consequence, massive amounts of data are currently being generated, requiring efficient methodology aimed at the reduction of scale. Whole-genome transcriptional profiling is a standard component of systems-level analyses, and clustering genes is a common way to reduce scale and improve inference. Since clustering is often the first step toward generating hypotheses, cluster quality is critical. Conversely, because the validation of cluster-driven hypotheses is indirect, it is critical that quality clusters not be obtained by subjective means. In this paper, we present a new objective-based clustering method and demonstrate that it yields high-quality results. Our method, modulated modularity clustering (MMC), seeks community structure in graphical data. MMC modulates the connection strengths of edges in a weighted graph to maximize an objective function (called modularity) that quantifies community structure. The result of this maximization is a clustering through which tightly-connected groups of vertices emerge. Our application is to systems genetics, and we quantitatively compare MMC both to the hierarchical clustering method most commonly employed and to three popular spectral clustering approaches. We further validate MMC through analyses of human and Drosophila melanogaster expression data, demonstrating that the clusters we obtain are biologically meaningful. We show MMC to be effective and suitable for large-scale applications. In light of these features, we advocate MMC as a standard tool for exploration and hypothesis generation. PMID:19424432
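Modularity-based community detection of the kind MMC builds on can be illustrated with the greedy (Clauset-Newman-Moore) maximizer in networkx; MMC's distinguishing step, modulating edge weights while maximizing modularity, is not reproduced in this sketch, and the correlation threshold is an arbitrary assumption.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def cluster_genes(expr, threshold=0.3):
    """expr: (n_genes, n_samples) array; edges weighted by |correlation|."""
    corr = np.corrcoef(expr)
    G = nx.Graph()
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            w = abs(corr[i, j])
            if w > threshold:   # sparsify weak edges (threshold is arbitrary)
                G.add_edge(i, j, weight=w)
    # greedy modularity maximization; MMC additionally modulates the weights
    return [sorted(c) for c in greedy_modularity_communities(G, weight="weight")]
```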
Kristunas, Caroline A; Smith, Karen L; Gray, Laura J
2017-03-07
The current methodology for sample size calculations for stepped-wedge cluster randomised trials (SW-CRTs) is based on the assumption of equal cluster sizes. However, as is often the case in cluster randomised trials (CRTs), the clusters in SW-CRTs are likely to vary in size, which in other designs of CRT leads to a reduction in power. The effect of an imbalance in cluster size on the power of SW-CRTs has not previously been reported, nor what an appropriate adjustment to the sample size calculation should be to allow for any imbalance. We aimed to assess the impact of an imbalance in cluster size on the power of a cross-sectional SW-CRT and recommend a method for calculating the sample size of a SW-CRT when there is an imbalance in cluster size. The effect of varying degrees of imbalance in cluster size on the power of SW-CRTs was investigated using simulations. The sample size was calculated using both the standard method and two proposed adjusted design effects (DEs), based on those suggested for CRTs with unequal cluster sizes. The data were analysed using generalised estimating equations with an exchangeable correlation matrix and robust standard errors. An imbalance in cluster size was not found to have a notable effect on the power of SW-CRTs. The two proposed adjusted DEs resulted in trials that were generally considerably over-powered. We recommend that the standard method of sample size calculation for SW-CRTs be used, provided that the assumptions of the method hold. However, it would be beneficial to investigate, through simulation, what effect the maximum likely amount of inequality in cluster sizes would be on the power of the trial and whether any inflation of the sample size would be required.
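For context, a hedged sketch of the CRT-style design-effect adjustment for unequal cluster sizes that the authors evaluated (and found to over-power SW-CRTs when applied to them). The numeric inputs below are illustrative; cv is the coefficient of variation of cluster size, m_bar the mean cluster size, and icc the intracluster correlation.

```python
def design_effect_equal(m_bar, icc):
    # standard design effect for equal cluster sizes
    return 1 + (m_bar - 1) * icc

def design_effect_unequal(m_bar, cv, icc):
    # Eldridge-style inflation for varying cluster sizes
    return 1 + ((cv**2 + 1) * m_bar - 1) * icc

n_individual = 400                 # individually randomized sample size
icc, m_bar, cv = 0.05, 20, 0.6     # illustrative values
n_crt = n_individual * design_effect_unequal(m_bar, cv, icc)
print(round(n_crt))                # inflated total sample size
```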
Massive open star clusters using the VVV survey. II. Discovery of six clusters with Wolf-Rayet stars
NASA Astrophysics Data System (ADS)
Chené, A.-N.; Borissova, J.; Bonatto, C.; Majaess, D. J.; Baume, G.; Clarke, J. R. A.; Kurtev, R.; Schnurr, O.; Bouret, J.-C.; Catelan, M.; Emerson, J. P.; Feinstein, C.; Geisler, D.; de Grijs, R.; Hervé, A.; Ivanov, V. D.; Kumar, M. S. N.; Lucas, P.; Mahy, L.; Martins, F.; Mauro, F.; Minniti, D.; Moni Bidin, C.
2013-01-01
Context. The ESO Public Survey "VISTA Variables in the Vía Láctea" (VVV) provides deep multi-epoch infrared observations for an unprecedented 562 sq. degrees of the Galactic bulge, and adjacent regions of the disk. Nearly 150 new open clusters and cluster candidates have been discovered in this survey. Aims: This is the second in a series of papers about young, massive open clusters observed using the VVV survey. We present the first study of six recently discovered clusters. These clusters contain at least one newly discovered Wolf-Rayet (WR) star. Methods: Following the methodology presented in the first paper of the series, wide-field, deep JHKs VVV observations, combined with new infrared spectroscopy, are employed to constrain fundamental parameters for a subset of clusters. Results: We find that the six studied stellar groups are real, young (2-7 Myr), and massive (between 0.8 and 2.2 × 10³ M⊙) clusters. They are highly obscured (AV ~ 5-24 mag) and compact (1-2 pc). In addition to WR stars, two of the six clusters also contain at least one red supergiant star, and one of these two clusters also contains a blue supergiant. We claim the discovery of 8 new WR stars and 3 stars showing WR-like emission lines which could be classified as WR or OIf. Preliminary analysis provides initial masses of ~30-50 M⊙ for the WR stars. Finally, we discuss the spiral structure of the Galaxy using the six new clusters as tracers, together with the previously studied VVV clusters. Based on observations with ISAAC, VLT, ESO (programme 087.D-0341A), the New Technology Telescope at ESO's La Silla Observatory (programme 087.D-0490A) and with the Clay telescope at the Las Campanas Observatory (programme CN2011A-086). Also based on data from the VVV survey (programme 172.B-2002).
Algorithmic localisation of noise sources in the tip region of a low-speed axial flow fan
NASA Astrophysics Data System (ADS)
Tóth, Bence; Vad, János
2017-04-01
An objective and algorithmised methodology is proposed to analyse beamforming data obtained for axial fans. Its application is demonstrated in a case study regarding the tip region of a low-speed cooling fan. First, beamforming is carried out in a co-rotating frame of reference. Then, a distribution of source strength is extracted along the circumference of the rotor at the blade tip radius in each analysed third-octave band. The circumferential distributions are expanded into Fourier series, which allows the effects of perturbations to be filtered out on the basis of an objective criterion. The remaining Fourier components are then considered as base sources to determine the blade-passage-periodic flow mechanisms responsible for the broadband noise. Based on their frequency and angular location, the base sources are grouped together. This is done using the fuzzy c-means clustering method to allow for overlap between the source mechanisms. The number of clusters is determined in a validity analysis. Finally, the obtained clusters are assigned to source mechanisms based on the literature. Thus, turbulent boundary layer-trailing edge interaction noise, tip leakage flow noise, and double leakage flow noise are identified.
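A compact fuzzy c-means implementation illustrates the soft grouping step described above; the feature choice (e.g., frequency band and angular location of each base source) and the fuzzifier m = 2 are assumptions, not the paper's settings.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """X: (n_samples, n_features); returns cluster centers and memberships."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # membership matrix, rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distances from every sample to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # standard FCM membership update: u_ik ∝ d_ik^(-2/(m-1))
        U_new = 1.0 / (d ** (2 / (m - 1)) *
                       np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U   # U[i, k] = degree of membership of sample i in cluster k
```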
On the Distribution of Orbital Poles of Milky Way Satellites
NASA Astrophysics Data System (ADS)
Palma, Christopher; Majewski, Steven R.; Johnston, Kathryn V.
2002-01-01
In numerous studies of the outer Galactic halo some evidence for accretion has been found. If the outer halo did form in part or wholly through merger events, we might expect to find coherent streams of stars and globular clusters following orbits similar to those of their parent objects, which are assumed to be present or former Milky Way dwarf satellite galaxies. We present a study of this phenomenon by assessing the likelihood of potential descendant "dynamical families" in the outer halo. We conduct two analyses: one that involves a statistical analysis of the spatial distribution of all known Galactic dwarf satellite galaxies (DSGs) and globular clusters, and a second, more specific analysis of those globular clusters and DSGs for which full phase space dynamical data exist. In both cases our methodology is appropriate only to members of descendant dynamical families that retain nearly aligned orbital poles today. Since the Sagittarius dwarf (Sgr) is considered a paradigm for the type of merger/tidal interaction event for which we are searching, we also undertake a case study of the Sgr system and identify several globular clusters that may be members of its extended dynamical family. In our first analysis, the distribution of possible orbital poles for the entire sample of outer (Rgc>8 kpc) halo globular clusters is tested for statistically significant associations among globular clusters and DSGs. Our methodology for identifying possible associations is similar to that used by Lynden-Bell & Lynden-Bell, but we put the associations on a more statistical foundation. Moreover, we study the degree of possible dynamical clustering among various interesting ensembles of globular clusters and satellite galaxies. Among the ensembles studied, we find the globular cluster subpopulation with the highest statistical likelihood of association with one or more of the Galactic DSGs to be the distant, outer halo (Rgc>25 kpc), second-parameter globular clusters. The results of our orbital pole analysis are supported by the great circle cell count methodology of Johnston, Hernquist, & Bolte. The space motions of the clusters Pal 4, NGC 6229, NGC 7006, and Pyxis are predicted to be among those most likely to show the clusters to be following stream orbits, since these clusters are responsible for the majority of the statistical significance of the association between outer halo, second-parameter globular clusters and the Milky Way DSGs. In our second analysis, we study the orbits of the 41 globular clusters and six Milky Way-bound DSGs having measured proper motions to look for objects with both coplanar orbits and similar angular momenta. Unfortunately, the majority of globular clusters with measured proper motions are inner halo clusters that are less likely to retain memory of their original orbit. Although four potential globular cluster/DSG associations are found, we believe three of these associations involving inner halo clusters to be coincidental. While the present sample of objects with complete dynamical data is small and does not include many of the globular clusters that are more likely to have been captured by the Milky Way, the methodology we adopt will become increasingly powerful as more proper motions are measured for distant Galactic satellites and globular clusters, and especially as results from the Space Interferometry Mission (SIM) become available.
Accident patterns for construction-related workers: a cluster analysis
NASA Astrophysics Data System (ADS)
Liao, Chia-Wen; Tyan, Yaw-Yauan
2012-01-01
The construction industry has been identified as one of the most hazardous industries. The risk of construction-related workers is far greater than that in a manufacturing-based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.
Processing ARM VAP data on an AWS cluster
NASA Astrophysics Data System (ADS)
Martin, T.; Macduff, M.; Shippert, T.
2017-12-01
The Atmospheric Radiation Measurement (ARM) Data Management Facility (DMF) manages over 18,000 processes and 1.3 TB of data each day. This includes many Value Added Products (VAPs) that make use of multiple instruments to produce the derived products that are scientifically relevant. A thermodynamic and cloud profile VAP is being developed to provide input to the ARM Large-eddy simulation (LES) ARM Symbiotic Simulation and Observation (LASSO) project (https://www.arm.gov/capabilities/vaps/lasso-122). This algorithm is CPU intensive and its processing requirements exceeded the available DMF computing capacity. Amazon Web Services (AWS), along with CfnCluster, was investigated to see how it would perform. This cluster environment is cost effective and scales dynamically based on demand. We were able to take advantage of autoscaling, which allowed the cluster to grow and shrink based on the size of the processing queue. We were also able to take advantage of the Amazon Web Services spot market to further reduce cost. Our test was very successful and found that cloud resources can be used to efficiently and effectively process time series data. This poster will present the resources and methodology used to successfully run the algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kowalski, Karol; Valiev, Marat
2009-12-21
The recently introduced energy expansion based on the use of the generating functional (GF) [K. Kowalski, P.D. Fan, J. Chem. Phys. 130, 084112 (2009)] provides a way of constructing size-consistent non-iterative coupled-cluster (CC) corrections in terms of moments of the CC equations. To take advantage of this expansion in a strongly interacting regime, regularization of the cluster amplitudes is required in order to counteract the effect of excessive growth of the norm of the CC wavefunction. Although proven to be efficient, the previously discussed form of the regularization does not lead to rigorously size-consistent corrections. In this paper we address the issue of size-consistent regularization of the GF expansion by redefining the equations for the cluster amplitudes. The performance and basic features of the proposed methodology are illustrated on several gas-phase benchmark systems. Moreover, the regularized GF approaches are combined with a QM/MM module and applied to describe the SN2 reaction of CHCl3 and OH- in aqueous solution.
Using Cluster Analysis for Data Mining in Educational Technology Research
ERIC Educational Resources Information Center
Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.
2012-01-01
Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…
DYNER: A DYNamic ClustER for Education and Research
ERIC Educational Resources Information Center
Kehagias, Dimitris; Grivas, Michael; Mamalis, Basilis; Pantziou, Grammati
2006-01-01
Purpose: The purpose of this paper is to evaluate the use of a non-expensive dynamic computing resource, consisting of a Beowulf class cluster and a NoW, as an educational and research infrastructure. Design/methodology/approach: Clusters, built using commodity-off-the-shelf (COTS) hardware components and free, or commonly used, software, provide…
The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts
NASA Astrophysics Data System (ADS)
Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.
2012-07-01
We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.
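The Fisher analysis mentioned above rests on the standard expression for Poisson-distributed binned counts; a generic, hedged form (notation mine, not the paper's) is:

```latex
% Fisher matrix for Poisson-distributed counts N_b in CR-HR bins b,
% with model parameters \theta (cosmology, scaling relations, selection):
F_{ij} \;=\; \sum_{b} \frac{1}{N_b(\boldsymbol{\theta})}\,
        \frac{\partial N_b}{\partial \theta_i}\,
        \frac{\partial N_b}{\partial \theta_j},
\qquad
\mathrm{Cov}(\hat{\boldsymbol{\theta}}) \;\succeq\; F^{-1}.
```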
Hyde, J M; Cerezo, A; Williams, T J
2009-04-01
Statistical analysis of atom probe data has improved dramatically in the last decade and it is now possible to determine the size, the number density and the composition of individual clusters or precipitates such as those formed in reactor pressure vessel (RPV) steels during irradiation. However, the characterisation of the onset of clustering or co-segregation is more difficult and has traditionally focused on the use of composition frequency distributions (for detecting clustering) and contingency tables (for detecting co-segregation). In this work, the authors investigate the possibility of directly examining the neighbourhood of each individual solute atom as a means of identifying the onset of solute clustering and/or co-segregation. The methodology involves comparing the mean observed composition around a particular type of solute with that expected from the overall composition of the material. The methodology has been applied to atom probe data obtained from several irradiated RPV steels. The results show that the new approach is more sensitive to fine scale clustering and co-segregation than that achievable using composition frequency distribution and contingency table analyses.
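The neighbourhood comparison described above can be sketched with a k-nearest-neighbour query; the input arrays and the choice k = 10 are illustrative stand-ins for a real atom-probe reconstruction.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbourhood_composition(pos, species, solute, k=10):
    """Mean fraction of each species among the k nearest neighbours of `solute` atoms.

    pos: (n_atoms, 3) coordinates; species: (n_atoms,) array of labels.
    """
    tree = cKDTree(pos)
    idx = np.flatnonzero(species == solute)
    _, nbrs = tree.query(pos[idx], k=k + 1)    # first hit is the atom itself
    nbr_species = species[nbrs[:, 1:]]
    labels = np.unique(species)
    observed = {s: float((nbr_species == s).mean()) for s in labels}
    expected = {s: float((species == s).mean()) for s in labels}
    # an excess of observed over expected composition suggests clustering
    return observed, expected
```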
Keita, Akilah Dulin; Whittaker, Shannon; Wynter, Jamila; Kidanu, Tamnnet Woldemichael; Chhay, Channavy; Cardel, Michelle; Gans, Kim M
2016-01-01
This study identifies Southeast Asian refugee parents' and grandparents' perceptions of the risk and protective factors for childhood obesity. We used a mixed methods approach (concept mapping) for data collection and analyses. Fifty-nine participants engaged in modified nominal group meetings where they generated statements about children's weight status and structuring meetings where they sorted statements into piles based on similarity and rated statements on relative importance. Concept Systems® software generated clusters of ideas, cluster ratings, and pattern matches. Eleven clusters emerged. Participants rated "Healthy Food Changes Made within the School" and "Parent-related Physical Activity Factors" as most important, whereas "Neighborhood Built Features" was rated as the least important. Cambodian and Hmong participants agreed the most on cluster ratings of relative importance (r = 0.62). The study findings may be used to inform the development of culturally appropriate obesity prevention interventions for Southeast Asian refugee communities.
Cluster Analysis of Weighted Bipartite Networks: A New Copula-Based Approach
Chessa, Alessandro; Crimaldi, Irene; Riccaboni, Massimo; Trapin, Luca
2014-01-01
In this work we are interested in identifying clusters of "positionally equivalent" actors, i.e. actors who play a similar role in a system. In particular, we analyze weighted bipartite networks that describe the relationships between actors on one side and features or traits on the other, together with the intensity level to which actors show their features. We develop a methodological approach that takes into account the underlying multivariate dependence among groups of actors. The idea is that positions in a network could be defined on the basis of the similar intensity levels that the actors exhibit in expressing some features, instead of just considering the relationships that actors hold with each other. Moreover, we propose a new clustering procedure that exploits the potential of copula functions, a mathematical instrument for modeling stochastic dependence structure. Our clustering algorithm can be applied both to binary and real-valued matrices. We validate it with simulations and applications to real-world data. PMID:25303095
Hydrodynamic fractionation of finite size gold nanoparticle clusters.
Tsai, De-Hao; Cho, Tae Joon; DelRio, Frank W; Taurozzi, Julian; Zachariah, Michael R; Hackley, Vincent A
2011-06-15
We demonstrate a high-resolution in situ experimental method for performing simultaneous size classification and characterization of functional gold nanoparticle clusters (GNCs) based on asymmetric-flow field flow fractionation (AFFF). Field emission scanning electron microscopy, atomic force microscopy, multi-angle light scattering (MALS), and in situ ultraviolet-visible optical spectroscopy provide complementary data and imagery confirming the cluster state (e.g., dimer, trimer, tetramer), packing structure, and purity of fractionated populations. An orthogonal analysis of GNC size distributions is obtained using electrospray-differential mobility analysis (ES-DMA). We find a linear correlation between the normalized MALS intensity (measured during AFFF elution) and the corresponding number concentration (measured by ES-DMA), establishing the capacity for AFFF to quantify the absolute number concentration of GNCs. The results and corresponding methodology summarized here provide the proof of concept for general applications involving the formation, isolation, and in situ analysis of both functional and adventitious nanoparticle clusters of finite size. © 2011 American Chemical Society
Acioli, Paulo H.; Jellinek, Julius
2017-07-14
A theoretical/computational description and analysis of the spectra of electron binding energies of Al12-, Al13- and Al12Ni- clusters, which differ in size and/or composition by a single atom yet possess strikingly different measured photoelectron spectra, is presented. It is shown that the measured spectra can not only be reproduced computationally with quantitative fidelity – this is achieved through a combination of state-of-the-art density functional theory with a highly accurate scheme for conversion of the Kohn-Sham eigenenergies into electron binding energies – but also explained in terms of the effects of size, structure/symmetry and composition. Furthermore, a new methodology is developed and applied that provides for disentanglement and differential assignment of the separate roles played by size, structure/symmetry and composition in defining the observed differences in the measured spectra. The methodology is general and applicable to any finite system, homogeneous or heterogeneous. Finally, we project that in combination with advances in synthesis techniques this methodology will become an indispensable computation-based aid in the design of controlled synthesis protocols for manufacture of nanosystems and nanodevices with precisely desired electronic and other characteristics.
High- and low-level hierarchical classification algorithm based on source separation process
NASA Astrophysics Data System (ADS)
Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber
2016-10-01
High-dimensional data applications have earned great attention in recent years. We focus on remote sensing data analysis in high-dimensional spaces such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality. The latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. Indeed, one obvious method to perform dimensionality reduction is to use independent component analysis as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; in our approach, however, leaves are associated with the most relevant sources, represented along mutually independent axes, so that particular land covers correspond to a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources. At each new step we thus obtain a finer partition that participates in the clustering process to enhance semantic capabilities and give good identification rates.
Alternative power supply systems for remote industrial customers
NASA Astrophysics Data System (ADS)
Kharlamova, N. V.; Khalyasmaa, A. I.; Eroshenko, S. A.
2017-06-01
The paper addresses the problem of alternative power supply of remote industrial clusters with renewable electric energy generation. As a result of comparing different technologies, consideration is given to wind energy application. The authors present a methodology for calculating mean expected wind generation output, based on the Weibull distribution, which provides an effective express tool for preliminary assessment of the required installed generation capacity. The case study is based on real data, including a database of meteorological information, relief characteristics, power system topology, etc. Wind generation feasibility estimation for a specific territory is followed by power flow calculations using Monte Carlo methodology. Finally, the paper provides a set of recommendations to ensure safe and reliable power supply for the final customers and, subsequently, to support sustainable development of regions located far from megalopolises and industrial centres.
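The mean-expected-output calculation can be sketched by integrating a turbine power curve against a Weibull wind-speed density; the piecewise power curve and all numeric parameters below are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.stats import weibull_min

def mean_power(k, c, rated_kw=2000.0, v_in=3.0, v_rated=12.0, v_out=25.0):
    """Expected turbine output (kW) for Weibull shape k and scale c (m/s)."""
    v = np.linspace(0.0, 30.0, 3001)
    pdf = weibull_min.pdf(v, k, scale=c)
    # simplified piecewise power curve: cubic ramp, rated plateau, cut-out
    power = np.where(v < v_in, 0.0,
             np.where(v < v_rated,
                      rated_kw * ((v - v_in) / (v_rated - v_in)) ** 3,
              np.where(v < v_out, rated_kw, 0.0)))
    return np.trapz(power * pdf, v)   # expectation of power over wind speeds

print(mean_power(k=2.0, c=7.5))       # illustrative site parameters
```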
Bai, Mei; Dixon, Jane; Williams, Anna-Leila; Jeon, Sangchoon; Lazenby, Mark; McCorkle, Ruth
2016-11-01
Research shows that spiritual well-being correlates positively with quality of life (QOL) for people with cancer, whereas contradictory findings are frequently reported with respect to the differentiated associations between dimensions of spiritual well-being, namely peace, meaning and faith, and QOL. This study aimed to examine individual patterns of spiritual well-being among patients newly diagnosed with advanced cancer. Cluster analysis was based on the 12 items of the Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale at Time 1. A combination of hierarchical and k-means (non-hierarchical) clustering methods was employed to jointly determine the number of clusters. Self-rated health, depressive symptoms, peace, meaning and faith, and overall QOL were compared at Time 1 and Time 2. Hierarchical and k-means clustering methods both suggested four clusters. Comparison of the four clusters supported statistically significant and clinically meaningful differences in QOL outcomes among clusters while revealing contrasting relations of faith with QOL. Cluster 1, Cluster 3, and Cluster 4 represented high, medium, and low levels of overall QOL, respectively, with correspondingly high, medium, and low levels of peace, meaning, and faith. Cluster 2 was distinguished from the other clusters by its medium levels of overall QOL, peace, and meaning and its low level of faith. This study provides empirical support for individual differences in response to a newly diagnosed cancer and brings into focus conceptual and methodological challenges associated with the measurement of spiritual well-being, which may partly contribute to the attenuated relation between faith and QOL.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases based on three medical datasets: the Arrhythmia, Breast Cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering with the ANOVA statistic to preprocess data and select the significant features, and support vector machines in the classification process. To compare and evaluate its performance, we chose three classification algorithms (decision tree, Naïve Bayes, and support vector machines) and applied the medical datasets directly to these algorithms. Our methodology yielded much better classification accuracy: 98% on the Arrhythmia dataset, 92% on the Breast Cancer dataset and 88% on the Hepatitis dataset, compared to applying the medical data directly to decision tree, Naïve Bayes, and support vector machines. The ROC curves and precision achieved with K-ANOVA-SVM were also better than those of the other algorithms.
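The ANOVA-based feature selection plus SVM classification at the heart of K-ANOVA-SVM can be sketched with a standard scikit-learn pipeline; the k-means preprocessing step of the paper is omitted here, the dataset is a stand-in, and k = 10 is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),   # keep the 10 most significant features (ANOVA F-test)
    SVC(kernel="rbf", C=1.0),
)
print(cross_val_score(model, X, y, cv=5).mean())   # cross-validated accuracy
```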
Chill, Be Cool Man: African American Men, Identity, Coping, and Aggressive Ideation
Thomas, Alvin; Hammond, Wizdom Powell; Kohn-Wood, Laura P.
2016-01-01
Aggression is an important correlate of violence, depression, coping, and suicide among emerging young African American males. Yet most researchers treat aggression deterministically, fail to address cultural factors, or consider the potential for individual characteristics to exert an intersectional influence on this psychosocial outcome. Addressing this gap, we consider the moderating effect of coping on the relationship between masculine and racial identity and aggressive ideation among African American males (N = 128) drawn from 2 large Midwestern universities. Using the phenomenological variant of ecological systems theory and person-centered methodology as a guide, hierarchical cluster analysis grouped participants into profile groups based on their responses to both a measure of racial identity and a measure of masculine identity. Results from the cluster analysis revealed 3 distinct identity clusters: Identity Ambivalent, Identity Appraising, and Identity Consolidated. Although these cluster groups did not differ with regard to coping, significant differences were observed between cluster groups in relation to aggressive ideation. Further, a full model with identity profile clusters, coping, and aggressive ideation indicates that cluster membership significantly moderates the relationship between coping and aggressive ideation. The implications of these data for intersecting identities of African American men, and the association of identity and outcomes related to risk for mental health and violence, are discussed. PMID:25090145
Potential of SNP markers for the characterization of Brazilian cassava germplasm.
de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte
2014-06-01
High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and multivariate analysis in structuring the genetic diversity of cassava. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and analysis of molecular variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed, indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent, presenting a higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions was not sufficient to provide clear differentiation between the clusters according to the AMOVA. In contrast, the FST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent renaming of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs and contribute to the maximization of genetic gains in breeding programs.
NASA Astrophysics Data System (ADS)
Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana
2018-01-01
This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations in the Krishna Basin, India. Climatic data from NCEP are used for training the proposed models (Jan. '69 to Dec. '94), which are then applied to the corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan. '95-Dec. '05) and forecast (Jan. '06-Dec. '35) periods. The observed precipitation data are obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% of the variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs and obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better than traditional Multiple Linear Regression (MLR) and Artificial Neural Network (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA help reduce the dimensionality of the input climatic variables while capturing more variability compared to stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods, whereas the models with clustering but without MWE over-estimate the rainfall during the dry season.
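A hedged sketch of a multi-scale wavelet entropy feature of the kind used for the clustering step: decompose each series with a DWT and compute the Shannon entropy of the relative wavelet energies across scales. The wavelet ('db4') and decomposition level are assumptions, not the paper's configuration.

```python
import numpy as np
import pywt

def wavelet_entropy(series, wavelet="db4", level=5):
    """Shannon entropy of the relative wavelet energy across DWT scales."""
    coeffs = pywt.wavedec(np.asarray(series, dtype=float), wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()                 # relative energy per scale
    return float(-np.sum(p * np.log(p + 1e-12)))  # Shannon entropy

# Each climate variable would get an MWE-style feature (e.g., one entropy per
# scale), which then feeds k-means to form the clusters described above.
```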
Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora
Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.
2009-01-01
Background Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches to delineate molecular operational taxonomic units have been applied for this purpose, using arbitrary choices of distance threshold values and clustering algorithms. Methodology Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. Conclusions A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601
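The clustering-optimization idea can be sketched as a grid search over distance thresholds and linkage methods, scored by agreement with reference labels; the adjusted Rand index below stands in for the paper's own agreement measure, and the threshold grid is an assumption.

```python
import itertools
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

def optimize_clustering(dist_matrix, reference_labels,
                        thresholds=np.linspace(0.01, 0.10, 10),
                        methods=("single", "complete", "average")):
    """Return the (method, threshold) whose clusters best match the references."""
    condensed = squareform(dist_matrix, checks=False)
    best = (None, -1.0, None)
    for method, t in itertools.product(methods, thresholds):
        labels = fcluster(linkage(condensed, method), t, criterion="distance")
        score = adjusted_rand_score(reference_labels, labels)
        if score > best[1]:
            best = ((method, float(t)), score, labels)
    return best   # (setting, agreement score, cluster labels)
```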
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.
Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel
2017-09-18
The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative clustering algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. The algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). Within these 18 clusters, most keys were correctly assigned to a related cluster, but 13 keys were unrelated to the cluster in which they were placed. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-score (0.63). The intuition underpinning cleaning by clustering is that dividing keys into clusters resolves the scalability issues for data observation and cleaning, and that duplicates and errors among keys in the same cluster can easily be found. Our algorithm can also be applied to other biomedical data types.
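A minimal sketch of the key-clustering idea: combine a name similarity with a value-overlap similarity and cluster keys agglomeratively. The equal weighting and the 0.5 distance cut are illustrative choices, not the paper's tuned parameters.

```python
from difflib import SequenceMatcher
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def key_distance(k1, v1, k2, v2):
    """Distance between two metadata keys from name and value-set similarity."""
    name_sim = SequenceMatcher(None, k1.lower(), k2.lower()).ratio()
    value_sim = len(v1 & v2) / max(len(v1 | v2), 1)   # Jaccard on value sets
    return 1.0 - (0.5 * name_sim + 0.5 * value_sim)

keys = {"sex": {"male", "female"}, "gender": {"male", "female", "f"},
        "age": {"34", "56"}, "age (years)": {"34", "60"}}
names = list(keys)
D = np.array([[key_distance(a, keys[a], b, keys[b]) for b in names] for a in names])
labels = fcluster(linkage(squareform(D, checks=False), "average"), 0.5, "distance")
print(dict(zip(names, labels)))   # similar keys share a cluster id
```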
NASA Astrophysics Data System (ADS)
Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann
2017-07-01
Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with Mmin = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
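For orientation, a hedged sketch of classical window-based aftershock identification, the kind of fixed spatio-temporal search that SCM replaces with density-adaptive, direction-aware windows. The Gardner-Knopoff-style window coefficients below are illustrative placeholders.

```python
import numpy as np

def decluster(times, lats, lons, mags):
    """times in days; returns a boolean mask marking mainshocks."""
    order = np.argsort(mags)[::-1]               # process largest events first
    is_main = np.ones(len(mags), dtype=bool)
    for i in order:
        if not is_main[i]:
            continue
        # magnitude-dependent windows (Gardner-Knopoff-style, illustrative)
        t_win = 10 ** (0.5409 * mags[i] - 0.547)   # days
        r_win = 10 ** (0.1238 * mags[i] + 0.983)   # km
        dt = np.abs(times - times[i])
        dr = np.hypot((lats - lats[i]) * 111.0,
                      (lons - lons[i]) * 111.0 * np.cos(np.radians(lats[i])))
        linked = (dt < t_win) & (dr < r_win)
        linked[i] = False
        # smaller events inside the window are classified as aftershocks
        is_main &= ~(linked & (mags < mags[i]))
    return is_main
```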
CNN based approach for activity recognition using a wrist-worn accelerometer.
Panwar, Madhuri; Dyuthi, S Ram; Chandra Prakash, K; Biswas, Dwaipayan; Acharyya, Amit; Maharatna, Koushik; Gautam, Arvind; Naik, Ganesh R
2017-07-01
In recent years, significant advancements have taken place in human activity recognition using various machine learning approaches. However, conventional methods have been dominated by feature engineering, involving the difficult process of optimal feature selection. This problem has been mitigated by a novel methodology based on a deep learning framework which automatically extracts the useful features and reduces the computational cost. As a proof of concept, we have attempted to design a generalized model for recognition of three fundamental movements of the human forearm performed in daily life, where data are collected from four different subjects using a single wrist-worn accelerometer sensor. Validation of the proposed model is carried out under different pre-processing and noisy data conditions, evaluated using three possible methods. The results show that our proposed methodology achieves an average recognition rate of 99.8%, as opposed to conventional methods based on K-means clustering, linear discriminant analysis and support vector machines.
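A minimal sketch of a 1-D CNN for tri-axial accelerometer windows illustrates the automatic feature extraction described above; the window length, channel counts, and layer sizes are assumptions, not the study's architecture.

```python
import torch
import torch.nn as nn

class ActivityCNN(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        # two conv blocks learn features directly from the raw signal
        self.features = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * 32, n_classes)  # 128 -> 64 -> 32 samples

    def forward(self, x):                # x: (batch, 3 axes, 128 samples)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = ActivityCNN()
logits = model(torch.randn(8, 3, 128))   # batch of 8 accelerometer windows
print(logits.shape)                       # torch.Size([8, 3])
```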
Zhekova, Hristina R; Seth, Michael; Ziegler, Tom
2011-11-14
We have recently developed a methodology for the calculation of exchange coupling constants J in weakly interacting polynuclear metal clusters. The method is based on unrestricted and restricted second order spin-flip constricted variational density functional theory (SF-CV(2)-DFT) and is here applied to eight binuclear copper systems. Comparison of the SF-CV(2)-DFT results with experiment and with results obtained from other DFT and wave function based methods has been made. Restricted SF-CV(2)-DFT with the BH&HLYP functional yields consistently J values in excellent agreement with experiment. The results acquired from this scheme are comparable in quality to those obtained by accurate multi-reference wave function methodologies such as difference dedicated configuration interaction and the complete active space with second-order perturbation theory. © 2011 American Institute of Physics
Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.
Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip
2016-07-15
This work presents a complete methodology for distinguishing between different brands of cider and degrees of ageing, based on voltammetric signals, utilizing dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on a glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. By applying clustering algorithms and principal component analysis, clearly visible, homogeneous clusters were obtained. An advanced signal processing strategy, which included automatic baseline correction, interval scaling and continuous wavelet transform with a dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.
Prediction of Solvent Physical Properties using the Hierarchical Clustering Method
Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...
ERIC Educational Resources Information Center
Melson, Elniee; Bridle, Christopher; Markham, Wolfgang
2017-01-01
Purpose: The purpose of this paper is to report the process evaluation of a pilot randomised control trial of an anti-smoking intervention for Malaysian 13-14-year olds, conducted in 2011/2012. It was hypothesised that trained peer supporters would promote non-smoking among classmates through informal conversations. Design/methodology/approach:…
NASA Astrophysics Data System (ADS)
Bandaru, Sunith; Deb, Kalyanmoy
2011-09-01
In this article, a methodology is proposed for automatically extracting innovative design principles which make a system or process (subject to conflicting objectives) optimal using its Pareto-optimal dataset. Such 'higher knowledge' would not only help designers to execute the system better, but also enable them to predict how changes in one variable would affect other variables if the system has to retain its optimal behaviour. This in turn would help solve other similar systems with different parameter settings easily without the need to perform a fresh optimization task. The proposed methodology uses a clustering-based optimization technique and is capable of discovering hidden functional relationships between the variables, objective and constraint functions and any other function that the designer wishes to include as a 'basis function'. A number of engineering design problems are considered for which the mathematical structure of these explicit relationships exists and has been revealed by a previous study. A comparison with the multivariate adaptive regression splines (MARS) approach reveals the practicality of the proposed approach due to its ability to find meaningful design principles. The success of this procedure for automated innovization is highly encouraging and indicates its suitability for further development in tackling more complex design scenarios.
Schultz, Elise V; Schultz, Christopher J; Carey, Lawrence D; Cecil, Daniel J; Bateman, Monte
2016-01-01
This study develops a fully automated lightning jump system encompassing objective storm tracking, Geostationary Lightning Mapper proxy data, and the lightning jump algorithm (LJA), which are important elements in the transition of the LJA concept from a research to an operational based algorithm. Storm cluster tracking is based on a product created from the combination of a radar parameter (vertically integrated liquid, VIL), and lightning information (flash rate density). Evaluations showed that the spatial scale of tracked features or storm clusters had a large impact on the lightning jump system performance, where increasing spatial scale size resulted in decreased dynamic range of the system's performance. This framework will also serve as a means to refine the LJA itself to enhance its operational applicability. Parameters within the system are isolated and the system's performance is evaluated with adjustments to parameter sensitivity. The system's performance is evaluated using the probability of detection (POD) and false alarm ratio (FAR) statistics. Of the algorithm parameters tested, sigma-level (metric of lightning jump strength) and flash rate threshold influenced the system's performance the most. Finally, verification methodologies are investigated. It is discovered that minor changes in verification methodology can dramatically impact the evaluation of the lightning jump system.
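A hedged sketch of the sigma-level test at the core of the lightning jump concept: flag a jump when the flash-rate tendency exceeds a multiple (the sigma-level) of the standard deviation of its recent history. The window lengths and the 2-sigma default follow the general "2-sigma" LJA idea; the operational configuration evaluated above is not reproduced here.

```python
import numpy as np

def lightning_jumps(flash_rate, dt_min=2.0, history=5, sigma_thresh=2.0):
    """flash_rate: flashes/min sampled every dt_min minutes; returns jump indices."""
    dfrdt = np.diff(flash_rate) / dt_min          # flash-rate tendency
    jumps = []
    for i in range(history, len(dfrdt)):
        sigma = np.std(dfrdt[i - history:i])      # variability of recent history
        if sigma > 0 and dfrdt[i] >= sigma_thresh * sigma:
            jumps.append(i + 1)                   # index into flash_rate
    return jumps

rate = np.array([10, 12, 11, 12, 10, 11, 12, 30, 32, 33], float)
print(lightning_jumps(rate))   # flags the sharp flash-rate increase
```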
Text Mining of Journal Articles for Sleep Disorder Terminologies.
Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey
2016-01-01
Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.
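The precision, recall, and false positive rate quoted for MetaMap follow the usual confusion-matrix definitions; a minimal sketch, with illustrative counts chosen only so the outputs land near the reported values:

```python
# Confusion-matrix metrics for term extraction; counts are illustrative.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    return fp / (fp + tn)

tp, fp, fn, tn = 700, 300, 209, 2305  # hypothetical extraction counts
print(round(precision(tp, fp), 2),                 # 0.70
      round(recall(tp, fn), 2),                    # 0.77
      f"{100 * false_positive_rate(fp, tn):.2f}%")  # ~11.52%
```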
Influence of birth cohort on age of onset cluster analysis in bipolar I disorder.
Bauer, M; Glenn, T; Alda, M; Andreassen, O A; Angelopoulos, E; Ardau, R; Baethge, C; Bauer, R; Bellivier, F; Belmaker, R H; Berk, M; Bjella, T D; Bossini, L; Bersudsky, Y; Cheung, E Y W; Conell, J; Del Zompo, M; Dodd, S; Etain, B; Fagiolini, A; Frye, M A; Fountoulakis, K N; Garneau-Fournier, J; Gonzalez-Pinto, A; Harima, H; Hassel, S; Henry, C; Iacovides, A; Isometsä, E T; Kapczinski, F; Kliwicki, S; König, B; Krogh, R; Kunz, M; Lafer, B; Larsen, E R; Lewitzka, U; Lopez-Jaramillo, C; MacQueen, G; Manchia, M; Marsh, W; Martinez-Cengotitabengoa, M; Melle, I; Monteith, S; Morken, G; Munoz, R; Nery, F G; O'Donovan, C; Osher, Y; Pfennig, A; Quiroz, D; Ramesar, R; Rasgon, N; Reif, A; Ritter, P; Rybakowski, J K; Sagduyu, K; Scippa, A M; Severus, E; Simhandl, C; Stein, D J; Strejilevich, S; Hatim Sulaiman, A; Suominen, K; Tagata, H; Tatebayashi, Y; Torrent, C; Vieta, E; Viswanath, B; Wanchoo, M J; Zetin, M; Whybrow, P C
2015-01-01
Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset, using a large, international database. The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After adjusting for the birth cohort or when considering only those born after 1959, two subgroups were found. With results of either two or three subgroups, the youngest subgroup was more likely to have a family history of mood disorders and a first episode with depressed polarity. However, without adjusting for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more useful for research. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
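A minimal sketch of the adjust-then-cluster idea described above, assuming a plain linear regression stands in for the GEE adjustment and synthetic onset data replace the patient database:

```python
# Adjust age of onset for a covariate, then run model-based clustering
# (a Gaussian mixture) on the residuals; data and the linear adjustment
# are stand-ins for the study's GEE step.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
birth_year = rng.integers(1930, 1990, size=500).astype(float)
onset = 45 - 0.2 * (birth_year - 1930) + rng.normal(0, 6, size=500)

X = birth_year.reshape(-1, 1)
residuals = (onset - LinearRegression().fit(X, onset).predict(X)).reshape(-1, 1)

# Choose the number of onset subgroups by BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(residuals).bic(residuals)
        for k in range(1, 5)}
print("components selected by BIC:", min(bics, key=bics.get))
```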
Bayesian network meta-analysis for cluster randomized trials with binary outcomes.
Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard
2017-06-01
Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combining the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd.
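One common, simple way to downweight cluster randomized trials before meta-analysis, not necessarily among the model-based approaches this tutorial examines, is to divide each trial's sample size by its design effect; a hedged sketch:

```python
# Effective sample size of a CRT after dividing by the design effect
# DE = 1 + (m - 1) * ICC; numbers are illustrative.
def effective_sample_size(n_patients: float, mean_cluster_size: float, icc: float) -> float:
    design_effect = 1.0 + (mean_cluster_size - 1.0) * icc
    return n_patients / design_effect

# 1200 patients in clusters of 30 with ICC 0.05 -> about 490 "effective" patients.
print(round(effective_sample_size(1200, 30, 0.05), 1))
```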
2013-01-01
Background A general trend towards positive patient-reported evaluations of hospitals could be taken as a sign that most patients form a homogeneous, reasonably pleased group, and consequently that there is little need for quality improvement. The objective of this study was to explore this assumption by identifying and statistically validating clusters of patients based on their evaluation of outcomes related to overall satisfaction, malpractice and benefit of treatment. Methods Data were collected using a national patient-experience survey of 61 hospitals in the 4 health regions in Norway during spring 2011. Postal questionnaires were mailed to 23,420 patients after their discharge from hospital. Cluster analysis was performed to identify response clusters of patients, based on their responses to single items about overall patient satisfaction, benefit of treatment and perception of malpractice. Results Cluster analysis identified six response groups, including one cluster with systematically poorer evaluation across outcomes (18.5% of patients) and one small outlier group (5.3%) with very poor scores across all outcomes. One-way ANOVA with post-hoc tests showed that most differences between the six response groups on the three outcome items were significant. The response groups were significantly associated with nine patient-experience indicators (p < 0.001), and all groups were significantly different from each of the other groups on a majority of the patient-experience indicators. Clusters were significantly associated with age, education, self-perceived health, gender, and the tendency to write open comments in the questionnaire. Conclusions The study identified five response clusters with distinct patient-reported outcome scores, in addition to a heterogeneous outlier group with very poor scores across all outcomes. The outlier group and the cluster with systematically poorer evaluation across outcomes comprised almost one-quarter of all patients, clearly demonstrating the need to tailor quality initiatives and improve patient-perceived quality in hospitals. More research on patient clustering in patient evaluation is needed, as well as standardization of methodology to increase comparability across studies. PMID:23433450
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat faster tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in terms of both the number of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
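To make the E- and M-steps that AECM/APECM reorganize concrete, here is a bare-bones EM loop for a two-component Gaussian mixture; this is didactic only and implements neither the autoregressive mixture nor the APECM storage step:

```python
# Didactic EM for a two-component Gaussian mixture (not the authors'
# autoregressive mixture, and without the APECM storage scheme).
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 200)])

w = np.array([0.5, 0.5])    # mixing weights
mu = np.array([-1.0, 1.0])  # component means
sd = np.array([1.0, 1.0])   # component standard deviations
for _ in range(100):
    # E-step: responsibility of each component for each point.
    pdf = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    resp = (pdf * w) / (pdf * w).sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and standard deviations.
    nk = resp.sum(axis=0)
    w = nk / x.size
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(np.round(mu, 2), np.round(sd, 2), np.round(w, 2))
```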
A Clustering Methodology of Web Log Data for Learning Management Systems
ERIC Educational Resources Information Center
Valsamidis, Stavros; Kontogiannis, Sotirios; Kazanidis, Ioannis; Theodosiou, Theodosios; Karakos, Alexandros
2012-01-01
Learning Management Systems (LMS) collect large amounts of data. Data mining techniques can be applied to analyse their web data log files. The instructors may use this data for assessing and measuring their courses. In this respect, we have proposed a methodology for analysing LMS courses and students' activity. This methodology uses a Markov…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Das, Sonjoy; Goswami, Kundan; Datta, Biswa N.
2014-12-10
Failure of structural systems under dynamic loading can be prevented via active vibration control, which shifts the damped natural frequencies of the systems away from the dominant range of the loading spectrum. The damped natural frequencies and the dynamic load typically show significant variations in practice. A computationally efficient methodology based on the quadratic partial eigenvalue assignment technique and optimization under uncertainty has been formulated in the present work that will rigorously account for these variations and result in an economic and resilient design of structures. A novel scheme based on hierarchical clustering and importance sampling is also developed in this work for accurate and efficient estimation of the probability of failure to guarantee the desired resilience level of the designed system. Numerical examples are presented to illustrate the proposed methodology.
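A hedged sketch of the importance-sampling ingredient: estimating a small failure probability P(g(X) < 0) by sampling from a proposal shifted toward the failure region and reweighting. The limit-state function and distributions are invented for illustration:

```python
# Importance sampling for a rare failure probability P(g(X) < 0); the
# limit-state g and all distributions are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g = lambda x: 3.5 - x                      # "failure" when x > 3.5

proposal = stats.norm(loc=3.5, scale=1.0)  # shifted toward the failure region
x = proposal.rvs(size=100_000, random_state=rng)
weights = stats.norm(0, 1).pdf(x) / proposal.pdf(x)
p_fail = np.mean((g(x) < 0) * weights)

print(f"IS estimate: {p_fail:.2e}  exact: {stats.norm.sf(3.5):.2e}")
```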
Galway, Lp; Bell, Nathaniel; Sae, Al Shatari; Hagopian, Amy; Burnham, Gilbert; Flaxman, Abraham; Weiss, Wiliam M; Rajaratnam, Julie; Takaro, Tim K
2012-04-27
Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing the challenge of estimating mortality using retrospective population-based surveys. We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage and Google Earth™ imagery and sampling grids to select households in the second sampling stage. The sampling method was implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Sampling is a challenge in retrospective population-based mortality studies, and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context-specific challenges of the study setting. This sampling strategy, or variations on it, is adaptable and should be considered and tested in other conflict settings.
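A sketch of the first sampling stage under stated assumptions: synthetic gridded population counts are sampled with probability proportional to size (PPS). Note that numpy's weighted choice without replacement only approximates a formal PPS-without-replacement design:

```python
# Stage 1 sketch: sample grid cells with probability proportional to
# (synthetic) population counts.
import numpy as np

rng = np.random.default_rng(2024)
cell_population = rng.integers(50, 5000, size=1000)  # hypothetical grid counts

probs = cell_population / cell_population.sum()
selected_cells = rng.choice(cell_population.size, size=100, replace=False, p=probs)
print(sorted(selected_cells)[:10])
```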
Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai
2016-09-07
Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications.
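A minimal sketch of the community-detection step using a modularity-based algorithm from networkx; the toy interaction edges are hypothetical and not drawn from DrugBank:

```python
# Modularity-based community detection on a toy drug-interaction graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_edges_from([
    ("warfarin", "aspirin"), ("warfarin", "ibuprofen"), ("aspirin", "ibuprofen"),
    ("fluoxetine", "tramadol"), ("fluoxetine", "sertraline"), ("tramadol", "sertraline"),
    ("aspirin", "sertraline"),  # weak bridge between the two communities
])

for i, community in enumerate(greedy_modularity_communities(G)):
    print(f"cluster {i}: {sorted(community)}")
```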
Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E
2013-01-01
Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.
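The recommendation to compare approaches can be made concrete by scoring candidate cluster numbers, for example with silhouette widths; a sketch on random compositional data (real analyses would use community distances such as Jensen-Shannon divergence rather than the Euclidean default):

```python
# Score candidate numbers of clusters on compositional abundance data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
abundances = rng.dirichlet(alpha=np.ones(10), size=200)  # 200 samples, 10 genera

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(abundances)
    print(k, round(silhouette_score(abundances, labels), 3))
# Uniformly weak scores across k point to gradients rather than discrete clusters.
```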
Baars, Erik W; van der Hart, Onno; Nijenhuis, Ellert R S; Chu, James A; Glas, Gerrit; Draijer, Nel
2011-01-01
The purpose of this study was to develop an expertise-based prognostic model for the treatment of complex posttraumatic stress disorder (PTSD) and dissociative identity disorder (DID). We developed a survey in 2 rounds: In the first round we surveyed 42 experienced therapists (22 DID and 20 complex PTSD therapists), and in the second round we surveyed a subset of 22 of the 42 therapists (13 DID and 9 complex PTSD therapists). First, we drew on therapists' knowledge of prognostic factors for stabilization-oriented treatment of complex PTSD and DID. Second, therapists prioritized a list of prognostic factors by estimating the size of each variable's prognostic effect; we clustered these factors according to content and named the clusters. Next, concept mapping methodology and statistical analyses (including principal components analyses) were used to transform individual judgments into weighted group judgments for clusters of items. A prognostic model, based on consensually determined estimates of effect sizes, of 8 clusters containing 51 factors for both complex PTSD and DID was formed. It includes the clusters lack of motivation, lack of healthy relationships, lack of healthy therapeutic relationships, lack of other internal and external resources, serious Axis I comorbidity, serious Axis II comorbidity, poor attachment, and self-destruction. In addition, a set of 5 DID-specific items was constructed. The model is supportive of the current phase-oriented treatment model, emphasizing the strengthening of the therapeutic relationship and the patient's resources in the initial stabilization phase. Further research is needed to test the model's statistical and clinical validity.
Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies
Previously a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity for large, noncongeneric data sets. In this study these approaches applie...
A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.
The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph, job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis is discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
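A sketch of the edge-synthesis idea under stated assumptions: connect jobs that share compute nodes and weight edges by the overlap size. Job and node names are invented, not taken from a real scheduler log:

```python
# Synthesize job-job edges weighted by the number of shared compute nodes.
import networkx as nx

jobs = {
    "job1": {"n01", "n02", "n03"},
    "job2": {"n02", "n03"},
    "job3": {"n07"},
    "job4": {"n03", "n07"},
}

G = nx.Graph()
names = list(jobs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        shared = jobs[a] & jobs[b]
        if shared:
            G.add_edge(a, b, weight=len(shared))

print(list(G.edges(data=True)))
```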
2014-01-01
Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
Tsipouras, Markos G; Giannakeas, Nikolaos; Tzallas, Alexandros T; Tsianou, Zoe E; Manousou, Pinelopi; Hall, Andrew; Tsoulos, Ioannis; Tsianos, Epameinondas
2017-03-01
Collagen proportional area (CPA) extraction in liver biopsy images provides the degree of fibrosis expansion in liver tissue, which is the most characteristic histological alteration in hepatitis C virus (HCV). Assessment of the fibrotic tissue is currently based on semiquantitative staging scores such as Ishak and Metavir. Since its introduction as a fibrotic tissue assessment technique, CPA calculation based on image analysis techniques has proven to be more accurate than semiquantitative scores. However, CPA has yet to reach everyday clinical practice, since the lack of standardized and robust methods for computerized image analysis for CPA assessment have proven to be a major limitation. The current work introduces a three-stage fully automated methodology for CPA extraction based on machine learning techniques. Specifically, clustering algorithms have been employed for background-tissue separation, as well as for fibrosis detection in liver tissue regions, in the first and the third stage of the methodology, respectively. Due to the existence of several types of tissue regions in the image (such as blood clots, muscle tissue, structural collagen, etc.), classification algorithms have been employed to identify liver tissue regions and exclude all other non-liver tissue regions from CPA computation. For the evaluation of the methodology, 79 liver biopsy images have been employed, obtaining 1.31% mean absolute CPA error, with 0.923 concordance correlation coefficient. The proposed methodology is designed to (i) avoid manual threshold-based and region selection processes, widely used in similar approaches presented in the literature, and (ii) minimize CPA calculation time. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
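A sketch of the first stage (background-tissue separation) as described: cluster pixel colors with K-means and treat the brighter centroid as background. The random image is a stand-in for a biopsy slide:

```python
# Stage 1 sketch: separate background from tissue by clustering pixel colors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(64, 64, 3)).astype(float)  # placeholder image

pixels = image.reshape(-1, 3)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
background = np.argmax(km.cluster_centers_.sum(axis=1))  # brighter centroid
tissue_mask = (km.labels_ != background).reshape(64, 64)
print("tissue pixels:", int(tissue_mask.sum()))
```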
Analysis of Tropical Cyclone Tracks in the North Indian Ocean
NASA Astrophysics Data System (ADS)
Patwardhan, A.; Paliwal, M.; Mohapatra, M.
2011-12-01
Cyclones are regarded as one of the most dangerous meteorological phenomena of the tropical region. The probability of landfall of a tropical cyclone depends on its movement (trajectory). Analysis of trajectories of tropical cyclones could be useful for identifying potentially predictable characteristics. There is a long history of analysis of tropical cyclone tracks. A common approach uses different clustering techniques to group the cyclone tracks on the basis of certain characteristics. Various clustering methods have been used to study tropical cyclones in different ocean basins such as the western North Pacific Ocean (Elsner and Liu, 2003; Camargo et al., 2007) and the North Atlantic Ocean (Elsner, 2003; Gaffney et al. 2007; Nakamura et al., 2009). In this study, tropical cyclone tracks in the North Indian Ocean basin for the period 1961-2010 have been analyzed and grouped into clusters based on their spatial characteristics. A tropical cyclone trajectory is approximated as an open curve and described by its first two moments. The resulting clusters have different centroid locations and also differently shaped variance ellipses. These track characteristics are then used in standard clustering algorithms, which allow the whole track shape, length, and location to be incorporated into the clustering methodology. The resulting clusters have different genesis locations and trajectory shapes. We have also examined characteristics such as life span, maximum sustained wind speed, landfall, and seasonality, many of which are significantly different across the identified clusters. The clustering approach groups cyclones with the highest maximum wind speeds and longest life spans into one cluster. Another cluster includes short-duration cyclonic events that are mostly deep depressions and significant for rainfall over Eastern and Central India. The clustering approach is likely to prove useful for analysis of events of significance with regard to impacts.
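A hedged sketch of the track-descriptor idea: reduce each track to its first two spatial moments (centroid plus covariance terms) and feed those descriptors to a standard clustering algorithm. Tracks below are synthetic:

```python
# Describe each synthetic track by centroid and covariance terms, then cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)

def track_moments(track):
    """Map an (n_points, 2) lon/lat track to [mean_x, mean_y, var_x, var_y, cov_xy]."""
    mean = track.mean(axis=0)
    cov = np.cov(track.T)
    return np.array([mean[0], mean[1], cov[0, 0], cov[1, 1], cov[0, 1]])

tracks = [rng.normal(loc=rng.uniform(60, 95, size=2), scale=rng.uniform(1, 4),
                     size=(20, 2)) for _ in range(50)]
features = np.array([track_moments(t) for t in tracks])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))
```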
Stevens, David Cole; Conway, Kyle R.; Pearce, Nelson; Villegas-Peñaranda, Luis Roberto; Garza, Anthony G.; Boddy, Christopher N.
2013-01-01
Background Heterologous expression of bacterial biosynthetic gene clusters is currently an indispensable tool for characterizing biosynthetic pathways. Development of an effective, general heterologous expression system that can be applied to bioprospecting from metagenomic DNA will enable the discovery of a wealth of new natural products. Methodology We have developed a new Escherichia coli-based heterologous expression system for polyketide biosynthetic gene clusters. We have demonstrated that the over-expression of the alternative sigma factor σ54 directly and positively regulates heterologous expression of the oxytetracycline biosynthetic gene cluster in E. coli. Bioinformatics analysis indicates that σ54 promoters are present in nearly 70% of polyketide and non-ribosomal peptide biosynthetic pathways. Conclusions We have demonstrated a new mechanism for heterologous expression of the oxytetracycline polyketide biosynthetic pathway, where high-level pleiotropic sigma factors from the heterologous host directly and positively regulate transcription of the non-native biosynthetic gene cluster. Our bioinformatics analysis is consistent with the hypothesis that heterologous expression mediated by the alternative sigma factor σ54 may be a viable method for the production of additional polyketide products. PMID:23724102
Bouklata, Nada; Supply, Philip; Jaouhari, Sanae; Charof, Reda; Seghrouchni, Fouad; Sadki, Khalid; El Achhab, Youness; Nejjari, Chakib; Filali-Maltouf, Abdelkarim
2015-01-01
Background Standard 24-locus Mycobacterial Interspersed Repetitive Unit Variable Number Tandem Repeat (MIRU-VNTR) typing allows improved resolution power for tracing TB transmission and predicting different strain (sub)lineages in a community. Methodology During 2010–2012, a total of 168 Mycobacterium tuberculosis Complex (MTBC) isolates were collected by cluster sampling from 10 different Moroccan cities and centralized by the National Reference Laboratory of Tuberculosis over the study period. All isolates were genotyped using spoligotyping, and a subset of 75 was genotyped using 24-locus MIRU-VNTR typing, followed by first-line drug susceptibility testing. Corresponding strain lineages were predicted using the MIRU-VNTRplus database. Principal Findings Spoligotyping resulted in 137 isolates in 18 clusters (2–50 isolates per cluster; clustering rate of 81.54%) corresponding to a SIT number in the SITVIT database, while 31 (18.45%) patterns were unique, of which 10 were labelled as “unknown” according to the same database. The most prevalent spoligotype family was LAM (n = 81, or 48.24% of isolates, dominated by SIT42, n = 49), followed by Haarlem (23.80%), the T superfamily (15.47%), Beijing (2.97%), the U clade (2.38%) and the S clade (1.19%). Subsequent 24-locus MIRU-VNTR typing identified 64 unique types and 11 isolates in 5 clusters (2 to 3 isolates per cluster), substantially reducing the clusters defined by spoligotyping only. The single cluster of three isolates corresponded to two previously treated MDR-TB cases and one new MDR-TB case, known to be contacts of the same index case and belonging to the same family, albeit residing in 3 different administrative regions. MIRU-VNTR loci 4052, 802, 2996, 2163b, 3690, 1955, 424, 2531, 2401 and 960 were highly discriminative in our setting (HGDI > 0.6). Conclusions 24-locus MIRU-VNTR typing can substantially improve the resolution of large clusters initially defined by spoligotyping alone and predominating in Morocco, and could therefore be used to better study tuberculosis transmission in a population-based, multi-year sample context. PMID:26285026
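The HGDI cutoff quoted above is the Hunter-Gaston discriminatory index, the probability that two isolates drawn at random carry different types; a sketch with made-up allele counts:

```python
# Hunter-Gaston discriminatory index for one locus; allele counts are made up.
from collections import Counter

def hgdi(types):
    """HGDI = 1 - sum over types of n_j(n_j - 1) / (N(N - 1))."""
    counts = Counter(types)
    n = len(types)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

alleles = ["2"] * 30 + ["3"] * 25 + ["4"] * 10 + ["5"] * 10  # hypothetical locus
print(round(hgdi(alleles), 3))  # > 0.6 would count as highly discriminative here
```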
Martin, James; Taljaard, Monica; Girling, Alan; Hemming, Karla
2016-01-01
Background Stepped-wedge cluster randomised trials (SW-CRT) are increasingly being used in health policy and services research, but unless they are conducted and reported to the highest methodological standards, they are unlikely to be useful to decision-makers. Sample size calculations for these designs require allowance for clustering, time effects and repeated measures. Methods We carried out a methodological review of SW-CRTs up to October 2014. We assessed adherence to reporting each of the 9 sample size calculation items recommended in the 2012 extension of the CONSORT statement to cluster trials. Results We identified 32 completed trials and 28 independent protocols published between 1987 and 2014. Of these, 45 (75%) reported a sample size calculation, with a median of 5.0 (IQR 2.5–6.0) of the 9 CONSORT items reported. Of those that reported a sample size calculation, the majority, 33 (73%), allowed for clustering, but just 15 (33%) allowed for time effects. There was a small increase in the proportions reporting a sample size calculation (from 64% before to 84% after publication of the CONSORT extension, p=0.07). The type of design (cohort or cross-sectional) was not reported clearly in the majority of studies, but cohort designs seemed to be most prevalent. Sample size calculations in cohort designs were particularly poor with only 3 out of 24 (13%) of these studies allowing for repeated measures. Discussion The quality of reporting of sample size items in stepped-wedge trials is suboptimal. There is an urgent need for dissemination of the appropriate guidelines for reporting and methodological development to match the proliferation of the use of this design in practice. Time effects and repeated measures should be considered in all SW-CRT power calculations, and there should be clarity in reporting trials as cohort or cross-sectional designs. PMID:26846897
A novel tree-based algorithm to discover seismic patterns in earthquake catalogs
NASA Astrophysics Data System (ADS)
Florido, E.; Asencio-Cortés, G.; Aznarte, J. L.; Rubio-Escudero, C.; Martínez-Álvarez, F.
2018-06-01
A novel methodology is introduced in this research study to detect seismic precursors. Based on an existing approach, the new methodology searches for patterns in the historical data. Such patterns may contain statistical or soil dynamics information. It improves the original version in several aspects. First, new seismicity indicators have been used to characterize earthquakes. Second, a machine learning clustering algorithm has been applied in a very flexible way, thus allowing the discovery of new data groupings. Third, a novel search strategy is proposed in order to obtain non-overlapped patterns. And, fourth, arbitrary lengths of patterns are searched for, thus discovering long- and short-term behaviors that may influence the occurrence of medium-large earthquakes. The methodology has been applied to seven different datasets from three different regions, namely the Iberian Peninsula, Chile and Japan. Reported results show a remarkable improvement with respect to the former version in terms of all evaluated quality measures. In particular, the number of false positives has decreased and the positive predictive values increased, both in a very remarkable manner.
Computer aided detection of clusters of microcalcifications on full field digital mammograms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.
2006-08-15
We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolution neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of a truncated error metric, the training of the CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. By using the free-response receiver operating characteristic methodology, it was found that our CAD system can achieve a cluster-based sensitivity of 70%, 80%, and 90% at 0.21, 0.61, and 1.49 FPs/image, respectively. For case-based performance evaluation, a sensitivity of 70%, 80%, and 90% can be achieved at 0.07, 0.17, and 0.65 FPs/image, respectively. We also used a data set of 216 mammograms negative for clustered microcalcifications to further estimate the FP rate of our CAD system. The corresponding FP rates were 0.15, 0.31, and 0.86 FPs/image for cluster-based detection when negative mammograms were used for estimation of FP rates.
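A sketch of the final FP-reduction step as described, an LDA classifier separating true clusters from false positives, with random stand-in features rather than the morphological/CNN features used in the study:

```python
# LDA separating true microcalcification clusters from false positives;
# the five features per candidate are random stand-ins.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(1.0, 1.0, size=(100, 5)),    # true clusters
               rng.normal(-1.0, 1.0, size=(300, 5))])  # false positives
y = np.array([1] * 100 + [0] * 300)

lda = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", round(lda.score(X, y), 2))
```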
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
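A sketch of the sliding-window setup: carve an alignment of roughly HIV-1 genome length into the stated numbers of equally spaced fixed-width windows. The step-size logic is an assumption about how the windows were spaced:

```python
# Equally spaced fixed-width windows over an alignment of ~HIV-1 genome length.
def sliding_windows(alignment_length, width, n_windows):
    """Yield (start, end) for n_windows equally spaced windows of a given width."""
    step = (alignment_length - width) / (n_windows - 1)
    for i in range(n_windows):
        start = round(i * step)
        yield start, start + width

genome_len = 9800  # approximate HIV-1 genome length
print(len(list(sliding_windows(genome_len, 1000, 99))))  # 99 windows of 1,000 bp
print(len(list(sliding_windows(genome_len, 2000, 45))))  # 45 windows of 2,000 bp
```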
Superresolution Imaging of Aquaporin-4 Cluster Size in Antibody-Stained Paraffin Brain Sections
Smith, Alex J.; Verkman, Alan S.
2015-01-01
The water channel aquaporin-4 (AQP4) forms supramolecular clusters whose size is determined by the ratio of M1- and M23-AQP4 isoforms. In cultured astrocytes, differences in the subcellular localization and macromolecular interactions of small and large AQP4 clusters results in distinct physiological roles for M1- and M23-AQP4. Here, we developed quantitative superresolution optical imaging methodology to measure AQP4 cluster size in antibody-stained paraffin sections of mouse cerebral cortex and spinal cord, human postmortem brain, and glioma biopsy specimens. This methodology was used to demonstrate that large AQP4 clusters are formed in AQP4−/− astrocytes transfected with only M23-AQP4, but not in those expressing only M1-AQP4, both in vitro and in vivo. Native AQP4 in mouse cortex, where both isoforms are expressed, was enriched in astrocyte foot-processes adjacent to microcapillaries; clusters in perivascular regions of the cortex were larger than in parenchymal regions, demonstrating size-dependent subcellular segregation of AQP4 clusters. Two-color superresolution imaging demonstrated colocalization of Kir4.1 with AQP4 clusters in perivascular areas but not in parenchyma. Surprisingly, the subcellular distribution of AQP4 clusters was different between gray and white matter astrocytes in spinal cord, demonstrating regional specificity in cluster polarization. Changes in AQP4 subcellular distribution are associated with several neurological diseases and we demonstrate that AQP4 clustering was preserved in a postmortem human cortical brain tissue specimen, but that AQP4 was not substantially clustered in a human glioblastoma specimen despite high-level expression. Our results demonstrate the utility of superresolution optical imaging for measuring the size of AQP4 supramolecular clusters in paraffin sections of brain tissue and support AQP4 cluster size as a primary determinant of its subcellular distribution. PMID:26682810
Leech, Rebecca M; McNaughton, Sarah A; Timperio, Anna
2014-01-22
Diet, physical activity (PA) and sedentary behavior are important, yet modifiable, determinants of obesity. Recent research into the clustering of these behaviors suggests that children and adolescents have multiple obesogenic risk factors. This paper reviews studies using empirical, data-driven methodologies, such as cluster analysis (CA) and latent class analysis (LCA), to identify clustering patterns of diet, PA and sedentary behavior among children or adolescents and their associations with socio-demographic indicators, and overweight and obesity. A literature search of electronic databases was undertaken to identify studies which have used data-driven methodologies to investigate the clustering of diet, PA and sedentary behavior among children and adolescents aged 5-18 years old. Eighteen studies (62% of potential studies) were identified that met the inclusion criteria, of which eight examined the clustering of PA and sedentary behavior and eight examined diet, PA and sedentary behavior. Studies were mostly cross-sectional and conducted in older children and adolescents (≥ 9 years). Findings from the review suggest that obesogenic cluster patterns are complex with a mixed PA/sedentary behavior cluster observed most frequently, but healthy and unhealthy patterning of all three behaviors was also reported. Cluster membership was found to differ according to age, gender and socio-economic status (SES). The tendency for older children/adolescents, particularly females, to comprise clusters defined by low PA was the most robust finding. Findings to support an association between obesogenic cluster patterns and overweight and obesity were inconclusive, with longitudinal research in this area limited. Diet, PA and sedentary behavior cluster together in complex ways that are not well understood. Further research, particularly in younger children, is needed to understand how cluster membership differs according to socio-demographic profile. Longitudinal research is also essential to establish how different cluster patterns track over time and their influence on the development of overweight and obesity.
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
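To make the contrast concrete, the sketch below performs one winner-take-all (hard c-means) assignment and one fuzzy c-means membership update on the same toy data; it is a single didactic iteration, not a full HCM/FCM, ISODATA, or SOM implementation:

```python
# One hard assignment (HCM) and one fuzzy membership update (FCM) side by side.
import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers = X[rng.choice(X.shape[0], size=2, replace=False)]

d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (100, 2)

hard = d.argmin(axis=1)  # HCM: winner-take-all labels

m = 2.0                  # FCM fuzzifier
u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
print(hard[:5], np.round(u[:5], 2))  # crisp labels vs. soft memberships
```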
Dynamic clustering detection through multi-valued descriptors of dermoscopic images.
Cozza, Valentina; Guarracino, Maria Rosario; Maddalena, Lucia; Baroni, Adone
2011-09-10
This paper introduces a dynamic clustering methodology based on multi-valued descriptors of dermoscopic images. The main idea is to support medical diagnosis in deciding whether pigmented skin lesions belonging to an uncertain set are nearer to malignant melanoma or to benign nevi. Melanoma is the most deadly skin cancer, and early diagnosis is a current challenge for clinicians. Most data analysis algorithms for skin lesion discrimination focus on segmentation and extraction of features of categorical or numerical type. As an alternative approach, this paper introduces two new concepts: first, it considers multi-valued data, described not only by scalar variables but also by interval or histogram variables; second, it introduces a dynamic clustering method based on the Wasserstein distance to compare multi-valued data. The overall strategy of analysis can be summarized in the following steps: first, a segmentation of dermoscopic images allows identification of a set of multi-valued descriptors; second, we performed a discriminant analysis on a set of images with an a priori classification, so that it is possible to detect which features discriminate the benign and malignant lesions; and third, we performed the proposed dynamic clustering method on the uncertain cases, which need to be associated with one of the two previously mentioned groups. Results based on clinical data show that the grading of specific descriptors associated with dermoscopic characteristics provides a novel way to characterize uncertain lesions that can help the dermatologist's diagnosis. Copyright © 2011 John Wiley & Sons, Ltd.
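For histogram-valued descriptors, the Wasserstein comparison can be illustrated with scipy's 1-D distance between two binned distributions; bin centres and weights below are fabricated:

```python
# 1-D Wasserstein distance between two histogram descriptors of two lesions.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(5)
bin_centres = np.arange(8, 256, 16, dtype=float)  # 16 intensity bins
hist_a = rng.dirichlet(np.ones(16))               # lesion A's histogram weights
hist_b = rng.dirichlet(np.ones(16))               # lesion B's histogram weights

d = wasserstein_distance(bin_centres, bin_centres, hist_a, hist_b)
print(round(float(d), 3))
```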
Mogasale, Vittal; Mogasale, Vijayalaxmi V; Ramani, Enusa; Lee, Jung Seok; Park, Ju Yeon; Lee, Kang Sung; Wierzba, Thomas F
2016-01-29
Because the control of typhoid fever is an important public health concern in low and middle income countries, improving typhoid surveillance will help in planning and implementing typhoid control activities such as deployment of new-generation Vi conjugate typhoid vaccines. We conducted a systematic literature review of longitudinal population-based blood culture-confirmed typhoid fever studies from low and middle income countries published from 1 January 1990 to 31 December 2013. We quantitatively summarized typhoid fever incidence rates and qualitatively reviewed study methodology that could have influenced rate estimates. We used a meta-analysis approach based on a random-effects model to summarize the hospitalization rates. Twenty-two papers presented longitudinal population-based and blood culture-confirmed typhoid fever incidence estimates from 20 distinct sites in low and middle income countries. The reported incidence and hospitalization rates were heterogeneous, as was the study methodology across the sites. We elucidated how the incidence rates were underestimated in published studies. We summarized six categories of under-estimation biases observed in these studies and presented potential solutions. Published longitudinal typhoid fever studies in low and middle income countries are geographically clustered and the methodology employed has a potential for underestimation. Future studies should account for these limitations.
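A hedged sketch of DerSimonian-Laird random-effects pooling, one standard way to summarize heterogeneous rates; the log-rate inputs are fabricated, not the review's data:

```python
# DerSimonian-Laird random-effects pooling of fabricated log incidence rates.
import numpy as np

y = np.array([-4.2, -3.6, -4.8, -4.0])  # hypothetical log rates
v = np.array([0.04, 0.09, 0.06, 0.05])  # within-study variances

w = 1.0 / v
y_fixed = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fixed) ** 2)
tau2 = max(0.0, (Q - (y.size - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))

w_star = 1.0 / (v + tau2)                 # random-effects weights
pooled = np.sum(w_star * y) / np.sum(w_star)
se = np.sqrt(1.0 / np.sum(w_star))
print(f"pooled log-rate: {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```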
Megacity analysis: a clustering approach to classification
2017-06-01
kinetic or non-kinetic urban operations. We develop and implement a methodology to classify megacities into groups. Using 33 variables, we construct a... is interested in these megacity networks and their implications for potential urban operations. We develop a methodology to group like megacities...
Sumner, Isaiah; Iyengar, Srinivasan S
2007-10-18
We have introduced a computational methodology to study vibrational spectroscopy in clusters inclusive of critical nuclear quantum effects. This approach is based on the recently developed quantum wavepacket ab initio molecular dynamics method that combines quantum wavepacket dynamics with ab initio molecular dynamics. The computational efficiency of the dynamical procedure is drastically improved (by several orders of magnitude) through the utilization of wavelet-based techniques combined with the previously introduced time-dependent deterministic sampling procedure to achieve stable, picosecond-length, quantum-classical dynamics of electrons and nuclei in clusters. The dynamical information is employed to construct a novel cumulative flux/velocity correlation function, where the wavepacket flux from the quantized particle is combined with classical nuclear velocities to obtain the vibrational density of states. The approach is demonstrated by computing the vibrational density of states of [Cl-H-Cl]-, inclusive of critical quantum nuclear effects, and our results are in good agreement with experiment. A general hierarchical procedure is also provided, based on electronic structure harmonic frequencies, classical ab initio molecular dynamics, computation of nuclear quantum-mechanical eigenstates, and employing quantum wavepacket ab initio dynamics to understand vibrational spectroscopy in hydrogen-bonded clusters that display large degrees of anharmonicity.
Measuring Systemic and Climate Diversity in Ontario's University Sector
ERIC Educational Resources Information Center
Piché, Pierre Gilles
2015-01-01
This article proposes a methodology for measuring institutional diversity and applies it to Ontario's university sector. This study first used hierarchical cluster analysis, which suggested there has been very little change in diversity between 1994 and 2010 as universities were clustered in three groups for both years. However, by adapting…
Spectral Target Detection using Schroedinger Eigenmaps
NASA Astrophysics Data System (ADS)
Dorado-Munoz, Leidy P.
Applications of optical remote sensing processes include environmental monitoring, military monitoring, meteorology, mapping, surveillance, etc. Many of these tasks include the detection of specific objects or materials, usually few or small, which are surrounded by other materials that clutter the scene and hide the relevant information. This target detection process has been boosted lately by the use of hyperspectral imagery (HSI), since its high spectral dimension provides more detailed spectral information that is desirable in data exploitation. Typical spectral target detectors rely on statistical or geometric models to characterize the spectral variability of the data. However, in many cases these parametric models do not fit HSI data well, which impacts the detection performance. On the other hand, non-linear transformation methods, mainly based on manifold learning algorithms, have shown potential in HSI transformation, dimensionality reduction and classification. In target detection, non-linear transformation algorithms are used as preprocessing techniques that transform the data to a more suitable lower dimensional space, where the statistical or geometric detectors are applied. One of these non-linear manifold methods is the Schroedinger Eigenmaps (SE) algorithm, which has been introduced as a technique for semi-supervised classification. The core tool of the SE algorithm is the Schroedinger operator, which includes a potential term that encodes prior information about the materials present in a scene and enables the embedding to be steered in convenient directions in order to cluster similar pixels together. A completely novel target detection methodology based on the SE algorithm is proposed for the first time in this thesis. The proposed methodology does not just include the transformation of the data to a lower dimensional space but also includes the definition of a detector that capitalizes on the theory behind SE. The fact that target pixels and similar pixels are clustered in a predictable region of the low-dimensional representation is used to define a decision rule that allows one to separate target pixels from the remaining pixels in a given image. In addition, a knowledge propagation scheme is used to combine spectral and spatial information as a means to propagate the "potential constraints" to nearby points. The propagation scheme is introduced to reinforce weak connections and improve the separability between most of the target pixels and the background. Experiments using different HSI data sets are carried out in order to test the proposed methodology. The assessment is performed from a quantitative and qualitative point of view, and by comparing the SE-based methodology against two other detection methodologies that use linear/non-linear algorithms as transformations and the well-known Adaptive Coherence/Cosine Estimator (ACE) detector. Overall results show that the SE-based detector outperforms the other two detection methodologies, which indicates the usefulness of the SE transformation in spectral target detection problems.
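A minimal sketch of the Schroedinger Eigenmaps embedding follows, under the assumptions that toy 2-D points stand in for pixel spectra and that the embedding solves (L + alpha*V)f = lambda*Df with a diagonal potential V flagging a few known target samples; the data sizes and the alpha value are illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (180, 2)),      # background material
               rng.normal(4, 0.5, (20, 2))])    # "target" material
target_idx = np.arange(180, 185)                # a few labeled target pixels

W = kneighbors_graph(X, n_neighbors=8, mode='connectivity').toarray()
W = np.maximum(W, W.T)                          # symmetrize the kNN graph
D = np.diag(W.sum(axis=1))
L = D - W                                       # graph Laplacian

V = np.zeros_like(L)
V[target_idx, target_idx] = 1.0                 # potential encodes prior knowledge
alpha = 100.0                                   # steers the embedding

vals, vecs = eigh(L + alpha * V, D)             # generalized eigenproblem
embedding = vecs[:, 1:3]
# Target pixels land in a predictable region of the embedding, which is the
# property the thesis' decision rule exploits.
print(embedding[target_idx].mean(axis=0), embedding[:180].mean(axis=0))
```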
Interactive Inverse Groundwater Modeling - Addressing User Fatigue
NASA Astrophysics Data System (ADS)
Singh, A.; Minsker, B. S.
2006-12-01
This paper builds on ongoing research on developing an interactive and multi-objective framework to solve the groundwater inverse problem. In this work we solve the classic groundwater inverse problem of estimating a spatially continuous conductivity field, given field measurements of hydraulic heads. The proposed framework is based on an interactive multi-objective genetic algorithm (IMOGA) that not only considers quantitative measures such as calibration error and degree of regularization, but also takes into account expert knowledge about the structure of the underlying conductivity field, expressed as subjective rankings of potential conductivity fields by the expert. The IMOGA converges to the optimal Pareto front representing the best trade-off among the qualitative as well as quantitative objectives. However, since the IMOGA is a population-based iterative search, it requires the user to evaluate hundreds of solutions. This leads to the problem of 'user fatigue'. We propose a two-step methodology to combat user fatigue in such interactive systems. The first step is choosing only a few highly representative solutions to be shown to the expert for ranking. Spatial clustering is used to group the search space based on the similarity of the conductivity fields. Sampling is then carried out from different clusters to improve the diversity of solutions shown to the user. Once the expert has ranked representative solutions from each cluster, a machine learning model is used to 'learn user preference' and extrapolate these preferences to the solutions not ranked by the expert. We investigate different machine learning models such as decision trees, a Bayesian learning model, and instance-based weighting to model user preference. In addition, we also investigate ways to improve the performance of these models by providing information about the spatial structure of the conductivity fields (which is what the expert bases his or her rank on). Results are shown for each of these machine learning models and the advantages and disadvantages of each approach are discussed. These results indicate that using the proposed two-step methodology leads to a significant reduction in user fatigue without deteriorating the solution quality of the IMOGA.
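The two-step fatigue-reduction idea (cluster the population, have the expert rank one representative per cluster, then learn and extrapolate the preferences) can be sketched as follows; the random "fields", the decision tree, and the expert_rank function are all hypothetical stand-ins rather than the paper's models.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
fields = rng.normal(size=(300, 50))              # candidate conductivity fields

# Step 1: cluster the population and pick one medoid-like representative each.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(fields)
reps = np.array([((fields - c) ** 2).sum(axis=1).argmin()
                 for c in km.cluster_centers_])

def expert_rank(f):
    # Hypothetical stand-in for the expert's subjective judgment.
    return float(f[:25].mean() - f[25:].mean())

ranks = np.array([expert_rank(fields[i]) for i in reps])

# Step 2: learn the expert's preference and extrapolate to unranked solutions.
model = DecisionTreeRegressor(max_depth=4).fit(fields[reps], ranks)
predicted = model.predict(fields)                # preferences for all 300 fields
print(predicted[:5].round(2))
```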
Simulating Asymmetric Top Impurities in Superfluid Clusters: A para-Water Dopant in para-Hydrogen.
Zeng, Tao; Li, Hui; Roy, Pierre-Nicholas
2013-01-03
We present the first simulation study of bosonic clusters doped with an asymmetric top molecule. The path-integral Monte Carlo method with the latest methodological advance in treating rigid-body rotation [Noya, E. G.; Vega, C.; McBride, C. J. Chem. Phys. 2011, 134, 054117] is employed to study a para-water impurity in para-hydrogen clusters with up to 20 para-hydrogen molecules. The growth pattern of the doped clusters is similar in nature to that of pure clusters. The para-water molecule appears to rotate freely in the cluster. The presence of para-water substantially quenches the superfluid response of para-hydrogen with respect to the space-fixed frame.
Unsupervised user similarity mining in GSM sensor networks.
Shad, Shafqat Ali; Chen, Enhong
2013-01-01
Mobility data has attracted researchers for the past few years because of its rich context and spatiotemporal nature, where this information can be used for potential applications like early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All the mentioned applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant places extraction, prediction of the user's actual movement, and context awareness. However, significant places extraction and prediction of the user's actual movement for mobility profile building are non-trivial tasks. In this paper, we present a user similarity mining-based methodology through user mobility profile building, using the semantic tagging information provided by the user and basic GSM network architecture properties, based on an unsupervised clustering approach. As the mobility information is in low-level raw form, our proposed methodology successfully converts it to high-level meaningful information by using cell-ID location information rather than previously used location capturing methods like GPS, infrared, and Wi-Fi for profile mining and user similarity mining.
Bayesian multivariate hierarchical transformation models for ROC analysis.
O'Malley, A James; Zou, Kelly H
2006-02-15
A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
Bayesian multivariate hierarchical transformation models for ROC analysis
O'Malley, A. James; Zou, Kelly H.
2006-01-01
SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
Coupled-cluster based R-matrix codes (CCRM): Recent developments
NASA Astrophysics Data System (ADS)
Sur, Chiranjib; Pradhan, Anil K.
2008-05-01
We report the ongoing development of the new coupled-cluster R-matrix codes (CCRM) for treating electron-ion scattering and radiative processes within the framework of the relativistic coupled-cluster method (RCC), interfaced with the standard R-matrix methodology. The RCC method is size consistent and in principle equivalent to an all-order many-body perturbation theory. The RCC method is one of the most accurate many-body theories, and has been applied to several systems. This project should enable the study of electron interactions with heavy atoms/ions, utilizing not only high-speed computing platforms but also an improved theoretical description of the relativistic and correlation effects for the target atoms/ions as treated extensively within the RCC method. Here we present a comprehensive outline of the newly developed theoretical method and a schematic representation of the new suite of CCRM codes. We begin with the flowchart and description of the various stages involved in this development. We retain the notations and nomenclature of the different stages as analogous to the standard R-matrix codes.
WAIS-III index score profiles in the Canadian standardization sample.
Lange, Rael T
2007-01-01
Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.
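A minimal sketch of this two-step procedure, assuming simulated index scores and using per-profile mean-centering as one simple way to emphasize shape over magnitude:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
scores = rng.normal(100, 15, size=(400, 4))      # VCI, POI, WMI, PSI

shape = scores - scores.mean(axis=1, keepdims=True)  # remove profile magnitude

# Step 1: Ward's hierarchical clustering provides provisional groups ...
ward_labels = fcluster(linkage(shape, method='ward'), t=6, criterion='maxclust')
seeds = np.vstack([shape[ward_labels == c].mean(axis=0) for c in range(1, 7)])

# Step 2: ... whose centroids seed the k-means refinement.
profiles = KMeans(n_clusters=6, init=seeds, n_init=1).fit(shape)
print(np.round(profiles.cluster_centers_, 1))
```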
NASA Astrophysics Data System (ADS)
Sirakov, Nikolay M.; Suh, Sang; Attardo, Salvatore
2011-06-01
This paper presents a further step of research toward the development of a quick and accurate weapons identification methodology and system. A basic stage of this methodology is the automatic acquisition and updating of a weapons ontology as a source for deriving high-level weapons information. The present paper outlines the main ideas used to approach the goal. In the next stage, a clustering approach is suggested on the basis of a hierarchy of concepts. An inherent slot of every node of the proposed ontology is a low-level feature vector (LLFV), which facilitates the search through the ontology. Part of the LLFV is the information about the object's parts. To partition an object, a new approach is presented that is capable of defining the object's concavities, which are used to mark the end points of weapon parts, considered as convexities. Further, an existing matching approach is optimized to determine whether an ontological object matches the objects from an input image. Objects from derived ontological clusters will be considered for the matching process. Image resizing is studied and applied to decrease the runtime of the matching approach and investigate its rotational and scaling invariance. A set of experiments is performed to validate the theoretical concepts.
Usongo, Valentine; Berry, Chrystal; Yousfi, Khadidja; Doualla-Bell, Florence; Labbé, Genevieve; Johnson, Roger; Fournier, Eric; Nadon, Celine; Goodridge, Lawrence; Bekal, Sadjia
2018-01-01
Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. The core genome single nucleotide variant pipeline (cgSNV) is one of several whole-genome-based sequence typing methods used for the laboratory investigation of foodborne pathogens. SNV detection using this method requires a reference genome. The purpose of this study was to investigate the impact of the choice of the reference genome on the cgSNV-informed phylogenetic clustering and inferred isolate relationships. We found that using a draft or closed genome of S. Heidelberg as the reference did not impact the ability of the cgSNV methodology to differentiate among 145 S. Heidelberg isolates involved in foodborne outbreaks. We also found that using a distantly related genome such as S. Dublin as the reference led to a loss in resolution, since some sporadic isolates were found to cluster together with outbreak isolates. In addition, the genetic distances between outbreak isolates, as well as between outbreak and sporadic isolates, were overall reduced when S. Dublin was used as the reference genome as opposed to S. Heidelberg.
Clustering of samples and variables with mixed-type data
Edelmann, Dominic; Kopp-Schneider, Annette
2017-01-01
Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix. PMID:29182671
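To make the variable-clustering idea concrete, here is a minimal sketch that turns distance correlation (one of the two proposed measures) into a dissimilarity matrix over mixed-type variables and clusters it hierarchically; the synthetic variables and the average-linkage choice are assumptions, not the CluMix implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dcor(x, y):
    """Szekely's distance correlation between two 1-D samples."""
    def centered(z):
        d = np.abs(z[:, None] - z[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(4)
n = 200
gene = rng.normal(size=n)
expr = gene + 0.3 * rng.normal(size=n)             # correlated quantitative pair
mut = (gene + 0.5 * rng.normal(size=n) > 0) * 1.0  # related binary factor
age = rng.normal(size=n)                           # unrelated clinical variable
data = {'gene': gene, 'expr': expr, 'mutation': mut, 'age': age}

names = list(data)
p = len(names)
diss = np.zeros((p, p))                            # 1 - dCor dissimilarities
for i in range(p):
    for j in range(i + 1, p):
        diss[i, j] = diss[j, i] = 1 - dcor(data[names[i]], data[names[j]])

# The dissimilarity matrix feeds directly into hierarchical clustering (and,
# in the paper, into heatmap-style displays).
labels = fcluster(linkage(squareform(diss), method='average'),
                  t=2, criterion='maxclust')
print(dict(zip(names, labels)))
```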
DISENTANGLING THE ICL WITH THE CHEFs: ABELL 2744 AS A CASE STUDY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiménez-Teja, Y.; Dupke, R., E-mail: yojite@iaa.es
Measurements of the intracluster light (ICL) are still prone to methodological ambiguities, and there are multiple techniques in the literature to address them, mostly based on the binding energy, the local density distribution, or the surface brightness. A common issue with these methods is the a priori assumption of a number of hypotheses on either the ICL morphology, its surface brightness level, or some properties of the brightest cluster galaxy (BCG). The discrepancy in the results is high, and numerical simulations just place a boundary on the ICL fraction in present-day galaxy clusters in the range 10%–50%. We developed a new algorithm based on the Chebyshev–Fourier functions to estimate the ICL fraction without relying on any a priori assumption about the physical or geometrical characteristics of the ICL. We are able to not only disentangle the ICL from the galactic luminosity but also mark out the limits of the BCG from the ICL in a natural way. We test our technique with the recently released data of the cluster Abell 2744, observed by the Frontier Fields program. The complexity of this multiple merging cluster system and the formidable depth of these images make it a challenging test case to prove the efficiency of our algorithm. We found a final ICL fraction of 19.17 ± 2.87%, which is very consistent with numerical simulations.
How to influence patient oral hygiene behavior effectively.
Clarkson, J E; Young, L; Ramsay, C R; Bonner, B C; Bonetti, D
2009-10-01
Considerable resources are expended in dealing with dental disease that is easily prevented with better oral hygiene. The study hypothesis was that an evidence-based intervention, framed with psychological theory, would improve patients' oral hygiene behavior. The impact of trial methodology on trial outcomes was also explored by conducting two independent trials, one randomized by patient and one by dentist. The study included 87 dental practices and 778 patients (Patient RCT = 37 dentists/300 patients; Cluster RCT = 50 dentists/478 patients). Controlling for baseline differences, pooled results showed that patients who experienced the intervention had better behavioral (timing, duration, method), cognitive (confidence, planning), and clinical (plaque, gingival bleeding) outcomes. However, clinical outcomes were significantly better only in the Cluster RCT, suggesting that the impact of trial design on results needs to be further explored.
A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data
Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio
2010-01-01
Background The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and to verify whether three-dimensional gait data are able to distinguish diabetic gait patterns from those of control subjects. Method The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1 (4.4) years, mean body mass index (BMI) 24.0 (2.8) and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joint and segment (trunk, hip, knee, ankle) angles, and moments. Results Cluster analysis classification led to the definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432
Woodhall, Dana M; Mkwanda, Square; Dembele, Massitan; Lwanga, Harriet; Drexler, Naomi; Dubray, Christine; Harris, Jennifer; Worrell, Caitlin; Mathieu, Els
2014-04-01
Currently, a 30-cluster survey to monitor drug coverage after mass drug administration for neglected tropical diseases is the most common methodology used by control programs. We investigated alternative survey methodologies that could potentially provide an estimation of drug coverage. Three alternative survey methods (market, village chief, and religious leader) were conducted and compared to the 30-cluster method in Malawi, Mali, and Uganda. In Malawi, drug coverage for the 30-cluster, market, village chief, and religious leader methods was 66.8% (95% CI 60.3-73.4), 74.3%, 76.3%, and 77.8%, respectively. In Mali, results for round 1 were 62.6% (95% CI 54.4-70.7), 56.1%, 74.8%, and 83.2%, and 57.2% (95% CI 49.0-65.4), 54.5%, 72.2%, and 73.3%, respectively, for round 2. Uganda survey results were 65.7% (95% CI 59.4-72.0), 43.7%, 67.2%, and 77.6%, respectively. Further research is needed to test different coverage survey methodologies to determine which survey methods are the most scientifically rigorous and resource efficient. Published by Elsevier B.V.
Designing Trend-Monitoring Sounds for Helicopters: Methodological Issues and an Application
ERIC Educational Resources Information Center
Edworthy, Judy; Hellier, Elizabeth; Aldrich, Kirsteen; Loxley, Sarah
2004-01-01
This article explores methodological issues in sonification and sound design arising from the design of helicopter monitoring sounds. Six monitoring sounds (each with 5 levels) were tested for similarity and meaning with 3 different techniques: hierarchical cluster analysis, linkage analysis, and multidimensional scaling. In Experiment 1,…
We present a robust methodology for examining the relationship between synoptic-scale atmospheric transport patterns and pollutant concentration levels observed at a site. Our approach entails calculating a large number of back-trajectories from the observational site over a long...
NASA Astrophysics Data System (ADS)
Hussain, M.; Chen, D.
2014-11-01
Buildings, the basic unit of an urban landscape, host most of its socio-economic activities and play an important role in the creation of urban land-use patterns. The spatial arrangement of different building types creates varied urban land-use clusters which can provide an insight into the relationships between social, economic, and living spaces. The classification of such urban clusters can help in policy-making and resource management. In many countries, including the UK, no national-level cadastral database containing information on individual building types exists in the public domain. In this paper, we present a framework for inferring the functional types of buildings based on the analysis of their form (e.g. geometrical properties such as area and perimeter, and layout) and spatial relationships from a large topographic and address-based GIS database. Machine learning algorithms along with exploratory spatial analysis techniques are used to create the classification rules. The classification is extended to two further levels based on the functions (use) of buildings derived from address-based data. The developed methodology was applied to the Manchester metropolitan area using the Ordnance Survey's MasterMap®, a large-scale topographic and address-based data set available for the UK.
Strugnell, Claudia; Millar, Lynne; Churchill, Andrew; Jacka, Felice; Bell, Colin; Malakellis, Mary; Swinburn, Boyd; Allender, Steve
2016-01-01
Healthy Together Victoria (HTV) is a complex 'whole of system' intervention, including an embedded cluster randomized controlled trial, to reduce chronic disease by addressing risk factors (physical inactivity, poor diet quality, smoking and harmful alcohol use) among children and adults in selected communities in Victoria, Australia (Healthy Together Communities). We describe the methodology for: 1) assessing changes in the prevalence of measured childhood obesity and associated risks between primary and secondary school students in HTV communities, compared with comparison communities; and 2) assessing community-level system changes that influence childhood obesity in HTC and comparison communities. Twenty-four geographically bounded areas were randomized to either prevention or comparison (2012). A repeat cross-sectional study utilising opt-out consent will collect objectively measured height, weight, and waist and self-reported behavioral data among primary [Grade 4 (aged 9-10y) and Grade 6 (aged 11-12y)] and secondary [Grade 8 (aged 13-14y) and Grade 10 (aged 15-16y)] school students (2014 to 2018). Relationships between measured childhood obesity and system causes, as defined in the Foresight obesity systems map, will be assessed using a range of routine and customised data. This research methodology describes the beginnings of a state-wide childhood obesity monitoring system that can evolve to regularly inform progress on reducing obesity, and to situate these changes in the context of broader community-level system change.
Clusters, Groups, and Filaments in the Chandra Deep Field-South up to Redshift 1
NASA Astrophysics Data System (ADS)
Dehghan, S.; Johnston-Hollitt, M.
2014-03-01
We present a comprehensive structure detection analysis of the 0.3 deg^2 area of the MUSYC-ACES field, which covers the Chandra Deep Field-South (CDFS). Using a density-based clustering algorithm on the MUSYC and ACES photometric and spectroscopic catalogs, we find 62 overdense regions up to redshifts of 1, including clusters, groups, and filaments. We also present the detection of a relatively small void of ~10 Mpc^2 at z ~ 0.53. All structures are confirmed using the DBSCAN method, including the detection of nine structures previously reported in the literature. We present a catalog of all structures present, including their central position, mean redshift, velocity dispersions, and classification based on their morphological and spectroscopic distributions. In particular, we find 13 galaxy clusters and 6 large groups/small clusters. Comparison of these massive structures with published XMM-Newton imaging (where available) shows that 80% of these structures are associated with diffuse, soft-band (0.4-1 keV) X-ray emission, including 90% of all objects classified as clusters. The presence of soft-band X-ray emission in these massive structures (M_200 >= 4.9 × 10^13 M_⊙) provides a strong independent confirmation of our methodology and classification scheme. In the closest two clusters identified (z < 0.13), high-quality optical imaging from the Deep2c field of the Garching-Bonn Deep Survey reveals the cD galaxies and demonstrates that they sit at the center of the detected X-ray emission. Nearly 60% of the clusters, groups, and filaments are detected in the known enhanced density regions of the CDFS at z ≈ 0.13, 0.52, 0.68, and 0.73. Additionally, all of the clusters, bar the most distant, are found in these overdense redshift regions. Many of the clusters and groups exhibit signs of ongoing formation seen in their velocity distributions, position within the detected cosmic web, and in one case through the presence of tidally disrupted central galaxies exhibiting trails of stars. These results all provide strong support for hierarchical structure formation up to redshifts of 1.
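The following sketch illustrates DBSCAN-style structure detection on a fabricated RA/Dec/redshift catalog; the field limits, coordinate scaling, and eps/min_samples settings are illustrative assumptions rather than the values tuned for MUSYC-ACES.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
field = rng.uniform([52.5, -28.5, 0.1], [53.5, -27.5, 1.0], size=(1500, 3))
clump = rng.normal([53.0, -27.8, 0.52], [0.02, 0.02, 0.005], size=(60, 3))
cat = np.vstack([field, clump])                  # columns: RA, Dec, redshift

# Redshift lives on a very different scale from sky coordinates, so rescale
# each column before the Euclidean metric is applied.
Xs = cat / np.array([0.05, 0.05, 0.01])
labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(Xs)
for lab in sorted(set(labels) - {-1}):
    m = labels == lab
    print(f"structure {lab}: {m.sum()} members at <z> = {cat[m, 2].mean():.3f}")
```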
Goszczyński, Tomasz M; Kowalski, Konrad; Leśnikowski, Zbigniew J; Boratyński, Janusz
2015-02-01
Boron clusters represent a vast family of boron-rich compounds with extraordinary properties that provide the opportunity for exploitation in different areas of chemistry and biology. In addition, boron clusters are clinically used in boron neutron capture therapy (BNCT) of tumors. In this paper, a novel solid-state (solvent-free) thermal method for protein modification with boron clusters is proposed. The method is based on a cyclic ether ring opening in an oxonium adduct of a cyclic ether and a boron cluster with nucleophilic centers of the protein. Lysozyme was used as the model protein, and the physicochemical and biological properties of the obtained conjugates were characterized. The main residues of modification were identified as arginine-128 and threonine-51. No significant changes in the secondary or tertiary structures of the protein after tethering of the boron cluster were found using mass spectrometry and circular dichroism measurements. However, some changes in the intermolecular interactions and hydrodynamic and catalytic properties were observed. To the best of our knowledge, we have described the first example of an application of cyclic ether ring opening in the oxonium adducts of a boron cluster for protein modification. In addition, a distinctive feature of the proposed approach is performing the reaction in the solid state and at elevated temperature. The proposed methodology provides a new route to protein modification with boron clusters and extends the range of innovative molecules available for biological and medical testing. Copyright © 2014 Elsevier B.V. All rights reserved.
General Framework for Effect Sizes in Cluster Randomized Experiments
ERIC Educational Resources Information Center
VanHoudnos, Nathan
2016-01-01
Cluster randomized experiments are ubiquitous in modern education research. Although a variety of modeling approaches are used to analyze these data, perhaps the most common methodology is a normal mixed effects model where some effects, such as the treatment effect, are regarded as fixed, and others, such as the effect of group random assignment…
Communication via Chalkboard and on Paper. TLA-100.00 (F.U.F.).
ERIC Educational Resources Information Center
Hardison, Margaret J.
The purpose of this cluster of learning modules is to increase the teacher intern's understanding and skills with regard to his role as a communicator. The cluster contains three modules: (a) objectives for teaching handwriting, (b) methodology of manuscript writing, and (c) practice teaching of manuscript handwriting. Each module contains a…
Kurczynska, Monika; Kotulska, Malgorzata
2018-01-01
Mirror protein structures are often considered artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction based on residue-residue contact maps need a methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130,500 structural protein models obtained from contact maps of 1,305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among the 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow appropriate clusters to be obtained: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: the probability of an amino acid assuming certain values of the dihedral angles Φ and Ψ, Ramachandran preferences, and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
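A minimal sketch of the final clustering step, with fabricated values standing in for the three PyRosetta energy terms and a label-invariant accuracy computed as in any two-cluster comparison:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
# Columns: dihedral-probability term, Ramachandran term, Coulomb term.
native = rng.normal([-1.0, -0.8, -0.5], 0.3, size=(50, 3))
mirror = rng.normal([0.9, 1.1, -0.4], 0.3, size=(50, 3))
E = np.vstack([native, mirror])
truth = np.array([0] * 50 + [1] * 50)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(E))
acc = max((labels == truth).mean(), (labels != truth).mean())  # label-invariant
print(f"clustering accuracy: {acc:.2f}")
```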
Tests for informative cluster size using a novel balanced bootstrap scheme.
Nevalainen, Jaakko; Oja, Hannu; Datta, Somnath
2017-07-20
Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for the same was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting, thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd.
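The flavor of such a test can be sketched as below, where a plain permutation of cluster sizes against cluster means stands in for the paper's balanced bootstrap scheme and the data are simulated with size-dependent means:

```python
import numpy as np

rng = np.random.default_rng(7)
sizes = rng.integers(2, 30, size=80)
# Informative cluster size: larger clusters drift toward higher outcomes.
cluster_means = np.array([rng.normal(0.05 * s, 1.0) for s in sizes])

T_obs = np.corrcoef(sizes, cluster_means)[0, 1]    # size-outcome association
T_null = np.array([np.corrcoef(rng.permutation(sizes), cluster_means)[0, 1]
                   for _ in range(5000)])          # null: size uninformative
p = (np.abs(T_null) >= abs(T_obs)).mean()
print(f"T = {T_obs:.3f}, permutation p-value = {p:.4f}")
```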
Methods for sample size determination in cluster randomized trials
Rutterford, Clare; Copas, Andrew; Eldridge, Sandra
2015-01-01
Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
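As context for the simplest approach described above, here is a sketch of the design-effect inflation for a two-arm comparison of means; the effect size, cluster size, and ICC values are illustrative.

```python
import math
from scipy.stats import norm

def crt_sample_size(delta, sd, m, icc, alpha=0.05, power=0.8):
    """Per-arm subjects and clusters for a two-arm parallel CRT comparing
    means: the individually randomized N inflated by 1 + (m - 1) * ICC."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n_ind = 2 * ((z_a + z_b) * sd / delta) ** 2   # per arm, individual randomization
    n = n_ind * (1 + (m - 1) * icc)               # inflate by the design effect
    return math.ceil(n), math.ceil(n / m)

subjects, clusters = crt_sample_size(delta=0.3, sd=1.0, m=20, icc=0.05)
print(f"{subjects} subjects in {clusters} clusters per arm")
```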
GRAMM-X public web server for protein–protein docking
Tovchigrechko, Andrey; Vakser, Ilya A.
2006-01-01
Protein docking software GRAMM-X and its web interface extend the original GRAMM Fast Fourier Transformation methodology by employing smoothed potentials, a refinement stage, and knowledge-based scoring. The web server frees users from the complex installation of database-dependent parallel software and from maintaining the large hardware resources needed for protein docking simulations. Docking problems submitted to the GRAMM-X server are processed by a 320-processor Linux cluster. The server was extensively tested by benchmarking, several months of public use, and participation in the CAPRI server track. PMID:16845016
Schröder, Winfried; Nickel, Stefan; Jenssen, Martin; Riediger, Jan
2015-07-15
A methodology for mapping ecosystems and their potential development under climate change and atmospheric nitrogen deposition was developed using examples from Germany. The methodology integrated data on vegetation, soil, climate change and atmospheric nitrogen deposition. These data were used to classify ecosystem types regarding six ecological functions and interrelated structures. Respective data covering 1961-1990 were used for reference. The assessment of functional and structural integrity relies on comparing a current or future state with an ecosystem type-specific reference. While current functions and structures of ecosystems were quantified by measurements, potential future developments were projected by geochemical soil modelling and data from a regional climate change model. The ecosystem types referenced the potential natural vegetation and were mapped using data on current tree species coverage and land use. In this manner, current ecosystem types were derived and related to data on elevation, soil texture, and climate for the years 1961-1990. These relations were quantified by Classification and Regression Trees, which were used to map the spatial patterns of ecosystem type clusters for 1961-1990. The climate data for these years were subsequently replaced by the results of a regional climate model for 1991-2010, 2011-2040, and 2041-2070. For each of these periods, one map of ecosystem type clusters was produced and evaluated with regard to the development of the areal coverage of ecosystem type clusters over time. This evaluation of the structural aspects of ecological integrity at the national level was complemented by projecting potential future values of indicators for ecological functions at the site level, using the Very Simple Dynamic soil modelling technique with climate data and two scenarios of nitrogen deposition as input. The results were compared to the reference and enabled an evaluation of site-specific ecosystem changes over time, which proved to be both positive and negative. Copyright © 2015 Elsevier B.V. All rights reserved.
Moody, Daniela I.; Brumby, Steven P.; Rowland, Joel C.; ...
2014-12-09
We present results from an ongoing effort to extend neuromimetic machine vision algorithms to multispectral data using adaptive signal processing combined with compressive sensing and machine learning techniques. Our goal is to develop a robust classification methodology that will allow for automated discretization of the landscape into distinct units based on attributes such as vegetation, surface hydrological properties, and topographic/geomorphic characteristics. We use a Hebbian learning rule to build spectral-textural dictionaries that are tailored for classification. We learn our dictionaries from millions of overlapping multispectral image patches and then use a pursuit search to generate classification features. Land cover labels are automatically generated using unsupervised clustering of sparse approximations (CoSA). We demonstrate our method on multispectral WorldView-2 data from a coastal plain ecosystem in Barrow, Alaska. We explore learning from both raw multispectral imagery and normalized band difference indices. We explore a quantitative metric to evaluate the spectral properties of the clusters in order to potentially aid in assigning land cover categories to the cluster labels. In this study, our results suggest CoSA is a promising approach to unsupervised land cover classification in high-resolution satellite imagery.
Online network organization of Barcelona en Comú, an emergent movement-party.
Aragón, Pablo; Gallego, Helena; Laniado, David; Volkovich, Yana; Kaltenbrunner, Andreas
2017-01-01
The emerging grassroots party Barcelona en Comú won the 2015 Barcelona City Council election. This candidacy was devised by activists involved in the Spanish 15M movement to transform citizen outrage into political change. On the one hand, the 15M movement was based on a decentralized structure. On the other hand, political science literature postulates that parties develop oligarchical leadership structures. This tension motivates examining whether Barcelona en Comú preserved a decentralized structure or adopted a conventional centralized organization. In this study we develop a computational methodology to characterize the online network organization of every party in the election campaign on Twitter. Results on the network of retweets reveal that, while traditional parties are organized in a single cluster, for Barcelona en Comú two well-defined groups co-exist: a centralized cluster led by the candidate and party accounts, and a decentralized cluster with the movement activists. Furthermore, results on the network of replies also show a dual structure: a cluster around the candidate receiving the largest attention from other parties, and another with the movement activists exhibiting a higher predisposition to dialogue with other parties.
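A toy version of the retweet-network characterization, assuming a fabricated edge list and using modularity communities plus a simple degree-concentration measure in place of the study's full analysis:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

retweets = [("activist1", "candidate"), ("activist2", "candidate"),
            ("activist2", "activist1"), ("activist3", "activist1"),
            ("voter1", "candidate"), ("voter2", "candidate"),
            ("activist3", "activist2"), ("voter3", "candidate")]
G = nx.Graph(retweets)                       # who-retweeted-whom, undirected

clusters = greedy_modularity_communities(G)
print("clusters:", [sorted(c) for c in clusters])

# Degree concentration: share of edges touching the single most retweeted
# account; high values suggest a centralized, party-like structure.
top = max(dict(G.degree).values())
print("concentration:", top / G.number_of_edges())
```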
Cluster cosmology with next-generation surveys.
NASA Astrophysics Data System (ADS)
Ascaso, B.
2017-03-01
The advent of next-generation surveys will provide a large number of cluster detections that will serve as the basis for constraining cosmological parameters using cluster counts. The main two observational ingredients needed are the cluster selection function and the calibration of the mass-observable relation. In this talk, we present the methodology designed to obtain robust predictions of both ingredients based on realistic cosmological simulations mimicking the following next-generation surveys: J-PAS, LSST and Euclid. We display recent results on the selection functions for these surveys together with others coming from other next-generation surveys such as eROSITA, ACTpol and SPTpol. We notice that the optical and IR surveys will reach the lowest masses between 0.3
Speeding up the Consensus Clustering methodology for microarray data analysis
2011-01-01
Background The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The stated problem is quite difficult, in particular for microarrays, since the inferred prediction must be sensible enough to capture the inherent biological structure in a dataset, e.g., functionally related genes. Despite the rich literature present in that area, the identification of an internal validation measure that is both fast and precise has proved to be elusive. In order to partially fill this gap, we propose a speed-up of Consensus (Consensus Clustering), a methodology whose purpose is the provision of a prediction of the number of clusters in a dataset, together with a dissimilarity matrix (the consensus matrix) that can be used by clustering algorithms. As detailed in the remainder of the paper, Consensus is a natural candidate for a speed-up. Results Since the time-precision performance of Consensus depends on two parameters, our first task is to show that a simple adjustment of the parameters is not enough to obtain a good precision-time trade-off. Our second task is to provide a fast approximation algorithm for Consensus: that is, the closely related algorithm FC (Fast Consensus), which has the same precision as Consensus with a substantially better time performance. The performance of FC has been assessed via extensive experiments on twelve benchmark datasets that summarize key features of microarray applications, such as cancer studies, gene expression with up and down patterns, and a full spectrum of dimensionality up to over a thousand. Based on their outcome, compared with previous benchmarking results available in the literature, FC turns out to be among the fastest internal validation methods, while retaining the same outstanding precision of Consensus. Moreover, it also provides a consensus matrix that can be used as a dissimilarity matrix, guaranteeing the same performance as the corresponding matrix produced by Consensus. We have also experimented with the use of Consensus and FC in conjunction with NMF (Nonnegative Matrix Factorization), in order to identify the correct number of clusters in a dataset. Although NMF is an increasingly popular technique for biological data mining, our results are somewhat disappointing and complement quite well the state of the art about NMF, shedding further light on its merits and limitations. Conclusions In summary, FC with a parameter setting that makes it robust with respect to small and medium-sized datasets, i.e., number of items to cluster in the hundreds and number of conditions up to a thousand, seems to be the internal validation measure of choice. Moreover, the technique we have developed here can be used in other contexts, in particular for the speed-up of stability-based validation measures. PMID:21235792
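The Consensus procedure that FC accelerates can be sketched in a few lines: repeatedly subsample the items, cluster each subsample, and record how often pairs co-cluster; a crisp consensus matrix indicates a good number of clusters. The data, resampling settings, and dispersion score below are illustrative, not the paper's benchmark configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(mu, 1.0, size=(40, 100)) for mu in (0.0, 2.0, 4.0)])

def consensus_matrix(X, k, trials=50, frac=0.8):
    n = len(X)
    hits = np.zeros((n, n))                      # times i and j co-clustered
    both = np.full((n, n), 1e-9)                 # times i and j co-sampled
    for _ in range(trials):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        lab = KMeans(n_clusters=k, n_init=5).fit_predict(X[idx])
        both[np.ix_(idx, idx)] += 1
        hits[np.ix_(idx, idx)] += lab[:, None] == lab[None, :]
    return hits / both

# Scan k: a crisp consensus matrix (entries near 0 or 1) indicates a good fit.
for k in (2, 3, 4, 5):
    M = consensus_matrix(X, k)
    print(k, round(float((4 * M * (1 - M)).mean()), 3))  # low = crisp
```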
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lafata, K; Ren, L; Wu, Q
Purpose: To develop a data-mining methodology based on quantum clustering and machine learning to predict expected dosimetric endpoints for lung SBRT applications based on patient-specific anatomic features. Methods: Ninety-three patients who received lung SBRT at our clinic from 2011–2013 were retrospectively identified. Planning information was acquired for each patient, from which various features were extracted using in-house semi-automatic software. Anatomic features included tumor-to-OAR distances, tumor location, total-lung-volume, GTV and ITV. Dosimetric endpoints were adopted from RTOG-0195 recommendations, and consisted of various OAR-specific partial-volume doses and maximum point-doses. First, PCA analysis and unsupervised quantum clustering were used to explore the feature-space and identify potentially strong classifiers. Secondly, a multi-class logistic regression algorithm was developed and trained to predict dose-volume endpoints based on patient-specific anatomic features. Classes were defined by discretizing the dose-volume data, and the feature-space was zero-mean normalized. Fitting parameters were determined by minimizing a regularized cost function, and optimization was performed via gradient descent. As a pilot study, the model was tested on two esophageal dosimetric planning endpoints (maximum point-dose, dose-to-5cc), and its generalizability was evaluated with leave-one-out cross-validation. Results: Quantum clustering demonstrated a strong separation of the feature-space at 15 Gy across the first and second principal components of the data when the dosimetric endpoints were retrospectively identified. Maximum point dose prediction to the esophagus demonstrated a cross-validation accuracy of 87%, and the maximum dose to 5cc demonstrated a respective value of 79%. The largest optimized weighting factor was placed on GTV-to-esophagus distance (a factor of 10 greater than the second largest weighting factor), indicating an intuitively strong correlation between this feature and both endpoints. Conclusion: This pilot study shows that it is feasible to predict dose-volume endpoints based on patient-specific anatomic features. The developed methodology can potentially help to identify patients at risk for higher OAR doses, thus improving the efficiency of treatment planning. R01-184173.
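A minimal sketch of the regression step described in the Methods, assuming fabricated anatomic features, discretized dose classes, and plain batch gradient descent on an L2-regularized softmax cost:

```python
import numpy as np

rng = np.random.default_rng(9)
n, d, C = 93, 5, 3                                # patients, features, dose bins
X = rng.normal(size=(n, d))                       # zero-mean normalized features
y = X[:, 0] + 0.5 * rng.normal(size=n)            # a distance-like feature dominates
y = np.digitize(y, np.quantile(y, [1/3, 2/3]))    # discretize into 3 classes

W = np.zeros((d, C))
lam, lr = 1e-2, 0.5
Y = np.eye(C)[y]                                  # one-hot targets
for _ in range(500):
    Z = X @ W
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
    grad = X.T @ (P - Y) / n + lam * W            # regularized cost gradient
    W -= lr * grad

acc = (P.argmax(axis=1) == y).mean()              # training accuracy; the study
print(f"accuracy: {acc:.2f}")                     # uses leave-one-out CV instead
```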
Stacked Autoencoders for Outlier Detection in Over-the-Horizon Radar Signals
Protopapadakis, Eftychios; Doulamis, Anastasios; Doulamis, Nikolaos; Dres, Dimitrios; Bimpas, Matthaios
2017-01-01
Detection of outliers in radar signals is a considerable challenge in maritime surveillance applications. High-Frequency Surface-Wave (HFSW) radars have attracted significant interest as potential tools for long-range target identification and outlier detection at over-the-horizon (OTH) distances. However, a number of disadvantages, such as their low spatial resolution and presence of clutter, have a negative impact on their accuracy. In this paper, we explore the applicability of deep learning techniques for detecting deviations from the norm in behavioral patterns of vessels (outliers) as they are tracked from an OTH radar. The proposed methodology exploits the nonlinear mapping capabilities of deep stacked autoencoders in combination with density-based clustering. A comparative experimental evaluation of the approach shows promising results in terms of the proposed methodology's performance. PMID:29312449
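A minimal sketch of this kind of pipeline, with simulated track features and scikit-learn's MLPRegressor trained to reproduce its input standing in for the stacked autoencoder:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(10)
normal = rng.normal(0, 1, size=(300, 6))         # ordinary vessel-track features
odd = rng.normal(5, 1, size=(5, 6))              # deviating behavior
X = np.vstack([normal, odd])

# "Autoencoder": an MLP trained to reproduce its input through a bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(4, 2, 4), max_iter=3000,
                  random_state=0).fit(normal, normal)
err = ((ae.predict(X) - X) ** 2).mean(axis=1)    # reconstruction error

# Density-based clustering of the error-augmented features: sparse points
# (label -1) are reported as outliers.
labels = DBSCAN(eps=2.5, min_samples=10).fit_predict(np.column_stack([X, err]))
print("flagged outliers:", np.where(labels == -1)[0])
```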
Cross Section High Resolution Imaging of Polymer-Based Materials
NASA Astrophysics Data System (ADS)
Delaportas, D.; Aden, P.; Muckle, C.; Yeates, S.; Treutlein, R.; Haq, S.; Alexandrou, I.
This paper describes a methodology for preparing cross sections of organic layers suitable for transmission electron microscopy (TEM) at high resolution. Our principal aim is to prepare samples that are tough enough to allow the slicing into sub-150 nm sections. We also need strong contrast at the organic layer area to make it identifiable during TEM. Our approach is to deposit organic layers on flexible substrates and prepare thin cross sections using ultra-microtomy. We sandwich the organic layer between two metal thin films in order to isolate it and improve contrast. Our methodology is used to study the microstructure of polymer/nanotube composites, allowing us to accurately measure the organic layer thickness, determine nanotube dispersion and assess the effect of nanotube clustering on film structural stability.
Legal Research in the Context of Educational Leadership and Policy Studies
ERIC Educational Resources Information Center
Sughrue, Jennifer; Driscoll, Lisa G.
2012-01-01
Legal research methodology is not included in the cluster of research and design courses offered to undergraduate and graduate students in education by traditional departments of research and foundations, so it becomes the responsibility of education law faculty to instruct students in legal methodology. This narrow corridor of opportunity for…
Discovery of gigantic molecular nanostructures using a flow reaction array as a search engine.
Zang, Hong-Ying; de la Oliva, Andreu Ruiz; Miras, Haralampos N; Long, De-Liang; McBurney, Roy T; Cronin, Leroy
2014-04-28
The discovery of gigantic molecular nanostructures like coordination and polyoxometalate clusters is extremely time-consuming since a vast combinatorial space needs to be searched, and even a systematic and exhaustive exploration of the available synthetic parameters relies on a great deal of serendipity. Here we present a synthetic methodology that combines a flow reaction array and algorithmic control to give a chemical 'real-space' search engine leading to the discovery and isolation of a range of new molecular nanoclusters based on [Mo(2)O(2)S(2)](2+)-based building blocks with either fourfold (C4) or fivefold (C5) symmetry templates and linkers. This engine leads us to isolate six new nanoscale cluster compounds: 1, {Mo(10)(C5)}; 2, {Mo(14)(C4)4(C5)2}; 3, {Mo(60)(C4)10}; 4, {Mo(48)(C4)6}; 5, {Mo(34)(C4)4}; 6, {Mo(18)(C4)9}; in only 200 automated experiments from a parameter space spanning ~5 million possible combinations.
Shan, Liran C; De Brún, Aoife; Henchion, Maeve; Li, Chenguang; Murrin, Celine; Wall, Patrick G; Monahan, Frank J
2017-09-01
Recent innovations in processed meats focus on healthier reformulations through reducing negative constituents and/or adding health-beneficial ingredients. This study explored the influence of base meat product (ham, sausages, beef burger), salt and/or fat content (reduced or not), healthy ingredients (omega 3, vitamin E, none), and price (average or higher than average) on consumers' purchase intention and quality judgement of processed meats. A survey (n=481) using conjoint methodology and cluster analysis was conducted. Price and base meat product were most important for consumers' purchase intention, followed by healthy ingredient and salt and/or fat content. In reformulation, consumers had a preference for ham and sausages over beef burgers, and for reduced salt and/or fat over no reduction. In relation to healthy ingredients, omega 3 was preferred over none, and vitamin E was least preferred. Healthier reformulations improved the perceived healthiness of processed meats. Cluster analyses identified three consumer segments with different product preferences. Copyright © 2017 Elsevier Ltd. All rights reserved.
Assessing Similarity Among Individual Tumor Size Lesion Dynamics: The CICIL Methodology
Girard, Pascal; Ioannou, Konstantinos; Klinkhardt, Ute; Munafo, Alain
2018-01-01
Mathematical models of tumor dynamics generally omit information on individual target lesions (iTLs), and consider the most important variable to be the sum of tumor sizes (TS). However, differences in lesion dynamics might be predictive of tumor progression. To exploit this information, we have developed a novel and flexible approach for the non‐parametric analysis of iTLs, which integrates knowledge from signal processing and machine learning. We called this new methodology ClassIfication Clustering of Individual Lesions (CICIL). We used CICIL to assess similarities among the TS dynamics of 3,223 iTLs measured in 1,056 patients with metastatic colorectal cancer treated with cetuximab combined with irinotecan, in two phase II studies. We mainly observed similar dynamics among lesions within the same tumor site classification. In contrast, lesions in anatomic locations with different features showed different dynamics in about 35% of patients. The CICIL methodology has also been implemented in a user‐friendly and efficient Java‐based framework. PMID:29388396
ERIC Educational Resources Information Center
Kung, Hsiang-Te; And Others
1993-01-01
In spite of rapid progress achieved in the methodological research underlying environmental impact assessment (EIA), the problem of weighting various parameters has not yet been solved. This paper presents a new approach, fuzzy clustering analysis, which is illustrated with an EIA case study on Baoshan-Wusong District in Shanghai, China. (Author)
ERIC Educational Resources Information Center
Ho, Hsuan-Fu; Hung, Chia-Chi
2008-01-01
Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…
Acidity in DMSO from the embedded cluster integral equation quantum solvation model.
Heil, Jochen; Tomazic, Daniel; Egbers, Simon; Kast, Stefan M
2014-04-01
The embedded cluster reference interaction site model (EC-RISM) is applied to the prediction of acidity constants of organic molecules in dimethyl sulfoxide (DMSO) solution. EC-RISM is based on a self-consistent treatment of the solute's electronic structure and the solvent's structure by coupling quantum-chemical calculations with three-dimensional (3D) RISM integral equation theory. We compare available DMSO force fields with reference calculations obtained using the polarizable continuum model (PCM). The results are evaluated statistically using two different approaches to eliminating the proton contribution: a linear regression model and an analysis of pK(a) shifts for compound pairs. Suitable levels of theory for the integral equation methodology are benchmarked. The results are further analyzed and illustrated by visualizing solvent site distribution functions and comparing them with an aqueous environment.
Acosta, Joie D; Chinman, Matthew; Ebener, Patricia; Phillips, Andrea; Xenakis, Lea; Malone, Patrick S
2016-01-01
Restorative Practices in schools lack rigorous evaluation studies. As an example of rigorous school-based research, this paper describes the first randomized control trial of restorative practices to date, the Study of Restorative Practices. It is a 5-year, cluster-randomized controlled trial (RCT) of the Restorative Practices Intervention (RPI) in 14 middle schools in Maine to assess whether RPI impacts both positive developmental outcomes and problem behaviors and whether the effects persist during the transition from middle to high school. The two-year RPI intervention began in the 2014-2015 school year. The study's rationale and theoretical concerns are discussed along with methodological concerns including teacher professional development. The theoretical rationale and description of the methods from this study may be useful to others conducting rigorous research and evaluation in this area.
Spatiotemporal multistage consensus clustering in molecular dynamics studies of large proteins.
Kenn, Michael; Ribarics, Reiner; Ilieva, Nevena; Cibena, Michael; Karch, Rudolf; Schreiner, Wolfgang
2016-04-26
The aim of this work is to find semi-rigid domains within large proteins as reference structures for fitting molecular dynamics trajectories. We propose an algorithm, multistage consensus clustering, MCC, based on minimum variation of distances between pairs of Cα-atoms as target function. The whole dataset (trajectory) is split into sub-segments. For a given sub-segment, spatial clustering is repeatedly started from different random seeds, and we adopt the specific spatial clustering with minimum target function: the process described so far is stage 1 of MCC. Then, in stage 2, the results of spatial clustering are consolidated, to arrive at domains stable over the whole dataset. We found that MCC is robust regarding the choice of parameters and yields relevant information on functional domains of the major histocompatibility complex (MHC) studied in this paper: the α-helices and β-floor of the protein (MHC) proved to be most flexible and did not contribute to clusters of significant size. Three alleles of the MHC, each in complex with ABCD3 peptide and LC13 T-cell receptor (TCR), yielded different patterns of motion. Those alleles causing immunological allo-reactions showed distinct correlations of motion between parts of the peptide, the binding cleft and the complementary determining regions (CDR)-loops of the TCR. Multistage consensus clustering reflected functional differences between MHC alleles and yields a methodological basis to increase sensitivity of functional analyses of bio-molecules. Due to the generality of approach, MCC is prone to lend itself as a potent tool also for the analysis of other kinds of big data.
NASA Astrophysics Data System (ADS)
Landman, Uzi; Yannouleas, Constantine
2001-09-01
This special volume of The European Physical Journal D contains papers presented at the 10th International Symposium on Small Particles and Inorganic Clusters, ISSPIC 10, which was held from October 11th to 15th, 2000, in Atlanta, Georgia, USA. The meeting was attended by over 300 scientists from all over the world; 45 invited and hot-topic lectures were given, and 330 posters were presented. In keeping with the tradition of the ISSPIC meetings, the 10th anniversary symposium was devoted to a broad and balanced overview of new results, emerging trends and perspectives pertaining to the physics and chemistry of clusters. The meeting covered experimental and theoretical investigations of gas-phase and supported clusters and nanoscale cluster-based materials. The sessions at ISSPIC 10 were organized as mini-symposia, and topics included: electronic and structural properties of clusters, charged clusters and photoelectron spectroscopy, fast laser spectroscopy and dynamics, cluster-surface interactions and cluster deposition, helium clusters and spectroscopy, structural evolution and thermodynamics of clusters, cluster reactivity and nano-catalysis, two-dimensional quantum dots, nano-crystals and self-assembly, mechanical, electronic and transport properties of carbon nanotubes, structure and conductance properties of nanowires, formation and stability of droplets and jets, and medical applications of colloidal clusters. Most importantly, the symposium provided an interdisciplinary forum for presentation and discussion of fundamental and methodological aspects as well as technologically oriented developments. The opening session of ISSPIC 10 was dedicated to the memory of Professor Walter D. Knight, a pioneer in the field of clusters. The work of Walter and his research group at Berkeley on electronic shells in metal clusters opened new avenues in cluster science and contributed significantly to the rapid growth of this field. We would like to take this opportunity to gratefully acknowledge the financial assistance extended to the symposium by the Georgia Institute of Technology and by the United States Air Force Office of Scientific Research. We also thank the participants for contributing to the success of ISSPIC 10 through communicating their latest scientific results during the meeting and via the papers appearing in these proceedings.
Off-road truck-related accidents in U.S. mines
Dindarloo, Saeid R.; Pollard, Jonisha P.; Siami-Irdemoosa, Elnaz
2016-01-01
Introduction Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industries. A systematic analysis of all off-road truck-related accidents, injuries, and illnesses, which are reported and published by the Mine Safety and Health Administration (MSHA), is expected to provide practical insights for identifying the accident patterns and trends in the available raw database. Therefore, appropriate safety management measures can be administered and implemented based on these accident patterns/trends. Methods A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated to the severity level. Results Given the set of specified attributes, the clustering sub-model was able to cluster the accident records into 5 distinct groups. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives. Conclusions The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries. Practical application Analyzing injury data using data mining techniques provides some insight into attributes that are associated with high accuracies for predicting injury severity. PMID:27620937
Off-road truck-related accidents in U.S. mines.
Dindarloo, Saeid R; Pollard, Jonisha P; Siami-Irdemoosa, Elnaz
2016-09-01
Off-road trucks are one of the major sources of equipment-related accidents in the U.S. mining industries. A systematic analysis of all off-road truck-related accidents, injuries, and illnesses, which are reported and published by the Mine Safety and Health Administration (MSHA), is expected to provide practical insights for identifying the accident patterns and trends in the available raw database. Therefore, appropriate safety management measures can be administered and implemented based on these accident patterns/trends. A hybrid clustering-classification methodology using K-means clustering and gene expression programming (GEP) is proposed for the analysis of severe and non-severe off-road truck-related injuries at U.S. mines. Using the GEP sub-model, a small subset of the 36 recorded attributes was found to be correlated to the severity level. Given the set of specified attributes, the clustering sub-model was able to cluster the accident records into 5 distinct groups. For instance, the first cluster contained accidents related to minerals processing mills and coal preparation plants (91%). More than two-thirds of the victims in this cluster had less than 5 years of job experience. This cluster was associated with the highest percentage of severe injuries (22 severe accidents, 3.4%). Almost 50% of all accidents in this cluster occurred at stone operations. Similarly, the other four clusters were characterized to highlight important patterns that can be used to determine areas of focus for safety initiatives. The identified clusters of accidents may play a vital role in the prevention of severe injuries in mining. Further research into the cluster attributes and identified patterns will be necessary to determine how these factors can be mitigated to reduce the risk of severe injuries. Analyzing injury data using data mining techniques provides some insight into attributes that are associated with high accuracies for predicting injury severity.
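The hybrid pipeline described above is straightforward to prototype. The sketch below is illustrative only: the CSV file name, the "severe" label, and the numeric coding of the 36 attributes are assumptions, and a mutual-information screen stands in for the paper's gene expression programming step, since GEP is a specialized evolutionary technique; the K-means step mirrors the study's five-group clustering.

```python
# Hedged sketch of a hybrid attribute-screening + clustering pipeline.
# "msha_truck_accidents.csv", the "severe" column, and numeric coding of the
# attributes are assumptions; mutual information is a stand-in for GEP.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

records = pd.read_csv("msha_truck_accidents.csv")
X = records.drop(columns=["severe"])          # the 36 recorded attributes
y = records["severe"]                         # injury severity label (0/1)

# Step 1: screen attributes for association with severity (stand-in for GEP).
mi = mutual_info_classif(X, y, random_state=0)
selected = X.columns[mi.argsort()[::-1][:8]]  # keep a small, top-scoring subset

# Step 2: K-means on the selected attributes, five groups as in the study.
Z = StandardScaler().fit_transform(X[selected])
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)

# Profile the clusters, e.g. the share of severe injuries in each.
print(records.assign(cluster=clusters).groupby("cluster")["severe"].mean())
```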
Water droplet excess free energy determined by cluster mitosis using guided molecular dynamics
NASA Astrophysics Data System (ADS)
Lau, Gabriel V.; Hunt, Patricia A.; Müller, Erich A.; Jackson, George; Ford, Ian J.
2015-12-01
Atmospheric aerosols play a vital role in affecting climate by influencing the properties and lifetimes of clouds and precipitation. Understanding the underlying microscopic mechanisms involved in the nucleation of aerosol droplets from the vapour phase is therefore of great interest. One key thermodynamic quantity in nucleation is the excess free energy of cluster formation relative to that of the saturated vapour. In our current study, the excess free energy is extracted for clusters of pure water modelled with the TIP4P/2005 intermolecular potential using a method based on nonequilibrium molecular dynamics and the Jarzynski relation. The change in free energy associated with the "mitosis" or division of a cluster of N water molecules into two N/2 sub-clusters is evaluated. This methodology is an extension of the disassembly procedure used recently to calculate the excess free energy of argon clusters [H. Y. Tang and I. J. Ford, Phys. Rev. E 91, 023308 (2015)]. Our findings are compared to the corresponding excess free energies obtained from classical nucleation theory (CNT) as well as internally consistent classical theory (ICCT). The values of the excess free energy that we obtain with the mitosis method are consistent with CNT for large cluster sizes but for the smallest clusters, the results tend towards ICCT; for intermediate sized clusters, we obtain values between the ICCT and CNT predictions. Furthermore, the curvature-dependent surface tension which can be obtained by regarding the clusters as spherical droplets of bulk density is found to be a monotonically increasing function of cluster size for the studied range. The data are compared to other values reported in the literature, agreeing qualitatively with some but disagreeing with the values determined by Joswiak et al. [J. Phys. Chem. Lett. 4, 4267 (2013)] using a biased mitosis approach; an assessment of the differences is the main motivation for our current study.
Water droplet excess free energy determined by cluster mitosis using guided molecular dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lau, Gabriel V.; Müller, Erich A.; Jackson, George
Atmospheric aerosols play a vital role in affecting climate by influencing the properties and lifetimes of clouds and precipitation. Understanding the underlying microscopic mechanisms involved in the nucleation of aerosol droplets from the vapour phase is therefore of great interest. One key thermodynamic quantity in nucleation is the excess free energy of cluster formation relative to that of the saturated vapour. In our current study, the excess free energy is extracted for clusters of pure water modelled with the TIP4P/2005 intermolecular potential using a method based on nonequilibrium molecular dynamics and the Jarzynski relation. The change in free energy associated with the “mitosis” or division of a cluster of N water molecules into two N/2 sub-clusters is evaluated. This methodology is an extension of the disassembly procedure used recently to calculate the excess free energy of argon clusters [H. Y. Tang and I. J. Ford, Phys. Rev. E 91, 023308 (2015)]. Our findings are compared to the corresponding excess free energies obtained from classical nucleation theory (CNT) as well as internally consistent classical theory (ICCT). The values of the excess free energy that we obtain with the mitosis method are consistent with CNT for large cluster sizes but for the smallest clusters, the results tend towards ICCT; for intermediate sized clusters, we obtain values between the ICCT and CNT predictions. Furthermore, the curvature-dependent surface tension which can be obtained by regarding the clusters as spherical droplets of bulk density is found to be a monotonically increasing function of cluster size for the studied range. The data are compared to other values reported in the literature, agreeing qualitatively with some but disagreeing with the values determined by Joswiak et al. [J. Phys. Chem. Lett. 4, 4267 (2013)] using a biased mitosis approach; an assessment of the differences is the main motivation for our current study.
Latent Class Analysis of Incomplete Data via an Entropy-Based Criterion
Larose, Chantal; Harel, Ofer; Kordas, Katarzyna; Dey, Dipak K.
2016-01-01
Latent class analysis is used to group categorical data into classes via a probability model. Model selection criteria then judge how well the model fits the data. When addressing incomplete data, the current methodology restricts the imputation to a single, pre-specified number of classes. We seek to develop an entropy-based model selection criterion that does not restrict the imputation to one number of clusters. Simulations show the new criterion performing well against the current standards of AIC and BIC, while a family studies application demonstrates how the criterion provides more detailed and useful results than AIC and BIC. PMID:27695391
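The entropy idea in the abstract above can be illustrated with a generic mixture model. The sketch below is not the authors' criterion: a Gaussian mixture from scikit-learn is used as a stand-in for a latent class model of categorical data, and each choice of class count is scored with the standard ICL-like quantity, BIC plus twice the entropy of the posterior memberships (lower is better).

```python
# Hedged sketch of an entropy-penalized model selection criterion, assuming a
# Gaussian mixture as a stand-in for the paper's latent class model.
import numpy as np
from sklearn.mixture import GaussianMixture

def entropy_criterion(X, k, seed=0):
    gm = GaussianMixture(n_components=k, random_state=seed).fit(X)
    tau = gm.predict_proba(X)                        # posterior memberships
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-12, None)))
    return gm.bic(X) + 2.0 * entropy                 # ICL-like; lower is better

X = np.random.default_rng(0).normal(size=(500, 3))   # placeholder data
scores = {k: entropy_criterion(X, k) for k in range(1, 6)}
print(min(scores, key=scores.get), scores)
```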
The Meaning of Work among Chinese University Students: Findings from Prototype Research Methodology
ERIC Educational Resources Information Center
Zhou, Sili; Leung, S. Alvin; Li, Xu
2012-01-01
This study examined Chinese university students' conceptualization of the meaning of work. One hundred and ninety students (93 male, 97 female) from Beijing, China, participated in the study. Prototype research methodology (J. Li, 2001) was used to explore the meaning of work and the associations among the identified meanings. Cluster analysis was…
The Gaia-ESO Survey: open clusters in Gaia-DR1. A way forward to stellar age calibration
NASA Astrophysics Data System (ADS)
Randich, S.; Tognelli, E.; Jackson, R.; Jeffries, R. D.; Degl'Innocenti, S.; Pancino, E.; Re Fiorentin, P.; Spagna, A.; Sacco, G.; Bragaglia, A.; Magrini, L.; Prada Moroni, P. G.; Alfaro, E.; Franciosini, E.; Morbidelli, L.; Roccatagliata, V.; Bouy, H.; Bravi, L.; Jiménez-Esteban, F. M.; Jordi, C.; Zari, E.; Tautvaišiene, G.; Drazdauskas, A.; Mikolaitis, S.; Gilmore, G.; Feltzing, S.; Vallenari, A.; Bensby, T.; Koposov, S.; Korn, A.; Lanzafame, A.; Smiljanic, R.; Bayo, A.; Carraro, G.; Costado, M. T.; Heiter, U.; Hourihane, A.; Jofré, P.; Lewis, J.; Monaco, L.; Prisinzano, L.; Sbordone, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.
2018-05-01
Context. Determination and calibration of the ages of stars, which heavily rely on stellar evolutionary models, are very challenging, while representing a crucial aspect in many astrophysical areas. Aims: We describe the methodologies that, taking advantage of Gaia-DR1 and the Gaia-ESO Survey data, enable the comparison of observed open star cluster sequences with stellar evolutionary models. The final, long-term goal is the exploitation of open clusters as age calibrators. Methods: We perform a homogeneous analysis of eight open clusters using the Gaia-DR1 TGAS catalogue for bright members and information from the Gaia-ESO Survey for fainter stars. Cluster membership probabilities for the Gaia-ESO Survey targets are derived based on several spectroscopic tracers. The Gaia-ESO Survey also provides the cluster chemical composition. We obtain cluster parallaxes using two methods. The first one relies on the astrometric selection of a sample of bona fide members, while the other one fits the parallax distribution of a larger sample of TGAS sources. Ages and reddening values are recovered through a Bayesian analysis using the 2MASS magnitudes and three sets of standard models. Lithium depletion boundary (LDB) ages are also determined using literature observations and the same models employed for the Bayesian analysis. Results: For all but one cluster, parallaxes derived by us agree with those presented in Gaia Collaboration (2017, A&A, 601, A19), while a discrepancy is found for NGC 2516; we provide evidence supporting our own determination. Inferred cluster ages are robust against models and are generally consistent with literature values. Conclusions: The systematic parallax errors inherent in the Gaia DR1 data presently limit the precision of our results. Nevertheless, we have been able to place these eight clusters onto the same age scale for the first time, with good agreement between isochronal and LDB ages where there is overlap. Our approach appears promising and demonstrates the potential of combining Gaia and ground-based spectroscopic datasets. Based on observations collected with the FLAMES instrument at VLT/UT2 telescope (Paranal Observatory, ESO, Chile), for the Gaia-ESO Large Public Spectroscopic Survey (188.B-3002, 193.B-0936).Additional tables are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/612/A99
The Africa Yoga Project and Well-Being: A Concept Map of Students' Perceptions.
Giambrone, Carla A; Cook-Cottone, Catherine P; Klein, Jessalyn E
2018-03-01
Concept mapping methodology was used to explore the perceived impact of practicing yoga with the Africa Yoga Project (AYP), an organisation created to increase health and well-being by providing community-based yoga classes throughout Kenya. The fit of the AYP's mission with theoretical models of well-being is discussed. Anecdotal evidence and initial qualitative research suggested the AYP meaningfully impacted adult students. Of the hundreds of AYP's adult students, 56 and 82 students participated in Phases I and II, respectively. Phase I brainstorming resulted in 94 student-generated statements about their perceived change. Phase II participants sorted and rated statements in terms of importance. Multidimensional scaling and hierarchical cluster analysis of the sort data were utilised to map and group statements into clusters. Based on statistical and interpretive criteria, a five-cluster solution with the following concepts was identified as the best model of students' change: Personal Growth; Interpersonal Effectiveness (lowest importance); Physical and Social Benefits; Emotional Resiliency; and Improved Self-Concept (highest importance). Overall, students reported positive perceptions of the AYP. Additional research is needed to quantify students' change, and to compare the AYP outcomes to those of other programs aimed at poverty-related stress reduction and well-being.
Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa
2017-01-01
The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in a position to analyze raw sequence data automatically and directly. This analysis is carried out with little, or even a complete absence of, prior information/domain knowledge.
Complex basis functions for molecular resonances: Methodology and applications
NASA Astrophysics Data System (ADS)
White, Alec; McCurdy, C. William; Head-Gordon, Martin
The computation of positions and widths of metastable electronic states is a challenge for molecular electronic structure theory because, in addition to the difficulty of the many-body problem, such states obey scattering boundary conditions. These resonances cannot be addressed with naïve application of traditional bound state electronic structure theory. Non-Hermitian electronic structure methods employing complex basis functions are one way that we may rigorously treat resonances within the framework of traditional electronic structure theory. In this talk, I will discuss our recent work in this area including the methodological extension from single determinant SCF-based approaches to highly correlated levels of wavefunction-based theory such as equation of motion coupled cluster and many-body perturbation theory. These approaches provide a hierarchy of theoretical methods for the computation of positions and widths of molecular resonances. Within this framework, we may also examine properties of resonances including the dependence of these parameters on molecular geometry. Some applications of these methods to temporary anions and dianions will also be discussed.
Applying a Resources Framework to Analysis of the Force and Motion Conceptual Evaluation
ERIC Educational Resources Information Center
Smith, Trevor I.; Wittman, Michael C.
2008-01-01
We suggest one redefinition of common clusters of questions used to analyze student responses on the Force and Motion Conceptual Evaluation. Our goal is to propose a methodology that moves beyond an analysis of student learning defined by correct responses, either on the overall test or on clusters of questions defined solely by content. We use…
Spatial cluster detection for repeatedly measured outcomes while accounting for residential history.
Cook, Andrea J; Gold, Diane R; Li, Yi
2009-10-01
Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. There are virtually no spatial cluster detection methods proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants' relocation, which most cluster detection statistics cannot. Application of these methods is illustrated using the Home Allergens and Asthma prospective cohort study, analyzing the relationship between environmental exposures and a repeatedly measured outcome, occurrence of wheeze in the last 6 months, while taking into account mobile locations.
Unsupervised User Similarity Mining in GSM Sensor Networks
Shad, Shafqat Ali; Chen, Enhong
2013-01-01
Mobility data has attracted researchers for the past few years because of its rich context and spatiotemporal nature; this information can be used for potential applications like early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All the mentioned applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant place extraction, prediction of users' actual movements, and context awareness. However, significant place extraction and prediction of users' actual movements for mobility profile building are non-trivial tasks. In this paper, we present a user similarity mining methodology built on user mobility profiles, using the semantic tagging information provided by users and basic GSM network architecture properties in an unsupervised clustering approach. As the mobility information is in low-level raw form, our proposed methodology successfully converts it into high-level, meaningful information by using cell-Id location information rather than previously used location-capturing methods such as GPS, Infrared, and Wifi for profile mining and user similarity mining. PMID:23576905
Choo-Wosoba, Hyoyoung; Levy, Steven M; Datta, Somnath
2016-06-01
Community water fluoridation is an important public health measure to prevent dental caries, but it continues to be somewhat controversial. The Iowa Fluoride Study (IFS) is a longitudinal study on a cohort of Iowa children that began in 1991. The main purposes of this study (http://www.dentistry.uiowa.edu/preventive-fluoride-study) were to quantify fluoride exposures from both dietary and nondietary sources and to associate longitudinal fluoride exposures with dental fluorosis (spots on teeth) and dental caries (cavities). We analyze a subset of the IFS data by a marginal regression model with a zero-inflated version of the Conway-Maxwell-Poisson (ZICMP) distribution for count data exhibiting excessive zeros and a wide range of dispersion patterns. We introduce two estimation methods for fitting a ZICMP marginal regression model. Finite sample behaviors of the estimators and the resulting confidence intervals are studied using extensive simulation studies. We apply our methodologies to the dental caries data. Our novel modeling incorporating zero inflation, clustering, and overdispersion sheds some new light on the effect of community water fluoridation and other factors. We also include a second application of our methodology to a genomic (next-generation sequencing) dataset that exhibits underdispersion.
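To make the zero-inflation mechanics concrete, here is a minimal maximum-likelihood fit of a zero-inflated Poisson regression, a simpler relative of the ZICMP model above: the CMP version adds a dispersion parameter, and the paper's marginal model additionally accounts for clustering, neither of which this sketch attempts. The data are synthetic.

```python
# Minimal zero-inflated Poisson (ZIP) log-likelihood, as a hedged stand-in
# for the zero-inflated Conway-Maxwell-Poisson model described above.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def zip_negloglik(params, y, X):
    k = X.shape[1]
    beta, gamma = params[:k], params[k:]
    lam = np.exp(X @ beta)                      # Poisson mean, log link
    pi = 1.0 / (1.0 + np.exp(-(X @ gamma)))    # zero-inflation prob., logit link
    logp_pos = np.log1p(-pi) - lam + y * np.log(lam) - gammaln(y + 1)
    logp_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    return -np.sum(np.where(y == 0, logp_zero, logp_pos))

rng = np.random.default_rng(0)                  # synthetic data, for illustration
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 1])) * (rng.random(400) > 0.3)
fit = minimize(zip_negloglik, np.zeros(4), args=(y, X), method="BFGS")
print(fit.x)                                    # [beta0, beta1, gamma0, gamma1]
```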
Clustering approaches to feature change detection
NASA Astrophysics Data System (ADS)
G-Michael, Tesfaye; Gunzburger, Max; Peterson, Janet
2018-05-01
The automated detection of changes occurring between multi-temporal images is of significant importance in a wide range of medical, environmental, safety, as well as many other settings. The usage of k-means clustering is explored as a means for detecting objects added to a scene. The silhouette score for the clustering is used to define the optimal number of clusters that should be used. For simple images having a limited number of colors, new objects can be detected by examining the change between the optimal number of clusters for the original and modified images. For more complex images, new objects may need to be identified by examining the relative areas covered by corresponding clusters in the original and modified images. Which method is preferable depends on the composition and range of colors present in the images. In addition to describing the clustering and change detection methodology of our proposed approach, we provide some simple illustrations of its application.
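A minimal version of this change-detection scheme is easy to sketch with scikit-learn. In the code below the image file names are hypothetical, pixels are treated as plain RGB colour vectors, and the silhouette score is computed on a subsample for speed; a change in the silhouette-optimal number of clusters between the two images flags a possible new object, as the abstract describes for simple images.

```python
# Sketch of k-means change detection with silhouette-based choice of k.
# File names are hypothetical; images are assumed to be plain 3-channel RGB.
import numpy as np
import imageio.v3 as iio
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def optimal_k(pixels, k_range=range(2, 8), sample=5000, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pixels), size=min(sample, len(pixels)), replace=False)
    sub = pixels[idx]                           # subsample for tractability
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(sub)
        scores[k] = silhouette_score(sub, labels)
    return max(scores, key=scores.get)

before = iio.imread("scene_before.png").reshape(-1, 3).astype(float)
after = iio.imread("scene_after.png").reshape(-1, 3).astype(float)
k0, k1 = optimal_k(before), optimal_k(after)
if k1 != k0:
    print(f"possible new object: optimal cluster count changed {k0} -> {k1}")
```

For complex images, per-cluster pixel areas of corresponding clusters would be compared instead, as the abstract notes.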
The methodology of multi-viewpoint clustering analysis
NASA Technical Reports Server (NTRS)
Mehrotra, Mala; Wild, Chris
1993-01-01
One of the greatest challenges facing the software engineering community is the ability to produce large and complex computer systems, such as ground support systems for unmanned scientific missions, that are reliable and cost effective. In order to build and maintain these systems, it is important that the knowledge in the system be suitably abstracted, structured, and otherwise clustered in a manner which facilitates its understanding, manipulation, testing, and utilization. Development of complex mission-critical systems will require the ability to abstract overall concepts in the system at various levels of detail and to consider the system from different points of view. The Multi-ViewPoint Clustering Analysis (MVP-CA) methodology has been developed to provide multiple views of large, complicated systems. MVP-CA provides an ability to discover significant structures by providing an automated mechanism to structure both hierarchically (from detail to abstract) and orthogonally (from different perspectives). We propose to integrate MVP-CA into an overall software engineering life cycle to support the development and evolution of complex mission-critical systems.
Needham, Robert; Stebbins, Julie; Chockalingam, Nachiappan
2016-01-01
To review the current scientific literature on the assessment of three-dimensional movement of the lumbar spine, with a focus on the utilisation of a 3D cluster. The electronic databases PubMed, OVID, CINAHL, The Cochrane Library, ScienceDirect, ProQuest and Web of Knowledge were searched between 1966 and March 2015. The reference lists of the articles that met the inclusion criteria were also searched. From the 1530 articles identified through an initial search, 16 articles met the inclusion criteria. All information relating to methodology and kinematic modelling of the lumbar segment, along with the outcome measures, was extracted from the studies identified for synthesis. Guidelines detailing 3D cluster construction were limited in the identified articles, and the lack of information presented makes it difficult to assess the external validity of this technique. Scarce information was presented detailing time-series angle data of the lumbar spine during gait. Further development of the 3D cluster technique is required, and it is essential that authors provide clear instructions, definitions and standards in their manuscripts to improve clarity and reproducibility.
On the effect of model parameters on forecast objects
NASA Astrophysics Data System (ADS)
Marzban, Caren; Jones, Corinne; Li, Ning; Sandgathe, Scott
2018-04-01
Many physics-based numerical models produce a gridded, spatial field of forecasts, e.g., a temperature map. The field for some quantities generally consists of spatially coherent and disconnected objects. Such objects arise in many problems, including precipitation forecasts in atmospheric models, eddy currents in ocean models, and models of forest fires. Certain features of these objects (e.g., location, size, intensity, and shape) are generally of interest. Here, a methodology is developed for assessing the impact of model parameters on the features of forecast objects. The main ingredients of the methodology include the use of (1) Latin hypercube sampling for varying the values of the model parameters, (2) statistical clustering algorithms for identifying objects, (3) multivariate multiple regression for assessing the impact of multiple model parameters on the distribution (across the forecast domain) of object features, and (4) methods for reducing the number of hypothesis tests and controlling the resulting errors. The final output of the methodology is a series of box plots and confidence intervals that visually display the sensitivities. The methodology is demonstrated on precipitation forecasts from a mesoscale numerical weather prediction model.
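Two of the four ingredients, Latin hypercube sampling and the regression of object features on parameters, can be sketched compactly. Everything below is illustrative: the parameter count, bounds, and the stubbed model run are invented, and a single feature with ordinary least squares stands in for the multivariate multiple regression.

```python
# Hedged sketch: Latin hypercube sampling of model parameters plus a simple
# least-squares sensitivity regression; the model run is a stub.
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
sampler = qmc.LatinHypercube(d=3, seed=0)       # three tunable model parameters
unit = sampler.random(n=40)                     # 40 runs in the unit cube
params = qmc.scale(unit, [0.1, 1.0, 5.0], [0.9, 4.0, 50.0])  # invented bounds

def run_model_and_measure(p):
    """Stub for running the forecast model and measuring one object feature
    (e.g. mean object size); a noisy linear response is used for illustration."""
    return 2.0 * p[0] - 0.5 * p[1] + 0.0 * p[2] + rng.normal(0.0, 0.1)

feature = np.array([run_model_and_measure(p) for p in params])
design = np.column_stack([np.ones(len(params)), params])
coef, *_ = np.linalg.lstsq(design, feature, rcond=None)
print(coef[1:])   # estimated sensitivity of the feature to each parameter
```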
Parente, Joana; Pereira, Mário G; Tonini, Marj
2016-07-15
The present study focuses on the dependence of the space-time permutation scan statistics (STPSS) (1) on the input database's characteristics and (2) on the use of this methodology to assess changes on the fire regime due to different type of climate and fire management activities. Based on the very strong relationship between weather and the fire incidence in Portugal, the detected clusters will be interpreted in terms of the atmospheric conditions. Apart from being the country most affected by the fires in the European context, Portugal meets all the conditions required to carry out this study, namely: (i) two long and comprehensive official datasets, i.e. the Portuguese Rural Fire Database (PRFD) and the National Mapping Burnt Areas (NMBA), respectively based on ground and satellite measurements; (ii) the two types of climate (Csb in the north and Csa in the south) that characterizes the Mediterranean basin regions most affected by the fires also divide the mainland Portuguese area; and, (iii) the national plan for the defence of forest against fires was approved a decade ago and it is now reasonable to assess its impacts. Results confirmed (1) the influence of the dataset's characteristics on the detected clusters, (2) the existence of two different fire regimes in the country promoted by the different types of climate, (3) the positive impacts of the fire prevention policy decisions and (4) the ability of the STPSS to correctly identify clusters, regarding their number, location, and space-time size in spite of eventual space and/or time splits of the datasets. Finally, the role of the weather on days when clustered fires were active was confirmed for the classes of small, medium and large fires. Copyright © 2016 Elsevier B.V. All rights reserved.
Inhibition of Protein Aggregation: Supramolecular Assemblies of Arginine Hold the Key
Das, Utpal; Hariprasad, Gururao; Ethayathulla, Abdul S.; Manral, Pallavi; Das, Taposh K.; Pasha, Santosh; Mann, Anita; Ganguli, Munia; Verma, Amit K.; Bhat, Rajiv; Chandrayan, Sanjeev Kumar; Ahmed, Shubbir; Sharma, Sujata; Kaur, Punit; Singh, Tej P.; Srinivasan, Alagiri
2007-01-01
Background Aggregation of unfolded proteins occurs mainly through the exposed hydrophobic surfaces. Any mechanism of inhibition of this aggregation should explain the prevention of these hydrophobic interactions. Though arginine is prevalently used as an aggregation suppressor, its mechanism of action is not clearly understood. We propose a mechanism based on the hydrophobic interactions of arginine. Methodology We have analyzed arginine solution for its hydrotropic effect by pyrene solubility and the presence of hydrophobic environment by 1-anilino-8-naphthalene sulfonic acid fluorescence. Mass spectroscopic analyses show that arginine forms molecular clusters in the gas phase and the cluster composition is dependent on the solution conditions. Light scattering studies indicate that arginine exists as clusters in solution. In the presence of arginine, the reverse phase chromatographic elution profile of Alzheimer's amyloid beta 1-42 (Aβ1-42) peptide is modified. Changes in the hydrodynamic volume of Aβ1-42 in the presence of arginine measured by size exclusion chromatography show that arginine binds to Aβ1-42. Arginine increases the solubility of Aβ1-42 peptide in aqueous medium. It decreases the aggregation of Aβ1-42 as observed by atomic force microscopy. Conclusions Based on our experimental results we propose that molecular clusters of arginine in aqueous solutions display a hydrophobic surface by the alignment of its three methylene groups. The hydrophobic surfaces present on the proteins interact with the hydrophobic surface presented by the arginine clusters. The masking of hydrophobic surface inhibits protein-protein aggregation. This mechanism is also responsible for the hydrotropic effect of arginine on various compounds. It is also explained why other amino acids fail to inhibit the protein aggregation. PMID:18000547
Ajili, Yosra; Hammami, Kamel; Jaidane, Nejm Eddine; Lanza, Mathieu; Kalugina, Yulia N; Lique, François; Hochlaf, Majdi
2013-07-07
We closely compare the accuracy of multidimensional potential energy surfaces (PESs) generated by the recently developed explicitly correlated coupled cluster (CCSD(T)-F12) methods in connection with the cc-pVXZ-F12 (X = D, T) and aug-cc-pVTZ basis sets and those deduced using the well-established orbital-based coupled cluster techniques employing correlation consistent atomic basis sets (aug-cc-pVXZ, X = T, Q, 5) and extrapolated to the complete basis set (CBS) limit. This work is performed on the benchmark rare gas-hydrogen halide interaction (HCl-He) system. These PESs are then incorporated into quantum close-coupling scattering dynamical calculations in order to check the impact of the accuracy of the PES on the scattering calculations. For this system, we deduced inelastic collisional data including (de-)excitation collisional and pressure broadening cross sections. Our work shows that the CCSD(T)-F12/aug-cc-pVTZ PES describes correctly the repulsive wall, the van der Waals minimum and long range internuclear distances whereas cc-pVXZ-F12 (X = D,T) basis sets are not diffuse enough for that purposes. Interestingly, the collision cross sections deduced from the CCSD(T)-F12/aug-cc-pVTZ PES are in excellent agreement with those obtained with CCSD(T)/CBS methodology. The position of the resonances and the general shape of these cross sections almost coincide. Since the cost of the electronic structure computations is reduced by several orders of magnitude when using CCSD(T)-F12/aug-cc-pVTZ compared to CCSD(T)/CBS methodology, this approach can be recommended as an alternative for generation of PESs of molecular clusters and for the interpretation of accurate scattering experiments as well as for a wide production of collisional data to be included in astrophysical and atmospherical models.
A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis
NASA Astrophysics Data System (ADS)
Huang, W.; Li, S.; Xu, S.
2016-06-01
How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity, before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimensionality of the accessible human activity data has been extended from two or three (space, or space-time) to four (space, time and semantics). More specifically, not only the locations and times at which people stay are collected, but also what people "say" at a location and time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data posted in Toronto, Canada, is used to test the clustering-based method. The results show that approximately 55% of the spatiotemporal clusters, distributed across different locations, can eventually be grouped into the same cluster types when the semantic aspect is taken into consideration.
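A compact sketch of the space, then time, then semantics idea follows. It is not the authors' algorithm: the DBSCAN parameters, input arrays and merge threshold are invented, and agglomerative clustering of TF-IDF vectors stands in for whatever semantic grouping the paper uses.

```python
# Hedged three-step sketch: spatial DBSCAN, temporal sub-clustering, then
# semantic merging of the space-time clusters. All thresholds are invented.
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

def three_step_clusters(lat, lon, hours, texts):
    # Step 1: spatial clusters from coordinates (eps in degrees, invented).
    space = DBSCAN(eps=0.005, min_samples=20).fit_predict(np.c_[lat, lon])
    # Step 2: split each spatial cluster along the time-of-day axis.
    st = np.full(len(lat), -1)
    nxt = 0
    for s in set(space) - {-1}:
        idx = np.where(space == s)[0]
        t = DBSCAN(eps=1.0, min_samples=10).fit_predict(hours[idx].reshape(-1, 1))
        for tt in set(t) - {-1}:
            st[idx[t == tt]] = nxt
            nxt += 1
    # Step 3: merge space-time clusters whose pooled text is similar
    # (TF-IDF + average-linkage cosine clustering; needs scikit-learn >= 1.2).
    docs = [" ".join(t for t, c in zip(texts, st) if c == k) for k in range(nxt)]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()
    semantic = AgglomerativeClustering(n_clusters=None, distance_threshold=1.2,
                                       metric="cosine", linkage="average"
                                       ).fit_predict(tfidf)
    return st, semantic   # point -> space-time cluster -> semantic type
```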
Differentiation of aflatoxigenic and non-aflatoxigenic strains of Aspergilli by FT-IR spectroscopy.
Atkinson, Curtis; Pechanova, Olga; Sparks, Darrell L; Brown, Ashli; Rodriguez, Jose M
2014-01-01
Fourier transform infrared spectroscopy (FT-IR) is a well-established and widely accepted methodology to identify and differentiate diverse microbial species. In this study, FT-IR was used to differentiate 20 strains of the ubiquitous and agronomically important phytopathogens Aspergillus flavus and Aspergillus parasiticus. By analyzing their spectral profiles via principal component and cluster analysis, differentiation was achieved between the aflatoxin-producing and nonproducing strains of both fungal species. This study thus indicates that FT-IR coupled to multivariate statistics can rapidly differentiate strains of Aspergilli based on their toxigenicity.
Wilhelm, Jan; Walz, Michael; Stendel, Melanie; Bagrets, Alexei; Evers, Ferdinand
2013-05-14
We present a modification of the standard electron transport methodology based on the (non-equilibrium) Green's function formalism to efficiently simulate STM-images. The novel feature of this method is that it employs an effective embedding technique that allows us to extrapolate properties of metal substrates with adsorbed molecules from quantum-chemical cluster calculations. To illustrate the potential of this approach, we present an application to STM-images of C58-dimers immobilized on Au(111)-surfaces that is motivated by recent experiments.
Techniques to derive geometries for image-based Eulerian computations
Dillard, Seth; Buchholz, James; Vigmostad, Sarah; Kim, Hyunggun; Udaykumar, H.S.
2014-01-01
Purpose The performance of three frequently used level set-based segmentation methods is examined for the purpose of defining features and boundary conditions for image-based Eulerian fluid and solid mechanics models. The focus of the evaluation is to identify an approach that produces the best geometric representation from a computational fluid/solid modeling point of view. In particular, extraction of geometries from a wide variety of imaging modalities and noise intensities, to supply to an immersed boundary approach, is targeted. Design/methodology/approach Two- and three-dimensional images, acquired from optical, X-ray CT, and ultrasound imaging modalities, are segmented with active contours, k-means, and adaptive clustering methods. Segmentation contours are converted to level sets and smoothed as necessary for use in fluid/solid simulations. Results produced by the three approaches are compared visually and with contrast ratio, signal-to-noise ratio, and contrast-to-noise ratio measures. Findings While the active contours method possesses built-in smoothing and regularization and produces continuous contours, the clustering methods (k-means and adaptive clustering) produce discrete (pixelated) contours that require smoothing using speckle-reducing anisotropic diffusion (SRAD). Thus, for images with high contrast and low to moderate noise, active contours are generally preferable. However, adaptive clustering is found to be far superior to the other two methods for images possessing high levels of noise and global intensity variations, due to its more sophisticated use of local pixel/voxel intensity statistics. Originality/value It is often difficult to know a priori which segmentation will perform best for a given image type, particularly when geometric modeling is the ultimate goal. This work offers insight to the algorithm selection process, as well as outlining a practical framework for generating useful geometric surfaces in an Eulerian setting. PMID:25750470
Transformation to equivalent dimensions—a new methodology to study earthquake clustering
NASA Astrophysics Data System (ADS)
Lasocki, Stanislaw
2014-05-01
A seismic event is represented by a point in a parameter space, quantified by the vector of parameter values. Studies of earthquake clustering involve considering distances between such points in multidimensional spaces. However, the metrics of earthquake parameters are different, hence the metric in a multidimensional parameter space cannot be readily defined. The present paper proposes a solution of this metric problem based on a concept of probabilistic equivalence of earthquake parameters. Under this concept the lengths of parameter intervals are equivalent if the probability for earthquakes to take values from either interval is the same. Earthquake clustering is studied in an equivalent rather than the original dimensions space, where the equivalent dimension (ED) of a parameter is its cumulative distribution function. All transformed parameters are of linear scale in [0, 1] interval and the distance between earthquakes represented by vectors in any ED space is Euclidean. The unknown, in general, cumulative distributions of earthquake parameters are estimated from earthquake catalogues by means of the model-free non-parametric kernel estimation method. Potential of the transformation to EDs is illustrated by two examples of use: to find hierarchically closest neighbours in time-space and to assess temporal variations of earthquake clustering in a specific 4-D phase space.
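The transformation itself is only a few lines of code. The sketch below uses an empirical (rank-based) CDF in place of the paper's kernel estimate, and a synthetic four-parameter catalogue; distances between events are then plain Euclidean distances in the unit hypercube.

```python
# Hedged sketch of the transformation to equivalent dimensions: each catalogue
# parameter is mapped through an estimated CDF, so all coordinates lie in
# [0, 1]. The empirical CDF below stands in for the paper's kernel estimate.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata

def to_equivalent_dimensions(catalog):
    """Map each column through its empirical CDF, F(x_i) ~ rank / (n + 1)."""
    n = catalog.shape[0]
    return np.column_stack([rankdata(col) / (n + 1.0) for col in catalog.T])

# Synthetic catalogue: occurrence time, epicentre x, epicentre y, magnitude.
rng = np.random.default_rng(0)
cat = np.column_stack([np.sort(rng.uniform(0.0, 1e7, 300)),
                       rng.normal(0.0, 5.0, 300),
                       rng.normal(0.0, 5.0, 300),
                       rng.exponential(0.8, 300) + 1.0])
ed = to_equivalent_dimensions(cat)
D = squareform(pdist(ed))            # Euclidean distances in ED space
nearest = D.argsort(axis=1)[:, 1]    # closest neighbour of every event
print(nearest[:10])
```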
Kurczynska, Monika
2018-01-01
Mirror protein structures are often considered artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction based on residue-residue contact maps need a methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130,500 structural protein models obtained from contact maps of 1,305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow appropriate clusters to be obtained: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying the two most differentiating ETs for each class allowed satisfactory results to be obtained. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: the probability of amino acids assuming certain values of the dihedral angles Φ and Ψ, Ramachandran preferences, and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models. PMID:29787567
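The clustering step itself reduces to k-means with k = 2 on a model-by-term matrix. In the sketch below the PyRosetta scoring is replaced by a random placeholder (two shifted Gaussian blobs standing in for native and mirror models), and the rule for naming the native cluster is an assumption, so only the separation logic is illustrated.

```python
# Hedged sketch of separating native from mirror models by k-means on a few
# energy terms; the score table is a synthetic placeholder, not PyRosetta.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder score table: rows = models, columns = the three selected terms
# (phi/psi probability, Ramachandran preference, Coulomb interaction).
terms = np.vstack([rng.normal(0.0, 1.0, (50, 3)),    # stand-in "native" models
                   rng.normal(2.0, 1.0, (50, 3))])   # stand-in "mirror" models
Z = StandardScaler().fit_transform(terms)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
# Assumed rule: the cluster with the lower mean energy is called native.
native = int(np.argmin([terms[labels == k].mean() for k in (0, 1)]))
print("native cluster:", native)
```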
Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects
Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer
2017-01-01
An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called “cluster randomization”). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies. PMID:28584874
Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects.
Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer
2017-01-01
An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called "cluster randomization"). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.
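The sample-size inflation that cluster randomization entails is conventionally quantified by the design effect DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and ICC the intraclass correlation; this standard formula comes from the general cluster-randomization literature rather than from the article above. A small helper:

```python
# Design-effect calculation for a cluster-randomized two-arm study.
import math

def clusters_per_arm(n_individual: float, m: int, icc: float) -> int:
    """Clusters needed per arm, inflating an individually-randomized
    per-arm sample size n_individual by the design effect."""
    deff = 1.0 + (m - 1) * icc
    return math.ceil(n_individual * deff / m)

# Example: 120 students per arm under individual randomization, classes of
# m = 25, ICC = 0.05 -> DEFF = 2.2, so 264 students, i.e. 11 classes per arm.
print(clusters_per_arm(120, 25, 0.05))
```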
Aldrete-Tapia, J A; Miranda-Castilleja, D E; Arvizu-Medrano, S M; Hernández-Iturriaga, M
2018-02-01
The high concentration of fructose in agave juice has been associated with reduced ethanol tolerance of commercial yeasts used for tequila production and low fermentation yields. The selection of autochthonous strains, which are better adapted to agave juice, could improve the process. In this study, a 2-step selection process of yeasts isolated from spontaneous fermentations for tequila production was carried out based on analysis of the growth dynamics under combined conditions of high fructose and ethanol. First, yeast isolates (605) were screened to identify strains tolerant to high fructose (20%) and to ethanol (10%), yielding 89 isolates able to grow in both conditions. For the 89 isolates, growth curves under 8 treatments of combined fructose (from 20% to 5%) and ethanol (from 0% to 10%) were obtained, and the kinetic parameters were analyzed with principal component analysis and k-means clustering. The resulting yeast strain groups corresponded to fast, medium and slow growers. A second clustering of only the fast growers led to the selection of 3 Saccharomyces strains (199, 230, 231) that were able to grow rapidly in 4 out of the 8 conditions evaluated. This methodology differentiated strains phenotypically and could be further used for strain selection in other processes. The approach provides a method to select yeast strains for fermentation that takes into account the natural differences among yeast isolates; it is based on exposing cells to combinations of sugar and ethanol, the most important stress factors in fermentation. This strategy will help to identify the most tolerant strains, which could improve ethanol yield and reduce fermentation time.
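The PCA plus k-means screening in this abstract maps naturally onto a few lines of scikit-learn. The matrix below is a random placeholder for the kinetic parameters (rows = the 89 tolerant isolates, columns = parameters such as growth rate and lag time under each of the 8 conditions), and the rule for naming the "fast" cluster is an assumption.

```python
# Hedged sketch of the two-round selection: PCA on kinetic parameters, k-means
# into fast/medium/slow growers, then re-clustering of the fast group.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder kinetic-parameter matrix: 89 isolates x 16 parameters.
kinetics = np.random.default_rng(0).normal(size=(89, 16))
Z = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(kinetics))

# Round 1: three growth-behaviour groups.
round1 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
# Assumed rule for naming the fast cluster: highest mean first component,
# taking PC1 as a proxy for growth rate.
fast = int(np.argmax([Z[round1 == k, 0].mean() for k in range(3)]))

# Round 2: re-cluster only the fast growers to shortlist strains.
round2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z[round1 == fast])
print(np.bincount(round2))
```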
ERIC Educational Resources Information Center
Revelle, Andy; Messner, Kevin; Shrimplin, Aaron; Hurst, Susan
2012-01-01
Q-methodology was used to identify clusters of opinions about e-books at Miami University. The research identified four distinct opinion types among those investigated: Book Lovers, Technophiles, Pragmatists, and Printers. The initial Q-methodology study results were then used as a basis for a large-n survey of undergraduates, graduate students,…
ERIC Educational Resources Information Center
White, Marilyn Domas; Abels, Eileen G.; Gordon-Murnane, Laura
1998-01-01
Reports on methodological developments in a project to assess the adoption of the Web by publishers of business information for electronic commerce. Describes the approach used on a sample of 20 business publishers to identify five clusters of publishers ranging from traditionalist to innovator. Distinguishes between adopters and nonadopters of…
Cluster randomised trials in the medical literature: two bibliometric surveys
Bland, J Martin
2004-01-01
Background Several reviews of published cluster randomised trials have reported that about half did not take clustering into account in the analysis, which was thus incorrect and potentially misleading. In this paper I ask whether cluster randomised trials are increasing in both number and quality of reporting. Methods Computer search for papers on cluster randomised trials since 1980; hand search of trial reports published in selected volumes of the British Medical Journal over 20 years. Results There has been a large increase in the numbers of methodological papers and of trial reports using the term 'cluster random' in recent years, with about equal numbers of each type of paper. The British Medical Journal contained more such reports than any other journal, and in this journal there was a corresponding increase over time in the number of trials where subjects were randomised in clusters. In 2003 all reports showed awareness of the need to allow for clustering in the analysis. In 1993 and before, clustering was ignored in most such trials. Conclusion Cluster trials are becoming more frequent and reporting is of higher quality. Perhaps statistician pressure works. PMID:15310402
[Applying the clustering technique for characterising maintenance outsourcing].
Cruz, Antonio M; Usaquén-Perilla, Sandra P; Vanegas-Pabón, Nidia N; Lopera, Carolina
2010-06-01
This study used clustering techniques to characterise companies providing health institutions with maintenance services. The study analysed the equipment inventory of seven pilot areas (264 medical devices). Clustering techniques were applied using 26 variables; response time (RT), operation duration (OD), availability, and turnaround time (TAT) were amongst the most significant ones. The average biomedical equipment obsolescence value was 0.78. Four service provider clusters were identified: clusters 1 and 3 had better performance, with lower TAT, RT and DR values (56% of the providers, coded O, L, C, B, I, S, H, F and G, had 1 to 4 day TAT values:
Acosta, Joie D.; Chinman, Matthew; Ebener, Patricia; Phillips, Andrea; Xenakis, Lea; Malone, Patrick S.
2017-01-01
Restorative Practices in schools lack rigorous evaluation studies. As an example of rigorous school-based research, this paper describes the first randomized control trial of restorative practices to date, the Study of Restorative Practices. It is a 5-year, cluster-randomized controlled trial (RCT) of the Restorative Practices Intervention (RPI) in 14 middle schools in Maine to assess whether RPI impacts both positive developmental outcomes and problem behaviors and whether the effects persist during the transition from middle to high school. The two-year RPI intervention began in the 2014–2015 school year. The study’s rationale and theoretical concerns are discussed along with methodological concerns including teacher professional development. The theoretical rationale and description of the methods from this study may be useful to others conducting rigorous research and evaluation in this area. PMID:28936104
Prediction, Detection, and Validation of Isotope Clusters in Mass Spectrometry Data
Treutler, Hendrik; Neumann, Steffen
2016-01-01
Mass spectrometry is a key analytical platform for metabolomics. The precise quantification and identification of small molecules is a prerequisite for elucidating metabolism, and the detection, validation, and evaluation of isotope clusters in LC-MS data are important for this task. Here, we present an approach for the improved detection of isotope clusters using chemical prior knowledge, and for the validation of detected isotope clusters depending on the substance mass using database statistics. We find remarkable improvements in the number of detected isotope clusters and are able to predict the correct molecular formula within the top three ranks in 92% of cases. We make our methodology freely available as part of the Bioconductor packages xcms version 1.50.0 and CAMERA version 1.30.0. PMID:27775610
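The central spacing test behind isotope-cluster detection can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration (singly charged ions and a made-up tolerance are assumed), not the xcms/CAMERA implementation cited above, which is written in R and considerably more sophisticated:

```python
NEUTRON = 1.00336  # approximate mass difference between isotopologues (Da)
TOL = 0.005        # hypothetical m/z tolerance (Da)

def isotope_clusters(mzs):
    """Chain sorted m/z peaks spaced by ~1 neutron mass into clusters (charge 1)."""
    mzs = sorted(mzs)
    clusters, current = [], [mzs[0]]
    for mz in mzs[1:]:
        if abs(mz - current[-1] - NEUTRON) <= TOL:
            current.append(mz)        # continues the isotope pattern
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [mz]            # start a new candidate cluster
    if len(current) > 1:
        clusters.append(current)
    return clusters

print(isotope_clusters([180.063, 181.067, 182.070, 250.200]))
```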
The ellipticity of galaxy cluster haloes from satellite galaxies and weak lensing
NASA Astrophysics Data System (ADS)
Shin, Tae-hyeon; Clampitt, Joseph; Jain, Bhuvnesh; Bernstein, Gary; Neil, Andrew; Rozo, Eduardo; Rykoff, Eli
2018-04-01
We study the ellipticity of galaxy cluster haloes as characterized by the distribution of cluster galaxies and as measured with weak lensing. We use Monte Carlo simulations of elliptical cluster density profiles to estimate and correct for Poisson noise bias, edge bias and projection effects. We apply our methodology to 10 428 Sloan Digital Sky Survey clusters identified by the redMaPPer algorithm with richness above 20. We find a mean ellipticity of 0.271 ± 0.002 (stat) ± 0.031 (sys), corresponding to an axis ratio of 0.573 ± 0.002 (stat) ± 0.039 (sys). We compare this ellipticity of the satellites to the halo shape, through a stacked lensing measurement using optimal estimators of the lensing quadrupole based on Clampitt and Jain (2016). We find a best-fitting axis ratio of 0.56 ± 0.09 (stat) ± 0.03 (sys), consistent with the ellipticity of the satellite distribution. Thus, cluster galaxies trace the shape of the dark matter halo to within our estimated uncertainties. Finally, we restack the satellite and lensing ellipticity measurements along the major axis of the cluster central galaxy's light distribution. From the lensing measurements, we infer a misalignment angle with a root-mean-square of 30° ± 10° when stacking on the central galaxy. We discuss applications of halo shape measurements to test the effects of the baryonic gas and active galactic nucleus feedback, as well as dark matter and gravity. The major improvements in signal-to-noise ratio expected with the ongoing Dark Energy Survey and future surveys from Large Synoptic Survey Telescope, Euclid, and Wide Field Infrared Survey Telescope will make halo shapes a useful probe of these effects.
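One ingredient of such a measurement, estimating the ellipticity of the satellite distribution from projected positions via second moments, can be sketched as follows; the noise-bias and edge corrections central to the paper are omitted, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic satellites drawn from an elliptical Gaussian with axis ratio 0.6,
# positions assumed already centred on the cluster.
x = rng.normal(scale=1.0, size=5000)
y = rng.normal(scale=0.6, size=5000)

qxx, qyy, qxy = np.mean(x * x), np.mean(y * y), np.mean(x * y)
e1 = (qxx - qyy) / (qxx + qyy)       # ellipticity components from second moments
e2 = 2 * qxy / (qxx + qyy)
e = np.hypot(e1, e2)
axis_ratio = np.sqrt((1 - e) / (1 + e))
print(f"ellipticity ~ {e:.3f}, axis ratio ~ {axis_ratio:.3f}")  # ~0.47, ~0.60
```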
Algorithm for cellular reprogramming.
Ronquist, Scott; Patterson, Geoff; Muir, Lindsey A; Lindsly, Stephen; Chen, Haiming; Brown, Markus; Wicha, Max S; Bloch, Anthony; Brockett, Roger; Rajapakse, Indika
2017-11-07
The day we understand the time evolution of subcellular events at a level of detail comparable to physical systems governed by Newton's laws of motion seems far away. Even so, quantitative approaches to cellular dynamics add to our understanding of cell biology. With data-guided frameworks we can develop better predictions about, and methods for, control over specific biological processes and system-wide cell behavior. Here we describe an approach for optimizing the use of transcription factors (TFs) in cellular reprogramming, based on a device commonly used in optimal control. We construct an approximate model for the natural evolution of a cell-cycle-synchronized population of human fibroblasts, based on data obtained by sampling the expression of 22,083 genes at several time points during the cell cycle. To arrive at a model of moderate complexity, we cluster gene expression based on division of the genome into topologically associating domains (TADs) and then model the dynamics of TAD expression levels. Based on this dynamical model and additional data, such as known TF binding sites and activity, we develop a methodology for identifying the top TF candidates for a specific cellular reprogramming task. Our data-guided methodology identifies a number of TFs previously validated for reprogramming and/or natural differentiation and predicts some potentially useful combinations of TFs. Our findings highlight the immense potential of dynamical models, mathematics, and data-guided methodologies for improving strategies for control over biological processes. Copyright © 2017 the Author(s). Published by PNAS.
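A conceptual sketch of the moderate-complexity model the abstract describes, fitting linear dynamics to TAD-level expression (synthetic data and invented dimensions; the authors' model and control-theoretic machinery are far richer):

```python
import numpy as np

rng = np.random.default_rng(2)
n_tads, n_times = 20, 8
X = rng.normal(size=(n_tads, n_times))   # TAD-averaged expression over time

# Fit x_{t+1} = A x_t by (minimum-norm) least squares across time steps:
# solve X[:, :-1].T @ A.T ~= X[:, 1:].T for A.
A_T, *_ = np.linalg.lstsq(X[:, :-1].T, X[:, 1:].T, rcond=None)
A = A_T.T
print(A.shape)                           # (20, 20) transition matrix
```

With such a matrix in hand, candidate transcription factors can be scored by how efficiently their target-gene perturbations steer the state toward a desired expression profile, which is the optimal-control flavour of the paper.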
Cluster categorization of urban roads to optimize their noise monitoring.
Zambon, G; Benocci, R; Brambilla, G
2016-01-01
Road traffic in urban areas is closely tied to urban mobility and public health, and it is often the main source of noise pollution. Noise maps have lately been considered a powerful tool to estimate population exposure to environmental noise, but they need to be validated against measured noise data. The project Dynamic Acoustic Mapping (DYNAMAP), co-funded in the framework of the LIFE 2013 program, aims to develop a statistically based method to optimize the choice and number of monitoring sites and to automate noise map updates using data retrieved from a low-cost monitoring network. The first objective should improve on spatial sampling based on the legislative road classification, which reflects mainly the geometrical characteristics of a road rather than its noise emission. The present paper describes the statistical approach of the methodology under development and the results of its preliminary application to a limited sample of roads in the city of Milan. The resulting categorization of roads, based on clustering the 24 hourly LAeq,h values, looks promising for optimizing the spatial sampling of noise monitoring, describing the noise pollution of complex urban road networks more efficiently than the legislative road classification does.
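The road categorization step can be illustrated with a toy version of the clustering: each road contributes a 24-dimensional vector of hourly LAeq,h values, and k-means groups roads by the shape and level of their daily noise profiles (synthetic levels and a hypothetical road count below):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
hours = np.arange(24)
# Synthetic LAeq,h profiles in dB(A): a day/night cycle plus per-road offsets.
base = 60 + 8 * np.sin((hours - 5) / 24 * 2 * np.pi)
profiles = (base
            + rng.normal(scale=2.0, size=(100, 24))
            + rng.normal(scale=4.0, size=(100, 1)))

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
print(np.bincount(labels))               # roads per noise category
```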
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970s, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960s, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model, and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960s and 1970s respectively, formed the basis of multivariate cluster analysis methodology for many years. However, shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data depart seriously from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm that can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic-algorithm-based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We also showcase how these algorithms can be used to process multivariate data from astronomical observations.
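The seed-robustness idea behind a genetic K-means can be conveyed with a toy, mutation-only evolutionary loop over candidate centroid sets (my construction for illustration; the dissertation's actual algorithm, with crossover and hyperellipsoidal extensions, is more elaborate):

```python
import numpy as np

rng = np.random.default_rng(4)
# Three well-separated 2-D blobs.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ((0, 0), (3, 0), (0, 3))])

def sse(centroids):
    """Within-cluster sum of squares: the fitness to minimize."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

pop = [X[rng.choice(len(X), size=3, replace=False)] for _ in range(20)]
for _ in range(40):
    pop.sort(key=sse)                    # rank candidate centroid sets
    elite = pop[:10]                     # keep the best half
    pop = elite + [p + rng.normal(scale=0.1, size=p.shape) for p in elite]

best = min(pop, key=sse)
print(np.round(best[np.argsort(best[:, 0])], 2))   # near the true blob centres
```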
Marques, J M C; Llanio-Trujillo, J L; Albertí, M; Aguilar, A; Pirani, F
2013-08-22
We employ a recently developed methodology to study structural and energetic properties of the first solvation shells of the potassium ion in nonpolar environments formed by aromatic rings, which is important for understanding the selectivity of several biochemical phenomena. Our evolutionary algorithm is used in the global optimization study of clusters formed of K(+) solvated with hexafluorobenzene (HFBz) molecules. The global intermolecular interaction for these clusters has been decomposed into HFBz-HFBz and K(+)-HFBz contributions, using a potential model based on different decompositions of the molecular polarizability of hexafluorobenzene. Putative global minimum structures of microsolvation clusters containing up to 21 hexafluorobenzene molecules were obtained and compared with the analogous K(+)-benzene clusters reported in our previous work (J. Phys. Chem. A 2012, 116, 4947-4956). We have found that both K(+)-(Bz)n and K(+)-(HFBz)n clusters show a strong magic number around the closure of the first solvation shell. Nonetheless, all K(+)-benzene clusters have essentially the same first-solvation-shell geometry, with four solvent molecules around the ion, whereas the corresponding shell for K(+)-(HFBz)n is completed with nine HFBz species, and its structural motif varies as n increases. This is attributed to the ion-solvent interaction, which has a larger magnitude for K(+)-Bz than for K(+)-HFBz. In addition, the ability to accommodate more HFBz than Bz molecules around K(+) in the first solvation shell is intimately related to the inversion in the sign of the quadrupole moment of the two solvent species, which leads to a distinct ion-solvent geometry of approach.
Tumour-associated and non-tumour-associated microbiota in colorectal cancer
Flemer, Burkhardt; Lynch, Denise B; Brown, Jillian M R; Jeffery, Ian B; Ryan, Feargal J; Claesson, Marcus J; O'Riordain, Micheal; Shanahan, Fergus; O'Toole, Paul W
2017-01-01
Objective A signature that unifies the colorectal cancer (CRC) microbiota across multiple studies has not been identified. In addition to methodological variance, heterogeneity may be caused by both microbial and host response differences, which was addressed in this study. Design We prospectively studied the colonic microbiota and the expression of specific host response genes using faecal and mucosal samples (‘ON’ and ‘OFF’ the tumour, proximal and distal) from 59 patients undergoing surgery for CRC, 21 individuals with polyps and 56 healthy controls. Microbiota composition was determined by 16S rRNA amplicon sequencing; expression of host genes involved in CRC progression and immune response was quantified by real-time quantitative PCR. Results The microbiota of patients with CRC differed from that of controls, but alterations were not restricted to the cancerous tissue. Differences between distal and proximal cancers were detected and faecal microbiota only partially reflected mucosal microbiota in CRC. Patients with CRC can be stratified based on higher level structures of mucosal-associated bacterial co-abundance groups (CAGs) that resemble the previously formulated concept of enterotypes. Of these, Bacteroidetes Cluster 1 and Firmicutes Cluster 1 were in decreased abundance in CRC mucosa, whereas Bacteroidetes Cluster 2, Firmicutes Cluster 2, Pathogen Cluster and Prevotella Cluster showed increased abundance in CRC mucosa. CRC-associated CAGs were differentially correlated with the expression of host immunoinflammatory response genes. Conclusions CRC-associated microbiota profiles differ from those in healthy subjects and are linked with distinct mucosal gene-expression profiles. Compositional alterations in the microbiota are not restricted to cancerous tissue and differ between distal and proximal cancers. PMID:26992426
Hybrid analysis for indicating patients with breast cancer using temperature time series.
Silva, Lincoln F; Santos, Alair Augusto S M D; Bravo, Renato S; Silva, Aristófanes C; Muchaluat-Saade, Débora C; Conci, Aura
2016-07-01
Breast cancer is the most common cancer among women worldwide. Diagnosis and treatment in early stages increase the chances of cure. The temperature of cancerous tissue is generally higher than that of the healthy surrounding tissue, making thermography an option to be considered in screening strategies for this cancer type. This paper proposes a hybrid methodology for analyzing dynamic infrared thermography in order to identify patients at risk of breast cancer, using both unsupervised and supervised machine learning techniques, which is what makes the methodology hybrid. Dynamic infrared thermography quantitatively monitors temperature changes on the examined surface after a thermal stress. During a dynamic infrared thermography examination, a sequence of breast thermograms is generated. In the proposed methodology, this sequence is processed and analyzed by several techniques. First, the breast region is segmented and the thermograms of the sequence are registered. Then, temperature time series are built and the k-means algorithm is applied to these series for various values of k. The clustering formed by the k-means algorithm for each value of k is evaluated using clustering validation indices, and the resulting values are treated as features in the classification model construction step. A data mining tool was used to solve the combined algorithm selection and hyperparameter optimization (CASH) problem in the classification tasks. Besides the classification algorithm recommended by the data mining tool, classifiers based on Bayesian networks, neural networks, decision rules, and decision trees were executed on the evaluation data set. Test results support that the proposed analysis methodology is able to indicate patients with breast cancer. Among 39 tested classification algorithms, K-Star and Bayes Net presented 100% classification accuracy, and the Bayes Net, multi-layer perceptron, decision table, and random forest classification algorithms together achieved an average accuracy of 95.38%. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
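The feature-construction step the abstract describes, turning cluster-validity indices at several values of k into classifier inputs, can be sketched as follows (synthetic series and invented sizes; the paper uses registered breast-thermogram pixels):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
series = rng.normal(size=(200, 20))      # 200 temperature time series, 20 samples

features = []
for k in range(2, 6):                    # cluster at several values of k
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(series)
    features.append(silhouette_score(series, labels))  # one validity value per k
print(features)                          # feature vector for one examination
```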
An approach to accidents modeling based on compounds road environments.
Fernandes, Ana; Neves, Jose
2013-04-01
The most common approach to studying the influence of particular road features on accidents has been to consider uniform road segments characterized by a single feature. However, when an accident is related to the road infrastructure, its cause is usually not a single characteristic but rather a complex combination of several characteristics. The main objective of this paper is to describe a methodology developed to treat the road as a complete environment by using compound road environments, overcoming the limitations inherent in considering only uniform road segments. The methodology consists of dividing a sample of roads into segments; grouping them into fairly homogeneous road environments using cluster analysis; and identifying the influence of skid resistance and texture depth on road accidents in each environment using generalized linear models. The application of this methodology is demonstrated for eight roads. Based on real data on accidents and road characteristics, three compound road environments were established in which the pavement surface properties significantly influence the occurrence of accidents. Results showed clearly that road environments where braking maneuvers are common, or those with small radii of curvature and high speeds, require higher skid resistance and texture depth as an important contribution to accident prevention. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Adamo, A.; Ryon, J. E.; Messa, M.; Kim, H.; Grasha, K.; Cook, D. O.; Calzetti, D.; Lee, J. C.; Whitmore, B. C.; Elmegreen, B. G.; Ubeda, L.; Smith, L. J.; Bright, S. N.; Runnholm, A.; Andrews, J. E.; Fumagalli, M.; Gouliermis, D. A.; Kahre, L.; Nair, P.; Thilker, D.; Walterbos, R.; Wofford, A.; Aloisi, A.; Ashworth, G.; Brown, T. M.; Chandar, R.; Christian, C.; Cignoni, M.; Clayton, G. C.; Dale, D. A.; de Mink, S. E.; Dobbs, C.; Elmegreen, D. M.; Evans, A. S.; Gallagher, J. S., III; Grebel, E. K.; Herrero, A.; Hunter, D. A.; Johnson, K. E.; Kennicutt, R. C.; Krumholz, M. R.; Lennon, D.; Levay, K.; Martin, C.; Nota, A.; Östlin, G.; Pellerin, A.; Prieto, J.; Regan, M. W.; Sabbi, E.; Sacchi, E.; Schaerer, D.; Schiminovich, D.; Shabani, F.; Tosi, M.; Van Dyk, S. D.; Zackrisson, E.
2017-06-01
We report on the large effort that is producing comprehensive high-level young star cluster (YSC) catalogs for a significant fraction of galaxies observed with the Legacy ExtraGalactic UV Survey (LEGUS) Hubble treasury program. We present the methodology developed to extract cluster positions, verify their genuine nature, produce multiband photometry (from NUV to NIR), and derive their physical properties via spectral energy distribution fitting analyses. We use the nearby spiral galaxy NGC 628 as a test case for demonstrating the impact that LEGUS will have on our understanding of the formation and evolution of YSCs and compact stellar associations within their host galaxy. Our analysis of the cluster luminosity function from the UV to the NIR finds a steepening at the bright end at all wavelengths, suggesting a dearth of luminous clusters. The cluster mass function of NGC 628 is consistent with a power-law distribution of slope ~ -2 and a truncation at a few times 10^5 M_⊙. After their formation, YSCs and compact associations follow different evolutionary paths. YSCs survive for a longer time frame, confirming that they are potentially bound systems. Associations disappear on timescales comparable to those of hierarchically organized star-forming regions, suggesting that they are expanding systems. We find mass-independent cluster disruption in the inner region of NGC 628, while in the outer part of the galaxy there is little or no disruption. We observe faster disruption rates for low-mass (≤ 10^4 M_⊙) clusters, suggesting that a mass-dependent component is necessary to fully describe the YSC disruption process in NGC 628. Based on observations obtained with the NASA/ESA Hubble Space Telescope, at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555.
Crack, Jason C.; Thomson, Andrew J.
2017-01-01
The iron-sulfur cluster containing protein Fumarate and Nitrate Reduction (FNR) is the master regulator for the switch between anaerobic and aerobic respiration in Escherichia coli and many other bacteria. The [4Fe-4S] cluster functions as the sensory module, undergoing reaction with O2 that leads to conversion to a [2Fe-2S] form with loss of high-affinity DNA binding. Here, we report studies of the FNR cluster conversion reaction using time-resolved electrospray ionization mass spectrometry. The data provide insight into the reaction, permitting the detection of cluster conversion intermediates and products, including a [3Fe-3S] cluster and persulfide-coordinated [2Fe-2S] clusters [[2Fe-2S](S)n, where n = 1 or 2]. Analysis of kinetic data revealed a branched mechanism in which cluster sulfide oxidation occurs in parallel with cluster conversion and not as a subsequent, secondary reaction to generate [2Fe-2S](S)n species. This methodology shows great potential for broad application to studies of protein cofactor–small molecule interactions. PMID:28373574
NASA Astrophysics Data System (ADS)
Harper, Bryan; Thomas, Dennis; Chikkagoudar, Satish; Baker, Nathan; Tang, Kaizhi; Heredia-Langner, Alejandro; Lins, Roberto; Harper, Stacey
2015-06-01
The integration of rapid assays, large datasets, informatics, and modeling can overcome current barriers in understanding nanomaterial structure-toxicity relationships by providing a weight-of-the-evidence mechanism to generate hazard rankings for nanomaterials. Here, we present the use of a rapid, low-cost assay to perform screening-level toxicity evaluations of nanomaterials in vivo. Calculated EZ Metric scores, a combined measure of morbidity and mortality in developing embryonic zebrafish, were established at realistic exposure levels and used to develop a hazard ranking of diverse nanomaterial toxicity. Hazard ranking and clustering analysis of 68 diverse nanomaterials revealed distinct patterns of toxicity related to both the core composition and outermost surface chemistry of nanomaterials. The resulting clusters guided the development of a surface chemistry-based model of gold nanoparticle toxicity. Our findings suggest that risk assessments based on the size and core composition of nanomaterials alone may be wholly inappropriate, especially when considering complex engineered nanomaterials. Research should continue to focus on methodologies for determining nanomaterial hazard based on multiple sub-lethal responses following realistic, low-dose exposures, thus increasing the availability of quantitative measures of nanomaterial hazard to support the development of nanoparticle structure-activity relationships.
Ximenes, Ricardo Arraes de Alencar; Pereira, Leila Maria Beltrão; Martelli, Celina Maria Turchi; Merchán-Hamann, Edgar; Stein, Airton Tetelbom; Figueiredo, Gerusa Maria; Braga, Maria Cynthia; Montarroyos, Ulisses Ramos; Brasil, Leila Melo; Turchi, Marília Dalva; Fonseca, José Carlos Ferraz da; Lima, Maria Luiza Carvalho de; Alencar, Luis Cláudio Arraes de; Costa, Marcelo; Coral, Gabriela; Moreira, Regina Celia; Cardoso, Maria Regina Alves
2010-09-01
A population-based survey providing information on the prevalence of viral hepatitis infection and the pattern of risk factors was carried out in the urban population of all Brazilian state capitals and the Federal District between 2005 and 2009. This paper describes the design and methodology of the study, which involved a population aged 5 to 19 for hepatitis A and 10 to 69 for hepatitis B and C. Interviews and blood samples were obtained through household visits. The sample was selected using stratified multi-stage cluster sampling and was drawn with equal probability from each study domain (region and age group). Nationwide, 19,280 households and ~31,000 residents were selected. The study is large enough to detect a viral infection prevalence of around 0.1% and to support risk factor assessment within each region. The methodology appears to be a viable way of differentiating between the distinct epidemiological patterns of hepatitis A, B, and C. These data will be of value for the evaluation of vaccination policies and for the design of control program strategies.
NASA Technical Reports Server (NTRS)
Souza, V. M.; Vieira, L. E. A.; Medeiros, C.; Da Silva, L. A.; Alves, L. R.; Koga, D.; Sibeck, D. G.; Walsh, B. M.; Kanekal, S. G.; Jauer, P. R.;
2016-01-01
Analysis of particle pitch angle distributions (PADs) has been used as a means to understand a multitude of different physical mechanisms that lead to flux variations in the Van Allen belts and to particle precipitation into the upper atmosphere. In this work we developed a neural network-based data clustering methodology that automatically identifies distinct PAD types in an unsupervised way using particle flux data. One can promptly identify and locate three well-known PAD types in both time and radial distance, namely, 90°-peaked, butterfly, and flattop distributions. To illustrate the applicability of our methodology, we used relativistic electron flux data from the whole month of November 2014, acquired from the Relativistic Electron-Proton Telescope instrument on board the Van Allen Probes, but we emphasize that our approach can also be used with multiplatform spacecraft data. Our PAD classification results are in reasonably good agreement with those obtained by standard statistical fitting algorithms. The proposed methodology has potential use for monitoring the Van Allen belts.
NASA Astrophysics Data System (ADS)
Fezzani, Ridha; Berger, Laurent
2018-06-01
An automated signal-based method was developed to analyse seafloor backscatter data logged by a calibrated multibeam echosounder. The processing consists first of clustering each survey sub-area into a small number of homogeneous sediment types, based on the average backscatter level at one or several incidence angles. Second, it uses the local average angular response of each cluster to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France, and the Celtic Sea, Ireland. It was applied, for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for demersal and benthic fauna study and monitoring. Qualitative analyses and the clusters classified using the extracted parameters show good discriminatory potential, indicating the robustness of this approach.
A Network Analysis of Countries’ Export Flows: Firm Grounds for the Building Blocks of the Economy
Caldarelli, Guido; Cristelli, Matthieu; Gabrielli, Andrea; Pietronero, Luciano; Scala, Antonio; Tacchella, Andrea
2012-01-01
In this paper we analyze the bipartite network of countries and products from UN data on country production. We define the country-country and product-product projected networks and introduce a novel method of filtering information based on elements' similarity. As a result we find that country clustering reveals unexpected socio-geographic links among the most competitive countries. On the same footing, the product clustering can be efficiently used for a bottom-up classification of produced goods. Furthermore, we mathematically reformulate the "reflections method" introduced by Hidalgo and Hausmann as a fixpoint problem; this formulation highlights some conceptual weaknesses of the approach. To overcome this issue, we introduce an alternative methodology (based on biased Markov chains) that allows countries to be ranked in a conceptually consistent way. Our analysis uncovers a strong non-linear interaction between the diversification of a country and the ubiquity of its products, suggesting the possible need to move towards more efficient and direct non-linear fixpoint algorithms to rank countries and products in the global market. PMID:23094044
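The "reflections method" that the paper recasts as a fixpoint problem alternates averages of diversification and ubiquity over the bipartite matrix; a toy version (invented 3x4 country-product matrix, not the UN data) is:

```python
import numpy as np

M = np.array([[1, 1, 1, 0],      # toy binary country x product matrix
              [1, 1, 0, 0],
              [0, 1, 0, 1]], dtype=float)

kc = M.sum(axis=1)               # k_{c,0}: diversification of each country
kp = M.sum(axis=0)               # k_{p,0}: ubiquity of each product
for _ in range(20):              # reflections k_{c,N}, k_{p,N}
    kc, kp = M @ kp / M.sum(axis=1), M.T @ kc / M.sum(axis=0)
print(np.round(kc, 3), np.round(kp, 3))
```

Iterating long enough tends to compress the differences between countries, which is the kind of conceptual weakness of the fixpoint that motivates the paper's biased-Markov-chain alternative.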
Discovery of gigantic molecular nanostructures using a flow reaction array as a search engine
Zang, Hong-Ying; de la Oliva, Andreu Ruiz; Miras, Haralampos N.; Long, De-Liang; McBurney, Roy T.; Cronin, Leroy
2014-01-01
The discovery of gigantic molecular nanostructures like coordination and polyoxometalate clusters is extremely time-consuming since a vast combinatorial space needs to be searched, and even a systematic and exhaustive exploration of the available synthetic parameters relies on a great deal of serendipity. Here we present a synthetic methodology that combines a flow reaction array and algorithmic control to give a chemical ‘real-space’ search engine leading to the discovery and isolation of a range of new molecular nanoclusters based on [Mo2O2S2]2+-based building blocks with either fourfold (C4) or fivefold (C5) symmetry templates and linkers. This engine leads us to isolate six new nanoscale cluster compounds: 1, {Mo10(C5)}; 2, {Mo14(C4)4(C5)2}; 3, {Mo60(C4)10}; 4, {Mo48(C4)6}; 5, {Mo34(C4)4}; 6, {Mo18(C4)9}; in only 200 automated experiments from a parameter space spanning ~5 million possible combinations. PMID:24770632
Area variations in multiple morbidity using a life table methodology.
Congdon, Peter
Analysis of healthy life expectancy is typically based on a binary distinction between health and ill-health. By contrast, this paper considers spatial modelling of disease-free life expectancy taking account of the number of chronic conditions. The analysis is thus based on population sub-groups with no disease, those with one disease only, and those with two or more diseases (multiple morbidity). Data on health status are accordingly modelled using a multinomial likelihood. The analysis uses data for 258 small areas in north London and shows wide differences in the disease burden related to multiple morbidity. Strong associations between area socioeconomic deprivation and multiple morbidity are demonstrated, as well as strong spatial clustering.
Kubas, Adam; Noak, Johannes; Trunschke, Annette; Schlögl, Robert; Neese, Frank; Maganas, Dimitrios
2017-09-01
Absorption and multiwavelength resonance Raman spectroscopy are widely used to investigate the electronic structure of transition metal centers in coordination compounds and extended solid systems. In combination with computational methodologies that have predictive accuracy, they define powerful protocols to study the spectroscopic response of catalytic materials. In this work, we study the absorption and resonance Raman spectra of the M1 MoVOx catalyst. The spectra were calculated by time-dependent density functional theory (TD-DFT) in conjunction with the independent mode displaced harmonic oscillator model (IMDHO), which allows for detailed bandshape predictions. For this purpose cluster models with up to 9 Mo and V metallic centers are considered to represent the bulk structure of MoVOx. Capping hydrogens were used to achieve valence saturation at the edges of the cluster models. The construction of model structures was based on a thorough bonding analysis which involved conventional DFT and local coupled cluster (DLPNO-CCSD(T)) methods. Furthermore, the relationship of cluster topology to the computed spectral features is discussed in detail. It is shown that, due to the local nature of the involved electronic transitions, band assignment protocols developed for molecular systems can be applied to describe the calculated spectral features of the cluster models as well. The present study serves as a reference for future applications of combined experimental and computational protocols in the field of solid-state heterogeneous catalysis.
NASA Astrophysics Data System (ADS)
Vulcani, Benedetta
We present the first study of the spatial distribution of star formation in z ~ 0.5 cluster galaxies. The analysis is based on data taken with the Wide Field Camera 3 as part of the Grism Lens-Amplified Survey from Space (GLASS). We illustrate the methodology by focusing on two clusters (MACS0717.5+3745 and MACS1423.8+2404) with different morphologies (one relaxed and one merging) and use foreground and background galaxies as a field control sample. The cluster+field sample consists of 42 galaxies with stellar masses in the range 10^8-10^11 M_⊙ and star formation rates in the range 1-20 M_⊙ yr^-1. In both environments, Hα is more extended than the rest-frame UV continuum in 60% of the cases, consistent with diffuse star formation and inside-out growth. The Hα emission appears more extended in cluster galaxies than in the field, pointing perhaps to ionized gas being stripped and/or star formation being enhanced at large radii. The peak of the Hα emission and that of the continuum are offset by less than 1 kpc. We investigate trends with the hot gas density as traced by the X-ray emission, and with the surface mass density as inferred from gravitational lens models, and find no conclusive results. The diversity of morphologies and sizes observed in Hα illustrates the complexity of the environmental processes that regulate star formation.
Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis.
Turner, Elizabeth L; Prague, Melanie; Gallis, John A; Li, Fan; Murray, David M
2017-07-01
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.
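One routine way to account for clustering in the analysis, as reviewed above, is a generalized estimating equation (GEE) with an exchangeable working correlation; a hedged, simulated example (all sizes and effect values invented) using statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_clusters, m = 20, 15                       # 20 groups of 15 members (hypothetical)
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(n_clusters), m),
    "arm": np.repeat(rng.integers(0, 2, n_clusters), m),  # arm assigned per cluster
})
cluster_effect = np.repeat(rng.normal(scale=0.5, size=n_clusters), m)
df["y"] = 1.0 * df["arm"] + cluster_effect + rng.normal(size=len(df))

model = smf.gee("y ~ arm", groups="cluster", data=df,
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().params)                    # treatment effect, cluster-aware fit
```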
Review of Recent Methodological Developments in Group-Randomized Trials: Part 1-Design.
Turner, Elizabeth L; Li, Fan; Gallis, John A; Prague, Melanie; Murray, David M
2017-06-01
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.
Disease clusters, exact distributions of maxima, and P-values.
Grimson, R C
1993-10-01
This paper presents combinatorial (exact) methods that are useful in the analysis of disease cluster data obtained from small environments, such as buildings and neighbourhoods. Maxwell-Boltzmann and Fermi-Dirac occupancy models are compared in terms of appropriateness of representation of disease incidence patterns (space and/or time) in these environments. The methods are illustrated by a statistical analysis of the incidence pattern of bone fractures in a setting wherein fracture clustering was alleged to be occurring. One of the methodological results derived in this paper is the exact distribution of the maximum cell frequency in occupancy models.
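For toy sizes, the exact distribution of the maximum cell frequency under the Maxwell-Boltzmann occupancy model can be tabulated by brute-force enumeration (my illustration; the paper itself derives closed-form combinatorial results):

```python
from collections import Counter
from itertools import product

n, k = 6, 4                                    # 6 cases into 4 units (toy sizes)
dist = Counter()
for placement in product(range(k), repeat=n):  # all k**n equally likely outcomes
    dist[max(Counter(placement).values())] += 1

total = k ** n
for m in sorted(dist):
    tail = sum(c for mm, c in dist.items() if mm >= m) / total
    print(m, dist[m] / total, tail)            # P(max = m) and P-value P(max >= m)
```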
Corrosion and corrosion fatigue of airframe aluminum alloys
NASA Technical Reports Server (NTRS)
Chen, G. S.; Gao, M.; Harlow, D. G.; Wei, R. P.
1994-01-01
Localized corrosion and corrosion fatigue crack nucleation and growth are recognized as degradation mechanisms that affect the durability and integrity of commercial transport aircraft. Mechanistically based understanding is needed to aid the development of effective methodologies for assessing the durability and integrity of airframe components. As part of this methodology development, experiments on pitting corrosion, and on corrosion fatigue crack nucleation and early growth from these pits, were conducted. Pitting was found to be associated with constituent particles in the alloys, and pit growth often involved coalescence of individual particle-nucleated pits, both laterally and in depth. Fatigue cracks typically nucleated from one of the larger pits formed by a cluster of particles. The pit size at which a fatigue crack nucleates is a function of stress level and fatigue loading frequency. The experimental results are summarized, and their implications for service performance and life prediction are discussed.
Guastaferro, Kate; Miller, Katy; Shanley Chatham, Jenelle R.; Whitaker, Daniel J.; McGilly, Kate; Lutzker, John R.
2017-01-01
An effective approach in early intervention for children and families, including child maltreatment prevention, is home-based services. Though several evidence-based programs exist, they are often grouped together, despite having different foci. This paper describes an ongoing cluster randomized trial systematically braiding two evidence-based home-based models, SafeCare® and Parents as Teachers (PAT)®, to better meet the needs of families at-risk. We describe the methodology for braiding model implementation and curriculum, specifically focusing on how structured qualitative feedback from pilot families and providers was used to create the braided curriculum and implementation. Systematic braiding of two models at the implementation and curriculum levels is a mechanism that has the potential to meet the more comprehensive needs of families at-risk for maltreatment. PMID:27870760
Quantifying innovation in surgery.
Hughes-Hallett, Archie; Mayer, Erik K; Marcus, Hani J; Cundy, Thomas P; Pratt, Philip J; Parston, Greg; Vale, Justin A; Darzi, Ara W
2014-08-01
The objectives of this study were to assess the applicability of patents and publications as metrics of surgical technology and innovation; to evaluate the historical relationship between patents and publications; and to develop a methodology that can be used to determine the rate of innovation growth in any given health care technology. The study of health care innovation represents an emerging academic field, yet it is limited by a lack of valid scientific methods for quantitative analysis. This article explores and cross-validates 2 innovation metrics using surgical technology as an exemplar. Electronic patent databases and the MEDLINE database were searched between 1980 and 2010 for "surgeon" OR "surgical" OR "surgery." The resulting patent codes were grouped into technology clusters, and growth curves were plotted for these clusters to establish the rate and characteristics of growth. The initial search retrieved 52,046 patents and 1,801,075 publications. The top-performing technology cluster of the last 30 years was minimally invasive surgery; robotic surgery, surgical staplers, and image guidance were the most emergent technology clusters. When the growth curves for these clusters were examined, they were found to follow an S-shaped pattern of growth, with the emergent technologies lying on the exponential phases of their respective growth curves. In addition, publication and patent counts were closely correlated in areas of technology expansion. This article demonstrates the utility of publicly available patent and publication data to quantify innovations within surgical technology and proposes a novel methodology for assessing and forecasting areas of technological innovation.
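The S-shaped growth the authors observe is commonly modelled with a logistic curve; a sketch with simulated counts (the parameters and data are invented; only the fitting technique is standard):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic growth: plateau K, rate r, inflection year t0."""
    return K / (1 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(7)
years = np.arange(1980, 2011)
counts = logistic(years, 5000, 0.3, 2000) + rng.normal(scale=50, size=years.size)

(K, r, t0), _ = curve_fit(logistic, years, counts, p0=(counts.max(), 0.1, 1995))
print(round(K), round(r, 2), round(t0, 1))   # recovered plateau, rate, inflection
```

An emergent technology sits on the exponential portion of this curve (observation years well below t0), which is how a cluster such as robotic surgery would be flagged.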
Yoon, In-Kyu; Getis, Arthur; Aldstadt, Jared; Rothman, Alan L.; Tannitisupawong, Darunee; Koenraadt, Constantianus J. M.; Fansiri, Thanyalak; Jones, James W.; Morrison, Amy C.; Jarman, Richard G.; Nisalak, Ananda; Mammen, Mammen P.; Thammapalo, Suwich; Srikiatkhachorn, Anon; Green, Sharone; Libraty, Daniel H.; Gibbons, Robert V.; Endy, Timothy; Pimgate, Chusak; Scott, Thomas W.
2012-01-01
Background Based on spatiotemporal clustering of human dengue virus (DENV) infections, transmission is thought to occur at fine spatiotemporal scales by horizontal transfer of virus between humans and mosquito vectors. To define the dimensions of local transmission and quantify the factors that support it, we examined relationships between infected humans and Aedes aegypti in Thai villages. Methodology/Principal Findings Geographic cluster investigations of 100-meter radius were conducted around DENV-positive and DENV-negative febrile “index” cases (positive and negative clusters, respectively) from a longitudinal cohort study in rural Thailand. Child contacts and Ae. aegypti from cluster houses were assessed for DENV infection. Spatiotemporal, demographic, and entomological parameters were evaluated. In positive clusters, the DENV infection rate among child contacts was 35.3% in index houses, 29.9% in houses within 20 meters, and decreased with distance from the index house to 6.2% in houses 80–100 meters away (p<0.001). Significantly more Ae. aegypti were DENV-infectious (i.e., DENV-positive in head/thorax) in positive clusters (23/1755; 1.3%) than negative clusters (1/1548; 0.1%). In positive clusters, 8.2% of mosquitoes were DENV-infectious in index houses, 4.2% in other houses with DENV-infected children, and 0.4% in houses without infected children (p<0.001). The DENV infection rate in contacts was 47.4% in houses with infectious mosquitoes, 28.7% in other houses in the same cluster, and 10.8% in positive clusters without infectious mosquitoes (p<0.001). Ae. aegypti pupae and adult females were more numerous only in houses containing infectious mosquitoes. Conclusions/Significance Human and mosquito infections are positively associated at the level of individual houses and neighboring residences. Certain houses with high transmission risk contribute disproportionately to DENV spread to neighboring houses. Small groups of houses with elevated transmission risk are consistent with over-dispersion of transmission (i.e., at a given point in time, people/mosquitoes from a small portion of houses are responsible for the majority of transmission). PMID:22816001
Fornasaro, Stefano; Vicario, Annalisa; De Leo, Luigina; Bonifacio, Alois; Not, Tarcisio; Sergo, Valter
2018-05-14
Raman hyperspectral imaging is an emerging practice in biological and biomedical research for label-free analysis of tissues and cells. Using this method, both the spatial distribution and the spectral information of analyzed samples can be obtained. The current study reports the first Raman microspectroscopic characterisation of colon tissues from patients with Coeliac Disease (CD). The aim was to assess whether Raman imaging coupled with hyperspectral multivariate image analysis is capable of detecting the alterations in the biochemical composition of intestinal tissues associated with CD. The analytical approach was based on a multi-step methodology: duodenal biopsies from healthy and coeliac patients were measured and processed with Multivariate Curve Resolution Alternating Least Squares (MCR-ALS). From the distribution maps and the pure spectra of the image constituents obtained with MCR-ALS, interesting biochemical differences between healthy and coeliac patients have been derived. Notably, a reduced distribution of complex lipids in the pericryptic space, and a different distribution and abundance of proteins rich in beta-sheet structures, were found in CD patients. The output of the MCR-ALS analysis was then used as a starting point for two clustering algorithms (k-means clustering and hierarchical clustering). Both methods converged to similar results, providing precise segmentation over multiple Raman images of the studied tissues.
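The alternating-least-squares core of MCR-ALS can be sketched on synthetic data; this bare-bones loop enforces non-negativity by clipping only and omits the additional constraints a real analysis would use:

```python
import numpy as np

rng = np.random.default_rng(8)
C_true = rng.random((50, 2))         # pixel x component concentration maps
S_true = rng.random((100, 2))        # wavenumber x component pure spectra
D = C_true @ S_true.T                # hyperspectral data matrix, D ~= C S^T

S = rng.random((100, 2))             # random initial spectral estimates
for _ in range(200):                 # alternate the two least-squares problems
    C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)
    S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)
print(np.linalg.norm(D - C @ S.T))   # small residual after convergence
```

The recovered rows of C play the role of the distribution maps that the study then feeds into k-means and hierarchical clustering.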
Clustering by well-being in workplace social networks: Homophily and social contagion.
Chancellor, Joseph; Layous, Kristin; Margolis, Seth; Lyubomirsky, Sonja
2017-12-01
Social interaction among employees is crucial at both an organizational and individual level. Demonstrating the value of recent methodological advances, 2 studies conducted in 2 workplaces and 2 countries sought to answer the following questions: (a) Do coworkers interact more with coworkers who have similar well-being? and, if yes, (b) what are the processes by which such affiliation occurs? Affiliation was assessed via 2 methodologies: a commonly used self-report measure (i.e., mutual nominations by coworkers) complemented by a behavioral measure (i.e., sociometric badges that track physical proximity and social interaction). We found that individuals who share similar levels of well-being (e.g., positive affect, life satisfaction, need satisfaction, and job satisfaction) were more likely to socialize with one another. Furthermore, time-lagged analyses suggested that clustering in need satisfaction arises from mutual attraction (homophily), whereas clustering in job satisfaction and organizational prosocial behavior results from emotional contagion. These results suggest ways in which organizations can physically and socially improve their workplace. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Practical proof of CP element based design for 14nm node and beyond
NASA Astrophysics Data System (ADS)
Maruyama, Takashi; Takita, Hiroshi; Ikeno, Rimon; Osawa, Morimi; Kojima, Yoshinori; Sugatani, Shinji; Hoshino, Hiromi; Hino, Toshio; Ito, Masaru; Iizuka, Tetsuya; Komatsu, Satoshi; Ikeda, Makoto; Asada, Kunihiro
2013-03-01
To realize HVM (High Volume Manufacturing) with CP (Character Projection) based EBDW, shot-count reduction is the essential key. All device circuits should be composed of predefined character parts, and we call this methodology "CP element based design". In our previous work, we presented the following three concepts [2]. 1) Memory: We reported the prospects of affordability for the CP-stencil resource. 2) Logic cell: We adopted a multi-cell clustering approach in the physical synthesis. 3) Random interconnect: We proposed an ultra-regular layout scheme using fixed-size wiring tiles containing repeated tracks and cutting points at the tile edges. In this paper, we report experimental proofs of these methodologies. In full-chip layout, CP stencil resource management is the critical key. From the MCC-POC (Proof of Concept) result [1], we assumed a total available CP stencil resource of 9000 µm². All circuit macros must be laid out within this restriction. The assignment of CP-stencil resources to the memory macros is especially important, as they consume a considerable share of the resource owing to the various line-ups, such as 1RW and 2RW SRAMs, register files, and ROM, which require several varieties of large peripheral circuits. Furthermore, memory macros typically occupy more than 40% of the die area in leading-edge logic LSI products, so the impact of an increased shot count is serious. To save CP-stencil resources, we constructed an automatic CP analysis system with two extraction modes: simple division by block, and layout-repeatability recognition. By properly choosing between these modes based on the characteristics of each peripheral circuit, we could minimize the consumption of CP stencil resources. The estimation for the 14nm technology node was performed based on the analysis of a practical memory compiler. The resource required for memory macros proved affordable, at 60% of the full CP stencil resource, and the wafer-level converted shot count proved to meet 100 WPH throughput. In logic cell design, circuit performance was verified after cell clustering. Cell clustering based on physical distance proved to incur a large penalty, mainly in wiring length; to reduce this design penalty, we proposed CP cell clustering based on logical distance. For shot-count reduction in random interconnect areas, we proposed a more structured routing architecture consisting of track exchange and via-position arrangement. Putting these design approaches together, we can design CP stencils that hit the target throughput within the area constraint. Analysis of other macros, such as analog, I/O, and dummy macros, showed that no special CP design approach is needed beyond legacy pattern-matching CP extraction. All these experimental results give good prospects for the reality of full CP element based layout.
Smiga, Szymon; Fabiano, Eduardo
2017-11-15
We have developed a simplified coupled cluster (SCC) methodology, using the basic idea of scaled MP2 methods. The scheme has been applied to the coupled cluster double equations and implemented in three different non-iterative variants. This new method (especially the SCCD[3] variant, which utilizes a spin-resolved formalism) has been found to be very efficient and to yield an accurate approximation of the reference CCD results for both total and interaction energies of different atoms and molecules. Furthermore, we demonstrate that the equations determining the scaling coefficients for the SCCD[3] approach can generate non-empirical SCS-MP2 scaling coefficients which are in good agreement with previous theoretical investigations.
NASA Astrophysics Data System (ADS)
Vallet, B.; Soheilian, B.; Brédif, M.
2014-08-01
Reconstructing similar 3D objects detected in 2D images faces a major issue when grouping the 2D detections into the clusters from which the individual 3D objects are reconstructed. Simple clustering heuristics fail as soon as similar objects are close together. This paper formulates a framework that uses the geometric quality of the reconstruction as a guide to proper clustering. We present a methodology to solve the resulting combinatorial optimization problem, with some simplifications and approximations to make it tractable. The proposed method is applied to the reconstruction of 3D traffic signs from their 2D detections to demonstrate its capacity to resolve ambiguities.
Identifying typical patterns of vulnerability: A 5-step approach based on cluster analysis
NASA Astrophysics Data System (ADS)
Sietz, Diana; Lüdeke, Matthias; Kok, Marcel; Lucas, Paul; Walther, Carsten; Janssen, Peter
2013-04-01
Specific processes that shape the vulnerability of socio-ecological systems to climate, market and other stresses derive from diverse background conditions. Within the multitude of vulnerability-creating mechanisms, distinct processes recur in various regions, inspiring research on typical patterns of vulnerability. The vulnerability patterns display typical combinations of the natural and socio-economic properties that shape a system's vulnerability to particular stresses. Based on the identification of a limited number of vulnerability patterns, pattern analysis provides an efficient approach to improving our understanding of vulnerability and decision-making for vulnerability reduction. However, current pattern analyses often miss explicit descriptions of their methods and pay insufficient attention to the validity of their groupings. The question therefore arises: how do we identify typical vulnerability patterns in order to enhance our understanding of a system's vulnerability to stresses? A cluster-based pattern recognition approach applied at global and local levels is scrutinised with a focus on an applicable methodology and practicable insights. Taking the example of drylands, this presentation demonstrates the conditions necessary to identify typical vulnerability patterns. They are summarised in five methodological steps comprising the elicitation of relevant cause-effect hypotheses and the quantitative indication of mechanisms, as well as an evaluation of robustness, a validation and a ranking of the identified patterns. Reflecting scale-dependent opportunities, a global study is able to support decision-making with insights into the up-scaling of interventions when available funds are limited. In contrast, local investigations encourage an outcome-based validation. This constitutes a crucial step in establishing the credibility of the patterns and hence their suitability for informing extension services and individual decisions. In this respect, working at the local level provides a clear advantage since, to a large extent, limitations in globally available observational data constrain such a validation on the global scale. Overall, the five steps are outlined in detail in order to facilitate and motivate the application of pattern recognition in other research studies concerned with vulnerability analysis, including future applications to different vulnerability frameworks. Such applications could promote the refinement of mechanisms in specific contexts and advance methodological adjustments. This would further increase the value of identifying typical patterns in the properties of socio-ecological systems for an improved understanding and management of the relation between these systems and particular stresses.
Adnane, Choaib; Adouly, Taoufik; Khallouk, Amine; Rouadi, Sami; Abada, Redallah; Roubal, Mohamed; Mahtar, Mohamed
2017-02-01
The purpose of this study was to use unsupervised cluster methodology to identify phenotype and mucosal eosinophilia endotype subgroups of patients with medically refractory chronic rhinosinusitis (CRS), and to evaluate differences in quality of life (QOL) outcomes after endoscopic sinus surgery (ESS) between these clusters for better surgical case selection. A prospective cohort study included 131 patients with medically refractory CRS who elected ESS. The Sino-Nasal Outcome Test (SNOT-22) was used to evaluate QOL before and 12 months after surgery. An unsupervised two-step clustering method was performed. One hundred and thirteen subjects were retained in this study: 46 patients with CRS without nasal polyps and 67 patients with nasal polyps. Nasal polyps, gender, mucosal eosinophilia profile, and prior sinus surgery were the most discriminating factors in the generated clusters. Three clusters were identified. A significant clinical improvement was observed in all clusters 12 months after surgery, with a reduction of SNOT-22 scores. There was a significant difference in QOL outcomes between clusters; cluster 1 had the worst QOL improvement after ESS in comparison with clusters 2 and 3. All patients in cluster 1 presented CRSwNP with the highest mucosal eosinophilia endotype. The clustering method is able to classify CRS phenotypes and endotypes with different associated surgical outcomes.
A Novel Performance Evaluation Methodology for Single-Target Trackers.
Kristan, Matej; Matas, Jiri; Leonardis, Ales; Vojir, Tomas; Pflugfelder, Roman; Fernandez, Gustavo; Nebehay, Georg; Porikli, Fatih; Cehovin, Luka
2016-11-01
This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes, making it the most carefully constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.
NASA Astrophysics Data System (ADS)
Torres-Arredondo, M.-A.; Sierra-Pérez, Julián; Cabanes, Guénaël
2016-05-01
The process of measuring and analysing the data from a distributed sensor network all over a structural system in order to quantify its condition is known as structural health monitoring (SHM). For the design of a trustworthy health monitoring system, a vast amount of information regarding the inherent physical characteristics of the sources and their propagation and interaction across the structure is crucial. Moreover, any SHM system which is expected to transition to field operation must take into account the influence of environmental and operational changes, which cause modifications in the stiffness and damping of the structure and consequently modify its dynamic behaviour. On that account, special attention is paid in this paper to the development of an efficient SHM methodology where robust signal processing and pattern recognition techniques are integrated for the correct interpretation of complex ultrasonic waves within the context of damage detection and identification. The methodology is based on an acousto-ultrasonics technique where the discrete wavelet transform is evaluated for feature extraction and selection, linear principal component analysis for data-driven modelling, and self-organising maps for a two-level clustering under the principle of local density. Finally, the methodology is experimentally demonstrated, and the results show that all damage cases were detectable and identifiable.
Analysis methods for Thematic Mapper data of urban regions
NASA Technical Reports Server (NTRS)
Wang, S. C.
1984-01-01
Studies have indicated the difficulty of deriving a detailed land-use/land-cover classification for heterogeneous metropolitan areas with Landsat MSS and TM data. The major methodological issues of digital analysis that may have affected the classification results are examined. In response to these issues, a multichannel hierarchical clustering algorithm has been developed and tested for a more complete analysis of the data for urban areas.
Kim, Jaehee; Ogden, Robert Todd; Kim, Haseong
2013-10-18
Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data, and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features, such as autocorrelation between successive points, should be incorporated into the analysis. This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, then we screen out genes that are not differentially expressed based on the Fourier coefficients, and finally we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with Gaussian process regression screening in a simulation study, and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients, taking into account autocorrelation in the data while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. The proposed method is general and can potentially be used to identify genes which share the same patterns or biological processes, helping to face the present and forthcoming challenges of data analysis in functional genomics.
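The three key elements lend themselves to a compact sketch. Below is a minimal, hypothetical Python illustration, not the authors' code: the expression matrix is synthetic, the number of retained harmonics, the screening quantile, and the cluster count are invented for the example, and a Gaussian mixture stands in for the model-based clustering step.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 18))            # 500 genes x 18 time points (synthetic)

# (i) Represent each profile by its low-order Fourier coefficients.
coeffs = np.fft.rfft(expr, axis=1)[:, :4]    # keep the first 4 harmonics
feats = np.column_stack([coeffs.real, coeffs.imag])

# (ii) Screen: drop genes whose non-constant harmonics carry little energy
# (a real implementation would calibrate this cutoff to control the FDR).
energy = np.abs(coeffs[:, 1:]).sum(axis=1)
keep = energy > np.quantile(energy, 0.75)

# (iii) Model-based clustering of the surviving profiles in the Fourier domain.
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(feats[keep])
```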
Subtyping adolescents with bulimia nervosa.
Chen, Eunice Y; Le Grange, Daniel
2007-12-01
Cluster analyses of eating disorder patients have yielded a "dietary-depressive" subtype, typified by greater negative affect, and a "dietary" subtype, typified by dietary restraint. This study aimed to replicate these findings in an adolescent sample with bulimia nervosa (BN) from a randomized controlled trial and to examine the validity and reliability of this methodology. In the sample of BN adolescents (N=80), cluster analysis revealed a "dietary-depressive" subtype (37.5%) and a "dietary" subtype (62.5%) using the Beck Depression Inventory, Rosenberg Self-Esteem Scale and Eating Disorder Examination Restraint subscale. Compared to the "dietary" subtype, the "dietary-depressive" subtype was significantly more likely to (1) report co-occurring disorders, (2) report greater eating and weight concerns, and (3) achieve less vomiting abstinence at post-treatment (all p's<.05). The cluster analysis based on "dietary" and "dietary-depressive" subtypes appeared to have concurrent validity, yielding more distinct groups than subtyping by vomiting frequency. To assess the reliability of the subtyping scheme, a larger sample of adolescents with mixed eating and weight disorders in an outpatient eating disorder clinic (N=149) was subtyped, yielding similar subtypes. These results support the validity and reliability of the subtyping strategy in two adolescent samples.
Sequential analysis of hydrochemical data for watershed characterization.
Thyne, Geoffrey; Güler, Cüneyt; Poeter, Eileen
2004-01-01
A methodology for characterizing the hydrogeology of watersheds from hydrochemical data, combining statistical, geochemical, and spatial techniques, is presented. Surface water and ground water base flow and spring runoff samples (180 total) from a single watershed are first classified using hierarchical cluster analysis. The statistical clusters are analyzed for spatial coherence, confirming that the clusters have a geological basis corresponding to topographic flowpaths and showing that the fractured rock aquifer behaves as an equivalent porous medium on the watershed scale. Then principal component analysis (PCA) is used to determine the sources of variation between parameters. PCA shows that the variations within the dataset are related to variations in calcium, magnesium, SO4, and HCO3, which are derived from natural weathering reactions, and pH, NO3, and chloride, which indicate anthropogenic impact. PHREEQC modeling is used to quantitatively describe the natural hydrochemical evolution of the watershed and to aid in discriminating samples that have an anthropogenic component. Finally, the seasonal changes in the water chemistry of individual sites are analyzed to better characterize the spatial variability of vertical hydraulic conductivity. The integrated result provides a method to characterize the hydrogeology of the watershed that fully utilizes traditional data.
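The first two steps of this sequence, Ward clustering of the samples followed by PCA on the same standardized matrix, can be sketched as follows; the sample matrix, solute count, and cluster number below are invented placeholders, not the study's data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
conc = rng.lognormal(size=(180, 7))          # 180 water samples x 7 solutes (synthetic)

# Log-transform and standardize, a common pretreatment for hydrochemical data.
X = StandardScaler().fit_transform(np.log10(conc))

# Step 1: hierarchical (Ward) clustering of the water samples.
groups = fcluster(linkage(X, method="ward"), t=4, criterion="maxclust")

# Step 2: PCA to identify which parameters drive the between-sample variation.
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)         # variance explained per component
print(pca.components_)                       # loadings: parameter contributions
```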
Towards the use of computationally inserted lesions for mammographic CAD assessment
NASA Astrophysics Data System (ADS)
Ghanian, Zahra; Pezeshk, Aria; Petrick, Nicholas; Sahiner, Berkman
2018-03-01
Computer-aided detection (CADe) devices used for breast cancer detection on mammograms are typically first developed and assessed for a specific "original" acquisition system, e.g., a specific image detector. When CADe developers are ready to apply their CADe device to a new mammographic acquisition system, they typically assess the CADe device with images acquired using the new system. Collecting large repositories of clinical images containing verified cancer locations and acquired by the new image acquisition system is costly and time consuming. Our goal is to develop a methodology to reduce the clinical data burden in the assessment of a CADe device for use with a different image acquisition system. We are developing an image blending technique that allows users to seamlessly insert lesions imaged using an original acquisition system into normal images or regions acquired with a new system. In this study, we investigated the insertion of microcalcification clusters imaged using an original acquisition system into normal images acquired with that same system utilizing our previously-developed image blending technique. We first performed a reader study to assess whether experienced observers could distinguish between computationally inserted and native clusters. For this purpose, we applied our insertion technique to clinical cases taken from the University of South Florida Digital Database for Screening Mammography (DDSM) and the Breast Cancer Digital Repository (BCDR). Regions of interest containing microcalcification clusters from one breast of a patient were inserted into the contralateral breast of the same patient. The reader study included 55 native clusters and their 55 inserted counterparts. Analysis of the reader ratings using receiver operating characteristic (ROC) methodology indicated that inserted clusters cannot be reliably distinguished from native clusters (area under the ROC curve, AUC=0.58±0.04). Furthermore, CADe sensitivity was evaluated on mammograms with native and inserted microcalcification clusters using a commercial CADe system. For this purpose, we used full field digital mammograms (FFDMs) from 68 clinical cases, acquired at the University of Michigan Health System. The average sensitivities for native and inserted clusters were equal, 85.3% (58/68). These results demonstrate the feasibility of using the inserted microcalcification clusters for assessing mammographic CAD devices.
Scaled Rocket Testing in Hypersonic Flow
NASA Technical Reports Server (NTRS)
Dufrene, Aaron; MacLean, Matthew; Carr, Zakary; Parker, Ron; Holden, Michael; Mehta, Manish
2015-01-01
NASA's Space Launch System (SLS) uses four clustered liquid rocket engines along with two solid rocket boosters. The interaction between all six rocket exhaust plumes will produce a complex and severe thermal environment in the base of the vehicle. This work focuses on a recent 2% scale, hot-fire SLS base heating test. These base heating tests are short-duration tests executed with chamber pressures near the full-scale values with gaseous hydrogen/oxygen engines and RSRMV analogous solid propellant motors. The LENS II shock tunnel/Ludwieg tube tunnel was used at or near flight duplicated conditions up to Mach 5. Model development was strongly based on the Space Shuttle base heating tests with several improvements including doubling of the maximum chamber pressures and duplication of freestream conditions. Detailed base heating results are outside of the scope of the current work, rather test methodology and techniques are presented along with broader applicability toward scaled rocket testing in supersonic and hypersonic flow.
Syed, Zeeshan; Saeed, Mohammed; Rubinfeld, Ilan
2010-01-01
For many clinical conditions, only a small number of patients experience adverse outcomes. Developing risk stratification algorithms for these conditions typically requires collecting large volumes of data to capture enough positive and negative examples for training. This process is slow, expensive, and may not be appropriate for new phenomena. In this paper, we explore different anomaly detection approaches to identify high-risk patients as cases that lie in sparse regions of the feature space. We study three broad categories of anomaly detection methods: classification-based, nearest neighbor-based, and clustering-based techniques. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), these methods were able to successfully identify patients at an elevated risk of mortality and rare morbidities following inpatient surgical procedures. PMID:21347083
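As an illustration of the nearest neighbor-based category, a density-style detector can flag patients in sparse regions of the feature space. A minimal sketch with synthetic data and an arbitrary neighborhood size, not the paper's NSQIP pipeline:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
patients = rng.normal(size=(5000, 20))       # preoperative features (synthetic)

# Patients in sparse regions of the feature space get flagged as anomalies.
lof = LocalOutlierFactor(n_neighbors=35)
flags = lof.fit_predict(patients)            # -1 = anomaly, +1 = inlier
risk_score = -lof.negative_outlier_factor_   # larger values = sparser neighborhoods
high_risk = np.flatnonzero(flags == -1)
```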
Construction of ground-state preserving sparse lattice models for predictive materials simulations
NASA Astrophysics Data System (ADS)
Huang, Wenxuan; Urban, Alexander; Rong, Ziqin; Ding, Zhiwei; Luo, Chuan; Ceder, Gerbrand
2017-08-01
First-principles based cluster expansion models are the dominant approach in ab initio thermodynamics of crystalline mixtures, enabling the prediction of phase diagrams and novel ground states. However, despite recent advances, the construction of accurate models still requires a careful and time-consuming manual parameter tuning process for ground-state preservation, since this property is not guaranteed by default. In this paper, we present a systematic and mathematically sound method to obtain cluster expansion models that are guaranteed to preserve the ground states of their reference data. The method builds on the recently introduced compressive sensing paradigm for cluster expansion and employs quadratic programming to impose constraints on the model parameters. The robustness of our methodology is illustrated for two lithium transition metal oxides with relevance for Li-ion battery cathodes, i.e., Li2xFe2(1-x)O2 and Li2xTi2(1-x)O2, for which the construction of cluster expansion models with compressive sensing alone has proven to be challenging. We demonstrate that our method not only guarantees ground-state preservation on the set of reference structures used for the model construction, but also achieves out-of-sample ground-state preservation up to relatively large supercell sizes through a rapidly converging iterative refinement. This method provides a general tool for building robust, compressed and constrained physical models with predictive power.
Almansa, Josué; Vermunt, Jeroen K; Forero, Carlos G; Vilagut, Gemma; De Graaf, Ron; De Girolamo, Giovanni; Alonso, Jordi
2011-06-01
Epidemiological studies on mental health and mental comorbidity are usually based on prevalences and correlations between disorders, or some other form of bivariate clustering of disorders. In this paper, we propose a Factor Mixture Model (FMM) methodology based on conceptual models aiming to measure and summarize distinctive disorder information in the internalizing and externalizing dimensions. This methodology includes explicit modelling of subpopulations with and without 12-month disorders ("ill" and "healthy") by means of latent classes, as well as assessment of model invariance and estimation of dimensional scores. We applied this methodology, with an internalizing/externalizing two-factor model, to a representative sample gathered in the European Study of the Epidemiology of Mental Disorders (ESEMeD) study, which includes 8796 individuals from six countries and used the CIDI 3.0 instrument for disorder assessment. Results revealed that southern European countries have significantly higher mental health levels concerning internalizing/externalizing disorders than central countries; males suffered more externalizing disorders than females, and conversely, internalizing disorders were more frequent in females. Differences in mental-health level between socio-demographic groups were due to different proportions of healthy and ill individuals and, noticeably, to the ameliorating influence of marital status on severity. An advantage of latent model-based scores is that the inclusion of additional mental-health dimensional information, other than diagnostic data, allows for greater precision within a target range of scores.
CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.
Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G
2016-02-01
Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation: Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology, applying multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface, the latter developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems, and the user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. Contact: sgo24@cam.ac.uk. Supplementary data are available at Bioinformatics online.
Health-risk behaviour in Croatia.
Bécue-Bertaut, Mónica; Kern, Josipa; Hernández-Maldonado, Maria-Luisa; Juresa, Vesna; Vuletic, Silvije
2008-02-01
The aim was to identify the health-risk behaviour of various homogeneous clusters of individuals. The study was conducted in 13 of the 20 Croatian counties and in Zagreb, the Croatian capital. In the first stage, general practices were selected in each county. The second-stage sample was created by drawing a random subsample of 10% of the patients registered at each selected general practice. The sample was divided into seven homogeneous clusters using statistical methodology combining multiple factor analysis with a hybrid clustering method. Seven homogeneous clusters were identified, three composed of males and four of females, based on statistically significant differences between selected characteristics (P<0.001). Although, in general, self-assessed health declined with age, significant variations were observed within specific age intervals. Higher levels of self-assessed health were associated with higher levels of education and/or socio-economic status. Many individuals, especially females, who self-reported poor health were heavy consumers of sleeping pills. Males and females reported different health-risk behaviours related to lifestyle, diet and use of the healthcare system. Heavy alcohol and tobacco use, unhealthy diet, risky physical activity and non-use of the healthcare system influenced self-assessed health in males. Females were slightly less satisfied with their health than males of the same age and educational level. Even highly educated females who took preventive healthcare tests and ate a healthy diet reported a less satisfactory self-assessed level of health than expected. Sociodemographic characteristics, lifestyle, self-assessed health and use of the healthcare system were used in the identification of seven homogeneous population clusters. A comprehensive analysis of these clusters suggests health-related prevention and intervention efforts geared towards specific populations.
NASA Astrophysics Data System (ADS)
Vulcani, Benedetta; Treu, Tommaso; Schmidt, Kasper B.; Poggianti, Bianca M.; Dressler, Alan; Fontana, Adriano; Bradač, Marusa; Brammer, Gabriel B.; Hoag, Austin; Huang, Kuan-Han; Malkan, Matthew; Pentericci, Laura; Trenti, Michele; von der Linden, Anja; Abramson, Louis; He, Julie; Morris, Glenn
2015-12-01
We present the first study of the spatial distribution of star formation in z ∼ 0.5 cluster galaxies. The analysis is based on data taken with the Wide Field Camera 3 as part of the Grism Lens-Amplified Survey from Space (GLASS). We illustrate the methodology by focusing on two clusters (MACS 0717.5+3745 and MACS 1423.8+2404) with different morphologies (one relaxed and one merging) and use foreground and background galaxies as a field control sample. The cluster+field sample consists of 42 galaxies with stellar masses in the range 10^8-10^11 M⊙ and star formation rates in the range 1-20 M⊙ yr^-1. Both in clusters and in the field, Hα is more extended than the rest-frame UV continuum in 60% of the cases, consistent with diffuse star formation and inside-out growth. In ∼20% of the cases, the Hα emission appears more extended in cluster galaxies than in the field, pointing perhaps to ionized gas being stripped and/or star formation being enhanced at large radii. The peak of the Hα emission and that of the continuum are offset by less than 1 kpc. We investigate trends with the hot gas density as traced by the X-ray emission, and with the surface mass density as inferred from gravitational lens models, and find no conclusive results. The diversity of morphologies and sizes observed in Hα illustrates the complexity of the environmental processes that regulate star formation. Upcoming analysis of the full GLASS data set will increase our sample size by almost an order of magnitude, verifying and strengthening the inference from this initial data set.
Wagner, Monica; Shafer, Valerie L.; Martin, Brett; Steinschneider, Mitchell
2013-01-01
The effect of exposure to the contextual features of the /pt/ cluster was investigated in native-English and native-Polish listeners using behavioral and event-related potential (ERP) methodology. Both groups experience the /pt/ cluster in their languages, but only the Polish group experiences the cluster in the word-onset context examined in the current experiment. The /st/ cluster was used as an experimental control. ERPs were recorded while participants identified the number of syllables in the second word of nonsense word pairs. The results showed that only Polish listeners accurately perceived the /pt/ cluster, and perception was reflected within a late positive component of the ERP waveform. Furthermore, evidence of discrimination of /pt/ and /pǝt/ onsets in the neural signal was found even for non-native listeners who could not perceive the difference. These findings suggest that exposure to phoneme sequences in highly specific contexts may be necessary for accurate perception. PMID:22867752
Using cluster analysis for medical resource decision making.
Dilts, D; Khamalah, J; Plotkin, A
1995-01-01
Escalating costs of health care delivery have in the recent past often made the health care industry investigate, adapt, and apply those management techniques relating to budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products" or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis in generating these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized selection methodologies for choosing a technique for solving a particular problem, hence they often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.
Environment-based pin-power reconstruction method for homogeneous core calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leroyer, H.; Brosselard, C.; Girardi, E.
2012-07-01
Core calculation schemes are usually based on a classical two-step approach associated with assembly and core calculations. During the first step, infinite lattice assembly calculations relying on a fundamental mode approach are used to generate cross-section libraries for PWR core calculations. This fundamental mode hypothesis may be questioned when dealing with loading patterns involving several types of assemblies (UOX, MOX), burnable poisons, control rods and burn-up gradients. This paper proposes a calculation method able to take into account the heterogeneous environment of the assemblies when using homogeneous core calculations and an appropriate pin-power reconstruction. This methodology is applied to MOX assemblies, computed within an environment of UOX assemblies. The new environment-based pin-power reconstruction is then used on various clusters of 3x3 assemblies showing burn-up gradients and UOX/MOX interfaces, and compared to reference calculations performed with APOLLO-2. The results show that UOX/MOX interfaces are much better calculated with the environment-based calculation scheme than with the usual pin-power reconstruction method. The power peak is always better located and calculated with the environment-based pin-power reconstruction method in every cluster configuration studied. This study shows that taking into account the environment in transport calculations can significantly improve the pin-power reconstruction, so far as it is consistent with the core loading pattern.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, Yoshifumi; Lee, Yuan-Pern
Vibrational infrared (IR) spectra of gas-phase O–H⋅⋅⋅O methanol clusters up to the pentamer are simulated using the self-consistent-charge density-functional tight-binding method with two distinct methodologies: standard normal mode analysis and Fourier transform of the dipole time-correlation function. The twofold simulations aim at the direct critical assignment of the C–H stretching region of the recently recorded experimental spectra [H.-L. Han, C. Camacho, H. A. Witek, and Y.-P. Lee, J. Chem. Phys. 134, 144309 (2011)]. Both approaches confirm the previous assignment (ibid.) of the C–H stretching bands based on the B3LYP/ANO1 harmonic frequencies, showing that the ν3, ν9, and ν2 C–H stretching modes of the proton-accepting (PA) and proton-donating (PD) methanol monomers experience only small splittings upon cluster formation. This finding is in sharp discord with the assignment based on anharmonic B3LYP/VPT2/ANO1 vibrational frequencies (ibid.), suggesting that some procedural faults, likely related to the breakdown of the perturbational vibrational treatment, led the anharmonic calculations astray. The IR spectra based on the Fourier transform of the dipole time-correlation function include new, previously unaccounted-for physical factors such as the non-zero temperature of the system and large-amplitude motions of the clusters. The elevation of temperature results in a considerable non-homogeneous broadening of the observed IR signals, while the presence of large-amplitude motions (methyl group rotations and PA-PD flipping), somewhat surprisingly, does not introduce any new features in the spectrum.
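The second methodology, the IR lineshape as the Fourier transform of the dipole autocorrelation function, reduces to a few lines of numerics. A generic sketch under stated assumptions (synthetic dipole trajectory, assumed time step; physical prefactors and quantum corrections omitted):

```python
import numpy as np

dt = 0.5e-15                                  # MD time step in seconds (assumed)
rng = np.random.default_rng(3)
mu = rng.normal(size=(2**14, 3))              # dipole moment trajectory (synthetic)
mu -= mu.mean(axis=0)

# Autocorrelation C(t) = <mu(0) . mu(t)> via the Wiener-Khinchin theorem
# (zero-padding to 2n avoids circular wrap-around).
n = len(mu)
F = np.fft.rfft(mu, n=2 * n, axis=0)
acf = np.fft.irfft((F * F.conj()).real.sum(axis=1), n=2 * n)[:n]
acf /= np.arange(n, 0, -1)                    # unbiased normalization per lag

# IR lineshape ~ |Fourier transform of C(t)|; a window reduces truncation ripple.
spectrum = np.abs(np.fft.rfft(acf * np.hanning(n)))
wavenumber = np.fft.rfftfreq(n, d=dt) / 2.99792458e10   # Hz -> cm^-1
```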
NASA Astrophysics Data System (ADS)
Zagouras, Athanassios; Argiriou, Athanassios A.; Flocas, Helena A.; Economou, George; Fotopoulos, Spiros
2012-11-01
Classification of weather maps at various isobaric levels has been used as a methodological tool for many years in several problems related to meteorology, climatology, atmospheric pollution and other fields. Initially the classification was performed manually. The criteria used by the person performing the classification are features of isobars or isopleths of geopotential height, depending on the type of maps to be classified. Although manual classifications integrate the perceptual experience and other unquantifiable qualities of the meteorology specialists involved, they are typically subjective and time-consuming. During recent years, different automated methods for atmospheric circulation classification have therefore been proposed, presenting so-called objective classifications. In this paper a new method of atmospheric circulation classification of isobaric maps is presented. The method is based on graph theory. It starts with an intelligent prototype selection using an over-partitioning mode of the fuzzy c-means (FCM) algorithm, proceeds to a graph formulation for the entire dataset and produces the clusters based on the contemporary dominant sets clustering method. Graph theory provides a novel mathematical approach, allowing a more efficient representation of spatially correlated data compared to the classical Euclidean space representation used in conventional classification methods. The method has been applied to the classification of 850 hPa atmospheric circulation over the Eastern Mediterranean. The evaluation of the automated methods is performed by statistical indexes; results indicate that the classification is adequately comparable with other state-of-the-art automated map classification methods, for a variable number of clusters.
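The dominant-sets step can be sketched via discrete replicator dynamics on a similarity graph. In the hypothetical Python sketch below, the field data, kernel bandwidth, and iteration budget are placeholders, and the FCM prototype-selection stage is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)
maps = rng.normal(size=(200, 64))             # flattened geopotential fields (synthetic)

# Pairwise similarity graph with zero self-similarity, as dominant sets require.
D = np.linalg.norm(maps[:, None] - maps[None], axis=2)
A = np.exp(-(D / np.median(D)) ** 2)
np.fill_diagonal(A, 0.0)

def dominant_set(A, iters=2000, tol=1e-8):
    """Extract one dominant set by replicator dynamics: x <- x*(Ax)/(x'Ax)."""
    x = np.full(len(A), 1.0 / len(A))
    for _ in range(iters):
        y = x * (A @ x)
        y /= y.sum()
        if np.abs(y - x).sum() < tol:
            break
        x = y
    return x > 1.0 / len(A)                   # support of the equilibrium point

members = dominant_set(A)                     # peel off, then recurse on the rest
```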
Odoi, Agricola; Wray, Ron; Emo, Marion; Birch, Stephen; Hutchison, Brian; Eyles, John; Abernathy, Tom
2005-01-01
Background: Population health planning aims to improve the health of the entire population and to reduce health inequities among population groups. Socioeconomic factors are increasingly being recognized as major determinants of many aspects of health and causes of health inequities. Knowledge of the socioeconomic characteristics of neighbourhoods is necessary to identify their unique health needs and enhance identification of socioeconomically disadvantaged populations. Careful integration of this knowledge into health planning activities is necessary to ensure that health planning and service provision are tailored to unique neighbourhood population health needs. In this study, we identify unique neighbourhood socioeconomic characteristics and classify the neighbourhoods based on these characteristics. Principal components analysis (PCA) of 18 socioeconomic variables was used to identify the principal components explaining most of the variation in socioeconomic characteristics across the neighbourhoods. Cluster analysis was used to classify neighbourhoods based on their socioeconomic characteristics. Results: Results of the PCA and cluster analysis were similar, but the latter was more objective and easier to interpret. Five neighbourhood types with distinguishing socioeconomic and demographic characteristics were identified. The methodology provides a more complete picture of neighbourhood socioeconomic characteristics than when a single variable (e.g. income) is used to classify neighbourhoods. Conclusion: Cluster analysis is useful for generating neighbourhood population socioeconomic and demographic characteristics that can guide neighbourhood health planning and service provision. This study is the first of a series designed to investigate health inequalities at the neighbourhood level, with a view to providing an evidence base for health planners, service providers and policy makers to help address health inequity issues at the neighbourhood level. Subsequent studies will investigate inequalities in health outcomes both within and across the neighbourhood types identified in the current study. PMID:16092969
NASA Astrophysics Data System (ADS)
Bharatham, Kavitha; Bharatham, Nagakumar; Kwon, Yong Jung; Lee, Keun Woo
2008-12-01
Allosteric inhibition of protein tyrosine phosphatase 1B (PTP1B) has paved a new path to design specific inhibitors for PTP1B, an important drug target for the treatment of type II diabetes and obesity. The crystal structure of the PTP1B(1-282)-allosteric inhibitor complex lacks α7 (residues 287-298), and no 3D structure of PTP1B(1-298) in the open form is available. As the interaction between the α7 and α6-α3 helices plays a crucial role in allosteric inhibition, α7 was modeled onto PTP1B(1-282) in the open form complexed with an allosteric inhibitor (compound-2), and a 5 ns MD simulation was performed to investigate the relative orientation of the α7-α6-α3 helices. The simulation conformational space was statistically sampled by clustering analyses. This approach was helpful in revealing certain clues about PTP1B allosteric inhibition. The simulation was also utilized in the generation of receptor-based pharmacophore models to include the conformational flexibility of the protein-inhibitor complex. Three cluster-representative structures of the most highly populated clusters were selected for pharmacophore model generation. The three pharmacophore models were subsequently utilized for screening databases to retrieve molecules containing the features that complement the allosteric site. The retrieved hits were filtered based on certain drug-like properties, and molecular docking simulations were performed in two different conformations of the protein. Thus, performing MD simulation with α7 to investigate the changes at the allosteric site, then developing receptor-based pharmacophore models, and finally docking the retrieved hits into two distinct conformations constitutes a reliable methodology for identifying PTP1B allosteric inhibitors.
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method for MC clusters is investigated. The first stage is targeted to accurate and time-efficient segmentation of the majority of the particles of an MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreement was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, amorphous: 24), assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted, and a correlation-based feature selection method yielded a feature subset to feed into a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± standard error) utilizing tenfold cross-validation. A previously developed B-spline active rays segmentation method was also considered for comparison purposes. Results: Interobserver and intraobserver segmentation agreements (median and [25%, 75%] quartile range) were substantial with respect to the distance metrics HDIST_cluster (2.3 [1.8, 2.9] and 2.5 [2.1, 3.2] pixels) and AMINDIST_cluster (0.8 [0.6, 1.0] and 1.0 [0.8, 1.2] pixels), while moderate with respect to AOM_cluster (0.64 [0.55, 0.71] and 0.59 [0.52, 0.66]). The proposed segmentation method outperformed (0.80 ± 0.04) the B-spline active rays segmentation method (0.69 ± 0.04) statistically significantly (Mann-Whitney U-test, p < 0.05), suggesting the significance of the proposed semiautomated method. Conclusions: Results indicate a reliable semiautomated segmentation method for MC clusters offered by deformable models, which could be utilized in MC cluster quantitative image analysis.
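The agreement metrics used above are straightforward to compute from binary masks. A generic Python sketch (random masks stand in for real reader segmentations; not the authors' implementation):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

rng = np.random.default_rng(5)
a = rng.random((64, 64)) > 0.6                # reader 1 segmentation mask (synthetic)
b = rng.random((64, 64)) > 0.6                # reader 2 segmentation mask (synthetic)
pa, pb = np.argwhere(a), np.argwhere(b)

# Symmetric Hausdorff distance between the two segmentations, in pixels.
hdist = max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

# Average minimum distance: mean distance from each point to the other set.
dmat = np.linalg.norm(pa[:, None] - pb[None], axis=2)
amindist = 0.5 * (dmat.min(axis=1).mean() + dmat.min(axis=0).mean())

# Area overlap measure: intersection over union of the two masks.
aom = np.logical_and(a, b).sum() / np.logical_or(a, b).sum()
```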
NASA Astrophysics Data System (ADS)
Zhao, Huiling; Li, Yinli; Chen, Dong; Liu, Bo
2016-12-01
The co-adsorption behavior of nucleic-acid bases (thymine; cytosine) and melamine was investigated by the scanning tunneling microscopy (STM) technique at the liquid/solid (1-octanol/graphite) interface. STM characterization results indicate that phase separation occurred after dropping the mixed thymine-melamine solution onto the highly oriented pyrolytic graphite (HOPG) surface, while a hetero-component cluster-like structure was observed for the cytosine-melamine binary assembly system. The formation mechanisms of these assembled structures were explored in detail from the viewpoint of non-covalent interactions calculated using the density functional theory (DFT) method. This work supplies a methodology to design supramolecular assembled structures and hetero-component materials composed of biological and chemical compounds.
Cota-Ruiz, Juan; Rosiles, Jose-Gerardo; Sifuentes, Ernesto; Rivas-Perea, Pablo
2012-01-01
This research presents a distributed and formula-based bilateration algorithm that can be used to provide an initial set of locations. In this scheme, each node uses distance estimates to anchors to solve a set of circle-circle intersection (CCI) problems through a purely geometric formulation. The resulting CCIs are processed to pick those that cluster together, which are then averaged to produce an initial node location. The algorithm is compared in terms of accuracy and computational complexity with a least-squares localization algorithm based on the Levenberg-Marquardt methodology. Results on accuracy vs. computational performance show that the bilateration algorithm is competitive with well-known optimized localization algorithms.
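The CCI step has a closed-form solution, so the whole intersect-cluster-average idea fits in a short sketch. The anchor positions, range values, and clustering tolerance below are invented for illustration, not taken from the paper:

```python
import numpy as np

def circle_circle_intersections(c1, r1, c2, r2):
    """Return the 0 or 2 intersection points of two circles in the plane."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    d = np.linalg.norm(c2 - c1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []                              # no intersection (or concentric)
    a = (r1**2 - r2**2 + d**2) / (2 * d)       # distance from c1 to the chord
    h = np.sqrt(max(r1**2 - a**2, 0.0))        # half-length of the chord
    mid = c1 + a * (c2 - c1) / d
    perp = np.array([-(c2 - c1)[1], (c2 - c1)[0]]) / d
    return [mid + h * perp, mid - h * perp]

# Bilateration sketch: intersect range circles to anchors pairwise, keep the
# candidate points that cluster together, and average them.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
ranges = np.array([5.2, 6.1, 4.9])             # noisy distance estimates (synthetic)
cands = []
for i in range(len(anchors)):
    for j in range(i + 1, len(anchors)):
        cands += circle_circle_intersections(anchors[i], ranges[i],
                                             anchors[j], ranges[j])
cands = np.array(cands)

# Crude clustering: keep candidates near the median position, then average.
keep = np.linalg.norm(cands - np.median(cands, axis=0), axis=1) < 2.0
estimate = cands[keep].mean(axis=0)
```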
A Cluster Of Activities On Coma From The Hubble Space Telescope, StarDate, And McDonald Observatory
NASA Astrophysics Data System (ADS)
Hemenway, Mary Kay; Jogee, S.; Fricke, K.; Preston, S.
2011-01-01
With a goal of providing a vast audience of students, teachers, the general public, and Spanish-speakers with activities to learn about research on the Coma cluster of galaxies based on the HST ACS Treasury survey of Coma, McDonald Observatory used a many-faceted approach. Since this research offered an unprecedented legacy dataset, part of the challenge was to convey the importance of this project to a diverse audience. The methodology was to create different products for different (overlapping) audiences. Five radio programs were produced in English and Spanish for distribution on over 500 radio stations in the US and Mexico with a listening audience of over 2 million; in addition to the radio listeners, there were over 13,000 downloads of the English scripts and almost 6000 of the Spanish. Images were prepared for use in the StarDate Online Astronomy Picture of the Week, for ViewSpace (used in museums), and for the StarDate/Universo Teacher Guide. A high-school level activity on the Coma Cluster was prepared and distributed both on-line and in an upgraded printed version of the StarDate/Universo Teacher Guide. This guide has been distributed to over 1700 teachers nationally. A YouTube video about careers and research in astronomy using the Coma cluster as an example was produced. Just as the activities were varied, so were the evaluation methods. This material is based upon work supported by the National Aeronautics and Space Administration under Grant/Contract/Agreement No. HST-EO-10861.35-A issued through the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555.
NASA Astrophysics Data System (ADS)
Ermida, S. L.; Trigo, I. F.; DaCamara, C.; Ghent, D.
2017-12-01
Land surface temperature (LST) values retrieved from satellite measurements in the thermal infrared (TIR) may be strongly affected by spatial anisotropy. This effect introduces significant discrepancies among LST estimations from different sensors, overlapping in space and time, that are not related to uncertainties in the methodologies or input data used. Furthermore, these directional effects deviate LST products from an ideally defined LST, which should represent the ensemble of directional radiometric temperatures of all surface elements within the field of view. Angular effects on LST are here conveniently estimated by means of a parametric model of the surface thermal emission, which describes the angular dependence of LST as a function of viewing and illumination geometry. Two models are consistently analyzed to evaluate their performance and to assess their respective potential to correct directional effects on LST for a wide range of surface conditions in terms of tree coverage, vegetation density, and surface emissivity. We also propose an optimization of the correction of directional effects through a synergistic use of both models. The models are calibrated using LST data provided by two sensors: MODIS on board NASA's TERRA and AQUA, and SEVIRI on board EUMETSAT's MSG. As shown in our previous feasibility studies, the sampling of illumination and view angles has a high impact on the model parameters. This impact may be mitigated when the sample size is increased by aggregating pixels with similar surface conditions. Here we propose a methodology where the land surface is stratified by means of a cluster analysis using information on land cover type, fraction of vegetation cover and topography. The models are then adjusted to the LST data corresponding to each cluster. It is shown that the quality of the cluster-based models is very close to that of the pixel-based ones. Furthermore, the reduced number of parameters allows improving the model through the incorporation of a seasonal component. The application of the procedure discussed here towards the harmonization of LST products from multiple sensors has been tested within the framework of the ESA DUE GlobTemperature project. It is also expected to help the characterization of directional effects in LST products generated within the EUMETSAT LSA SAF.
Analysis of the structure and dynamics of human serum albumin.
Guizado, T R Cuya
2014-10-01
Human serum albumin (HSA) is a biologically relevant protein that binds a variety of drugs and other small molecules. No fewer than 50 structures are deposited in the RCSB Protein Data Bank (PDB). Based on these structures, we first performed a clustering analysis. Despite the diversity of ligands, only two well-defined conformations are detected, with a deviation of 0.46 nm between the average structures of the two clusters, while deviations within each cluster are smaller than 0.08 nm. These two conformations are representative of the apoprotein and the HSA-myristate complex already identified in previous literature. Considering the structures within each cluster as a representative sample of the dynamical states of the corresponding conformation, we scrutinize the structural and dynamical differences between the two conformations. Analysis of the fluctuations within each cluster set reveals that domain II is the most rigid one and best matches both structures. Then, taking this domain as reference, we show that the structural difference between the conformations can be expressed in terms of twist and hinge motions of domains I and III, respectively. We also characterize the dynamical difference between conformations by computing correlations and principal components for each set of dynamical states. The two conformations display different collective motions. The results are compared with those obtained from the trajectories of short molecular dynamics simulations, giving consistent outcomes. Beyond the relevance of the results for the structural and dynamical characterization of HSA conformations, the present methodology could be extended to other proteins in the PDB archive.
Foods are differentially associated with subjective effect report questions of abuse liability.
Schulte, Erica M; Smeal, Julia K; Gearhardt, Ashley N
2017-01-01
The current study investigates which foods may be most implicated in addictive-like eating by examining how nutritionally diverse foods relate to loss-of-control consumption and various subjective effect reports. Subjective effect reports assess the abuse liability of substances and may similarly provide insight into which foods may be reinforcing in a manner that triggers an addictive-like response for some individuals. Design: cross-sectional. Setting: online community. Participants: 507 recruited through Amazon MTurk (n = 501 used in analyses). Participants self-reported how likely they were to experience a loss of control over their consumption of 30 nutritionally diverse foods and rated each food on five subjective effect report questions that assess the abuse liability of substances (liking, pleasure, craving, averseness, intensity). Hierarchical cluster analytic techniques were used to examine how foods grouped together based on each question. Highly processed foods, with added fats and/or refined carbohydrates, clustered together and were associated with greater loss of control, liking, pleasure, and craving. The clusters yielded from the subjective effect reports assessing liking, pleasure, and craving were most similar to clusters formed based on loss of control over consumption, whereas the clusters yielded from averseness and intensity did not meaningfully differentiate food items. The present work applies methodology used to assess the abuse liability of substances to understand whether foods may vary in their potential to be associated with addictive-like consumption. Highly processed foods (e.g., pizza, chocolate) appear to be most related to an indicator of addictive-like eating (loss of control) and several subjective effect reports (liking, pleasure, craving). Thus, these foods may be particularly reinforcing and capable of triggering an addictive-like response in some individuals. Future research is warranted to understand whether highly processed foods are related to these indicators of abuse liability at a similar magnitude as addictive substances.
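The grouping step here is plain agglomerative clustering on per-food rating vectors. A toy sketch (random ratings in place of the study's data; Ward linkage and the two-cluster cut are illustrative choices, not the authors' settings):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
# 30 foods x mean ratings (loss of control, liking, pleasure, craving) - synthetic.
ratings = rng.random((30, 4))
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)

# Agglomerative (Ward) clustering of the foods, cut into two groups.
foods_cluster = fcluster(linkage(z, method="ward"), t=2, criterion="maxclust")
```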
Wide-Field Lensing Mass Maps from Dark Energy Survey Science Verification Data
Chang, C.
2015-07-29
We present a mass map reconstructed from weak gravitational lensing shear measurements over 139 deg2 from the Dark Energy Survey science verification data. The mass map probes both luminous and dark matter, thus providing a tool for studying cosmology. We also find good agreement between the mass map and the distribution of massive galaxy clusters identified using a red-sequence cluster finder. Potential candidates for superclusters and voids are identified using these maps. We measure the cross-correlation between the mass map and a magnitude-limited foreground galaxy sample and find a detection at the 6.8σ level with 20 arcmin smoothing. These measurements are consistent with simulated galaxy catalogs based on N-body simulations from a cold dark matter model with a cosmological constant. This suggests low systematics uncertainties in the map. Finally, we summarize our key findings in this Letter; the detailed methodology and tests for systematics are presented in a companion paper.
Failure Mode Identification Through Clustering Analysis
NASA Technical Reports Server (NTRS)
Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)
2002-01-01
Research has shown that nearly 80% of costs and problems are created in product development and that cost and quality are essentially designed into products at the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are used for quality control and for the detection of potential failure modes during the detail design stage or after product launch. Though all of these methods have their own advantages, they do not indicate the predominant failures a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, hypothesizing that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.
Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M
2010-01-01
Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.
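A hedged sketch of the tail end of this pipeline, assuming a voxels-by-timepoints matrix: PCA for dimensionality reduction, a minimum covariance determinant (MCD) fit for robust Mahalanobis distances, outlier removal, then k-means in the principal-components space. Data, component counts, and the outlier threshold are illustrative.

```python
# Minimal sketch of the PCA -> robust Mahalanobis -> k-means chain described above.
# The voxel time-series matrix is synthetic; thresholds are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.covariance import MinCovDet
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
voxels = rng.normal(size=(2000, 60))              # 2000 voxels x 60 time points (synthetic)

scores = PCA(n_components=5).fit_transform(voxels)
mcd = MinCovDet().fit(scores)                     # robust location/scatter estimate
d2 = mcd.mahalanobis(scores)                      # squared robust Mahalanobis distances
inliers = scores[d2 < np.quantile(d2, 0.975)]     # drop outlier time series

labels = KMeans(n_clusters=4, n_init=10).fit_predict(inliers)
```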
Wide-Field Lensing Mass Maps from Dark Energy Survey Science Verification Data.
Chang, C; Vikram, V; Jain, B; Bacon, D; Amara, A; Becker, M R; Bernstein, G; Bonnett, C; Bridle, S; Brout, D; Busha, M; Frieman, J; Gaztanaga, E; Hartley, W; Jarvis, M; Kacprzak, T; Kovács, A; Lahav, O; Lin, H; Melchior, P; Peiris, H; Rozo, E; Rykoff, E; Sánchez, C; Sheldon, E; Troxel, M A; Wechsler, R; Zuntz, J; Abbott, T; Abdalla, F B; Allam, S; Annis, J; Bauer, A H; Benoit-Lévy, A; Brooks, D; Buckley-Geer, E; Burke, D L; Capozzi, D; Carnero Rosell, A; Carrasco Kind, M; Castander, F J; Crocce, M; D'Andrea, C B; Desai, S; Diehl, H T; Dietrich, J P; Doel, P; Eifler, T F; Evrard, A E; Fausti Neto, A; Flaugher, B; Fosalba, P; Gruen, D; Gruendl, R A; Gutierrez, G; Honscheid, K; James, D; Kent, S; Kuehn, K; Kuropatkin, N; Maia, M A G; March, M; Martini, P; Merritt, K W; Miller, C J; Miquel, R; Neilsen, E; Nichol, R C; Ogando, R; Plazas, A A; Romer, A K; Roodman, A; Sako, M; Sanchez, E; Sevilla, I; Smith, R C; Soares-Santos, M; Sobreira, F; Suchyta, E; Tarle, G; Thaler, J; Thomas, D; Tucker, D; Walker, A R
2015-07-31
We present a mass map reconstructed from weak gravitational lensing shear measurements over 139 deg2 from the Dark Energy Survey science verification data. The mass map probes both luminous and dark matter, thus providing a tool for studying cosmology. We find good agreement between the mass map and the distribution of massive galaxy clusters identified using a red-sequence cluster finder. Potential candidates for superclusters and voids are identified using these maps. We measure the cross-correlation between the mass map and a magnitude-limited foreground galaxy sample and find a detection at the 6.8σ level with 20 arc min smoothing. These measurements are consistent with simulated galaxy catalogs based on N-body simulations from a cold dark matter model with a cosmological constant. This suggests low systematics uncertainties in the map. We summarize our key findings in this Letter; the detailed methodology and tests for systematics are presented in a companion paper.
NASA Astrophysics Data System (ADS)
Ehler, Martin; Rajapakse, Vinodh; Zeeberg, Barry; Brooks, Brian; Brown, Jacob; Czaja, Wojciech; Bonner, Robert F.
The gene networks underlying closure of the optic fissure during vertebrate eye development are poorly understood. We used a novel clustering method based on Laplacian Eigenmaps, a nonlinear dimension reduction method, to analyze microarray data from laser capture microdissected (LCM) cells at the site and developmental stages (days 10.5 to 12.5) of optic fissure closure. Our new method provided greater biological specificity than classical clustering algorithms in terms of identifying more biological processes and functions related to eye development as defined by Gene Ontology at lower false discovery rates. This new methodology builds on the advantages of LCM to isolate pure phenotypic populations within complex tissues and allows improved ability to identify critical gene products expressed at lower copy number. The combination of LCM of embryonic organs, gene expression microarrays, and extracting spatial and temporal co-variations appear to be a powerful approach to understanding the gene regulatory networks that specify mammalian organogenesis.
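A minimal sketch of the Laplacian Eigenmaps + clustering idea, using scikit-learn's SpectralEmbedding (an implementation of Laplacian Eigenmaps) on a synthetic genes-by-samples matrix; the paper's exact pipeline and parameters are not reproduced.

```python
# Hedged sketch: nonlinear dimension reduction via Laplacian Eigenmaps, then
# k-means on the embedding. Expression values are synthetic placeholders.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
expr = rng.normal(size=(500, 12))                 # 500 genes x 12 LCM samples (synthetic)

emb = SpectralEmbedding(n_components=3, affinity="nearest_neighbors",
                        n_neighbors=10).fit_transform(expr)
gene_clusters = KMeans(n_clusters=8, n_init=10).fit_predict(emb)
```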
Anomaly Detection of Electromyographic Signals.
Ijaz, Ahsan; Choi, Jongeun
2018-04-01
In this paper, we provide a robust framework to detect anomalous electromyographic (EMG) signals and identify contamination types. As a first step for feature selection, an optimally selected Lawton wavelet transform is applied. Robust principal component analysis (rPCA) is then performed on these wavelet coefficients to obtain features in a lower dimension. The rPCA-based features are used for constructing a self-organizing map (SOM). Finally, hierarchical clustering is applied on the SOM; it separates anomalous signals residing in the smaller clusters and breaks them into logical units for contamination identification. The proposed methodology is tested using synthetic and real-world EMG signals. The synthetic EMG signals are generated using a heteroscedastic process mimicking desired experimental setups. A sub-part of these synthetic signals is injected with anomalies. These experiments are followed by real EMG signals with synthetic anomalies introduced. Finally, a heterogeneous real-world data set with known quality issues is used under an unsupervised setting. The framework provides a recall of 90% (± 3.3) and a precision of 99% (± 0.4).
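An illustrative sketch of the flagging logic, under stated substitutions: Daubechies wavelets (via PyWavelets) stand in for the Lawton wavelets, and plain PCA stands in for robust PCA; signals, feature choices, and the small-cluster threshold are synthetic and illustrative.

```python
# Sketch: wavelet features per signal, a PCA reduction (standing in for rPCA),
# then hierarchical clustering in which unusually small clusters are flagged
# as anomalous. All data are synthetic.
import numpy as np
import pywt
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
signals = rng.normal(size=(100, 1024))            # 100 EMG recordings (synthetic)

feats = np.array([np.concatenate([np.abs(c).mean(keepdims=True)
                                  for c in pywt.wavedec(s, "db4", level=4)])
                  for s in signals])               # mean absolute coefficient per band
scores = PCA(n_components=3).fit_transform(feats)

labels = fcluster(linkage(scores, "ward"), t=6, criterion="maxclust")
sizes = np.bincount(labels)
anomalous = np.isin(labels, np.where(sizes < 5)[0])  # small clusters -> anomalies
```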
Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys
Hund, Lauren; Bedrick, Edward J.; Pagano, Marcello
2015-01-01
Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis. PMID:26125967
Class imbalance in unsupervised change detection - A diagnostic analysis from urban remote sensing
NASA Astrophysics Data System (ADS)
Leichtle, Tobias; Geiß, Christian; Lakes, Tobia; Taubenböck, Hannes
2017-08-01
Automatic monitoring of changes on the Earth's surface is an intrinsic capability and simultaneously a persistent methodological challenge in remote sensing, especially regarding imagery with very-high spatial resolution (VHR) and complex urban environments. In order to enable a high level of automatization, the change detection problem is solved in an unsupervised way to alleviate efforts associated with collection of properly encoded prior knowledge. In this context, this paper systematically investigates the nature and effects of class distribution and class imbalance in an unsupervised binary change detection application based on VHR imagery over urban areas. For this purpose, a diagnostic framework for sensitivity analysis of a large range of possible degrees of class imbalance is presented, which is of particular importance with respect to unsupervised approaches where the content of images and thus the occurrence and the distribution of classes are generally unknown a priori. Furthermore, this framework can serve as a general technique to evaluate model transferability in any two-class classification problem. The applied change detection approach is based on object-based difference features calculated from VHR imagery and subsequent unsupervised two-class clustering using k-means, genetic k-means and self-organizing map (SOM) clustering. The results from two test sites with different structural characteristics of the built environment demonstrated that classification performance is generally worse in imbalanced class distribution settings while best results were reached in balanced or close to balanced situations. Regarding suitable accuracy measures for evaluating model performance in imbalanced settings, this study revealed that the Kappa statistics show significant response to class distribution while the true skill statistic was widely insensitive to imbalanced classes. In general, the genetic k-means clustering algorithm achieved the most robust results with respect to class imbalance while the SOM clustering exhibited a distinct optimization towards a balanced distribution of classes.
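In the spirit of the diagnostic framework, a small sketch that sweeps the class ratio of a synthetic two-class change-detection problem, clusters with k-means, and contrasts Cohen's kappa with the true skill statistic (TSS = sensitivity + specificity − 1); distributions and ratios are illustrative.

```python
# Diagnostic sketch: vary class imbalance, cluster without labels, compare
# kappa against TSS. Feature distributions and fractions are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(4)
for frac_change in (0.5, 0.3, 0.1, 0.02):         # degree of class imbalance
    n = 5000
    n1 = int(n * frac_change)
    X = np.vstack([rng.normal(0, 1, (n - n1, 4)), rng.normal(2, 1, (n1, 4))])
    y = np.r_[np.zeros(n - n1, int), np.ones(n1, int)]

    pred = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    if (pred == y).mean() < 0.5:                  # align arbitrary cluster ids with classes
        pred = 1 - pred

    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    tss = tp / (tp + fn) + tn / (tn + fp) - 1
    print(frac_change, round(cohen_kappa_score(y, pred), 3), round(tss, 3))
```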
Anderson, Roy; Farrell, Sam; Turner, Hugo; Walson, Judd; Donnelly, Christl A; Truscott, James
2017-02-17
A method is outlined for the use of an individual-based stochastic model of parasite transmission dynamics to assess different designs for a cluster randomized trial in which mass drug administration (MDA) is employed in attempts to eliminate the transmission of soil-transmitted helminths (STH) in defined geographic locations. The hypothesis to be tested is: Can MDA alone interrupt the transmission of STH species in defined settings? Clustering is at a village level and the choice of clusters of villages is stratified by transmission intensity (low, medium and high) and parasite species mix (either Ascaris, Trichuris or hookworm dominant). The methodological approach first uses an age-structured deterministic model to predict the MDA coverage required for treating pre-school aged children (Pre-SAC), school aged children (SAC) and adults (Adults) to eliminate transmission (crossing the breakpoint in transmission created by sexual mating in dioecious helminths) with 3 rounds of annual MDA. Stochastic individual-based models are then used to calculate the positive and negative predictive values (PPV and NPV, respectively, for observing elimination or the bounce back of infection) for a defined prevalence of infection 2 years post the cessation of MDA. For the arm only involving the treatment of Pre-SAC and SAC, the failure rate is predicted to be very high (particularly for hookworm-infected villages) unless transmission intensity is very low (R0, or the effective reproductive number R, just above unity in value). The calculations are designed to consider various trial arms and stratifications; namely, community-based treatment and Pre-SAC and SAC only treatment (the two arms of the trial), different STH transmission settings of low, medium and high, and different STH species mixes. Results are considered in the light of the complications introduced by the choice of statistic to define success or failure, varying adherence to treatment, migration and parameter uncertainty.
Spadone, Sara; de Pasquale, Francesco; Mantini, Dante; Della Penna, Stefania
2012-09-01
Independent component analysis (ICA) is typically applied on functional magnetic resonance imaging, electroencephalographic and magnetoencephalographic (MEG) data due to its data-driven nature. In these applications, ICA needs to be extended from single to multi-session and multi-subject studies for interpreting and assigning a statistical significance at the group level. Here a novel strategy for analyzing MEG independent components (ICs) is presented, Multivariate Algorithm for Grouping MEG Independent Components K-means based (MAGMICK). The proposed approach is able to capture spatio-temporal dynamics of brain activity in MEG studies by running ICA at subject level and then clustering the ICs across sessions and subjects. Distinctive features of MAGMICK are: i) the implementation of an efficient set of "MEG fingerprints" designed to summarize properties of MEG ICs as they are built on spatial, temporal and spectral parameters; ii) the implementation of a modified version of the standard K-means procedure to improve its data-driven character. This algorithm groups the obtained ICs automatically estimating the number of clusters through an adaptive weighting of the parameters and a constraint on the ICs independence, i.e. components coming from the same session (at subject level) or subject (at group level) cannot be grouped together. The performances of MAGMICK are illustrated by analyzing two sets of MEG data obtained during a finger tapping task and median nerve stimulation. The results demonstrate that the method can extract consistent patterns of spatial topography and spectral properties across sessions and subjects that are in good agreement with the literature. In addition, these results are compared to those from a modified version of affinity propagation clustering method. The comparison, evaluated in terms of different clustering validity indices, shows that our methodology often outperforms the clustering algorithm. Eventually, these results are confirmed by a comparison with a MEG tailored version of the self-organizing group ICA, which is largely used for fMRI IC clustering. Copyright © 2012 Elsevier Inc. All rights reserved.
Analysis of the Seismicity Preceding Large Earthquakes
NASA Astrophysics Data System (ADS)
Stallone, A.; Marzocchi, W.
2016-12-01
The most common earthquake forecasting models assume that the magnitude of the next earthquake is independent of the past. This feature is probably one of the most severe limitations on the capability to forecast large earthquakes. In this work, we investigate this specific aspect empirically, exploring whether spatial-temporal variations in seismicity encode some information on the magnitude of future earthquakes. For this purpose, and to verify the universality of the findings, we consider seismic catalogs covering quite different space-time-magnitude windows, such as the Alto Tiberina Near Fault Observatory (TABOO) catalogue and the California and Japanese seismic catalogs. Our method is inspired by the statistical methodology proposed by Zaliapin (2013) to distinguish triggered and background earthquakes, using nearest-neighbor clustering analysis in a two-dimensional plane defined by rescaled time and space. In particular, we generalize the metric based on the nearest neighbor to a metric based on the k-nearest-neighbors clustering analysis, which allows us to consider the overall space-time-magnitude distribution of the k earthquakes (k-foreshocks) that anticipate one target event (the mainshock); we then analyze the statistical properties of the clusters identified in this rescaled space. In essence, the main goal of this study is to verify whether different classes of mainshock magnitude are characterized by distinctive k-foreshock distributions. The final step is to show how the findings of this work may (or may not) improve the skill of existing earthquake forecasting models.
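A hedged sketch of the Zaliapin-style nearest-neighbor proximity on which the generalization builds, assuming the commonly used form eta_ij = t_ij · r_ij^df · 10^(−b·m_i); the catalog, fractal dimension df, and b-value below are illustrative placeholders.

```python
# Sketch: for each event, find its nearest neighbor in rescaled time-space-
# magnitude. Catalog arrays are synthetic; df and b are illustrative.
import numpy as np

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0, 1000, 200))            # event times
xy = rng.uniform(0, 100, (200, 2))                # epicentres
m = rng.exponential(1.0, 200) + 2.0               # magnitudes
df, b = 1.6, 1.0                                  # fractal dimension, b-value (assumed)

parent = np.full(200, -1)
eta = np.full(200, np.inf)
for j in range(1, 200):
    dt = t[j] - t[:j]                             # interevent times to earlier events
    r = np.hypot(*(xy[j] - xy[:j]).T)             # epicentral distances
    e = dt * np.maximum(r, 1e-3) ** df * 10.0 ** (-b * m[:j])
    parent[j], eta[j] = np.argmin(e), e.min()     # nearest neighbor in rescaled space
```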
A study on phenomenology of Dhat syndrome in men in a general medical setting.
Prakash, Sathya; Sharan, Pratap; Sood, Mamta
2016-01-01
"Dhat syndrome" is believed to be a culture-bound syndrome of the Indian subcontinent. Although many studies have been performed, many have methodological limitations and there is a lack of agreement in many areas. The aim is to study the phenomenology of "Dhat syndrome" in men and to explore the possibility of subtypes within this entity. It is a cross-sectional descriptive study conducted at a sex and marriage counseling clinic of a tertiary care teaching hospital in Northern India. An operational definition and assessment instrument for "Dhat syndrome" was developed after taking all concerned stakeholders into account and review of literature. It was applied on 100 patients along with socio-demographic profile, Hamilton Depression Rating Scale, Hamilton Anxiety Rating Scale, Mini International Neuropsychiatric Interview, and Postgraduate Institute Neuroticism Scale. For statistical analysis, descriptive statistics, group comparisons, and Pearson's product moment correlations were carried out. Factor analysis and cluster analysis were done to determine the factor structure and subtypes of "Dhat syndrome." A diagnostic and assessment instrument for "Dhat syndrome" has been developed and the phenomenology in 100 patients has been described. Both the health beliefs scale and associated symptoms scale demonstrated a three-factor structure. The patients with "Dhat syndrome" could be categorized into three clusters based on severity. There appears to be a significant agreement among various stakeholders on the phenomenology of "Dhat syndrome" although some differences exist. "Dhat syndrome" could be subtyped into three clusters based on severity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sahoo, Satiprasad; Dhar, Anirban, E-mail: anirban.dhar@gmail.com; Kar, Amlanjyoti
Environmental management of an area describes a policy for its systematic and sustainable environmental protection. In the present study, regional environmental vulnerability assessment in the Hirakud command area of Odisha, India is envisaged based on the Grey Analytic Hierarchy Process method (Grey–AHP) using integrated remote sensing (RS) and geographic information system (GIS) techniques. Grey–AHP combines the advantages of the classical analytic hierarchy process (AHP) and the grey clustering method for accurate estimation of weight coefficients. It is a new method for environmental vulnerability assessment. The environmental vulnerability index (EVI) uses natural, environmental and human impact related factors, e.g., soil, geology, elevation, slope, rainfall, temperature, wind speed, normalized difference vegetation index, drainage density, crop intensity, agricultural DRASTIC value, population density and road density. The EVI map has been classified into four environmental vulnerability zones (EVZs), namely 'low', 'moderate', 'high', and 'extreme', encompassing 17.87%, 44.44%, 27.81% and 9.88% of the study area, respectively. The EVI map indicates that the northern part of the study area is more vulnerable from an environmental point of view. The EVI map shows close correlation with elevation. The effectiveness of the zone classification is evaluated using the grey clustering method. General effectiveness lies between the "better" and "common" classes. This analysis demonstrates the potential applicability of the methodology. - Highlights: • Environmental vulnerability zone identification based on Grey Analytic Hierarchy Process (AHP) • The effectiveness evaluation by means of a grey clustering method with support from AHP • Use of grey approach eliminates the excessive dependency on the experience of experts.
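As background to the Grey–AHP weighting step, a minimal sketch of the classical-AHP computation of weight coefficients from the principal eigenvector of a pairwise-comparison matrix; the 3×3 matrix and factor choices are purely illustrative.

```python
# Sketch: classical-AHP weights as the normalized principal eigenvector of a
# reciprocal pairwise-comparison matrix (e.g., rainfall vs. slope vs. drainage
# density). The comparison values are illustrative placeholders.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])                   # reciprocal pairwise comparisons

vals, vecs = np.linalg.eig(A)
w = np.abs(vecs[:, np.argmax(vals.real)].real)
w /= w.sum()                                      # normalized weight coefficients

ci = (vals.real.max() - len(A)) / (len(A) - 1)    # consistency index of the judgments
print(w, ci)
```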
NASA Astrophysics Data System (ADS)
Micheletti, Natan; Tonini, Marj; Lane, Stuart N.
2017-02-01
Acquisition of high density point clouds using terrestrial laser scanners (TLSs) has become commonplace in geomorphic science. The derived point clouds are often interpolated onto regular grids and the grids compared to detect change (i.e. erosion and deposition/advancement movements). This procedure is necessary for some applications (e.g. digital terrain analysis), but it inevitably leads to a certain loss of potentially valuable information contained within the point clouds. In the present study, an alternative methodology for geomorphological analysis and feature detection from point clouds is proposed. It rests on the use of Density-Based Spatial Clustering of Applications with Noise (DBSCAN), applied to TLS data for a rock glacier front slope in the Swiss Alps. The proposed method allows movements to be detected and isolated directly from the point clouds, yielding volume computations whose accuracy depends only on the actual registered distance between points. We demonstrate that these values are more conservative than volumes computed with the traditional DEM comparison. The results are illustrated for the summer of 2015, a season of enhanced geomorphic activity associated with exceptionally high temperatures.
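A minimal sketch of the core clustering step, assuming scikit-learn's DBSCAN applied directly to x, y, z point coordinates; eps and min_samples are illustrative and would in practice be tuned to the registered point spacing.

```python
# Sketch: DBSCAN on a 3-D point cloud so that dense groups of displaced points
# are isolated as candidate change features. The cloud is synthetic.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(6)
cloud = rng.uniform(0, 10, (5000, 3))             # synthetic TLS points (x, y, z)

labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(cloud)
features = [cloud[labels == k] for k in set(labels) if k != -1]  # -1 marks noise
```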
Gulliford, Martin C; van Staa, Tjeerd P; McDermott, Lisa; McCann, Gerard; Charlton, Judith; Dregan, Alex
2014-06-11
There is growing interest in conducting clinical and cluster randomized trials through electronic health records. This paper reports on the methodological issues identified during the implementation of two cluster randomized trials using the electronic health records of the Clinical Practice Research Datalink (CPRD). Two trials were completed in primary care: one aimed to reduce inappropriate antibiotic prescribing for acute respiratory infection; the other aimed to increase physician adherence with secondary prevention interventions after first stroke. The paper draws on documentary records and trial datasets to report on the methodological experience with respect to research ethics and research governance approval, general practice recruitment and allocation, sample size calculation and power, intervention implementation, and trial analysis. We obtained research governance approvals from more than 150 primary care organizations in England, Wales, and Scotland. There were 104 CPRD general practices recruited to the antibiotic trial and 106 to the stroke trial, with the target number of practices being recruited within six months. Interventions were installed into practice information systems remotely over the internet. The mean number of participants per practice was 5,588 in the antibiotic trial and 110 in the stroke trial, with the coefficients of variation of practice sizes being 0.53 and 0.56 respectively. Outcome measures showed substantial correlations between the 12 months before and after intervention, with coefficients ranging from 0.42 for diastolic blood pressure to 0.91 for the proportion of consultations with antibiotics prescribed. Defining practice and participant eligibility for analysis requires careful consideration. Cluster randomized trials may be performed efficiently in large samples from UK general practices using the electronic health records of a primary care database. The geographical dispersal of trial sites presents a difficulty for research governance approval and intervention implementation. Pretrial data analyses should inform trial design and analysis plans. Current Controlled Trials ISRCTN 47558792 and ISRCTN 35701810 (both registered on 17 March 2010). PMID:24919485
Localization-based super-resolution imaging meets high-content screening.
Beghin, Anne; Kechkar, Adel; Butler, Corey; Levet, Florian; Cabillic, Marine; Rossier, Olivier; Giannone, Gregory; Galland, Rémi; Choquet, Daniel; Sibarita, Jean-Baptiste
2017-12-01
Single-molecule localization microscopy techniques have proven to be essential tools for quantitatively monitoring biological processes at unprecedented spatial resolution. However, these techniques are very low throughput and are not yet compatible with fully automated, multiparametric cellular assays. This shortcoming is primarily due to the huge amount of data generated during imaging and the lack of software for automation and dedicated data mining. We describe an automated quantitative single-molecule-based super-resolution methodology that operates in standard multiwell plates and uses analysis based on high-content screening and data-mining software. The workflow is compatible with fixed- and live-cell imaging and allows extraction of quantitative data like fluorophore photophysics, protein clustering or dynamic behavior of biomolecules. We demonstrate that the method is compatible with high-content screening using 3D dSTORM and DNA-PAINT based super-resolution microscopy as well as single-particle tracking.
Proctor, Enola; Luke, Douglas; Calhoun, Annaliese; McMillen, Curtis; Brownson, Ross; McCrary, Stacey; Padek, Margaret
2015-06-11
Little is known about how well or under what conditions health innovations are sustained and their gains maintained once they are put into practice. Implementation science typically focuses on uptake by early adopters of one healthcare innovation at a time. The later-stage challenges of scaling up and sustaining evidence-supported interventions receive too little attention. This project identifies the challenges associated with sustainability research and generates recommendations for accelerating and strengthening this work. A multi-method, multi-stage approach was used: (1) identifying and recruiting experts in sustainability as participants, (2) conducting research on sustainability using concept mapping, (3) action planning during an intensive working conference of sustainability experts to expand the concept mapping quantitative results, and (4) consolidating results into a set of recommendations for research, methodological advances, and infrastructure building to advance understanding of sustainability. Participants comprised researchers, funders, and leaders in health, mental health, and public health with shared interest in the sustainability of evidence-based health care. Prompted to identify important issues for sustainability research, participants generated 91 distinct statements, for which a concept mapping process produced 11 conceptually distinct clusters. During the conference, participants built upon the concept mapping clusters to generate recommendations for sustainability research. The recommendations fell into three domains: (1) pursue high priority research questions as a unified agenda on sustainability; (2) advance methods for sustainability research; (3) advance infrastructure to support sustainability research. Implementation science needs to pursue later-stage translation research questions required for population impact. Priorities include conceptual consistency and operational clarity for measuring sustainability, developing evidence about the value of sustaining interventions over time, identifying correlates of sustainability along with strategies for sustaining evidence-supported interventions, advancing the theoretical base and research designs for sustainability research, and advancing the workforce capacity, research culture, and funding mechanisms for this important work.
Incremental Support Vector Machine Framework for Visual Sensor Networks
NASA Astrophysics Data System (ADS)
Awad, Mariette; Jiang, Xianhua; Motai, Yuichi
2006-12-01
Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of the least squares SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and an online learning phase during which the cluster head performs an ensemble of model aggregations based on the sensor nodes inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single camera sensing especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows an adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system which makes it even more attractive for distributed sensor networks communication.
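A compact sketch of the LS-SVM building block that the incremental framework adapts: in the least-squares formulation, the dual problem reduces to a single linear system rather than a quadratic program. The RBF kernel width and regularization below are illustrative.

```python
# Sketch: batch LS-SVM fit via one linear solve. Binary labels in {-1, +1};
# kernel width and gamma_reg are illustrative hyperparameters.
import numpy as np

def lssvm_fit(X, y, width=1.0, gamma_reg=10.0):
    n = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * width ** 2))            # RBF kernel matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = 1.0, 1.0
    A[1:, 1:] = K + np.eye(n) / gamma_reg         # regularized kernel block
    sol = np.linalg.solve(A, np.r_[0.0, y])
    return sol[0], sol[1:]                        # bias b, dual coefficients alpha

rng = np.random.default_rng(13)
X = rng.normal(size=(40, 2))
b, alpha = lssvm_fit(X, np.where(X[:, 0] > 0, 1.0, -1.0))
```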
NASA Astrophysics Data System (ADS)
Peruchena, Carlos M. Fernández; García-Barberena, Javier; Guisado, María Vicenta; Gastón, Martín
2016-05-01
The design of Concentrating Solar Thermal Power (CSTP) systems requires a detailed knowledge of the dynamic behavior of the meteorology at the site of interest. Meteorological series are often condensed into one representative year, defined as a Typical Meteorological Year (TMY), with the aim of reducing data volume and speeding up energy system simulations. This approach seems appropriate for rather detailed simulations of a specific plant; however, in earlier stages of the design of a power plant, especially during the optimization of the large number of plant parameters before a final design is reached, a huge number of simulations is needed. Even with today's technology, the computational effort to simulate solar energy system performance with one year of data at high frequency (such as 1-min) may become colossal if a multivariable optimization has to be performed. This work presents a simple and efficient methodology for selecting a reduced number of individual days able to represent the electrical production of the plant throughout the complete year. To achieve this objective, a new procedure for determining a reduced set of typical weather data for evaluating the long-term performance of a solar energy system is proposed. The proposed methodology is based on cluster analysis and drastically reduces the computational effort of calculating a CSTP plant's energy yield by simulating a reduced number of days from a high-frequency TMY.
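A hedged sketch of the day-selection idea: cluster the year's daily irradiance profiles with k-means and keep, for each cluster, the real day nearest the centroid together with a weight equal to the cluster's share of the year. Profiles and cluster count are illustrative stand-ins for 1-min TMY data.

```python
# Sketch: representative-day selection by k-means over daily DNI profiles.
# The profiles are synthetic placeholders for one-minute TMY data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
days = rng.uniform(0, 1000, (365, 1440))          # 365 days x 1440 one-minute values

km = KMeans(n_clusters=12, n_init=10).fit(days)
weights = np.bincount(km.labels_) / 365.0         # fraction of the year each day represents
idx = [np.argmin(((days - c) ** 2).sum(1) + np.where(km.labels_ == k, 0, np.inf))
       for k, c in enumerate(km.cluster_centers_)]  # nearest real day per cluster
```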
Space Launch System Base Heating Test: Experimental Operations & Results
NASA Technical Reports Server (NTRS)
Dufrene, Aaron; Mehta, Manish; MacLean, Matthew; Seaford, Mark; Holden, Michael
2016-01-01
NASA's Space Launch System (SLS) uses four clustered liquid rocket engines along with two solid rocket boosters. The interaction between all six rocket exhaust plumes will produce a complex and severe thermal environment in the base of the vehicle. This work focuses on a recent 2% scale, hot-fire SLS base heating test. These base heating tests are short-duration tests executed with chamber pressures near the full-scale values with gaseous hydrogen/oxygen engines and RSRMV analogous solid propellant motors. The LENS II shock tunnel/Ludwieg tube tunnel was used at or near flight duplicated conditions up to Mach 5. Model development was based on the Space Shuttle base heating tests with several improvements including doubling of the maximum chamber pressures and duplication of freestream conditions. Test methodology and conditions are presented, and base heating results from 76 runs are reported in non-dimensional form. Regions of high heating are identified and comparisons of various configuration and conditions are highlighted. Base pressure and radiometer results are also reported.
A harmonic linear dynamical system for prominent ECG feature extraction.
Thi, Ngoc Anh Nguyen; Yang, Hyung-Jeong; Kim, SunHee; Do, Luu Ngoc
2014-01-01
Unsupervised mining of electrocardiography (ECG) time series is a crucial task in biomedical applications. To obtain efficient clustering results, the prominent features extracted by preprocessing analysis of multiple ECG time series need to be investigated. In this paper, a Harmonic Linear Dynamical System is applied to discover vital prominent features by mining the evolving hidden dynamics and correlations in ECG time series. The comprehensible and interpretable features discovered by the proposed feature extraction methodology effectively support the accuracy and reliability of the clustering results. In particular, the empirical evaluation results of the proposed method demonstrate improved clustering performance compared to the previous mainstream feature extraction approaches for ECG time series clustering tasks. Furthermore, the experimental results on real-world datasets show scalability, with computation time linear in the duration of the time series.
NASA Astrophysics Data System (ADS)
Ermida, Sofia; DaCamara, Carlos C.; Trigo, Isabel F.; Pires, Ana C.; Ghent, Darren
2017-04-01
Land Surface Temperature (LST) is a key climatological variable and a diagnostic parameter of land surface conditions. Remote sensing constitutes the most effective method to observe LST over large areas and on a regular basis. Although LST estimation from remote sensing instruments operating in the Infrared (IR) is widely used and has been performed for nearly 3 decades, there is still a list of open issues. One of these is the LST dependence on viewing and illumination geometry. This effect introduces significant discrepancies among LST estimations from different sensors, overlapping in space and time, that are not related to uncertainties in the methodologies or input data used. Furthermore, these directional effects deviate LST products from an ideally defined LST, which should represent the ensemble of directional radiometric temperatures of all surface elements within the FOV. Angular effects on LST are here conveniently estimated by means of a kernel model of the surface thermal emission, which describes the angular dependence of LST as a function of viewing and illumination geometry. The model is calibrated using LST data as provided by a wide range of sensors to optimize spatial coverage, namely: 1) a LEO sensor - the Moderate Resolution Imaging Spectroradiometer (MODIS) on-board NASA's TERRA and AQUA; and 2) 3 GEO sensors - the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on-board EUMETSAT's Meteosat Second Generation (MSG), the Japanese Meteorological Imager (JAMI) on-board the Japanese Meteorological Association (JMA) Multifunction Transport SATellite (MTSAT-2), and NASA's Geostationary Operational Environmental Satellites (GOES). As shown in our previous feasibility studies, the sampling of illumination and view angles has a high impact on the obtained model parameters. This impact may be mitigated when the sample size is increased by aggregating pixels with similar surface conditions. Here we propose a methodology where the land surface is stratified by means of a cluster analysis using information on land cover type, fraction of vegetation cover and topography. The kernel model is then adjusted to LST data corresponding to each cluster. It is shown that the quality of the cluster-based kernel model is very close to that of the pixel-based one. Furthermore, the reduced number of parameters (limited to the number of identified clusters, instead of a pixel-by-pixel model calibration) allows the kernel model to be improved through the incorporation of a seasonal component. The application of the procedure discussed here towards the harmonization of multi-sensor LST products is carried out within the framework of the ESA DUE GlobTemperature project.
VariantSpark: population scale clustering of genotype information.
O'Brien, Aidan R; Saunders, Neil F W; Guo, Yi; Buske, Fabian A; Scott, Rodney J; Bauer, Denis C
2015-12-10
Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the Map-Reduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VARIANTSPARK, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. To demonstrate the capabilities of VARIANTSPARK, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VARIANTSPARK is 80% faster than the SPARK-based genome clustering approach ADAM and the comparable implementation using Hadoop/Mahout, as well as ADMIXTURE, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. The benefits in speed, resource consumption and scalability enable VARIANTSPARK to open up the usage of advanced, efficient machine learning algorithms to genomic data.
Application of a clustering-remote sensing method in analyzing security patterns
NASA Astrophysics Data System (ADS)
López-Caloca, Alejandra; Martínez-Viveros, Elvia; Chapela-Castañares, José Ignacio
2009-04-01
In Mexican academic and government circles, research on criminal spatial behavior has been neglected. Only recently has there been an interest in geo-referencing criminal data. However, more sophisticated spatial analysis models are needed to disclose spatial patterns of crime and pinpoint their changes over time. The main use of these models lies in supporting policy making and strategic intelligence. In this paper we present a model for finding patterns associated with crime. It is based on a fuzzy logic algorithm which finds the best fit among cluster numbers and shapes of groupings. We describe the methodology for building the model and its validation. The model was applied to annual data for types of felonies from 2005 to 2006 in the Mexican city of Hermosillo. The results are visualized as a standard deviational ellipse computed for the points identified as a "cluster". These areas indicate high to low demand for public security, and they were cross-related to urban structure analyzed through SPOT images and statistical data such as population, poverty levels, urbanization, and available services. The fusion of the model results with other geospatial data allows the detection of obstacles and opportunities for crime commission in specific high-risk zones and can guide police activities and criminal investigations.
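For illustration, a minimal fuzzy c-means sketch (the fuzzy-clustering family used here; the paper's algorithm additionally searches over cluster numbers and shapes), with synthetic points standing in for geocoded incidents.

```python
# Sketch: plain fuzzy c-means in NumPy. X holds point coordinates; c, m, and
# the iteration count are illustrative choices.
import numpy as np

def fcm(X, c=3, m=2.0, iters=100):
    rng = np.random.default_rng(8)
    U = rng.random((len(X), c)); U /= U.sum(1, keepdims=True)  # random memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(0)[:, None]              # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=-1) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                         # membership update
        U /= U.sum(1, keepdims=True)
    return centers, U                                          # centers, fuzzy memberships

pts = np.random.default_rng(14).random((500, 2)) * 10          # synthetic incident coordinates
centers, U = fcm(pts, c=3)
```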
NASA Astrophysics Data System (ADS)
Ryu, Hoon; Jeong, Yosang; Kang, Ji-Hoon; Cho, Kyu Nam
2016-12-01
Modelling of multi-million atomic semiconductor structures is important as it not only predicts properties of physically realizable novel materials, but can accelerate advanced device designs. This work elaborates a new Technology-Computer-Aided-Design (TCAD) tool for nanoelectronics modelling, which uses a sp3d5s* tight-binding approach to describe multi-million atomic structures, and simulates electronic structures with high performance computing (HPC), including atomic effects such as alloy and dopant disorders. Named the Quantum simulation tool for Advanced Nanoscale Devices (Q-AND), the tool shows nice scalability on traditional multi-core HPC clusters, implying a strong capability for large-scale electronic structure simulations, with particularly remarkable performance enhancement on the latest clusters of Intel Xeon Phi™ coprocessors. A review of the recent modelling study conducted to understand an experimental work on highly phosphorus-doped silicon nanowires is presented to demonstrate the utility of Q-AND. Having been developed via an Intel Parallel Computing Center project, Q-AND will be open to the public to establish a sound framework of nanoelectronics modelling with advanced HPC clusters of a many-core base. With details of the development methodology and an exemplary study of dopant electronics, this work presents a practical guideline for TCAD development to researchers in the field of computational nanoelectronics.
The distribution of mass for spiral galaxies in clusters and in the field
DOE Office of Scientific and Technical Information (OSTI.GOV)
Forbes, D.A.; Whitmore, B.C.
1989-04-01
A comparison is made between the mass distributions of spiral galaxies in clusters and in the field using Burstein's mass-type methodology. Both the H-alpha emission-line rotation curves and more extended H I rotation curves are used. The fitting technique for determining mass types used by Burstein and coworkers has been replaced by an objective chi-square method. Mass types are shown to be a function of both Hubble type and luminosity, contrary to earlier results. The present data show a difference in the distribution of mass types for spiral galaxies in the field and in clusters, in the sense that mass type I galaxies, where the inner and outer velocity gradients are similar, are generally found in the field rather than in clusters. This can be understood in terms of the results of Whitmore, Forbes, and Rubin (1988), who find that the rotation curves of galaxies in the central region of clusters are generally falling, while the outer galaxies in a cluster and field galaxies tend to have flat or rising rotation curves. 15 refs.
Py, Béatrice; Barras, Frédéric
2015-06-01
Since their discovery in the 50's, Fe-S cluster proteins have attracted much attention from chemists, biophysicists and biochemists. However, in the 80's they were joined by geneticists who helped to realize that in vivo maturation of Fe-S cluster bound proteins required assistance of a large number of factors defining complex multi-step pathways. The question of how clusters are formed and distributed in vivo has since been the focus of much effort. Here we review how genetics in discovering genes and investigating processes as they unfold in vivo has provoked seminal advances toward our understanding of Fe-S cluster biogenesis. The power and limitations of genetic approaches are discussed. As a final comment, we argue how the marriage of classic strategies and new high-throughput technologies should allow genetics of Fe-S cluster biology to be even more insightful in the future. This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases. Copyright © 2015 Elsevier B.V. All rights reserved.
Spatial Patterns of High Aedes aegypti Oviposition Activity in Northwestern Argentina
Estallo, Elizabet Lilia; Más, Guillermo; Vergara-Cid, Carolina; Lanfri, Mario Alberto; Ludueña-Almeida, Francisco; Scavuzzo, Carlos Marcelo; Introini, María Virginia; Zaidenberg, Mario; Almirón, Walter Ricardo
2013-01-01
Background In Argentina, dengue has affected mainly the Northern provinces, including Salta. The objective of this study was to analyze the spatial patterns of high Aedes aegypti oviposition activity in San Ramón de la Nueva Orán, northwestern Argentina. The location of clusters as hot spot areas should help control programs to identify priority areas and allocate their resources more effectively. Methodology Oviposition activity was detected in Orán City (Salta province) using ovitraps, replaced weekly (October 2005–2007). Spatial autocorrelation was measured with Moran's Index and depicted through cluster maps to identify hot spots. Total egg numbers were spatially interpolated and a classified map of Ae. aegypti high oviposition activity areas was produced. Potential breeding and resting (PBR) sites were geo-referenced. A logistic regression analysis of interpolated egg numbers and PBR location was performed to generate a predictive map of mosquito oviposition activity. Principal Findings Both the cluster maps and the predictive map were consistent, identifying high Ae. aegypti oviposition activity in the central and southern areas of the city. A logistic regression model was successfully developed to predict Ae. aegypti oviposition activity based on distance to PBR sites, with tire dumps having the strongest association with mosquito oviposition activity. A predictive map reflecting the probability of oviposition activity was produced. The predictive map delimited an area of maximum probability of Ae. aegypti oviposition activity in the south of Orán city where tire dumps predominate. The overall fit of the model was acceptable (ROC = 0.77), obtaining a sensitivity of 99% and a specificity of 75.29%. Conclusions Distance to tire dumps is inversely associated with high mosquito activity, allowing us to identify hot spots. These methodologies are useful for prevention, surveillance, and control of tropical vector-borne diseases and might assist the National Health Ministry to focus resources more effectively. PMID:23349813
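A hedged sketch of the spatial-autocorrelation statistic used in the Methodology: Moran's I over trap egg counts, with a simple inverse-distance weight matrix standing in for the study's spatial weights.

```python
# Sketch: global Moran's I for egg counts at trap locations. Coordinates,
# counts, and the weighting scheme are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(9)
xy = rng.uniform(0, 10, (50, 2))                  # ovitrap coordinates (synthetic)
eggs = rng.poisson(30, 50).astype(float)          # weekly egg counts (synthetic)

d = np.linalg.norm(xy[:, None] - xy[None], axis=-1)
W = np.where(d > 0, 1.0 / d, 0.0)                 # inverse-distance weights, zero diagonal

z = eggs - eggs.mean()
I = len(eggs) / W.sum() * (W * np.outer(z, z)).sum() / (z ** 2).sum()
print(I)                                          # I > E[I] = -1/(n-1) suggests clustering
```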
SOTXTSTREAM: Density-based self-organizing clustering of text streams.
Bryant, Avory C; Cios, Krzysztof J
2017-01-01
A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Goodwin, Cody R; Sherrod, Stacy D; Marasco, Christina C; Bachmann, Brian O; Schramm-Sapyta, Nicole; Wikswo, John P; McLean, John A
2014-07-01
A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.
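A minimal sketch of the SOM step, assuming the third-party minisom package and a samples-by-metabolites intensity matrix; the trained map's node assignments supply the unsupervised grouping behind the heat-map comparisons.

```python
# Sketch: train a small self-organizing map on metabolite profiles and record
# each sample's winning node. Data, map size, and training length are illustrative.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(10)
profiles = rng.normal(size=(80, 200))             # 80 sera samples x 200 features (synthetic)

som = MiniSom(6, 6, 200, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(profiles, 5000)
nodes = np.array([som.winner(p) for p in profiles])  # map coordinates per sample
```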
Functional interrogation of non-coding DNA through CRISPR genome editing
Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.
2017-01-01
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
The study of direct-to-consumer advertising for prescription drugs.
Schommer, Jon C; Hansen, Richard A
2005-06-01
The objectives of this article are to (1) identify key methodological issues related to investigating the effects of direct-to-consumer advertising (DTCA) for prescription drugs, (2) highlight opportunities and challenges that these issues pose, and (3) provide suggestions to address these challenges and opportunities from a social and administrative pharmacy perspective. Through a review of existing literature and consultation with research colleagues, we identified 3 broad issues regarding the study of DTCA for prescription drugs: (1) the importance of problem formulation, (2) the role of health behavior and decision-making perspectives, and (3) data collection and data analysis challenges and opportunities. Based upon our findings, we developed recommendations for future research in this area. Clear problem formulation will be instructive for prioritizing research needs and for determining the role that health behavior and decision-making perspectives can serve in DTCA research. In addition, it appears that cluster bias, nonlinear relationships, mediating/moderating effects, time effects, acquiescent response, and case mix are particularly salient challenges for the DTCA research domain. We suggest that problem formulation, selection of sound theories upon which to base research, and data collection and data analysis challenges are key methodological issues related to investigating the effects of DTCA for prescription drugs.
Consensus-based methodology for detection communities in multilayered networks
NASA Astrophysics Data System (ADS)
Karimi-Majd, Amir-Mohsen; Fathian, Mohammad; Makrehchi, Masoud
2018-03-01
Finding groups of network users who are densely related with each other has emerged as an interesting problem in the area of social network analysis. These groups, or so-called communities, would be hidden behind the behavior of users. Most studies assume that such behavior could be understood by focusing on user interfaces, their behavioral attributes or a combination of these network layers (i.e., interfaces with their attributes). They also assume that all network layers refer to the same behavior. However, in real-life networks, users' behavior in one layer may differ from their behavior in another one. In order to cope with these issues, this article proposes a consensus-based community detection approach (CBC). CBC finds communities among nodes at each layer, in parallel. Then, the results of the layers are aggregated using a consensus clustering method. This means that different behaviors can be detected and used in the analysis. Another significant advantage is that the methodology is able to handle missing values. Three experiments on real-life and computer-generated datasets have been conducted in order to evaluate the performance of CBC. The results indicate the superiority and stability of CBC in comparison to other approaches.
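A hedged sketch of the aggregation step: per-layer partitions are combined into a co-association (consensus) matrix, which is converted to a distance and cut hierarchically; the per-layer labelings here are random placeholders.

```python
# Sketch: consensus clustering via a co-association matrix over layer partitions.
# Layer labelings are synthetic; the final cluster count is illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(11)
layers = [rng.integers(0, 4, 100) for _ in range(3)]   # one partition per network layer

C = np.mean([(L[:, None] == L[None, :]).astype(float) for L in layers], axis=0)
D = 1.0 - C                                            # co-association -> distance
np.fill_diagonal(D, 0.0)
consensus = fcluster(linkage(squareform(D), "average"), t=4, criterion="maxclust")
```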
Nath, Anjali K.; Krauthammer, Michael; Li, Puyao; Davidov, Eugene; Butler, Lucas C.; Copel, Joshua; Katajamaa, Mikko; Oresic, Matej; Buhimschi, Irina; Buhimschi, Catalin; Snyder, Michael; Madri, Joseph A.
2009-01-01
Background Cardiovascular development is vital for embryonic survival and growth. Early gestation embryo loss or malformation has been linked to yolk sac vasculopathy and congenital heart defects (CHDs). However, the molecular pathways that underlie these structural defects in humans remain largely unknown hindering the development of molecular-based diagnostic tools and novel therapies. Methodology/Principal Findings Murine embryos were exposed to high glucose, a condition known to induce cardiovascular defects in both animal models and humans. We further employed a mass spectrometry-based proteomics approach to identify proteins differentially expressed in embryos with defects from those with normal cardiovascular development. The proteins detected by mass spectrometry (WNT16, ST14, Pcsk1, Jumonji, Morca2a, TRPC5, and others) were validated by Western blotting and immunoflorescent staining of the yolk sac and heart. The proteins within the proteomic dataset clustered to adhesion/migration, differentiation, transport, and insulin signaling pathways. A functional role for several proteins (WNT16, ADAM15 and NOGO-A/B) was demonstrated in an ex vivo model of heart development. Additionally, a successful application of a cluster of protein biomarkers (WNT16, ST14 and Pcsk1) as a prenatal screen for CHDs was confirmed in a study of human amniotic fluid (AF) samples from women carrying normal fetuses and those with CHDs. Conclusions/Significance The novel finding that WNT16, ST14 and Pcsk1 protein levels increase in fetuses with CHDs suggests that these proteins may play a role in the etiology of human CHDs. The information gained through this bed-side to bench translational approach contributes to a more complete understanding of the protein pathways dysregulated during cardiovascular development and provides novel avenues for diagnostic and therapeutic interventions, beneficial to fetuses at risk for CHDs. PMID:19156209
Which catchment characteristics control the temporal dependence structure of daily river flows?
NASA Astrophysics Data System (ADS)
Chiverton, Andrew; Hannaford, Jamie; Holman, Ian; Corstanje, Ron; Prudhomme, Christel; Bloomfield, John; Hess, Tim
2014-05-01
A hydrological classification system would provide information about the dominant processes in a catchment, enabling information to be transferred between catchments. Currently there is no widely agreed-upon system for classifying river catchments. This paper develops a novel approach to assess the influence that catchment characteristics have on the precipitation-to-flow relationship, using a catchment classification based on the average temporal dependence structure in daily river flow data over the period 1980 to 2010. Temporal dependence in river flow data is driven by the flow pathways, connectivity and storage within the catchment. Temporal dependence was analysed by creating temporally averaged semi-variograms for a set of 116 near-natural catchments (chosen to prevent direct anthropogenic disturbances from influencing the results) distributed throughout the UK. Cluster analysis, using the variogram, classified the catchments into four well-defined clusters driven by the interaction of catchment characteristics, predominantly characteristics which influence the precipitation-to-flow relationship. Geology, depth to the gleyed layer in soils, catchment slope and the percentage of arable land were significantly different between the clusters. These characteristics drive the temporal dependence structure by influencing the rate at which water moves through the catchment and/or the storage in the catchment. Arable land is correlated with several other variables and hence acts as a proxy for the residence time of water in the catchment. Finally, quadratic discriminant analysis was used to show that a model with five catchment characteristics is able to predict the temporal dependence structure for ungauged catchments. This work demonstrates that a variogram-based approach is a powerful and flexible methodology for grouping catchments based on the precipitation-to-flow relationship, and it could be applied to any set of catchments with a relatively complete daily river flow record.
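For readers who want to experiment, a minimal Python sketch of a temporally averaged semi-variogram for a daily flow series; the simulated series, lag range and lack of standardization are illustrative assumptions rather than the authors' exact estimator.

    import numpy as np

    def temporal_semivariogram(flow, max_lag=30):
        # gamma(h) = 0.5 * mean((q[t] - q[t+h])^2) for lags h in days
        flow = np.asarray(flow, dtype=float)
        lags = np.arange(1, max_lag + 1)
        gamma = np.array([0.5 * np.nanmean((flow[h:] - flow[:-h]) ** 2) for h in lags])
        return lags, gamma

    # Illustrative series: slow storage-driven recession plus measurement noise
    rng = np.random.default_rng(0)
    q = 10 * np.exp(-np.arange(365) / 60.0) + rng.normal(0, 0.1, 365)
    lags, gamma = temporal_semivariogram(q)

A catchment with large storage and slow pathways produces a semi-variogram that rises slowly with lag, while a flashy catchment plateaus within a few days; that contrast is what the cluster analysis exploits.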
Sample size calculations for the design of cluster randomized trials: A summary of methodology.
Gao, Fei; Earnest, Arul; Matchar, David B; Campbell, Michael J; Machin, David
2015-05-01
Cluster randomized trial designs are growing in popularity in, for example, cardiovascular medicine research and other clinical areas, and they have stimulated parallel statistical developments concerned with the design and analysis of these trials. Nevertheless, reviews suggest that design issues associated with cluster randomized trials are often poorly appreciated and there remain inadequacies in, for example, describing how the trial size is determined and how the associated results are presented. In this paper, our aim is to provide pragmatic guidance for researchers on the methods of calculating sample sizes. We focus attention on designs with the primary purpose of comparing two interventions with respect to continuous, binary, ordered categorical, incidence rate and time-to-event outcome variables. Issues of aggregate and non-aggregate cluster trials, adjustment for variation in cluster size, and the effect size are detailed. The problem of establishing the anticipated magnitude of between- and within-cluster variation to enable planning values of the intra-cluster correlation coefficient and the coefficient of variation is also described. Illustrative examples of calculations of trial sizes for each endpoint type are included. Copyright © 2015 Elsevier Inc. All rights reserved.
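As a concrete illustration of the kind of calculation the paper summarizes, a sketch for a continuous outcome using the standard design effect 1 + (m - 1)ρ; the planning values below are hypothetical.

    from math import ceil
    from scipy.stats import norm

    def clusters_per_arm(delta, sd, m, icc, alpha=0.05, power=0.8):
        # Two-arm parallel cluster RCT, continuous outcome, equal cluster sizes m
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        n_individual = 2 * (z * sd / delta) ** 2   # per arm, if individually randomized
        design_effect = 1 + (m - 1) * icc          # inflation for intra-cluster correlation
        return ceil(n_individual * design_effect / m)

    # Hypothetical planning values: detect a 0.5 SD difference, 20 subjects/cluster, ICC 0.05
    print(clusters_per_arm(delta=0.5, sd=1.0, m=20, icc=0.05))  # -> 7 clusters per arm

Unequal cluster sizes inflate the design effect further, which is where the coefficient of variation of cluster size discussed in the paper enters.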
Astrostatistical Analysis in Solar and Stellar Physics
NASA Astrophysics Data System (ADS)
Stenning, David Craig
This dissertation focuses on developing statistical models and methods to address data-analytic challenges in astrostatistics---a growing interdisciplinary field fostering collaborations between statisticians and astrophysicists. The astrostatistics projects we tackle can be divided into two main categories: modeling solar activity and Bayesian analysis of stellar evolution. These categories form Part I and Part II of this dissertation, respectively. The first line of research we pursue involves classification and modeling of evolving solar features. Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. To analyze massive streams of solar image data, we develop a science-driven dimension reduction methodology to extract scientifically meaningful features from images. This methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in solar "active regions" that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a more informative manner than existing schemes (i.e., manual classification schemes), and (iii) is amenable to sophisticated statistical analyses. In a related line of research, we perform a Bayesian analysis of the solar cycle using multiple proxy variables, such as sunspot numbers. We take advantage of patterns and correlations among the proxy variables to model solar activity using data from proxies that have become available more recently, while also taking advantage of the long history of observations of sunspot numbers. This model is an extension of the Yu et al. (2012) Bayesian hierarchical model for the solar cycle that used the sunspot numbers alone. Since proxies have different temporal coverage, we devise a multiple imputation scheme to account for missing data. We find that incorporating multiple proxies reveals important features of the solar cycle that are missed when the model is fit using only the sunspot numbers. In Part II of this dissertation we focus on two related lines of research involving Bayesian analysis of stellar evolution. We first focus on modeling multiple stellar populations in star clusters. It has long been assumed that all star clusters are composed of single stellar populations---stars that formed at roughly the same time from a common molecular cloud. However, recent studies have produced evidence that some clusters host multiple populations, which has far-reaching scientific implications. We develop a Bayesian hierarchical model for multiple-population star clusters, extending earlier statistical models of stellar evolution (e.g., van Dyk et al. 2009, Stein et al. 2013). We also devise an adaptive Markov chain Monte Carlo algorithm to explore the complex posterior distribution. We use numerical studies to demonstrate that our method can recover parameters of multiple-population clusters, and also show how model misspecification can be diagnosed. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We also explore statistical properties of the estimators and determine that the influence of the prior distribution does not diminish with larger sample sizes, leading to non-standard asymptotics. In a final line of research, we present the first-ever attempt to estimate the carbon fraction of white dwarfs.
This quantity has important implications for both astrophysics and fundamental nuclear physics, but is currently unknown. We use a numerical study to demonstrate that assuming an incorrect value for the carbon fraction leads to incorrect white-dwarf ages of star clusters. Finally, we present our attempt to estimate the carbon fraction of the white dwarfs in the well-studied star cluster 47 Tucanae.
NASA Astrophysics Data System (ADS)
Baldwin, A. T.; Watkins, L. L.; van der Marel, R. P.; Bianchini, P.; Bellini, A.; Anderson, J.
2016-08-01
We make use of the Hubble Space Telescope proper-motion catalogs derived by Bellini et al. to produce the first radial velocity dispersion profiles σ(R) for blue straggler stars (BSSs) in Galactic globular clusters (GCs), as well as the first dynamical estimates for the average mass of the entire BSS population. We show that BSSs typically have lower velocity dispersions than stars with mass equal to the main-sequence turnoff mass, as one would expect for a more massive population of stars. Since GCs are expected to experience some degree of energy equipartition, we use the relation σ ∝ M^(-η), where η is related to the degree of energy equipartition, along with our velocity dispersion profiles to estimate BSS masses. We estimate η as a function of cluster relaxation from recent Monte Carlo cluster simulations by Bianchini et al. and then derive an average mass ratio M_BSS/M_MSTO = 1.50 ± 0.14 and an average mass M_BSS = 1.22 ± 0.12 M_⊙ from 598 BSSs across 19 GCs. The final error bars include any systematic errors that are random between different clusters, but not any potential biases inherent to our methodology. Our results are in good agreement with the average mass of M_BSS = 1.22 ± 0.06 M_⊙ for the 35 BSSs in Galactic GCs in the literature with properties that have allowed individual mass determination. Based on proprietary and archival observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS 5-26555.
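The mass estimate follows directly from inverting the equipartition relation: M_BSS/M_MSTO = (σ_BSS/σ_MSTO)^(-1/η). A small sketch with hypothetical inputs:

    def bss_mass_ratio(sigma_bss, sigma_msto, eta):
        # Invert sigma ∝ M**(-eta): a colder (lower-dispersion) population is more massive
        return (sigma_bss / sigma_msto) ** (-1.0 / eta)

    # Hypothetical values: BSS dispersion 10% below the turnoff stars', eta = 0.25
    print(bss_mass_ratio(0.9, 1.0, 0.25))  # -> ~1.52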
Kubas, Adam; Noak, Johannes
2017-01-01
Absorption and multiwavelength resonance Raman spectroscopy are widely used to investigate the electronic structure of transition metal centers in coordination compounds and extended solid systems. In combination with computational methodologies that have predictive accuracy, they define powerful protocols to study the spectroscopic response of catalytic materials. In this work, we study the absorption and resonance Raman spectra of the M1 MoVOx catalyst. The spectra were calculated by time-dependent density functional theory (TD-DFT) in conjunction with the independent mode displaced harmonic oscillator model (IMDHO), which allows for detailed bandshape predictions. For this purpose, cluster models with up to 9 Mo and V metallic centers are considered to represent the bulk structure of MoVOx. Capping hydrogens were used to achieve valence saturation at the edges of the cluster models. The construction of model structures was based on a thorough bonding analysis which involved conventional DFT and local coupled cluster (DLPNO-CCSD(T)) methods. Furthermore, the relationship of cluster topology to the computed spectral features is discussed in detail. It is shown that, due to the local nature of the involved electronic transitions, band assignment protocols developed for molecular systems can be applied to describe the calculated spectral features of the cluster models as well. The present study serves as a reference for future applications of combined experimental and computational protocols in the field of solid-state heterogeneous catalysis. PMID:28989667
Investigations of stacking fault density in perpendicular recording media
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piramanayagam, S. N., E-mail: prem-SN@dsi.a-star.edu.sg; Varghese, Binni; Yang, Yi
In magnetic recording media, the grains or clusters reverse their magnetization over a range of reversal fields, resulting in a switching field distribution. In order to achieve high areal densities, it is desirable to understand and minimize such a distribution. Clusters of grains which contain stacking faults (SF) or fcc phase have lower anisotropy, an order of magnitude lower than those without them. It is believed that such low-anisotropy regions reverse their magnetization at a much lower reversal field than the rest of the material with a larger anisotropy. Such clusters/grains cause recording performance deterioration, such as adjacent track erasure and dc noise. Therefore, the observation of clusters that reverse at very low reversal fields (nucleation sites, NS) could give information on the noise and the adjacent track erasure. Potentially, the observed clusters could also provide information on the SF. In this paper, we study the reversal of nucleation sites in granular perpendicular media based on a magnetic force microscope (MFM) methodology and validate the observations with high-resolution cross-section transmission electron microscopy (HRTEM) measurements. Samples, wherein a high-anisotropy CoPt layer was introduced to control the NS or SF in a systematic way, were evaluated by MFM, TEM, and magnetometry. The magnetic properties indicated that increasing the thickness of the CoPt layer results in an increase in nucleation sites. TEM measurements indicated a correlation between the thickness of the CoPt layer and the stacking fault density. A clear correlation was also observed between the MFM results, TEM observations, and the coercivity and nucleation field of the samples, validating the effectiveness of the proposed method in evaluating nucleation sites which potentially arise from stacking faults.
Vijaykumar, Archana; Saini, Ajay; Jawali, Narendra
2012-01-01
Background and aims Intra-species hybridization and incompletely homogenized ribosomal RNA repeat units have earlier been reported in 21 accessions of Vigna unguiculata from six subspecies using internal transcribed spacer (ITS) and 5S intergenic spacer (IGS) analyses. However, the relationships among these accessions were not clear from these analyses. We therefore assessed intra-species hybridization in the same set of accessions. Methodology Arbitrarily primed polymerase chain reaction (AP-PCR) analysis was carried out using 12 primers. The PCR products were resolved on agarose gels and the DNA fragments were scored manually. Genetic relationships were inferred by TREECON software using unweighted pair group method with arithmetic averages (UPGMA) cluster analysis, evaluated by bootstrapping, and compared with previous analyses based on ITS and 5S IGS. Principal results A total of 202 (86%) fragments were found to be polymorphic and used for generating a genetic distance matrix. The 21 V. unguiculata accessions were grouped into three main clusters. The cultivated subspecies (var. unguiculata) and most of its wild progenitors (var. spontanea) were placed in cluster I along with ssp. pubescens and ssp. stenophylla. Whereas var. spontanea were grouped with ssp. alba and ssp. tenuis accessions in cluster II, ssp. alba and ssp. baoulensis were included in cluster III. Close affinities of ssp. unguiculata, ssp. alba and ssp. tenuis suggested inter-subspecies hybridization. Conclusions Multi-locus AP-PCR analysis reveals that intra-species hybridization is prevalent among V. unguiculata subspecies and suggests that the grouping of accessions from two different subspecies is not solely due to similarity in the ITS and 5S IGS regions but also due to other regions of the genome. PMID:22619698
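A minimal sketch of the UPGMA step with scipy, assuming the scored fragments form a 0/1 presence matrix; the random matrix and the Jaccard metric are illustrative stand-ins (the study used TREECON with its own distance and bootstrapping).

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage

    # Hypothetical 0/1 matrix: 21 accessions x 202 polymorphic AP-PCR fragments
    fragments = np.random.default_rng(1).integers(0, 2, size=(21, 202))

    dist = pdist(fragments, metric="jaccard")  # pairwise genetic distances
    tree = linkage(dist, method="average")     # UPGMA = average-linkage clustering
    # scipy.cluster.hierarchy.dendrogram(tree) would draw the tree; bootstrap
    # support values require re-running this on column-resampled matrices.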
Plancoulaine, Benoit; Laurinaviciene, Aida; Herlin, Paulette; Besusparis, Justinas; Meskauskas, Raimundas; Baltrusaityte, Indra; Iqbal, Yasir; Laurinavicius, Arvydas
2015-10-19
Digital image analysis (DIA) enables higher accuracy, reproducibility, and capacity to enumerate cell populations by immunohistochemistry; however, the most unique benefits may be obtained by evaluating the spatial distribution and intra-tissue variance of markers. The proliferative activity of breast cancer tissue, estimated by the Ki67 labeling index (Ki67 LI), is a prognostic and predictive biomarker requiring robust measurement methodologies. We performed DIA on whole-slide images (WSI) of 302 surgically removed Ki67-stained breast cancer specimens; the tumor classifier algorithm was used to automatically detect tumor tissue but was not trained to distinguish between invasive and non-invasive carcinoma cells. The WSI DIA-generated data were subsampled by hexagonal tiling (HexT). Distribution and texture parameters were compared to conventional WSI DIA and pathology report data. Factor analysis of the data set, including total numbers of tumor cells, the Ki67 LI and Ki67 distribution, and texture indicators, extracted 4 factors, identified as entropy, proliferation, bimodality, and cellularity. The factor scores were further utilized in cluster analysis, outlining subcategories of heterogeneous tumors with predominant entropy, bimodality, or both at different levels of proliferative activity. The methodology also allowed the visualization of Ki67 LI heterogeneity in tumors and the automated detection and quantitative evaluation of Ki67 hotspots, based on the upper quintile of the HexT data, conceptualized as the "Pareto hotspot". We conclude that systematic subsampling of DIA-generated data into HexT enables comprehensive Ki67 LI analysis that reflects aspects of intra-tumor heterogeneity and may serve as a methodology to improve digital immunohistochemistry in general.
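A minimal sketch of hexagonal-tile subsampling and upper-quintile "Pareto hotspot" flagging, using matplotlib's hexagonal binning as the tiling device; the cell coordinates and positivity labels are simulated, and the grid size is an arbitrary assumption.

    import numpy as np
    import matplotlib.pyplot as plt

    # Simulated cell centroids with 0/1 Ki67 positivity, standing in for WSI DIA output
    rng = np.random.default_rng(2)
    x, y = rng.uniform(0, 1000, 50000), rng.uniform(0, 1000, 50000)
    ki67 = (rng.random(50000) < np.clip(x / 1000, 0.05, 0.6)).astype(float)

    # Hexagonal tiling: per-tile Ki67 labeling index = mean positivity of cells in the tile
    tiles = plt.hexbin(x, y, C=ki67, gridsize=30, reduce_C_function=np.mean)
    li_per_tile = np.asarray(tiles.get_array())

    # "Pareto hotspot": tiles in the upper quintile of the per-tile Ki67 LI
    threshold = np.quantile(li_per_tile, 0.8)
    print((li_per_tile >= threshold).sum(), "hotspot tiles at LI >=", round(threshold, 3))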
antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters
Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko
2015-01-01
Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579
Using a statewide survey methodology to prioritize pediatric cardiology core content.
Neal, Ashley E; Lehto, Elizabeth; Miller, Karen Hughes; Ziegler, Craig; Davis, Erin
2018-01-01
Although pediatrician-reported relevance of Canadian cardiology-specific objectives has been studied, similar data are not available for the 2016 American Board of Pediatrics (ABP) cardiology-specific objectives. This study asked Kentucky trainees, pediatricians, and pediatric cardiologists to identify "most important" content within these objectives. This cross-sectional study used an original, online survey instrument based on the 2016 ABP cardiology-specific objectives. We collected quantitative data (numerical indications of importance) and qualitative data (open-ended replies regarding missing content and difficulty in teaching and learning). Respondents indicated the top two choices of most important items within eight content areas. Descriptive statistics (frequencies and percentages) and chi-square analyses were calculated. Content within categories was organized using naturally occurring "clusters" and "gaps" in scores. Common themes among open-ended qualitative responses were identified using Pandit's version of Glaser and Strauss's grounded theory (constant comparison). Of the 136 respondents, 23 (17%) were residents, 15 (11%) fellows, 85 (62%) pediatricians, and 13 (10%) pediatric cardiologists. Of attendings, 80% reported faculty/gratis faculty status. Naturally occurring clusters in respondent-designated importance resulted in ≤3 "most selected" objectives per content area. Objectives in "most selected" content pertained to initial diagnosis (recognition of abnormality/disease) (n = 16), possible emergent/urgent intervention required (n = 14), building a differential (n = 8), and planning a workup (n = 4). Conversely, themes for "least selected" content included comanagement with a subspecialist (n = 15), knowledge useful in patient-family communication (n = 9), knowledge that can be referenced as needed (n = 7), and longitudinal/follow-up concerns (n = 5). This study demonstrated the utility of an online survey methodology to identify the pediatric cardiology content perceived as most important. Learners and faculty generally provided concordant responses regarding the most important content within the cardiology-specific ABP objectives. Medical educators could apply this methodology to inform curriculum revision. © 2017 Wiley Periodicals, Inc.
Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex
2010-01-01
Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062
NASA Astrophysics Data System (ADS)
Hung, Chih-Hsuan; Hung, Hung-Chih
2016-04-01
1. Background: Major portions of urban areas in Asia are highly exposed and vulnerable to devastating earthquakes. Many studies identify ways to reduce earthquake risk by concentrating on building resilience for particularly vulnerable populations. By 2020, according to United Nations warnings, many Asian countries, such as Taiwan, will have become 'super-aged societies'. However, local authorities rarely use a resilience approach to frame earthquake disaster risk management and land use strategies, and empirically based research on the resilience of aging populations has received relatively little attention. A challenge for decision-makers is therefore how to enhance the resilience of aging populations within the context of risk reduction. This study aims to improve the understanding of the resilience of aging populations and its changes over time in the aftermath of a destructive earthquake at the local level. A novel methodology is proposed to assess the resilience of aging populations, to characterize changes in their spatial distribution patterns, and to examine their determinants. 2. Methods and data: An indicator-based assessment framework is constructed to identify composite indicators (covering the periods before, during and after a disaster) that can serve as proxies for attributes of the resilience of aging populations. Using the recovery process from the Chi-Chi earthquake that struck central Taiwan in 1999 as a case study, we applied a method combining geographical information system (GIS)-based spatial statistics and cluster analysis to test the extent to which the resilience of aging populations is spatially autocorrelated throughout central Taiwan, and to explain why clusters of resilient areas occur in specific locations. Furthermore, to scrutinize the factors affecting resilience, we developed an aging population resilience model (APRM) based on existing resilience theory. Using the APRM, we applied multivariate regression analysis to identify and examine how various factors relate to the resilience of aging populations. To illustrate the proposed methodology, the study collected data on resilience attributes and on the impacts and damage of the Chi-Chi earthquake, provided by the National Science and Technology Center for Disaster Reduction, Taiwan, and collected from the National Land Use Investigation, official census statistics and questionnaire surveys. 3. Results: Integrating cluster analysis with GIS-based spatial statistical analysis, the resilience of aging populations was divided into five clusters of distribution patterns over the 10 years after the Chi-Chi earthquake. Both the overall population and the elderly population were highly heterogeneous and spatially correlated across the study areas. We also identified 'hot spot' areas of highly concentrated aging populations across central Taiwan. Regression analysis disclosed the major factors behind low resilience and the changes in aging population distributions over time: the level of seismic damage, infrastructure investment, and the land-use and socioeconomic attributes of the disaster areas. Finally, our findings provide stakeholders and policy-makers with better adaptive options for designing and synthesizing appropriate patchworks of planning measures for different types of resilience areas to reduce earthquake disaster risk.
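A hedged sketch of the hot-spot detection step using the PySAL stack (the abstract does not name the exact local statistic, so the Getis-Ord local G on simulated lattice data is an assumption on our part):

    import numpy as np
    import libpysal
    from esda.getisord import G_Local

    # Simulated resilience scores on a 10 x 10 grid of areal units
    rng = np.random.default_rng(3)
    scores = rng.random(100)
    scores[:20] += 1.0                        # seed a contiguous high-resilience region

    w = libpysal.weights.lat2W(10, 10)        # rook contiguity on the lattice
    g = G_Local(scores, w, permutations=999)  # local Getis-Ord G statistic

    hot = (g.Zs > 1.96) & (g.p_sim < 0.05)    # significant high-value clusters ("hot spots")
    print("hot-spot units:", np.where(hot)[0])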
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite this significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially in the case where there is no access to data features or specific assumptions on the data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
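In the spirit of the approach, though not the authors' exact formulas, a sketch of an entropy-based cluster uncertainty estimate feeding a locally weighted co-association matrix:

    import numpy as np

    def lw_coassociation(base_labels):
        # Co-association matrix where each cluster's vote is weighted by an
        # entropy-based reliability index (low entropy vs. the ensemble = high weight)
        base_labels = np.asarray(base_labels)        # (n_clusterings, n_points)
        m, n = base_labels.shape
        ca = np.zeros((n, n))
        for labels in base_labels:
            for c in np.unique(labels):
                members = np.where(labels == c)[0]
                # Uncertainty of this cluster: total entropy of how its members
                # are split up by every base clustering in the ensemble
                ent = 0.0
                for other in base_labels:
                    _, counts = np.unique(other[members], return_counts=True)
                    p = counts / counts.sum()
                    ent += -(p * np.log(p + 1e-12)).sum()
                weight = np.exp(-ent / m)            # reliability decays with uncertainty
                ca[np.ix_(members, members)] += weight
        return ca / m

    labels = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2], [0, 0, 0, 0, 1, 1]]
    print(np.round(lw_coassociation(labels), 2))

Clusters whose members stay together across the whole ensemble receive low entropy and thus high weight; a consensus function can then cut this matrix with any similarity-based clusterer.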
Walder, Florian; Schlaeppi, Klaus; Wittwer, Raphaël; Held, Alain Y.; Vogelgsang, Susanne; van der Heijden, Marcel G. A.
2017-01-01
Fusarium head blight, caused by fungi from the genus Fusarium, is one of the most harmful cereal diseases, resulting not only in severe yield losses but also in mycotoxin-contaminated and health-threatening grains. Fusarium head blight is caused by a diverse set of species that have different host ranges, mycotoxin profiles and responses to agricultural practices. Thus, understanding the composition of Fusarium communities in the field is crucial for estimating their impact and also for the development of effective control measures. Up to now, most molecular tools that monitor Fusarium communities on plants are limited to certain species and do not distinguish other plant-associated fungi. To close these gaps, we developed a sequencing-based community profiling methodology for crop-associated fungi with a focus on the genus Fusarium. By analyzing a 1600 bp long amplicon spanning the highly variable segments ITS and D1–D3 of the ribosomal operon by PacBio SMRT sequencing, we were able to robustly quantify Fusarium down to species level through clustering against reference sequences. The newly developed methodology was successfully validated in mock communities and provided similar results as the culture-based assessment of Fusarium communities by seed health tests in grain samples from different crop species. Finally, we exemplified the newly developed methodology in a field experiment with a wheat-maize crop sequence under different cover crop and tillage regimes. We analyzed wheat straw residues, cover crop shoots and maize grains and we could reveal that the cover crop hairy vetch (Vicia villosa) acts as a potent alternative host for Fusarium (OTU F.ave/tri), showing an eightfold higher relative abundance compared with other cover crop treatments. Moreover, as the newly developed methodology also makes it possible to trace other crop-associated fungi, we found that vetch and green fallow hosted further fungal plant pathogens, including Zymoseptoria tritici. Thus, besides their beneficial traits, cover crops can also entail phytopathological risks by acting as alternative hosts for Fusarium and other noxious plant pathogens. The newly developed sequencing-based methodology is a powerful diagnostic tool to trace Fusarium in combination with other fungi associated with different crop species. PMID:29234337
How to Select the Most Relevant Roughness Parameters of a Surface: Methodology Research Strategy
NASA Astrophysics Data System (ADS)
Bobrovskij, I. N.
2018-01-01
This paper lays the foundations for a new methodology that addresses the conflict between the huge number of surface-texture parameters introduced by new standards and the much smaller number of parameters that can realistically be measured, with the aim of reducing measurement complexity. At present, there is no single assessment of the importance of the parameters. Applying the proposed methodology to the surfaces of aerospace-cluster components creates the necessary foundation for a scientific estimation of surface-texture parameters and yields material for investigators of the chosen technological procedure. The methods required for further work are selected, creating a fundamental basis for developing the assessment of the significance of microgeometry parameters as a scientific direction.
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice accuracy.
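A minimal sketch of fragment extraction, assuming the base clusterings arrive as label vectors over the same points; each fragment is a maximal set of points that every base clustering keeps together.

    import numpy as np
    from collections import defaultdict

    def data_fragments(base_labels):
        # Group points whose label vectors agree across all base clusterings;
        # each such group is a "fragment" that no base clustering splits
        base_labels = np.asarray(base_labels)       # (n_clusterings, n_points)
        frags = defaultdict(list)
        for i, key in enumerate(map(tuple, base_labels.T)):
            frags[key].append(i)
        return list(frags.values())

    labels = [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
    print(data_fragments(labels))  # [[0, 1], [2], [3, 4]]

Aggregation then operates on the typically far fewer fragments instead of individual points, which is the source of the efficiency gain.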
Gibert, Karina; García-Alonso, Carlos; Salvador-Carulla, Luis
2010-09-30
Decision support in health systems is a highly difficult task, due to the inherent complexity of the process and structures involved. This paper introduces a new hybrid methodology, Expert-based Cooperative Analysis (EbCA), which incorporates explicit prior expert knowledge in data analysis methods and elicits implicit or tacit expert knowledge (IK) to improve decision support in healthcare systems. EbCA has been applied to two different case studies, showing its usability and versatility: 1) benchmarking of small mental health areas based on technical efficiency estimated by EbCA-Data Envelopment Analysis (EbCA-DEA), and 2) case-mix of schizophrenia based on functional dependency using Clustering Based on Rules (ClBR). In both cases, comparisons were made against classical procedures using qualitative explicit prior knowledge. Bayesian predictive validity measures were used for comparison with expert panel results. Overall agreement was tested by the intraclass correlation coefficient in case 1 and kappa in both cases. EbCA is a new methodology composed of six steps: 1) data collection and data preparation; 2) acquisition of "prior expert knowledge" (PEK) and design of the "prior knowledge base" (PKB); 3) PKB-guided analysis; 4) support-interpretation tools to evaluate results and detect inconsistencies (here implicit knowledge (IK) might be elicited); 5) incorporation of elicited IK into the PKB, repeating until a satisfactory solution is reached; 6) post-processing of results for decision support. EbCA has been useful for incorporating PEK into two different analysis methods (DEA and clustering), applied respectively to assess the technical efficiency of small mental health areas and for case-mix of schizophrenia based on functional dependency. Differences from the results obtained with classical approaches were mainly related to the IK that could be elicited by using EbCA and had major implications for decision making in both cases. This paper presents EbCA and shows the convenience of complementing classical data analysis with PEK as a means to extract relevant knowledge in complex health domains. One of the major benefits of EbCA is the iterative elicitation of IK. Both explicit and tacit or implicit expert knowledge are critical to guide the scientific analysis of very complex decisional problems such as those found in health system research.
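EbCA's first case study leans on DEA; as a hedged illustration of that building block (not of EbCA itself), a sketch of input-oriented CCR efficiency via linear programming, with hypothetical small-area data:

    import numpy as np
    from scipy.optimize import linprog

    def dea_ccr_input(X, Y, j0):
        # Input-oriented CCR efficiency of unit j0; X: inputs x units, Y: outputs x units
        n_units = X.shape[1]
        c = np.r_[1.0, np.zeros(n_units)]              # minimize theta
        A_in = np.c_[-X[:, [j0]], X]                   # sum_j lam_j x_ij <= theta * x_i,j0
        A_out = np.c_[np.zeros((Y.shape[0], 1)), -Y]   # sum_j lam_j y_rj >= y_r,j0
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(X.shape[0]), -Y[:, j0]],
                      bounds=[(0, None)] * (n_units + 1))
        return res.fun

    # Hypothetical areas: inputs are staff and beds, output is cases treated
    X = np.array([[20.0, 30.0, 40.0], [5.0, 8.0, 12.0]])
    Y = np.array([[100.0, 120.0, 150.0]])
    print([round(dea_ccr_input(X, Y, j), 3) for j in range(3)])  # -> [1.0, 0.8, 0.75]

An efficiency of 1 marks a unit on the frontier; lower values indicate how far its inputs could be contracted while still producing its outputs.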
New Statistical Methodology for Determining Cancer Clusters
An innovative statistical technique shows that women living in a broad stretch of the metropolitan northeastern United States, which includes Long Island, are slightly more likely to die from breast cancer than women in other parts of the Northeast.
Spatial and Temporal Clustering of Dengue Virus Transmission in Thai Villages
Mammen, Mammen P; Pimgate, Chusak; Koenraadt, Constantianus J. M; Rothman, Alan L; Aldstadt, Jared; Nisalak, Ananda; Jarman, Richard G; Jones, James W; Srikiatkhachorn, Anon; Ypil-Butac, Charity Ann; Getis, Arthur; Thammapalo, Suwich; Morrison, Amy C; Libraty, Daniel H; Green, Sharone; Scott, Thomas W
2008-01-01
Background Transmission of dengue viruses (DENV), the leading cause of arboviral disease worldwide, is known to vary through time and space, likely owing to a combination of factors related to the human host, virus, mosquito vector, and environment. An improved understanding of variation in transmission patterns is fundamental to conducting surveillance and implementing disease prevention strategies. To test the hypothesis that DENV transmission is spatially and temporally focal, we compared geographic and temporal characteristics within Thai villages where DENV are and are not being actively transmitted. Methods and Findings Cluster investigations were conducted within 100 m of homes where febrile index children with (positive clusters) and without (negative clusters) acute dengue lived during two seasons of peak DENV transmission. Data on human infection and mosquito infection/density were examined to precisely (1) define the spatial and temporal dimensions of DENV transmission, (2) correlate these factors with variation in DENV transmission, and (3) determine the burden of inapparent and symptomatic infections. Among 556 village children enrolled as neighbors of 12 dengue-positive and 22 dengue-negative index cases, all 27 DENV infections (4.9% of enrollees) occurred in positive clusters (p < 0.01; attributable risk [AR] = 10.4 per 100; 95% confidence interval 1–19.8 per 100). In positive clusters, 12.4% of enrollees became infected in a 15-d period and DENV infections were aggregated centrally near homes of index cases. As only 1 of 217 pairs of serologic specimens tested in positive clusters revealed a recent DENV infection that occurred prior to cluster initiation, we attribute the observed DENV transmission subsequent to cluster investigation to recent DENV transmission activity. Of the 1,022 female adult Ae. aegypti collected, all eight (0.8%) dengue-infected mosquitoes came from houses in positive clusters; none from control clusters or schools. Distinguishing features between positive and negative clusters were greater availability of piped water in negative clusters (p < 0.01) and greater number of Ae. aegypti pupae per person in positive clusters (p = 0.04). During primarily DENV-4 transmission seasons, the ratio of inapparent to symptomatic infections was nearly 1:1 among child enrollees. Study limitations included inability to sample all children and mosquitoes within each cluster and our reliance on serologic rather than virologic evidence of interval infections in enrollees given restrictions on the frequency of blood collections in children. Conclusions Our data reveal the remarkably focal nature of DENV transmission within a hyperendemic rural area of Thailand. These data suggest that active school-based dengue case detection prompting local spraying could contain recent virus introductions and reduce the longitudinal risk of virus spread within rural areas. Our results should prompt future cluster studies to explore how host immune and behavioral aspects may impact DENV transmission and prevention strategies. Cluster methodology could serve as a useful research tool for investigation of other temporally and spatially clustered infectious diseases. PMID:18986209
Three-dimensional Tissue Culture Based on Magnetic Cell Levitation
Souza, Glauco R.; Molina, Jennifer R.; Raphael, Robert M.; Ozawa, Michael G.; Stark, Daniel J.; Levin, Carly S.; Bronk, Lawrence F.; Ananta, Jeyarama S.; Mandelin, Jami; Georgescu, Maria-Magdalena; Bankson, James A.; Gelovani, Juri G.
2015-01-01
Cell culture is an essential tool for drug discovery, tissue engineering, and stem cell research. Conventional tissue culture produces two-dimensional (2D) cell growth with gene expression, signaling, and morphology that can differ from those in vivo and thus compromise clinical relevancy [1–5]. Here we report a three-dimensional (3D) culture of cells based on magnetic levitation in the presence of hydrogels containing gold and magnetic iron oxide (MIO) nanoparticles plus filamentous bacteriophage. This methodology allows for control of cell mass geometry and guided, multicellular clustering of different cell types in co-culture through spatial variance of the magnetic field. Moreover, magnetic levitation of human glioblastoma cells demonstrates similar protein expression profiles to those observed in human tumor xenografts. Taken together, these results suggest levitated 3D culture with magnetized phage-based hydrogels more closely recapitulates in vivo protein expression and allows for long-term multi-cellular studies. PMID:20228788
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One approach to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organizing maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationship by mapping the input space into another, higher-dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies; an FCSOM is built specifically for each information granule (a function cluster of genes), the granules being intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, the experimental results show a marked improvement in classification accuracy with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
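A hedged sketch of the per-granule SOM idea using the third-party MiniSom package; the FCSOM implementation itself is not described in code here, and the expression data, map size and training budget are simulated assumptions.

    import numpy as np
    from minisom import MiniSom  # third-party SOM library (pip install minisom)

    # Simulated expression matrix for one DAVID function cluster: genes x samples
    rng = np.random.default_rng(4)
    expr = rng.normal(size=(200, 12))

    # The FCSOM idea: train one small map per function cluster (information granule)
    som = MiniSom(4, 4, input_len=expr.shape[1], sigma=1.0, learning_rate=0.5,
                  random_seed=0)
    som.train_random(expr, num_iteration=2000)

    # Each gene maps to a best-matching unit; units act as sub-clusters of the granule
    units = [som.winner(gene) for gene in expr]
    print(units[:5])

Because the maps are independent per granule, they can be trained in parallel, which is the parallelization the abstract points to.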
Tello, Javier; Cubero, Sergio; Blasco, José; Tardaguila, Javier; Aleixos, Nuria; Ibáñez, Javier
2016-10-01
Grapevine cluster morphology influences the quality and commercial value of wine and table grapes. It is routinely evaluated by subjective and inaccurate methods that do not meet the requirements set by the food industry. Novel two-dimensional (2D) and three-dimensional (3D) machine vision technologies emerge as promising tools for its automatic and fast evaluation. The automatic evaluation of cluster length, width and elongation was successfully achieved by the analysis of 2D images, with significant and strong correlations with the manual methods (r = 0.959, 0.861 and 0.852, respectively). The classification of clusters according to their shape can be achieved by evaluating their conicity in different sections of the cluster. The geometric reconstruction of the morphological volume of the cluster from 2D features worked better than the direct 3D laser scanning system, showing a high correlation (r = 0.956) with the manual approach (water displacement method). In addition, we constructed and validated a simple linear regression model for cluster compactness estimation. It showed a high predictive capacity for both the training and validation subsets of clusters (R² = 84.5% and 71.1%, respectively). The methodologies proposed in this work provide continuous and accurate data for the fast and objective characterisation of cluster morphology. © 2016 Society of Chemical Industry.
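A minimal sketch of the compactness-regression step with scikit-learn; the four image-derived predictors and the simulated response are hypothetical stand-ins for the study's 2D features.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical per-cluster image features: length, width, berry count, area
    rng = np.random.default_rng(5)
    X = rng.normal(size=(120, 4))
    compactness = X @ np.array([0.2, 0.5, 0.8, 0.3]) + rng.normal(0, 0.3, 120)

    X_tr, X_val, y_tr, y_val = train_test_split(X, compactness, test_size=0.3,
                                                random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    print("training R^2:", round(model.score(X_tr, y_tr), 3))
    print("validation R^2:", round(model.score(X_val, y_val), 3))

Reporting the score on a held-out validation subset, as the authors do, guards against the optimism of the training fit.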
Noble, Natasha; Paul, Christine; Turon, Heidi; Oldmeadow, Christopher
2015-12-01
There is a growing body of literature examining the clustering of health risk behaviours, but little consensus about which risk factors can be expected to cluster for which subgroups of people. This systematic review aimed to examine the international literature on the clustering of smoking, poor nutrition, excess alcohol and physical inactivity (SNAP) health behaviours among adults, including associated socio-demographic variables. A literature search was conducted in May 2014. Studies examining at least two SNAP risk factors, and using a cluster or factor analysis technique, or comparing observed to expected prevalence of risk factor combinations, were included. Fifty-six relevant studies were identified. A majority of studies (81%) reported a 'healthy' cluster characterised by the absence of any SNAP risk factors. More than half of the studies reported a clustering of alcohol with smoking, and half reported clustering of all four SNAP risk factors. The methodological quality of included studies was generally weak to moderate. Males and those with greater social disadvantage showed riskier patterns of behaviours; younger age was less clearly associated with riskier behaviours. The clustering patterns reported here reinforce the need for health promotion interventions to target multiple behaviours, and for such efforts to be specifically designed and accessible for males and those who are socially disadvantaged. Copyright © 2015 Elsevier Inc. All rights reserved.
Volanen, Salla-Maarit; Lassander, Maarit; Hankonen, Nelli; Santalahti, Päivi; Hintsanen, Mirka; Simonsen, Nina; Raevuori, Anu; Mullola, Sari; Vahlberg, Tero; But, Anna; Suominen, Sakari
2016-07-11
Mindfulness has shown positive effects on mental health, mental capacity and well-being in the adult population. Among children and adolescents, previous research on the effectiveness of mindfulness interventions on health and well-being has shown promising results, but studies with methodologically sound designs have been called for. Few intervention studies in this population have compared the effectiveness of mindfulness programs to alternative intervention programs with adequate sample sizes. Our primary aim is to explore the effectiveness of a school-based mindfulness intervention program compared to a standard relaxation program among a non-clinical sample of children and adolescents, and to a non-treatment control group, in the school context. In this study, we systematically examine the effects of the mindfulness intervention on mental well-being (the primary outcomes being resilience; existence/absence of depressive symptoms; and experienced psychological strengths and difficulties), cognitive functions, psychophysiological responses, academic achievement, and motivational determinants of practicing mindfulness. The design is a cluster randomized controlled trial with three arms (mindfulness intervention group, active control group, non-treatment group) and the sample includes 59 Finnish schools and approximately 3,000 students aged 12–15 years. The intervention consists of nine mindfulness-based lessons, 45 minutes per week, for 9 weeks, the dose being identical in the active control group, which receives a standard relaxation program called Relax. The programs are delivered by 14 trained facilitators. Students, their teachers and parents will fill in the research questionnaires before and after the intervention, and they will all be followed up 6 months after baseline. Additionally, students will be followed up 12 months after baseline. For longer follow-up, consent to link the data to the main health registers has been sought from students and their parents. The present study systematically examines the effectiveness of a school-based mindfulness program compared to a standard relaxation program and a non-treatment control group. A strength of the current study lies in its methodologically rigorous, randomized controlled design, which allows novel evidence on the effectiveness of mindfulness over and above a standard relaxation program. ISRCTN18642659. Retrospectively registered 13 October 2015.
2013-01-01
Background Time course gene expression experiments are an increasingly popular method for exploring biological processes. Temporal gene expression profiles provide an important characterization of gene function, as biological systems are both developmental and dynamic. With such data it is possible to study gene expression changes over time and thereby to detect differential genes. Much of the early work on analyzing time series expression data relied on methods developed originally for static data, and thus there is a need for improved methodology. Since time series expression is a temporal process, its unique features, such as autocorrelation between successive points, should be incorporated into the analysis. Results This work aims to identify genes that show different gene expression profiles across time. We propose a statistical procedure to discover gene groups with similar profiles using a nonparametric representation that accounts for the autocorrelation in the data. In particular, we first represent each profile in terms of a Fourier basis, and then we screen out genes that are not differentially expressed based on the Fourier coefficients. Finally, we cluster the remaining gene profiles using a model-based approach in the Fourier domain. We evaluate the screening results in terms of sensitivity, specificity, FDR and FNR, compare with the Gaussian process regression screening in a simulation study, and illustrate the results by application to yeast cell-cycle microarray expression data with alpha-factor synchronization. The key elements of the proposed methodology are: (i) representation of gene profiles in the Fourier domain; (ii) automatic screening of genes based on the Fourier coefficients and taking into account autocorrelation in the data, while controlling the false discovery rate (FDR); (iii) model-based clustering of the remaining gene profiles. Conclusions Using this method, we identified a set of cell-cycle-regulated time-course yeast genes. The proposed method is general and can potentially be used to identify genes which share the same patterns or biological processes, and help face the present and forthcoming challenges of data analysis in functional genomics. PMID:24134721
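The screen-then-cluster pipeline above lends itself to a compact illustration. Below is a minimal sketch (not the authors' code) on synthetic profiles: represent each profile by its low-frequency Fourier coefficients, screen out flat genes by coefficient energy (a crude stand-in for the paper's FDR-controlled rule), and cluster the survivors with a Gaussian mixture. The quantile cutoff and cluster count are illustrative assumptions.

```python
# Sketch of the three-step pipeline: Fourier representation,
# coefficient-based screening, model-based clustering.
import numpy as np
from numpy.fft import rfft
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 18))           # 500 gene profiles, 18 time points (synthetic)

# (i) represent each profile by a few low-frequency Fourier coefficients
coefs = rfft(X, axis=1)[:, 1:4]          # drop the DC term, keep low frequencies
feat = np.column_stack([coefs.real, coefs.imag])

# (ii) screen out "flat" genes whose low-frequency energy is small
energy = (np.abs(coefs) ** 2).sum(axis=1)
keep = energy > np.quantile(energy, 0.90)   # crude stand-in for an FDR rule

# (iii) model-based clustering of the surviving profiles in the Fourier domain
gmm = GaussianMixture(n_components=3, random_state=0).fit(feat[keep])
labels = gmm.predict(feat[keep])
```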
Site-conditions map for Portugal based on VS measurements: methodology and final model
NASA Astrophysics Data System (ADS)
Vilanova, Susana; Narciso, João; Carvalho, João; Lopes, Isabel; Quinta Ferreira, Mario; Moura, Rui; Borges, José; Nemser, Eliza; Pinto, Carlos
2017-04-01
In this paper we present a statistically significant site-condition model for Portugal based on shear-wave velocity (VS) data and surface geology. We also evaluate the performance of commonly used Vs30 proxies based on exogenous data and analyze the implications of using those proxies for calculating site amplification in seismic hazard assessment. The dataset contains 161 Vs profiles acquired in Portugal in the context of research projects, technical reports, academic theses and academic papers. The methodologies involved in characterizing the Vs structure at the sites in the database include seismic refraction, multichannel analysis of seismic waves and refraction microtremor. Invasive measurements were performed in selected locations in order to compare the Vs profiles obtained from both invasive and non-invasive techniques. In general there was good agreement between the Vs30 values obtained from the different methodologies. The database flat-file includes information on Vs30, surface geology at 1:50,000 and 1:500,000 scales, and elevation and topographic slope based on the SRTM30 topographic dataset. The procedure used to develop the site-conditions map is based on a three-step process that includes defining a preliminary set of geological units based on the literature, performing statistical tests to assess whether or not the differences in the distributions of Vs30 are statistically significant, and merging the geological units accordingly. The dataset was, to some extent, affected by clustering and/or preferential sampling, and therefore a declustering algorithm was applied. The final model includes three geological units: 1) Igneous, metamorphic and old (Paleogene and Mesozoic) sedimentary rocks; 2) Neogene and Pleistocene formations; and 3) Holocene formations. The evaluation of proxies indicates that although geological analogues and topographic slope are in general unbiased, the latter shows significant bias for particular geological units and consequently for some geographical regions.
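The merging step of the three-step procedure reduces to a sequence of two-sample tests on Vs30 distributions. Below is a hedged sketch with illustrative lognormal data rather than values from the Portuguese database, and a Mann-Whitney test as one plausible choice of non-parametric test (the paper does not name its exact test here).

```python
# Test whether two candidate geological units differ in their Vs30
# distributions; merge them if they do not differ significantly.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
vs30_unit_a = rng.lognormal(mean=6.5, sigma=0.3, size=40)   # e.g. Neogene (synthetic)
vs30_unit_b = rng.lognormal(mean=6.6, sigma=0.3, size=35)   # e.g. Pleistocene (synthetic)

stat, p = mannwhitneyu(vs30_unit_a, vs30_unit_b)
if p > 0.05:
    print(f"no significant difference (p = {p:.2f}) -> merge units")
else:
    print(f"distributions differ (p = {p:.2f}) -> keep units separate")
```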
Metrics and methods for characterizing dairy farm intensification using farm survey data.
Gonzalez-Mejia, Alejandra; Styles, David; Wilson, Paul; Gibbons, James
2018-01-01
Evaluation of agricultural intensification requires comprehensive analysis of trends in farm performance across physical and socio-economic aspects, which may diverge across farm types. Typical reporting of economic indicators at the sectoral or "average farm" level does not represent farm diversity and provides limited insight into the sustainability of specific intensification pathways. Using farm business data from a total of 7281 farm survey observations of English and Welsh dairy farms over a 14-year period, we calculate a time series of 16 key performance indicators (KPIs) pertinent to farm structure and to environmental and socio-economic aspects of sustainability. We then apply principal component analysis and model-based clustering analysis to statistically identify the number of distinct dairy farm typologies for each year of study, and link these clusters through time using multidimensional scaling. Between 2001 and 2014, dairy farms have largely consolidated and specialized into two distinct clusters: more extensive farms relying predominantly on grass, with lower milk yields but higher labour intensity, and more intensive farms producing more milk per cow with more concentrate and more maize, but lower labour intensity. There is some indication that these clusters are converging, as the extensive cluster is intensifying slightly faster than the intensive cluster in terms of milk yield per cow and use of concentrate feed. In 2014, annual milk yields were 6,835 and 7,500 l/cow for extensive and intensive farm types, respectively, whilst annual concentrate feed use was 1.3 and 1.5 tonnes per cow. For several KPIs, such as milk yield, the mean trend across all farms differed substantially from the extensive and intensive typology means. The indicators and analysis methodology developed allow identification of distinct farm types and industry trends using readily available survey data. The identified groups allow accurate evaluation of the consequences of the reduction in dairy farm numbers and of intensification at national and international scales. PMID:29742166
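The typology pipeline (standardize the KPIs, reduce with principal components, fit a model-based clustering per year) can be sketched as follows. The farm counts, KPI matrix and BIC-based choice of cluster number are illustrative assumptions, not the authors' exact procedure.

```python
# One year of the typology analysis on synthetic KPI data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
kpis = rng.normal(size=(520, 16))            # one year: 520 farms x 16 KPIs (synthetic)

Z = StandardScaler().fit_transform(kpis)
scores = PCA(n_components=5).fit_transform(Z)

# choose the number of typologies by BIC, as model-based clustering allows
models = [GaussianMixture(k, random_state=0).fit(scores) for k in range(1, 6)]
best = min(models, key=lambda m: m.bic(scores))
typology = best.predict(scores)              # cluster label per farm
```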
Varghese, Bright; Supply, Philip; Shoukri, Mohammed; Allix-Beguec, Caroline; Memish, Ziad; Abuljadayel, Naila; Al-Hakeem, Raafat; AlRabiah, Fahad; Al-Hajoj, Sahal
2013-01-01
Background Eastern province of Saudi Arabia is an industrial zone with a large immigrant population and a high level of tuberculosis case notification among immigrants. The impact of immigration and current trends of tuberculosis transmission among immigrants and the autochthonous population in the region had not been investigated so far using molecular tools. Methodology During 2009-2011, a total of 524 Mycobacterium tuberculosis isolates were collected from the central tuberculosis reference laboratory, representing an estimated 79.2% of the culture-positive tuberculosis cases over the study period in the province. These isolates were genotyped by using 24 locus-based MIRU-VNTR typing and spoligotyping followed by first-line drug susceptibility testing. The molecular clustering profiles and phylogenetic diversity of isolates were determined and compared to the geographical origins of the patients. Principal Findings Genotyping showed an overall predominance of Delhi/CAS (29.4%), EAI (23.8%) and Ghana (13.3%) lineages, with slightly higher proportions of Delhi/CAS among the autochthonous population (33.3%) and EAI (30.9%) among immigrants. The rate of any drug resistance was 20.2%, with 2.5% multi-drug resistance. Strain cluster analysis indicated 42 clusters comprising 210 isolates, resulting in a calculated recent transmission index of 32.1%. The overall shared cluster ratio was 78.6%, while 75.8% were shared between the autochthonous population and the immigrant population, with a predominance of immigrants from Southeast Asia (40.7%). In contrast, cross-national transmission within the immigrant population was limited (24.2%). Younger age (15-30, p value 0.043; 16-45, p value 0.030), Saudi nationality (p value 0.004) and Southeast Asian origin (p value 0.011) were identified as significant predisposing factors for molecular strain clustering. Conclusions The high proportion of molecular clusters shared among the autochthonous and immigrant populations suggests a high permeability of tuberculosis transmission between both populations in the province. These results highlight the need to strengthen the current tuberculosis control strategies and surveillance programs. PMID:24147042
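The recent transmission index quoted above is consistent with the standard "n minus 1" formula, which can be checked directly from the reported counts:

```python
# Recent transmission index: (clustered isolates - number of clusters) / total isolates
clustered, clusters, total = 210, 42, 524
rti = (clustered - clusters) / total
print(f"{rti:.1%}")   # -> 32.1%, matching the reported value
```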
The Hubble Space Telescope Medium Deep Survey Cluster Sample: Methodology and Data
NASA Astrophysics Data System (ADS)
Ostrander, E. J.; Nichol, R. C.; Ratnatunga, K. U.; Griffiths, R. E.
1998-12-01
We present a new, objectively selected, sample of galaxy overdensities detected in the Hubble Space Telescope Medium Deep Survey (MDS). These clusters/groups were found using an automated procedure that involved searching for statistically significant galaxy overdensities. The contrast of the clusters against the field galaxy population is increased when morphological data are used to search around bulge-dominated galaxies. In total, we present 92 overdensities above a probability threshold of 99.5%. We show, via extensive Monte Carlo simulations, that at least 60% of these overdensities are likely to be real clusters and groups and not random line-of-sight superpositions of galaxies. For each overdensity in the MDS cluster sample, we provide a richness and the average of the bulge-to-total ratio of galaxies within each system. This MDS cluster sample potentially contains some of the most distant clusters/groups ever detected, with about 25% of the overdensities having estimated redshifts z > ~0.9. We have made this sample publicly available to facilitate spectroscopic confirmation of these clusters and to support more detailed studies of cluster and galaxy evolution. We also report the serendipitous discovery of a new cluster close on the sky to the rich optical cluster Cl 0016+16 at z = 0.546. This new overdensity, HST 001831+16208, may be coincident with both an X-ray source and a radio source. HST 001831+16208 is the third cluster/group discovered near Cl 0016+16 and appears to strengthen the claims of Connolly et al. of superclustering at high redshift.
Bair, Eric; Gaynor, Sheila; Slade, Gary D.; Ohrbach, Richard; Fillingim, Roger B.; Greenspan, Joel D.; Dubner, Ronald; Smith, Shad B.; Diatchenko, Luda; Maixner, William
2016-01-01
The classification of most chronic pain disorders gives emphasis to anatomical location of the pain to distinguish one disorder from the other (eg, back pain vs temporomandibular disorder [TMD]) or to define subtypes (eg, TMD myalgia vs arthralgia). However, anatomical criteria overlook etiology, potentially hampering treatment decisions. This study identified clusters of individuals using a comprehensive array of biopsychosocial measures. Data were collected from a case–control study of 1031 chronic TMD cases and 3247 TMD-free controls. Three subgroups were identified using supervised cluster analysis (referred to as the adaptive, pain-sensitive, and global symptoms clusters). Compared with the adaptive cluster, participants in the pain-sensitive cluster showed heightened sensitivity to experimental pain, and participants in the global symptoms cluster showed both greater pain sensitivity and greater psychological distress. Cluster membership was strongly associated with chronic TMD: 91.5% of TMD cases belonged to the pain-sensitive and global symptoms clusters, whereas 41.2% of controls belonged to the adaptive cluster. Temporomandibular disorder cases in the pain-sensitive and global symptoms clusters also showed greater pain intensity, jaw functional limitation, and more comorbid pain conditions. Similar results were obtained when the same methodology was applied to a smaller case–control study consisting of 199 chronic TMD cases and 201 TMD-free controls. During a median 3-year follow-up period of TMD-free individuals, participants in the global symptoms cluster had greater risk of developing first-onset TMD (hazard ratio = 2.8) compared with participants in the other 2 clusters. Cross-cohort predictive modeling was used to demonstrate the reliability of the clusters. PMID:26928952
NASA Astrophysics Data System (ADS)
Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2008-11-01
Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.
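The core consistency check (do surface brightness fluctuations around the mean profile look lognormal?) can be sketched with synthetic data. This toy version injects lognormal fluctuations around a spherical beta-model and tests the log-ratio for Gaussianity; all parameter values are illustrative, not the paper's.

```python
# Generate fluctuations around a beta-model surface brightness profile and
# test the projected fluctuations for lognormality.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(3)
r = np.linspace(0.05, 5.0, 200)                  # radius in units of the core radius
beta = 2.0 / 3.0
profile = (1.0 + r**2) ** (-3.0 * beta + 0.5)    # beta-model surface brightness

ln_fluct = rng.normal(0.0, 0.3, size=r.size)     # lognormal density fluctuations
sb = profile * np.exp(ln_fluct)                  # "observed" surface brightness

# the log of the ratio to the mean profile should be ~ Gaussian if lognormal holds
stat, p = shapiro(np.log(sb / profile))
print(f"Shapiro-Wilk p = {p:.2f}")               # large p: consistent with lognormal
```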
A study on phenomenology of Dhat syndrome in men in a general medical setting
Prakash, Sathya; Sharan, Pratap; Sood, Mamta
2016-01-01
Background: “Dhat syndrome” is believed to be a culture-bound syndrome of the Indian subcontinent. Although many studies have been performed, many have methodological limitations and there is a lack of agreement in many areas. Aims: The aim is to study the phenomenology of “Dhat syndrome” in men and to explore the possibility of subtypes within this entity. Settings and Design: It is a cross-sectional descriptive study conducted at a sex and marriage counseling clinic of a tertiary care teaching hospital in Northern India. Materials and Methods: An operational definition and assessment instrument for “Dhat syndrome” was developed after taking all concerned stakeholders into account and review of literature. It was applied on 100 patients along with socio-demographic profile, Hamilton Depression Rating Scale, Hamilton Anxiety Rating Scale, Mini International Neuropsychiatric Interview, and Postgraduate Institute Neuroticism Scale. Statistical Analysis: For statistical analysis, descriptive statistics, group comparisons, and Pearson's product moment correlations were carried out. Factor analysis and cluster analysis were done to determine the factor structure and subtypes of “Dhat syndrome.” Results: A diagnostic and assessment instrument for “Dhat syndrome” has been developed and the phenomenology in 100 patients has been described. Both the health beliefs scale and associated symptoms scale demonstrated a three-factor structure. The patients with “Dhat syndrome” could be categorized into three clusters based on severity. Conclusions: There appears to be a significant agreement among various stakeholders on the phenomenology of “Dhat syndrome” although some differences exist. “Dhat syndrome” could be subtyped into three clusters based on severity. PMID:27385844
Qu, Yi; Lu, Ming
2018-01-01
Rapid urbanization and agricultural development have resulted in the degradation of ecosystems, while also negatively impacting ecosystem services (ES) and urban sustainability. Identifying conservation priorities for ES and applying reasonable management strategies have been found to be effective methods for mitigating this phenomenon. The purpose of this study is to propose a comprehensive framework for identifying ES conservation priorities and associated management strategies for these planning areas. First, we incorporated 10 ES indicators within a systematic conservation planning (SCP) methodology in order to identify ES conservation priorities with high irreplaceability values based on conservation target goals associated with the potential distribution of ES indicators. Next, we assessed the efficiency of the ES conservation priorities for meeting the designated conservation target goals. Finally, ES conservation priorities were clustered into groups using a K-means clustering analysis in an effort to identify the dominant ES per location before formulating management strategies. We effectively identified 12 ES priorities to best represent conservation target goals for the ES indicators. These 12 priorities had a total areal coverage of 13,364 km2 representing 25.16% of the study area. The 12 priorities were further clustered into five significantly different groups (p-values between groups < 0.05), which helped to refine management strategies formulated to best enhance ES across the study area. The proposed method allows conservation and management plans to easily adapt to a wide variety of quantitative ES target goals within urban and agricultural areas, thereby preventing urban and agriculture sprawl and guiding sustainable urban development. PMID:29682412
Modeling CANDU-6 liquid zone controllers for effects of thorium-based fuels
DOE Office of Scientific and Technical Information (OSTI.GOV)
St-Aubin, E.; Marleau, G.
2012-07-01
We use the DRAGON code to model the CANDU-6 liquid zone controllers and evaluate the effects of thorium-based fuels on their incremental cross sections and reactivity worth. We optimize both the numerical quadrature and spatial discretization for 2D cell models in order to provide accurate fuel properties for 3D liquid zone controller supercell models. We propose a low-computational-cost, parameterized, pseudo-exact 3D cluster-geometry modeling approach that avoids tracking issues on small external surfaces. This methodology provides consistent incremental cross sections and reactivity worths when the thickness of the buffer region is reduced. When compared with an approximate annular geometry representation of the fuel and coolant region, we observe that the cluster description of fuel bundles in the supercell models does not considerably increase the precision of the results while substantially increasing the CPU time. In addition, this comparison shows that it is imperative to finely describe the liquid zone controller geometry, since it has a strong impact on the incremental cross sections. This paper also shows that liquid zone controller reactivity worth is greatly decreased in the presence of thorium-based fuels compared to the reference natural uranium fuel, since the fission and the fast-to-thermal scattering incremental cross sections are higher for the new fuels. (authors)
Introduction to a special issue on concept mapping.
Trochim, William M; McLinden, Daniel
2017-02-01
Concept mapping was developed in the 1980s as a unique integration of qualitative (group process, brainstorming, unstructured sorting, interpretation) and quantitative (multidimensional scaling, hierarchical cluster analysis) methods designed to enable a group of people to articulate and depict graphically a coherent conceptual framework or model of any topic or issue of interest. This introduction provides the basic definition and description of the methodology for the newcomer and describes the steps typically followed in its most standard canonical form (preparation, generation, structuring, representation, interpretation and utilization). It also introduces this special issue which reviews the history of the methodology, describes its use in a variety of contexts, shows the latest ways it can be integrated with other methodologies, considers methodological advances and developments, and sketches a vision of the future of the method's evolution. Copyright © 2016 Elsevier Ltd. All rights reserved.
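The quantitative core of the canonical sequence (structuring, representation) is a co-occurrence matrix built from the unstructured sorts, a multidimensional scaling of its complement, and a hierarchical clustering of the resulting point map. A minimal sketch with synthetic sort data, assuming Ward's method on the 2D MDS coordinates:

```python
# Turn unstructured sorts into a similarity matrix, embed with MDS, cut a
# Ward hierarchy into clusters.
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

n_items, n_sorters = 30, 12
rng = np.random.default_rng(4)
# each sorter assigns every statement to one of a few piles (synthetic)
sorts = rng.integers(0, 5, size=(n_sorters, n_items))

# fraction of sorters placing each pair of statements in the same pile
co = sum((s[:, None] == s[None, :]).astype(float) for s in sorts) / n_sorters
dist = 1.0 - co                                   # similarity -> distance

xy = MDS(n_components=2, dissimilarity="precomputed",
         random_state=0).fit_transform(dist)      # the 2D point map
labels = fcluster(linkage(xy, method="ward"), t=4, criterion="maxclust")
```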
Panapasa, Sela; Jackson, James; Caldwell, Cleopatra; Heeringa, Steve; McNally, James; Williams, David; Coral, Debra; Taumoepeau, Leafa; Young, Louisa; Young, Setafano; Fa'asisila, Saia
2013-01-01
Objectives Reports on the challenges and lessons learned from the Pacific Island American Health Study engagement with community-based organizations (CBOs) and faith-based organizations (FBOs) in Pacific Islander (PI) communities and mechanisms to facilitate the collection of robust data. Methods Academic–community partnership building was achieved with PI CBOs and FBOs. Focus group meetings were organized to plan various aspects of the study, develop questionnaire themes and protocols for survey, assist with the interviewer recruitment process, and strategize data dissemination plan. Lessons Learned The PIA-HS represents a model for overcoming challenges in data collection among small understudied populations. FBOs represent a valuable resource for community-based participatory research (CBPR) data collection and for effective interventions. Conclusion The study methodology can be replicated for other racial/ethnic groups with high levels of religiosity combined with concentrated levels of residential clustering. Expansion of the Pacific Islander American Health Study (PIA-HS) to include other PI subgroups is encouraged. PMID:22643788
Nursing home care quality: a cluster analysis.
Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit
2017-02-13
Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from Quality from Patients' Perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n=103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ2 tests and one-way between-groups ANOVA were performed to characterise the clusters (p<0.05). Findings Two clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.
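A minimal sketch of the analysis pattern described (Ward's hierarchical clustering of questionnaire ratings followed by between-cluster comparison), on synthetic data with the paper's sample size of 103 residents; the 20-item ratings matrix is an assumption for illustration.

```python
# Hierarchical clustering of care-quality ratings, then group comparison.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import f_oneway

rng = np.random.default_rng(5)
ratings = rng.normal(3.0, 0.8, size=(103, 20))   # 103 residents x 20 items (synthetic)

labels = fcluster(linkage(ratings, method="ward"), t=2, criterion="maxclust")
g1, g2 = ratings[labels == 1], ratings[labels == 2]
F, p = f_oneway(g1.mean(axis=1), g2.mean(axis=1))
print(f"cluster sizes: {len(g1)}, {len(g2)}; ANOVA p = {p:.3f}")
```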
Extracting the sovereigns’ CDS market hierarchy: A correlation-filtering approach
NASA Astrophysics Data System (ADS)
León, Carlos; Leiton, Karen; Pérez, Jhonatan
2014-12-01
This paper employs correlation-into-distance mapping techniques and a minimal spanning tree-based correlation-filtering methodology on 36 sovereign CDS spread time-series in order to identify the sovereigns’ informational hierarchy. The resulting hierarchy (i) concurs with sovereigns’ eigenvector centrality; (ii) confirms the importance of geographical and credit rating clustering; (iii) identifies Russia, Turkey and Brazil as regional benchmarks; (iv) reveals the idiosyncratic nature of Japan and United States; (v) confirms that a small set of common factors affects the system; (vi) suggests that lower-medium grade rated sovereigns are the most influential, but also the most prone to contagion; and (vii) suggests the existence of a “Latin American common factor”.
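The correlation-into-distance mapping and minimal-spanning-tree filtering referenced here follow a standard recipe: d = sqrt(2(1 - rho)), then keep only the MST of the resulting complete graph. A sketch on synthetic spread series (the real inputs are the 36 sovereign CDS time series):

```python
# Correlation matrix -> distance matrix -> minimal spanning tree.
import numpy as np
import networkx as nx

rng = np.random.default_rng(6)
spreads = rng.normal(size=(500, 36)).cumsum(axis=0)   # 36 synthetic "CDS" series

rho = np.corrcoef(spreads, rowvar=False)
d = np.sqrt(2.0 * (1.0 - rho))          # Mantegna's correlation-to-distance map

G = nx.from_numpy_array(d)              # weighted complete graph on 36 sovereigns
mst = nx.minimum_spanning_tree(G)       # the filtered hierarchy (35 edges)
print(mst.number_of_edges())
```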
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giannantonio, T.; et al.
Optical imaging surveys measure both the galaxy density and the gravitational lensing-induced shear fields across the sky. Recently, the Dark Energy Survey (DES) collaboration used a joint fit to two-point correlations between these observables to place tight constraints on cosmology (DES Collaboration et al. 2017). In this work, we develop the methodology to extend the DES Collaboration et al. (2017) analysis to include cross-correlations of the optical survey observables with gravitational lensing of the cosmic microwave background (CMB) as measured by the South Pole Telescope (SPT) and Planck. Using simulated analyses, we show how the resulting set of five two-point functions increases the robustness of the cosmological constraints to systematic errors in galaxy lensing shear calibration. Additionally, we show that contamination of the SPT+Planck CMB lensing map by the thermal Sunyaev-Zel'dovich effect is a potentially large source of systematic error for two-point function analyses, but show that it can be reduced to acceptable levels in our analysis by masking clusters of galaxies and imposing angular scale cuts on the two-point functions. The methodology developed here will be applied to the analysis of data from the DES, the SPT, and Planck in a companion work.
Bagley, Amy D.; Abramowitz, Carolyn S.; Kosson, David S.
2010-01-01
Deficits in emotion processing have been widely reported to be central to psychopathy. However, few prior studies have examined vocal affect recognition in psychopaths, and these studies suffer from significant methodological limitations. Moreover, prior studies have yielded conflicting findings regarding the specificity of psychopaths’ affect recognition deficits. This study examined vocal affect recognition in 107 male inmates under conditions requiring isolated prosodic vs. semantic analysis of affective cues and compared subgroups of offenders identified via cluster analysis on vocal affect recognition. Psychopaths demonstrated deficits in vocal affect recognition under conditions requiring use of semantic cues and conditions requiring use of prosodic cues. Moreover, both primary and secondary psychopaths exhibited relatively similar emotional deficits in the semantic analysis condition compared to nonpsychopathic control participants. This study demonstrates that psychopaths’ vocal affect recognition deficits are not due to methodological limitations of previous studies and provides preliminary evidence that primary and secondary psychopaths exhibit generally similar deficits in vocal affect recognition. PMID:19413412
Speizer, Ilene S; Mandal, Mahua; Xiong, Khou; Hattori, Aiko; Makina-Zimalirana, Ndinda; Kumalo, Faith; Taylor, Stephen; Ndlovu, Muzi S; Madibane, Mathata; Beke, Andy
2018-04-01
In South Africa, adolescents and young adults (ages 15-24) are at risk of HIV, sexually transmitted infections, and unintended pregnancies. Recently, the Department of Basic Education has revised its sexuality education content and teaching strategies (using scripted lessons plans) as part of its life orientation curriculum. This paper presents the methodology and baseline results from the evaluation of the scripted lesson plans and supporting activities. A rigorous cluster-level randomized design with random assignment of schools as clusters is used for the evaluation. Baseline results from grade 8 female and male learners and grade 10 female learners demonstrate that learners are at risk of HIV and early and unintended pregnancies. Multivariable analyses demonstrate that household-level food insecurity and living with an HIV-positive person are associated with sexual experience and pregnancy experience. Implications are discussed for strengthening the current life orientation program for future scale-up by the government of South Africa.
Wavelet methodology to improve single unit isolation in primary motor cortex cells
Ortiz-Rosario, Alexis; Adeli, Hojjat; Buford, John A.
2016-01-01
The proper isolation of action potentials recorded extracellularly from neural tissue is an active area of research in the fields of neuroscience and biomedical signal processing. This paper presents an isolation methodology for neural recordings using the wavelet transform (WT), a statistical thresholding scheme, and the principal component analysis (PCA) algorithm. The effectiveness of five different mother wavelets was investigated: biorthogonal, Daubechies, discrete Meyer, symmetric, and Coifman; along with three different wavelet coefficient thresholding schemes: fixed form threshold, Stein's unbiased estimate of risk, and minimax; and two different thresholding rules: soft and hard thresholding. The signal quality was evaluated using three different statistical measures: mean-squared error, root mean square, and signal-to-noise ratio. The clustering quality was evaluated using two different statistical measures: isolation distance and L-ratio. This research shows that the selection of the mother wavelet has a strong influence on the clustering and isolation of single unit neural activity, with the Daubechies 4 wavelet and minimax thresholding scheme performing the best. PMID:25794461
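A hedged sketch of the best-performing combination reported (Daubechies-4 decomposition, coefficient thresholding, then PCA on the denoised waveforms). For simplicity this uses the fixed-form (universal) threshold rather than minimax, and synthetic waveforms:

```python
# Wavelet denoising of spike waveforms followed by PCA feature extraction.
import numpy as np
import pywt
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
spikes = rng.normal(size=(200, 64))              # 200 waveforms, 64 samples (synthetic)

def denoise(w):
    coeffs = pywt.wavedec(w, "db4", level=3)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # robust noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(w.size))      # universal (fixed form) threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, "db4")[: w.size]

clean = np.array([denoise(w) for w in spikes])
features = PCA(n_components=3).fit_transform(clean)  # inputs to spike clustering
```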
The properties of small Ag clusters bound to DNA bases.
Soto-Verdugo, Víctor; Metiu, Horia; Gwinn, Elisabeth
2010-05-21
We study the binding of neutral silver clusters, Ag(n) (n=1-6), to the DNA bases adenine (A), cytosine (C), guanine (G), and thymine (T) and the absorption spectra of the silver cluster-base complexes. Using density functional theory (DFT), we find that the clusters prefer to bind to the doubly bonded ring nitrogens and that binding to T is generally much weaker than to C, G, and A. Ag(3) and Ag(4) make the stronger bonds. Bader charge analysis indicates a mild electron transfer from the base to the clusters for all bases, except T. The donor bases (C, G, and A) bind to the sites on the cluster where the lowest unoccupied molecular orbital has a pronounced protrusion. The site where cluster binds to the base is controlled by the shape of the higher occupied states of the base. Time-dependent DFT calculations show that different base-cluster isomers may have very different absorption spectra. In particular, we find new excitations in base-cluster molecules, at energies well below those of the isolated components, and with strengths that depend strongly on the orientations of planar clusters with respect to the base planes. Our results suggest that geometric constraints on binding, imposed by designed DNA structures, may be a feasible route to engineering the selection of specific cluster-base assemblies.
Spatial event cluster detection using an approximate normal distribution.
Torabi, Mahmoud; Rosychuk, Rhonda J
2008-12-12
In geographic surveillance of disease, areas with large numbers of disease cases are to be identified so that investigations of the causes of high disease rates can be pursued. Areas with high rates are called disease clusters and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. Typically cluster detection tests are applied to incident or prevalent cases of disease, but surveillance of disease-related events, where an individual may have multiple events, may also be of interest. Previously, a compound Poisson approach that detects clusters of events by testing individual areas that may be combined with their neighbours has been proposed. However, the relevant probabilities from the compound Poisson distribution are obtained from a recursion relation that can be cumbersome if the number of events are large or analyses by strata are performed. We propose a simpler approach that uses an approximate normal distribution. This method is very easy to implement and is applicable to situations where the population sizes are large and the population distribution by important strata may differ by area. We demonstrate the approach on pediatric self-inflicted injury presentations to emergency departments and compare the results for probabilities based on the recursion and the normal approach. We also implement a Monte Carlo simulation to study the performance of the proposed approach. In a self-inflicted injury data example, the normal approach identifies twelve out of thirteen of the same clusters as the compound Poisson approach, noting that the compound Poisson method detects twelve significant clusters in total. Through simulation studies, the normal approach well approximates the compound Poisson approach for a variety of different population sizes and case and event thresholds. A drawback of the compound Poisson approach is that the relevant probabilities must be determined through a recursion relation and such calculations can be computationally intensive if the cluster size is relatively large or if analyses are conducted with strata variables. On the other hand, the normal approach is very flexible, easily implemented, and hence, more appealing for users. Moreover, the concepts may be more easily conveyed to non-statisticians interested in understanding the methodology associated with cluster detection test results.
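The proposed approximation replaces the compound Poisson recursion with a single Gaussian tail computation. A minimal sketch with illustrative counts, where the variance exceeds the mean because individuals can contribute multiple events:

```python
# Normal-approximation test for an excess of events in a candidate cluster.
from math import sqrt
from scipy.stats import norm

observed = 57          # events in the area plus neighbours (illustrative)
expected = 40.0        # expected events from population size and overall rate
var = 48.0             # variance; exceeds the mean when individuals
                       # contribute multiple events

z = (observed - expected) / sqrt(var)
p = norm.sf(z)         # one-sided p-value for an excess of events
print(f"z = {z:.2f}, p = {p:.4f}")
```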
Mallik, Saurav; Zhao, Zhongming
2017-12-28
For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology; thus, it has advantages in finding causal-effect relationships between transcripts. In this work, we introduce two new rule-based similarity measures, weighted rank-based Jaccard and Cosine measures, and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers, which consists of both singular and complex markers in nature, depends on the corresponding condensed gene sets in either the antecedent or the consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule pair, and the resultant scores were used for clustering to identify the co-expressed rule modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
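A hedged sketch of one of the two proposed scores, a weighted rank-based Jaccard measure between two rules' gene sets. The weighting scheme and gene weights here are illustrative assumptions (our reading), not the authors' exact definition:

```python
# Weighted Jaccard similarity between two association rules' gene sets,
# where each gene carries a rank-derived weight.
def weighted_jaccard(rule_a, rule_b, weights):
    """rule_a, rule_b: sets of gene symbols; weights: gene -> rank weight."""
    inter = sum(weights[g] for g in rule_a & rule_b)
    union = sum(weights[g] for g in rule_a | rule_b)
    return inter / union if union else 0.0

w = {"TP53": 1.0, "EGFR": 0.8, "KRAS": 0.6, "MYC": 0.4}   # hypothetical weights
print(weighted_jaccard({"TP53", "EGFR"}, {"TP53", "KRAS", "MYC"}, w))  # ~0.357
```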
The Frontier Fields lens modelling comparison project
NASA Astrophysics Data System (ADS)
Meneghetti, M.; Natarajan, P.; Coe, D.; Contini, E.; De Lucia, G.; Giocoli, C.; Acebron, A.; Borgani, S.; Bradac, M.; Diego, J. M.; Hoag, A.; Ishigaki, M.; Johnson, T. L.; Jullo, E.; Kawamata, R.; Lam, D.; Limousin, M.; Liesenborgs, J.; Oguri, M.; Sebesta, K.; Sharon, K.; Williams, L. L. R.; Zitrin, A.
2017-12-01
Gravitational lensing by clusters of galaxies offers a powerful probe of their structure and mass distribution. Several research groups have developed techniques independently to achieve this goal. While these methods have all provided remarkably high-precision mass maps, particularly with exquisite imaging data from the Hubble Space Telescope (HST), the reconstructions themselves have never been directly compared. In this paper, we present for the first time a detailed comparison of methodologies for fidelity, accuracy and precision. For this collaborative exercise, the lens modelling community was provided simulated cluster images that mimic the depth and resolution of the ongoing HST Frontier Fields. The results of the submitted reconstructions with the un-blinded true mass profile of these two clusters are presented here. Parametric, free-form and hybrid techniques have been deployed by the participating groups and we detail the strengths and trade-offs in accuracy and systematics that arise for each methodology. We note in conclusion that several properties of the lensing clusters are recovered equally well by most of the lensing techniques compared in this study. For example, the reconstruction of azimuthally averaged density and mass profiles by both parametric and free-form methods matches the input models at the level of ∼10 per cent. Parametric techniques are generally better at recovering the 2D maps of the convergence and of the magnification. For the best-performing algorithms, the accuracy in the magnification estimate is ∼10 per cent at μtrue = 3 and it degrades to ∼30 per cent at μtrue ∼ 10.
Transport in the Subtropical Lowermost Stratosphere during CRYSTAL-FACE
NASA Technical Reports Server (NTRS)
Pittman, Jasna V.; Weinstock, Elliot M.; Oglesby, Robert J.; Sayres, David S.; Smith, Jessica B.; Anderson, James G.; Cooper, Owen R.; Wofsy, Steven C.; Xueref, Irene; Gerbig, Christoph;
2007-01-01
We use in situ measurements of water vapor (H2O), ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), and total reactive nitrogen (NO(y)) obtained during the CRYSTAL-FACE campaign in July 2002 to study summertime transport in the subtropical lowermost stratosphere. We use an objective methodology to distinguish the latitudinal origin of the sampled air masses despite the influence of convection, and we calculate backward trajectories to elucidate their recent geographical history. The methodology consists of exploring the statistical behavior of the data by performing multivariate clustering and agglomerative hierarchical clustering calculations, and projecting cluster groups onto principal component space to identify air masses of like composition and hence presumed origin. The statistically derived cluster groups are then examined in physical space using tracer-tracer correlation plots. Interpretation of the principal component analysis suggests that the variability in the data is accounted for primarily by the mean age of air in the stratosphere, followed by the age of the convective influence, and lastly by the extent of convective influence, potentially related to the latitude of convective injection [Dessler and Sherwood, 2004]. We find that high-latitude stratospheric air is the dominant source region during the beginning of the campaign while tropical air is the dominant source region during the rest of the campaign. Influence of convection from both local and non-local events is frequently observed. The identification of air mass origin is confirmed with backward trajectories, and the behavior of the trajectories is associated with the North American monsoon circulation.
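The statistical core of the methodology (standardize the six tracers, run a hierarchical clustering of the observations, and project the cluster groups onto principal components) can be sketched as follows on synthetic data; the choice of three clusters is an illustrative assumption.

```python
# Cluster multivariate tracer measurements and view the groups in PC space.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
tracers = rng.normal(size=(1000, 6))   # H2O, O3, CO2, CO, NO, NOy (synthetic)

Z = StandardScaler().fit_transform(tracers)
groups = AgglomerativeClustering(n_clusters=3).fit_predict(Z)
pc = PCA(n_components=2).fit_transform(Z)   # project cluster groups onto PCs
```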
Heinmüller, Stefan; Schneider, Antonius; Linde, Klaus
2016-04-23
Academic infrastructures and networks for clinical research in primary care receive little funding in Germany. We aimed to provide an overview of the quantity, topics, methods and findings of randomised controlled trials published by German university departments of general practice. We searched Scopus (last search done in April 2015), publication lists of institutes and references of included articles. We included randomised trials published between January 2000 and December 2014 with a first or last author affiliated with a German university department of general practice or family medicine. Risk of bias was assessed with the Cochrane tool, and study findings were quantified using standardised mean differences (SMDs). Thirty-three trials met the inclusion criteria. Seventeen were cluster-randomised trials, with a majority investigating interventions aimed at improving processes compared with usual care. Sample sizes varied between 6 and 606 clusters and 168 and 7807 participants. The most frequent methodological problem was risk of selection bias due to recruitment of individuals after randomisation of clusters. Effects of interventions over usual care were mostly small (SMD <0.3). Sixteen trials randomising individual participants addressed a variety of treatment and educational interventions. Sample sizes varied between 20 and 1620 participants. The methodological quality of the trials was highly variable. Again, effects of experimental interventions over controls were mostly small. Despite limited funding, German university institutes of general practice or family medicine are increasingly performing randomised trials. Cluster-randomised trials on practice improvement are a focus, but problems with allocation concealment are frequent.
Could the outcome of the 2016 US elections have been predicted from past voting patterns?
NASA Astrophysics Data System (ADS)
Schmitz, Peter M. U.; Holloway, Jennifer P.; Dudeni-Tlhone, Nontembeko; Ntlangu, Mbulelo B.; Koen, Renee
2018-05-01
A team of analysts in South Africa has for some years been using statistical techniques to predict election outcomes during election nights. The prediction method involves using statistical clusters based on past voting patterns to predict final election outcomes, using a small number of released vote counts. With the US presidential elections of November 2016 hitting global media headlines directly after successful predictions had been made for the South African elections, the team decided to investigate adapting their method to forecast the final outcome of the US elections. In particular, it was felt that the time zone differences between states would affect the time at which results are released and thereby provide a window of opportunity for doing election night prediction using only the early results from the eastern side of the US. Testing the method on the US presidential elections would have two advantages: it would determine whether the core methodology could be generalised, and whether it would work to include a stronger spatial element in the modelling, since the early results released would be spatially biased due to time zone differences. This paper presents a high-level view of the overall methodology and how it was adapted to predict the results of the US presidential elections. A discussion on the clustering of spatial units within the US is also provided, and the spatial distribution of results, together with the Electoral College predictions from both a 'test-run' and the final 2016 presidential elections, are given and analysed.
Aerial Images and Convolutional Neural Network for Cotton Bloom Detection.
Xu, Rui; Li, Changying; Paterson, Andrew H; Jiang, Yu; Sun, Shangpeng; Robertson, Jon S
2017-01-01
Monitoring flower development can provide useful information for production management, estimating yield and selecting specific genotypes of crops. The main goal of this study was to develop a methodology to detect and count cotton flowers, or blooms, using color images acquired by an unmanned aerial system. The aerial images were collected from two test fields in 4 days. A convolutional neural network (CNN) was designed and trained to detect cotton blooms in raw images, and their 3D locations were calculated using the dense point cloud constructed from the aerial images with the structure from motion method. The quality of the dense point cloud was analyzed and plots with poor quality were excluded from data analysis. A constrained clustering algorithm was developed to register the same bloom detected from different images based on the 3D location of the bloom. The accuracy and incompleteness of the dense point cloud were analyzed because they affected the accuracy of the 3D location of the blooms and thus the accuracy of the bloom registration result. The constrained clustering algorithm was validated using simulated data, showing good efficiency and accuracy. The bloom count from the proposed method was comparable with the number counted manually with an error of -4 to 3 blooms for the field with a single plant per plot. However, more plots were underestimated in the field with multiple plants per plot due to hidden blooms that were not captured by the aerial images. The proposed methodology provides a high-throughput method to continuously monitor the flowering progress of cotton.
Perez, Manolo F; Carstens, Bryan C; Rodrigues, Gustavo L; Moraes, Evandro M
2016-02-01
The Pilosocereus aurisetus complex consists of eight cactus species with a fragmented distribution associated with xeric enclaves within the Cerrado biome in eastern South America. The phylogeny of these species is incompletely resolved, and this instability complicates evolutionary analyses. Previous analyses based on both plastid and microsatellite markers suggested that this complex contained species with inherent phylogeographic structure, which was attributed to recent diversification and recurring range shifts. However, limitations of the molecular markers used in these analyses prevented some questions from being properly addressed. In order to better understand the relationships among these species and make a preliminary assessment of the genetic structure within them, we developed anonymous nuclear loci from pyrosequencing data of 40 individuals from four species in the P. aurisetus complex. The data obtained from these loci were used to identify genetic clusters within species, and to investigate the phylogenetic relationships among these inferred clusters using a species tree methodology. Coupled with palaeodistributional modelling, our results reveal a deep phylogenetic and climatic disjunction between two geographic lineages. Our results highlight the importance of sampling more regions of the genome to gain better insights into the evolution of species with an intricate evolutionary history. The methodology used here provides a feasible approach to develop numerous genealogical molecular markers throughout the genome for non-model species. These data provide a more robust hypothesis for the relationships among the lineages of the P. aurisetus complex. Copyright © 2015 Elsevier Inc. All rights reserved.
Indicator-based approach to assess sustainability of current and projected water use in Korea
NASA Astrophysics Data System (ADS)
Kong, I.; Kim, I., Sr.
2016-12-01
Recent failures in Korea's water supply system, caused by a lack of rainfall, have raised severe concerns about limited water resources exacerbated by anthropogenic drivers as well as climatic changes. Since Korea is undergoing unprecedented changes in both social and environmental aspects, social and environmental changes as well as climate factors must be integrated in order to consider the underlying problems and their upcoming impacts on sustainable water use. In this study, we proposed a framework to assess multilateral aspects of sustainable water use in support of performance-based monitoring. The framework consists of four thematic indices (climate, infrastructure, pollution, and management capacity) and subordinate indicators. Second, in order to project future circumstances, climate variability, demographic, and land cover scenarios to 2050 were applied after conducting statistical analysis identifying correlations between indicators within the framework, since water crises are caused by numerous interrelated factors. Assessment was conducted across 161 administrative boundaries in Korea for 2010, 2030, and 2050. Third, the current and future status of water use was illustrated using a GIS-based methodology and statistical clustering (K-means and HCA) to produce spatially explicit maps and to categorize administrative regions expected to show similar phenomena in the future. Based on the conspicuous results of the spatial analysis and clustering, we suggested policy implementations to help local communities decide which countermeasures should be supplemented or adopted to increase resilience to upcoming changes in the water use environment.
Amaya-Amaya, Jenny; Rojas-Villarraga, Adriana; Molano-Gonzalez, Nicolas; Montoya-Sánchez, Laura; Nath, Swapan K.; Anaya, Juan-Manuel
2015-01-01
Objective. Rheumatoid arthritis (RA) is the most common autoimmune arthropathy worldwide. The increased prevalence of cardiovascular disease (CVD) in RA is not fully explained by classic risk factors. The aim of this study was to determine the influence of the rs1058587 SNP within the GDF15 (MIC1) gene on the risk of CVD in a Colombian RA population. Methods. This was a cross-sectional analytical study in which 310 consecutive Colombian patients with RA and 228 age- and sex-matched controls were included and assessed for variables associated with CVD. A mixed clustering methodology based on multivariate descriptive methods, such as principal component analysis and multiple correspondence analysis, and a regression tree (CART) predictive model were applied. Results. Of the 310 patients, 87.4% were women and CVD was reported in 69.5%. No significant differences concerning the GDF15 polymorphism were observed between patients and controls. Mean arterial pressure, current smoking, and some clusters were significantly associated with CVD. Conclusion. GDF15 (rs1058587) does not influence the development of CVD in the population studied. PMID:26090487
Pilot testing model to uncover industrial symbiosis in Brazilian industrial clusters.
Saraceni, Adriana Valélia; Resende, Luis Mauricio; de Andrade Júnior, Pedro Paulo; Pontes, Joseane
2017-04-01
The main objective of this study was to create a pilot model to uncover industrial symbiosis practices in Brazilian industrial clusters. For this purpose, a systematic review was conducted in journals selected from two categories of the ISI Web of Knowledge: Engineering, Environmental and Engineering, Industrial. After an in-depth review of the literature, the results allowed the creation of an analysis structure. A methodology based on fuzzy logic was applied to attribute the weights of the industrial symbiosis variables. It was thus possible to extract the intensity indicators of the interrelations required to analyse the development level of each correlation between the variables. Determination of the variables and their weights initially resulted in a framework for the theory of industrial symbiosis assessments. The research results allowed the creation of a pilot model that can precisely identify the loopholes or development levels in each sphere. Ontology charts for data analysis were also generated. This study contributes to science by presenting the foundations for building an instrument that enables application and compilation of the pilot model in order to identify opportunities for symbiotic development, which derives from "uncovering" existing symbioses.
Hierarchical Matching and Regression with Application to Photometric Redshift Estimation
NASA Astrophysics Data System (ADS)
Murtagh, Fionn
2017-06-01
This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data are to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that take account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree in which the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. When used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.
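A hedged sketch of cluster-wise nearest-neighbour regression in the spirit of this approach (not Murtagh's implementation): training objects are binned by Ward clustering of their photometric features, a kNN regressor is fitted per bin, and a query is routed to the bin with the nearest centroid. Feature dimensions, cluster count, and data are synthetic placeholders.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 5))         # e.g. magnitudes/colours (synthetic)
z_train = rng.uniform(0.0, 1.5, size=500)   # spectroscopic redshifts (synthetic)

Z = linkage(X_train, method="ward")
labels = fcluster(Z, t=8, criterion="maxclust")   # 8 clusters (tunable)

# One kNN regressor per cluster; queries are routed to the nearest centroid.
centroids = np.array([X_train[labels == k].mean(axis=0) for k in range(1, 9)])
models = {k: KNeighborsRegressor(n_neighbors=5).fit(X_train[labels == k],
                                                    z_train[labels == k])
          for k in range(1, 9)}

def predict(x):
    k = 1 + int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    return models[k].predict(x.reshape(1, -1))[0]

print(predict(rng.normal(size=5)))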
Cloud-Assisted UAV Data Collection for Multiple Emerging Events in Distributed WSNs
Cao, Huiru; Liu, Yongxin; Yue, Xuejun; Zhu, Wenjian
2017-01-01
In recent years, UAVs (Unmanned Aerial Vehicles) have been widely applied for data collection and image capture. Specifically, UAVs have been integrated with wireless sensor networks (WSNs) to create data collection platforms with high flexibility. However, most studies in this domain focus on system architecture and UAV flight trajectory planning, while event-related factors and other important issues are neglected. To address these challenges, we propose a cloud-assisted data gathering strategy for UAV-based WSNs in the light of emerging events. We also provide a cloud-assisted approach for deriving a UAV's optimal flying and data acquisition sequence over a WSN cluster. We validate our approach through simulations and experiments, which show that our methodology outperforms conventional approaches in terms of flying time, energy consumption, and integrity of data acquisition. We also conducted a real-world experiment using a UAV to collect data wirelessly from multiple clusters of sensor nodes deployed on a farm to monitor an emerging event. Compared against the traditional method, the proposed approach requires less than half the flying time and achieves almost perfect data integrity. PMID:28783100
Cloud-Assisted UAV Data Collection for Multiple Emerging Events in Distributed WSNs.
Cao, Huiru; Liu, Yongxin; Yue, Xuejun; Zhu, Wenjian
2017-08-07
In recent years, UAVs (Unmanned Aerial Vehicles) have been widely applied for data collection and image capture. Specifically, UAVs have been integrated with wireless sensor networks (WSNs) to create data collection platforms with high flexibility. However, most studies in this domain focus on system architecture and UAV flight trajectory planning, while event-related factors and other important issues are neglected. To address these challenges, we propose a cloud-assisted data gathering strategy for UAV-based WSNs in the light of emerging events. We also provide a cloud-assisted approach for deriving a UAV's optimal flying and data acquisition sequence over a WSN cluster. We validate our approach through simulations and experiments, which show that our methodology outperforms conventional approaches in terms of flying time, energy consumption, and integrity of data acquisition. We also conducted a real-world experiment using a UAV to collect data wirelessly from multiple clusters of sensor nodes deployed on a farm to monitor an emerging event. Compared against the traditional method, the proposed approach requires less than half the flying time and achieves almost perfect data integrity.
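As a rough illustration of deriving a flying and data-acquisition sequence over WSN clusters, the sketch below computes a greedy nearest-neighbour tour of cluster-head coordinates. This is only a baseline stand-in; the paper's cloud-assisted optimization is more involved, and the coordinates here are invented.

import numpy as np

rng = np.random.default_rng(10)
cluster_heads = rng.uniform(0, 1000, size=(8, 2))   # (x, y) of WSN cluster heads, in metres

def greedy_tour(points, start=0):
    """Visit each point once, always flying to the nearest unvisited point."""
    remaining = set(range(len(points))) - {start}
    tour = [start]
    while remaining:
        last = points[tour[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(points[i] - last))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

print("visit order:", greedy_tour(cluster_heads))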
Putra, I Nyoman Giri; Syamsuni, Yuliana Fitri; Subhan, Beginer; Pharmawati, Made; Madduppa, Hawis
2018-01-01
The Indo-Malay Archipelago is regarded as a barrier that separates organisms of the Indian and Pacific Oceans. Previous studies of marine biota from this region have found a variety of biogeographic barriers, seemingly dependent on taxon and methodology. Several hypotheses, such as emergence of the Sunda Shelf and recent physical oceanography, have been proposed to account for the genetic structuring of marine organisms in this region. Here, we used six microsatellite loci to infer genetic diversity, population differentiation and phylogeographic patterns of Enhalus acoroides across the Indo-Malay Archipelago. Heterozygosities were consistently high, and significant isolation-by-distance, consistent with restricted gene flow, was observed. Both a neighbour joining tree based on DA distance and Bayesian clustering revealed three major clusters of E. acoroides. Our results indicate that phylogeographic patterns of E. acoroides have possibly been influenced by glaciation and deglaciation during the Pleistocene. Recent physical oceanography such as the South Java Current and the Seasonally Reversing Current may also play a role in shaping the genetic patterns of E. acoroides.
Putra, I Nyoman Giri; Syamsuni, Yuliana Fitri; Subhan, Beginer; Pharmawati, Made
2018-01-01
The Indo-Malay Archipelago is regarded as a barrier that separates organisms of the Indian and Pacific Oceans. Previous studies of marine biota from this region have found a variety of biogeographic barriers, seemingly dependent on taxon and methodology. Several hypotheses, such as emergence of the Sunda Shelf and recent physical oceanography, have been proposed to account for the genetic structuring of marine organisms in this region. Here, we used six microsatellite loci to infer genetic diversity, population differentiation and phylogeographic patterns of Enhalus acoroides across the Indo-Malay Archipelago. Heterozygosities were consistently high, and significant isolation-by-distance, consistent with restricted gene flow, was observed. Both a neighbour joining tree based on DA distance and Bayesian clustering revealed three major clusters of E. acoroides. Our results indicate that phylogeographic patterns of E. acoroides have possibly been influenced by glaciation and deglaciation during the Pleistocene. Recent physical oceanography such as the South Java Current and the Seasonally Reversing Current may also play a role in shaping the genetic patterns of E. acoroides. PMID:29576933
Healthcare managers' leadership profiles in relation to perceptions of work stressors and stress.
Lornudd, Caroline; Bergman, David; Sandahl, Christer; von Thiele Schwarz, Ulrica
2016-05-03
Purpose The purpose of this study is to investigate the relationship between leadership profiles and differences in managers' own levels of work stress symptoms and perceptions of work stressors causing stress. Design/methodology/approach Cross-sectional data were used. Healthcare managers (n = 188) rated three dimensions of their leadership behavior and their levels of work stressors and stress. Hierarchical cluster analysis was performed to identify leadership profiles based on leadership behaviors. Differences in stress-related outcomes between profiles were assessed using one-way analysis of variance. Findings Four distinct clusters of leadership profiles were found. They discriminated in perception of work stressors and stress: the profile distinguished by the lowest mean in all behavior dimensions exhibited a pattern with significantly more negative ratings compared to the other profiles. Practical implications This paper proposes that leadership profile is an individual factor involved in the stress process, including work stressors and stress, which may inform targeted health-promoting interventions for healthcare managers. Originality/value This is the first study to investigate the relationship between leadership profiles and work stressors and stress in healthcare managers.
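An illustrative sketch of this analysis pipeline (synthetic ratings, not the study's data): derive profiles by Ward clustering of the three behavior dimensions, then test whether a stress score differs across the resulting profiles with one-way ANOVA.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import f_oneway

rng = np.random.default_rng(2)
ratings = rng.normal(size=(188, 3))   # three leadership-behavior dimensions
stress = rng.normal(size=188)         # stress-symptom score

# Four profiles, matching the number of clusters reported in the abstract.
profiles = fcluster(linkage(ratings, method="ward"), t=4, criterion="maxclust")
groups = [stress[profiles == k] for k in np.unique(profiles)]
f_stat, p_value = f_oneway(*groups)
print(f"F={f_stat:.2f}, p={p_value:.3f}")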
NASA Astrophysics Data System (ADS)
Falge, Eva; Brümmer, Christian; Schmullius, Christiane; Scholes, Robert; Twine, Wayne; Mudau, Azwitamisi; Midgley, Guy; Hickler, Thomas; Bradshaw, Karen; Lück, Wolfgang; Thiel-Clemen, Thomas; du Toit, Justin; Sankaran, Vaith; Kutsch, Werner
2016-04-01
Sub-Saharan Africa is currently experiencing significant changes in shrubland, savanna and mixed woodland ecosystems, driving degradation, affecting fire frequency and water availability, and eventually fueling climate change. The project 'Adaptive Resilience of Southern African Ecosystems' (ARS AfricaE) conducts research and develops scenarios of ecosystem development under climate change, for management support in conservation or for planning rural area development. For a network of research clusters along an aridity gradient in South Africa, we measure greenhouse gas exchange, ecosystem structure and eco-physiological properties as affected by land use change at paired sites with natural and altered vegetation. We set up dynamic vegetation models and individual-based models to predict ecosystem dynamics under (post-)disturbance management. We monitor vegetation amount and heterogeneity using remotely sensed images and aerial photography over several decades to examine time series of land cover change. Finally, we investigate livelihood strategies with a focus on carbon balance components to develop sustainable management strategies for disturbed ecosystems and land use change. Emphasis is placed on validation of estimates obtained from eddy covariance, model approaches and satellite derivations. We envision our methodological approach, applied to a network of research clusters, as a valuable means of investigating potential linkages to concepts of adaptive resilience.
The Analysis of Surface EMG Signals with the Wavelet-Based Correlation Dimension Method
Zhang, Yanyan; Wang, Jue
2014-01-01
Many attempts have been made to effectively improve a prosthetic system controlled by the classification of surface electromyographic (SEMG) signals. However, the development of methodologies to extract effective features remains a primary challenge. Previous studies have demonstrated that SEMG signals have nonlinear characteristics. In this study, by combining nonlinear time series analysis and time-frequency domain methods, we proposed the wavelet-based correlation dimension method to extract effective features from SEMG signals. The SEMG signals were first analyzed by the wavelet transform, and the correlation dimension was then calculated to obtain the features of the SEMG signals. These features were used as the input vectors of a Gustafson-Kessel clustering classifier to discriminate four types of forearm movements. Our results showed that there are four separate clusters corresponding to the different forearm movements at the third resolution level, and the resulting classification accuracy was 100% when two channels of SEMG signals were used. This indicates that the proposed approach can provide important insight into the nonlinear characteristics and the time-frequency domain features of SEMG signals and is suitable for classifying different types of forearm movements. Compared with other existing methods, the proposed method exhibited more robustness and higher classification accuracy. PMID:24868240
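A compact sketch of the described feature pipeline under stated assumptions: decompose one SEMG channel with a discrete wavelet transform (PyWavelets), then estimate the correlation dimension of each sub-band via a crude Grassberger-Procaccia slope fit. The embedding parameters, wavelet choice, and signal are illustrative; the paper's exact estimator may differ.

import numpy as np
import pywt

def correlation_dimension(x, m=3, tau=2, radii=None):
    """Crude Grassberger-Procaccia estimate for a 1-D signal."""
    n = len(x) - (m - 1) * tau
    # Delay embedding: row j is (x[j], x[j+tau], ..., x[j+(m-1)tau]).
    emb = np.stack([x[i * tau : i * tau + n] for i in range(m)], axis=1)
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    d = d[np.triu_indices(n, k=1)]
    radii = radii if radii is not None else np.percentile(d, [5, 10, 20, 40])
    c = np.array([np.mean(d < r) for r in radii])       # correlation integral
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)  # dimension ~ slope
    return slope

signal = np.random.default_rng(3).normal(size=1024)    # stand-in for one SEMG channel
coeffs = pywt.wavedec(signal, "db4", level=3)          # third resolution level
features = [correlation_dimension(c) for c in coeffs]  # one feature per sub-band
print(features)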
Kwong, C. K.; Fung, K. Y.; Jiang, Huimin; Chan, K. Y.
2013-01-01
Affective design is an important aspect of product development for achieving a competitive edge in the marketplace. A neural-fuzzy network approach has recently been attempted for modeling customer satisfaction in affective design, and it has proved effective in dealing with the fuzziness and non-linearity of the modeling as well as in generating explicit customer satisfaction models. However, such an approach has two limitations. First, it is not suitable for modeling problems that involve a large number of inputs. Second, it cannot adapt to new data sets, given that its structure is fixed once it has been developed. In this paper, a modified dynamic evolving neural-fuzzy approach is proposed to address these limitations. A case study on the affective design of mobile phones was conducted to illustrate the effectiveness of the proposed methodology. Validation tests were conducted, and the results indicated that: (1) the conventional Adaptive Neuro-Fuzzy Inference System (ANFIS) failed to run due to the large number of inputs; (2) the proposed dynamic neural-fuzzy model outperforms the subtractive clustering-based ANFIS model and the fuzzy c-means clustering-based ANFIS model in terms of modeling accuracy and computational effort. PMID:24385884
Kwong, C K; Fung, K Y; Jiang, Huimin; Chan, K Y; Siu, Kin Wai Michael
2013-01-01
Affective design is an important aspect of product development for achieving a competitive edge in the marketplace. A neural-fuzzy network approach has recently been attempted for modeling customer satisfaction in affective design, and it has proved effective in dealing with the fuzziness and non-linearity of the modeling as well as in generating explicit customer satisfaction models. However, such an approach has two limitations. First, it is not suitable for modeling problems that involve a large number of inputs. Second, it cannot adapt to new data sets, given that its structure is fixed once it has been developed. In this paper, a modified dynamic evolving neural-fuzzy approach is proposed to address these limitations. A case study on the affective design of mobile phones was conducted to illustrate the effectiveness of the proposed methodology. Validation tests were conducted, and the results indicated that: (1) the conventional Adaptive Neuro-Fuzzy Inference System (ANFIS) failed to run due to the large number of inputs; (2) the proposed dynamic neural-fuzzy model outperforms the subtractive clustering-based ANFIS model and the fuzzy c-means clustering-based ANFIS model in terms of modeling accuracy and computational effort.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harper, Bryan; Thomas, Dennis G.; Chikkagoudar, Satish
The integration of rapid assays, large data sets, informatics and modeling can overcome current barriers in understanding nanomaterial structure-toxicity relationships by providing a weight-of-the-evidence mechanism to generate hazard rankings for nanomaterials. Here we present the use of a rapid, low-cost assay to perform screening-level toxicity evaluations of nanomaterials in vivo. Calculated EZ Metric scores, a combined measure of morbidity and mortality, were established at realistic exposure levels and used to develop a predictive model of nanomaterial toxicity. Hazard ranking and clustering analysis of 68 diverse nanomaterials revealed distinct patterns of toxicity related to both core composition and outermost surface chemistry of nanomaterials. The resulting clusters guided the development of a predictive model of gold nanoparticle toxicity to embryonic zebrafish. In addition, our findings suggest that risk assessments based on the size and core composition of nanomaterials alone may be wholly inappropriate, especially when considering complex engineered nanomaterials. These findings reveal the need to expeditiously increase the availability of quantitative measures of nanomaterial hazard and broaden the sharing of that data and knowledge to support predictive modeling. In addition, research should continue to focus on methodologies for developing predictive models of nanomaterial hazard based on sub-lethal responses to low dose exposures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lara-Castells, María Pilar de, E-mail: Pilar.deLara.Castells@csic.es; Mitrushchenkov, Alexander O.; Stoll, Hermann
2015-09-14
A combined density functional (DFT) and incremental post-Hartree-Fock (post-HF) approach, proven earlier to calculate He-surface potential energy surfaces [de Lara-Castells et al., J. Chem. Phys. 141, 151102 (2014)], is applied to describe the van der Waals dominated Ag2/graphene interaction. It extends the dispersionless density functional theory developed by Pernal et al. [Phys. Rev. Lett. 103, 263201 (2009)] by including periodic boundary conditions, while the dispersion is parametrized via the method of increments [H. Stoll, J. Chem. Phys. 97, 8449 (1992)]. Starting with the elementary cluster unit of the target surface (benzene), continuing through the realistic cluster model (coronene), and ending with the periodic model of the extended system, modern ab initio methodologies for intermolecular interactions as well as state-of-the-art van der Waals-corrected density functional-based approaches are put together both to assess the accuracy of the composite scheme and to better characterize the Ag2/graphene interaction. The present work illustrates how the combination of DFT and post-HF perspectives may be efficient to design simple and reliable ab initio-based schemes in extended systems for surface science applications.
Segmentation of Polarimetric SAR Images Using Wavelet Transformation and Texture Features
NASA Astrophysics Data System (ADS)
Rezaeian, A.; Homayouni, S.; Safari, A.
2015-12-01
Polarimetric Synthetic Aperture Radar (PolSAR) sensors can collect useful observations of the earth's surface and phenomena for various remote sensing applications, such as land cover mapping, change detection and target detection. These data can be acquired without the limitations of weather conditions, sun illumination and dust particles. As a result, SAR images, and in particular polarimetric SAR (PolSAR) images, are powerful tools for various environmental applications. Unlike optical images, SAR images suffer from unavoidable speckle, which makes their segmentation difficult. In this paper, we use the wavelet transformation for segmentation of PolSAR images. Our proposed method is based on multi-resolution analysis of texture features using the wavelet transformation, combining gray-level and texture information. First, we produce coherency or covariance matrices and then generate a span image from them. In the next step, texture features are extracted from the sub-bands generated by the discrete wavelet transform (DWT). Finally, the PolSAR image is segmented using clustering methods such as fuzzy c-means (FCM) and k-means. We applied the proposed methodology to fully polarimetric SAR images acquired by the Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) L-band system in July 2012 over an agricultural area in Winnipeg, Canada.
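A hedged sketch of this segmentation pipeline (synthetic span image rather than UAVSAR data): a 2-D discrete wavelet transform supplies sub-band texture magnitudes that, together with the gray-level value, form a per-pixel feature vector segmented by k-means. The FCM variant would only swap the clustering step; the wavelet, cluster count, and image are illustrative.

import numpy as np
import pywt
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
span = rng.gamma(shape=2.0, scale=1.0, size=(128, 128))  # stand-in span image

# One-level 2-D DWT: approximation plus horizontal/vertical/diagonal detail.
cA, (cH, cV, cD) = pywt.dwt2(span, "haar")
# Per-pixel features: gray level plus upsampled sub-band magnitudes (texture).
upsample = lambda a: np.kron(a, np.ones((2, 2)))[:128, :128]
features = np.stack([span] + [np.abs(upsample(c)) for c in (cH, cV, cD)], axis=-1)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    features.reshape(-1, 4)).reshape(128, 128)
print(np.bincount(labels.ravel()))   # pixels per segment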
Harper, Bryan; Thomas, Dennis G.; Chikkagoudar, Satish; ...
2015-06-04
The integration of rapid assays, large data sets, informatics and modeling can overcome current barriers in understanding nanomaterial structure-toxicity relationships by providing a weight-of-the-evidence mechanism to generate hazard rankings for nanomaterials. Here we present the use of a rapid, low-cost assay to perform screening-level toxicity evaluations of nanomaterials in vivo. Calculated EZ Metric scores, a combined measure of morbidity and mortality, were established at realistic exposure levels and used to develop a predictive model of nanomaterial toxicity. Hazard ranking and clustering analysis of 68 diverse nanomaterials revealed distinct patterns of toxicity related to both core composition and outermost surface chemistry of nanomaterials. The resulting clusters guided the development of a predictive model of gold nanoparticle toxicity to embryonic zebrafish. In addition, our findings suggest that risk assessments based on the size and core composition of nanomaterials alone may be wholly inappropriate, especially when considering complex engineered nanomaterials. These findings reveal the need to expeditiously increase the availability of quantitative measures of nanomaterial hazard and broaden the sharing of that data and knowledge to support predictive modeling. In addition, research should continue to focus on methodologies for developing predictive models of nanomaterial hazard based on sub-lethal responses to low dose exposures.
Drought prediction till 2100 under RCP 8.5 climate change scenarios for Korea
NASA Astrophysics Data System (ADS)
Park, Chang-Kyun; Byun, Hi-Ryong; Deo, Ravinesh; Lee, Bo-Ra
2015-07-01
An important step in mitigating the negative impacts of drought is the development of effective methodologies for predicting future events. This study utilises the daily Effective Drought Index (EDI) to precisely and quantitatively predict future drought occurrences in Korea over the period 2014-2100. The EDI is computed from precipitation data generated by the regional climate model (HadGEM3-RA) under the Representative Concentration Pathway (RCP 8.5) scenario. Using these data for 678 grid points (12.5 km interval), groups of cluster regions with similar climates are constructed: the G1 (Northwest), G2 (Middle), G3 (Northeast) and G4 (Southern) regions. The drought forecasting period is categorised into the early phase (EP, 2014-2040), middle phase (MP, 2041-2070) and latter phase (LP, 2071-2100). Future drought events are quantified and ranked according to duration and intensity. Moreover, the occurrences of drought (when, where, how severe) within the clustered regions are represented as a spatial map over Korea. Based on the grid-point averages, the most severe future drought throughout the 87-year period is expected to occur in Namwon around 2039-2041, with peak intensity (minimum EDI) -3.54 and a projected duration of 580 days. The most severe drought by cluster analysis is expected to occur in the G3 region, with a mean intensity of -2.85 in 2027. Within the spatial area of investigation, a drought periodicity of 6.6 years and a slight decrease in peak intensity are noted. Finally, a spatial-temporal drought map is constructed for all clusters and time periods under consideration.
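For readers unfamiliar with the index, the sketch below computes a simplified EDI from daily precipitation, following the effective-precipitation form from the EDI literature (Byun and Wilhite): EP for day i is the sum over n = 1..window of (sum of the most recent n days of precipitation)/n, standardized to give the EDI. The window, standardization, and data here are illustrative, not the paper's operational configuration.

import numpy as np

def effective_precipitation(precip, window=365):
    """EP for each day that has a full window of history behind it."""
    weights = 1.0 / np.arange(1, window + 1)
    return np.array([
        np.sum(np.cumsum(precip[t - window + 1 : t + 1][::-1]) * weights)
        for t in range(window - 1, len(precip))
    ])

rng = np.random.default_rng(5)
daily_precip = rng.gamma(shape=0.4, scale=6.0, size=3 * 365)  # synthetic mm/day
ep = effective_precipitation(daily_precip)
edi = (ep - ep.mean()) / ep.std()   # simplified standardization (operationally,
                                    # a calendar-day climatology is used instead)
print(f"min EDI (peak drought intensity): {edi.min():.2f}")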
Drought Prediction till 2100 Under RCP 8.5 Climate Change Scenarios for Korea
NASA Astrophysics Data System (ADS)
Byun, H. R.; Park, C. K.; Deo, R. C.
2014-12-01
An important step in mitigating the negative impacts of drought is the development of effective methodologies for predicting future events. This study utilizes the daily Effective Drought Index (EDI) to precisely and quantitatively predict future drought occurrences in Korea over the period 2014-2100. The EDI is computed from precipitation data generated by the regional climate model (HadGEM3-RA) under the Representative Concentration Pathway (RCP 8.5) scenario. Using these data for 678 grid points (12.5 km interval), groups of cluster regions with similar climates are constructed: the G1 (Northwest), G2 (Middle), G3 (Northeast) and G4 (Southern) regions. The drought forecasting period is categorised into the early phase (EP, 2014-2040), middle phase (MP, 2041-2070) and latter phase (LP, 2071-2100). Future drought events are quantified and ranked according to duration and intensity. Moreover, the occurrences of drought (when, where, how severe) within the clustered regions are represented as a spatial map over Korea. Based on the grid-point averages, the most severe future drought throughout the 87-year period is expected to occur in Namwon around 2039-2041, with peak intensity (minimum EDI) -3.54 and a projected duration of 580 days. The most severe drought by cluster analysis is expected to occur in the G3 region, with a mean intensity of -2.85 in 2027. Within the spatial area of investigation, a drought periodicity of 6 years and a slight decrease in peak intensity are noted. Finally, a spatial-temporal drought map is constructed for all clusters and time periods under consideration.
Dark Energy Survey Year 1 Results: galaxy mock catalogues for BAO
NASA Astrophysics Data System (ADS)
Avila, S.; Crocce, M.; Ross, A. J.; García-Bellido, J.; Percival, W. J.; Banik, N.; Camacho, H.; Kokron, N.; Chan, K. C.; Andrade-Oliveira, F.; Gomes, R.; Gomes, D.; Lima, M.; Rosenfeld, R.; Salvador, A. I.; Friedrich, O.; Abdalla, F. B.; Annis, J.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Carrasco Kind, M.; Carretero, J.; Castander, F. J.; Cunha, C. E.; da Costa, L. N.; Davis, C.; De Vicente, J.; Doel, P.; Fosalba, P.; Frieman, J.; Gerdes, D. W.; Gruen, D.; Gruendl, R. A.; Gutierrez, G.; Hartley, W. G.; Hollowood, D.; Honscheid, K.; James, D. J.; Kuehn, K.; Kuropatkin, N.; Miquel, R.; Plazas, A. A.; Sanchez, E.; Scarpine, V.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Walker, A. R.; Dark Energy Survey Collaboration
2018-05-01
Mock catalogues are a crucial tool in the analysis of galaxy survey data, both for the accurate computation of covariance matrices and for the optimisation of analysis methodology and validation of data sets. In this paper, we present a set of 1800 galaxy mock catalogues designed to match the Dark Energy Survey Year-1 BAO sample (Crocce et al. 2017) in abundance, observational volume, redshift distribution and uncertainty, and redshift dependent clustering. The simulated samples were built upon HALOGEN (Avila et al. 2015) halo catalogues, based on a 2LPT density field with an empirical halo bias. For each of them, a lightcone is constructed by the superposition of snapshots in the redshift range 0.45 < z < 1.4. Uncertainties introduced by photometric redshift estimators were modelled with a double-skewed-Gaussian curve fitted to the data. We populate halos with galaxies by introducing a hybrid Halo Occupation Distribution - Halo Abundance Matching model with two free parameters. These are adjusted to achieve a galaxy bias evolution b(zph) that matches the data at the 1-σ level in the range 0.6 < zph < 1.0. We further analyse the galaxy mock catalogues and compare their clustering to the data using the angular correlation function w(θ), the comoving transverse separation clustering ξμ<0.8(s⊥) and the angular power spectrum Cℓ, finding them in agreement. This is the first large set of three-dimensional {ra, dec, z} galaxy mock catalogues able to simultaneously and accurately reproduce the photometric redshift uncertainties and the galaxy clustering.
Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.
2011-01-01
High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but even higher-throughput methods at the Illumina scale now allow the creation of much larger datasets, with more samples and orders-of-magnitude more sequences, that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions, in that β-diversity measures calculated by using these two types of matrices were significantly correlated with each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
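An illustrative check in the spirit of this comparison (synthetic abundance tables, not the study's data): compute Bray-Curtis β-diversity distances from a taxonomy-assignment matrix and from an OTU-cluster matrix, then correlate the two distance vectors. A full Mantel test would add a permutation step to the correlation below.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(9)
taxon_counts = rng.poisson(5.0, size=(30, 50))   # samples x taxa (supervised)
otu_counts = rng.poisson(5.0, size=(30, 80))     # samples x OTUs (unsupervised)

d_tax = pdist(taxon_counts, metric="braycurtis")
d_otu = pdist(otu_counts, metric="braycurtis")
r, p = pearsonr(d_tax, d_otu)   # correlation between the two beta-diversity matrices
print(f"r={r:.2f}, p={p:.3g}")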
Functional interrogation of non-coding DNA through CRISPR genome editing.
Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H
2017-05-15
Methodologies to interrogate non-coding regions have lagged behind those for coding regions, despite non-coding DNA comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation, including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data
Király, András; Abonyi, János
2014-01-01
During the last decade, various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and is freely available to researchers. PMID:24616651
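A minimal sketch of the bit-table idea (the authors' implementation is in MATLAB; this Python analogue uses a toy matrix): with a binary item-by-transaction matrix, the support of an itemset is the bitwise AND of its item rows, and the itemset is closed when no further item is shared by all supporting transactions. The (supporting transactions, closed itemset) pair corresponds to a bicluster.

import numpy as np

B = np.array([[1, 1, 0, 1, 1],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [1, 1, 0, 1, 1]], dtype=bool)   # rows: items, columns: transactions

def closure(items):
    # Transactions containing every item in the set (bitwise AND of rows).
    support = np.logical_and.reduce(B[items])
    # Items present in all of those transactions form the closed itemset.
    closed = np.flatnonzero(B[:, support].all(axis=1))
    return tuple(closed), np.flatnonzero(support)

items, transactions = closure([0, 3])
print("closed itemset:", items, "supporting transactions:", transactions)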
Fast Object Motion Estimation Based on Dynamic Stixels.
Morales, Néstor; Morell, Antonio; Toledo, Jonay; Acosta, Leopoldo
2016-07-28
The stixel world is a simplification of the world in which obstacles are represented as vertical instances, called stixels, standing on a surface assumed to be planar. In this paper, previous approaches for stixel tracking are extended using a two-level scheme. In the first level, stixels are tracked by matching them between frames using a bipartite graph in which edges represent a matching cost function. Then, stixels are clustered into sets representing objects in the environment. These objects are matched based on the number of stixels paired inside them. Furthermore, a faster, but less accurate approach is proposed in which only the second level is used. Several configurations of our method are compared to an existing state-of-the-art approach to show how our methodology outperforms it in several areas, including an improvement in the quality of the depth reconstruction.
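A sketch of the first-level matching under stated assumptions: stixels from consecutive frames are paired by minimizing a pairwise cost with the Hungarian algorithm (scipy's linear_sum_assignment). The cost here mixes column distance and depth difference with hypothetical weights; the paper's cost function is richer.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(6)
prev = rng.uniform(0, 100, size=(20, 2))                  # stixels at t-1: (column, depth)
curr = prev + rng.normal(scale=1.0, size=(20, 2))         # stixels at t, slightly moved

w_col, w_depth = 1.0, 0.5                                 # hypothetical weights
cost = (w_col * np.abs(prev[:, None, 0] - curr[None, :, 0])
        + w_depth * np.abs(prev[:, None, 1] - curr[None, :, 1]))
rows, cols = linear_sum_assignment(cost)                  # min-cost bipartite matching
print("matched pairs:", list(zip(rows, cols))[:5])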
Groenewold, Matthew R
2006-01-01
Local health departments are among the first agencies to respond to disasters or other mass emergencies. However, they often lack the ability to handle large-scale events. Plans including locally developed and deployed tools may enhance local response. Simplified cluster sampling methods can be useful in assessing community needs after a sudden-onset, short duration event. Using an adaptation of the methodology used by the World Health Organization Expanded Programme on Immunization (EPI), a Microsoft Access-based application for two-stage cluster sampling of residential addresses in Louisville/Jefferson County Metro, Kentucky was developed. The sampling frame was derived from geographically referenced data on residential addresses and political districts available through the Louisville/Jefferson County Information Consortium (LOJIC). The program randomly selected 30 clusters, defined as election precincts, from within the area of interest, and then, randomly selected 10 residential addresses from each cluster. The program, called the Rapid Assessment Tools Package (RATP), was tested in terms of accuracy and precision using data on a dichotomous characteristic of residential addresses available from the local tax assessor database. A series of 30 samples were produced and analyzed with respect to their precision and accuracy in estimating the prevalence of the study attribute. Point estimates with 95% confidence intervals were calculated by determining the proportion of the study attribute values in each of the samples and compared with the population proportion. To estimate the design effect, corresponding simple random samples of 300 addresses were taken after each of the 30 cluster samples. The sample proportion fell within +/-10 absolute percentage points of the true proportion in 80% of the samples. In 93.3% of the samples, the point estimate fell within +/-12.5%, and 96.7% fell within +/-15%. All of the point estimates fell within +/-20% of the true proportion. Estimates of the design effect ranged from 0.926 to 1.436 (mean = 1.157, median = 1.170) for the 30 samples. Although prospective evaluation of its performance in field trials or a real emergency is required to confirm its utility, this study suggests that the RATP, a locally designed and deployed tool, may provide population-based estimates of community needs or the extent of event-related consequences that are precise enough to serve as the basis for the initial post-event decisions regarding relief efforts.
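A hedged Python analogue of the 30 x 10 two-stage design described above (the RATP itself is a Microsoft Access application, and the sampling frame below is invented): stage 1 draws 30 clusters at random, stage 2 draws 10 residential addresses within each.

import numpy as np

rng = np.random.default_rng(7)
# Hypothetical frame: precinct id -> list of residential addresses.
frame = {p: [f"precinct{p}-addr{i}" for i in range(rng.integers(50, 400))]
         for p in range(600)}

clusters = rng.choice(list(frame), size=30, replace=False)   # stage 1: 30 precincts
sample = [addr
          for p in clusters
          for addr in rng.choice(frame[p], size=10, replace=False)]  # stage 2: 10 each
print(len(sample), sample[:3])   # 300 addresses, as in the RATP design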
Rutterford, Clare; Taljaard, Monica; Dixon, Stephanie; Copas, Andrew; Eldridge, Sandra
2015-06-01
To assess the quality of reporting and accuracy of a priori estimates used in sample size calculations for cluster randomized trials (CRTs). We reviewed 300 CRTs published between 2000 and 2008. The prevalence of reporting sample size elements from the 2004 CONSORT recommendations was evaluated and a priori estimates compared with those observed in the trial. Of the 300 trials, 166 (55%) reported a sample size calculation. Only 36 of 166 (22%) reported all recommended descriptive elements. Elements specific to CRTs were the worst reported: a measure of within-cluster correlation was specified in only 58 of 166 (35%). Only 18 of 166 articles (11%) reported both a priori and observed within-cluster correlation values. Except in two cases, observed within-cluster correlation values were either close to or less than a priori values. Even with the CONSORT extension for cluster randomization, the reporting of sample size elements specific to these trials remains below that necessary for transparent reporting. Journal editors and peer reviewers should implement stricter requirements for authors to follow CONSORT recommendations. Authors should report observed and a priori within-cluster correlation values to enable comparisons between these over a wider range of trials. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
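As a worked illustration of why a priori within-cluster correlation values matter (standard formulas, not taken from the article): the design effect 1 + (m - 1) x ICC inflates an individually randomized sample size, so even a small ICC has a large impact when clusters are big.

import math

def crt_sample_size(n_individual: int, cluster_size: int, icc: float) -> int:
    """Inflate an individually randomized sample size by the design effect."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_individual * deff)

# E.g., 300 participants under individual randomization, clusters of 20, and an
# a priori ICC of 0.05 give a design effect of 1.95 -> 585 participants (~30 clusters).
print(crt_sample_size(300, 20, 0.05))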
Justen, Gisele C; Espinoza-Quiñones, Fernando R; Módenes, Aparecido Nivaldo; Bergamasco, Rosangela
2012-01-01
In this work, the analysis of element concentrations in groundwater was performed using the synchrotron radiation total-reflection X-ray fluorescence (SR-TXRF) technique. A set of nine tube-wells with serious risk of contamination was chosen to monitor the mean concentration of elements in groundwater from the North Serra Geral aquifer in Santa Helena, Brazil, during one year. Element concentrations were determined by applying a SR-TXRF methodology. The accuracy of the SR-TXRF technique was validated by analysis of a certified reference material. As the groundwater composition in the North Serra Geral aquifer showed heterogeneity in the spatial distribution of eight major elements, hierarchical clustering of the data was performed. Owing to the similarity in their compositions, two of the nine wells were grouped in a first cluster, while the other seven were grouped in a second cluster. Calcium was the major element in all wells, with a higher Ca concentration in the second cluster than in the first. However, the concentrations of Ti, V, and Cr in the first cluster were slightly higher than those in the second cluster. The findings of this study, within a monitoring program of tube-wells, could provide a useful assessment of controls over groundwater composition and support management at the regional level.
Membrane Order Is a Key Regulator of Divalent Cation-Induced Clustering of PI(3,5)P2 and PI(4,5)P2.
Sarmento, Maria J; Coutinho, Ana; Fedorov, Aleksander; Prieto, Manuel; Fernandes, Fábio
2017-10-31
Although the evidence for the presence of functionally important nanosized phosphorylated phosphoinositide (PIP)-rich domains within cellular membranes has accumulated, very limited information is available regarding the structural determinants for compartmentalization of these phospholipids. Here, we used a combination of fluorescence spectroscopy and microscopy techniques to characterize differences in divalent cation-induced clustering of PI(4,5)P2 and PI(3,5)P2. Through these methodologies we were able to detect differences in divalent cation-induced clustering efficiency and cluster size. Ca2+-induced PI(4,5)P2 clusters are shown to be significantly larger than the ones observed for PI(3,5)P2. Clustering of PI(4,5)P2 is also detected at physiological concentrations of Mg2+, suggesting that in cellular membranes, these molecules are constitutively driven to clustering by the high intracellular concentration of divalent cations. Importantly, it is shown that lipid membrane order is a key factor in the regulation of clustering for both PIP isoforms, with a major impact on cluster sizes. Clustered PI(4,5)P2 and PI(3,5)P2 are observed to present considerably higher affinity for more ordered lipid phases than the monomeric species or than PI(4)P, possibly reflecting a more general tendency of clustered lipids for insertion into ordered domains. These results support a model for the description of the lateral organization of PIPs in cellular membranes, where both divalent cation interaction and membrane order are key modulators defining the lateral organization of these lipids.
Zhang, Zhaoyang; Fang, Hua; Wang, Honggang
2016-06-01
Web-delivered trials are an important component of eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional, and contain missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the specific Xie and Beni index for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering to process big incomplete longitudinal trial data in eHealth services.
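A sketch of the Xie-Beni validity index in its standard form (not the MIfuzzy/MIV implementation): the compactness of the fuzzy partition divided by the minimum separation between cluster centers, with lower values indicating a better partition. Under MIV, this index would be computed per imputed dataset and candidate cluster number, then synthesized across imputations; the data below are synthetic.

import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Standard Xie-Beni index: fuzzy compactness / (n * min center separation)."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=-1)   # (n, clusters)
    compactness = np.sum((U ** m) * d2)
    sep = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sep, np.inf)                                # ignore self-distances
    return compactness / (n * sep.min())

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 3))             # synthetic longitudinal features
V = rng.normal(size=(4, 3))               # 4 candidate cluster centers
U = rng.dirichlet(np.ones(4), size=100)   # fuzzy memberships, rows sum to 1
print(xie_beni(X, U, V))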
Zhang, Zhaoyang; Wang, Honggang
2016-01-01
Web-delivered trials are an important component of eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional, and contain missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the specific Xie and Beni index for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering to process big incomplete longitudinal trial data in eHealth services. PMID:27126063