Affective and cognitive factors influencing sensitivity to probabilistic information.
Tyszka, Tadeusz; Sawicki, Przemyslaw
2011-11-01
In Study 1, different groups of female students were randomly assigned to one of four probabilistic information formats. Five different levels of probability of a genetic disease in an unborn child were presented to participants (a within-subject factor). After the presentation of each probability level, participants were requested to indicate the acceptable level of pain they would tolerate to avoid the disease (in their unborn child), their subjective evaluation of the disease risk, and their subjective evaluation of being worried by this risk. The results of Study 1 confirmed the hypothesis that an experience-based probability format decreases the subjective sense of worry about the disease, thus, presumably, weakening the tendency to overrate the probability of rare events. Study 2 showed that for emotionally laden stimuli, the experience-based probability format resulted in higher sensitivity to probability variations than other formats of probabilistic information. These advantages of the experience-based probability format are interpreted in terms of two systems of information processing (the rational-deliberative versus the affective-experiential) and the principle of stimulus-response compatibility. © 2011 Society for Risk Analysis.
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means (KM) approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself, and the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform fully probabilistic microstate clustering and labeling to account for these sources of uncertainty, using the closest probabilistic analog to KM, Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain-computer interfaces, where both training and applying models of microstates need to account for uncertainty. Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.
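The soft-clustering step at the heart of this approach is easy to illustrate. Below is a minimal NumPy sketch of Fuzzy C-means applied to GFP-peak topographies, showing how soft memberships replace hard K-means assignments; the random data, cluster count, and fuzzifier value are illustrative assumptions (microstate-specific details such as polarity invariance are ignored), not the paper's settings.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal Fuzzy C-means: X is (n_samples, n_channels), c clusters.

    Returns cluster centers and the soft membership matrix U of shape
    (n_samples, c) -- the probabilistic analog of hard KM assignments.
    """
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random soft memberships
    for _ in range(n_iter):
        W = U ** m                                    # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))        # standard FCM update
        U_new /= U_new.sum(axis=1, keepdims=True)     # rows sum to 1
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Toy usage: 500 "GFP-peak topographies" over 64 channels, 4 microstates.
X = np.random.default_rng(1).normal(size=(500, 64))
centers, U = fuzzy_c_means(X, c=4)
print(U[0])  # soft microstate membership of the first peak; sums to 1
```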
Frahm, Jan-Michael; Pollefeys, Marc Andre Leon; Gallup, David Robert
2015-12-08
Methods of generating a three-dimensional representation of an object in a reference plane from a depth map including distances from a reference point to pixels in an image of the object taken from that reference point. Weights are assigned to respective voxels in a three-dimensional grid along rays extending from the reference point through the pixels in the image based on the distances in the depth map from the reference point to the respective pixels, and a height map including an array of height values in the reference plane is formed based on the assigned weights. An n-layer height map may be constructed by generating a probabilistic occupancy grid for the voxels and forming an n-dimensional height map comprising an array of layer height values in the reference plane based on the probabilistic occupancy grid.
NASA Astrophysics Data System (ADS)
Klügel, J.
2006-12-01
Deterministic scenario-based seismic hazard analysis has a long tradition in earthquake engineering for developing the design basis of critical infrastructures like dams, transport infrastructures, chemical plants and nuclear power plants. For many applications beyond the design of infrastructures, it is of interest to assess the efficiency of the design measures taken. These applications require a method that allows a meaningful quantitative risk analysis to be performed. A new method for probabilistic scenario-based seismic risk analysis has been developed, based on a probabilistic extension of proven deterministic methods like the MCE methodology. The input data required for the method are entirely based on the information necessary to perform any meaningful seismic hazard analysis. The method follows the probabilistic risk analysis approach common in nuclear technology, developed originally by Kaplan & Garrick (1981). It is based on (1) a classification of earthquake events into different size classes (by magnitude), (2) the evaluation of the frequency of occurrence of events assigned to the different classes (frequency of initiating events), (3) the development of bounding critical scenarios assigned to each class based on the solution of an optimization problem, and (4) the evaluation of the conditional probability of exceedance of critical design parameters (vulnerability analysis). The advantage of the method over traditional PSHA consists in (1) its flexibility, allowing different probabilistic models of earthquake occurrence to be used and advanced physical models to be incorporated into the analysis, (2) the mathematically consistent treatment of uncertainties, and (3) the explicit consideration of the lifetime of the critical structure as a criterion for formulating different risk goals. The method was applied to the evaluation of the risk of production interruption losses of a nuclear power plant during its residual lifetime.
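The four-step composition described above can be made concrete with a toy calculation. The sketch below combines hypothetical event-class frequencies with conditional exceedance probabilities into an annual exceedance frequency, then folds in the structure's residual lifetime under a Poisson occurrence assumption; all numbers are invented for illustration, not values from the study.

```python
import numpy as np

# Kaplan-Garrick-style composition: class frequencies (initiating events)
# times conditional exceedance probabilities (vulnerability analysis)
# give the annual frequency of exceeding a critical design parameter.
freq = {"M5-6": 1e-2, "M6-7": 1e-3, "M7+": 1e-4}       # events per year
p_exceed = {"M5-6": 0.01, "M6-7": 0.15, "M7+": 0.60}   # P(exceed | class)

annual = sum(freq[c] * p_exceed[c] for c in freq)

# Explicit lifetime consideration: probability of at least one exceedance
# over the residual lifetime, assuming Poissonian event occurrence.
lifetime_years = 40
p_lifetime = 1.0 - np.exp(-annual * lifetime_years)
print(annual, p_lifetime)
```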
Solving probability reasoning based on DNA strand displacement and probability modules.
Zhang, Qiang; Wang, Xiaobiao; Wang, Xiaojun; Zhou, Changjun
2017-12-01
In computational biology, DNA strand displacement technology is used to simulate the computation process and has shown strong computing ability. Most researchers use it to solve logic problems, but it is only rarely used in probabilistic reasoning. To support probabilistic reasoning, a conditional probability derivation model and a total probability model based on DNA strand displacement were established in this paper. The models were assessed through the game "read your mind." They have been shown to enable the application of probabilistic reasoning in genetic diagnosis. Copyright © 2017 Elsevier Ltd. All rights reserved.
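For readers unfamiliar with the two models named above, the following plain-Python sketch shows the two rules the molecular probability modules implement: the law of total probability and conditional (Bayes) inversion. The numbers are arbitrary toy values, not quantities from the paper.

```python
# Total probability and Bayes' rule, the two operations the DNA
# probability modules emulate. All numbers are made-up toy values.
p_B = {"b1": 0.3, "b2": 0.7}              # prior over a partition {B_i}
p_A_given_B = {"b1": 0.9, "b2": 0.2}      # conditional P(A | B_i)

# Total probability: P(A) = sum_i P(A | B_i) * P(B_i)
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)

# Conditional derivation (Bayes): P(B_i | A) = P(A | B_i) P(B_i) / P(A)
p_B_given_A = {b: p_A_given_B[b] * p_B[b] / p_A for b in p_B}
print(p_A, p_B_given_A)   # 0.41 {'b1': 0.659..., 'b2': 0.341...}
```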
Modular analysis of the probabilistic genetic interaction network.
Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting
2011-03-15
Epistatic Miniarray Profiles (EMAP) has enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated as an effective soft-threshold technique of EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical clustering and Markov clustering. The experimental results show that the Bayesian approach outperforms others in efficiently recovering biologically significant modules.
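The soft-threshold step can be sketched in a few lines: fit a two-component mixture to interaction scores and use the posterior probability of the "interaction" component as a probabilistic edge weight instead of a hard cutoff. The scikit-learn Gaussian mixture and the simulated score distribution below are illustrative assumptions, not the paper's exact fitting procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical EMAP-like scores: a "background" component near 0 and a
# smaller "interaction" component shifted away from 0.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.0, 1.0, 9000),
                         rng.normal(-3.0, 1.0, 1000)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(scores)
interaction = int(np.argmin(gm.means_))        # component with lower mean
p_edge = gm.predict_proba(scores)[:, interaction]

# p_edge acts as a soft threshold: a probabilistic edge weight for the
# network, rather than a binary edge from a hard cutoff on raw scores.
print(p_edge[:5].round(3))
```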
Rats bred for high alcohol drinking are more sensitive to delayed and probabilistic outcomes.
Wilhelm, C J; Mitchell, S H
2008-10-01
Alcoholics and heavy drinkers score higher on measures of impulsivity than nonalcoholics and light drinkers. This may be because of factors that predate drug exposure (e.g. genetics). This study examined the role of genetics by comparing impulsivity measures in ethanol-naive rats selectively bred based on their high [high alcohol drinking (HAD)] or low [low alcohol drinking (LAD)] consumption of ethanol. Replicates 1 and 2 of the HAD and LAD rats, developed by the University of Indiana Alcohol Research Center, completed two different discounting tasks. Delay discounting examines sensitivity to rewards that are delayed in time and is commonly used to assess 'choice' impulsivity. Probability discounting examines sensitivity to the uncertain delivery of rewards and has been used to assess risk taking and risk assessment. High alcohol drinking rats discounted delayed and probabilistic rewards more steeply than LAD rats. Discount rates associated with probabilistic and delayed rewards were weakly correlated, while bias was strongly correlated with discount rate in both delay and probability discounting. The results suggest that selective breeding for high alcohol consumption selects for animals that are more sensitive to delayed and probabilistic outcomes. Sensitivity to delayed or probabilistic outcomes may be predictive of future drinking in genetically predisposed individuals.
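As background for the two tasks, the sketch below fits the standard hyperbolic discounting function V = A/(1 + kD) to hypothetical indifference points, with delay replaced by odds-against for the probability-discounting case; the data points and starting values are made up for illustration, not the rat data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hyperbolic discounting, V = A / (1 + k*D); for probability discounting
# D is replaced by odds-against, theta = (1 - p) / p. Indifference points
# are expressed as fractions of the full reward (A = 1) and are invented.
def hyperbolic(x, k):
    return 1.0 / (1.0 + k * x)

delays = np.array([0, 2, 4, 8, 16, 32], dtype=float)          # seconds
indiff_delay = np.array([1.0, 0.8, 0.65, 0.5, 0.35, 0.2])

probs = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
odds_against = (1 - probs) / probs
indiff_prob = np.array([1.0, 0.85, 0.7, 0.5, 0.3])

k_delay, _ = curve_fit(hyperbolic, delays, indiff_delay, p0=[0.1])
h_prob, _ = curve_fit(hyperbolic, odds_against, indiff_prob, p0=[0.1])
print(k_delay[0], h_prob[0])  # larger k (or h) = steeper discounting
```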
Precise Network Modeling of Systems Genetics Data Using the Bayesian Network Webserver.
Ziebarth, Jesse D; Cui, Yan
2017-01-01
The Bayesian Network Webserver (BNW, http://compbio.uthsc.edu/BNW) is an integrated platform for Bayesian network modeling of biological datasets. It provides a web-based network modeling environment that seamlessly integrates advanced algorithms for probabilistic causal modeling and reasoning with Bayesian networks. BNW is designed for precise modeling of relatively small networks that contain fewer than 20 nodes. The structure learning algorithms used by BNW guarantee the discovery of the best (most probable) network structure given the data. To facilitate network modeling across multiple biological levels, BNW provides a very flexible interface that allows users to assign network nodes into different tiers and define the relationships between and within the tiers. This function is particularly useful for modeling systems genetics datasets that often consist of multiscalar heterogeneous genotype-to-phenotype data. BNW enables users to, within seconds or minutes, go from having a simply formatted input file containing a dataset to using a network model to make predictions about the interactions between variables and the potential effects of experimental interventions. In this chapter, we will introduce the functions of BNW and show how to model systems genetics datasets with BNW.
Weickert, Thomas W.; Goldberg, Terry E.; Egan, Michael F.; Apud, Jose A.; Meeter, Martijn; Myers, Catherine E.; Gluck, Mark A; Weinberger, Daniel R.
2010-01-01
Background While patients with schizophrenia display an overall probabilistic category learning performance deficit, the extent to which this deficit occurs in unaffected siblings of patients with schizophrenia is unknown. There are also discrepant findings regarding probabilistic category learning acquisition rate and performance in patients with schizophrenia. Methods A probabilistic category learning test was administered to 108 patients with schizophrenia, 82 unaffected siblings, and 121 healthy participants. Results Patients with schizophrenia displayed significant differences from their unaffected siblings and healthy participants with respect to probabilistic category learning acquisition rates. Although siblings on the whole failed to differ from healthy participants on strategy and quantitative indices of overall performance and learning acquisition, application of a revised learning criterion enabling classification into good and poor learners based on individual learning curves revealed significant differences between percentages of sibling and healthy poor learners: healthy (13.2%), siblings (34.1%), patients (48.1%), yielding a moderate relative risk. Conclusions These results clarify previous discrepant findings pertaining to probabilistic category learning acquisition rate in schizophrenia and provide the first evidence for the relative risk of probabilistic category learning abnormalities in unaffected siblings of patients with schizophrenia, supporting genetic underpinnings of probabilistic category learning deficits in schizophrenia. These findings also raise questions regarding the contribution of antipsychotic medication to the probabilistic category learning deficit in schizophrenia. The distinction between good and poor learning may be used to inform genetic studies designed to detect schizophrenia risk alleles. PMID:20172502
Optimization of Contrast Detection Power with Probabilistic Behavioral Information
Cordes, Dietmar; Herzmann, Grit; Nandy, Rajesh; Curran, Tim
2012-01-01
Recent progress in the experimental design for event-related fMRI experiments made it possible to find the optimal stimulus sequence for maximum contrast detection power using a genetic algorithm. In this study, a novel algorithm is proposed for optimization of contrast detection power by including probabilistic behavioral information, based on pilot data, in the genetic algorithm. As a particular application, a recognition memory task is studied and the design matrix optimized for contrasts involving the familiarity of individual items (pictures of objects) and the recollection of qualitative information associated with the items (left/right orientation). Optimization of contrast efficiency is a complicated issue whenever subjects’ responses are not deterministic but probabilistic. Contrast efficiencies are not predictable unless behavioral responses are included in the design optimization. However, available software for design optimization does not include options for probabilistic behavioral constraints. If the anticipated behavioral responses are included in the optimization algorithm, the design is optimal for the assumed behavioral responses, and the resulting contrast efficiency is greater than what either a block design or a random design can achieve. Furthermore, improvements of contrast detection power depend strongly on the behavioral probabilities, the perceived randomness, and the contrast of interest. The present genetic algorithm can be applied to any case in which fMRI contrasts are dependent on probabilistic responses that can be estimated from pilot data. PMID:22326984
Secrets in the eyes of Black Oystercatchers: A new sexing technique
Guzzetti, B.M.; Talbot, S.L.; Tessler, D.F.; Gill, V.A.; Murphy, E.C.
2008-01-01
Sexing oystercatchers in the field is difficult because males and females have identical plumage and are similar in size. Although Black Oystercatchers (Haematopus bachmani) are sexually dimorphic, using morphology to determine sex requires either capturing both pair members for comparison or using discriminant analyses to assign sex probabilistically based on morphometric traits. All adult Black Oystercatchers have bright yellow eyes, but some of them have dark specks, or eye flecks, in their irides. We hypothesized that this easily observable trait was sex-linked and could be used as a novel diagnostic tool for identifying sex. To test this, we compared data for oystercatchers from genetic molecular markers (CHD-W/CHD-Z and HINT-W/HINT-Z), morphometric analyses, and eye-fleck category (full eye flecks, slight eye flecks, and no eye flecks). Compared to molecular markers, we found that discriminant analyses based on morphological characteristics yielded variable results that were confounded by geographical differences in morphology. However, we found that eye flecks were sex-linked. Using an eye-fleck model where all females have full eye flecks and males have either slight eye flecks or no eye flecks, we correctly assigned the sex of 117 of 125 (94%) oystercatchers. Using discriminant analysis based on morphological characteristics, we correctly assigned the sex of 105 of 119 (88%) birds. Using the eye-fleck technique for sexing Black Oystercatchers may be preferable for some investigators because it is as accurate as discriminant analysis based on morphology and does not require capturing the birds. © 2008 Association of Field Ornithologists.
PREDICT: Privacy and Security Enhancing Dynamic Information Monitoring
2015-08-03
…consisting of global server-side probabilistic assignment by an untrusted server using cloaked locations, followed by feedback-loop guided local… [12]. These methods achieve high sensing coverage with low cost using cloaked locations [3]. In follow-on work, the issue of mobility is addressed.
Burgos-Paz, William; Cerón-Muñoz, Mario; Solarte-Portilla, Carlos
2011-10-01
The aim was to establish the genetic diversity and population structure of three guinea pig lines, from seven production zones located in Nariño, southwest Colombia. A total of 384 individuals were genotyped with six microsatellite markers. The measurement of intrapopulation diversity revealed allelic richness ranging from 3.0 to 6.56, and observed heterozygosity (Ho) from 0.33 to 0.60, with a deficit in heterozygous individuals. Although statistically significant (p < 0.05), genetic differentiation between population pairs was found to be low. Genetic distance, as well as clustering of guinea-pig lines and populations, coincided with the historical and geographical distribution of the populations. Likewise, high genetic identity between improved and native lines was established. An analysis of group probabilistic assignment revealed that each line should not be considered as a genetically homogeneous group. The findings corroborate the absorption of native genetic material into the improved line introduced into Colombia from Peru. It is necessary to establish conservation programs for native-line individuals in Nariño, and control genealogical and production records in order to reduce the inbreeding values in the populations.
An ontology-based nurse call management system (oNCS) with probabilistic priority assessment
2011-01-01
Background The current, place-oriented nurse call systems are very static. A patient can only make calls with a button which is fixed to a wall of a room. Moreover, the system does not take into account various factors specific to a situation. In the future, there will be an evolution to a mobile button for each patient so that they can walk around freely and still make calls. The system would become person-oriented and the available context information should be taken into account to assign the correct nurse to a call. The aim of this research is (1) the design of a software platform that supports the transition to mobile and wireless nurse call buttons in hospitals and residential care and (2) the design of a sophisticated nurse call algorithm. This algorithm dynamically adapts to the situation at hand by taking the profile information of staff members and patients into account. Additionally, the priority of a call probabilistically depends on the risk factors, assigned to a patient. Methods The ontology-based Nurse Call System (oNCS) was developed as an extension of a Context-Aware Service Platform. An ontology is used to manage the profile information. Rules implement the novel nurse call algorithm that takes all this information into account. Probabilistic reasoning algorithms are designed to determine the priority of a call based on the risk factors of the patient. Results The oNCS system is evaluated through a prototype implementation and simulations, based on a detailed dataset obtained from Ghent University Hospital. The arrival times of nurses at the location of a call, the workload distribution of calls amongst nurses and the assignment of priorities to calls are compared for the oNCS system and the current, place-oriented nurse call system. Additionally, the performance of the system is discussed. Conclusions The execution time of the nurse call algorithm is on average 50.333 ms. Moreover, the oNCS system significantly improves the assignment of nurses to calls. Calls generally have a nurse present faster and the workload-distribution amongst the nurses improves. PMID:21294860
White, Shannon L.; Miller, William L.; Dowell, Stephanie A.; Bartron, Meredith L.; Wagner, Tyler
2018-01-01
Due to increased anthropogenic pressures on many fish populations, supplementing wild populations with captive‐raised individuals has become an increasingly common management practice. Stocking programs can be controversial due to uncertainty about the long‐term fitness effects of genetic introgression on wild populations. In particular, introgression between hatchery and wild individuals can cause declines in wild population fitness, resiliency, and adaptive potential, and contribute to local population extirpation. However, low survival and fitness of captive‐raised individuals can minimize the long‐term genetic consequences of stocking in wild populations, and to date the prevalence of introgression in actively stocked ecosystems has not been rigorously evaluated. We quantified the extent of introgression in 30 populations of wild brook trout (Salvelinus fontinalis) in a Pennsylvania watershed, and examined the correlation between introgression and 11 environmental covariates. Genetic assignment tests were used to determine the origin (wild vs. captive‐raised) for 1742 wild‐caught and 300 hatchery brook trout. To avoid assignment biases, individuals were assigned to two simulated populations that represented the average allele frequencies in wild and hatchery groups. Fish with intermediate probabilities of wild ancestry were classified as introgressed, with threshold values determined through simulation. Even with reoccurring stocking at most sites, over 93% of wild‐caught individuals probabilistically assigned to wild origin, and only 5.6% of wild‐caught fish assigned to introgressed. Models examining environmental drivers of introgression explained less than 3% of the among‐population variability, and all estimated effects were highly uncertain. This was not surprising given overall low introgression observed in this study. Our results suggest that introgression of hatchery‐derived genotypes can occur at low rates, even in actively stocked ecosystems and across a range of habitats. However, a cautious approach to stocking may still be warranted, as the potential effects of stocking on wild population fitness and the mechanisms limiting introgression are not known.
Kherfi, Mohammed Lamine; Ziou, Djemel
2006-04-01
In content-based image retrieval, understanding the user's needs is a challenging task that requires integrating the user into the retrieval process. Relevance feedback (RF) has proven to be an effective tool for taking the user's judgement into account. In this paper, we present a new RF framework based on a feature selection algorithm that nicely combines the advantages of a probabilistic formulation with those of using both the positive example (PE) and the negative example (NE). Through interaction with the user, our algorithm learns the importance the user assigns to image features, and then applies the results obtained to define similarity measures that correspond better to the user's judgement. The use of the NE allows images undesired by the user to be discarded, thereby improving retrieval accuracy. As for the probabilistic formulation of the problem, it presents a multitude of advantages and opens the door to more modeling possibilities that achieve a good feature selection. It makes it possible to cluster the query data into classes, choose the probability law that best models each class, model missing data, and support queries with multiple PE and/or NE classes. The basic principle of our algorithm is to assign more importance to features with a high likelihood and those which distinguish well between PE classes and NE classes. The proposed algorithm was validated separately and in an image retrieval context, and the experiments show that it performs a good feature selection and contributes to improving retrieval effectiveness.
Wang, Shi-Heng; Chen, Wei J; Tsai, Yu-Chin; Huang, Yung-Hsiang; Hwu, Hai-Gwo; Hsiao, Chuhsing K
2013-01-01
The copy number variation (CNV) is a type of genetic variation in the genome. It is measured based on signal intensity measures and can be assessed repeatedly to reduce the uncertainty in PCR-based typing. Studies have shown that CNVs may lead to phenotypic variation and modification of disease expression. Various challenges exist, however, in the exploration of CNV-disease association. Here we construct latent variables to infer the discrete CNV values and to estimate the probability of mutations. In addition, we propose to pool rare variants to increase the statistical power and we conduct family studies to mitigate the computational burden in determining the composition of CNVs on each chromosome. To explore in a stochastic sense the association between the collapsing CNV variants and disease status, we utilize a Bayesian hierarchical model incorporating the mutation parameters. This model assigns integers in a probabilistic sense to the quantitatively measured copy numbers, and is able to test simultaneously the association for all variants of interest in a regression framework. This integrative model can account for the uncertainty in copy number assignment and differentiate if the variation was de novo or inherited on the basis of posterior probabilities. For family studies, this model can accommodate the dependence within family members and among repeated CNV data. Moreover, the Mendelian rule can be assumed under this model and yet the genetic variation, including de novo and inherited variation, can still be included and quantified directly for each individual. Finally, simulation studies show that this model has high true positive and low false positive rates in the detection of de novo mutation.
Sinnott, Jennifer A; Cai, Fiona; Yu, Sheng; Hejblum, Boris P; Hong, Chuan; Kohane, Isaac S; Liao, Katherine P
2018-05-17
Standard approaches for large scale phenotypic screens using electronic health record (EHR) data apply thresholds, such as ≥2 diagnosis codes, to define subjects as having a phenotype. However, the variation in the accuracy of diagnosis codes can impair the power of such screens. Our objective was to develop and evaluate an approach which converts diagnosis codes into a probability of a phenotype (PheProb). We hypothesized that this alternate approach for defining phenotypes would improve power for genetic association studies. The PheProb approach employs unsupervised clustering to separate patients into 2 groups based on diagnosis codes. Subjects are assigned a probability of having the phenotype based on the number of diagnosis codes. This approach was developed using simulated EHR data and tested in a real world EHR cohort. In the latter, we tested the association between low density lipoprotein cholesterol (LDL-C) genetic risk alleles known for association with hyperlipidemia and hyperlipidemia codes (ICD-9 272.x). PheProb and thresholding approaches were compared. Among n = 1462 subjects in the real world EHR cohort, the threshold-based p-values for association between the genetic risk score (GRS) and hyperlipidemia were 0.126 (≥1 code), 0.123 (≥2 codes), and 0.142 (≥3 codes). The PheProb approach produced the expected significant association between the GRS and hyperlipidemia: p = .001. PheProb improves statistical power for association studies relative to standard thresholding approaches by leveraging information about the phenotype in the billing code counts. The PheProb approach has direct applications where efficient approaches are required, such as in Phenome-Wide Association Studies.
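A minimal sketch of the PheProb idea follows, under simplifying assumptions: fit a two-component mixture to billing-code counts, take the posterior probability of the high-count component as the phenotype probability, and use that probability as the outcome in an association test. The Gaussian mixture (the paper's clustering model may differ), the simulated counts, and the risk-score construction are all illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy import stats

# PheProb-style sketch: cluster billing-code counts into two groups and
# convert each count into a phenotype probability. Data are simulated.
rng = np.random.default_rng(0)
n = 1500
has_pheno = rng.random(n) < 0.3
counts = np.where(has_pheno, rng.poisson(6, n), rng.poisson(1, n))

X = counts.reshape(-1, 1).astype(float)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
case_comp = int(np.argmax(gm.means_))          # component with more codes
p_pheno = gm.predict_proba(X)[:, case_comp]    # P(phenotype | code count)

# Association with a hypothetical genetic risk score: p_pheno serves as
# a weighted outcome rather than a hard ">= k codes" threshold.
grs = rng.normal(size=n) + 0.3 * has_pheno
print(stats.linregress(grs, p_pheno).pvalue)
```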
A probabilistic NF2 relational algebra for integrated information retrieval and database systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fuhr, N.; Roelleke, T.
The integration of information retrieval (IR) and database systems requires a data model which allows for modelling documents as entities, representing uncertainty and vagueness and performing uncertain inference. For this purpose, we present a probabilistic data model based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Thus, the set of weighted index terms of a document are represented as a probabilistic subrelation. In a similar way, imprecise attribute values are modelled as a set-valued attribute. We redefine the relational operators for this type of relations such that the result of each operator is again a probabilistic NF2 relation, where the weight of a tuple gives the probability that this tuple belongs to the result. By ordering the tuples according to decreasing probabilities, the model yields a ranking of answers like in most IR models. This effect also can be used for typical database queries involving imprecise attribute values as well as for combinations of database and IR queries.
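A toy rendering of the weighted-tuple idea may help: a relation maps tuples to membership probabilities, operators combine the weights (here multiplicatively, assuming tuple independence), and results are ranked by decreasing probability. The relations, attributes, and weights below are invented for illustration and simplify the full NF2 algebra.

```python
# Minimal sketch of probabilistic relational operators: a relation is a
# dict mapping tuples to membership probabilities.
def p_select(rel, pred):
    """Selection keeps qualifying tuples with their weights."""
    return {t: w for t, w in rel.items() if pred(t)}

def p_join(r, s):
    """Natural join on the first attribute; weights multiply
    (independence assumed between tuples of the two relations)."""
    return {(a, b, c): w1 * w2
            for (a, b), w1 in r.items()
            for (a2, c), w2 in s.items() if a == a2}

def ranked(rel):
    """Order tuples by decreasing probability, as in IR-style ranking."""
    return sorted(rel.items(), key=lambda kv: -kv[1])

docs = {("d1", "ir"): 0.9, ("d1", "db"): 0.5, ("d2", "ir"): 0.4}
authors = {("d1", "fuhr"): 0.8, ("d2", "fuhr"): 0.7}
print(ranked(p_select(docs, lambda t: t[1] == "ir")))
print(ranked(p_join(docs, authors)))  # answers ranked by probability
```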
Integer Linear Programming for Constrained Multi-Aspect Committee Review Assignment
Karimzadehgan, Maryam; Zhai, ChengXiang
2011-01-01
Automatic review assignment can significantly improve the productivity of many people such as conference organizers, journal editors and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed under the constraint of review quota so that the reviewers assigned to a document can collectively cover multiple topic aspects of the document. No previous work has addressed such a setup of committee review assignments while also considering matching multiple aspects of topics and expertise. In this paper, we tackle the problem of committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects to automate committee review assignments. Evaluation using a multi-aspect review assignment test set constructed using ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignments based on multi-aspect expertise matching. PMID:22711970
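As a sketch of the ILP formulation, the PuLP model below maximizes a single expertise-match score under a review quota per reviewer and a fixed number of reviewers per paper; the paper's full model additionally encodes coverage of multiple topic aspects. Names, scores, and quotas are hypothetical.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Simplified committee review assignment as an integer linear program.
reviewers, papers = ["r1", "r2", "r3"], ["p1", "p2"]
score = {("r1", "p1"): .9, ("r1", "p2"): .2, ("r2", "p1"): .4,
         ("r2", "p2"): .8, ("r3", "p1"): .5, ("r3", "p2"): .6}
quota, per_paper = 2, 2     # each reviewer <= 2 papers, 2 reviewers/paper

prob = LpProblem("review_assignment", LpMaximize)
x = LpVariable.dicts("x", score.keys(), cat="Binary")
prob += lpSum(score[k] * x[k] for k in score)            # total match
for r in reviewers:
    prob += lpSum(x[(r, p)] for p in papers) <= quota    # review quota
for p in papers:
    prob += lpSum(x[(r, p)] for r in reviewers) == per_paper
prob.solve()
print([k for k in score if x[k].value() == 1])           # chosen pairs
```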
Spectrum-to-Spectrum Searching Using a Proteome-wide Spectral Library
Yen, Chia-Yu; Houel, Stephane; Ahn, Natalie G.; Old, William M.
2011-01-01
The unambiguous assignment of tandem mass spectra (MS/MS) to peptide sequences remains a key unsolved problem in proteomics. Spectral library search strategies have emerged as a promising alternative for peptide identification, in which MS/MS spectra are directly compared against a reference library of confidently assigned spectra. Two problems relate to library size. First, reference spectral libraries are limited to rediscovery of previously identified peptides and are not applicable to new peptides, because of their incomplete coverage of the human proteome. Second, problems arise when searching a spectral library the size of the entire human proteome. We observed that traditional dot product scoring methods do not scale well with spectral library size, showing reduction in sensitivity when library size is increased. We show that this problem can be addressed by optimizing scoring metrics for spectrum-to-spectrum searches with large spectral libraries. MS/MS spectra for the 1.3 million predicted tryptic peptides in the human proteome are simulated using a kinetic fragmentation model (MassAnalyzer version 2.1) to create a proteome-wide simulated spectral library. Searches of the simulated library increase MS/MS assignments by 24% compared with Mascot, when using probabilistic and rank based scoring methods. The proteome-wide coverage of the simulated library leads to 11% increase in unique peptide assignments, compared with parallel searches of a reference spectral library. Further improvement is attained when reference spectra and simulated spectra are combined into a hybrid spectral library, yielding 52% increased MS/MS assignments compared with Mascot searches. Our study demonstrates the advantages of using probabilistic and rank based scores to improve performance of spectrum-to-spectrum search strategies. PMID:21532008
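For reference, the baseline similarity measure discussed above, a normalized dot product between spectra binned to a common m/z grid, can be sketched as follows. The peak lists and bin width are toy values, and the paper's optimized scores add probabilistic and rank-based terms on top of this baseline.

```python
import numpy as np

# Normalized dot product between a query MS/MS spectrum and a library
# spectrum after binning to a common m/z grid (toy intensities).
def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    vec = np.zeros(int(max_mz / bin_width))
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return vec

def dot_score(query, library):
    q, l = bin_spectrum(query), bin_spectrum(library)
    return float(q @ l / (np.linalg.norm(q) * np.linalg.norm(l) + 1e-12))

query = [(175.1, 50.0), (274.2, 100.0), (401.3, 30.0)]
library = [(175.1, 45.0), (274.2, 90.0), (402.3, 10.0)]
print(round(dot_score(query, library), 3))  # 1.0 = identical spectra
```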
NASA Astrophysics Data System (ADS)
Lowe, R.; Ballester, J.; Robine, J.; Herrmann, F. R.; Jupp, T. E.; Stephenson, D.; Rodó, X.
2013-12-01
Users of climate information often require probabilistic information on which to base their decisions. However, communicating information contained within a probabilistic forecast presents a challenge. In this paper we demonstrate a novel visualisation technique to display ternary probabilistic forecasts on a map in order to inform decision making. In this method, ternary probabilistic forecasts, which assign probabilities to a set of three outcomes (e.g. low, medium, and high risk), are considered as a point in a triangle of barycentric coordinates. This allows a unique colour to be assigned to each forecast from a continuum of colours defined on the triangle. Colour saturation increases with information gain relative to the reference forecast (i.e. the long term average). This provides additional information to decision makers compared with conventional methods used in seasonal climate forecasting, where one colour is used to represent one forecast category on a forecast map (e.g. red = 'dry'). We use the tool to present climate-related mortality projections across Europe. Temperature and humidity are related to human mortality via location-specific transfer functions, calculated using historical data. Daily mortality data at the NUTS2 level for 16 countries in Europe were obtained for 1998-2005. Transfer functions were calculated for 54 aggregations in Europe, defined using criteria related to population and climatological similarities. Aggregations are restricted to fall within political boundaries to avoid problems related to varying adaptation policies between countries. A statistical model is fit to cold and warm tails to estimate future mortality using forecast temperatures, in a Bayesian probabilistic framework. Using predefined categories of temperature-related mortality risk, we present maps of probabilistic projections for human mortality at seasonal to decadal time scales. We demonstrate the information gained from using this technique compared to more traditional methods to display ternary probabilistic forecasts. This technique allows decision makers to identify areas where the model predicts with certainty area-specific heat waves or cold snaps, in order to effectively target resources to those areas most at risk, for a given season or year. It is hoped that this visualisation tool will facilitate the interpretation of the probabilistic forecasts not only for public health decision makers but also within a multi-sectoral climate service framework.
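The colour mapping described above can be sketched directly: the probability triple acts as barycentric coordinates between three corner colours, and saturation grows with information gain relative to the uniform climatological reference (here measured as KL divergence). The corner colours and the saturation scaling are illustrative choices, not the paper's exact palette.

```python
import numpy as np

# Map a ternary forecast (p_low, p_med, p_high) to an RGB colour.
CORNERS = np.array([[0.0, 0.0, 1.0],    # low  -> blue
                    [1.0, 1.0, 0.0],    # med  -> yellow
                    [1.0, 0.0, 0.0]])   # high -> red

def ternary_colour(p, ref=np.full(3, 1/3)):
    p = np.asarray(p, dtype=float)
    hue = p @ CORNERS                                       # barycentric mix
    gain = np.sum(p * np.log(np.maximum(p, 1e-12) / ref))   # KL(p || ref)
    sat = min(gain / np.log(3), 1.0)    # 0 at climatology, 1 at certainty
    return (1 - sat) * np.ones(3) + sat * hue               # fade from white

print(ternary_colour([1/3, 1/3, 1/3]))   # white: no information gain
print(ternary_colour([0.05, 0.15, 0.8])) # partially saturated, toward red
```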
Bhaskar, Anand; Javanmard, Adel; Courtade, Thomas A; Tse, David
2017-03-15
Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has been traditionally analyzed using either model-free algorithms, such as principal components analysis (PCA) and multidimensional scaling, or using explicit spatial probabilistic models of allele frequency evolution. We develop a general probabilistic model and an associated inference algorithm that unify the model-based and data-driven approaches to visualizing and inferring population structure. Our spatial inference algorithm can also be effectively applied to the problem of population stratification in genome-wide association studies (GWAS), where hidden population structure can create fictitious associations when population ancestry is correlated with both the genotype and the trait. Our algorithm Geographic Ancestry Positioning (GAP) relates local genetic distances between samples to their spatial distances, and can be used for visually discerning population structure as well as accurately inferring the spatial origin of individuals on a two-dimensional continuum. On both simulated and several real datasets from diverse human populations, GAP exhibits substantially lower error in reconstructing spatial ancestry coordinates compared to PCA. We also develop an association test that uses the ancestry coordinates inferred by GAP to accurately account for ancestry-induced correlations in GWAS. Based on simulations and analysis of a dataset of 10 metabolic traits measured in a Northern Finland cohort, which is known to exhibit significant population structure, we find that our method has superior power to current approaches. Our software is available at https://github.com/anand-bhaskar/gap . abhaskar@stanford.edu or ajavanma@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Li, Zhixi; Peck, Kyung K.; Brennan, Nicole P.; Jenabi, Mehrnaz; Hsu, Meier; Zhang, Zhigang; Holodny, Andrei I.; Young, Robert J.
2014-01-01
Purpose The purpose of this study was to compare the deterministic and probabilistic tracking methods of diffusion tensor white matter fiber tractography in patients with brain tumors. Materials and Methods We identified 29 patients with left brain tumors <2 cm from the arcuate fasciculus who underwent pre-operative language fMRI and DTI. The arcuate fasciculus was reconstructed using a deterministic Fiber Assignment by Continuous Tracking (FACT) algorithm and a probabilistic method based on an extended Monte Carlo Random Walk algorithm. Tracking was controlled using two ROIs corresponding to Broca’s and Wernicke’s areas. Tracts in tumor-affected hemispheres were examined for extension between Broca’s and Wernicke’s areas, anterior-posterior length and volume, and compared with the normal contralateral tracts. Results Probabilistic tracts displayed more complete anterior extension to Broca’s area than did FACT tracts on the tumor-affected and normal sides (p < 0.0001). The median length ratio for tumor: normal sides was greater for probabilistic tracts than FACT tracts (p < 0.0001). The median tract volume ratio for tumor: normal sides was also greater for probabilistic tracts than FACT tracts (p = 0.01). Conclusion Probabilistic tractography reconstructs the arcuate fasciculus more completely and performs better through areas of tumor and/or edema. The FACT algorithm tends to underestimate the anterior-most fibers of the arcuate fasciculus, which are crossed by primary motor fibers. PMID:25328583
Probabilistic reasoning in data analysis.
Sirovich, Lawrence
2011-09-20
This Teaching Resource provides lecture notes, slides, and a student assignment for a lecture on probabilistic reasoning in the analysis of biological data. General probabilistic frameworks are introduced, and a number of standard probability distributions are described using simple intuitive ideas. Particular attention is focused on random arrivals that are independent of prior history (Markovian events), with an emphasis on waiting times, Poisson processes, and Poisson probability distributions. The use of these various probability distributions is applied to biomedical problems, including several classic experimental studies.
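The waiting-time/Poisson connection at the heart of the lecture can be checked numerically: simulate memoryless (exponential) inter-arrival times and verify that event counts in a fixed window behave like a Poisson distribution, whose mean and variance both equal rate × time. The rate and window below are arbitrary illustrative values.

```python
import numpy as np

# Markovian arrivals: exponential waiting times imply Poisson counts.
rng = np.random.default_rng(0)
rate, T, n_runs = 2.0, 10.0, 20000

counts = []
for _ in range(n_runs):
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1 / rate)   # memoryless waiting time
        if t > T:
            break
        n += 1
    counts.append(n)

print(np.mean(counts), rate * T)   # ~20: Poisson mean = rate * T
print(np.var(counts), rate * T)    # variance equals the mean
```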
Adaptive role switching promotes fairness in networked ultimatum game.
Wu, Te; Fu, Feng; Zhang, Yanling; Wang, Long
2013-01-01
In recent years, mechanisms favoring fair split in the ultimatum game have attracted growing interests because of its practical implications for international bargains. In this game, two players are randomly assigned two different roles respectively to split an offer: the proposer suggests how to split and the responder decides whether or not to accept it. Only when both agree is the offer successfully split; otherwise both get nothing. It is of importance and interest to break the symmetry in role assignment especially when the game is repeatedly played in a heterogeneous population. Here we consider an adaptive role assignment: whenever the split fails, the two players switch their roles probabilistically. The results show that this simple feedback mechanism proves much more effective at promoting fairness than other alternatives (where, for example, the role assignment is based on the number of neighbors).
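A two-player sketch of the feedback rule, under strong simplifying assumptions (fixed strategies, no network, no strategy updating), isolates the mechanism: whenever a split fails, the roles are swapped with probability q. All parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Each player has a fixed proposer offer and responder acceptance threshold.
players = [{"offer": 0.3, "accept": 0.4},   # unfair proposer
           {"offer": 0.5, "accept": 0.4}]   # fair proposer
proposer, q, payoff = 0, 0.8, [0.0, 0.0]

for _ in range(10_000):
    responder = 1 - proposer
    offer = players[proposer]["offer"]
    if offer >= players[responder]["accept"]:
        payoff[proposer] += 1 - offer       # successful split
        payoff[responder] += offer
    elif rng.random() < q:                  # failed split: swap roles
        proposer = responder

print(payoff)  # the fair proposer soon holds the role and splits succeed
```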
Business Planning in the Light of Neuro-fuzzy and Predictive Forecasting
NASA Astrophysics Data System (ADS)
Chakrabarti, Prasun; Basu, Jayanta Kumar; Kim, Tai-Hoon
In this paper we have pointed out gain sensing based on forecasting techniques. We have cited an idea of neural-network-based gain forecasting. Testing of the sequence of gain patterns is also verified using statistical analysis of fuzzy value assignment. The paper also suggests realization of a stable gain condition using K-means clustering from data mining. A new concept of 3D-based gain sensing has been pointed out. The paper also reveals what type of trend analysis can be observed for probabilistic gain prediction.
Ciffroy, Philippe; Charlatchka, Rayna; Ferreira, Daniel; Marang, Laura
2013-07-01
The biotic ligand model (BLM) theoretically enables the derivation of environmental quality standards that are based on true bioavailable fractions of metals. Several physicochemical variables (especially pH, major cations, dissolved organic carbon, and dissolved metal concentrations) must, however, be assigned to run the BLM, but they are highly variable in time and space in natural systems. This article describes probabilistic approaches for integrating such variability during the derivation of risk indexes. To describe each variable using a probability density function (PDF), several methods were combined to 1) treat censored data (i.e., data below the limit of detection), 2) incorporate the uncertainty of the solid-to-liquid partitioning of metals, and 3) detect outliers. From a probabilistic perspective, 2 alternative approaches that are based on log-normal and Γ distributions were tested to estimate the probability of the predicted environmental concentration (PEC) exceeding the predicted non-effect concentration (PNEC), i.e., p(PEC/PNEC>1). The probabilistic approach was tested on 4 real-case studies based on Cu-related data collected from stations on the Loire and Moselle rivers. The approach described in this article is based on BLM tools that are freely available for end-users (i.e., the Bio-Met software) and on accessible statistical data treatments. This approach could be used by stakeholders who are involved in risk assessments of metals for improving site-specific studies. Copyright © 2013 SETAC.
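The exceedance probability defined above is straightforward to estimate by Monte Carlo once PDFs are assigned: sample PEC and PNEC from their fitted distributions and count how often the ratio exceeds 1. The lognormal parameters below are invented for illustration, not the fits from the Loire and Moselle case studies.

```python
import numpy as np

# Monte Carlo estimate of the risk index p(PEC/PNEC > 1).
rng = np.random.default_rng(0)
n = 100_000
pec = rng.lognormal(mean=np.log(2.0), sigma=0.8, size=n)   # exposure, ug/L
pnec = rng.lognormal(mean=np.log(5.0), sigma=0.5, size=n)  # no-effect, ug/L

risk = np.mean(pec / pnec > 1.0)
print(f"p(PEC/PNEC > 1) ~= {risk:.3f}")
```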
Upgrades to the Probabilistic NAS Platform Air Traffic Simulation Software
NASA Technical Reports Server (NTRS)
Hunter, George; Boisvert, Benjamin
2013-01-01
This document is the final report for the project entitled "Upgrades to the Probabilistic NAS Platform Air Traffic Simulation Software." This report consists of 17 sections which document the results of the several subtasks of this effort. The Probabilistic NAS Platform (PNP) is an air operations simulation platform developed and maintained by the Saab Sensis Corporation. The improvements made to the PNP simulation include the following: an airborne distributed separation assurance capability, a required time of arrival assignment and conformance capability, and a tactical and strategic weather avoidance capability.
NASA Technical Reports Server (NTRS)
Onwubiko, Chinyere; Onyebueke, Landon
1996-01-01
This program report is the final report covering all the work done on this project. The goal of this project is technology transfer of methodologies to improve the design process. The specific objectives are: 1. To learn and understand probabilistic design analysis using NESSUS. 2. To assign design projects to either undergraduate or graduate students on the application of NESSUS. 3. To integrate the application of NESSUS into some selected senior-level courses in the Civil and Mechanical Engineering curricula. 4. To develop courseware in probabilistic design methodology to be included in a graduate-level design methodology course. 5. To study the relationship between probabilistic design methodology and axiomatic design methodology.
Event-Based Media Enrichment Using an Adaptive Probabilistic Hypergraph Model.
Liu, Xueliang; Wang, Meng; Yin, Bao-Cai; Huet, Benoit; Li, Xuelong
2015-11-01
Nowadays, with the continual development of digital capture technologies and social media services, a vast number of media documents are captured and shared online to help attendees record their experience during events. In this paper, we present a method combining semantic inference and multimodal analysis for automatically finding media content to illustrate events using an adaptive probabilistic hypergraph model. In this model, media items are taken as vertices in the weighted hypergraph and the task of enriching media to illustrate events is formulated as a ranking problem. In our method, each hyperedge is constructed using the K-nearest neighbors of a given media document. We also employ a probabilistic representation, which assigns each vertex to a hyperedge in a probabilistic way, to further exploit the correlation among media data. Furthermore, we optimize the hypergraph weights in a regularization framework, which is solved as a second-order cone problem. The approach is initiated by seed media and then used to rank the media documents using a transductive inference process. The results obtained from validating the approach on an event dataset collected from EventMedia demonstrate the effectiveness of the proposed approach.
A probabilistic method for testing and estimating selection differences between populations
He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li
2015-01-01
Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. By use of a probabilistic model of genetic drift and selection, we showed that logarithm odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine the confidence intervals of the estimate. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences. It therefore supplies a solution for hypothesis testing of selection differences. This method was applied to a genome-wide data analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. PMID:26463656
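In outline, the estimator above reduces to a z-test on the log odds ratio of allele frequencies, with the null spread taken from genome-wide background variants. The sketch below uses simulated background frequencies and a made-up candidate locus; it is a schematic of the approach, not the authors' implementation.

```python
import numpy as np
from scipy import stats

# Log odds ratio of allele frequencies between two populations as an
# estimate of the selection-coefficient difference.
def log_or(p1, p2):
    return np.log(p1 / (1 - p1)) - np.log(p2 / (1 - p2))

# Genome-wide background variants supply the null spread of the statistic.
rng = np.random.default_rng(0)
bg_p1 = rng.uniform(0.05, 0.95, 10_000)
bg_p2 = np.clip(bg_p1 + rng.normal(0, 0.03, 10_000), 0.01, 0.99)
null_sd = np.std(log_or(bg_p1, bg_p2))

# Hypothetical candidate locus with a large frequency difference.
z = log_or(0.85, 0.35) / null_sd
p_value = 2 * stats.norm.sf(abs(z))
print(z, p_value)   # large z: selection differs between the populations
```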
Localization of the lumbar discs using machine learning and exact probabilistic inference.
Oktay, Ayse Betul; Akgul, Yusuf Sinan
2011-01-01
We propose a novel fully automatic approach to localize the lumbar intervertebral discs in MR images with a PHOG-based SVM and a probabilistic graphical model. At the local level, our method assigns a score to each pixel in the target image that indicates whether it is a disc center or not. At the global level, we define a chain-like graphical model that represents the lumbar intervertebral discs and we use an exact inference algorithm to localize the discs. Our main contributions are the employment of the SVM with the PHOG-based descriptor, which is robust against variations of the discs, and a graphical model that reflects the linear nature of the vertebral column. Our inference algorithm runs in polynomial time and produces globally optimal results. The developed system is validated on a real spine MRI dataset and the final localization results are favorable compared to the results reported in the literature.
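Exact inference on such a chain model is a small dynamic program (Viterbi-style): combine per-disc candidate scores with a pairwise spacing term and backtrack the globally optimal labeling. The candidate positions, unary scores, and spacing penalty below are toy stand-ins for the SVM outputs and the learned chain potentials.

```python
import numpy as np

def localize(unary, candidates, ideal_gap=35.0, w=0.1):
    """Pick one candidate per disc maximizing unary scores plus a
    spacing term between consecutive discs (exact chain inference)."""
    n_discs, n_cand = unary.shape
    dp = unary[0].copy()
    back = np.zeros((n_discs, n_cand), dtype=int)
    for d in range(1, n_discs):
        # pair[i, j]: penalty for going from candidate i to candidate j
        pair = -w * np.abs((candidates[d][None, :] -
                            candidates[d - 1][:, None]) - ideal_gap)
        scores = dp[:, None] + pair + unary[d][None, :]
        back[d] = np.argmax(scores, axis=0)
        dp = scores.max(axis=0)
    path = [int(np.argmax(dp))]
    for d in range(n_discs - 1, 0, -1):
        path.append(int(back[d, path[-1]]))
    path.reverse()
    return [candidates[d][i] for d, i in enumerate(path)]

cands = np.array([[100, 110, 120], [135, 150, 160], [170, 185, 200]])
unary = np.array([[2.0, 1.0, 0.5], [0.5, 2.0, 0.2], [0.3, 1.8, 2.0]])
print(localize(unary, cands))  # globally optimal disc positions
```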
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
Dobosz, Marina; Bocci, Chiara; Bonuglia, Margherita; Grasso, Cinzia; Merigioli, Sara; Russo, Alessandra; De Iuliis, Paolo
2010-01-01
Microsatellites have been used for parentage testing and individual identification in forensic science because they are highly polymorphic, abundant, and dispersed throughout most eukaryotic nuclear genomes. At present, genetic testing based on DNA technology is used for most domesticated animals, including horses, to confirm identity, to determine parentage, and to validate registration certificates. But if genetic data of one of the putative parents are missing, verifying a genealogy could be questionable. The aim of this paper is to illustrate a new approach to analyze complex cases of disputed relationship with microsatellite markers. These cases were solved by analyzing the genotypes of the offspring and other horses' genotypes in the pedigrees of the putative dam/sire with probabilistic expert systems (PESs). PES was especially efficient in supplying reliable, error-free Bayesian probabilities in complex cases with missing pedigree data. One of these systems was developed for forensic purposes (the FINEX program) and is particularly valuable in human analyses. We applied this program to parentage analysis in horses, and we will illustrate how different cases have been successfully worked out.
RaptorX server: a resource for template-based protein structure modeling.
Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo
2014-01-01
Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and a probabilistic consistency algorithm. Using the protocol presented here, it is thus possible to obtain high-quality structural models for many target protein sequences even when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction for a 200 amino acid target sequence in approximately 30 min.
A probabilistic method for testing and estimating selection differences between populations.
He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li
2015-12-01
Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. Using a probabilistic model of genetic drift and selection, we showed that log odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and the variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine confidence intervals for the estimates. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences, thereby supplying a solution for the latter. The method was applied to a genome-wide analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. © 2015 He et al.; Published by Cold Spring Harbor Laboratory Press.
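As a rough illustration of the core statistic, the sketch below computes log odds ratios of allele frequencies between two populations and standardizes a candidate variant against the genome-wide spread; the simulated frequencies, clipping constant and normal tail test are assumptions for the example, not the authors' exact procedure.

    import numpy as np
    from scipy.stats import norm

    def log_odds_ratio(p1, p2, eps=1e-6):
        p1, p2 = np.clip(p1, eps, 1 - eps), np.clip(p2, eps, 1 - eps)
        return np.log(p1 / (1 - p1)) - np.log(p2 / (1 - p2))

    # Genome-wide allele frequencies in two populations (mostly drift).
    rng = np.random.default_rng(0)
    p1 = rng.uniform(0.05, 0.95, 100_000)
    p2 = np.clip(p1 + rng.normal(0.0, 0.05, p1.size), 0.01, 0.99)

    sigma = log_odds_ratio(p1, p2).std()   # empirical genome-wide spread

    # A strongly differentiated candidate variant (think EPAS1-like).
    z = log_odds_ratio(0.85, 0.15) / sigma
    print(f"z = {z:.1f}, two-sided p = {2 * norm.sf(abs(z)):.1e}")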
Galan, Maxime; Guivier, Emmanuel; Caraux, Gilles; Charbonnel, Nathalie; Cosson, Jean-François
2010-05-11
High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in attributing each read produced to its original sample, and second, in distinguishing true from artifactual sequence variation during bioinformatic analysis. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in the forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time-consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives for the study of evolutionary and functional genetics of highly polymorphic genes like major histocompatibility complex genes in vertebrates or loci regulating self-compatibility in plants. Important applications in biomedical research will include the detection of individual variation in disease susceptibility. Similarly, agronomy will benefit from this approach, through the study of genes implicated in productivity or disease susceptibility traits.
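The tag-combination demultiplexing step can be pictured with a minimal sketch; the tag sequences, tag length and exact-match policy are invented simplifications (the published procedure additionally validates sequences with a probabilistic model and a four-step filter).

    # Toy demultiplexer: forward/reverse tag combinations identify samples.
    FORWARD_TAGS = {"ACGT": "F1", "TGCA": "F2"}
    REVERSE_TAGS = {"GATC": "R1", "CTAG": "R2"}
    SAMPLE_OF = {("F1", "R1"): "sample_001", ("F1", "R2"): "sample_002",
                 ("F2", "R1"): "sample_003", ("F2", "R2"): "sample_004"}

    def assign_read(read, tag_len=4):
        """Return the sample a read belongs to, or None if either tag is
        unrecognized (such reads are discarded rather than guessed)."""
        fwd = FORWARD_TAGS.get(read[:tag_len])
        rev = REVERSE_TAGS.get(read[-tag_len:])
        return SAMPLE_OF.get((fwd, rev))

    # A 222 bp amplicon: 4 bp forward tag + 214 bp insert + 4 bp reverse tag.
    print(assign_read("ACGT" + "N" * 214 + "GATC"))  # -> sample_001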
Trial to assess the utility of genetic sequencing to improve patient outcomes
A pilot trial to assess whether assigning treatment based on specific gene mutations can provide benefit to patients with metastatic solid tumors is being launched this month by the NCI. The Molecular Profiling based Assignment of Cancer Therapeutics, or
Probability and possibility-based representations of uncertainty in fault tree analysis.
Flage, Roger; Baraldi, Piero; Zio, Enrico; Aven, Terje
2013-01-01
Expert knowledge is an important source of input to risk analysis. In practice, experts might be reluctant to characterize their knowledge and the related (epistemic) uncertainty using precise probabilities. The theory of possibility allows for imprecision in probability assignments. The associated possibilistic representation of epistemic uncertainty can be combined with, and transformed into, a probabilistic representation; in this article, we show this with reference to a simple fault tree analysis. We apply an integrated (hybrid) probabilistic-possibilistic computational framework for the joint propagation of the epistemic uncertainty on the values of the (limiting relative frequency) probabilities of the basic events of the fault tree, and we use possibility-probability (probability-possibility) transformations for propagating the epistemic uncertainty within purely probabilistic and possibilistic settings. The results of the different approaches (hybrid, probabilistic, and possibilistic) are compared with respect to the representation of uncertainty about the top event (limiting relative frequency) probability. Both the rationale underpinning the approaches and the computational efforts they require are critically examined. We conclude that the approaches relevant in a given setting depend on the purpose of the risk analysis, and that further research is required to make the possibilistic approaches operational in a risk analysis context. © 2012 Society for Risk Analysis.
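One standard possibility-to-probability transformation samples uniformly within randomly drawn alpha-cuts; the sketch below applies it to a triangular possibility distribution whose parameters are purely illustrative, not taken from the article's fault tree.

    import numpy as np

    def sample_triangular_possibility(low, mode, high, n, seed=0):
        """Draw probabilistic samples from a triangular possibility
        distribution by uniform sampling within random alpha-cuts."""
        rng = np.random.default_rng(seed)
        alphas = rng.uniform(0.0, 1.0, n)          # random cut levels
        left = low + alphas * (mode - low)         # alpha-cut lower bound
        right = high - alphas * (high - mode)      # alpha-cut upper bound
        return rng.uniform(left, right)

    # Epistemic uncertainty on a basic-event probability.
    samples = sample_triangular_possibility(1e-4, 1e-3, 1e-2, 10_000)
    print(samples.mean(), np.quantile(samples, [0.05, 0.95]))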
Finding models to detect Alzheimer's disease by fusing structural and neuropsychological information
NASA Astrophysics Data System (ADS)
Giraldo, Diana L.; García-Arteaga, Juan D.; Velasco, Nelson; Romero, Eduardo
2015-12-01
Alzheimer's disease (AD) is a neurodegenerative disease that affects higher brain functions. Initial diagnosis of AD is based on the patient's clinical history and a battery of neuropsychological tests, so the accuracy of the diagnosis is highly dependent on the examiner's skills and on the evolution of a variable clinical picture. This work presents an automatic strategy that learns probabilistic brain models for different stages of the disease, reducing complexity, parameter adjustment and computational cost. The proposed method starts by setting a probabilistic class description using the information stored in the neuropsychological tests, and then constructs the structural class models using membership values from the learned probabilistic functions. These models are then used as a reference frame for the classification problem: a new case is assigned to a particular class simply by projecting it onto the different models. Validation was performed using leave-one-out cross-validation with two classes: Normal Control (NC) subjects and patients diagnosed with mild AD. In this experiment, a sensitivity of 80% and a specificity of 79% were achieved.
NASA Astrophysics Data System (ADS)
Umut Caglar, Mehmet; Pal, Ranadip
2010-10-01
The central dogma of molecular biology states that "information cannot be transferred back from protein to either protein or nucleic acid." However, this assumption is not exactly correct in most cases: there are many feedback loops and interactions between different levels of the system. These interactions are hard to analyze due to the lack of data at the cellular level and the probabilistic nature of the interactions. Probabilistic models like the Stochastic Master Equation (SME) or deterministic models like differential equations (DEs) can be used to analyze them. SME models based on the chemical master equation (CME) can provide a detailed representation of a genetic regulatory system, but their use is restricted by large data requirements and the computational cost of the calculations. Differential equation models, on the other hand, have low computational cost and are much better suited to generating control procedures for the system, but they are inadequate for investigating the probabilistic nature of the interactions. In this work, the success of the mapping between SME and DE models is analyzed, and the success of a control policy generated by the DE model is examined with respect to the SME model. Index Terms: Stochastic Master Equation models, Differential Equation models, Control Policy Design, Systems Biology.
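The contrast between the two model classes can be made concrete on a toy birth-death gene-expression model: a Gillespie stochastic simulation stands in for the master-equation view, while the matching ODE gives the deterministic one. The rate constants are arbitrary.

    import numpy as np

    k_prod, k_deg = 10.0, 0.1          # production and degradation rates

    def gillespie(t_end, x0=0, seed=0):
        """Exact stochastic simulation: birth at rate k_prod, death at k_deg*x."""
        rng, t, x, traj = np.random.default_rng(seed), 0.0, x0, []
        while t < t_end:
            rates = np.array([k_prod, k_deg * x])
            total = rates.sum()
            t += rng.exponential(1.0 / total)          # time to next event
            x += 1 if rng.random() < rates[0] / total else -1
            traj.append((t, x))
        return traj

    # Deterministic counterpart: dx/dt = k_prod - k_deg*x, steady state 100.
    print("ODE steady state:", k_prod / k_deg)
    print("stochastic state near t=100:", gillespie(100.0)[-1])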
Li, Y. H.; Chu, H. P.; Jiang, Y. N.; Lin, C. Y.; Li, S. H.; Li, K. T.; Weng, G. J.; Cheng, C. C.; Lu, D. J.; Ju, Y. T.
2014-01-01
The Lanyu is a miniature pig breed indigenous to Lanyu Island, Taiwan. It is distantly related to Asian and European pig breeds. It has been inbred to generate two breeds and crossed with Landrace and Duroc to produce two hybrids for laboratory use. Selecting sets of informative genetic markers to track the genetic qualities of laboratory animals and stud stock is an important function of genetic databases. For more than two decades, Lanyu-derived breeds of common ancestry and crossbreeds have been used to examine the effectiveness of genetic marker selection and optimal approaches for individual assignment. In this paper, these pigs and the following breeds are studied to build a genetic reference database: Berkshire, Duroc, Landrace and Yorkshire, Meishan and Taoyuan, TLRI Black Pig No. 1, and Kaohsiung Animal Propagation Station Black pig. Nineteen microsatellite markers (loci) provide information on genetic variation and differentiation among the studied breeds. High differentiation indices (FST) and Cavalli-Sforza chord distances indicate genetic differentiation among breeds, including Lanyu's inbred populations. Inbreeding values (FIS) show that Lanyu and its derived inbred breeds have suffered a significant loss of heterozygosity. Individual assignment testing of 352 animals was done with different numbers of microsatellite markers in this study. The testing successfully assigned 99% of the animals to their correct reference populations based on 9 to 14 markers ranked by D-scores, allelic number, expected heterozygosity (HE) or FST, respectively. All mis-assigned individuals came from closely related Lanyu breeds. To improve individual assignment among closely related breeds, microsatellite markers selected from the Lanyu populations for high polymorphism, heterozygosity, FST and D-scores were used. Only 6 to 8 markers ranked by HE, FST or allelic number were required to obtain 99% assignment accuracy. This result suggests that empirical examination of assignment-error rates is required if discernible levels of co-ancestry exist. In the reference group, optimum assignment accuracy was achieved through a combination of different markers chosen by ranking the heterozygosity, FST and allelic number of closely related populations. PMID:25049996
da Fonseca Neto, João Viana; Abreu, Ivanildo Silva; da Silva, Fábio Nogueira
2010-04-01
Toward the synthesis of state-space controllers, a neural-genetic model based on linear quadratic regulator design for the eigenstructure assignment of multivariable dynamic systems is presented. The neural-genetic model is a fusion of a genetic algorithm and a recurrent neural network (RNN), which perform the selection of the weighting matrices and the solution of the algebraic Riccati equation, respectively. A fourth-order electric circuit model is used to evaluate the convergence of the computational intelligence paradigms and the performance of the control design method. The genetic search convergence is evaluated in terms of fitness function statistics, and the RNN convergence is evaluated by landscapes of the energy and norm as functions of the parameter deviations. The control problem solution is evaluated in the time and frequency domains by the impulse response, singular values, and modal analysis.
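A toy version of this fusion can be sketched as follows: a genetic-style search selects the LQR weighting matrix, the algebraic Riccati equation is solved by a SciPy routine standing in for the recurrent network, and fitness measures the distance of the closed-loop eigenvalues from a desired set. The second-order system and targets are invented, not the paper's fourth-order circuit.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    B = np.array([[0.0], [1.0]])
    target = np.array([-4.0, -5.0])          # desired closed-loop eigenvalues

    def fitness(q_diag):
        Q, R = np.diag(q_diag), np.eye(1)
        P = solve_continuous_are(A, B, Q, R)  # Riccati solution (the RNN's job)
        K = np.linalg.solve(R, B.T @ P)       # LQR gain
        eig = np.sort(np.linalg.eigvals(A - B @ K).real)
        return -np.linalg.norm(eig - np.sort(target))

    rng = np.random.default_rng(0)
    pop = rng.uniform(0.1, 100.0, (40, 2))   # population of Q diagonals
    for _ in range(30):                      # keep the fittest, then mutate
        pop = pop[np.argsort([-fitness(q) for q in pop])][:20]
        pop = np.vstack([pop, np.abs(pop * rng.normal(1.0, 0.2, pop.shape))])
    best = max(pop, key=fitness)
    print("best Q diagonal:", best, "fitness:", fitness(best))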
Modeling marine oily wastewater treatment by a probabilistic agent-based approach.
Jing, Liang; Chen, Bing; Zhang, Baiyu; Ye, Xudong
2018-02-01
This study developed a novel probabilistic agent-based approach for modeling marine oily wastewater treatment processes. The approach first constructs a probability-based agent simulation model, followed by a global sensitivity analysis and a genetic algorithm-based calibration. The proposed modeling approach was tested through a case study of the removal of naphthalene from marine oily wastewater using UV irradiation. The removal of naphthalene was described by an agent-based simulation model using 8 types of agents and 11 reactions, each reaction governed by a probability parameter that determines its occurrence. The modeling results showed that the root mean square errors between modeled and observed removal rates were 8.73% and 11.03% for calibration and validation runs, respectively. Reaction competition was analyzed by comparing agent-based reaction probabilities, while agent heterogeneity was visualized by plotting real-time spatial distributions, showing strong potential for reactor design and process optimization. Copyright © 2017 Elsevier Ltd. All rights reserved.
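The probability-governed reaction mechanism can be pictured with a minimal sketch in which a reaction fires per encounter only when a Bernoulli draw succeeds; the agent types, counts and probability below are invented and far simpler than the study's 8-agent, 11-reaction model.

    import random

    random.seed(1)
    agents = {"naphthalene": 500, "OH_radical": 300, "byproduct": 0}
    reactions = [
        # (reactants, products, occurrence probability per encounter)
        ((("naphthalene", 1), ("OH_radical", 1)), (("byproduct", 1),), 0.02),
    ]

    for step in range(1000):
        for reactants, products, p in reactions:
            encounters = min(agents[name] // n for name, n in reactants)
            fired = sum(random.random() < p for _ in range(encounters))
            for name, n in reactants:
                agents[name] -= n * fired
            for name, n in products:
                agents[name] += n * fired

    print(agents)   # naphthalene declines until the OH radicals are spent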
Genetic programming for evolving due-date assignment models in job shop environments.
Nguyen, Su; Zhang, Mengjie; Johnston, Mark; Tan, Kay Chen
2014-01-01
Due-date assignment plays an important role in scheduling systems and strongly influences the delivery performance of job shops. Because of the stochastic and dynamic nature of job shops, the development of general due-date assignment models (DDAMs) is complicated. In this study, two genetic programming (GP) methods are proposed to evolve DDAMs for job shop environments. The experimental results show that the evolved DDAMs can make more accurate estimates than other existing dynamic DDAMs with promising reusability. In addition, the evolved operation-based DDAMs show better performance than the evolved DDAMs employing aggregate information of jobs and machines.
Probabilistic resource allocation system with self-adaptive capability
NASA Technical Reports Server (NTRS)
Yufik, Yan M. (Inventor)
1996-01-01
A probabilistic resource allocation system is disclosed containing a low capacity computational module (Short Term Memory or STM) and a self-organizing associative network (Long Term Memory or LTM) where nodes represent elementary resources, terminal end nodes represent goals, and directed links represent the order of resource association in different allocation episodes. Goals and their priorities are indicated by the user, and allocation decisions are made in the STM, while candidate associations of resources are supplied by the LTM based on the association strength (reliability). Reliability values are automatically assigned to the network links based on the frequency and relative success of exercising those links in the previous allocation decisions. Accumulation of allocation history in the form of an associative network in the LTM reduces computational demands on subsequent allocations. For this purpose, the network automatically partitions itself into strongly associated high reliability packets, allowing fast approximate computation and display of allocation solutions satisfying the overall reliability and other user-imposed constraints. System performance improves in time due to modification of network parameters and partitioning criteria based on the performance feedback.
Probabilistic resource allocation system with self-adaptive capability
NASA Technical Reports Server (NTRS)
Yufik, Yan M. (Inventor)
1998-01-01
A probabilistic resource allocation system is disclosed containing a low capacity computational module (Short Term Memory or STM) and a self-organizing associative network (Long Term Memory or LTM) where nodes represent elementary resources, terminal end nodes represent goals, and weighted links represent the order of resource association in different allocation episodes. Goals and their priorities are indicated by the user, and allocation decisions are made in the STM, while candidate associations of resources are supplied by the LTM based on the association strength (reliability). Weights are automatically assigned to the network links based on the frequency and relative success of exercising those links in the previous allocation decisions. Accumulation of allocation history in the form of an associative network in the LTM reduces computational demands on subsequent allocations. For this purpose, the network automatically partitions itself into strongly associated high reliability packets, allowing fast approximate computation and display of allocation solutions satisfying the overall reliability and other user-imposed constraints. System performance improves in time due to modification of network parameters and partitioning criteria based on the performance feedback.
Degen, Bernd; Blanc-Jolivet, Céline; Stierand, Katrin; Gillet, Elizabeth
2017-03-01
During the past decade, the use of DNA for forensic applications has been extensively implemented for plant and animal species, as well as in humans. Tracing back the geographical origin of an individual usually requires genetic assignment analysis. These approaches are based on reference samples that are grouped into populations or other aggregates and intend to identify the most likely group of origin. Often this grouping has no biological justification but rather a historical or political one, such as "country of origin". In this paper, we present a new nearest-neighbour approach to individual assignment or classification within a given but potentially imperfect grouping of reference samples. This method, which is based on the genetic distance between individuals, performs better in many cases than commonly used methods. We demonstrate the operation of our assignment method using two data sets. One set is simulated for a large number of trees distributed in a 120 km by 120 km landscape with individual genotypes at 150 SNPs, and the other comprises experimental data on 1221 individuals of the African tropical tree species Entandrophragma cylindricum (Sapelli) genotyped at 61 SNPs. Judging by the level of correct self-assignment, our approach outperformed the commonly used frequency and Bayesian approaches by 15% for the simulated data set and by 5-7% for the Sapelli data set. Our new approach is less sensitive to overlapping sources of genetic differentiation, such as genetic differences among closely related species, phylogeographic lineages and isolation by distance, and thus operates better even for suboptimal groupings of individuals. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
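A bare-bones version of the nearest-neighbour assignment idea might look like the following; the allele-sharing distance, the k-nearest majority vote and the simulated genotypes are illustrative assumptions rather than the authors' exact formulation.

    import numpy as np

    def allele_sharing_distance(a, b):
        """a, b: genotype vectors coded 0/1/2 (copies of a reference allele)."""
        return np.abs(a - b).sum() / (2 * len(a))

    def assign(query, ref_genotypes, ref_groups, k=5):
        d = np.array([allele_sharing_distance(query, r) for r in ref_genotypes])
        nearest = np.argsort(d)[:k]                  # k closest reference trees
        labels, votes = np.unique(ref_groups[nearest], return_counts=True)
        return labels[np.argmax(votes)]              # majority vote

    rng = np.random.default_rng(0)
    ref = rng.integers(0, 3, (200, 61))              # 200 references, 61 SNPs
    groups = np.repeat(np.array(["north", "south"]), 100)
    ref[100:, :20] = 2                               # crude regional signal
    # Note: a proper self-assignment test would leave the query out.
    print(assign(ref[150], ref, groups))             # -> south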
Maximizing Statistical Power When Verifying Probabilistic Forecasts of Hydrometeorological Events
NASA Astrophysics Data System (ADS)
DeChant, C. M.; Moradkhani, H.
2014-12-01
Hydrometeorological events (i.e., floods, droughts, precipitation) are increasingly being forecast probabilistically, owing to the uncertainties in the underlying causes of these phenomena. In these forecasts, the probability of the event over some lead time is estimated based on model simulations or predictive indicators. By issuing probabilistic forecasts, agencies may communicate the uncertainty in the event occurring. Assuming that the assigned probability of the event is correct, which is referred to as a reliable forecast, the end user may perform risk management based on the potential damages resulting from the event. Alternatively, an unreliable forecast may give false impressions of the actual risk, leading to improper decision making when protecting resources from extreme events. Because effective risk management requires reliable forecasts, this study takes a renewed look at reliability assessment in event forecasts. Illustrative experiments are presented, showing deficiencies in the commonly available approaches (Brier score, reliability diagram). Overall, it is shown that the conventional reliability assessment techniques do not maximize the ability to distinguish between a reliable and an unreliable forecast. In this regard, a theoretical formulation of the probabilistic event forecast verification framework is presented. From this analysis, hypothesis testing with the Poisson-binomial distribution is the most exact model available for the verification framework and therefore maximizes one's ability to distinguish between a reliable and an unreliable forecast. Application of this verification system is also examined within a real forecasting case study, highlighting the additional statistical power provided by the use of the Poisson-binomial distribution.
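The Poisson-binomial verification idea can be sketched directly: under a reliable system, the number of observed events across the issued forecasts follows a Poisson-binomial distribution with the forecast probabilities as parameters, so an exact tail test is possible. The dynamic-programming PMF below is standard; the forecast values are invented.

    import numpy as np

    def poisson_binomial_pmf(probs):
        """Exact PMF of the number of successes among independent
        Bernoulli trials with the given probabilities."""
        pmf = np.array([1.0])
        for p in probs:  # convolve in one Bernoulli at a time
            pmf = np.append(pmf, 0.0) * (1 - p) + np.append(0.0, pmf) * p
        return pmf

    forecast_probs = [0.1, 0.3, 0.8, 0.5, 0.05, 0.6, 0.2]  # issued forecasts
    observed_events = 5                                    # events that happened

    pmf = poisson_binomial_pmf(forecast_probs)
    p_value = pmf[observed_events:].sum()  # exact upper-tail probability
    print(f"P(X >= {observed_events}) = {p_value:.4f}")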
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
NASA Technical Reports Server (NTRS)
Abdi, Frank
1996-01-01
A computational structural/material analysis and design tool which meets industry's demand for expedience and reduced cost is presented. This software, GENOA, is dedicated to parallel and high-speed analysis for the probabilistic evaluation of the high-temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models, combining their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved in the development were: (1) utilization of the power of parallel processing and static/dynamic load-balancing optimization to make the complex simulation of the structure, material and processing of high-temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and to increase convergence rates through high- and low-level processor assignment; (4) creation of the framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid and distributed-workstation types of computers; and (5) market evaluation. The results of the Phase 2 effort provide a good basis for continuation and warrant a Phase 3 government and industry partnership.
A Hough Transform Global Probabilistic Approach to Multiple-Subject Diffusion MRI Tractography
Aganj, Iman; Lenglet, Christophe; Jahanshad, Neda; Yacoub, Essa; Harel, Noam; Thompson, Paul M.; Sapiro, Guillermo
2011-01-01
A global probabilistic fiber tracking approach based on the voting process provided by the Hough transform is introduced in this work. The proposed framework tests candidate 3D curves in the volume, assigning to each one a score computed from the diffusion images, and then selects the curves with the highest scores as the potential anatomical connections. The algorithm avoids local minima by performing an exhaustive search at the desired resolution. The technique is easily extended to multiple subjects, considering a single representative volume where the registered high-angular resolution diffusion images (HARDI) from all the subjects are non-linearly combined, thereby obtaining population-representative tracts. The tractography algorithm is run only once for the multiple subjects, and no tract alignment is necessary. We present experimental results on HARDI volumes, ranging from simulated and 1.5T physical phantoms to 7T and 4T human brain and 7T monkey brain datasets. PMID:21376655
Applications of random forest feature selection for fine-scale genetic population assignment.
Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G
2018-02-01
Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we used each method to create panels of 50-700 markers and identified the minimum panel size required to obtain a self-assignment accuracy of at least 90%. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy of at least 90% was obtained with panels of 670 and 384 SNPs for the two data sets, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
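A schematic of random-forest-based SNP panel selection on simulated genotypes is given below; the panel sizes, data dimensions and the simplified evaluation (ranking on the full data, which is optimistic) are assumptions for illustration, not the study's protocol.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, n_snps = 400, 1000
    y = rng.integers(0, 2, n)                          # two populations
    X = rng.integers(0, 3, (n, n_snps)).astype(float)  # genotypes coded 0/1/2
    X[:, :25] += y[:, None] * rng.uniform(0.5, 1.5, 25)  # 25 informative SNPs

    # Rank SNPs by random-forest importance (the study also compares
    # regularized variants and FST ranking).
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    ranked = np.argsort(rf.feature_importances_)[::-1]

    for panel_size in (10, 25, 100):
        panel = X[:, ranked[:panel_size]]
        acc = cross_val_score(
            RandomForestClassifier(n_estimators=200, random_state=0),
            panel, y, cv=5).mean()
        print(f"{panel_size:3d}-SNP panel: self-assignment accuracy {acc:.2f}")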
Using probabilistic theory to develop interpretation guidelines for Y-STR profiles.
Taylor, Duncan; Bright, Jo-Anne; Buckleton, John
2016-03-01
Y-STR profiling makes up a small but important proportion of forensic DNA casework. Y-STR profiles are often used when autosomal profiling has failed to yield an informative result; consequently, they often come from the most challenging samples. In addition, Y-STR loci are linked, meaning that evaluations of haplotype probabilities are based either on overly simplified counting methods or on computationally costly genetic models, neither of which extends well to the evaluation of mixed Y-STR data. For all of these reasons, Y-STR data analysis has not seen the same advances as autosomal STR analysis. We present here a probabilistic model for the interpretation of Y-STR data. Because probabilistic systems for Y-STR data are still some way from reaching active casework, we also describe how data can be analysed in a continuous way to generate interpretational thresholds and guidelines. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Stimulus discriminability may bias value-based probabilistic learning.
Schutte, Iris; Slagter, Heleen A; Collins, Anne G E; Frank, Michael J; Kenemans, J Leon
2017-01-01
Reinforcement learning tasks are often used to assess participants' tendency to learn more from the positive or more from the negative consequences of their actions. However, this assessment often requires comparing learning performance across task conditions that may differ in the relative salience or discriminability of the stimuli associated with more and less rewarding outcomes. To address this issue, in a first set of studies, participants were subjected to two versions of a common probabilistic learning task. The two versions differed with respect to the stimulus (Hiragana) characters associated with reward probability. The assignment of character to reward probability was fixed within each version but reversed between versions. We found that performance was highly influenced by task version, which could be explained by the relative perceptual discriminability of the characters assigned to high or low reward probabilities, as assessed in a separate discrimination experiment. Participants were more reliable in selecting rewarding characters that were more discriminable, leading to differences in learning curves and in their sensitivity to reward probability. This difference in experienced reinforcement history was accompanied by performance biases in a test phase assessing the ability to learn from positive vs. negative outcomes. In a subsequent large-scale web-based experiment, this impact of task version on learning and test measures was replicated and extended. Collectively, these findings imply a key role for perceptual factors in guiding reward learning and underscore the need to control stimulus discriminability when making inferences about individual differences in reinforcement learning.
POPPER, a simple programming language for probabilistic semantic inference in medicine.
Robson, Barry
2015-01-01
Our previous reports described the use of the Hyperbolic Dirac Net (HDN) as a method for probabilistic inference from medical data, and a proposed probabilistic medical Semantic Web (SW) language, Q-UEL, to provide that data. Rather like a traditional Bayes Net, that HDN provided estimates of joint and conditional probabilities, and was static, with no need to evolve through "reasoning". Use of the SW will require, however, (a) at least the semantic triple, with more elaborate relations than conditional ones, as seen in the use of most verbs and prepositions, and (b) rules for logical, grammatical, and definitional manipulation that can generate changes in the inference net. Here we describe the simple POPPER language for medical inference. It can be written automatically by Q-UEL, or by hand. Based on studies with our medical students, we believe that a tool like this may help in medical education and that a physician unfamiliar with SW science can understand it. It is used here to explore the considerable challenges of assigning probabilities and, not least, what the meaning and utility of inference net evolution would be for a physician. Copyright © 2014 Elsevier Ltd. All rights reserved.
Processing of probabilistic information in weight perception and motor prediction.
Trampenau, Leif; van Eimeren, Thilo; Kuhtz-Buschbeck, Johann
2017-02-01
We studied the effects of probabilistic cues, i.e., information of limited certainty, in the context of an action task (GL: grip-lift) and a perceptual task (WP: weight perception). Normal subjects (n = 22) saw four different probabilistic visual cues, each of which announced the likely weight of an object. In the GL task, the object was grasped and lifted with a pinch grip, and the peak force rates indicated that the grip and load forces were scaled predictively according to the probabilistic information. The WP task probed the expected heaviness associated with each probabilistic cue: the participants gradually adjusted the object's weight until its heaviness matched the expected weight for a given cue. Subjects were randomly assigned to two groups: one started with the GL task and the other with the WP task. The four probabilistic cues influenced weight adjustments in the WP task and peak force rates in the GL task in a similar manner. The interpretation and utilization of the probabilistic information was critically influenced by the initial task. Participants who started with the WP task classified the four probabilistic cues into four distinct categories and applied these categories to the subsequent GL task. By contrast, participants who started with the GL task applied three distinct categories to the four cues and retained this classification in the following WP task. The initial strategy, once established, determined how the probabilistic information was interpreted and implemented.
Chaves, Camila L; Degen, Bernd; Pakull, Birte; Mader, Malte; Honorio, Euridice; Ruas, Paulo; Tysklind, Niklas; Sebbenn, Alexandre M
2018-06-27
Deforestation, reinforced by illegal logging, is a serious problem in many tropical regions and causes pervasive environmental and economic damage. Existing laws that aim to reduce illegal logging need efficient, fraud-resistant control methods. We developed a genetic reference database for Jatoba (Hymenaea courbaril), an important, high-value timber species from the Neotropics. The data set can be used to check declarations of wood origin. Samples from 308 Hymenaea trees from 12 locations in Brazil, Bolivia, Peru, and French Guiana were collected and genotyped at 10 nuclear microsatellites (nSSRs), 13 chloroplast SNPs (cpSNPs), and 1 chloroplast indel marker. The chloroplast gene markers were developed using Illumina DNA sequencing. Bayesian cluster analysis divided the individuals, based on the nSSRs, into 8 genetic groups. Using self-assignment tests, the power of the genetic reference database to judge declarations of location was tested for 3 different assignment methods. We observed strong genetic differentiation among locations, leading to high and reliable self-assignment rates of 50% to 100% (average 88%). Although all 3 assignment methods produced similar mean self-assignment rates, there were differences for some locations linked to the levels of genetic diversity, differentiation, and heterozygosity. Our results show that the nuclear and chloroplast gene markers are effective for use in a genetic certification system and can provide national and international authorities with a robust tool to confirm the legality of timber.
Streaming fragment assignment for real-time analysis of sequencing experiments
Roberts, Adam; Pachter, Lior
2013-01-01
We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
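The flavor of streaming probabilistic assignment can be conveyed in a few lines: each ambiguously mapping fragment is soft-assigned in proportion to the current abundance estimates, which are updated immediately in a single constant-memory pass. This is a toy online-EM-style sketch, not eXpress's actual algorithm.

    # Toy online soft-assignment of ambiguously mapping fragments.
    counts = {"tx1": 1.0, "tx2": 1.0, "tx3": 1.0}   # pseudocount priors

    def process_fragment(compatible):
        """compatible: names of the targets this fragment maps to."""
        total = sum(counts[t] for t in compatible)
        resp = {t: counts[t] / total for t in compatible}  # E-step
        for t, r in resp.items():                          # online M-step
            counts[t] += r                                 # fractional count

    stream = [("tx1", "tx2"), ("tx1",), ("tx1", "tx3"),
              ("tx1", "tx2"), ("tx1",)]
    for fragment in stream:                                # single pass
        process_fragment(fragment)

    norm = sum(counts.values())
    print({t: round(c / norm, 3) for t, c in counts.items()})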
Mining disease fingerprints from within genetic pathways.
Nabhan, Ahmed Ragab; Sarkar, Indra Neil
2012-01-01
Mining biological networks can be an effective means to uncover system-level knowledge from micro-level associations, such as those encapsulated in genetic pathways. Analysis of human disease genetic pathways can lead to the identification of major mechanisms that may underlie disorders at an abstract functional level. The focus of this study was to develop an approach for structural pattern analysis and classification of the genetic pathways of diseases. A probabilistic model was developed to capture characteristic components ('fingerprints') of functionally annotated pathways. A probability estimation procedure for this model searched for fingerprints in each disease pathway while improving the probability estimates of the model parameters. The approach was evaluated on data from the Kyoto Encyclopedia of Genes and Genomes (consisting of 56 pathways across seven disease categories). Based on the achieved average classification accuracy of up to ~77%, the findings suggest that these fingerprints may be used for classification and discovery of genetic pathways.
Probabilistic grammatical model for helix‐helix contact site classification
2013-01-01
Background: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached an AUC ROC of 0.70. The analysis of grammar parse trees revealed the ability to represent structural features of helix-helix contact sites. Conclusions: We demonstrated that our probabilistic context-free framework for the analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification, without necessarily requiring the modeling of long-range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable, and thus could provide biologically meaningful information for molecular biologists. PMID:24350601
Probabilities and predictions: modeling the development of scientific problem-solving skills.
Stevens, Ron; Johnson, David F; Soller, Amy
2005-01-01
The IMMEX (Interactive Multi-Media Exercises) Web-based problem set platform enables the online delivery of complex, multimedia simulations and the rapid collection of student performance data, and has already been used in several genetic simulations. The next step is to use these data to understand and improve student learning in a formative manner. This article describes the development of probabilistic models of undergraduate student problem solving in molecular genetics that detail the spectrum of strategies students used when problem solving and how those strategic approaches evolved with experience. The actions of 776 university sophomore biology majors from three molecular biology lecture courses were recorded and analyzed. Performances on each of six simulations were first grouped by artificial neural network clustering to provide individual performance measures, and sequences of these performances were then modeled probabilistically with hidden Markov models to provide measures of progress. The models showed that students with different initial problem-solving abilities chose different strategies. Initial and final strategies varied across different sections of the same course and were not strongly correlated with other achievement measures. In contrast to previous studies, we observed no significant gender differences. We suggest that instructor interventions based on early student performances with these simulations may help students recognize effective and efficient problem-solving strategies and enhance learning.
NASA Astrophysics Data System (ADS)
Kozine, Igor
2018-04-01
The paper suggests looking at probabilistic risk quantities and concepts through the prism of accepting one of two views: that a true value of risk exists, or that it does not. It is argued that discussions until now have been primarily focused on closely related topics that are different from the topic of the current paper. The paper examines the operational consequences of adhering to each of the views and contrasts them. It is demonstrated that the operational differences in how and what probabilistic measures can be assessed, and how they can be interpreted, are tangible. In particular, this concerns prediction intervals, the use of Bayes' rule, models of complete ignorance, hierarchical models of uncertainty, the assignment of probabilities over a possibility space, and the interpretation of derived probabilistic measures. Behavioural implications of favouring either view are also briefly described.
A probabilistic tornado wind hazard model for the continental United States
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hossain, Q; Kimball, J; Mensing, R
A probabilistic tornado wind hazard model for the continental United States (CONUS) is described. The model incorporates both aleatory (random) and epistemic uncertainties associated with quantifying the tornado wind hazard parameters. The temporal occurrence of tornadoes within the CONUS is assumed to be a Poisson process. A spatial distribution of tornado touchdown locations is developed empirically based on the observed historical events within the CONUS. The hazard model is an aerial probability model that takes into consideration the size and orientation of the facility, the length and width of the tornado damage area (idealized as a rectangle and dependent on the tornado intensity scale), wind speed variation within the damage area, tornado intensity classification errors (i.e., errors in assigning a Fujita intensity scale based on surveyed damage), and the tornado path direction. Epistemic uncertainties in describing the distributions of the aleatory variables are accounted for by using more than one distribution model to describe aleatory variations. The epistemic uncertainties are based on inputs from a panel of experts. A computer program, TORNADO, has been developed incorporating this model; features of this program are also presented.
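At its crudest, the aerial probability calculation reduces to occurrence rate times damage-area footprint times conditional wind exceedance, summed over intensity classes; every number below is invented for illustration and unrelated to the model's calibrated inputs.

    # Back-of-envelope strike/exceedance sketch; all values are invented.
    rates = {"F2": 1.0e-3, "F3": 3.0e-4, "F4": 5.0e-5}  # tornadoes/yr/km^2
    area = {"F2": 3.0, "F3": 10.0, "F4": 30.0}          # mean damage area, km^2
    p_exceed = {"F2": 0.10, "F3": 0.50, "F4": 0.95}     # P(v > 70 m/s | scale)

    annual_p = sum(rates[s] * area[s] * p_exceed[s] for s in rates)
    print(f"annual P(wind > 70 m/s at site) ~ {annual_p:.1e}")  # ~ 3.2e-3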
Diagnosis of students' ability in a statistical course based on Rasch probabilistic outcome
NASA Astrophysics Data System (ADS)
Mahmud, Zamalia; Ramli, Wan Syahira Wan; Sapri, Shamsiah; Ahmad, Sanizah
2017-06-01
Measuring students' ability and performance is important in assessing how well students have learned and mastered statistical courses. Any improvement in learning will depend on the student's approaches to learning, which relate to several factors, namely the assessment methods comprising quizzes, tests, assignments and a final examination. This study attempts an alternative approach to measuring students' ability in an undergraduate statistical course based on the Rasch probabilistic model. Firstly, the study explores the learning outcome patterns of students in a statistics course (Applied Probability and Statistics) based on an Entrance-Exit survey. This is followed by investigating students' perceived learning ability based on four Course Learning Outcomes (CLOs) and students' actual learning ability based on their final examination scores. Rasch analysis revealed that students perceived themselves as lacking the ability to understand about 95% of the statistics concepts at the beginning of the class, but eventually had a good understanding by the end of the 14-week class. In terms of students' performance in the final examination, their ability to understand the topics varied, with different probabilities of success given the ability of the students and the difficulty of the questions. The majority found the probability and counting rules topic the most difficult to learn.
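The Rasch model underlying this kind of analysis gives the probability of a correct response as a logistic function of the gap between student ability and item difficulty, as the short sketch below shows with illustrative values.

    import math

    def rasch_p(theta, b):
        """Probability of a correct answer for ability theta, difficulty b."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # Three students (weak, average, strong) on items of rising difficulty.
    for theta in (-1.0, 0.0, 1.0):
        print([round(rasch_p(theta, b), 2) for b in (-1.0, 0.0, 2.0)])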
A probabilistic seismic model for the European Arctic
NASA Astrophysics Data System (ADS)
Hauser, Juerg; Dyer, Kathleen M.; Pasyanos, Michael E.; Bungum, Hilmar; Faleide, Jan I.; Clark, Stephen A.; Schweitzer, Johannes
2011-01-01
The development of three-dimensional seismic models for the crust and upper mantle has traditionally focused on finding one model that provides the best fit to the data while observing some regularization constraints. In contrast to this, the inversion employed here fits the data in a probabilistic sense and thus provides a quantitative measure of model uncertainty. Our probabilistic model is based on two sources of information: (1) prior information, which is independent from the data, and (2) different geophysical data sets, including thickness constraints, velocity profiles, gravity data, surface wave group velocities, and regional body wave traveltimes. We use a Markov chain Monte Carlo (MCMC) algorithm to sample models from the prior distribution, the set of plausible models, and test them against the data to generate the posterior distribution, the ensemble of models that fit the data with assigned uncertainties. While being computationally more expensive, such a probabilistic inversion provides a more complete picture of solution space and allows us to combine various data sets. The complex geology of the European Arctic, encompassing oceanic crust, continental shelf regions, rift basins and old cratonic crust, as well as the nonuniform coverage of the region by data with varying degrees of uncertainty, makes it a challenging setting for any imaging technique and, therefore, an ideal environment for demonstrating the practical advantages of a probabilistic approach. Maps of depth to basement and depth to Moho derived from the posterior distribution are in good agreement with previously published maps and interpretations of the regional tectonic setting. The predicted uncertainties, which are as important as the absolute values, correlate well with the variations in data coverage and quality in the region. A practical advantage of our probabilistic model is that it can provide estimates for the uncertainties of observables due to model uncertainties. We will demonstrate how this can be used for the formulation of earthquake location algorithms that take model uncertainties into account when estimating location uncertainties.
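The sample-from-the-prior-and-test loop can be caricatured with a one-parameter Metropolis sampler; the trivial forward model, prior bounds and data value below are invented stand-ins for the paper's multi-data-set MCMC.

    import numpy as np

    rng = np.random.default_rng(0)
    obs, sigma = 32.0, 2.0           # toy datum (a Moho-depth proxy, km)

    def log_likelihood(moho_km):
        pred = moho_km                # trivial forward model for the toy
        return -0.5 * ((pred - obs) / sigma) ** 2

    current, chain = 40.0, []
    for _ in range(20_000):
        proposal = current + rng.normal(0.0, 1.0)
        in_prior = 10.0 <= proposal <= 60.0          # uniform prior support
        accept = in_prior and np.log(rng.random()) < (
            log_likelihood(proposal) - log_likelihood(current))
        if accept:
            current = proposal
        chain.append(current)

    post = np.array(chain[5_000:])                   # discard burn-in
    print(f"posterior Moho depth: {post.mean():.1f} +/- {post.std():.1f} km")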
Assignment of functional activations to probabilistic cytoarchitectonic areas revisited.
Eickhoff, Simon B; Paus, Tomas; Caspers, Svenja; Grosbras, Marie-Helene; Evans, Alan C; Zilles, Karl; Amunts, Katrin
2007-07-01
Probabilistic cytoarchitectonic maps in standard reference space provide a powerful tool for the analysis of structure-function relationships in the human brain. While these microstructurally defined maps have already been used successfully in the analysis of somatosensory, motor and language functions, several conceptual issues in the analysis of structure-function relationships still demand clarification. In this paper, we demonstrate the principal approaches to anatomical localisation of functional activations based on probabilistic cytoarchitectonic maps through an exemplary analysis of an anterior parietal activation evoked by visual presentation of hand gestures. After considering the conceptual basis and implementation of volume and local-maxima labelling, we comment on some potential interpretational difficulties, limitations and caveats that may be encountered. Extending and supplementing these methods, we then propose a supplementary approach for the quantification of structure-function correspondences based on distribution analysis. This approach relates the cytoarchitectonic probabilities observed at a particular functionally defined location to the areal-specific null distribution of probabilities across the whole brain (i.e., the full probability map). Importantly, this method avoids the need for a unique classification of voxels to a single cortical area and may increase the comparability of results obtained for different areas. Moreover, as distribution-based labelling quantifies the "central tendency" of an activation with respect to anatomical areas, it will, in combination with the established methods, allow an advanced characterisation of the anatomical substrates of functional activations. Finally, the advantages and disadvantages of the various methods are discussed, focussing on the question of which approach is most appropriate in a particular situation.
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base-triplet vector space over the Galois field of the extended base alphabet {D, A, C, G, U}, where the symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primeval DNA repair system, could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific pairing of the hypothetical ancestral base D, used to define the sum and product operations, are enough to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Moreover, the Fourier spectrum of extended DNA genome sequences derived from multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences. The phylogenetic analyses, performed with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here, also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
A Probabilistic Model of Social Working Memory for Information Retrieval in Social Interactions.
Li, Liyuan; Xu, Qianli; Gan, Tian; Tan, Cheston; Lim, Joo-Hwee
2018-05-01
Social working memory (SWM) plays an important role in navigating social interactions. Inspired by studies in psychology, neuroscience, cognitive science, and machine learning, we propose a probabilistic model of SWM to mimic human social intelligence for personal information retrieval (IR) in social interactions. First, we establish a semantic hierarchy as social long-term memory to encode personal information. Next, we propose a semantic Bayesian network as the SWM, which integrates the cognitive functions of accessibility and self-regulation. One subgraphical model implements the accessibility function, learning the social consensus about IR based on social information concepts, clustering, social context, and similarity between persons. Beyond accessibility, one more layer is added to simulate the function of self-regulation, performing the personal adaptation to the consensus based on human personality. Two learning algorithms are proposed to train the probabilistic SWM model on a raw dataset of high uncertainty and incompleteness: an efficient learning algorithm based on Newton's method, and a genetic algorithm. Systematic evaluations show that the proposed SWM model is able to learn human social intelligence effectively and outperforms the baseline Bayesian cognitive model. Toward real-world applications, we implement our model on Google Glass as a wearable assistant for social interaction.
Fu, H C; Xu, Y Y; Chang, H Y
1999-12-01
Recognition of similar (confusable) characters is a difficult problem in optical character recognition (OCR). In this paper, we introduce a neural network solution that is capable of modeling minor differences among similar characters and is robust to various personal handwriting styles. The Self-growing Probabilistic Decision-based Neural Network (SPDNN) is a probabilistic neural network that adopts a hierarchical network structure with nonlinear basis functions and a competitive credit-assignment scheme. Based on the SPDNN model, we have constructed a three-stage recognition system. First, a coarse classifier assigns an input character to one of the predefined subclasses partitioned from a large character set, such as Chinese mixed with alphanumerics. Then a character recognizer determines which reference character in the subclass best matches the input image. Lastly, the third module is a similar-character recognizer, which can further enhance the recognition accuracy among similar or confusable characters. The prototype system has demonstrated a successful application of the SPDNN to similar handwritten Chinese character recognition on the public database CCL/HCCR1 (5401 characters × 200 samples). Regarding performance, experiments on the CCL/HCCR1 database produced 90.12% recognition accuracy with no rejection and 94.11% accuracy with 6.7% rejection; this represents an improvement of about 4% over the previously announced performance. As to processing speed, the steps before recognition (image preprocessing, segmentation, and feature extraction) require about one second for an A4-size character image, and recognition consumes approximately 0.27 seconds per character on a Pentium-100 personal computer, without any hardware accelerator or co-processor.
Economic Evaluation of Pediatric Telemedicine Consultations to Rural Emergency Departments.
Yang, Nikki H; Dharmar, Madan; Yoo, Byung-Kwang; Leigh, J Paul; Kuppermann, Nathan; Romano, Patrick S; Nesbitt, Thomas S; Marcin, James P
2015-08-01
Comprehensive economic evaluations have not been conducted on telemedicine consultations to children in rural emergency departments (EDs). We conducted an economic evaluation to estimate the cost, effectiveness, and return on investment (ROI) of telemedicine consultations provided to health care providers of acutely ill and injured children in rural EDs compared with telephone consultations, from a health care payer perspective. We built a decision model with parameters from primary programmatic data, national data, and the literature. We performed a base-case cost-effectiveness analysis (CEA), a probabilistic CEA with Monte Carlo simulation, and ROI estimation when the CEA suggested cost-saving. The CEA was based on program effectiveness, derived from transfer decisions following telemedicine and telephone consultations. The average cost for a telemedicine consultation was $3641 per child/ED/year in 2013 US dollars. Telemedicine consultations resulted in 31% fewer patient transfers compared with telephone consultations and a cost reduction of $4662 per child/ED/year. Our probabilistic CEA demonstrated telemedicine consultations were less costly than telephone consultations in 57% of simulation iterations. The ROI was calculated to be 1.28 ($4662/$3641) from the base-case analysis and estimated to be 1.96 from the probabilistic analysis, suggesting a $1.96 return for each dollar invested in telemedicine. Treating 10 acutely ill and injured children at each rural ED with telemedicine resulted in an annual cost-savings of $46,620 per ED. Telephone and telemedicine consultations were not randomly assigned, potentially resulting in biased results. From a health care payer perspective, telemedicine consultations to health care providers of acutely ill and injured children presenting to rural EDs are cost-saving (base-case and more than half of Monte Carlo simulation iterations) or cost-effective compared with telephone consultations. © The Author(s) 2015.
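The probabilistic CEA and ROI figures can be reproduced in spirit with a small Monte Carlo sketch; the point estimates below come from the abstract ($3641 cost and $4662 savings per child/ED/year), while the distribution families and spreads are assumptions for illustration only:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    cost = rng.normal(3641, 700, n)      # telemedicine cost per child/ED/year (spread assumed)
    savings = rng.normal(4662, 3500, n)  # cost reduction vs. telephone consults (spread assumed)

    print("P(cost-saving):", (savings > cost).mean())  # cf. 57% of iterations
    print("mean ROI:", (savings / cost).mean())        # cf. reported 1.28-1.96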
ERIC Educational Resources Information Center
Martin, Nancy
Presented is a technical report concerning the use of a mathematical model describing certain aspects of the duplication and selection processes in natural genetic adaptation. This reproductive plan/model occurs in artificial genetics (the use of ideas from genetics to develop general problem solving techniques for computers). The reproductive…
Kurushima, J. D.; Lipinski, M. J.; Gandolfi, B.; Froenicke, L.; Grahn, J. C.; Grahn, R. A.; Lyons, L. A.
2012-01-01
Both cat breeders and the lay public have interests in the origins of their pets, not only in the genetic identity of the purebred individuals, but also the historical origins of common household cats. The cat fancy is a relatively new institution with over 85% of its 40–50 breeds arising only in the past 75 years, primarily through selection on single-gene aesthetic traits. The short, yet intense cat breed history poses a significant challenge to the development of a genetic marker-based breed identification strategy. Using different breed assignment strategies and methods, 477 cats representing 29 fancy breeds were analysed with 38 short tandem repeats, 148 intergenic and five phenotypic single nucleotide polymorphisms. Results suggest the frequentist method of Paetkau (accuracy single nucleotide polymorphisms = 0.78, short tandem repeats = 0.88) surpasses the Bayesian method of Rannala and Mountain (single nucleotide polymorphisms = 0.56, short tandem repeats = 0.83) for accurate assignment of individuals to the correct breed. Additionally, a post-assignment verification step with the five phenotypic single nucleotide polymorphisms accurately identified between 0.31 and 0.58 of the mis-assigned individuals, raising the sensitivity of assignment with the frequentist method to 0.89 and 0.92 for single nucleotide polymorphisms and short tandem repeats, respectively. This study provides a novel multi-step assignment strategy and suggests that, despite their short breed history and breed family groupings, a majority of cats can be assigned to their proper breed or population of origin, i.e. race. PMID:23171373
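The frequentist assignment idea can be illustrated with a short sketch: compute the Hardy-Weinberg likelihood of an individual's multilocus genotype under each breed's allele frequencies and assign the breed with the highest likelihood. This is a generic illustration of the approach, not the exact procedure of Paetkau; the allele-frequency floor for unseen alleles is an assumption:

    import numpy as np

    def assign_breed(genotype, breed_freqs):
        # genotype: list of (allele1, allele2) pairs, one per locus
        # breed_freqs: {breed: list of {allele: frequency} dicts, one per locus}
        best, best_ll = None, -np.inf
        for breed, locus_freqs in breed_freqs.items():
            ll = 0.0
            for (a1, a2), f in zip(genotype, locus_freqs):
                p1, p2 = f.get(a1, 1e-3), f.get(a2, 1e-3)  # floor for unseen alleles
                hw = p1 * p2 if a1 == a2 else 2 * p1 * p2  # Hardy-Weinberg genotype prob.
                ll += np.log(hw)
            if ll > best_ll:
                best, best_ll = breed, ll
        return best, best_ll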
NASA Astrophysics Data System (ADS)
Caglar, Mehmet Umut; Pal, Ranadip
2011-03-01
The central dogma of molecular biology states that ``information cannot be transferred back from protein to either protein or nucleic acid''. However, this assumption is not exactly correct in most cases. There are many feedback loops and interactions between different levels of these systems. Such interactions are hard to analyze due to the lack of cell-level data and the probabilistic, nonlinear nature of the interactions. Several models are widely used to analyze and simulate these types of nonlinear interactions. Stochastic Master Equation (SME) models capture the probabilistic nature of the interactions in a detailed manner, at a high computational cost. On the other hand, Probabilistic Boolean Network (PBN) models give a coarse-scale picture of the stochastic processes, at a lower computational cost. Differential Equation (DE) models give the time evolution of the mean values of the processes in a highly cost-effective way. Understanding the relations between the predictions of these models is important for judging the reliability of simulations of genetic regulatory networks. In this work, the success of the mapping between SME, PBN and DE models is analyzed, and the accuracy and effectiveness of the control policies generated using the PBN and DE models are compared.
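A PBN update, the coarse-scale model contrasted with the SME here, is simple to state in code: each gene picks one of its candidate Boolean predictor functions at random and applies it. A toy two-gene sketch (the functions and selection probabilities are illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)

    def pbn_step(state, functions, probs):
        # One synchronous update: for each gene, draw one of its candidate
        # Boolean functions with the given probability and apply it.
        nxt = []
        for fs, ps in zip(functions, probs):
            f = fs[rng.choice(len(fs), p=ps)]
            nxt.append(int(f(state)))
        return tuple(nxt)

    funcs = [
        [lambda s: s[0] and s[1], lambda s: s[1]],  # gene 0: two rival predictors
        [lambda s: not s[0]],                       # gene 1: one predictor
    ]
    probs = [[0.7, 0.3], [1.0]]
    state = (1, 0)
    for _ in range(5):
        state = pbn_step(state, funcs, probs)
    print(state)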
Functional proteomics outlines the complexity of breast cancer molecular subtypes.
Gámez-Pozo, Angelo; Trilla-Fuertes, Lucía; Berges-Soria, Julia; Selevsek, Nathalie; López-Vacas, Rocío; Díaz-Almirón, Mariana; Nanni, Paolo; Arevalillo, Jorge M; Navarro, Hilario; Grossmann, Jonas; Gayá Moreno, Francisco; Gómez Rioja, Rubén; Prado-Vázquez, Guillermo; Zapater-Moros, Andrea; Main, Paloma; Feliú, Jaime; Martínez Del Prado, Purificación; Zamora, Pilar; Ciruelos, Eva; Espinosa, Enrique; Fresno Vara, Juan Ángel
2017-08-30
Breast cancer is a heterogeneous disease comprising a variety of entities with various genetic backgrounds. Estrogen receptor-positive, human epidermal growth factor receptor 2-negative tumors typically have a favorable outcome; however, some patients eventually relapse, which suggests some heterogeneity within this category. In the present study, we used proteomics and miRNA profiling techniques to characterize a set of 102 either estrogen receptor-positive (ER+)/progesterone receptor-positive (PR+) or triple-negative formalin-fixed, paraffin-embedded breast tumors. Protein expression-based probabilistic graphical models and flux balance analyses revealed that some ER+/PR+ samples had a protein expression profile similar to that of triple-negative samples and had a clinical outcome similar to those with triple-negative disease. This probabilistic graphical model-based classification had prognostic value in patients with luminal A breast cancer. This prognostic information was independent of that provided by standard genomic tests for breast cancer, such as MammaPrint, OncoType Dx and the 8-gene Score.
A human reliability based usability evaluation method for safety-critical software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boring, R. L.; Tran, T. Q.; Gertman, D. I.
2006-07-01
Boring and Gertman (2005) introduced a novel method that augments heuristic usability evaluation methods with that of the human reliability analysis method of SPAR-H. By assigning probabilistic modifiers to individual heuristics, it is possible to arrive at the usability error probability (UEP). Although this UEP is not a literal probability of error, it nonetheless provides a quantitative basis to heuristic evaluation. This method allows one to seamlessly prioritize and identify usability issues (i.e., a higher UEP requires more immediate fixes). However, the original version of this method required the usability evaluator to assign priority weights to the final UEP, thus allowing the priority of a usability issue to differ among usability evaluators. The purpose of this paper is to explore an alternative approach to standardize the priority weighting of the UEP in an effort to improve the method's reliability. (authors)
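The quantitative core of the method, assigning probabilistic modifiers to violated heuristics and aggregating them into a UEP, can be sketched as below; the nominal probability and multipliers are illustrative placeholders, not the calibrated SPAR-H values of Boring and Gertman (2005):

    NOMINAL_HEP = 0.01
    HEURISTIC_MULTIPLIERS = {            # probabilistic modifier per violated heuristic
        "visibility_of_system_status": 2.0,
        "error_prevention": 5.0,
        "consistency_and_standards": 1.5,
    }

    def usability_error_probability(violated):
        # Multiply the nominal error probability by each violated heuristic's
        # modifier, capping the result so it stays probability-like.
        uep = NOMINAL_HEP
        for h in violated:
            uep *= HEURISTIC_MULTIPLIERS[h]
        return min(uep, 1.0)

    print(usability_error_probability(["error_prevention", "consistency_and_standards"]))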
Patel, Ameera X; Bullmore, Edward T
2016-11-15
Connectome mapping using techniques such as functional magnetic resonance imaging (fMRI) has become a focus of systems neuroscience. There remain many statistical challenges in analysis of functional connectivity and network architecture from BOLD fMRI multivariate time series. One key statistic for any time series is its (effective) degrees of freedom, df, which will generally be less than the number of time points (or nominal degrees of freedom, N). If we know the df, then probabilistic inference on other fMRI statistics, such as the correlation between two voxel or regional time series, is feasible. However, we currently lack good estimators of df in fMRI time series, especially after the degrees of freedom of the "raw" data have been modified substantially by denoising algorithms for head movement. Here, we used a wavelet-based method both to denoise fMRI data and to estimate the (effective) df of the denoised process. We show that seed voxel correlations corrected for locally variable df could be tested for false positive connectivity with better control over Type I error and greater specificity of anatomical mapping than probabilistic connectivity maps using the nominal degrees of freedom. We also show that wavelet despiked statistics can be used to estimate all pairwise correlations between a set of regional nodes, assign a P value to each edge, and then iteratively add edges to the graph in order of increasing P. These probabilistically thresholded graphs are likely more robust to regional variation in head movement effects than comparable graphs constructed by thresholding correlations. Finally, we show that time-windowed estimates of df can be used for probabilistic connectivity testing or dynamic network analysis so that apparent changes in the functional connectome are appropriately corrected for the effects of transient noise bursts. Wavelet despiking is both an algorithm for fMRI time series denoising and an estimator of the (effective) df of denoised fMRI time series. Accurate estimation of df offers many potential advantages for probabilistically thresholding functional connectivity and network statistics tested in the context of spatially variant and non-stationary noise. Code for wavelet despiking, seed correlational testing and probabilistic graph construction is freely available to download as part of the BrainWavelet Toolbox at www.brainwavelet.org. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
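The role of the effective df is easy to see in the standard t-test for a correlation coefficient: substituting a smaller effective df for the nominal N widens the null distribution and raises the p-value. A sketch of that step only (the wavelet-based df estimator itself is not reproduced here):

    import numpy as np
    from scipy import stats

    def corr_pvalue(r, df_eff):
        # Two-sided p-value for correlation r using an effective df in place
        # of the nominal number of time points.
        t = r * np.sqrt((df_eff - 2) / (1 - r**2))
        return 2 * stats.t.sf(abs(t), df_eff - 2)

    print(corr_pvalue(0.3, 180))  # nominal df: very small p-value
    print(corr_pvalue(0.3, 60))   # smaller effective df: larger p-value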
Probabilistic and deterministic evaluation of uncertainty in a local scale multi-risk analysis
NASA Astrophysics Data System (ADS)
Lari, S.; Frattini, P.; Crosta, G. B.
2009-04-01
We performed a probabilistic multi-risk analysis (QPRA) at the local scale for a 420 km2 area surrounding the town of Brescia (Northern Italy). We calculated the expected annual loss in terms of economic damage and loss of life, for a set of risk scenarios of flood, earthquake and industrial accident with different occurrence probabilities and different intensities. The territorial unit used for the study was the census parcel, of variable area, for which a large amount of data was available. Due to the lack of information for the evaluation of the hazards, the value of the exposed elements (e.g., residential and industrial areas, population, lifelines, sensitive elements such as schools and hospitals) and the process-specific vulnerability, and due to a lack of knowledge of the processes (floods, industrial accidents, earthquakes), we assigned an uncertainty to the input variables of the analysis. For some variables a homogeneous uncertainty was assigned over the whole study area, as for instance for the number of buildings of various typologies and for the event occurrence probability. In other cases, as for phenomenon intensity (e.g., depth of water during a flood) and probability of impact, the uncertainty was defined in relation to the census parcel area. In fact, by assuming some variables to be homogeneously distributed or averaged over the census parcels, we introduce a larger error for larger parcels. We propagated the uncertainty through the analysis using three different models, describing the reliability of the output (risk) as a function of the uncertainty of the inputs (scenarios and vulnerability functions). We developed a probabilistic approach based on Monte Carlo simulation, and two deterministic models, namely First Order Second Moment (FOSM) and Point Estimate (PE). In general, similar values of expected losses are obtained with the three models. In all three cases the uncertainty of the final risk value is around 30% of the expected value. Each of the models, nevertheless, requires different assumptions and computational efforts, and provides results with different levels of detail.
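The Monte Carlo variant of the uncertainty propagation can be sketched for a single parcel and scenario: draw the uncertain inputs, form the annual loss, and read off the relative spread. All distributions below are assumptions for illustration, not the study's calibrated inputs:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 20_000

    p_event = rng.uniform(0.008, 0.012, n)       # annual occurrence probability
    exposed_value = rng.normal(2.0e6, 3.0e5, n)  # value of exposed elements
    vulnerability = rng.beta(8, 20, n)           # fraction of value lost if hit

    loss = p_event * vulnerability * exposed_value
    print("expected annual loss:", loss.mean())
    print("relative uncertainty:", loss.std() / loss.mean())  # cf. ~30% in the study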
Stochastic model simulation using Kronecker product analysis and Zassenhaus formula approximation.
Caglar, Mehmet Umut; Pal, Ranadip
2013-01-01
Probabilistic models are regularly applied in genetic regulatory network modeling to capture the stochastic behavior observed in the generation of biological entities such as mRNA or proteins. Several approaches, including Stochastic Master Equations and Probabilistic Boolean Networks, have been proposed to model the stochastic behavior in genetic regulatory networks. It is generally accepted that the Stochastic Master Equation is a fundamental model that can describe the system being investigated in fine detail, but the application of this model is computationally enormously expensive. On the other hand, the Probabilistic Boolean Network captures only the coarse-scale stochastic properties of the system without modeling the detailed interactions. We propose a new approximation of the stochastic master equation model that is able to capture the finer details of the modeled system, including bistabilities and oscillatory behavior, and yet has a significantly lower computational complexity. In this new method, we represent the system using tensors and derive an identity to exploit the sparse connectivity of regulatory targets for complexity reduction. The algorithm involves an approximation based on the Zassenhaus formula to represent the exponential of a sum of matrices as a product of matrices. We derive upper bounds on the expected error of the proposed model distribution as compared to the stochastic master equation model distribution. Simulation results of the application of the model to four different biological benchmark systems illustrate performance comparable to detailed stochastic master equation models but with considerably lower computational complexity. The results also demonstrate the reduced complexity of the new approach as compared to the commonly used Stochastic Simulation Algorithm for equivalent accuracy.
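The matrix-splitting step rests on the Zassenhaus formula, the dual of the Baker-Campbell-Hausdorff expansion. In a standard form (the truncation order used in the paper is not reproduced here):

    e^{t(X+Y)} = e^{tX}\, e^{tY}\, e^{-\frac{t^{2}}{2}[X,Y]}\, e^{\frac{t^{3}}{6}\left(2[Y,[X,Y]] + [X,[X,Y]]\right)} \cdots

Truncating after the first two factors gives the familiar Lie-Trotter splitting; retaining the commutator factors buys accuracy at the cost of extra matrix products, which is where the sparse Kronecker structure pays off.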
10 CFR 436.24 - Uncertainty analyses.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Procedures for Life Cycle Cost Analyses § 436.24 Uncertainty analyses. If particular items of cost data or... impact of uncertainty on the calculation of life cycle cost effectiveness or the assignment of rank order... and probabilistic analysis. If additional analysis casts substantial doubt on the life cycle cost...
Probabilistic peak detection in CE-LIF for STR DNA typing.
Woldegebriel, Michael; van Asten, Arian; Kloosterman, Ate; Vivó-Truyols, Gabriel
2017-07-01
In this work, we present a novel probabilistic peak detection algorithm based on a Bayesian framework for forensic DNA analysis. The proposed method aims at an exhaustive use of raw electropherogram data from a laser-induced fluorescence multi-CE system. As the raw data are informative up to a single data point, the conventional threshold-based approaches discard relevant forensic information early in the data analysis pipeline. Our proposed method assigns a posterior probability reflecting the data point's relevance with respect to peak detection criteria. Peaks of low intensity generated from a truly existing allele can thus constitute evidential value instead of fully discarding them and contemplating a potential allele drop-out. This way of working utilizes the information available within each individual data point and thus avoids making early (binary) decisions on the data analysis that can lead to error propagation. The proposed method was tested and compared to the application of a set threshold as is current practice in forensic STR DNA profiling. The new method was found to yield a significant improvement in the number of alleles identified, regardless of the peak heights and deviation from Gaussian shape. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
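The per-point posterior at the heart of such a framework can be caricatured with a two-hypothesis mixture: each data point is either baseline noise or part of a peak, and Bayes' rule converts the two likelihoods into a posterior probability. All parameters below are illustrative assumptions, not the authors' calibrated model:

    import numpy as np
    from scipy import stats

    def peak_posterior(y, noise_sigma=0.5, peak_mu=5.0, peak_sigma=3.0, prior_peak=0.01):
        # P(peak | y) for each data point under a crude two-component model.
        like_noise = stats.norm.pdf(y, 0.0, noise_sigma)
        like_peak = stats.norm.pdf(y, peak_mu, peak_sigma)
        num = prior_peak * like_peak
        return num / (num + (1 - prior_peak) * like_noise)

    y = np.array([0.1, 0.3, 2.5, 7.8, 3.1, 0.2])  # toy electropherogram slice
    print(peak_posterior(y).round(3))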
Thulin, Carl-Gustaf; Simberloff, Daniel; Barun, Arijana; McCracken, Gary; Pascal, Michel; Islam, M Anwarul
2006-11-01
The combination of founder events, random drift and new selective forces experienced by introduced species typically lowers genetic variation and induces differentiation from the ancestral population. Here, we investigate microsatellite differentiation between introduced and native populations of the small Indian mongoose (Herpestes auropunctatus). Many expectations based on introduction history, such as loss of alleles and relationships among populations, are confirmed. Nevertheless, when applying population assignment methods to our data, we observe a few specimens that are incorrectly assigned and/or appear to have a mixed ancestry, despite estimates of substantial population differentiation. Thus, we suggest that population assignments of individuals should be viewed as tentative and that there should be agreement among different algorithms before assignments are applied in conservation or management. Further, we find no congruence between previously reported morphological differentiation and the sorting of microsatellite variation. Some introduced populations have retained much genetic variation while others have not, irrespective of morphology. Finally, we find alleles from the sympatric grey mongoose (Herpestes edwardsii) in one small Indian mongoose within the native range, suggesting an alternative explanation for morphological differentiation involving a shift in female preferences in allopatry.
7 CFR 457.112 - Hybrid sorghum seed crop insurance provisions.
Code of Federal Regulations, 2010 CFR
2010-01-01
... produced by crossing a male and female parent plant, each having a different genetic character. This... formula for establishing the value must be based on data provided by a public third party that establishes..., number or code assigned to a specific genetic cross by the seed company or the Special Provisions for the...
Kopps, Anna M; Kang, Jungkoo; Sherwin, William B; Palsbøll, Per J
2015-06-30
Kinship analyses are important pillars of ecological and conservation genetic studies with potentially far-reaching implications. There is a need for power analyses that address a range of possible relationships. Nevertheless, such analyses are rarely applied, and studies that use genetic-data-based kinship inference often ignore the influence of intrinsic population characteristics. We investigated 11 questions regarding the correct classification rate of dyads to relatedness categories (relatedness category assignments; RCA) using an individual-based model with realistic life history parameters. We investigated the effects of the number of genetic markers; marker type (microsatellite, single nucleotide polymorphism SNP, or both); minor allele frequency; typing error; mating system; and the number of overlapping generations under different demographic conditions. We found that (i) an increasing number of genetic markers increased the correct classification rate of the RCA, so that more than 80% of first cousins can be correctly assigned; (ii) the minimum number of genetic markers required for assignments with 80 and 95% correct classification differed between relatedness categories, mating systems, and the number of overlapping generations; (iii) the correct classification rate was improved by adding additional relatedness categories and age and mitochondrial DNA data; and (iv) a combination of microsatellite and single-nucleotide polymorphism data increased the correct classification rate if <800 SNP loci were available. This study shows how intrinsic population characteristics, such as mating system and the number of overlapping generations, life history traits, and genetic marker characteristics can influence the correct classification rate of an RCA study. Therefore, species-specific power analyses are essential for empirical studies. Copyright © 2015 Kopps et al.
NASA Astrophysics Data System (ADS)
Sari, Dwi Ivayana; Budayasa, I. Ketut; Juniati, Dwi
2017-08-01
The formulation of mathematical learning goals is now oriented not only toward cognitive products but also toward cognitive processes, such as probabilistic thinking. Probabilistic thinking is needed by students to make decisions, and elementary school students are required to develop it as a foundation for learning probability at higher levels. A framework of students' probabilistic thinking had been developed using the SOLO taxonomy, consisting of prestructural, unistructural, multistructural and relational probabilistic thinking. This study aimed to analyze probability task completion based on this taxonomy of probabilistic thinking. The subjects were two fifth-grade students, a boy and a girl, selected on the basis of a mathematical ability test as having high mathematical ability. The subjects were given probability tasks covering sample space, probability of an event and probability comparison. The data analysis consisted of categorization, reduction, interpretation and conclusion. Credibility of the data was established through time triangulation. The results indicated that the boy's probabilistic thinking in completing the probability tasks was at the multistructural level, while the girl's was at the unistructural level; that is, the boy's probabilistic thinking was at a higher level than the girl's. These results could assist curriculum developers in formulating probability learning goals for elementary school students. Indeed, teachers could teach probability with regard to gender differences.
Probabilistic load simulation: Code development status
NASA Astrophysics Data System (ADS)
Newell, J. F.; Ho, H.
1991-05-01
The objective of the Composite Load Spectra (CLS) project is to develop generic load models to simulate the composite load spectra that are included in space propulsion system components. The probabilistic loads thus generated are part of the probabilistic design analysis (PDA) of a space propulsion system that also includes probabilistic structural analyses, reliability, and risk evaluations. Probabilistic load simulation for space propulsion systems demands sophisticated probabilistic methodology and requires large amounts of load information and engineering data. The CLS approach is to implement a knowledge based system coupled with a probabilistic load simulation module. The knowledge base manages and furnishes load information and expertise and sets up the simulation runs. The load simulation module performs the numerical computation to generate the probabilistic loads with load information supplied from the CLS knowledge base.
Lee, Insuk; Li, Zhihua; Marcotte, Edward M.
2007-01-01
Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org. PMID:17912365
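The confidence scores in this line of work are typically log-likelihood scores that compare how often an evidence type links gene pairs in a functional reference set versus chance; schematically (a sketch of the general scheme, not necessarily the paper's exact estimator):

    \mathrm{LLS}(E) = \ln \frac{P(L \mid E)\,/\,P(\neg L \mid E)}{P(L)\,/\,P(\neg L)}

where L denotes a functional linkage in the reference set and E the evidence (e.g., mRNA co-expression). Positive scores indicate evidence that enriches for true linkages, and scores from independent evidence types can then be combined into an integrated network.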
Probabilistic Cross-identification in Crowded Fields as an Assignment Problem
NASA Astrophysics Data System (ADS)
Budavári, Tamás; Basu, Amitabh
2016-10-01
One of the outstanding challenges of cross-identification is multiplicity: detections in crowded regions of the sky are often linked to more than one candidate associations of similar likelihoods. We map the resulting maximum likelihood partitioning to the fundamental assignment problem of discrete mathematics and efficiently solve the two-way catalog-level matching in the realm of combinatorial optimization using the so-called Hungarian algorithm. We introduce the method, demonstrate its performance in a mock universe where the true associations are known, and discuss the applicability of the new procedure to large surveys.
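The two-way catalog matching reduces to a linear assignment problem, which off-the-shelf solvers handle directly. A toy sketch with SciPy's Hungarian-style solver, using an illustrative cost matrix of negative log-likelihoods for candidate associations:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Rows: detections in catalog A; columns: detections in catalog B.
    cost = np.array([
        [0.3, 2.1, 5.0],
        [2.0, 0.4, 1.9],
        [4.8, 1.7, 0.5],
    ])
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    print(list(zip(rows, cols)), cost[rows, cols].sum())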
Orhan, A Emin; Ma, Wei Ji
2017-07-26
Animals perform near-optimal probabilistic inference in a wide range of psychophysical tasks. Probabilistic inference requires trial-to-trial representation of the uncertainties associated with task variables and subsequent use of this representation. Previous work has implemented such computations using neural networks with hand-crafted and task-dependent operations. We show that generic neural networks trained with a simple error-based learning rule perform near-optimal probabilistic inference in nine common psychophysical tasks. In a probabilistic categorization task, error-based learning in a generic network simultaneously explains a monkey's learning curve and the evolution of qualitative aspects of its choice behavior. In all tasks, the number of neurons required for a given level of performance grows sublinearly with the input population size, a substantial improvement on previous implementations of probabilistic inference. The trained networks develop a novel sparsity-based probabilistic population code. Our results suggest that probabilistic inference emerges naturally in generic neural networks trained with error-based learning rules. Behavioural tasks often require probability distributions to be inferred about task-specific variables. Here, the authors demonstrate that generic neural networks can be trained using a simple error-based learning rule to perform such probabilistic computations efficiently without any need for task-specific operations.
Simulation optimization of PSA-threshold based prostate cancer screening policies
Zhang, Jingyu; Denton, Brian T.; Shah, Nilay D.; Inman, Brant A.
2013-01-01
We describe a simulation optimization method to design PSA screening policies based on expected quality adjusted life years (QALYs). Our method integrates a simulation model in a genetic algorithm which uses a probabilistic method for selection of the best policy. We present computational results about the efficiency of our algorithm. The best policy generated by our algorithm is compared to previously recommended screening policies. Using the policies determined by our model, we present evidence that patients should be screened more aggressively but for a shorter length of time than previously published guidelines recommend. PMID:22302420
N. B. Klopfenstein; J. W. Hanna; P. G. Cannon; R. Medel-Ortiz; D. Alvarado-Rosales; F. Lorea-Hernandez; R. D. Elias-Roman; M. -S. Kim
2014-01-01
In September 2007, rhizomorphs with morphological characteristics of Armillaria were collected from woody hosts in forests of Mexico State, Veracruz, and Oaxaca, Mexico. Based on pairing tests, isolates were assigned to five somatically compatible genets or clones (MEX7R, MEX11R, MEX23R, MEX28R, and MEX30R). These genets were all identified as Armillaria gallica based...
Middlebrooks, E H; Tuna, I S; Grewal, S S; Almeida, L; Heckman, M G; Lesser, E R; Foote, K D; Okun, M S; Holanda, V M
2018-06-01
Although globus pallidus internus deep brain stimulation is a widely accepted treatment for Parkinson disease, there is persistent variability in outcomes that is not yet fully understood. In this pilot study, we aimed to investigate the potential role of globus pallidus internus segmentation using probabilistic tractography as a supplement to traditional targeting methods. Eleven patients undergoing globus pallidus internus deep brain stimulation were included in this retrospective analysis. Using multidirection diffusion-weighted MR imaging, we performed probabilistic tractography at all individual globus pallidus internus voxels. Each globus pallidus internus voxel was then assigned to the 1 ROI with the greatest number of propagated paths. On the basis of deep brain stimulation programming settings, the volume of tissue activated was generated for each patient using a finite element method solution. For each patient, the volume of tissue activated within each of the 10 segmented globus pallidus internus regions was calculated and examined for association with a change in the Unified Parkinson Disease Rating Scale, Part III score before and after treatment. Increasing volume of tissue activated was most strongly correlated with a change in the Unified Parkinson Disease Rating Scale, Part III score for the primary motor region (Spearman r = 0.74, P = .010), followed by the supplementary motor area/premotor cortex (Spearman r = 0.47, P = .15). In this pilot study, we assessed a novel method of segmentation of the globus pallidus internus based on probabilistic tractography as a supplement to traditional targeting methods. Our results suggest that our method may be an independent predictor of deep brain stimulation outcome, and evaluation of a larger cohort or prospective study is warranted to validate these findings. © 2018 by American Journal of Neuroradiology.
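The winner-take-all segmentation step is a one-liner once the streamline counts are tabulated; a toy sketch (counts are invented for illustration):

    import numpy as np

    # paths[v, r]: number of probabilistic streamlines seeded at GPi voxel v
    # that reached target ROI r (4 voxels x 3 ROIs, toy counts).
    paths = np.array([
        [120,  30,  5],
        [ 40, 200, 10],
        [ 15,  22, 90],
        [ 80,  75, 70],
    ])
    segmentation = paths.argmax(axis=1)  # assign each voxel to its winning ROI
    print(segmentation)                  # -> [0 1 2 0]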
Accounting for Uncertainties in Strengths of SiC MEMS Parts
NASA Technical Reports Server (NTRS)
Nemeth, Noel; Evans, Laura; Beheim, Glen; Trapp, Mark; Jadaan, Osama; Sharpe, William N., Jr.
2007-01-01
A methodology has been devised for accounting for uncertainties in the strengths of silicon carbide structural components of microelectromechanical systems (MEMS). The methodology enables prediction of the probabilistic strengths of complexly shaped MEMS parts using data from tests of simple specimens. This methodology is intended to serve as part of a rational basis for designing SiC MEMS, supplementing methodologies that have been borrowed from the art of designing macroscopic brittle material structures. The need for this or a similar methodology arises as a consequence of the fundamental nature of MEMS and the brittle silicon-based materials of which they are typically fabricated. When tested to fracture, MEMS and structural components thereof show wide part-to-part scatter in strength. The methodology involves the use of the Ceramics Analysis and Reliability Evaluation of Structures Life (CARES/Life) software in conjunction with the ANSYS Probabilistic Design System (PDS) software to simulate or predict the strength responses of brittle material components while simultaneously accounting for the effects of variability of geometrical features on the strength responses. As such, the methodology involves the use of an extended version of the ANSYS/CARES/PDS software system described in Probabilistic Prediction of Lifetimes of Ceramic Parts (LEW-17682-1/4-1), Software Tech Briefs supplement to NASA Tech Briefs, Vol. 30, No. 9 (September 2006), page 10. The ANSYS PDS software enables the ANSYS finite-element-analysis program to account for uncertainty in the design-and-analysis process. The ANSYS PDS software accounts for uncertainty in material properties, dimensions, and loading by assigning probabilistic distributions to user-specified model parameters and performing simulations using various sampling techniques.
Bayesian modeling of consumer behavior in the presence of anonymous visits
NASA Astrophysics Data System (ADS)
Novak, Julie Esther
Tailoring content to consumers has become a hallmark of marketing and digital media, particularly as it has become easier to identify customers across usage or purchase occasions. However, across a wide variety of contexts, companies find that customers do not consistently identify themselves, leaving a substantial fraction of anonymous visits. We develop a Bayesian hierarchical model that allows us to probabilistically assign anonymous sessions to users. These probabilistic assignments take into account a customer's demographic information, frequency of visitation, activities taken when visiting, and times of arrival. We present two studies, one with synthetic and one with real data, where we demonstrate improved performance over two popular practices (nearest-neighbor matching and deleting the anonymous visits) due to increased efficiency and reduced bias driven by the non-ignorability of which types of events are more likely to be anonymous. Using our proposed model, we avoid potential bias in understanding the effect of a firm's marketing on its customers, improve inference about the total number of customers in the dataset, and provide more precise targeted marketing to both previously observed and unobserved customers.
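The probabilistic assignment step can be sketched as a posterior over known users for one anonymous session, combining a visit-frequency prior with feature likelihoods; this is a naive single-level caricature of the idea, not the paper's hierarchical model:

    import numpy as np

    def session_posterior(log_prior, log_like):
        # Posterior over users for one anonymous session, computed stably
        # in log space (softmax over log prior + log likelihood).
        log_post = log_prior + log_like
        log_post -= log_post.max()
        post = np.exp(log_post)
        return post / post.sum()

    log_prior = np.log(np.array([0.5, 0.3, 0.2]))  # users' visit frequencies
    log_like = np.array([-4.2, -1.1, -3.7])        # likelihood of session features (toy)
    print(session_posterior(log_prior, log_like).round(3))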
Molecular Test to Assign Individuals within the Cacopsylla pruni Complex
Peccoud, Jean; Labonne, Gérard; Sauvion, Nicolas
2013-01-01
Crop protection requires the accurate identification of disease vectors, a task that can be made difficult when these vectors encompass cryptic species. Here we developed a rapid molecular diagnostic test to identify individuals of Cacopsylla pruni (Scopoli, 1763) (Hemiptera: Psyllidae), the main vector of the European stone fruit yellows phytoplasma. This psyllid encompasses two highly divergent genetic groups that are morphologically similar and that are characterized by genotyping several microsatellite markers, a costly and time-consuming protocol. With the aim of developing species-specific PCR primers, we sequenced the Internal Transcribed Spacer 2 (ITS2) on a collection of C. pruni samples from France and other European countries. ITS2 sequences showed that the two genetic groups represent two highly divergent clades. This enabled us to develop specific primers for the assignment of individuals to either genetic group in a single PCR, based on ITS2 amplicon size. All previously assigned individuals yielded bands of expected sizes, and the PCR proved efficient on a larger sample of 799 individuals. Because none appeared heterozygous at the ITS2 locus (i.e., none produced two bands), we inferred that the genetic groups of C. pruni, whose distribution is partly sympatric, constitute biological species that have not exchanged genes for an extended period of time. Other psyllid species (Cacopsylla, Psylla, Triozidae and Aphalaridae) failed to yield any amplicon. These primers are therefore unlikely to produce false positives and allow rapid assignment of C. pruni individuals to either cryptic species. PMID:23977301
Saenz-Agudelo, P; Jones, G P; Thorrold, S R; Planes, S
2009-04-01
The application of spatially explicit models of population dynamics to fisheries management and the design of marine reserve network systems has been limited due to a lack of empirical estimates of larval dispersal. Here we compared assignment tests and parentage analysis for examining larval retention and connectivity under two different gene flow scenarios using panda clownfish (Amphiprion polymnus) in Papua New Guinea. A metapopulation of panda clownfish in Bootless Bay with little or no genetic differentiation among five spatially discrete locations separated by 2-6 km provided the high gene flow scenario. The low gene flow scenario compared the Bootless Bay metapopulation with a genetically distinct population (F_ST = 0.1) located at Schumann Island, New Britain, 1500 km to the northeast. We used assignment tests and parentage analysis based on microsatellite DNA data to identify natal origins of 177 juveniles in Bootless Bay and 73 juveniles at Schumann Island. At low rates of gene flow, assignment tests correctly classified juveniles to their source population. On the other hand, parentage analysis led to an overestimate of self-recruitment within the two populations due to the significant deviation from panmixia when both populations were pooled. At high gene flow (within Bootless Bay), assignment tests underestimated self-recruitment and connectivity among subpopulations, and grossly overestimated self-recruitment within the overall metapopulation. However, the assignment tests did identify immigrants from distant (genetically distinct) populations. Parentage analysis clearly provided the most accurate estimates of connectivity in situations of high gene flow.
A Probabilistic Design Method Applied to Smart Composite Structures
NASA Technical Reports Server (NTRS)
Shiao, Michael C.; Chamis, Christos C.
1995-01-01
A probabilistic design method is described and demonstrated using a smart composite wing. Probabilistic structural design incorporates naturally occurring uncertainties including those in constituent (fiber/matrix) material properties, fabrication variables, structure geometry and control-related parameters. Probabilistic sensitivity factors are computed to identify those parameters that have a great influence on a specific structural reliability. Two performance criteria are used to demonstrate this design methodology. The first criterion requires that the actuated angle at the wing tip be bounded by upper and lower limits at a specified reliability. The second criterion requires that the probability of ply damage due to random impact load be smaller than an assigned value. When the relationship between reliability improvement and the sensitivity factors is assessed, the results show that a reduction in the scatter of the random variable with the largest sensitivity factor (absolute value) provides the lowest failure probability. An increase in the mean of the random variable with a negative sensitivity factor will reduce the failure probability. Therefore, the design can be improved by controlling or selecting distribution parameters associated with random variables. This can be implemented during the manufacturing process to obtain maximum benefit with minimum alterations.
Joint Inference of Population Assignment and Demographic History
Choi, Sang Chul; Hey, Jody
2011-01-01
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy–Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets. PMID:21775468
Yang, Yu; Fritzsching, Keith J; Hong, Mei
2013-11-01
A multi-objective genetic algorithm is introduced to predict the assignment of protein solid-state NMR (SSNMR) spectra with partial resonance overlap and missing peaks due to broad linewidths, molecular motion, and low sensitivity. This non-dominated sorting genetic algorithm II (NSGA-II) aims to identify all possible assignments that are consistent with the spectra and to compare the relative merit of these assignments. Our approach is modeled after the recently introduced Monte-Carlo simulated-annealing (MC/SA) protocol, with the key difference that NSGA-II simultaneously optimizes multiple assignment objectives instead of searching for possible assignments based on a single composite score. The multiple objectives include maximizing the number of consistently assigned peaks between multiple spectra ("good connections"), maximizing the number of used peaks, minimizing the number of inconsistently assigned peaks between spectra ("bad connections"), and minimizing the number of assigned peaks that have no matching peaks in the other spectra ("edges"). Using six SSNMR protein chemical shift datasets with varying levels of imperfection that was introduced by peak deletion, random chemical shift changes, and manual peak picking of spectra with moderately broad linewidths, we show that the NSGA-II algorithm produces a large number of valid and good assignments rapidly. For high-quality chemical shift peak lists, NSGA-II and MC/SA perform similarly well. However, when the peak lists contain many missing peaks that are uncorrelated between different spectra and have chemical shift deviations between spectra, the modified NSGA-II produces a larger number of valid solutions than MC/SA, and is more effective at distinguishing good from mediocre assignments by avoiding the hazard of suboptimal weighting factors for the various objectives. These two advantages, namely diversity and better evaluation, lead to a higher probability of predicting the correct assignment for a larger number of residues. On the other hand, when there are multiple equally good assignments that are significantly different from each other, the modified NSGA-II is less efficient than MC/SA in finding all the solutions. This problem is solved by a combined NSGA-II/MC algorithm, which appears to have the advantages of both NSGA-II and MC/SA. This combination algorithm is robust for the three most difficult chemical shift datasets examined here and is expected to give the highest-quality de novo assignment of challenging protein NMR spectra.
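The comparison that drives NSGA-II is Pareto dominance over the assignment objectives; a minimal sketch of extracting the non-dominated front (all objectives written so that larger is better; the scores are invented):

    def pareto_front(points):
        # Indices of points not dominated by any other point.
        front = []
        for i, p in enumerate(points):
            dominated = any(
                all(qk >= pk for qk, pk in zip(q, p)) and q != p
                for j, q in enumerate(points) if j != i
            )
            if not dominated:
                front.append(i)
        return front

    # (good connections, -bad connections, used peaks) per candidate assignment
    scores = [(40, -2, 55), (38, -1, 60), (35, -5, 58), (36, -3, 50)]
    print(pareto_front(scores))  # -> [0, 1]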
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weigend, Florian, E-mail: florian.weigend@kit.edu
2014-10-07
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost, at least for elements with similar atomic number, by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf_12 and [LaPb_7Bi_7]^(4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.
ActionMap: a web-based software that automates loci assignments to framework maps
Albini, Guillaume; Falque, Matthieu; Joets, Johann
2003-01-01
Genetic linkage computation may be a repetitive and time consuming task, especially when numerous loci are assigned to a framework map. We thus developed ActionMap, a web-based software that automates genetic mapping on a fixed framework map without adding the new markers to the map. Using this tool, hundreds of loci may be automatically assigned to the framework in a single process. ActionMap was initially developed to map numerous ESTs with a small plant mapping population and is limited to inbred lines and backcrosses. ActionMap is highly configurable and consists of Perl and PHP scripts that automate command steps for the MapMaker program. A set of web forms were designed for data import and mapping settings. Results of automatic mapping can be displayed as tables or drawings of maps and may be exported. The user may create personal access-restricted projects to store raw data, settings and mapping results. All data may be edited, updated or deleted. ActionMap may be used either online or downloaded for free (http://moulon.inra.fr/~bioinfo/). PMID:12824426
O’Connor, David; Enshaei, Amir; Bartram, Jack; Hancock, Jeremy; Harrison, Christine J.; Hough, Rachael; Samarasinghe, Sujith; Schwab, Claire; Vora, Ajay; Wade, Rachel; Moppett, John; Moorman, Anthony V.; Goulden, Nick
2018-01-01
Purpose: Minimal residual disease (MRD) and genetic abnormalities are important risk factors for outcome in acute lymphoblastic leukemia. Current risk algorithms dichotomize MRD data and do not assimilate genetics when assigning MRD risk, which reduces predictive accuracy. The aim of our study was to exploit the full power of MRD by examining it as a continuous variable and to integrate it with genetics. Patients and Methods: We used a population-based cohort of 3,113 patients who were treated in UKALL2003, with a median follow-up of 7 years. MRD was evaluated by polymerase chain reaction analysis of Ig/TCR gene rearrangements, and patients were assigned to a genetic subtype on the basis of immunophenotype, cytogenetics, and fluorescence in situ hybridization. To examine response kinetics at the end of induction, we log-transformed the absolute MRD value and examined its distribution across subgroups. Results: MRD was log normally distributed at the end of induction. MRD distributions of patients with distinct genetic subtypes were different (P < .001). Patients with good-risk cytogenetics demonstrated the fastest disease clearance, whereas patients with high-risk genetics and T-cell acute lymphoblastic leukemia responded more slowly. The risk of relapse was correlated with MRD kinetics, and each log reduction in disease level reduced the risk by 20% (hazard ratio, 0.80; 95% CI, 0.77 to 0.83; P < .001). Although the risk of relapse was directly proportional to the MRD level within each genetic risk group, the absolute relapse rate associated with a specific MRD value or category varied significantly by genetic subtype. Integration of genetic subtype-specific MRD values allowed more refined risk group stratification. Conclusion: A single threshold for assigning patients to an MRD risk group does not reflect the response kinetics of the different genetic subtypes. Future risk algorithms should integrate genetics with MRD to accurately identify patients with the lowest and highest risk of relapse. PMID:29131699
Identifying a Probabilistic Boolean Threshold Network From Samples.
Melkman, Avraham A; Cheng, Xiaoqing; Ching, Wai-Ki; Akutsu, Tatsuya
2018-04-01
This paper studies the problem of exactly identifying the structure of a probabilistic Boolean network (PBN) from a given set of samples, where PBNs are probabilistic extensions of Boolean networks. Cheng et al. studied the problem while focusing on PBNs consisting of pairs of AND/OR functions. This paper considers PBNs consisting of Boolean threshold functions, focusing on those threshold functions that have unit coefficients. The treatment of Boolean threshold functions, and of triplets and n-tuplets of such functions, necessitates a deepening of the theoretical analyses. It is shown that wide classes of PBNs with such threshold functions can be exactly identified from samples under reasonable constraints, which include: 1) PBNs in which any number of threshold functions can be assigned provided that all have the same number of input variables and 2) PBNs consisting of pairs of threshold functions with different numbers of input variables. It is also shown that the problem of deciding the equivalence of two Boolean threshold functions is solvable in pseudopolynomial time but remains co-NP complete.
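A Boolean threshold function with unit coefficients simply fires when enough inputs are on, which makes the objects of study easy to state concretely (the brute-force equivalence check below is exponential in n and is only for illustration; the paper's pseudopolynomial result concerns general threshold functions):

    from itertools import product

    def threshold_fn(inputs, theta):
        # Unit-coefficient Boolean threshold function: 1 iff at least
        # theta of the inputs are 1.
        return int(sum(inputs) >= theta)

    def equivalent(theta1, theta2, n):
        # Brute-force equivalence check over all 2^n input vectors.
        return all(threshold_fn(x, theta1) == threshold_fn(x, theta2)
                   for x in product((0, 1), repeat=n))

    print(threshold_fn((1, 0, 1), 2))  # -> 1
    print(equivalent(2, 3, n=3))       # -> False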
A review of estimation of distribution algorithms in bioinformatics
Armañanzas, Rubén; Inza, Iñaki; Santana, Roberto; Saeys, Yvan; Flores, Jose Luis; Lozano, Jose Antonio; Peer, Yves Van de; Blanco, Rosa; Robles, Víctor; Bielza, Concha; Larrañaga, Pedro
2008-01-01
Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics applications. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain. PMID:18822112
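The simplest member of the EDA family, the univariate marginal distribution algorithm (UMDA), already shows the paradigm: fit a probabilistic model (here, independent per-bit marginals) to the best solutions, then sample the next population from it. A self-contained sketch on the OneMax toy problem:

    import numpy as np

    rng = np.random.default_rng(3)

    def umda(fitness, n_bits, pop=100, elite=30, iters=50):
        # Learn per-bit probabilities from the elite, then resample.
        p = np.full(n_bits, 0.5)
        for _ in range(iters):
            X = (rng.random((pop, n_bits)) < p).astype(int)
            order = np.argsort([fitness(x) for x in X])[::-1]
            p = X[order[:elite]].mean(axis=0).clip(0.05, 0.95)  # keep diversity
        return p

    p = umda(fitness=lambda x: x.sum(), n_bits=20)  # OneMax: count the ones
    print(p.round(2))  # marginals drift toward 1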
Molecular analysis of genetic diversity among vine accessions using DNA markers.
da Costa, A F; Teodoro, P E; Bhering, L L; Tardin, F D; Daher, R F; Campos, W F; Viana, A P; Pereira, M G
2017-04-13
Viticulture presents a number of economic and social advantages, such as increasing employment levels and fixing the labor force in rural areas. With the aim of initiating a grapevine genetic improvement program at the Darcy Ribeiro State University of Northern Rio de Janeiro, genetic diversity among 40 genotypes (varieties, rootstocks, and species of different subgenera) was evaluated using random amplified polymorphic DNA (RAPD) molecular markers. We built a matrix of binary data, whereby the presence of a band was assigned as "1" and the absence of a band was assigned as "0." The genetic distance was calculated between pairs of genotypes based on the arithmetic complement of the Jaccard index. The results revealed the presence of considerable variability in the collection. Analysis of the genetic dissimilarity matrix revealed that the most dissimilar genotypes were Rupestris du Lot and Vitis rotundifolia, as they were the most genetically distant (0.5972). The most similar were genotype 31 (unidentified) and Rupestris du Lot, which showed zero distance, confirming the results of field observations. A duplicate was thus confirmed, consistent with field observations, and a short distance was found between the variety 'Italy' and its mutation, 'Ruby'.
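The distance used here, the arithmetic complement of the Jaccard index, is straightforward to compute from two binary band profiles:

    import numpy as np

    def jaccard_distance(a, b):
        # 1 - (shared bands) / (bands present in either genotype)
        a, b = np.asarray(a, bool), np.asarray(b, bool)
        return 1.0 - (a & b).sum() / (a | b).sum()

    g1 = [1, 0, 1, 1, 0, 1]  # band presence/absence for genotype 1
    g2 = [1, 1, 0, 1, 0, 0]  # band presence/absence for genotype 2
    print(jaccard_distance(g1, g2))  # -> 0.6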
Probabilistic Magnetotelluric Inversion with Adaptive Regularisation Using the No-U-Turns Sampler
NASA Astrophysics Data System (ADS)
Conway, Dennis; Simpson, Janelle; Didana, Yohannes; Rugari, Joseph; Heinson, Graham
2018-04-01
We present the first inversion of magnetotelluric (MT) data using a Hamiltonian Monte Carlo algorithm. The inversion of MT data is an underdetermined problem which leads to an ensemble of feasible models for a given dataset. A standard approach in MT inversion is to perform a deterministic search for the single solution which is maximally smooth for a given data-fit threshold. An alternative approach is to use Markov Chain Monte Carlo (MCMC) methods, which have been used in MT inversion to explore the entire solution space and produce a suite of likely models. This approach has the advantage of assigning confidence to resistivity models, leading to better geological interpretations. Recent advances in MCMC techniques include the No-U-Turns Sampler (NUTS), an efficient and rapidly converging method which is based on Hamiltonian Monte Carlo. We have implemented a 1D MT inversion which uses the NUTS algorithm. Our model includes a fixed number of layers of variable thickness and resistivity, as well as probabilistic smoothing constraints which allow sharp and smooth transitions. We present the results of a synthetic study and show the accuracy of the technique, as well as the fast convergence, independence of starting models, and sampling efficiency. Finally, we test our technique on MT data collected from a site in Boulia, Queensland, Australia to show its utility in geological interpretation and ability to provide probabilistic estimates of features such as depth to basement.
Probabilistic dose-response modeling: case study using dichloromethane PBPK model results.
Marino, Dale J; Starr, Thomas B
2007-12-01
A revised assessment of dichloromethane (DCM) has recently been reported that examines the influence of human genetic polymorphisms on cancer risks using deterministic PBPK and dose-response modeling in the mouse combined with probabilistic PBPK modeling in humans. This assessment utilized Bayesian techniques to optimize kinetic variables in mice and humans, with mean values from posterior distributions used in the deterministic modeling in the mouse. To supplement this research, a case study was undertaken to examine the potential impact of probabilistic rather than deterministic PBPK and dose-response modeling in mice on subsequent unit risk factor (URF) determinations. Four separate PBPK cases were examined based on the exposure regimen of the NTP DCM bioassay. These were (a) Same Mouse (single draw of all PBPK inputs for both treatment groups); (b) Correlated BW-Same Inputs (single draw of all PBPK inputs for both treatment groups except for bodyweights (BWs), which were entered as correlated variables); (c) Correlated BW-Different Inputs (separate draws of all PBPK inputs for both treatment groups except that BWs were entered as correlated variables); and (d) Different Mouse (separate draws of all PBPK inputs for both treatment groups). Monte Carlo PBPK inputs reflect posterior distributions from Bayesian calibration in the mouse that had been previously reported. A minimum of 12,500 PBPK iterations were undertaken, in which dose metrics, i.e., mg DCM metabolized by the GST pathway/L tissue/day for lung and liver, were determined. For dose-response modeling, these metrics were combined with NTP tumor incidence data that were randomly selected from binomial distributions. Resultant potency factors (0.1/ED(10)) were coupled with probabilistic PBPK modeling in humans that incorporated genetic polymorphisms to derive URFs. Results show that there was relatively little difference, i.e., <10%, in central tendency and upper percentile URFs, regardless of the case evaluated. Independent draws of PBPK inputs resulted in slightly higher URFs. Results were also comparable to corresponding values from the previously reported deterministic mouse PBPK and dose-response modeling approach that used LED(10)s to derive potency factors. This finding indicates that the adjustment from ED(10) to LED(10) in the deterministic approach for DCM compensated for the variability resulting from probabilistic PBPK and dose-response modeling in the mouse. Finally, results show a similar degree of variability in DCM risk estimates from a number of different sources, including the current effort, even though these estimates were developed using very different techniques. Given the variety of approaches involved, 95th percentile-to-mean risk estimate ratios of 2.1-4.1 represent reasonable bounds on variability estimates regarding probabilistic assessments of DCM.
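The probabilistic dose-response step can be pictured with a toy Monte Carlo loop: tumor incidence is redrawn from binomial distributions, a one-hit model is fit, and potency factors (0.1/ED10) are accumulated into a URF distribution. All numbers below are invented for illustration and are unrelated to the actual DCM assessment.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy bioassay: internal dose metrics and observed tumor counts per group.
    doses = np.array([0.0, 200.0, 400.0])     # mg metabolized/L tissue/day
    animals = np.array([50, 50, 50])
    tumors = np.array([2, 10, 19])

    urfs = []
    for _ in range(12500):
        # Redraw incidence from binomial distributions around the observed rates.
        p_obs = rng.binomial(animals, tumors / animals) / animals
        # One-hit model: extra risk ER(d) = 1 - exp(-beta * d); crude slope fit.
        extra = np.clip((p_obs[1:] - p_obs[0]) / (1 - p_obs[0]), 1e-6, 0.999)
        beta = np.mean(-np.log(1 - extra) / doses[1:])
        ed10 = -np.log(0.9) / beta             # dose giving 10% extra risk
        urfs.append(0.1 / ed10)                # potency factor, 0.1/ED10

    urfs = np.array(urfs)
    print(np.mean(urfs), np.percentile(urfs, 95))  # central tendency vs upper percentile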
Dynamic traffic assignment : genetic algorithms approach
DOT National Transportation Integrated Search
1997-01-01
Real-time route guidance is a promising approach to alleviating congestion on the nation's highways. A dynamic traffic assignment model is central to the development of guidance strategies. The artificial intelligence technique of genetic algorithm...
Characterizing dispersal patterns in a threatened seabird with limited genetic structure
Laurie A. Hall; Per J. Palsboll; Steven R. Beissinger; James T. Harvey; Martine Berube; Martin G. Raphael; Kim Nelson; Richard T. Golightly; Laura McFarlane-Tranquilla; Scott H. Newman; M. Zachariah Peery
2009-01-01
Genetic assignment methods provide an appealing approach for characterizing dispersal patterns on ecological time scales, but require sufficient genetic differentiation to accurately identify migrants and a large enough sample size of migrants to, for example, compare dispersal between sexes or age classes. We demonstrate that assignment methods can be rigorously used...
Generating probabilistic Boolean networks from a prescribed transition probability matrix.
Ching, W-K; Chen, X; Tsing, N-K
2009-11-01
Probabilistic Boolean networks (PBNs) have received much attention in modeling genetic regulatory networks. A PBN can be regarded as a Markov chain process and is characterised by a transition probability matrix. In this study, the authors propose efficient algorithms for constructing a PBN when its transition probability matrix is given. The complexities of the algorithms are also analysed. This is an interesting inverse problem in network inference using steady-state data. The problem is important as most microarray data sets are assumed to be obtained from sampling the steady-state.
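To make the Markov-chain view concrete, the sketch below assembles the transition probability matrix of a toy two-gene PBN as a convex combination of the deterministic transition matrices of its constituent Boolean networks; the networks and selection probabilities are invented for illustration.

    import numpy as np
    from itertools import product

    def bn_transition_matrix(update_fns, n_genes):
        """0/1 transition matrix of a deterministic Boolean network on 2^n states."""
        m = 2 ** n_genes
        A = np.zeros((m, m))
        for i, state in enumerate(product([0, 1], repeat=n_genes)):
            nxt = tuple(f(state) for f in update_fns)
            A[i, int("".join(map(str, nxt)), 2)] = 1.0
        return A

    # Two constituent BNs for genes (x0, x1), selected with probabilities c.
    bn1 = [lambda s: s[1], lambda s: s[0] & s[1]]        # x0' = x1,     x1' = x0 AND x1
    bn2 = [lambda s: 1 - s[0], lambda s: s[0] | s[1]]    # x0' = NOT x0, x1' = x0 OR x1
    c = np.array([0.7, 0.3])

    P = c[0] * bn_transition_matrix(bn1, 2) + c[1] * bn_transition_matrix(bn2, 2)
    assert np.allclose(P.sum(axis=1), 1.0)   # row-stochastic, i.e., a Markov chain
    print(P)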
On construction of stochastic genetic networks based on gene expression sequences.
Ching, Wai-Ki; Ng, Michael M; Fung, Eric S; Akutsu, Tatsuya
2005-08-01
Reconstruction of genetic regulatory networks from time series data of gene expression patterns is an important research topic in bioinformatics. Probabilistic Boolean Networks (PBNs) have been proposed as an effective model for gene regulatory networks. PBNs are able to cope with uncertainty, incorporate rule-based dependencies between genes, and discover the sensitivity of genes in their interactions with other genes. However, PBNs are difficult to use directly in practice because of the huge computational cost of obtaining predictors and their corresponding probabilities. In this paper, we propose a multivariate Markov model for approximating PBNs and describing the dynamics of a genetic network for gene expression sequences. The main contribution of the new model is to preserve the strength of PBNs and reduce the complexity of the networks. The number of parameters of our proposed model is O(n²), where n is the number of genes involved. We also develop efficient estimation methods for solving the model parameters. Numerical examples on synthetic data sets and practical yeast data sequences are given to demonstrate the effectiveness of the proposed model.
Is Probabilistic Evidence a Source of Knowledge?
ERIC Educational Resources Information Center
Friedman, Ori; Turri, John
2015-01-01
We report a series of experiments examining whether people ascribe knowledge for true beliefs based on probabilistic evidence. Participants were less likely to ascribe knowledge for beliefs based on probabilistic evidence than for beliefs based on perceptual evidence (Experiments 1 and 2A) or testimony providing causal information (Experiment 2B).…
Behavioral genetics and criminal responsibility at the courtroom.
Tatarelli, Roberto; Del Casale, Antonio; Tatarelli, Caterina; Serata, Daniele; Rapinesi, Chiara; Sani, Gabriele; Kotzalidis, Georgios D; Girardi, Paolo
2014-04-01
Several questions arise from the recent use of behavioral genetic research data in the courtroom. Ethical issues concerning the influence of biological factors on human free will must be considered when specific gene patterns are advocated to constrain the court's judgment, especially regarding violent crimes. Aggression genetics studies are both difficult to interpret and inconsistent; hence, in the absence of a psychiatric diagnosis, genetic data are currently difficult to prioritize in the courtroom. The judge's probabilistic considerations in formulating a sentence must take into account causality, and the latter cannot currently be ensured by genetic data. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Pandiselvi, S; Raja, R; Cao, Jinde; Rajchakit, G; Ahmad, Bashir
2018-01-01
This work addresses the problem of approximating the state variables of discrete-time stochastic genetic regulatory networks with leakage, distributed, and probabilistic measurement delays. Here we design a linear estimator such that the concentrations of mRNA and protein can be approximated via known measurement outputs. By utilizing a Lyapunov-Krasovskii functional and some stochastic analysis techniques, we obtain stability conditions for the estimation error system in the form of linear matrix inequalities (LMIs), under which the estimation error dynamics is robustly exponentially stable. Further, the obtained conditions can be effortlessly solved by available software packages. Moreover, the explicit expression of the desired estimator is also given in the main section. Finally, two illustrative mathematical examples are provided to show the advantage of the proposed conceptual results.
A Genetic-Based Scheduling Algorithm to Minimize the Makespan of the Grid Applications
NASA Astrophysics Data System (ADS)
Entezari-Maleki, Reza; Movaghar, Ali
Task scheduling algorithms in grid environments strive to maximize the overall throughput of the grid. In order to maximize the throughput of grid environments, the makespan of the grid tasks should be minimized. In this paper, a new task scheduling algorithm is proposed to assign tasks to grid resources with the goal of minimizing the total makespan of the tasks. The algorithm uses the genetic approach to find a suitable assignment within the grid resources. The experimental results obtained from applying the proposed algorithm to schedule independent tasks within grid environments demonstrate that the algorithm achieves schedules with comparatively lower makespan than other well-known scheduling algorithms such as Min-min, Max-min, RASA, and Sufferage.
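A minimal sketch of the genetic assignment idea described above: a chromosome maps each task to a resource, fitness is the makespan, and standard selection, crossover, and mutation evolve the assignment. Task lengths, resource speeds, and GA settings are illustrative assumptions, not the paper's configuration.

    import numpy as np

    rng = np.random.default_rng(2)
    task_len = rng.uniform(1, 10, size=30)      # task lengths (arbitrary units)
    speed = np.array([1.0, 1.5, 2.0, 3.0])      # resource processing speeds

    def makespan(assign):
        """Completion time of the slowest resource under this assignment."""
        loads = np.zeros(len(speed))
        np.add.at(loads, assign, task_len)
        return (loads / speed).max()

    pop = rng.integers(0, len(speed), size=(60, len(task_len)))
    for _ in range(200):
        fit = np.array([makespan(ind) for ind in pop])
        # Tournament selection: the lower makespan of two random individuals wins.
        a, b = rng.integers(0, len(pop), (2, len(pop)))
        parents = pop[np.where(fit[a] < fit[b], a, b)]
        # One-point crossover applied to every second individual.
        cut = rng.integers(1, len(task_len), len(pop))
        children = parents.copy()
        children[1::2, :] = np.where(
            np.arange(len(task_len)) < cut[1::2, None],
            parents[0::2, :], parents[1::2, :])
        # Mutation: reassign a few tasks to random resources.
        mask = rng.random(children.shape) < 0.02
        children[mask] = rng.integers(0, len(speed), mask.sum())
        pop = children

    print(min(makespan(ind) for ind in pop))    # best makespan found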
Aiewsakun, Pakorn; Simmonds, Peter
2018-02-20
The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at the family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data. The current study sought to determine whether the existing virus taxonomy could be reproduced by examination of genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline that we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy) was highly effective at reproducing the current assignments of viruses at the family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and to classify them into the correct taxonomic group, was evaluated by a threefold cross-validation technique. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity, potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, over one half (460/921) of (near-)complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy. A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-03-22
Staff Guidance on Implementation of a Seismic Margin Analysis for New Reactors Based on Probabilistic Risk Assessment (.../COL-ISG-020, Agencywide Documents ...).
Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS.
Kim, Nora Chung; Andrews, Peter C; Asselbergs, Folkert W; Frost, H Robert; Williams, Scott M; Harris, Brent T; Read, Cynthia; Askland, Kathleen D; Moore, Jason H
2012-07-28
It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method, which makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls, while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if it was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, 'Regulation of Cellular Component Organization and Biogenesis', a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, 'Actin Cytoskeleton', a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Pathway analysis of pairwise genetic associations in two GWAS of sporadic ALS thus revealed a set of genes involved in cellular component organization and, more specifically, the actin cytoskeleton that was not reported by prior GWAS. However, prior biological studies have implicated the actin cytoskeleton in ALS and other motor neuron diseases. This study supports the idea that pathway-level analysis of GWAS data may discover important associations not revealed using conventional one-SNP-at-a-time approaches.
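The gene-level statistic described above can be illustrated with a simple overabundance test: given n SNPs mapped to a gene, of which k are significant at α = 0.05, the gene's p-value is the binomial probability of seeing at least k significant SNPs by chance. This is a sketch of the general idea, not the EVA implementation, and the counts are invented.

    from scipy.stats import binom

    def overabundance_pvalue(k_significant, n_snps, alpha=0.05):
        """P(X >= k) for X ~ Binomial(n, alpha): chance of so many significant SNPs."""
        return binom.sf(k_significant - 1, n_snps, alpha)

    # A gene with 120 mapped SNPs, 14 of them significant at the 0.05 level.
    print(overabundance_pvalue(14, 120))   # a small value flags an overabundant gene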
NASA Astrophysics Data System (ADS)
Wei, Y.; Thomas, S.; Zhou, H.; Arcas, D.; Titov, V. V.
2017-12-01
The increasing potential for tsunami hazards poses great challenges for infrastructure along the coastlines of the U.S. Pacific Northwest. Tsunami impact at a coastal site is usually assessed from deterministic scenarios based on 10,000 years of geological records in the Cascadia Subduction Zone (CSZ). Aside from these deterministic methods, the new ASCE 7-16 tsunami provisions provide engineering design criteria for tsunami loads on buildings based on a probabilistic approach. This work develops a site-specific model near Newport, OR using high-resolution grids, and computes tsunami inundation depths and velocities at the study site resulting from credible probabilistic and deterministic earthquake sources in the CSZ. Three Cascadia scenarios, two deterministic (XXL1 and L1) and a 2,500-yr probabilistic scenario compliant with the new ASCE 7-16 standard, are simulated using a combination of a depth-averaged shallow water model for offshore propagation and a Boussinesq-type model for onshore inundation. We discuss the methods and procedure used to obtain the 2,500-year probabilistic scenario for Newport that is compliant with the ASCE 7-16 tsunami provisions. We provide details of the model results, particularly the inundation depth and flow speed for a new building at Newport, Oregon, which will also be designated as a tsunami vertical evacuation shelter. We show that the ASCE 7-16 consistent hazards lie between those obtained from the deterministic L1 and XXL1 scenarios, and that the greatest impact on the building may come from later waves. As a further step, we utilize the inundation model results to numerically compute tracks of large vessels in the vicinity of the building site and estimate whether these vessels will impact the building site during the extreme XXL1 and ASCE 7-16 hazard-consistent scenarios. A two-step study is carried out: first, tracks of massless particles are computed; then, large vessels with assigned mass are simulated, considering drag force, inertial force, ship grounding, and mooring. The simulation results show that none of the large vessels impact the building site in any of the tested scenarios.
A Hough Transform Global Probabilistic Approach to Multiple-Subject Diffusion MRI Tractography
2010-04-01
A global probabilistic fiber tracking approach, inspired by the voting procedure provided by the Hough transform, is introduced in this work.
New optimization model for routing and spectrum assignment with nodes insecurity
NASA Astrophysics Data System (ADS)
Xuan, Hejun; Wang, Yuping; Xu, Zhanqi; Hao, Shanshan; Wang, Xiaoli
2017-04-01
By adopting orthogonal frequency division multiplexing technology, elastic optical networks can provide flexible and variable bandwidth allocation to each connection request and achieve higher spectrum utilization. The routing and spectrum assignment problem in elastic optical networks is a well-known NP-hard problem. In addition, information security has received worldwide attention. We combine these two problems to investigate routing and spectrum assignment with guaranteed security in elastic optical networks, and establish a new optimization model that minimizes the maximum index of the used frequency slots, which is used to determine an optimal routing and spectrum assignment scheme. To solve the model effectively, a hybrid genetic algorithm framework integrating a heuristic algorithm into a genetic algorithm is proposed. The heuristic algorithm is first used to sort the connection requests, and the genetic algorithm is then designed to look for an optimal routing and spectrum assignment scheme. In the genetic algorithm, tailor-made crossover, mutation, and local search operators are designed. Moreover, simulation experiments are conducted with three heuristic strategies, and the experimental results indicate the effectiveness of the proposed model and algorithm framework.
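The heuristic stage can be pictured with a first-fit spectrum assignment: after the requests are ordered, each one takes the lowest contiguous block of frequency slots that is free on every link of its route, which directly targets the objective of minimizing the maximum used slot index. The network, routes, and demands below are invented, and this is a generic heuristic rather than the paper's exact strategy.

    import numpy as np

    N_SLOTS, N_LINKS = 32, 5
    used = np.zeros((N_LINKS, N_SLOTS), dtype=bool)   # slot occupancy per link

    def first_fit(route_links, demand_slots):
        """Lowest contiguous block of free slots common to all links of the route."""
        free = ~used[route_links].any(axis=0)          # slot free on every link
        for s in range(N_SLOTS - demand_slots + 1):
            if free[s:s + demand_slots].all():
                used[np.ix_(route_links, range(s, s + demand_slots))] = True
                return s + demand_slots - 1            # highest index this demand uses
        return None                                    # blocked

    # Requests as (links on the route, number of slots), served largest first.
    requests = [([0, 1], 4), ([1, 2, 3], 3), ([0, 4], 2)]
    requests.sort(key=lambda r: -r[1])
    # Toy instance: all demands fit, so no request is blocked.
    max_index = max(first_fit(links, d) for links, d in requests)
    print("maximum used slot index:", max_index)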
1992-12-01
Simulated annealing, by mimicking atomic motions during annealing, allows the search to probabilistically move in a locally non-optimal direction; the probability of doing so decreases as the annealing temperature is lowered. Network processors communicate via communication links; this type of communication is generally very slow relative to other processor activities.
Cost-Effectiveness Analysis of Different Genetic Testing Strategies for Lynch Syndrome in Taiwan.
Chen, Ying-Erh; Kao, Sung-Shuo; Chung, Ren-Hua
2016-01-01
Patients with Lynch syndrome (LS) have a significantly increased risk of developing colorectal cancer (CRC) and other cancers. Genetic screening for LS among patients with newly diagnosed CRC aims to identify mutations in the disease-causing genes (i.e., the DNA mismatch repair genes) in the patients, to offer genetic testing for relatives of the patients with the mutations, and then to provide early prevention for the relatives with the mutations. Several genetic tests are available for LS, such as DNA sequencing for MMR genes and tumor testing using microsatellite instability and immunohistochemical analyses. Cost-effectiveness analyses of different genetic testing strategies for LS have been performed in several studies from different countries such as the US and Germany. However, a cost-effectiveness analysis for the testing has not yet been performed in Taiwan. In this study, we evaluated the cost-effectiveness of four genetic testing strategies for LS described in previous studies, while population-specific parameters, such as the mutation rates of the DNA mismatch repair genes and treatment costs for CRC in Taiwan, were used. The incremental cost-effectiveness ratios based on discounted life years gained due to genetic screening were calculated for the strategies relative to no screening and to the previous strategy. Using the World Health Organization standard, which was defined based on Taiwan's Gross Domestic Product per capita, the strategy based on immunohistochemistry as a genetic test followed by BRAF mutation testing was considered to be highly cost-effective relative to no screening. Our probabilistic sensitivity analysis results also suggest that the strategy has a probability of 0.939 of being cost-effective relative to no screening based on the commonly used threshold of $50,000 to determine cost-effectiveness. To the best of our knowledge, this is the first cost-effectiveness analysis for evaluating different genetic testing strategies for LS in Taiwan. The results will be informative for the government when considering offering screening for LS in patients newly diagnosed with CRC.
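The cost-effectiveness logic can be sketched in a few lines: the incremental cost-effectiveness ratio (ICER) is the incremental cost per discounted life year gained, and the probabilistic sensitivity analysis redraws uncertain inputs to estimate the probability that a strategy is cost-effective at a willingness-to-pay threshold. All figures below are invented placeholders, not the Taiwanese parameters used in the study.

    import numpy as np

    rng = np.random.default_rng(3)
    WTP = 50_000                    # willingness-to-pay threshold per life year

    def psa(n_draws=10_000):
        # Uncertain inputs: incremental cost and discounted life years gained.
        d_cost = rng.gamma(shape=20, scale=100, size=n_draws)     # ~2000 currency units
        d_lyg = rng.normal(loc=0.06, scale=0.02, size=n_draws)    # life years gained
        icer = d_cost / d_lyg
        # Cost-effective when the incremental net benefit is positive.
        prob_ce = np.mean(WTP * d_lyg - d_cost > 0)
        return np.median(icer), prob_ce

    median_icer, prob_ce = psa()
    print(f"median ICER: {median_icer:,.0f}, P(cost-effective): {prob_ce:.3f}")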
iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.
Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades
2014-01-01
Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize the commonality and difference of individuals from various populations. This paper presents iNJclust, an efficient graph-based clustering framework that operates iteratively on the neighbor-joining (NJ) tree. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index; the behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and whose structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The results indicate that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
NASA Astrophysics Data System (ADS)
Yilmaz, Diba; Tekkaya, Ceren; Sungur, Semra
2011-03-01
The present study examined the comparative effects of a prediction/discussion-based learning cycle, conceptual change text (CCT), and traditional instruction on students' understanding of genetics concepts. A quasi-experimental pre-test/post-test non-equivalent control group design was adopted. Three intact classes, taught by the same science teacher, were randomly assigned as the prediction/discussion-based learning cycle class (N = 30), the CCT class (N = 25), and the traditional class (N = 26). Participants completed the genetics concept test as pre-test, post-test, and delayed post-test to examine the effects of the instructional strategies on their understanding of genetics and its retention. The dependent variable of this study was students' understanding of genetics; the independent variables were time (Time 1, Time 2, and Time 3) and mode of instruction. A mixed between-within subjects analysis of variance revealed that students in both the prediction/discussion-based learning cycle and CCT groups understood the genetics concepts and retained their knowledge significantly better than students in the traditional instruction group.
Cannon, Tyrone D; Thompson, Paul M; van Erp, Theo G M; Huttunen, Matti; Lonnqvist, Jouko; Kaprio, Jaakko; Toga, Arthur W
2006-01-01
There is an urgent need to decipher the complex nature of genotype-phenotype relationships within the multiple dimensions of brain structure and function that are compromised in neuropsychiatric syndromes such as schizophrenia. Doing so requires sophisticated methodologies to represent population variability in neural traits and to probe their heritable and molecular genetic bases. We have recently developed and applied computational algorithms to map the heritability of, as well as genetic linkage and association to, neural features encoded using brain imaging in the context of three-dimensional (3D), population-based, statistical brain atlases. One set of algorithms builds on our prior work using classical twin study methods to estimate heritability by fitting biometrical models for additive genetic, unique, and common environmental influences. Another set of algorithms performs regression-based (Haseman-Elston) identical-by-descent linkage analysis and genetic association analysis of DNA polymorphisms in relation to neural traits of interest in the same 3D population-based brain atlas format. We demonstrate these approaches using samples of healthy monozygotic (MZ) and dizygotic (DZ) twin pairs, as well as MZ and DZ twin pairs discordant for schizophrenia, but the methods can be generalized to other classes of relatives and to other diseases. The results confirm prior evidence of genetic influences on gray matter density in frontal brain regions. They also provide converging evidence that the chromosome 1q42 region is relevant to schizophrenia by demonstrating linkage and association of markers of the Translin-Associated Factor X and Disrupted-In-Schizophrenia-1 genes with prefrontal cortical gray matter deficits in twins discordant for schizophrenia.
Poynton, Clare B; Chen, Kevin T; Chonde, Daniel B; Izquierdo-Garcia, David; Gollub, Randy L; Gerstner, Elizabeth R; Batchelor, Tracy T; Catana, Ciprian
2014-01-01
We present a new MRI-based attenuation correction (AC) approach for integrated PET/MRI systems that combines both segmentation- and atlas-based methods by incorporating dual-echo ultra-short echo-time (DUTE) and T1-weighted (T1w) MRI data and a probabilistic atlas. Segmented atlases were constructed from CT training data using a leave-one-out framework and combined with T1w, DUTE, and CT data to train a classifier that computes the probability of air/soft tissue/bone at each voxel. This classifier was applied to segment the MRI of the subject of interest, and attenuation maps (μ-maps) were generated by assigning specific linear attenuation coefficients (LACs) to each tissue class. The μ-maps generated with this "Atlas-T1w-DUTE" approach were compared to those obtained from DUTE data using a previously proposed method. For validation of the segmentation results, segmented CT μ-maps were considered the "silver standard"; the segmentation accuracy was assessed qualitatively and quantitatively through calculation of the Dice similarity coefficient (DSC). Relative change (RC) maps between the CT- and MRI-based attenuation-corrected PET volumes were also calculated for a global voxel-wise assessment of the reconstruction results. The μ-maps obtained using the Atlas-T1w-DUTE classifier agreed well with those derived from CT; the mean DSCs for the Atlas-T1w-DUTE-based μ-maps across all subjects were higher than those for the DUTE-based μ-maps; the atlas-based μ-maps also showed a lower percentage of misclassified voxels across all subjects. RC maps from the atlas-based technique also demonstrated improvement in the PET data compared to the DUTE method, both globally and regionally.
An Improved SoC Test Scheduling Method Based on Simulated Annealing Algorithm
NASA Astrophysics Data System (ADS)
Zheng, Jingjing; Shen, Zhihang; Gao, Huaien; Chen, Bianna; Zheng, Weida; Xiong, Xiaoming
2017-02-01
In this paper, we propose an improved SoC test scheduling method based on the simulated annealing algorithm (SA). We first perturb the IP core assignment for each TAM to produce a new candidate solution, allocate the TAM width for each TAM using a greedy algorithm, and calculate the corresponding testing time; candidate core assignments are accepted according to the simulated annealing acceptance criterion until the optimum solution is attained. We ran the test scheduling experiments on the international reference circuits provided by the International Test Conference 2002 (ITC'02), and the results show that our algorithm is superior to the conventional integer linear programming algorithm (ILP), the simulated annealing algorithm (SA), and the genetic algorithm (GA). When the TAM width reaches 48, 56, and 64, the testing time of our algorithm is less than that of the classic methods, with optimization rates of 30.74%, 3.32%, and 16.13%, respectively. Moreover, the testing time of our algorithm is very close to that of the improved genetic algorithm (IGA), which is state-of-the-art at present.
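The acceptance step referred to above is the standard simulated annealing criterion: a perturbed core assignment that increases testing time is still accepted with probability exp(-Δ/T), with T lowered geometrically. The sketch below shows that skeleton on an abstract cost function; the perturbation and cost are placeholders standing in for the TAM assignment and greedy width allocation.

    import math
    import random

    random.seed(4)

    def simulated_annealing(cost, perturb, x0, t0=100.0, cooling=0.95, iters=2000):
        """Generic SA skeleton: 'cost' would be the greedy-allocation testing time."""
        x, best = x0, x0
        t = t0
        for _ in range(iters):
            cand = perturb(x)
            delta = cost(cand) - cost(x)
            # Accept improvements always; accept worse moves with prob exp(-delta/T).
            if delta < 0 or random.random() < math.exp(-delta / t):
                x = cand
            if cost(x) < cost(best):
                best = x
            t *= cooling                       # geometric cooling schedule
        return best

    # Toy stand-in: minimize a bumpy 1-D function instead of a real test schedule.
    cost = lambda x: (x - 3) ** 2 + 2 * math.sin(5 * x)
    perturb = lambda x: x + random.uniform(-0.5, 0.5)
    print(round(simulated_annealing(cost, perturb, x0=0.0), 3))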
Fragment assignment in the cloud with eXpress-D
2013-01-01
Background Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. Results We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters ("the cloud"). We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available, and it can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. Conclusions The cloud offers one solution for the difficulties faced in the analysis of massive high-throughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems, such as new frameworks like Spark, for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d. PMID:24314033
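The batch EM iteration that eXpress-D distributes can be written compactly for a toy instance: the E-step splits each ambiguously mapped fragment across its candidate transcripts in proportion to current abundances, and the M-step re-estimates abundances from those soft assignments. The compatibility matrix is invented, and effective transcript lengths are ignored for brevity.

    import numpy as np

    # compat[f, t] = 1 if fragment f maps to transcript t (toy: 6 fragments x 3 transcripts).
    compat = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 1, 1],
                       [1, 1, 1],
                       [0, 0, 1],
                       [0, 1, 0]], dtype=float)

    theta = np.full(3, 1 / 3)                  # initial relative abundances
    for _ in range(100):
        # E-step: responsibility of transcript t for fragment f.
        weighted = compat * theta
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: abundances proportional to the total soft-assigned fragments.
        theta = resp.sum(axis=0) / resp.sum()

    print(theta.round(3))   # maximum likelihood assignment proportions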
Athrey, Giridhar; Lance, Richard F.; Leberg, Paul L.
2015-01-01
Dispersal is a key demographic process, ultimately responsible for genetic connectivity among populations. Despite its importance, quantifying dispersal within and between populations has proven difficult for many taxa. Even in passerines, which are among the most intensely studied, individual movement and its relation to gene flow remains poorly understood. In this study we used two parallel genetic approaches to quantify natal dispersal distances in a Neotropical migratory passerine, the black-capped vireo. First, we employed a strategy of sampling evenly across the landscape coupled with parentage assignment to map the genealogical relationships of individuals across the landscape, and estimate dispersal distances; next, we calculated Wright’s neighborhood size to estimate gene dispersal distances. We found that a high percentage of captured individuals were assigned at short distances within the natal population, and males were assigned to the natal population more often than females, confirming sex-biased dispersal. Parentage-based dispersal estimates averaged 2400m, whereas gene dispersal estimates indicated dispersal distances ranging from 1600–4200 m. Our study was successful in quantifying natal dispersal distances, linking individual movement to gene dispersal distances, while also providing a detailed look into the dispersal biology of Neotropical passerines. The high-resolution information was obtained with much reduced effort (sampling only 20% of breeding population) compared to mark-resight approaches, demonstrating the potential applicability of parentage-based approaches for quantifying dispersal in other vagile passerine species. PMID:26461257
BootGraph: probabilistic fiber tractography using bootstrap algorithms and graph theory.
Vorburger, Robert S; Reischauer, Carolin; Boesiger, Peter
2013-02-01
Bootstrap methods have recently been introduced to diffusion-weighted magnetic resonance imaging to estimate the measurement uncertainty of ensuing diffusion parameters directly from the acquired data, without the necessity to assume a noise model. These methods have been previously combined with deterministic streamline tractography algorithms to allow for the assessment of connection probabilities in the human brain. Thereby, the local noise-induced disturbance in the diffusion data is accumulated additively due to the incremental progression of streamline tractography algorithms. Graph-based approaches have been proposed to overcome this drawback of streamline techniques. For this reason, the bootstrap method is in the present work incorporated into a graph setup to derive a new probabilistic fiber tractography method, called BootGraph. The acquired data set is thereby converted into a weighted, undirected graph by defining a vertex in each voxel and edges between adjacent vertices. By means of the cone of uncertainty, which is derived using the wild bootstrap, a weight is thereafter assigned to each edge. Two path-finding algorithms are subsequently applied to derive connection probabilities. While the first algorithm is based on the shortest-path approach, the second algorithm takes all existing paths between two vertices into consideration. Tracking results are compared to an established algorithm based on the bootstrap method in combination with streamline fiber tractography and to another graph-based algorithm. The BootGraph shows a very good performance in crossing situations with respect to false negatives and permits incorporating additional constraints, such as a curvature threshold. By inheriting the advantages of the bootstrap method and graph theory, the BootGraph method provides a computationally efficient and flexible probabilistic tractography setup to compute connection probability maps and virtual fiber pathways without the drawbacks of streamline tractography algorithms or the assumption of a noise distribution. Moreover, the BootGraph can be applied to common DTI data sets without further modifications and shows a high repeatability. Thus, it is very well suited for longitudinal studies and meta-studies based on DTI. Copyright © 2012 Elsevier Inc. All rights reserved.
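The shortest-path stage can be illustrated with a standard trick: if each edge carries a local connection probability, the most probable path between two voxels maximizes the product of edge probabilities, which is found by running Dijkstra on edge weights -log(p). The toy graph below is invented; in BootGraph the edge weights would come from the bootstrap-derived cone of uncertainty.

    import heapq
    import math

    def most_probable_path(edges, start, goal):
        """Dijkstra on -log(p): minimizing summed -log(p) maximizes the product of p."""
        graph = {}
        for u, v, p in edges:
            graph.setdefault(u, []).append((v, -math.log(p)))
            graph.setdefault(v, []).append((u, -math.log(p)))
        dist, prev = {start: 0.0}, {}
        heap = [(0.0, start)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == goal:
                break
            if d > dist.get(u, math.inf):
                continue
            for v, w in graph.get(u, []):
                if d + w < dist.get(v, math.inf):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1], math.exp(-dist[goal])    # path and its probability

    # Toy voxel graph: (u, v, local connection probability).
    edges = [("A", "B", 0.9), ("B", "C", 0.8), ("A", "D", 0.6), ("D", "C", 0.95)]
    print(most_probable_path(edges, "A", "C"))      # A-B-C: 0.72 beats A-D-C: 0.57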
The Genetic Privacy Act and commentary
DOE Office of Scientific and Technical Information (OSTI.GOV)
Annas, G.J.; Glantz, L.H.; Roche, P.A.
1995-02-28
The Genetic Privacy Act is a proposal for federal legislation. The Act is based on the premise that genetic information is different from other types of personal information in ways that require special protection. The DNA molecule holds an extensive amount of currently indecipherable information. The major goal of the Human Genome Project is to decipher this code so that the information it contains is accessible. The privacy question is, accessible to whom? The highly personal nature of the information contained in DNA can be illustrated by thinking of DNA as containing an individual's "future diary." A diary is perhaps the most personal and private document a person can create. It contains a person's innermost thoughts and perceptions, and is usually hidden and locked to assure its secrecy. Diaries describe the past. The information in one's genetic code can be thought of as a coded probabilistic future diary because it describes an important part of a unique and personal future. This document presents an introduction to the proposal for federal legislation, the Genetic Privacy Act; a copy of the proposed act; and commentary.
Probabilistic Learning in Junior High School: Investigation of Student Probabilistic Thinking Levels
NASA Astrophysics Data System (ADS)
Kurniasih, R.; Sujadi, I.
2017-09-01
This paper investigated students' levels of probabilistic thinking, that is, thinking about uncertain matters in probability material. The research subjects were students in grade 8 of junior high school. The main instrument was the researcher, supported by a probabilistic thinking skills test and interview guidelines. Data were analyzed using the triangulation method. The results showed that before instruction, students' probabilistic thinking was at the subjective and transitional levels; after instruction, their levels of probabilistic thinking changed, and some 8th grade students reached the numerical level, the highest of the levels. Students' probabilistic thinking levels can be used as a reference for designing learning materials and strategies.
Centralized Multi-Sensor Square Root Cubature Joint Probabilistic Data Association
Liu, Jun; Li, Gang; Qi, Lin; Li, Yaowen; He, You
2017-01-01
This paper focuses on the tracking problem of multiple targets with multiple sensors in a nonlinear cluttered environment. To avoid Jacobian matrix computation and scaling parameter adjustment, improve numerical stability, and acquire more accurate estimated results for centralized nonlinear tracking, a novel centralized multi-sensor square root cubature joint probabilistic data association algorithm (CMSCJPDA) is proposed. Firstly, the multi-sensor tracking problem is decomposed into several single-sensor multi-target tracking problems, which are sequentially processed during the estimation. Then, in each sensor, the assignment of its measurements to target tracks is accomplished on the basis of joint probabilistic data association (JPDA), and a weighted probability fusion method with square root version of a cubature Kalman filter (SRCKF) is utilized to estimate the targets’ state. With the measurements in all sensors processed CMSCJPDA is derived and the global estimated state is achieved. Experimental results show that CMSCJPDA is superior to the state-of-the-art algorithms in the aspects of tracking accuracy, numerical stability, and computational cost, which provides a new idea to solve multi-sensor tracking problems. PMID:29113085
NASA Astrophysics Data System (ADS)
Mansoor, Awais; Casas, Rafael; Linguraru, Marius G.
2016-03-01
Pleural effusion is an abnormal collection of fluid within the pleural cavity. Excessive accumulation of pleural fluid is an important bio-marker for various illnesses, including congestive heart failure, pneumonia, metastatic cancer, and pulmonary embolism. Quantification of pleural effusion can be indicative of the progression of disease as well as the effectiveness of any treatment being administered. Quantification, however, is challenging due to unpredictable amounts and density of fluid, the complex topology of the pleural cavity, and the similarity in texture and intensity of pleural fluid to the surrounding tissues in computed tomography (CT) scans. Herein, we present an automated method for the segmentation of pleural effusion in CT scans based on spatial context information. The method consists of two stages: first, a probabilistic pleural effusion map is created using multi-atlas segmentation. The probabilistic map assigns a priori probabilities to the presence of pleural fluid at every location in the CT scan. Second, a statistical pattern classification approach is designed to annotate pleural regions using local descriptors based on a priori probabilities, geometrical, and spatial features. Thirty-seven CT scans from a diverse patient population containing confirmed cases of minimal to severe amounts of pleural effusion were used to validate the proposed segmentation method. An average Dice coefficient of 0.82685 and a Hausdorff distance of 16.2155 mm were obtained.
Faith, Daniel P
2008-12-01
New species conservation strategies, including the EDGE of Existence (EDGE) program, have expanded threatened species assessments by integrating information about species' phylogenetic distinctiveness. Distinctiveness has been measured through simple scores that assign shared credit among species for evolutionary heritage represented by the deeper phylogenetic branches. A species with a high score combined with a high extinction probability receives high priority for conservation efforts. Simple hypothetical scenarios for phylogenetic trees and extinction probabilities demonstrate how such scoring approaches can provide inefficient priorities for conservation. An existing probabilistic framework derived from the phylogenetic diversity measure (PD) properly captures the idea of shared responsibility for the persistence of evolutionary history. It avoids static scores, takes into account the status of close relatives through their extinction probabilities, and allows for the necessary updating of priorities in light of changes in species threat status. A hypothetical phylogenetic tree illustrates how changes in extinction probabilities of one or more species translate into changes in expected PD. The probabilistic PD framework provided a range of strategies that moved beyond expected PD to better consider worst-case PD losses. In another example, risk aversion gave higher priority to a conservation program that provided a smaller, but less risky, gain in expected PD. The EDGE program could continue to promote a list of top species conservation priorities through application of probabilistic PD and simple estimates of current extinction probability. The list might be a dynamic one, with all the priority scores updated as extinction probabilities change. Results of recent studies suggest that estimation of extinction probabilities derived from the red list criteria linked to changes in species range sizes may provide estimated probabilities for many different species. Probabilistic PD provides a framework for single-species assessment that is well-integrated with a broader measurement of impacts on PD owing to climate change and other factors.
Genetic and Dynamic Analysis of Murine Peak Bone Density
1998-10-01
Fragmentary report text: a gene family implicated in a number of diseases, including Waardenburg syndrome and retinitis pigmentosa; the remainder consists of reference-list fragments (assignment of the locus for Waardenburg syndrome type 1; mutations in the human homologue of the Pax-3 paired box gene).
NASA Astrophysics Data System (ADS)
Ring, Christoph; Pollinger, Felix; Kaspar-Ott, Irena; Hertig, Elke; Jacobeit, Jucundus; Paeth, Heiko
2018-03-01
A major task of climate science is to provide reliable projections of future climate change. To enable more solid statements and to decrease the range of uncertainty, global general circulation models and regional climate models are evaluated based on a 2 × 2 contingency table approach to generate model weights. These weights are compared among different methodologies, and their impact on probabilistic projections of temperature and precipitation changes is investigated. Simulated seasonal precipitation and temperature for both 50-year trends and climatological means are assessed at two spatial scales: in seven study regions around the globe and in eight sub-regions of the Mediterranean area. Overall, 24 models of phase 3 and 38 models of phase 5 of the Coupled Model Intercomparison Project, altogether 159 transient simulations of precipitation and 119 of temperature from four emissions scenarios, are evaluated against the ERA-20C reanalysis over the 20th century. The results show high conformity with previous model evaluation studies. The metrics reveal that the mean of precipitation and both the mean and trend of temperature agree well with the reference dataset, and indicate improvement for the more recent ensemble mean, especially for temperature. The method is highly transferable to a variety of further applications in climate science. Overall, there are regional differences in simulation quality; however, these are less pronounced than those between the results for 50-year means and trends. The trend results are suitable for assigning weighting factors to climate models. Yet the implication for probabilistic climate projections is strictly dependent on the region and season.
Holovachov, Oleksandr
2016-01-01
Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process, and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside several alignment and phylogeny inference methods, used in one of the taxonomy assignment methods, called the tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on the relative placements of OTUs and reference sequences on the cladogram and the support that these placements receive. In the tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires a high-quality reference dataset. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences, as well as by the alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of the tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have a detrimental effect on the resolution of cladograms used in the tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives the highest resolution for the particular reference dataset. Completing the above-mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
Risk analysis of Safety Service Patrol (SSP) systems in Virginia.
Dickey, Brett D; Santos, Joost R
2011-12-01
The transportation infrastructure is a vital backbone of any regional economy as it supports workforce mobility, tourism, and a host of socioeconomic activities. In this article, we specifically examine the incident management function of the transportation infrastructure. In many metropolitan regions, incident management is handled primarily by safety service patrols (SSPs), which monitor and resolve roadway incidents. In Virginia, SSP allocation across highway networks is based typically on average vehicle speeds and incident volumes. This article implements a probabilistic network model that partitions "business as usual" traffic flow with extreme-event scenarios. Results of simulated network scenarios reveal that flexible SSP configurations can improve incident resolution times relative to predetermined SSP assignments. © 2011 Society for Risk Analysis.
Richardson, Keith; Denny, Richard; Hughes, Chris; Skilling, John; Sikora, Jacek; Dadlez, Michał; Manteca, Angel; Jung, Hye Ryung; Jensen, Ole Nørregaard; Redeker, Virginie; Melki, Ronald; Langridge, James I.; Vissers, Johannes P.C.
2013-01-01
A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be identified to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first method is based on a user-defined subset of peptides, while the second method relies on the presence of a dominant background of endogenous peptides for which the concentration is assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm will be illustrated on example data sets, and its utility demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopically-labeled approaches, as well as label-free ion intensity-based data-independent methods. PMID:22871168
Genomic Model with Correlation Between Additive and Dominance Effects.
Xiang, Tao; Christensen, Ole Fredslund; Vitezica, Zulma Gladis; Legarra, Andres
2018-05-09
Dominance genetic effects are rarely included in pedigree-based genetic evaluation. With the availability of single nucleotide polymorphism markers and the development of genomic evaluation, estimates of dominance genetic effects have become feasible using genomic best linear unbiased prediction (GBLUP). Usually, studies involving additive and dominance genetic effects ignore possible relationships between them. It has been often suggested that the magnitude of functional additive and dominance effects at the quantitative trait loci are related, but there is no existing GBLUP-like approach accounting for such correlation. Wellmann and Bennewitz showed two ways of considering directional relationships between additive and dominance effects, which they estimated in a Bayesian framework. However, these relationships cannot be fitted at the level of individuals instead of loci in a mixed model and are not compatible with standard animal or plant breeding software. This comes from a fundamental ambiguity in assigning the reference allele at a given locus. We show that, if there has been selection, assigning the most frequent as the reference allele orients the correlation between functional additive and dominance effects. As a consequence, the most frequent reference allele is expected to have a positive value. We also demonstrate that selection creates negative covariance between genotypic additive and dominance genetic values. For parameter estimation, it is possible to use a combined additive and dominance relationship matrix computed from marker genotypes, and to use standard restricted maximum likelihood (REML) algorithms based on an equivalent model. Through a simulation study, we show that such correlations can easily be estimated by mixed model software and accuracy of prediction for genetic values is slightly improved if such correlations are used in GBLUP. However, a model assuming uncorrelated effects and fitting orthogonal breeding values and dominant deviations performed similarly for prediction. Copyright © 2018, Genetics.
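A sketch of the genomic relationship matrices involved: the additive matrix follows VanRaden's centering and scaling, and the dominance matrix uses the per-genotype-class coding of Vitezica et al. These are standard GBLUP ingredients, shown here on random genotypes rather than the paper's model with correlated additive and dominance effects.

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.integers(0, 3, size=(20, 500)).astype(float)   # genotypes coded 0/1/2
    p = M.mean(axis=0) / 2                                 # allele frequencies
    q = 1 - p

    # Additive relationship matrix (VanRaden): center by 2p, scale by sum of 2pq.
    Z = M - 2 * p
    G = Z @ Z.T / np.sum(2 * p * q)

    # Dominance relationship matrix (Vitezica et al. coding per genotype class).
    W = np.select([M == 0, M == 1, M == 2],
                  [-2 * p**2, 2 * p * q, -2 * q**2])
    D = W @ W.T / np.sum((2 * p * q) ** 2)

    print(np.diag(G).mean().round(2), np.diag(D).mean().round(2))  # both near 1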
Bell-Curve Based Evolutionary Optimization Algorithm
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, J.; Laba, K.; Kincaid, R.
1998-01-01
The paper presents an optimization algorithm that falls in the category of genetic, or evolutionary, algorithms. While bit exchange is the basis of most Genetic Algorithms (GA) in research and applications in America, alternatives that are also in the category of evolutionary algorithms but use a direct, geometrical approach have gained popularity in Europe and Asia. The Bell-Curve Based Evolutionary Algorithm (BCB) is in this alternative category and is distinguished by the use of a combination of n-dimensional geometry and the normal distribution, the bell-curve, in the generation of the offspring. The tool for creating a child is a geometrical construct comprising a line connecting two parents and a weighted point on that line. The point that defines the child deviates from the weighted point in two directions, parallel and orthogonal to the connecting line, the deviation in each direction obeying a probabilistic distribution. Tests showed satisfactory performance of BCB. The principal advantage of BCB is its controllability via the normal distribution parameters and the geometrical construct variables.
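A minimal sketch of the geometrical construct described above: the child starts from a weighted point on the line joining two parents and is then perturbed parallel and orthogonal to that line with normal deviates. Parameter names and default values are illustrative, not the paper's settings.

```python
import numpy as np

def bcb_offspring(p1, p2, w=0.5, sigma_par=0.1, sigma_orth=0.1, rng=None):
    """Generate one child via the bell-curve construct: a weighted
    point on the parent-connecting line, perturbed along the line and
    along a random orthogonal direction with normal deviates scaled
    by the parent distance. A sketch under assumed parameter choices."""
    rng = rng or np.random.default_rng()
    d = p2 - p1
    dist = np.linalg.norm(d)
    u = d / dist                           # unit vector along the line
    base = p1 + w * d                      # weighted point
    r = rng.normal(size=p1.shape)          # random orthogonal direction
    r -= (r @ u) * u
    r /= np.linalg.norm(r)
    return base + rng.normal(0, sigma_par) * dist * u \
                + rng.normal(0, sigma_orth) * dist * r

child = bcb_offspring(np.array([0.0, 0.0]), np.array([1.0, 1.0]))
print(child)
```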
kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity.
Murray, Kevin D; Webers, Christfried; Ong, Cheng Soon; Borevitz, Justin; Warthmann, Norman
2017-09-01
Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals (or "samples") in an unbiased manner, preferably de novo. Rapid estimation of genetic relatedness directly from sequencing data has the potential to overcome reference genome bias, and to verify that individuals belong to the correct genetic lineage before conclusions are drawn using mislabelled or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly- and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes, and applications include establishing sample identity, detecting sample mix-ups, and revealing non-obvious genomic variation and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
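A toy sketch of the weighted inner product idea on a dense count matrix: each k-mer is weighted by the Shannon entropy of its presence frequency across samples, so k-mers present in all or no samples carry zero weight. This is an assumption-laden simplification; kWIP itself operates on probabilistic Count-Min sketches of the counts, not dense matrices.

```python
import numpy as np

def wip_distance_matrix(counts):
    """Entropy-weighted inner product distances between samples from a
    (samples x kmers) count matrix. Illustrative of the WIP metric's
    weighting logic only, not the kWIP implementation."""
    present = (counts > 0).mean(axis=0)            # presence frequency
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(present * np.log2(present) +
              (1 - present) * np.log2(1 - present))
    h = np.nan_to_num(h)                           # 0*log(0) -> 0
    K = (counts * h) @ counts.T                    # weighted inner products
    norm = np.sqrt(np.outer(np.diag(K), np.diag(K)))
    sim = np.where(norm > 0, K / norm, 0.0)        # cosine-normalised
    return 1.0 - sim                               # distance matrix

counts = np.random.default_rng(0).poisson(0.5, size=(4, 1000))
print(wip_distance_matrix(counts).round(3))
```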
Moran, Paul; Bromaghin, Jeffrey F.; Masuda, Michele
2014-01-01
Many applications in ecological genetics involve sampling individuals from a mixture of multiple biological populations and subsequently associating those individuals with the populations from which they arose. Analytical methods that assign individuals to their putative population of origin have utility in both basic and applied research, providing information about population-specific life history and habitat use, ecotoxins, pathogen and parasite loads, and many other non-genetic ecological or phenotypic traits. Although the question is initially directed at the origin of individuals, in most cases the ultimate desire is to investigate the distribution of some trait among populations. Current practice is to assign individuals to a population of origin and study properties of the trait among individuals within population strata as if they constituted independent samples. We suspected that this approach might bias population-specific trait inference. In this study we made trait inferences directly through modeling, bypassing individual assignment. We extended a Bayesian model for population mixture analysis to incorporate parameters for the phenotypic trait and compared its performance to that of individual assignment with a minimum probability threshold for assignment. The Bayesian mixture model outperformed individual assignment under some trait inference conditions. However, by discarding individuals whose origins are most uncertain, the individual assignment method provided a less complex analytical technique whose performance may be adequate for some common trait inference problems. Our results provide specific guidance for method selection under various genetic relationships among populations with different trait distributions. PMID:24905464
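A minimal sketch of the individual-assignment baseline the authors compare against: posterior probabilities of origin computed from per-population allele frequencies, to which a minimum probability threshold can then be applied. The Hardy-Weinberg likelihood and toy frequencies are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def assignment_posteriors(genotypes, freqs, priors=None):
    """Posterior probability that each individual originated from each
    population, given per-population allele frequencies at biallelic
    loci (genotypes coded 0/1/2, frequencies bounded away from 0 and 1).
    The binomial coefficient in the HWE genotype probability is the
    same for every population, so it cancels and is omitted."""
    n_pop = freqs.shape[0]
    priors = np.full(n_pop, 1.0 / n_pop) if priors is None else priors
    post = []
    for g in genotypes:
        ll = (g * np.log(freqs) + (2 - g) * np.log(1 - freqs)).sum(axis=1)
        p = np.exp(ll - ll.max()) * priors
        post.append(p / p.sum())
    return np.array(post)

freqs = np.array([[0.8, 0.2, 0.6], [0.3, 0.7, 0.4]])  # 2 pops x 3 loci
genos = np.array([[2, 0, 2], [0, 2, 0]])
post = assignment_posteriors(genos, freqs)
print(post.round(3))   # e.g. keep an individual only if max prob > 0.9
```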
NASA Astrophysics Data System (ADS)
Walz, Michael; Leckebusch, Gregor C.
2016-04-01
Extratropical wind storms pose one of the most dangerous and loss-intensive natural hazards for Europe. However, with only about 50 years of high-quality observational data, it is difficult to assess the statistical uncertainty of these sparse events based on observations alone. Over the last decade seasonal ensemble forecasts have become indispensable in quantifying the uncertainty of weather prediction on seasonal timescales. In this study seasonal forecasts are used in a climatological context: by making use of the up to 51 ensemble members, a broad and physically consistent statistical base can be created. This base can then be used to assess the statistical uncertainty of extreme wind storm occurrence more accurately. In order to determine the statistical uncertainty of storms with different paths of progression, a probabilistic clustering approach using regression mixture models is used to objectively assign storm tracks (based either on core pressure or on extreme wind speeds) to different clusters. The advantage of this technique is that the entire lifetime of a storm is considered by the clustering algorithm. Quadratic curves are found to describe the storm tracks most accurately. Three main clusters (diagonal, horizontal or vertical progression of the storm track) can be identified, each with its own distinct features. Basic storm features like average velocity and duration are calculated and compared for each cluster. The main benefit of this clustering technique, however, is to evaluate whether the clusters show different degrees of uncertainty, e.g. more (less) spread for tracks approaching Europe horizontally (diagonally). This statistical uncertainty is compared across different seasonal forecast products.
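A simplified sketch of the track-clustering step: fit a quadratic curve to each whole track (the abstract notes quadratics describe the tracks well) and cluster the coefficient vectors with scikit-learn's GaussianMixture, yielding both hard and probabilistic assignments. A full regression mixture model would run EM over the raw tracks; clustering fitted coefficients is a stand-in under that stated simplification.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_tracks(tracks, n_clusters=3, seed=0):
    """Cluster storm tracks by overall shape: quadratic fit per track,
    then a Gaussian mixture over the coefficient vectors. Returns hard
    labels and soft (probabilistic) cluster memberships."""
    coefs = np.array([np.polyfit(x, y, deg=2) for x, y in tracks])
    gm = GaussianMixture(n_components=n_clusters, random_state=seed)
    labels = gm.fit_predict(coefs)
    return labels, gm.predict_proba(coefs)

rng = np.random.default_rng(1)
xs = np.linspace(0, 1, 20)
tracks = [(xs, c * xs ** 2 + rng.normal(0, 0.05, 20))
          for c in rng.choice([-2.0, 0.0, 2.0], size=30)]
labels, probs = cluster_tracks(tracks)
print(np.bincount(labels))      # three shape clusters recovered
```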
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
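As a concrete example of the combinatorial category the review describes, here is a greedy set-cover sketch of parsimonious protein inference: choose the smallest (approximately) set of proteins that explains all confidently identified peptides. Exact formulations solve this as an integer program; the greedy heuristic below is a common, simple approximation.

```python
def parsimonious_proteins(peptide_to_proteins):
    """Greedy set cover over peptide-to-protein mappings: repeatedly
    pick the protein explaining the most still-unexplained peptides.
    A sketch of rule-based/combinatorial protein inference, not any
    specific published tool."""
    uncovered = set(peptide_to_proteins)
    prot_peps = {}                       # invert: protein -> peptides
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            prot_peps.setdefault(prot, set()).add(pep)
    chosen = []
    while uncovered:
        best = max(prot_peps, key=lambda p: len(prot_peps[p] & uncovered))
        chosen.append(best)
        uncovered -= prot_peps[best]
    return chosen

peps = {"PEP1": {"A", "B"}, "PEP2": {"A"}, "PEP3": {"B", "C"}, "PEP4": {"C"}}
print(parsimonious_proteins(peps))       # ['A', 'C'] explains all peptides
```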
Use of knowledge-sharing web-based portal in gross and microscopic anatomy.
Durosaro, Olayemi; Lachman, Nirusha; Pawlina, Wojciech
2008-12-01
Changes in worldwide healthcare delivery require review of current medical school curricular structure to develop learning outcomes that ensure mastery of knowledge and clinical competency. In the last 3 years, Mayo Medical School implemented an outcomes-based curriculum to encompass new graduate outcomes. Standard courses were replaced by 6-week clinically-integrated didactic blocks separated by student-self-selected academic enrichment activities. Gross and microscopic anatomy were integrated with radiology and genetics, respectively. Laboratory components include virtual microscopy and anatomical dissection. Students assigned to teams utilise computer portals to share learning experiences. High-resolution computed tomographic (CT) scans of cadavers obtained prior to dissection were made available for correlative learning between the cadaveric material and radiologic images. Students work in teams on assigned presentations that include histology, cell and molecular biology, genetics and genomics, using the Nexus Portal, based on DrupalEd, to share their observations, reflections and dissection findings. The new generation of medical students is clearly comfortable utilising web-based programmes that maximise their learning potential in conceptually difficult and labor-intensive courses. A team-based learning approach emphasising the use of knowledge-sharing computer portals maximises opportunities for students to master their knowledge and improve cognitive skills to ensure clinical competency.
Molecular and morphologic data reveal multiple species in Peromyscus pectoralis
Bradley, Robert D.; Schmidly, David J.; Amman, Brian R.; Platt, Roy N.; Neumann, Kathy M.; Huynh, Howard M.; Muñiz-Martínez, Raúl; López-González, Celia; Ordóñez-Garza, Nicté
2015-01-01
DNA sequence and morphometric data were used to re-evaluate the taxonomy and systematics of Peromyscus pectoralis. Phylogenetic analyses (maximum likelihood and Bayesian inference) of DNA sequences from the mitochondrial cytochrome-b gene in 44 samples of P. pectoralis indicated 2 well-supported monophyletic clades. The 1st clade contained specimens from Texas historically assigned to P. p. laceianus; the 2nd comprised specimens previously referable to P. p. collinus, P. p. laceianus, and P. p. pectoralis obtained from northern and eastern Mexico. Levels of genetic variation (~7%) between these 2 clades indicated that the genetic divergence typically exceeded that reported for other species of Peromyscus. Samples of P. p. laceianus north and south of the Río Grande were not monophyletic. In addition, samples representing P. p. collinus and P. p. pectoralis formed 2 clades that differed genetically by 7.14%. Multivariate analyses of external and cranial measurements from 63 populations of P. pectoralis revealed 4 morpho-groups consistent with clades in the DNA sequence analysis: 1 from Texas and New Mexico assignable to P. p. laceianus; a 2nd from western and southern Mexico assignable to P. p. pectoralis; a 3rd from northern and central Mexico previously assigned to P. p. pectoralis but herein shown to represent an undescribed taxon; and a 4th from southeastern Mexico assignable to P. p. collinus. Based on the concordance of these results, populations from the United States are referred to as P. laceianus, whereas populations from Mexico are referred to as P. pectoralis (including some samples historically assigned to P. p. collinus, P. p. laceianus, and P. p. pectoralis). A new subspecies is described to represent populations south of the Río Grande in northern and central Mexico. Additional research is needed to discern if P. p. collinus warrants species recognition. PMID:26937045
Clarke, Shannon M.; Henry, Hannah M.; Dodds, Ken G.; Jowett, Timothy W. D.; Manley, Tim R.; Anderson, Rayna M.; McEwan, John C.
2014-01-01
Accurate pedigree information is critical to animal breeding systems to ensure the highest rate of genetic gain and management of inbreeding. The abundance of available genomic data, together with the development of high-throughput genotyping platforms, means that single nucleotide polymorphisms (SNPs) are now the DNA marker of choice for genomic selection studies. Furthermore, the superior qualities of SNPs compared to microsatellite markers allow for standardization between laboratories; a property that is crucial for developing an international set of markers for traceability studies. The objective of this study was to develop a high-throughput SNP assay for use in the New Zealand sheep industry that gives accurate pedigree assignment and will allow a reduction in breeder input over lambing. This required two phases of development: firstly, a method of extracting quality DNA from ear-punch tissue performed in a high-throughput, cost-efficient manner, and secondly, a SNP assay with the ability to assign paternity to progeny resulting from mob mating. A likelihood-based approach to infer paternity was used, where sires with the highest LOD score (log of the ratio of the likelihood given parentage to likelihood given non-parentage) are assigned. An 84-SNP "parentage panel" was developed that assigned, on average, 99% of progeny to a sire in a problem where there were 3,000 progeny from 120 mob-mated sires that included numerous half-sib sires. In only 6% of those cases was there another sire with at least a 0.02 probability of paternity. Furthermore, dam information (either recorded, or by genotyping possible dams) was absent, highlighting the SNP test's suitability for paternity testing. Utilization of this parentage SNP assay will allow implementation of progeny testing into large commercial farms, where the improved accuracy of sire assignment and genetic evaluations will increase genetic gain in the sheep industry. PMID:24740141
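A minimal sketch of the LOD computation described above for one candidate sire, with unknown dam and biallelic SNPs: the log-likelihood of the offspring genotype given the sire's transmission probabilities, minus the log-likelihood under a random (Hardy-Weinberg) male. The error rate, toy frequencies, and the offspring-equals-sire example are illustrative assumptions, not the paper's data or exact model.

```python
import numpy as np

def lod_score(offspring, sire, p, err=0.01):
    """LOD = log L(candidate is sire) - log L(random male), over SNPs
    coded 0/1/2 copies of allele A with population frequencies p.
    Dam is treated as a random draw from the population; genotyping
    error is folded crudely into the transmission probabilities."""
    def transmit(gs):                    # P(sire passes allele A)
        return {0: err, 1: 0.5, 2: 1 - err}[int(gs)]
    ll_sire, ll_rand = 0.0, 0.0
    for go, gs, pa in zip(offspring, sire, p):
        t = transmit(gs)
        like = {0: (1 - t) * (1 - pa),
                1: t * (1 - pa) + (1 - t) * pa,
                2: t * pa}[int(go)]
        hwe = {0: (1 - pa) ** 2, 1: 2 * pa * (1 - pa), 2: pa ** 2}[int(go)]
        ll_sire += np.log(like + 1e-12)
        ll_rand += np.log(hwe + 1e-12)
    return ll_sire - ll_rand             # assign the sire with highest LOD

p = np.full(84, 0.5)                     # toy 84-SNP panel
rng = np.random.default_rng(0)
sire = rng.integers(0, 3, 84)
off = sire.copy()                        # toy offspring consistent with sire
print(round(lod_score(off, sire, p), 2))
```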
Hunnicutt, Jacob N; Ulbricht, Christine M; Chrysanthopoulou, Stavroula A; Lapane, Kate L
2016-12-01
We systematically reviewed pharmacoepidemiologic and comparative effectiveness studies that use probabilistic bias analysis to quantify the effects of systematic error, including confounding, misclassification, and selection bias, on study results. We found articles published between 2010 and October 2015 through a citation search using Web of Science and Google Scholar and a keyword search using PubMed and Scopus. Eligibility of studies was assessed by one reviewer. Three reviewers independently abstracted data from eligible studies. Fifteen studies used probabilistic bias analysis and were eligible for data abstraction: nine simulated an unmeasured confounder and six simulated misclassification. The majority of studies simulating an unmeasured confounder did not specify the range of plausible estimates for the bias parameters. Studies simulating misclassification were in general clearer when reporting the plausible distribution of bias parameters. Regardless of the bias simulated, the probability distributions assigned to bias parameters, the number of simulated iterations, sensitivity analyses, and diagnostics were not discussed in the majority of studies. Despite the prevalence of and concern about bias in pharmacoepidemiologic and comparative effectiveness studies, probabilistic bias analysis to quantitatively model the effect of bias was not widely used. The quality of reporting and use of this technique varied and was often unclear. Further discussion and dissemination of the technique are warranted. Copyright © 2016 John Wiley & Sons, Ltd.
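A minimal sketch of the technique under review, for the unmeasured-confounder case: draw the bias parameters from explicitly stated distributions, compute the standard bias factor for a binary confounder, and divide it out of the observed risk ratio over many iterations. The distributions and parameters below are illustrative toys; the review's point is precisely that they should be reported and justified.

```python
import numpy as np

def simulate_confounding(rr_obs, n_iter=100_000, seed=42):
    """Probabilistic bias analysis for an unmeasured binary confounder:
    bias factor = (p1*(RRcd-1)+1) / (p0*(RRcd-1)+1), where RRcd is the
    confounder-outcome risk ratio and p1/p0 its prevalence in the
    exposed/unexposed. Returns a 95% simulation interval for the
    bias-adjusted risk ratio."""
    rng = np.random.default_rng(seed)
    rr_cd = rng.lognormal(np.log(2.0), 0.2, n_iter)  # confounder-outcome RR
    p1 = rng.beta(4, 6, n_iter)                      # prevalence, exposed
    p0 = rng.beta(2, 8, n_iter)                      # prevalence, unexposed
    bias = (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)
    rr_adj = rr_obs / bias
    return np.percentile(rr_adj, [2.5, 50, 97.5])

print(simulate_confounding(rr_obs=1.5).round(2))
```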
Chen, Ruikun; Hara, Takashi; Ohsawa, Ryo; Yoshioka, Yosuke
2017-01-01
Diversity analysis of rapeseed accessions preserved in the Japanese Genebank can provide valuable information for breeding programs. In this study, 582 accessions were genotyped with 30 SSR markers covering all 19 rapeseed chromosomes. These markers amplified 311 alleles (10.37 alleles per marker; range, 3–39). The genetic diversity of Japanese accessions was lower than that of overseas accessions. Analysis of molecular variance indicated significant genetic differentiation between Japanese and overseas accessions. Small but significant differences were found among geographical groups in Japan, and genetic differentiation tended to increase with geographical distance. STRUCTURE analysis indicated the presence of two main genetic clusters in the NARO rapeseed collection. Using a membership probability threshold, 227 accessions mostly originating from overseas were assigned to one subgroup, and 276 accessions mostly originating from Japan were assigned to the other subgroup. The remaining 79 accessions were assigned to an admixed group. The constructed core collection comprises 96 accessions of diverse origin. It represents the whole collection well and thus may be useful for rapeseed genetic research and breeding programs. The core collection improves the efficiency of management, evaluation, and utilization of genetic resources. PMID:28744177
VanDeHey, Justin A.; Sloss, Brian L.; Peeters, Paul J.; Sutton, Trent M.
2009-01-01
Management of commercially exploited fish should be conducted at the stock level. If a mixed-stock fishery exists, a comprehensive mixed-stock analysis is required for stock-based management. The lake whitefish Coregonus clupeaformis comprises the primary commercial fishery across the Great Lakes. Recent research resolved that six genetic stocks of lake whitefish were present in Lake Michigan, and long-term tagging data indicate that Lake Michigan's lake whitefish commercial fishery is a mixed-stock fishery. The objective of this research was to determine the usefulness of microsatellite data for conducting comprehensive mixed-stock analyses of the Lake Michigan lake whitefish commercial fishery. We used the individual assignment method as implemented in the program ONCOR to determine the accuracy level at which microsatellite data can reliably identify component populations or stocks. Self-assignment of lake whitefish to their population and stock of origin ranged from >96% to 100%. Evaluation of genetic stock discreteness indicated a moderately high degree of correct assignment (average = 75%); simulations indicated that supplementing baseline data by ∼50 to 100 individuals could increase accuracy by up to 4.5%. Simulated mixed-stock commercial harvests with known stock composition showed a high degree of correct proportional assignment between observed and predicted harvest values. These data suggest that a comprehensive mixed-stock analysis of Lake Michigan's lake whitefish commercial fishery is viable and would provide valuable information for improving management.
Chao, Eunice; Krewski, Daniel
2008-12-01
This paper presents an exploratory evaluation of four functional components of a proposed risk-based classification scheme (RBCS) for crop-derived genetically modified (GM) foods in a concordance study. Two independent raters assigned concern levels to 20 reference GM foods using a rating form based on the proposed RBCS. The four components of evaluation were: (1) degree of concordance, (2) distribution across concern levels, (3) discriminating ability of the scheme, and (4) ease of use. At least one of the 20 reference foods was assigned to each of the possible concern levels, demonstrating the ability of the scheme to identify GM foods of different concern with respect to potential health risk. There was reasonably good concordance between the two raters for the three separate parts of the RBCS. The raters agreed that the criteria in the scheme were sufficiently clear in discriminating reference foods into different concern levels, and that with some experience, the scheme was reasonably easy to use. Specific issues and suggestions for improvements identified in the concordance study are discussed.
Probabilistic models of genetic variation in structured populations applied to global human studies.
Hao, Wei; Song, Minsun; Storey, John D
2016-03-01
Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation. We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new 'logistic factor analysis' framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions. A Bioconductor R package called lfa is available at http://www.bioconductor.org/packages/release/bioc/html/lfa.html. Contact: jstorey@princeton.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
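A toy sketch of the logistic factor idea: model the logit of the allele frequencies underlying genotypes (0/1/2, binomial with two trials) as a low-rank product of latent factors. The plain gradient-ascent fit below is an assumption for illustration; the actual lfa package uses a more careful PCA-based estimator.

```python
import numpy as np

def logistic_factor_sketch(G, d=3, n_iter=200, lr=0.1, seed=0):
    """Fit logit(pi) ~ L @ F of rank d to a genotype matrix G (n x m,
    coded 0/1/2) by gradient ascent on the binomial log-likelihood.
    The gradient with respect to the logits is simply G - 2*sigmoid."""
    rng = np.random.default_rng(seed)
    n, m = G.shape
    L = rng.normal(0, 0.1, (n, d))
    F = rng.normal(0, 0.1, (d, m))
    for _ in range(n_iter):
        P = 1 / (1 + np.exp(-(L @ F)))   # individual-specific frequencies
        R = G - 2 * P                     # log-likelihood gradient (logits)
        L += lr * (R @ F.T) / m
        F += lr * (L.T @ R) / n
    return 1 / (1 + np.exp(-(L @ F)))

G = np.random.default_rng(1).integers(0, 3, (20, 50))
print(logistic_factor_sketch(G).mean().round(3))
```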
Khan, F I; Abbasi, S A
2000-07-10
Fault tree analysis (FTA) is based on constructing a hypothetical tree of base events (initiating events) branching into numerous other sub-events, propagating the fault and eventually leading to the top event (accident). It has been a powerful technique used traditionally in identifying hazards in nuclear installations and power industries. As the systematic articulation of the fault tree is associated with assigning probabilities to each fault, the exercise is also sometimes called probabilistic risk assessment. But powerful as this technique is, it is also very cumbersome and costly, limiting its area of application. We have developed a new algorithm based on analytical simulation (named AS-II), which makes the application of FTA simpler, quicker, and cheaper, thus opening up the possibility of its wider use in risk assessment in chemical process industries. Based on this methodology we have developed a computer-automated tool. The details are presented in this paper.
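For readers unfamiliar with the basic arithmetic of FTA, the sketch below computes a top-event probability from basic-event probabilities under AND/OR gates, assuming independent basic events. The tree encoding and numbers are illustrative, not the AS-II algorithm itself.

```python
def top_event_probability(gate):
    """Top-event probability of a fault tree with independent basic
    events. A node is either a float (basic-event probability) or a
    tuple ('AND' | 'OR', [children])."""
    if isinstance(gate, float):
        return gate
    op, children = gate
    probs = [top_event_probability(c) for c in children]
    out = 1.0
    if op == "AND":                       # all children must occur
        for p in probs:
            out *= p
        return out
    for p in probs:                        # 'OR': at least one occurs
        out *= (1.0 - p)
    return 1.0 - out

# (pump fails AND valve stuck) OR operator error, toy probabilities
tree = ("OR", [("AND", [0.01, 0.05]), 0.002])
print(f"{top_event_probability(tree):.6f}")
```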
Plasticity in probabilistic reaction norms for maturation in a salmonid fish.
Morita, Kentaro; Tsuboi, Jun-ichi; Nagasawa, Toru
2009-10-23
The relationship between body size and the probability of maturing, often referred to as the probabilistic maturation reaction norm (PMRN), has been increasingly used to infer genetic variation in maturation schedule. Despite this trend, few studies have directly evaluated plasticity in the PMRN. A transplant experiment using white-spotted charr demonstrated that the PMRN for precocious males exhibited plasticity. A smaller threshold size at maturity occurred in charr inhabiting narrow streams where more refuges are probably available for small charr, which in turn might enhance the reproductive success of sneaker precocious males. Our findings suggested that plastic effects should clearly be included in investigations of variation in PMRNs. PMID:19493875
Rafiei, Vahideh; Banihashemi, Ziaeddin; Bautista-Jalon, Laura S; Del Mar Jiménez-Gasco, Maria; Turgeon, B Gillian; Milgroom, Michael G
2018-06-01
Verticillium dahliae is a plant pathogenic fungus that reproduces asexually and whose population structure is highly clonal. In the present study, 78 V. dahliae isolates from Iran were genotyped for mating type, single nucleotide polymorphisms (SNPs), and microsatellites to assign them to clonal lineages and to determine population genetic structure in Iran. The mating type of all isolates was MAT1-2. Based on neighbor-joining analysis and minimum spanning networks constructed from SNPs and microsatellite genotypes, respectively, all but four isolates were assigned to lineage 2B824; four isolates were assigned to lineage 4B. The inferred coalescent genealogy of isolates in lineage 2B824 showed a clear divergence into two clades that corresponded to geographic origin and host. Haplotypes of cotton and pistachio isolates sampled from central Iran were in one clade, and those of isolates from Prunus spp. sampled from northwestern Iran were in the other. The strong divergence in haplotypes between the two clades suggests that there were at least two separate introductions of lineage 2B824 to different parts of Iran. Given the history of cotton and pistachio cultivation and Verticillium wilt in Iran, these results are consistent with the hypothesis that cotton was historically a likely source of inoculum causing Verticillium wilt in pistachio.
Jakob, Sabine S.; Rödder, Dennis; Engler, Jan O.; Shaaf, Salar; Özkan, Hakan; Blattner, Frank R.; Kilian, Benjamin
2014-01-01
Studies of Hordeum vulgare subsp. spontaneum, the wild progenitor of cultivated barley, have mostly relied on materials collected decades ago and maintained since then ex situ in germplasm repositories. We analyzed spatial genetic variation in wild barley populations collected rather recently, exploring sequence variation at seven single-copy nuclear loci, and inferred the relationships among these populations and toward the genepool of the crop. The wild barley collection covers the whole natural distribution area from the Mediterranean to Middle Asia. In contrast to earlier studies, Bayesian assignment analyses revealed three population clusters, in the Levant, Turkey, and east of Turkey, respectively. Genetic diversity was exceptionally high in the Levant, while eastern populations were depleted of private alleles. Species distribution modeling based on climate parameters and extant occurrence points of the taxon inferred suitable habitat conditions during the ice age, particularly in the Levant and Turkey. Together with the ecologically wide range of habitats, these conditions might have contributed to structured but long-term stable populations in this region and their high genetic diversity. For recently collected individuals, Bayesian assignment to geographic clusters was generally unambiguous, but genebank materials often included accessions that were not placed according to their assumed geographic origin or that showed traces of introgression from cultivated barley. We attribute this to gene flow among accessions during ex situ maintenance. Evolutionary studies based on such materials might therefore result in wrong conclusions regarding the history of the species or the origin and mode of domestication of the crop, depending on the accessions included. PMID:24586028
Satellite-map position estimation for the Mars rover
NASA Technical Reports Server (NTRS)
Hayashi, Akira; Dean, Thomas
1989-01-01
A method for locating the Mars rover using an elevation map generated from satellite data is described. In exploring its environment, the rover is assumed to generate a local rover-centered elevation map that can be used to extract information about the relative position and orientation of landmarks corresponding to local maxima. These landmarks are integrated into a stochastic map which is then matched with the satellite map to obtain an estimate of the robot's current location. The landmarks are not explicitly represented in the satellite map. The results of the matching algorithm correspond to a probabilistic assessment of whether or not the robot is located within a given region of the satellite map. By assigning a probabilistic interpretation to the information stored in the satellite map, researchers are able to provide a precise characterization of the results computed by the matching algorithm.
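A minimal sketch of the probabilistic interpretation described above: maintain a prior over candidate map cells and update it by Bayes' rule using the likelihood of a locally observed landmark elevation against the satellite map under Gaussian noise. This toy matches a single scalar observation per cell; the paper's method matches a full stochastic landmark map, so the sketch only illustrates the probabilistic assessment step.

```python
import numpy as np

def location_posterior(prior, local_obs, satellite_map, sigma=2.0):
    """Posterior over satellite-map cells after observing one landmark
    elevation with Gaussian noise: posterior ~ prior * likelihood,
    renormalised over the grid."""
    like = np.exp(-0.5 * ((satellite_map - local_obs) / sigma) ** 2)
    post = prior * like
    return post / post.sum()

sat = np.array([[10., 12., 30.],
                [11., 29., 31.],
                [12., 30., 55.]])           # toy elevation map
prior = np.full_like(sat, 1 / sat.size)     # uniform prior over cells
post = location_posterior(prior, local_obs=30.0, satellite_map=sat)
print(post.round(3))    # mass concentrates on cells near elevation 30
```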
Learning Bayesian Networks from Correlated Data
NASA Astrophysics Data System (ADS)
Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola
2016-05-01
Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.
De Kort, Hanne; Mergeay, Joachim; Jacquemyn, Hans; Honnay, Olivier
2016-01-01
Background and Aims: Many invasive species severely threaten native biodiversity and ecosystem functioning. One of the most prominent questions in invasion genetics is how invasive populations can overcome genetic founder effects to establish stable populations after colonization of new habitats. High native genetic diversity and multiple introductions are expected to increase genetic diversity and adaptive potential in the invasive range. Our aim was to identify the European source populations of Frangula alnus (glossy buckthorn), an ornamental and highly invasive woody species that was deliberately introduced into North America at the end of the 18th century. A second aim of this study was to assess the adaptive potential as an explanation for the invasion success of this species. Methods: Using a set of annotated single-nucleotide polymorphisms (SNPs) that were assigned a putative function based on sequence comparison with model species, a total of 38 native European and 21 invasive North American populations were subjected to distance-based structure and assignment analyses combined with population genomic tools. Genetic diversity at SNPs with ecologically relevant functions was considered as a proxy for adaptive potential. Key Results: Patterns of invasion coincided with early modern transatlantic trading routes. Multiple introductions through transatlantic trade from a limited number of European port regions to American urban areas led to the establishment of bridgehead populations with high allelic richness and expected heterozygosity, allowing continuous secondary migration to natural areas. Conclusions: Targeted eradication of the urban populations, where the highest genetic diversity and adaptive potential were observed, offers a promising strategy to arrest further invasion of native American prairies and forests. PMID:27539599
Microbial species delineation using whole genome sequences
Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita
2015-01-01
Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF and gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF- and gANI-based species definition, the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
Probabilistic liquefaction triggering based on the cone penetration test
Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Tokimatsu, K.
2005-01-01
Performance-based earthquake engineering requires a probabilistic treatment of potential failure modes in order to accurately quantify the overall stability of the system. This paper is a summary of the application portions of the probabilistic liquefaction triggering correlations recently proposed by Moss and co-workers. To enable probabilistic treatment of liquefaction triggering, the variables comprising the seismic load and the liquefaction resistance were treated as inherently uncertain. Supporting data from an extensive Cone Penetration Test (CPT)-based liquefaction case history database were used to develop a probabilistic correlation. The methods used to measure the uncertainty of the load and resistance variables, how the interactions of these variables were treated using Bayesian updating, and how reliability analysis was applied to produce curves of equal probability of liquefaction are presented. The normalization for effective overburden stress, the magnitude-correlated duration weighting factor, and the non-linear shear mass participation factor used are also discussed.
Undiscovered porphyry copper resources in the Urals—A probabilistic mineral resource assessment
Hammarstrom, Jane M.; Mihalasky, Mark J.; Ludington, Stephen; Phillips, Jeffrey; Berger, Byron R.; Denning, Paul; Dicken, Connie; Mars, John; Zientek, Michael L.; Herrington, Richard J.; Seltmann, Reimar
2017-01-01
A probabilistic mineral resource assessment of metal resources in undiscovered porphyry copper deposits of the Ural Mountains in Russia and Kazakhstan was done using a quantitative form of mineral resource assessment. Permissive tracts were delineated on the basis of mapped and inferred subsurface distributions of igneous rocks assigned to tectonic zones that include magmatic arcs where the occurrence of porphyry copper deposits within 1 km of the Earth's surface is possible. These permissive tracts outline four north-south trending volcano-plutonic belts in major structural zones of the Urals. From west to east, these include permissive lithologies for porphyry copper deposits associated with Paleozoic subduction-related island-arc complexes preserved in the Tagil and Magnitogorsk arcs, Paleozoic island-arc fragments and associated tonalite-granodiorite intrusions in the East Uralian zone, and Carboniferous continental-margin arcs developed on the Kazakh craton in the Transuralian zone. The tracts range from about 50,000 to 130,000 km2 in area. The Urals host 8 known porphyry copper deposits with total identified resources of about 6.4 million metric tons of copper, at least 20 additional porphyry copper prospect areas, and numerous copper-bearing skarns and copper occurrences. Probabilistic estimates predict a mean of 22 undiscovered porphyry copper deposits within the four permissive tracts delineated in the Urals. Combining estimates with established grade and tonnage models predicts a mean of 82 million metric tons of undiscovered copper. Application of an economic filter suggests that about half of that amount could be economically recoverable based on assumed depth distributions, availability of infrastructure, recovery rates, current metals prices, and investment environment.
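The core arithmetic of such an assessment is a Monte Carlo combination of an undiscovered-deposit count with a grade-tonnage model, as sketched below. The distribution choices and parameter values are illustrative toys, not the assessment's calibrated estimates, so the output will not reproduce the 82 Mt mean reported above.

```python
import numpy as np

def undiscovered_copper(n_sims=5_000, seed=7):
    """Monte Carlo aggregation: draw a number of undiscovered deposits,
    then draw a tonnage and copper grade for each and sum the contained
    metal. Returns 10th/50th/90th percentiles in Mt of copper."""
    rng = np.random.default_rng(seed)
    n_deposits = rng.poisson(22, n_sims)              # mean 22 deposits
    totals = np.zeros(n_sims)
    for i, n in enumerate(n_deposits):
        tons = rng.lognormal(np.log(100e6), 1.0, n)   # tonnage per deposit
        grade = rng.lognormal(np.log(0.004), 0.4, n)  # Cu mass fraction
        totals[i] = np.sum(tons * grade)
    return np.percentile(totals, [10, 50, 90]) / 1e6

print(undiscovered_copper().round(1))
```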
Use of EST-SSR loci flanking regions for phylogenetic analysis of genus Arachis
USDA-ARS's Scientific Manuscript database
All wild peanut collections in the genus Arachis were assigned to nine taxonomic sections on the basis of cross-compatibility and morphological character clustering. These nine sections consist of 80 species, from the most ancient to the most advanced, providing a diverse genetic resource for phylogenet...
Sequence similarity is more relevant than species specificity in probabilistic backtranslation.
Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco
2007-02-21
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous, since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. This paper describes EasyBack, a new parameter-free, fully automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most-used codon in the same species with a Hidden Markov Model trained with a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
Process for computing geometric perturbations for probabilistic analysis
Fitch, Simeon H. K. (Charlottesville, VA); Riha, David S. (San Antonio, TX); Thacker, Ben H. (San Antonio, TX)
2012-04-10
A method for computing geometric perturbations for probabilistic analysis. The probabilistic analysis is based on finite element modeling, in which uncertainties in the modeled system are represented by changes in the nominal geometry of the model, referred to as "perturbations". These changes are accomplished using displacement vectors, which are computed for each node of a region of interest and are based on mean-value coordinate calculations.
Lynn, Denis H; Wright, André-Denis G
2013-01-01
There are over 100 species in the Order Clevelandellida distributed in many hosts. The majority is assigned to one of the five families, the Nyctotheridae. Our knowledge of clevelandellid genetic diversity is limited to species of Nyctotherus and Nyctotheroides. To increase our understanding of clevelandellid genetic diversity, species were isolated from intestines of the Australian wood-feeding roach Panesthia cribrata Saussure, 1864 from August to October, 2008. Four morphospecies, similar to those reported in Java and Japan by Kidder [Parasitologica, 29:163-205], were identified: Clevelandella constricta, Clevelandella nipponensis, Clevelandella parapanesthiae, and Clevelandella panesthiae. Small subunit rRNA gene sequences assigned all species to a "family" clade that was sister to the clade of species assigned to the Family Nyctotheridae in the Order Clevelandellida. Genetics and morphology were consistent for the first three Clevelandella species, but isolates assigned to C. panesthiae were assignable to three different genotypes, suggesting that this may be a cryptic species complex. © 2013 The Author(s) Journal of Eukaryotic Microbiology © 2013 International Society of Protistologists.
Technical Report 1205: A Simple Probabilistic Combat Model
2016-07-08
1. INTRODUCTION: The Lanchester combat model is a simple way to assess the effects of quantity and quality...model. For the random case, assume R red weapons are allocated to B blue weapons randomly. We are interested in the distribution of weapons assigned...the initial condition is very close to the break-even line. What is more interesting is that the probability density tends to concentrate at either a
A Technique for Developing Probabilistic Properties of Earth Materials
1988-04-01
Department of Civil Engineering. Responsibility for coordinating this program was assigned to Mr. A. E. Jackson, Jr., GD, under the supervision of Dr... [The remainder of this excerpt is a garbled notation list; the recoverable symbol definitions are: E = expected value; F = ratio of the between-sample variance to the within-sample variance; true radial strain and axial strain, computed assuming deformation as a right circular cylinder; number of increments in the covariance analysis; VL = loading Poisson's ratio; VUN = unloading Poisson's ratio.]
A three-way approach for protein function classification.
Ur Rehman, Hafeez; Azam, Nouman; Yao, JingTao; Benso, Alfredo
2017-01-01
The knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Due to the expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for automatic assignment of functions to proteins. The technological advancements in the field of biology are improving our understanding of biological processes and are regularly resulting in new features and characteristics that better describe the role of proteins. Designing more effective classification techniques requires that these anticipated features not be neglected or overlooked. A key issue in this context, which is not being sufficiently addressed, is how to build effective classification models and approaches for protein function prediction by incorporating and taking advantage of the ever-evolving biological information. In this article, we propose a three-way decision-making approach which provides provisions for seeking and incorporating future information. We considered probabilistic rough set based models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture of protein function classification with probabilistic rough sets based three-way decisions is proposed and explained. Experiments are carried out on a Saccharomyces cerevisiae species dataset obtained from the Uniprot database, with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases is reduced while a similar level of accuracy is maintained. PMID:28234929
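The essence of a three-way decision rule is a pair of probability thresholds: accept above one, reject below the other, and defer in between so a case can be revisited when more biological information arrives. The sketch below illustrates that rule with assumed threshold values; GTRS and ITRS derive the (alpha, beta) pair from game-theoretic or information-theoretic criteria rather than fixing them by hand.

```python
def three_way_decision(prob, alpha=0.75, beta=0.45):
    """Three-way rule over P(function | evidence): accept when the
    probability is at least alpha, reject when it is at most beta,
    and defer otherwise. Thresholds here are illustrative."""
    if prob >= alpha:
        return "accept"
    if prob <= beta:
        return "reject"
    return "defer"

for p in (0.9, 0.6, 0.3):
    print(p, "->", three_way_decision(p))
```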
Mezlini, Aziz M; Goldenberg, Anna
2017-10-01
Discovering genetic mechanisms driving complex diseases is a hard problem. Existing methods often lack power to identify the set of responsible genes. Protein-protein interaction networks have been shown to boost power when detecting gene-disease associations. We introduce a Bayesian framework, Conflux, to find disease-associated genes from exome sequencing data using networks as a prior. There are two main advantages to using networks within a probabilistic graphical model. First, networks are noisy and incomplete, a substantial impediment to gene discovery. Incorporating networks into the structure of a probabilistic model for gene inference has less impact on the solution than relying on the noisy network structure directly. Second, using a Bayesian framework we can keep track of the uncertainty of each gene being associated with the phenotype rather than returning a fixed list of genes. We first show that using networks clearly improves gene detection compared to individual gene testing. We then show consistently improved performance of Conflux compared to the state-of-the-art diffusion network-based method Hotnet2 and a variety of other network and variant aggregation methods, using randomly generated and literature-reported gene sets. We test Hotnet2 and Conflux on several network configurations to reveal biases and patterns of false positives and false negatives in each case. Our experiments show that our novel Bayesian framework Conflux incorporates many of the advantages of the current state-of-the-art methods, while offering more flexibility and improved power in many gene-disease association scenarios.
Tracing Asian Seabass Individuals to Single Fish Farms Using Microsatellites
Yue, Gen Hua; Xia, Jun Hong; Liu, Peng; Liu, Feng; Sun, Fei; Lin, Grace
2012-01-01
Traceability through physical labels is well established, but it is not highly reliable, as physical labels can be easily changed or lost. Application of DNA markers to the traceability of food plays an increasingly important role in consumer protection and confidence building. In this study, we tested the efficiency of 16 polymorphic microsatellites and their combinations for tracing 368 fish to the four populations where they originated. Using the maximum likelihood and Bayesian methods, the three most efficient microsatellites were required to assign over 95% of fish to the correct populations. Selection of markers based on the assignment score estimated with the software WHICHLOCI was most effective in choosing markers for individual assignment, followed by selection based on the allele number of individual markers. By combining rapid DNA extraction and high-throughput genotyping of selected microsatellites, it is possible to conduct routine genetic traceability with high accuracy in Asian seabass. PMID:23285169
NASA Technical Reports Server (NTRS)
Singhal, Surendra N.
2003-01-01
The SAE G-11 RMSL Division and Probabilistic Methods Committee meeting, sponsored by the Picatinny Arsenal during March 1-3, 2004 at the Westin Morristown, will report progress on projects for probabilistic assessment of Army systems and launch an initiative for probabilistic education. The meeting features several Army and industry senior executives and an Ivy League professor, providing an industry/government/academia forum to review RMSL technology; reliability and probabilistic technology; reliability-based design methods; software reliability; and maintainability standards. With over 100 members, including members of national and international standing, the mission of the G-11's Probabilistic Methods Committee is to enable and facilitate rapid deployment of probabilistic technology to enhance the competitiveness of our industries through better, faster, greener, smarter, affordable and reliable product development.
Satellite Based Probabilistic Snow Cover Extent Mapping (SCE) at Hydro-Québec
NASA Astrophysics Data System (ADS)
Teasdale, Mylène; De Sève, Danielle; Angers, Jean-François; Perreault, Luc
2016-04-01
Over 40% of Canada's water resources are in Quebec, and Hydro-Québec has the potential to become one of the largest producers of hydroelectricity in the world, with a total installed capacity of 36,643 MW. The Hydro-Québec generating fleet includes 27 large reservoirs with a combined storage capacity of 176 TWh, as well as 668 dams and 98 control structures. Over 98% of all electricity used to supply the domestic market comes from water resources, and the excess output is sold on wholesale markets. In this perspective, efficient management of water resources is needed, based primarily on good river flow estimation and appropriate hydrological data. Snow on the ground is one of the significant variables, representing 30% to 40% of the annual energy reserve. More specifically, information on snow cover extent (SCE) and snow water equivalent (SWE) is crucial for hydrological forecasting, particularly in northern regions, since the snowmelt provides the water that fills the reservoirs and is subsequently used for hydropower generation. For several years, Hydro-Québec's research institute (IREQ) has developed several algorithms to map SCE and SWE. So far, all of these methods have been deterministic. However, given the need to maximize the efficient use of all resources while ensuring reliability, electrical systems must now be managed taking all risks into account. Since snow cover estimation is based on limited spatial information, it is important to quantify and handle its uncertainty in the hydrological forecasting system. This paper presents the first results of a probabilistic algorithm for mapping SCE by combining a Bayesian mixture of probability distributions and multiple logistic regression models applied to passive microwave data. This approach assigns, for each grid point, probabilities to the set of mutually exclusive discrete outcomes "snow" and "no snow". Its performance was evaluated using the Brier score, since it is particularly appropriate for measuring the accuracy of probabilistic discrete predictions. The scores were measured by comparing the snow probabilities produced by our models with Hydro-Québec's snow ground data.
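The Brier score used for evaluation is simply the mean squared difference between forecast probabilities and observed binary outcomes, as the sketch below shows on toy snow/no-snow data (the values are illustrative, not Hydro-Québec's).

```python
import numpy as np

def brier_score(probabilities, outcomes):
    """Brier score for probabilistic binary predictions: mean squared
    difference between forecast probabilities and observed outcomes
    (1 = snow). 0 is perfect; a constant 0.5 forecast scores 0.25."""
    p = np.asarray(probabilities, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return np.mean((p - o) ** 2)

p_snow = np.array([0.9, 0.8, 0.2, 0.6, 0.1])   # per grid point
obs = np.array([1, 1, 0, 0, 0])                 # ground truth
print(round(brier_score(p_snow, obs), 3))
```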
Diway, Bibian; Khoo, Eyen
2017-01-01
The development of timber tracking methods based on genetic markers can provide scientific evidence to verify the origin of timber products and fulfill the growing requirement for sustainable forestry practices. In this study, the origin of an important Dark Red Meranti wood, Shorea platyclados, was studied using a combination of seven chloroplast DNA markers and 15 short tandem repeat (STR) markers. A total of 27 natural populations of S. platyclados were sampled throughout Malaysia to establish population-level and individual-level identification databases. A haplotype map was generated from chloroplast DNA sequencing for population identification, resulting in 29 multilocus haplotypes based on 39 informative intraspecific variable sites. Subsequently, a DNA profiling database was developed from the 15 STRs, allowing for individual identification in Malaysia. Cluster analysis divided the 27 populations into two genetic clusters, corresponding to the regions of Eastern and Western Malaysia. The conservativeness tests showed that the Malaysia database is conservative after removal of bias from population subdivision and sampling effects. Independent self-assignment tests correctly assigned individuals to the database in 60.60% to 94.95% of cases overall for identified populations, and in 98.99% to 99.23% of cases for identified regions. Both the chloroplast DNA database and the STRs appear to be useful for tracking timber originating in Malaysia. Hence, this DNA-based method could serve as an effective additional tool within the existing forensic timber identification system for ensuring the sustainable management of this species into the future. PMID:28430826
Plasticity in probabilistic reaction norms for maturation in a salmonid fish
Morita, Kentaro; Tsuboi, Jun-ichi; Nagasawa, Toru
2009-01-01
The relationship between body size and the probability of maturing, often referred to as the probabilistic maturation reaction norm (PMRN), has been increasingly used to infer genetic variation in maturation schedule. Despite this trend, few studies have directly evaluated plasticity in the PMRN. A transplant experiment using white-spotted charr demonstrated that the PMRN for precocious males exhibited plasticity. A smaller threshold size at maturity occurred in charr inhabiting narrow streams where more refuges are probably available for small charr, which in turn might enhance the reproductive success of sneaker precocious males. Our findings suggested that plastic effects should clearly be included in investigations of variation in PMRNs. PMID:19493875
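A PMRN of this kind is commonly estimated by logistic regression of maturation status on body size, with the size at which the fitted probability crosses 50% taken as the maturation threshold. A minimal sketch on simulated data (all values hypothetical, not the charr measurements):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
length = rng.uniform(60, 160, 300)               # body length, mm (hypothetical)
p_true = 1 / (1 + np.exp(-(length - 110) / 8))   # underlying maturation probability
mature = rng.random(300) < p_true                # observed maturation status

model = LogisticRegression().fit(length.reshape(-1, 1), mature.astype(int))
b0, b1 = model.intercept_[0], model.coef_[0][0]
lp50 = -b0 / b1   # size at 50% maturation probability: the reaction-norm midpoint
print(f"Lp50 ≈ {lp50:.1f} mm")
```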
Superposition-Based Analysis of First-Order Probabilistic Timed Automata
NASA Astrophysics Data System (ADS)
Fietzke, Arnaud; Hermanns, Holger; Weidenbach, Christoph
This paper discusses the analysis of first-order probabilistic timed automata (FPTA) by a combination of hierarchic first-order superposition-based theorem proving and probabilistic model checking. We develop the overall semantics of FPTAs and prove soundness and completeness of our method for reachability properties. Basically, we decompose FPTAs into their time plus first-order logic aspects on the one hand, and their probabilistic aspects on the other hand. Then we exploit the time plus first-order behavior by hierarchic superposition over linear arithmetic. The result of this analysis is the basis for the construction of a reachability equivalent (to the original FPTA) probabilistic timed automaton to which probabilistic model checking is finally applied. The hierarchic superposition calculus required for the analysis is sound and complete on the first-order formulas generated from FPTAs. It even works well in practice. We illustrate the potential behind it with a real-life DHCP protocol example, which we analyze by means of tool chain support.
vonHoldt, Bridgett M; Stahler, Daniel R; Bangs, Edward E; Smith, Douglas W; Jimenez, Mike D; Mack, Curt M; Niemeyer, Carter C; Pollinger, John P; Wayne, Robert K
2010-10-01
The successful re-introduction of grey wolves to the western United States is an impressive accomplishment for conservation science. However, the degree to which subpopulations are genetically structured and connected, along with the preservation of genetic variation, is an important concern for the continued viability of the metapopulation. We analysed DNA samples from 555 Northern Rocky Mountain wolves from the three recovery areas (Greater Yellowstone Area, Montana, and Idaho), including all 66 re-introduced founders, for variation in 26 microsatellite loci over the initial 10-year recovery period (1995-2004). The population maintained high levels of variation (H(O) = 0.64-0.72; allelic diversity k=7.0-10.3) with low levels of inbreeding (F(IS) < 0.03) and throughout this period, the population expanded rapidly (n(1995) =101; n(2004) =846). Individual-based Bayesian analyses revealed significant population genetic structure and identified three subpopulations coinciding with designated recovery areas. Population assignment and migrant detection were difficult because of the presence of related founders among different recovery areas and required a novel approach to determine genetically effective migration and admixture. However, by combining assignment tests, private alleles, sibship reconstruction, and field observations, we detected genetically effective dispersal among the three recovery areas. Successful conservation of Northern Rocky Mountain wolves will rely on management decisions that promote natural dispersal dynamics and minimize anthropogenic factors that reduce genetic connectivity. © 2010 Blackwell Publishing Ltd.
CPT-based probabilistic and deterministic assessment of in situ seismic soil liquefaction potential
Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Der Kiureghian, A.; Cetin, K.O.
2006-01-01
This paper presents a complete methodology for both probabilistic and deterministic assessment of seismic soil liquefaction triggering potential based on the cone penetration test (CPT). A comprehensive worldwide set of CPT-based liquefaction field case histories was compiled and back-analyzed, and the data were then used to develop probabilistic triggering correlations. Issues investigated in this study include improved normalization of CPT resistance measurements for the influence of effective overburden stress, and adjustment to CPT tip resistance for the potential influence of "thin" liquefiable layers. The effects of soil type and soil character (i.e., "fines" adjustment) for the new correlations are based on a combination of CPT tip and sleeve resistance. To quantify probability for performance-based engineering applications, Bayesian "regression" methods were used, and the uncertainties of all variables comprising both the seismic demand and the liquefaction resistance were estimated and included in the analysis. The resulting correlations were developed using a Bayesian framework and are presented in both probabilistic and deterministic formats. The results are compared to previous probabilistic and deterministic correlations. © 2006 ASCE.
Lung Cancer Assistant: a hybrid clinical decision support application for lung cancer care.
Sesen, M Berkan; Peake, Michael D; Banares-Alcantara, Rene; Tse, Donald; Kadir, Timor; Stanley, Roz; Gleeson, Fergus; Brady, Michael
2014-09-06
Multidisciplinary team (MDT) meetings are becoming the model of care for cancer patients worldwide. While MDTs have improved the quality of cancer care, the meetings impose substantial time pressure on the members, who generally attend several such MDTs. We describe Lung Cancer Assistant (LCA), a clinical decision support (CDS) prototype designed to assist the experts in treatment selection decisions in lung cancer MDTs. A novel feature of LCA is its ability to provide rule-based and probabilistic decision support within a single platform. The guideline-based CDS is based on clinical guideline rules, while the probabilistic CDS is based on a Bayesian network trained on the English Lung Cancer Audit Database (LUCADA). We assess rule-based and probabilistic recommendations based on their concordance with the treatments recorded in LUCADA. Our results reveal that the guideline rule-based recommendations perform well in simulating the recorded treatments, with exact and partial concordance rates of 0.57 and 0.79, respectively. On the other hand, the exact and partial concordance rates achieved with probabilistic results are poorer, at 0.27 and 0.76. However, probabilistic decision support fulfils a complementary role in providing accurate survival estimations. Compared to recorded treatments, both CDS approaches promote higher resection rates and multimodality treatments.
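Concordance here is agreement between recommended and recorded treatments; a minimal sketch of how exact and partial concordance rates might be tallied (the treatment sets are hypothetical, not LUCADA records):

```python
def concordance_rates(recommended, recorded):
    """Exact: the recommendation equals the recorded treatment set.
    Partial: at least one recommended modality appears in the record."""
    exact = sum(set(a) == set(b) for a, b in zip(recommended, recorded))
    partial = sum(bool(set(a) & set(b)) for a, b in zip(recommended, recorded))
    n = len(recorded)
    return exact / n, partial / n

# Hypothetical MDT cases: each entry is a set of treatment modalities
recommended = [{"surgery"}, {"chemo", "radio"}, {"chemo"}, {"radio"}]
recorded    = [{"surgery"}, {"chemo"},          {"chemo"}, {"surgery"}]
print(concordance_rates(recommended, recorded))  # (0.5, 0.75)
```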
Zhang, Xuejun; Lei, Jiaxing
2015-01-01
Considering the reduction of airspace congestion and flight delay simultaneously, this paper formulates the airway network flow assignment (ANFA) problem as a multiobjective optimization model and presents a new multiobjective optimization framework to solve it. Firstly, an effective multi-island parallel evolution algorithm with multiple evolution populations is employed to improve the optimization capability. Secondly, the nondominated sorting genetic algorithm II is applied for each population. In addition, a cooperative coevolution algorithm is adapted to divide the ANFA problem into several low-dimensional biobjective optimization problems which are easier to deal with. Finally, in order to maintain the diversity of solutions and to avoid prematurity, a dynamic adjustment operator based on solution congestion degree is specifically designed for the ANFA problem. Simulation results using real traffic data from the China air route network and daily flight plans demonstrate that the proposed approach can improve the solution quality effectively, showing superiority to existing approaches such as the multiobjective genetic algorithm, the well-known multiobjective evolutionary algorithm based on decomposition, and a cooperative coevolution multiobjective algorithm, as well as other parallel evolution algorithms with different migration topology. PMID:26180840
A Probabilistic Feature Map-Based Localization System Using a Monocular Camera.
Kim, Hyungjin; Lee, Donghwa; Oh, Taekjun; Choi, Hyun-Taek; Myung, Hyun
2015-08-31
Image-based localization is one of the most widely researched localization techniques in the robotics and computer vision communities. As enormous image data sets are provided through the Internet, many studies on estimating a location with a pre-built image-based 3D map have been conducted. Most research groups use numerous image data sets that contain sufficient features. In contrast, this paper focuses on image-based localization in the case of insufficient images and features. A more accurate localization method is proposed based on a probabilistic map using 3D-to-2D matching correspondences between a map and a query image. The probabilistic feature map is generated in advance by probabilistic modeling of the sensor system as well as the uncertainties of camera poses. Using the conventional PnP algorithm, an initial camera pose is estimated on the probabilistic feature map. The proposed algorithm is optimized from the initial pose by minimizing Mahalanobis distance errors between features from the query image and the map to improve accuracy. To verify that the localization accuracy is improved, the proposed algorithm is compared with the conventional algorithm in both simulated and real environments. PMID:26404284
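Pose refinement of the kind described typically minimizes Mahalanobis distances between predicted and observed feature positions, so each correspondence is weighted by its covariance from the probabilistic map. A minimal sketch under simplifying assumptions (translation-only pose, unit focal length, and made-up map points and covariances, not the paper's sensor model):

```python
import numpy as np
from scipy.optimize import least_squares

def project(pose, pts3d):
    """Toy pinhole projection: pose = (tx, ty, tz), identity rotation, f = 1."""
    p = pts3d + pose             # translate map points into the camera frame
    return p[:, :2] / p[:, 2:3]  # perspective divide

def mahalanobis_residuals(pose, pts3d, obs2d, covs):
    res = project(pose, pts3d) - obs2d
    # Whiten each 2D residual so its squared norm equals r^T C^-1 r
    return np.concatenate([np.linalg.cholesky(np.linalg.inv(C)).T @ r
                           for r, C in zip(res, covs)])

pts3d = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 6.0], [-1.0, 0.5, 4.0]])
covs = [np.eye(2) * s for s in (0.01, 0.04, 0.02)]   # per-feature uncertainty
obs2d = project(np.array([0.1, -0.05, 0.2]), pts3d)  # simulated observations
fit = least_squares(mahalanobis_residuals, x0=np.zeros(3), args=(pts3d, obs2d, covs))
print(fit.x)  # recovers ≈ [0.1, -0.05, 0.2]
```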
A novel probabilistic framework for event-based speech recognition
NASA Astrophysics Data System (ADS)
Juneja, Amit; Espy-Wilson, Carol
2003-10-01
One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary-valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features (syllabic, sonorant, and continuant) and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.
Genetic and Environmental Influences on Early Literacy Skills across School Grade Contexts
ERIC Educational Resources Information Center
Haughbrook, Rasheda; Hart, Sara A.; Schatschneider, Christopher; Taylor, Jeanette
2017-01-01
Recent research suggests that the etiology of reading achievement can differ across environmental contexts. In the US, schools are commonly assigned grades (e.g. "A," "B") often interpreted to indicate school quality. This study explored differences in the etiology of early literacy skills for students based on these school…
Gender identification of Caspian Terns using external morphology and discriminant function analysis
Ackerman, Joshua T.; Takekawa, John Y.; Bluso, J.D.; Yee, J.L.; Eagles-Smith, Collin A.
2008-01-01
Caspian Tern (Sterna caspia) plumage characteristics are sexually monochromatic, and gender cannot easily be distinguished in the field without extensive behavioral observations. We assessed sexual size dimorphism and developed a discriminant function to assign gender in Caspian Terns based on external morphology. We collected and measured Caspian Terns in San Francisco Bay, California, and confirmed their gender based on necropsy and genetic analysis. Of the eight morphological measurements we examined, only bill depth at the gonys and head plus bill length differed between males and females, with males being larger than females. A discriminant function using both bill depth at the gonys and head plus bill length accurately assigned gender of 83% of terns for which gender was known. We improved the accuracy of our discriminant function to 90% by excluding individuals that had less than a 75% posterior probability of correctly being assigned to gender. Caspian Terns showed little sexual size dimorphism in many morphometrics, but our results indicate they can be reliably assigned to gender in the field using two morphological measurements.
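A discriminant function of this sort is a linear combination of the two measurements with a cutpoint between the sexes, and accuracy can be raised by classifying only birds whose posterior probability is high enough. A minimal sketch with scikit-learn (the measurements are simulated, not the San Francisco Bay data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Hypothetical bill depth (mm) and head+bill length (mm); males slightly larger
males   = rng.normal([10.8, 123.0], [0.4, 2.5], size=(40, 2))
females = rng.normal([10.2, 118.5], [0.4, 2.5], size=(40, 2))
X = np.vstack([males, females])
y = np.array([1] * 40 + [0] * 40)  # 1 = male, 0 = female

lda = LinearDiscriminantAnalysis().fit(X, y)
post = lda.predict_proba(X).max(axis=1)
confident = post >= 0.75           # mirror the paper's posterior threshold
acc_all = lda.score(X, y)
acc_conf = (lda.predict(X)[confident] == y[confident]).mean()
print(f"all birds: {acc_all:.0%}; posterior >= 0.75 only: {acc_conf:.0%}")
```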
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moses, Alan M.; Chiang, Derek Y.; Pollard, Daniel A.
2004-10-28
We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding site evolution, on which basis we compute the likelihood that putative sites are conserved and assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.
Probabilistic Surface Characterization for Safe Landing Hazard Detection and Avoidance (HDA)
NASA Technical Reports Server (NTRS)
Johnson, Andrew E. (Inventor); Ivanov, Tonislav I. (Inventor); Huertas, Andres (Inventor)
2015-01-01
Apparatuses, systems, computer programs and methods for performing hazard detection and avoidance for landing vehicles are provided. Hazard assessment takes into consideration the geometry of the lander. Safety probabilities are computed for a plurality of pixels in a digital elevation map. The safety probabilities are combined for pixels associated with one or more aim points and orientations. A worst case probability value is assigned to each of the one or more aim points and orientations.
Poynton, Clare B; Chen, Kevin T; Chonde, Daniel B; Izquierdo-Garcia, David; Gollub, Randy L; Gerstner, Elizabeth R; Batchelor, Tracy T; Catana, Ciprian
2014-01-01
We present a new MRI-based attenuation correction (AC) approach for integrated PET/MRI systems that combines both segmentation- and atlas-based methods by incorporating dual-echo ultra-short echo-time (DUTE) and T1-weighted (T1w) MRI data and a probabilistic atlas. Segmented atlases were constructed from CT training data using a leave-one-out framework and combined with T1w, DUTE, and CT data to train a classifier that computes the probability of air/soft tissue/bone at each voxel. This classifier was applied to segment the MRI of the subject of interest and attenuation maps (μ-maps) were generated by assigning specific linear attenuation coefficients (LACs) to each tissue class. The μ-maps generated with this “Atlas-T1w-DUTE” approach were compared to those obtained from DUTE data using a previously proposed method. For validation of the segmentation results, segmented CT μ-maps were considered the “silver standard”; the segmentation accuracy was assessed qualitatively and quantitatively through calculation of the Dice similarity coefficient (DSC). Relative change (RC) maps between the CT and MRI-based attenuation corrected PET volumes were also calculated for a global voxel-wise assessment of the reconstruction results. The μ-maps obtained using the Atlas-T1w-DUTE classifier agreed well with those derived from CT; the mean DSCs for the Atlas-T1w-DUTE-based μ-maps across all subjects were higher than those for DUTE-based μ-maps; the atlas-based μ-maps also showed a lower percentage of misclassified voxels across all subjects. RC maps from the atlas-based technique also demonstrated improvement in the PET data compared to the DUTE method, both globally as well as regionally. PMID:24753982
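The Dice similarity coefficient used for validation is twice the size of the overlap of two label masks divided by the sum of their sizes. A minimal sketch (the masks are hypothetical):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# Hypothetical 1D "bone" masks from CT-based and MRI-based segmentations
ct  = np.array([0, 1, 1, 1, 0, 0, 1, 1])
mri = np.array([0, 1, 1, 0, 0, 0, 1, 1])
print(dice(ct, mri))  # 2*4 / (5+4) ≈ 0.89
```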
Error Discounting in Probabilistic Category Learning
Craig, Stewart; Lewandowsky, Stephan; Little, Daniel R.
2011-01-01
Some current theories of probabilistic categorization assume that people gradually attenuate their learning in response to unavoidable error. However, existing evidence for this error discounting is sparse and open to alternative interpretations. We report two probabilistic-categorization experiments that investigated error discounting by shifting feedback probabilities to new values after different amounts of training. In both experiments, responding gradually became less responsive to errors, and learning was slowed for some time after the feedback shift. Both results are indicative of error discounting. Quantitative modeling of the data revealed that adding a mechanism for error discounting significantly improved the fits of an exemplar-based and a rule-based associative learning model, as well as of a recency-based model of categorization. We conclude that error discounting is an important component of probabilistic learning. PMID:21355666
Controversies of Sex Re-assignment in Genetic Males with Congenital Inadequacy of the Penis.
Raveenthiran, Venkatachalam
2017-09-01
Sex assignment in 46XY genetic male children with congenital inadequacy of the penis (CIP) is controversial. Traditionally, children with penile length less than 2 cm at birth are considered unsuitable to be raised as males. They are typically re-assigned to female-sex and feminizing genitoplasty is usually done in infancy. However, the concept of cerebral androgen imprinting has caused paradigm shift in the philosophy of sex re-assignment. Masculinization of the brain, rather than length of the penis, is the modern criterion of sex re-assignment in CIP. This review summarizes the current understanding of the complex issue. In 46XY children with CIP, male-sex assignment appears appropriate in non-hormonal conditions such as idiopathic micropenis, aphallia and exstrophy. Female-sex re-assignment appears acceptable in complete androgen insensitivity (CAIS), while partial androgen insensitivity syndrome (PAIS) patients are highly dissatisfied with the assignment of either sex. Children with 5-alpha reductase deficiency are likely to have spontaneous penile lengthening at puberty. Hence, they are better raised as males. Although female assignment is common in pure gonadal dysgenesis, long-term results are not known to justify the decision.
A probabilistic model to predict clinical phenotypic traits from genome sequencing.
Chen, Yun-Ching; Douville, Christopher; Wang, Cheng; Niknafs, Noushin; Yeo, Grace; Beleva-Guthrie, Violeta; Carter, Hannah; Stenson, Peter D; Cooper, David N; Li, Biao; Mooney, Sean; Karchin, Rachel
2014-09-01
Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area-under-the-ROC curve (AUC)>0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
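The statistical significance reported for these AUCs can be assessed with a label-permutation test: recompute the AUC under shuffled phenotype labels to build a null distribution. A minimal sketch (the scores and labels are simulated, not PGP data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 100)                   # hypothetical phenotype labels
scores = y * 0.8 + rng.normal(0, 0.6, 100)    # informative predicted scores

auc = roc_auc_score(y, scores)
null = [roc_auc_score(rng.permutation(y), scores) for _ in range(2000)]
p = (1 + np.sum(np.asarray(null) >= auc)) / (1 + len(null))
print(f"AUC = {auc:.2f}, permutation p = {p:.4f}")
```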
Vandeputte, Marc; Haffray, Pierrick
2014-01-01
Since the middle of the 1990s, parentage assignment using microsatellite markers has been introduced as a tool in aquaculture breeding. It now allows close to 100% assignment success, and has offered new ways to develop aquaculture breeding using mixed-family designs in commercial conditions. Its main achievements are the knowledge and control of family representation and inbreeding, especially in mass-spawning species, above all the capacity to estimate reliable genetic parameters in any species and rearing system with no prior investment in structures, and the development of new breeding programs in many species. Parentage assignment should not be seen as a way to replace physical tagging, but as a new way to conceive breeding programs, which have to be optimized with its specific constraints; one of the most important is to define the number of individuals to genotype so as to limit costs and maximize genetic gain while minimizing inbreeding. The recent possible shift to (for the moment) more costly single nucleotide polymorphism markers should benefit from future developments in genomics and marker-assisted selection to combine parentage assignment and indirect prediction of breeding values. PMID:25566319
Microbial species delineation using whole genome sequences.
Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita
2015-08-18
Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gilbey, John; Cauwelier, Eef; Coulson, Mark W.; Stradmeyer, Lee; Sampayo, James N.; Armstrong, Anja; Verspoor, Eric; Corrigan, Laura; Shelley, Jonathan; Middlemas, Stuart
2016-01-01
Understanding the habitat use patterns of migratory fish, such as Atlantic salmon (Salmo salar L.), and the natural and anthropogenic impacts on them, is aided by the ability to identify individuals to their stock of origin. Presented here are the results of an analysis of informative single nucleotide polymorphic (SNP) markers for detecting genetic structuring in Atlantic salmon in Scotland and NE England and their ability to allow accurate genetic stock identification. 3,787 fish from 147 sites covering 27 rivers were screened at 5,568 SNP markers. In order to identify a cost-effective subset of SNPs, they were ranked according to their ability to differentiate between fish from different rivers. A panel of 288 SNPs was used to examine both individual assignments and mixed stock fisheries and eighteen assignment units were defined. The results improved greatly on previously available methods and, for the first time, fish caught in the marine environment can be confidently assigned to geographically coherent units within Scotland and NE England, including individual rivers. As such, this SNP panel has the potential to aid understanding of the various influences acting upon Atlantic salmon on their marine migrations, be they natural environmental variations and/or anthropogenic impacts, such as mixed stock fisheries and interactions with marine power generation installations. PMID:27723810
NASA Astrophysics Data System (ADS)
Maryam, Syeda; McCrackin, Laura; Crowley, Mark; Rathi, Yogesh; Michailovich, Oleg
2017-03-01
The world's aging population has given rise to an increasing awareness towards neurodegenerative disorders, including Alzheimer's Disease (AD). Treatment options for AD are currently limited, but it is believed that future success depends on our ability to detect the onset of the disease in its early stages. The most frequently used tools for this include neuropsychological assessments, along with genetic, proteomic, and image-based diagnosis. Recently, the applicability of Diffusion Magnetic Resonance Imaging (dMRI) analysis for early diagnosis of AD has also been reported. The sensitivity of dMRI to the microstructural organization of cerebral tissue makes it particularly well-suited to detecting changes which are known to occur in the early stages of AD. Existing dMRI approaches can be divided into two broad categories: region-based and tract-based. In this work, we propose a new approach, which extends region-based approaches to the simultaneous characterization of multiple brain regions. Given a predefined set of features derived from dMRI data, we compute the probabilistic distances between different brain regions and treat the resulting connectivity pattern as an undirected, fully-connected graph. The characteristics of this graph are then used as markers to discriminate between AD subjects and normal controls (NC). Although in this preliminary work we omit subjects in the prodromal stage of AD, mild cognitive impairment (MCI), our method demonstrates perfect separability between AD and NC subject groups with a substantial margin, and thus holds promise for fine-grained stratification of NC, MCI and AD populations.
Base-Rate Neglect as a Function of Base Rates in Probabilistic Contingency Learning
ERIC Educational Resources Information Center
Kutzner, Florian; Freytag, Peter; Vogel, Tobias; Fiedler, Klaus
2008-01-01
When humans predict criterion events based on probabilistic predictors, they often lend excessive weight to the predictor and insufficient weight to the base rate of the criterion event. In an operant analysis, using a matching-to-sample paradigm, Goodie and Fantino (1996) showed that humans exhibit base-rate neglect when predictors are associated…
Djalalov, Sandjar; Yong, Jean; Beca, Jaclyn; Black, Sandra; Saposnik, Gustavo; Musa, Zahra; Siminovitch, Katherine; Moretti, Myla; Hoch, Jeffrey S
2012-12-01
To evaluate the cost effectiveness of genetic screening for the apolipoprotein (APOE) ε4 allele in combination with preventive donepezil treatment in comparison with the standard of care for amnestic mild cognitive impairment (AMCI) patients in Canada. We performed a cost-effectiveness analysis using a Markov model with a societal perspective and a time horizon of 30 years. For each strategy, we calculated quality-adjusted life-years (QALYs), using utilities from the literature. Costs were also based on the literature and, when appropriate, Ontario sources. One-way and probabilistic sensitivity analyses were performed. Expected value of perfect information (EVPI) analysis was conducted to explore the value of future research. The base case results in our exploratory study suggest that the combination of genetic testing and preventive donepezil treatment resulted in a gain of 0.027 QALYs and an incremental cost of $1,015 (in 2009 Canadian dollars [Can$]), compared with the standard of care. The incremental cost-effectiveness ratio (ICER) for the base case was Can$38,016 per QALY. The ICER was sensitive to the effectiveness of donepezil in slowing the rate of progression to Alzheimer's disease (AD), utility in AMCI patients, and AD and donepezil treatment costs. EVPI analysis showed that additional information on these parameters would be of value. Using presently available clinical evidence, this exploratory study illustrates that genetic testing combined with preventive donepezil treatment for AMCI patients may be economically attractive. Since our results were based on a secondary post hoc analysis, our study alone is insufficient to warrant recommending APOE genotyping in AMCI patients. Future research on the effectiveness of preventive donepezil as a targeted therapy is recommended.
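The ICER quoted here is just the incremental cost divided by the incremental QALYs; the small gap between $1,015/0.027 and the reported Can$38,016 presumably reflects rounding of the published increments. A minimal sketch:

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return delta_cost / delta_qaly

# Reported increments for genetic testing + preventive donepezil vs. standard care
print(f"ICER ≈ Can${icer(1015, 0.027):,.0f} per QALY")  # ≈ 37,593 (paper: 38,016)
```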
Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip
2012-01-06
Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
Students’ difficulties in probabilistic problem-solving
NASA Astrophysics Data System (ADS)
Arum, D. P.; Kusmayadi, T. A.; Pramudya, I.
2018-03-01
Many errors can be identified when students solve mathematics problems, particularly probabilistic problems. The present study aims to investigate students' difficulties in solving the probabilistic problem, focusing on analyzing and describing students' errors during problem solving. This research used the qualitative method with a case study strategy. The subjects were ten 9th-grade students selected by purposive sampling. Data comprise students' probabilistic problem-solving results and recorded interviews regarding their difficulties in solving the problem. Those data were analyzed descriptively using Miles and Huberman's steps. The results show that students' difficulties in solving the probabilistic problem fall into three categories. The first relates to difficulties in understanding the probabilistic problem. The second concerns difficulties in choosing and using appropriate strategies for solving the problem. The third involves difficulties with the computational process. These results indicate that students still have difficulties in solving the probabilistic problem, meaning they are not yet able to apply their knowledge and abilities to probabilistic problems. It is therefore important for mathematics teachers to plan probabilistic learning that can optimize students' probabilistic thinking ability.
Probabilistic classifiers with high-dimensional data
Kim, Kyung In; Simon, Richard
2011-01-01
For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite their importance in medical decision making. In this paper, we introduce two criteria for the assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed two extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946
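Well-calibratedness in this sense means that among cases assigned probability p, about a fraction p actually belong to the class. A minimal reliability-table sketch (the predictions are simulated to be well calibrated by construction):

```python
import numpy as np

def calibration_table(probs, labels, n_bins=5):
    """Compare mean predicted probability with observed class frequency per bin."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):
        m = bins == b
        if m.any():
            print(f"bin {b}: mean p = {probs[m].mean():.2f}, "
                  f"observed = {labels[m].mean():.2f}, n = {m.sum()}")

rng = np.random.default_rng(3)
p = rng.uniform(0, 1, 1000)             # hypothetical predicted probabilities
y = (rng.random(1000) < p).astype(int)  # outcomes drawn to be well calibrated
calibration_table(p, y)                 # mean p ≈ observed frequency in each bin
```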
A comprehensive probabilistic analysis model of oil pipelines network based on Bayesian network
NASA Astrophysics Data System (ADS)
Zhang, C.; Qin, T. X.; Jiang, B.; Huang, C.
2018-02-01
Oil pipelines network is one of the most important facilities of energy transportation. But oil pipelines network accident may result in serious disasters. Some analysis models for these accidents have been established mainly based on three methods, including event-tree, accident simulation and Bayesian network. Among these methods, Bayesian network is suitable for probabilistic analysis. But not all the important influencing factors are considered and the deployment rule of the factors has not been established. This paper proposed a probabilistic analysis model of oil pipelines network based on Bayesian network. Most of the important influencing factors, including the key environment condition and emergency response are considered in this model. Moreover, the paper also introduces a deployment rule for these factors. The model can be used in probabilistic analysis and sensitive analysis of oil pipelines network accident.
Treatment of uncertainties in the IPCC: a philosophical analysis
NASA Astrophysics Data System (ADS)
Jebeile, J.; Drouet, I.
2014-12-01
The IPCC produces scientific reports out of findings on climate and climate change. Because the findings are uncertain in many respects, the production of reports requires aggregating assessments of uncertainties of different kinds. This difficult task is currently regulated by the Guidance note for lead authors of the IPCC fifth assessment report on consistent treatment of uncertainties. The note recommends that two metrics, confidence and likelihood, be used for communicating the degree of certainty in findings. Confidence is expressed qualitatively "based on the type, amount, quality, and consistency of evidence […] and the degree of agreement", while likelihood is expressed probabilistically "based on statistical analysis of observations or model results, or expert judgment". Therefore, depending on the evidence evaluated, authors have the choice to present either an assigned level of confidence or a quantified measure of likelihood. But assessments of these two kinds rest on distinct and potentially conflicting methodologies, which makes their aggregation problematic. So the question arises whether the treatment of uncertainties in the IPCC is rationally justified. In order to answer the question, it is worth comparing the IPCC procedures with the formal normative theories of epistemic rationality which have been developed by philosophers. These theories, which include contributions to the philosophy of probability and to Bayesian probabilistic confirmation theory, are relevant for our purpose because they are commonly used to assess the rationality of collective judgement formation based on uncertain knowledge. In this paper we make the comparison and pursue the following objectives: (i) we determine whether the IPCC confidence and likelihood can be compared with the notions of uncertainty targeted by or underlying the formal normative theories of epistemic rationality; (ii) we investigate whether the formal normative theories of epistemic rationality justify treating uncertainty along those two dimensions, and indicate how this can be avoided.
NASA Astrophysics Data System (ADS)
Zhang, Shaojie; Zhao, Luqiang; Delgado-Tellez, Ricardo; Bao, Hongjun
2018-03-01
Conventional outputs of physics-based landslide forecasting models are presented as deterministic warnings by calculating the safety factor (Fs) of potentially dangerous slopes. However, these models are highly dependent on variables such as cohesion force and internal friction angle, which are affected by a high degree of uncertainty, especially at a regional scale, resulting in unacceptable uncertainties in Fs. Under such circumstances, the outputs of physical models are more suitable if presented in the form of landslide probability values. In order to develop such models, a method to link the uncertainty of soil parameter values with landslide probability is devised. This paper proposes the use of Monte Carlo methods to quantitatively express uncertainty by assigning random values to physical variables inside a defined interval. The inequality Fs < 1 is tested for each pixel in n simulations, which are integrated into a single parameter. This parameter links the landslide probability to the uncertainties of soil mechanical parameters and is used to create a physics-based probabilistic forecasting model for rainfall-induced shallow landslides. The prediction ability of this model was tested in a case study, in which simulated forecasting of landslide disasters associated with heavy rainfalls on 9 July 2013 in the Wenchuan earthquake region of Sichuan province, China, was performed. The proposed model successfully forecasted landslides at 159 of the 176 disaster points registered by the geo-environmental monitoring station of Sichuan province. These testing results indicate that the new model can be operated in a highly efficient way and shows more reliable results, attributable to its high prediction accuracy. Accordingly, the new model can potentially be packaged into a forecasting system for shallow landslides, providing technological support for the mitigation of these disasters at regional scale.
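The core of the proposed scheme is a Monte Carlo estimate of P(Fs < 1): sample the uncertain soil parameters from their assumed intervals, evaluate the safety factor, and take the failing fraction. A minimal sketch using the standard infinite-slope safety factor (the parameter intervals and slope properties are hypothetical, not the Wenchuan values):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
# Uncertain soil parameters sampled inside assumed intervals
c = rng.uniform(2e3, 8e3, n)                 # cohesion, Pa
phi = np.radians(rng.uniform(25, 35, n))     # internal friction angle

# Fixed (hypothetical) slope properties for a dry infinite-slope model
gamma, z, beta = 18e3, 2.0, np.radians(40)   # unit weight N/m^3, depth m, slope
fs = (c + gamma * z * np.cos(beta)**2 * np.tan(phi)) / (
     gamma * z * np.sin(beta) * np.cos(beta))

p_failure = np.mean(fs < 1.0)                # P(Fs < 1) for this pixel
print(f"landslide probability ≈ {p_failure:.2f}")
```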
Pariset, L; Mariotti, M; Nardone, A; Soysal, M I; Ozkan, E; Williams, J L; Dunner, S; Leveziel, H; Maróti-Agóts, A; Bodò, I; Valentini, A
2010-12-01
Italian Maremmana, Turkish Grey and Hungarian Grey breeds belong to the same Podolic group of cattle, have a similar conformation and recently experienced a similar demographic reduction. The aim of this study was to assess the relationship among the analysed Podolic breeds and to verify whether their genetic state reflects their history. To do so, approximately 100 single nucleotide polymorphisms (SNPs) were genotyped on individuals belonging to these breeds and compared to genotypes of individuals of two Italian beef breeds, Marchigiana and Piemontese, which underwent different selection and migration histories. Population genetic parameters such as allelic frequencies and heterozygosity values were assessed, genetic distances calculated and assignment test performed to evaluate the possibility of recent admixture between the populations. The data show that the physical similarity among the Podolic breeds examined, and particularly between Hungarian Grey and Maremmana cattle that experienced admixture in the recent past, is mainly morphological. The assignment of individuals from genotype data was achieved using Bayesian inference, confirming that the set of chosen SNPs is able to distinguish among the breeds and that the breeds are genetically distinct. Individuals of Turkish Grey breed were clearly assigned to their breed of origin for all clustering alternatives, showing that this breed can be differentiated from the others on the basis of the allelic frequencies. Remarkably, in the Turkish Grey there were differences observed between the population of Enez district, where in situ conservation studies are practised, and that of Bandirma district of Balikesir, where ex situ conservation studies are practised out of the original raising area. In conclusion, this study demonstrates that molecular data could be used to reveal an unbiased view of past events and provide the basis for a rational exploitation of livestock, suggesting appropriate cross-breeding plans based on genetic distance or breeding strategies that include the population structure. © 2010 Blackwell Verlag GmbH.
Brancolini, Florencia; del Pazo, Felipe; Posner, Victoria Maria; Grimberg, Alexis; Arranz, Silvia Eda
2016-01-01
Valid fish species identification is essential for biodiversity conservation and fisheries management. Here, we provide a sequence reference library based on mitochondrial cytochrome c oxidase subunit I for a valid identification of 79 freshwater fish species from the Lower Paraná River. Neighbour-joining analysis based on K2P genetic distances formed non-overlapping clusters for almost all species, each with ≥99% bootstrap support. Identification was successful for 97.8% of species, as the minimum genetic distance to the nearest neighbour exceeded the maximum intraspecific distance in all these cases. A barcoding gap of 2.5% was apparent for the whole data set with the exception of four cases. Within-species distances ranged from 0.00% to 7.59%, while interspecific distances varied between 4.06% and 19.98%, without considering Odontesthes species with a minimum genetic distance of 0%. Sequence library validation was performed by applying BOLD's BIN analysis tool, the Poisson Tree Processes model and Automatic Barcode Gap Discovery, along with a reliable taxonomic assignment by experts. Exhaustive revision of vouchers was performed when a conflicting assignment was detected after sequence analysis and BIN discordance evaluation. Thus, the sequence library presented here can be confidently used as a benchmark for identification of half of the fish species recorded for the Lower Paraná River. PMID:27442116
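The K2P (Kimura two-parameter) distance underlying these comparisons corrects the raw proportion of differing sites for the different rates of transitions and transversions: d = -(1/2)ln(1 - 2P - Q) - (1/4)ln(1 - 2Q), where P and Q are the transition and transversion proportions. A minimal sketch (the aligned fragments are invented):

```python
import math

PURINES = {"A", "G"}

def k2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    n = len(pairs)
    transitions = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in pairs)
    transversions = sum(a != b and (a in PURINES) != (b in PURINES) for a, b in pairs)
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

# Hypothetical aligned COI fragments
print(k2p("ACGTACGTACGTACGTACGT",
          "ACGTACATACGTACGGACGT"))  # one transition, one transversion → ≈ 0.108
```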
Mondol, Samrat; Sridhar, Vanjulavalli; Yadav, Prasanjeet; Gubbi, Sanjay; Ramakrishnan, Uma
2015-04-01
Illicit trade in wildlife products is rapidly decimating many species across the globe. Such trade is often underestimated for wide-ranging species until it is too late for the survival of their remaining populations. Policing this trade could be vastly improved if one could reliably determine geographic origins of illegal wildlife products and identify areas where greater enforcement is needed. Using DNA-based assignment tests (i.e., samples are assigned to geographic locations), we addressed these factors for leopards (Panthera pardus) on the Indian subcontinent. We created geography-specific allele frequencies from a genetic reference database of 173 leopards across India to infer geographic origins of DNA samples from 40 seized leopard skins. Sensitivity analyses of samples of known geographic origins and assignments of seized skins demonstrated robust assignments for Indian leopards. We found that confiscated pelts seized in small numbers were not necessarily from local leopards. The geographic footprint of large seizures appeared to be bigger than the cumulative footprint of several smaller seizures, indicating widespread leopard poaching across the subcontinent. Our seized samples had male-biased sex ratios, especially the large seizures. From multiple seized sample assignments, we identified central India as a poaching hotspot for leopards. The techniques we applied can be used to identify origins of seized illegal wildlife products and trade routes at the subcontinent scale and beyond. © 2014 Society for Conservation Biology.
Misfortune may be a blessing in disguise: Fairness perception and emotion modulate decision making.
Liu, Hong-Hsiang; Hwang, Yin-Dir; Hsieh, Ming H; Hsu, Yung-Fong; Lai, Wen-Sung
2017-08-01
Fairness perception and equality during social interactions frequently elicit affective arousal and affect decision making. By integrating the dictator game and a probabilistic gambling task, this study aimed to investigate the effects of a negative experience induced by perceived unfairness on decision making using behavioral, model fitting, and electrophysiological approaches. Participants were randomly assigned to the neutral, harsh, or kind groups, which consisted of various asset allocation scenarios to induce different levels of perceived unfairness. The monetary gain was subsequently considered the initial asset in a negatively rewarded, probabilistic gambling task in which the participants were instructed to maintain as much asset as possible. Our behavioral results indicated that the participants in the harsh group exhibited increased levels of negative emotions but retained greater total game scores than the participants in the other two groups. Parameter estimation of a reinforcement learning model using a Bayesian approach indicated that these participants were more loss aversive and consistent in decision making. Data from simultaneous ERP recordings further demonstrated that these participants exhibited larger feedback-related negativity to unexpected outcomes in the gambling task, which suggests enhanced reward sensitivity and signaling of reward prediction error. Collectively, our study suggests that a negative experience may be an advantage in the modulation of reward-based decision making. © 2017 Society for Psychophysiological Research.
Yang, Hua; Gao, Wen; Liu, Lei; Liu, Ke; Liu, E-Hu; Qi, Lian-Wen; Li, Ping
2015-11-10
Most Aconitum species, also known as aconite, are extremely poisonous, so they must be identified carefully. Differentiation of Aconitum species is challenging because of their similar appearance and chemical components. In this study, a universal strategy to discover chemical markers was developed for effective authentication of three commonly used aconite roots. The major procedures include: (1) chemical profiling and structural assignment of herbs by liquid chromatography with mass spectrometry (LC-MS), (2) quantification of major components by LC-MS, (3) a probabilistic neural network (PNN) model to calculate contributions of components toward species classification, (4) discovery of a minimized number of chemical markers for quality control. The MS fragmentation pathways of diester-, monoester-, and alkyloyamine-diterpenoid alkaloids were compared. Using these rules, 42 aconite alkaloids were identified in aconite roots. Subsequently, 11 characteristic compounds were quantified. A component-species model using PNN was then established combining the 11 analytes and 26 batches of samples from three aconite species. The contribution of each analyte to species classification was calculated. Selection of fuziline, benzoylhypaconine, and talatizamine, or a combination of more compounds based on the contribution order, can be used for successful categorization of the three aconite species. Collectively, the proposed strategy is beneficial to the selection of rational chemical markers for the species classification and quality control of herbal medicines. Copyright © 2015 Elsevier B.V. All rights reserved.
Byass, Peter; Huong, Dao Lan; Minh, Hoang Van
2003-01-01
Verbal autopsy (VA) has become an important tool in the past 20 years for determining cause of death in communities where there is no routine registration. In many cases, expert physicians have been used to interpret the VA findings and so assign individual causes of death. However, this is time consuming and not always repeatable. Other approaches such as algorithms and neural networks have been developed in some settings. This paper aims to develop a method that is simple, reliable and consistent, which could represent an advance in VA interpretation. This paper describes the development of a Bayesian probability model for VA interpretation as an attempt to find a better approach. This methodology and a preliminary implementation are described, with an evaluation based on VA material from rural Vietnam. The new model was tested against a series of 189 VA interviews from a rural community in Vietnam. Using this very basic model, over 70% of individual causes of death corresponded with those determined by two physicians increasing to over 80% if those cases ascribed to old age or as being indeterminate by the physicians were excluded. Although there is a clear need to improve the preliminary model and to test more extensively with larger and more varied datasets, these preliminary results suggest that there may be good potential in this probabilistic approach.
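A Bayesian VA model of this general form applies Bayes' theorem over causes given the signs reported at interview, P(cause | signs) ∝ P(cause) Π P(sign | cause), typically under a conditional-independence assumption. A minimal naive-Bayes sketch (the causes, signs, and probabilities are all hypothetical, not the Vietnam model's values):

```python
import math

# Hypothetical priors and sign likelihoods for three causes of death
priors = {"TB": 0.15, "stroke": 0.25, "drowning": 0.05}
likelihood = {  # P(sign present | cause)
    "TB":       {"cough": 0.9, "weight_loss": 0.8, "sudden": 0.05},
    "stroke":   {"cough": 0.1, "weight_loss": 0.2, "sudden": 0.7},
    "drowning": {"cough": 0.2, "weight_loss": 0.05, "sudden": 0.9},
}

def posterior(signs_present):
    """P(cause | signs), assuming conditional independence of signs."""
    score = {}
    for cause, prior in priors.items():
        logp = math.log(prior)
        for sign, p in likelihood[cause].items():
            logp += math.log(p if sign in signs_present else 1 - p)
        score[cause] = math.exp(logp)
    total = sum(score.values())
    return {c: v / total for c, v in score.items()}

print(posterior({"cough", "weight_loss"}))  # TB dominates for this sign pattern
```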
Wildlife forensic science: A review of genetic geographic origin assignment.
Ogden, Rob; Linacre, Adrian
2015-09-01
Wildlife forensic science has become a key means of enforcing legislation surrounding the illegal trade in protected and endangered species. A relatively new dimension to this area of forensic science is to determine the geographic origin of a seized sample. This review focuses on DNA testing, which relies on assignment of an unknown sample to its genetic population of origin. Key examples of this are the trade in timber, fish and ivory, and these are used only to illustrate the large number of species for which this type of testing is potentially available. The role of mitochondrial and nuclear DNA markers is discussed, alongside a comparison of neutral markers with those exhibiting signatures of selection, which potentially offer much higher levels of assignment power to address specific questions. A review of assignment tests is presented along with detailed methods for evaluating error rates and considerations for marker selection. The availability and quality of reference data are of paramount importance to support assignment applications and ensure reliability of any conclusions drawn. The genetic methods discussed were initially developed as investigative tools, but comment is made regarding their use in courts. The potential to complement DNA markers with elemental assays for greater assignment power is considered and finally recommendations are made for the future of this type of testing. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Abbas, Ahmed; Guo, Xianrong; Jing, Bing-Yi; Gao, Xin
2014-06-01
Despite significant advances in automated nuclear magnetic resonance-based protein structure determination, the high numbers of false positives and false negatives among the peaks selected by fully automated methods remain a problem. These false positives and negatives impair the performance of resonance assignment methods. One of the main reasons for this problem is that the computational research community often considers peak picking and resonance assignment to be two separate problems, whereas spectroscopists use expert knowledge to pick peaks and assign their resonances at the same time. We propose a novel framework that simultaneously conducts slice picking and spin system forming, an essential step in resonance assignment. Our framework then employs a genetic algorithm, directed by both connectivity information and amino acid typing information from the spin systems, to assign the spin systems to residues. The inputs to our framework can be as few as two commonly used spectra, i.e., CBCA(CO)NH and HNCACB. Different from the existing peak picking and resonance assignment methods that treat peaks as the units, our method is based on 'slices', which are one-dimensional vectors in three-dimensional spectra that correspond to certain ([Formula: see text]) values. Experimental results on both benchmark simulated data sets and four real protein data sets demonstrate that our method significantly outperforms the state-of-the-art methods while using fewer spectra than those methods. Our method is freely available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
Is probabilistic bias analysis approximately Bayesian?
MacLehose, Richard F.; Gustafson, Paul
2011-01-01
Case-control studies are particularly susceptible to differential exposure misclassification when exposure status is determined following incident case status. Probabilistic bias analysis methods have been developed as ways to adjust standard effect estimates based on the sensitivity and specificity of exposure misclassification. The iterative sampling method advocated in probabilistic bias analysis bears a distinct resemblance to a Bayesian adjustment; however, it is not identical. Furthermore, without a formal theoretical framework (Bayesian or frequentist), the results of a probabilistic bias analysis remain somewhat difficult to interpret. We describe, both theoretically and empirically, the extent to which probabilistic bias analysis can be viewed as approximately Bayesian. While the differences between probabilistic bias analysis and Bayesian approaches to misclassification can be substantial, these situations often involve unrealistic prior specifications and are relatively easy to detect. Outside of these special cases, probabilistic bias analysis and Bayesian approaches to exposure misclassification in case-control studies appear to perform equally well. PMID:22157311
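Probabilistic bias analysis of the kind compared here repeatedly samples sensitivity and specificity from prior distributions, back-corrects the 2x2 cell counts for exposure misclassification, and summarizes the distribution of adjusted odds ratios. A minimal sketch for nondifferential misclassification (the cell counts and priors are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical observed 2x2 counts: exposed/unexposed among cases and controls
a_obs, b_obs = 215, 85    # cases: exposed, unexposed
c_obs, d_obs = 150, 150   # controls: exposed, unexposed

ors = []
for _ in range(20_000):
    se = rng.uniform(0.75, 0.95)   # sampled sensitivity of exposure classification
    sp = rng.uniform(0.90, 0.99)   # sampled specificity
    # Back-correct each group: A = (observed exposed - (1-sp)*N) / (se + sp - 1)
    n_case, n_ctl = a_obs + b_obs, c_obs + d_obs
    a = (a_obs - (1 - sp) * n_case) / (se + sp - 1)
    c = (c_obs - (1 - sp) * n_ctl) / (se + sp - 1)
    b, d = n_case - a, n_ctl - c
    if min(a, b, c, d) > 0:        # keep only admissible corrections
        ors.append((a * d) / (b * c))

print(np.percentile(ors, [2.5, 50, 97.5]))  # simulation interval for adjusted OR
```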
Studying human disease genes in Caenorhabditis elegans: a molecular genetics laboratory project.
Cox-Paulson, Elisabeth A; Grana, Theresa M; Harris, Michelle A; Batzli, Janet M
2012-01-01
Scientists routinely integrate information from various channels to explore topics under study. We designed a 4-wk undergraduate laboratory module that used a multifaceted approach to study a question in molecular genetics. Specifically, students investigated whether Caenorhabditis elegans can be a useful model system for studying genes associated with human disease. In a large-enrollment, sophomore-level laboratory course, groups of three to four students were assigned a gene associated with either breast cancer (brc-1), Wilson disease (cua-1), ovarian dysgenesis (fshr-1), or colon cancer (mlh-1). Students compared observable phenotypes of wild-type C. elegans and C. elegans with a homozygous deletion in the assigned gene. They confirmed the genetic deletion with nested polymerase chain reaction and performed a bioinformatics analysis to predict how the deletion would affect the encoded mRNA and protein. Students also performed RNA interference (RNAi) against their assigned gene and evaluated whether RNAi caused a phenotype similar to that of the genetic deletion. As a capstone activity, students prepared scientific posters in which they presented their data, evaluated whether C. elegans was a useful model system for studying their assigned genes, and proposed future directions. Assessment showed gains in understanding genotype versus phenotype, RNAi, common bioinformatics tools, and the utility of model organisms. PMID:22665589
Studying Human Disease Genes in Caenorhabditis elegans: A Molecular Genetics Laboratory Project
Cox-Paulson, Elisabeth A.; Grana, Theresa M.; Harris, Michelle A.; Batzli, Janet M.
2012-01-01
Scientists routinely integrate information from various channels to explore topics under study. We designed a 4-wk undergraduate laboratory module that used a multifaceted approach to study a question in molecular genetics. Specifically, students investigated whether Caenorhabditis elegans can be a useful model system for studying genes associated with human disease. In a large-enrollment, sophomore-level laboratory course, groups of three to four students were assigned a gene associated with either breast cancer (brc-1), Wilson disease (cua-1), ovarian dysgenesis (fshr-1), or colon cancer (mlh-1). Students compared observable phenotypes of wild-type C. elegans and C. elegans with a homozygous deletion in the assigned gene. They confirmed the genetic deletion with nested polymerase chain reaction and performed a bioinformatics analysis to predict how the deletion would affect the encoded mRNA and protein. Students also performed RNA interference (RNAi) against their assigned gene and evaluated whether RNAi caused a phenotype similar to that of the genetic deletion. As a capstone activity, students prepared scientific posters in which they presented their data, evaluated whether C. elegans was a useful model system for studying their assigned genes, and proposed future directions. Assessment showed gains in understanding genotype versus phenotype, RNAi, common bioinformatics tools, and the utility of model organisms. PMID:22665589
Bacles, C F E; Ennos, R A
2008-10-01
Paternity analysis based on microsatellite marker genotyping was used to infer contemporary genetic connectivity by pollen of three population remnants of the wind-pollinated, wind-dispersed tree Fraxinus excelsior, in a deforested Scottish landscape. By deterministically accounting for genotyping error and comparing a range of assignment methods, individual-based paternity assignments were used to derive population-level estimates of gene flow. Pollen immigration into a 300 ha landscape represents between 43 and 68% of effective pollination, depending mostly on the assignment method. Individual male reproductive success is unequal, with 31 of 48 trees fertilizing one seed or more, but only three trees fertilizing more than ten seeds. Spatial analysis suggests a fat-tailed pollen dispersal curve with 85% of detected pollination occurring within 100 m, and 15% spreading between 300 and 1900 m from the source. Identification of immigrating pollen sourced from two neighbouring remnants indicates further effective dispersal at 2900 m. Pollen exchange among remnants is driven by population size rather than geographic distance, with larger remnants acting predominantly as pollen donors, and smaller remnants as pollen recipients. Enhanced wind dispersal of pollen in a barren landscape ensures that the seed produced within the catchment includes genetic material from a wide geographic area. However, gene flow estimates based on analysis of non-dispersed seeds were shown to underestimate realized gene immigration into the remnants by a factor of two, suggesting that predictive landscape conservation requires integrated estimates of post-recruitment gene flow occurring via both pollen and seed.
Using temporal sampling to improve attribution of source populations for invasive species.
Goldstien, Sharyn J; Inglis, Graeme J; Schiel, David R; Gemmell, Neil J
2013-01-01
Numerous studies have applied genetic tools to the identification of source populations and transport pathways for invasive species. However, there are many gaps in the knowledge obtained from such studies because comprehensive and meaningful spatial sampling to meet these goals is difficult to achieve. Sampling populations as they arrive at the border should fill the gaps in source population identification, but such an advance has not yet been achieved with genetic data. Here we use previously acquired genetic data to assign new incursions as they invade populations within New Zealand ports and marinas. We also investigated allelic frequency change in these recently established populations over a two-year period, and assessed the effect of temporal genetic sampling on our ability to assign new incursions to their source population. We observed shifts in the allele frequencies among populations, as well as the complete loss of some alleles and the addition of alleles novel to New Zealand, within these recently established populations. There was no significant level of genetic differentiation observed in our samples between years, yet the use of these temporal data did alter the assignment probability of new incursions. Our study further suggests that new incursions can add genetic variation to the population in a single introduction event, as the founders themselves are often more genetically diverse than theory initially predicted.
Folksonomical P2P File Sharing Networks Using Vectorized KANSEI Information as Search Tags
NASA Astrophysics Data System (ADS)
Ohnishi, Kei; Yoshida, Kaori; Oie, Yuji
We present the concept of folksonomical peer-to-peer (P2P) file sharing networks that allow participants (peers) to freely assign structured search tags to files. These networks are similar to folksonomies in the present Web from the point of view that users assign search tags to information distributed over a network. As a concrete example, we consider an unstructured P2P network using vectorized Kansei (human sensitivity) information as structured search tags for file search. Vectorized Kansei information as search tags indicates what participants feel about their files and is assigned by the participant to each of their files. A search query has the same form as the search tags and indicates what participants want to feel about files that they will eventually obtain. A method that enables file search using vectorized Kansei information is the Kansei query-forwarding method, which probabilistically propagates a search query to peers that are likely to hold more files having search tags similar to the query. The similarity between the search query and the search tags is measured by their dot product. The simulation experiments examine whether the Kansei query-forwarding method can provide equal search performance for all peers in a network in which only the Kansei information and the tendency with respect to file collection differ among the peers. The simulation results show that the Kansei query-forwarding method and a random-walk-based query-forwarding method, used for comparison, work effectively in different situations and are complementary. Furthermore, the Kansei query-forwarding method is shown, through simulations, to be superior or equal to the random-walk-based one in terms of search speed.
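To make the forwarding rule concrete, here is a minimal sketch (not taken from the paper; the peer names, three-dimensional Kansei vectors, and the non-negativity clamp on scores are illustrative assumptions) of dot-product similarity and probabilistic, similarity-weighted query forwarding:

```python
import random

def kansei_similarity(query, tags):
    """Dot product between a Kansei query vector and a file's tag vector."""
    return sum(q * t for q, t in zip(query, tags))

def forward_query(query, neighbors):
    """Pick the neighbor to forward a query to, with probability
    proportional to the summed similarity of its files' tags to the query."""
    scores = []
    for peer_files in neighbors.values():
        score = sum(max(kansei_similarity(query, tags), 0.0) for tags in peer_files)
        scores.append(score)
    total = sum(scores)
    if total == 0:                      # no information: fall back to a random walk
        return random.choice(list(neighbors))
    r, acc = random.uniform(0, total), 0.0
    for peer, score in zip(neighbors, scores):
        acc += score
        if r <= acc:
            return peer
    return list(neighbors)[-1]

# Example: 3-dimensional Kansei vectors (say, "calm", "bright", "warm")
neighbors = {"peerA": [[0.9, 0.1, 0.3]],
             "peerB": [[0.2, 0.8, 0.5], [0.1, 0.9, 0.4]]}
print(forward_query([0.0, 1.0, 0.5], neighbors))
```

Weighting the forwarding choice by accumulated similarity biases the walk toward regions of the network that hold matching files, while the uniform fallback recovers exactly the random-walk behavior the paper uses for comparison.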
Conservation genetics of the rare Pyreneo-Cantabrian endemic Aster pyrenaeus (Asteraceae)
Escaravage, Nathalie; Cambecèdes, Jocelyne; Largier, Gérard; Pornon, André
2011-01-01
Background and aims Aster pyrenaeus (Asteraceae) is an endangered species, endemic to the Pyrenees and Cantabrian Mountain ranges (Spain). For its long-term persistence, this taxon needs an appropriate conservation strategy to be implemented. In this context, we studied the genetic structure over the entire geographical range of the species and then inferred the genetic relationships between populations. Methodology Molecular diversity was analysed for 290 individuals from 12 populations in the Pyrenees and the Cantabrian Mountains using inter simple sequence repeats (ISSRs). Bayesian-based analysis was applied to examine population structure. Principal results Analysis of genetic similarity and diversity, based on 87 polymorphic ISSR markers, suggests that despite being small and isolated, populations have an intermediate genetic diversity level (P % = 52.8 %, HE = 0.21 ± 0.01, genetic similarity between individuals = 49.6 %). Genetic variation was mainly found within populations (80–84 %), independently of mountain ranges, whereas 16–18 % was found between populations and <5 % between mountain ranges. Analyses of molecular variance indicated that population differentiation was highly significant. However, no significant correlation was found between the genetic and geographical distances among populations (Rs = 0.359, P = 0.140). Geographical structure based on assignment tests identified five different gene pools that were independent of any particular structure in the landscape. Conclusions The results suggest that population isolation is probably relatively recent, and that the outbreeding behaviour of the species maintains a high within-population genetic diversity. We assume that some long-distance dispersal, even among topographically remote populations, may be determinant for the pattern of genetic variation found in populations. Based on these findings, strategies are proposed for genetic conservation and management of the species. PMID:22476499
Ethical principles and pitfalls of genetic testing for dementia.
Hedera, P
2001-01-01
Progress in the genetics of dementing disorders and the availability of clinical tests for practicing physicians increase the need for a better understanding of multifaceted issues associated with genetic testing. The genetics of dementia is complex, and genetic testing is fraught with many ethical concerns. Genetic testing can be considered for patients with a family history suggestive of a single gene disorder as a cause of dementia. Testing of affected patients should be accompanied by competent genetic counseling that focuses on probabilistic implications for at-risk first-degree relatives. Predictive testing of at-risk asymptomatic patients should be modeled after presymptomatic testing for Huntington's disease. Testing using susceptibility genes has only a limited diagnostic value at present because potential improvement in diagnostic accuracy does not justify potentially negative consequences for first-degree relatives. Predictive testing of unaffected subjects using susceptibility genes is currently not recommended because individual risk cannot be quantified and there are no therapeutic interventions for dementia in presymptomatic patients.
Receipt of Preventive Services After Oregon's Randomized Medicaid Experiment.
Marino, Miguel; Bailey, Steffani R; Gold, Rachel; Hoopes, Megan J; O'Malley, Jean P; Huguet, Nathalie; Heintzman, John; Gallia, Charles; McConnell, K John; DeVoe, Jennifer E
2016-02-01
It is predicted that gaining health insurance via the Affordable Care Act will result in increased rates of preventive health services receipt in the U.S., primarily based on self-reported findings from previous health insurance expansion studies. This study examined the long-term (36-month) impact of Oregon's 2008 randomized Medicaid expansion ("Oregon Experiment") on receipt of 12 preventive care services in community health centers using electronic health record data. Demographic data from adult (aged 19-64 years) Oregon Experiment participants were probabilistically matched to electronic health record data from 49 Oregon community health centers within the OCHIN community health information network (N=10,643). Intent-to-treat analyses compared receipt of preventive services over a 36-month (2008-2011) period among those randomly assigned to apply for Medicaid versus not assigned, and instrumental variable analyses estimated the effect of actually gaining Medicaid coverage on preventive services receipt (data collected in 2012-2014; analysis performed in 2014-2015). Intent-to-treat analyses revealed statistically significant differences between patients randomly assigned to apply for Medicaid (versus not assigned) for 8 of 12 assessed preventive services. In instrumental variable analyses, Medicaid coverage significantly increased the odds of receipt of most preventive services (ORs ranging from 1.04 [95% CI=1.02, 1.06] for smoking assessment to 1.27 [95% CI=1.02, 1.57] for mammography). Rates of preventive services receipt will likely increase as community health center patients gain insurance through Affordable Care Act expansions. Continued effort is needed to increase health insurance coverage in an effort to decrease health disparities in vulnerable populations. Copyright © 2016 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
2006-05-31
dynamics (MD) and kinetic Monte Carlo (KMC) procedures. In 2D surface modeling, our calculations project speedups of 9 orders of magnitude at 300 degrees...programming is used to perform customized statistical mechanics by bridging the different time scales of MD and KMC quickly and well. Speedups in
Peakall, Rod; Smouse, Peter E
2012-10-01
GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G'(ST), G''(ST), Jost's D(est) and F'(ST) through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. rod.peakall@anu.edu.au.
Zelenina, D A; Khrustaleva, A M; Volkov, A A
2006-05-01
Using two types of molecular markers, a comparative analysis of the population structure of sockeye salmon from West Kamchatka as well as population assignment of each individual fish were carried out. The values of a RAPD-PCR-based population assignment test (94-100%) were somewhat higher than those based on microsatellite data (74-84%). However, these results seem quite satisfactory because of high polymorphism of the microsatellite loci examined. The UPGMA dendrograms of genetic similarity of three largest spawning populations, constructed using each of the methods, were highly reliable, which was demonstrated by high bootstrap indices (100% in the case of RAPD-PCR; 84 and 100%, in the case of microsatellite analysis), though the resultant trees differed from one another. The different topology of the trees, in our view, is explained by the fact that the employed methods explored different parts of the genome; hence, the obtained results, albeit valid, may not correlate. Thus, to enhance reliability of the results, several methods of analysis should be used concurrently.
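For readers unfamiliar with assignment tests, the following is a minimal likelihood-based sketch of the general idea (the allele frequencies and loci are invented; the paper's RAPD- and microsatellite-based tests differ in detail): an individual is assigned to the baseline population that maximizes the likelihood of its multilocus genotype under Hardy-Weinberg proportions.

```python
import math

# Illustrative allele frequencies per population: pop -> locus -> allele -> freq
freqs = {
    "popA": {"L1": {"a": 0.7, "b": 0.3}, "L2": {"x": 0.6, "y": 0.4}},
    "popB": {"L1": {"a": 0.2, "b": 0.8}, "L2": {"x": 0.1, "y": 0.9}},
}

def log_likelihood(genotype, pop, eps=1e-4):
    """Log-likelihood of a multilocus genotype under Hardy-Weinberg
    proportions; eps guards against alleles unseen in the baseline."""
    ll = 0.0
    for locus, (a1, a2) in genotype.items():
        p = freqs[pop][locus].get(a1, eps)
        q = freqs[pop][locus].get(a2, eps)
        ll += math.log(2 * p * q) if a1 != a2 else math.log(p * p)
    return ll

individual = {"L1": ("a", "b"), "L2": ("y", "y")}
print(max(freqs, key=lambda pop: log_likelihood(individual, pop)))
```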
Combining Different Privacy-Preserving Record Linkage Methods for Hospital Admission Data.
Stausberg, Jürgen; Waldenburger, Andreas; Borgs, Christian; Schnell, Rainer
2017-01-01
Record linkage (RL) is the process of identifying pairs of records that correspond to the same entity, for example the same patient. The basic approach assigns to each pair of records a similarity weight and then determines a certain threshold, above which the two records are considered a match. Three different RL methods were applied under privacy-preserving conditions to hospital admission data: deterministic RL (DRL), probabilistic RL (PRL), and Bloom filters. Patient characteristics such as names were one-way encrypted (DRL, PRL) or transformed into a cryptographic long-term key (Bloom filters). Based on one year of hospital admissions, the data set was split randomly into 30,000 new and 1.5 million known patients. With the combination of the three RL methods, a positive predictive value of 83% (95% confidence interval 65%-94%) was attained. Thus, the presented combination of RL methods seems suited to other applications of population-based research.
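For the Bloom filter method, a minimal sketch of the usual bigram encoding and Dice-coefficient comparison follows (the filter size, number of hash functions, salted-hash scheme, and match threshold are illustrative; production cryptographic long-term keys use keyed HMACs and additional hardening):

```python
import hashlib

def bloom_encode(name, size=1000, num_hashes=10):
    """Map a string's character bigrams into a Bloom filter,
    represented here as the set of bit positions switched on."""
    bigrams = {name[i:i + 2] for i in range(len(name) - 1)}
    bits = set()
    for gram in bigrams:
        for k in range(num_hashes):
            digest = hashlib.sha256(f"{k}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % size)
    return bits

def dice(b1, b2):
    """Dice coefficient between two Bloom filters (the similarity weight)."""
    return 2 * len(b1 & b2) / (len(b1) + len(b2))

a, b = bloom_encode("smith"), bloom_encode("smyth")
print(dice(a, b), dice(a, b) >= 0.8)   # classify as match above a tuned threshold
```

Because similar names share most of their bigrams, their filters overlap heavily, so the similarity weight survives the one-way encoding; this is what makes the linkage both probabilistic and privacy-preserving.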
Genetic Algorithm Calibration of Probabilistic Cellular Automata for Modeling Mining Permit Activity
Louis, S.J.; Raines, G.L.
2003-01-01
We use a genetic algorithm to calibrate a spatially and temporally resolved cellular automaton that models mining activity on public land in Idaho and western Montana. The genetic algorithm searches through a space of transition-rule parameters of a two-dimensional cellular automaton model to find rule parameters that fit observed mining activity data. Previous work by one of the authors in calibrating the cellular automaton took weeks; the genetic algorithm takes a day and produces rules leading to about the same (or better) fit to observed data. These preliminary results indicate that genetic algorithms are a viable tool for calibrating cellular automata for this application. Experience gained during the calibration of this cellular automaton suggests that mineral resource information is a critical factor in the quality of the results. With automated calibration, further refinements of how the mineral-resource information is provided to the cellular automaton will probably improve our model.
Automated design of genetic toggle switches with predetermined bistability.
Chen, Shuobing; Zhang, Haoqian; Shi, Handuo; Ji, Weiyue; Feng, Jingchen; Gong, Yan; Yang, Zhenglin; Ouyang, Qi
2012-07-20
Synthetic biology aims to rationally construct biological devices with required functionalities. Methods that automate the design of genetic devices without post-hoc adjustment are therefore highly desired. Here we provide a method to predictably design genetic toggle switches with predetermined bistability. To accomplish this task, a biophysical model that links ribosome binding site (RBS) DNA sequence to toggle switch bistability was first developed by integrating a stochastic model with RBS design method. Then, to parametrize the model, a library of genetic toggle switch mutants was experimentally built, followed by establishing the equivalence between RBS DNA sequences and switch bistability. To test this equivalence, RBS nucleotide sequences for different specified bistabilities were in silico designed and experimentally verified. Results show that the deciphered equivalence is highly predictive for the toggle switch design with predetermined bistability. This method can be generalized to quantitative design of other probabilistic genetic devices in synthetic biology.
Genetic assignment of large seizures of elephant ivory reveals Africa’s major poaching hotspots
Wasser, S. K.; Brown, L.; Mailand, C.; Mondol, S.; Clark, W.; Laurie, C.; Weir, B. S.
2017-01-01
Poaching of elephants is now occurring at rates that threaten African populations with extinction. Identifying the number and location of Africa’s major poaching hotspots may assist efforts to end poaching and facilitate recovery of elephant populations. We genetically assign origin to 28 large ivory seizures (≥0.5 metric tons) made between 1996 and 2014, also testing assignment accuracy. Results suggest that the major poaching hotspots in Africa may be currently concentrated in as few as two areas. Increasing law enforcement in these two hotspots could help curtail future elephant losses across Africa and disrupt this organized transnational crime. PMID:26089357
Probabilistic pathway construction.
Yousofshahi, Mona; Lee, Kyongbum; Hassoun, Soha
2011-07-01
Expression of novel synthesis pathways in host organisms amenable to genetic manipulation has emerged as an attractive metabolic engineering strategy for overproducing natural products, biofuels, biopolymers and other commercially useful metabolites. We present a pathway construction algorithm for identifying viable synthesis pathways compatible with balanced cell growth. Rather than exhaustively exploring all reaction routes, we investigate probabilistic selection of reactions to construct the pathways. Three selection schemes are investigated: preferring reactions involving highly connected metabolites, preferring low connectivity, and uniformly random selection. For all case studies, which involved a diverse set of target metabolites, the uniformly random selection scheme resulted in the highest average maximum yield. When compared to an exhaustive search enumerating all possible reaction routes, our probabilistic algorithm returned nearly identical distributions of yields, while requiring far less computing time (minutes vs. years). The pathways identified by our algorithm have previously been confirmed in the literature as viable, high-yield synthesis routes. Prospectively, our algorithm could facilitate the design of novel, non-native synthesis routes by efficiently exploring the diversity of biochemical transformations in nature. Copyright © 2011 Elsevier Inc. All rights reserved.
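A minimal sketch of the probabilistic selection step under the three schemes (the function, reaction names, and connectivity values are illustrative, not from the paper's implementation):

```python
import random

def pick_reaction(candidates, connectivity, scheme="uniform"):
    """Probabilistically select the next reaction to extend a pathway.
    connectivity maps each reaction to the connectivity (degree) of its
    participating metabolites; the schemes mirror those compared above."""
    if scheme == "uniform":
        weights = [1.0 for _ in candidates]
    elif scheme == "high":
        weights = [connectivity[r] for r in candidates]
    elif scheme == "low":
        weights = [1.0 / connectivity[r] for r in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

conn = {"rxn1": 12, "rxn2": 3, "rxn3": 7}   # illustrative metabolite degrees
print(pick_reaction(list(conn), conn, scheme="low"))
```

Repeating such draws until a target metabolite is reached yields one candidate pathway; sampling many pathways approximates the yield distribution that an exhaustive search computes explicitly.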
Residential air exchange rates (AERs) are a key determinant in the infiltration of ambient air pollution indoors. Population-based human exposure models using probabilistic approaches to estimate personal exposure to air pollutants have relied on input distributions from AER meas...
A Practical Probabilistic Graphical Modeling Tool for Weighing Ecological Risk-Based Evidence
Past weight-of-evidence frameworks for adverse ecological effects have provided soft-scoring procedures for judgments based on the quality and measured attributes of evidence. Here, we provide a flexible probabilistic structure for weighing and integrating lines of evidence for e...
Functional Assessment of Genetic Variants with Outcomes Adapted to Clinical Decision-Making
Thouvenot, Pierre; Ben Yamin, Barbara; Fourrière, Lou; Lescure, Aurianne; Boudier, Thomas; Del Nery, Elaine; Chauchereau, Anne; Goldgar, David E.; Stoppa-Lyonnet, Dominique; Nicolas, Alain; Millot, Gaël A.
2016-01-01
Understanding the medical effect of an ever-growing number of human variants detected is a long term challenge in genetic counseling. Functional assays, based on in vitro or in vivo evaluations of the variant effects, provide essential information, but they require robust statistical validation, as well as adapted outputs, to be implemented in the clinical decision-making process. Here, we assessed 25 pathogenic and 15 neutral missense variants of the BRCA1 breast/ovarian cancer susceptibility gene in four BRCA1 functional assays. Next, we developed a novel approach that refines the variant ranking in these functional assays. Lastly, we developed a computational system that provides a probabilistic classification of variants, adapted to clinical interpretation. Using this system, the best functional assay exhibits a variant classification accuracy estimated at 93%. Additional theoretical simulations highlight the benefit of this ready-to-use system in the classification of variants after functional assessment, which should facilitate the consideration of functional evidences in the decision-making process after genetic testing. Finally, we demonstrate the versatility of the system with the classification of siRNAs tested for human cell growth inhibition in high throughput screening. PMID:27272900
Doitsidou, Maria; Jarriault, Sophie; Poole, Richard J.
2016-01-01
The use of next-generation sequencing (NGS) has revolutionized the way phenotypic traits are assigned to genes. In this review, we describe NGS-based methods for mapping a mutation and identifying its molecular identity, with an emphasis on applications in Caenorhabditis elegans. In addition to an overview of the general principles and concepts, we discuss the main methods, provide practical and conceptual pointers, and guide the reader in the types of bioinformatics analyses that are required. Owing to the speed and the plummeting costs of NGS-based methods, mapping and cloning a mutation of interest has become straightforward, quick, and relatively easy. Removing this bottleneck previously associated with forward genetic screens has significantly advanced the use of genetics to probe fundamental biological processes in an unbiased manner. PMID:27729495
Saklatvala, Jake R; Dand, Nick; Simpson, Michael A
2018-05-01
The genetic diagnosis of rare monogenic diseases using exome/genome sequencing requires the true causal variant(s) to be identified from tens of thousands of observed variants. Typically a virtual gene panel approach is taken whereby only variants in genes known to cause phenotypes resembling the patient under investigation are considered. With the number of known monogenic gene-disease pairs exceeding 5,000, manual curation of personalized virtual panels using exhaustive knowledge of the genetic basis of the human monogenic phenotypic spectrum is challenging. We present improved probabilistic methods for estimating phenotypic similarity based on Human Phenotype Ontology annotation. A limitation of existing methods for evaluating a disease's similarity to a reference set is that reference diseases are typically represented as a series of binary (present/absent) observations of phenotypic terms. We evaluate a quantified disease reference set, using term frequency in phenotypic text descriptions to approximate term relevance. We demonstrate an improved ability to identify related diseases through the use of a quantified reference set, and that vector space similarity measures perform better than established information content-based measures. These improvements enable the generation of bespoke virtual gene panels, facilitating more accurate and efficient interpretation of genomic variant profiles from individuals with rare Mendelian disorders. These methods are available online at https://atlas.genetics.kcl.ac.uk/~jake/cgi-bin/patient_sim.py. © 2018 Wiley Periodicals, Inc.
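A minimal sketch of the vector space comparison the abstract favors, with term-frequency weights standing in for term relevance (the HPO term IDs and weights below are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Binary patient profile vs. a quantified disease reference, whose weights
# approximate each term's relevance from phenotypic text descriptions.
patient = {"HP:0001250": 1.0, "HP:0001263": 1.0}
disease = {"HP:0001250": 0.9, "HP:0001263": 0.4, "HP:0000252": 0.2}
print(round(cosine(patient, disease), 3))
```

Ranking all reference diseases by this score, then pooling the genes of the top hits, is one way such bespoke virtual gene panels can be generated.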
Unaldi, Numan; Temel, Samil; Asari, Vijayan K.
2012-01-01
One of the most critical issues of Wireless Sensor Networks (WSNs) is the deployment of a limited number of sensors so as to achieve maximum coverage of a terrain. Optimal sensor deployment, which minimizes the energy, communication time and manpower consumed in maintaining the network, has attracted growing interest over the last decade. Most studies in the literature address two-dimensional (2D) surfaces; however, real-world sensor deployments often arise in three-dimensional (3D) environments. In this paper, a guided wavelet transform (WT) based deployment strategy (WTDS) for 3D terrains is proposed, in which the sensor movements are carried out within the mutation phase of a genetic algorithm (GA). The proposed algorithm aims to maximize the Quality of Coverage (QoC) of a WSN by deploying a limited number of sensors on a 3D surface, utilizing a probabilistic sensing model and Bresenham's line-of-sight (LOS) algorithm. The method followed in this paper is novel in the literature; its performance is compared with the Delaunay Triangulation (DT) method as well as a standard genetic-algorithm-based method, and the results reveal that the proposed method is more powerful and more successful for sensor deployment on 3D terrains. PMID:22666078
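A minimal sketch of a probabilistic sensing model and coverage score of the general kind used in such work (the exponential decay form, parameter values, and coverage threshold are assumptions rather than the paper's exact model, and the LOS check is omitted):

```python
import math

def detection_probability(sensor, point, r=10.0, r_err=2.0, beta=0.5):
    """Probabilistic sensing: certain detection inside r - r_err, none
    beyond r + r_err, exponential decay in the uncertain band between."""
    d = math.dist(sensor, point)
    if d <= r - r_err:
        return 1.0
    if d >= r + r_err:
        return 0.0
    return math.exp(-beta * (d - (r - r_err)))

def coverage(sensors, grid_points, threshold=0.9):
    """Fraction of grid points whose joint detection probability across all
    sensors (1 minus the product of miss probabilities) meets the threshold."""
    covered = 0
    for p in grid_points:
        miss = 1.0
        for s in sensors:
            miss *= 1.0 - detection_probability(s, p)
        covered += (1.0 - miss) >= threshold
    return covered / len(grid_points)

print(coverage([(0, 0), (12, 0)], [(x, 0) for x in range(25)]))
```

A GA with sensor moves in the mutation phase, as described above, would use such a coverage score as its fitness function.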
NASA Astrophysics Data System (ADS)
Hirata, K.; Fujiwara, H.; Nakamura, H.; Osada, M.; Morikawa, N.; Kawai, S.; Ohsumi, T.; Aoi, S.; Yamamoto, N.; Matsuyama, H.; Toyama, N.; Kito, T.; Murashima, Y.; Murata, Y.; Inoue, T.; Saito, R.; Takayama, J.; Akiyama, S.; Korenaga, M.; Abe, Y.; Hashimoto, N.
2015-12-01
The Earthquake Research Committee (ERC)/HERP, Government of Japan (2013) revised their long-term evaluation of the forthcoming large earthquake along the Nankai Trough; the next earthquake is estimated to be of M8 to M9 class, and the probability (P30) that it will occur within the next 30 years (from Jan. 1, 2013) is 60% to 70%. In this study, we assess near-future tsunami hazards (maximum coastal tsunami heights) from the next earthquake along the Nankai Trough in terms of a probabilistic approach, on the basis of the ERC (2013) report. The probabilistic tsunami hazard assessment we applied is as follows: (1) Characterized earthquake fault models (CEFMs) are constructed on each of the 15 hypothetical source areas (HSA) defined by ERC (2013). The characterization rule follows Toyama et al. (2015, JpGU). As a result, we obtained a total of 1,441 CEFMs. (2) We calculate tsunamis due to the CEFMs by solving nonlinear, finite-amplitude, long-wave equations with advection and bottom-friction terms using a finite-difference method; run-up computation on land is included. (3) A time-predictable model predicts that the recurrence interval of the present seismic cycle is T = 88.2 years (ERC, 2013). We fix P30 = 67% by applying a renewal process based on a Brownian Passage Time (BPT) distribution with mean T and aperiodicity alpha = 0.24. (4) We divide the probability P30 into P30(i) for the i-th subgroup, consisting of the earthquakes occurring in each of the 15 HSA, by following a probability redistribution concept (ERC, 2014). Each earthquake (CEFM) in the i-th subgroup is then assigned a probability P30(i)/N, where N is the number of CEFMs in that subgroup. Note that this redistribution of probability is only tentative, because present seismological knowledge is not deep enough to constrain it; an epistemic logic-tree approach may be required in the future. (5) We synthesize tsunami hazard curves at evaluation points along the coasts by integrating the 30-year occurrence probabilities P30(i) of all earthquakes (CEFMs) with the calculated maximum coastal tsunami heights. In the synthesis, aleatory uncertainties relating to incompleteness of the governing equations, CEFM modeling, bathymetry and topography data, etc., are modeled assuming a log-normal distribution. Examples of tsunami hazard curves will be presented.
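Step (3) is easy to reproduce. A minimal sketch of the 30-year conditional probability under a BPT renewal model, using SciPy's inverse Gaussian distribution (the elapsed time of about 67 years since the 1944/1946 Nankai events, as of the 2013 evaluation date, is our assumption, not a figure from the abstract):

```python
from scipy.stats import invgauss

def conditional_probability(T_mean, alpha, elapsed, window=30.0):
    """Conditional probability that the next event occurs within `window`
    years, given `elapsed` years since the last one, under a Brownian
    Passage Time renewal model. BPT(T, alpha) equals SciPy's invgauss
    with shape mu = alpha**2 and scale = T / alpha**2."""
    dist = invgauss(mu=alpha**2, scale=T_mean / alpha**2)
    F = dist.cdf
    return (F(elapsed + window) - F(elapsed)) / (1.0 - F(elapsed))

# T = 88.2 yr and alpha = 0.24 from the abstract; the result is consistent
# with the 60%-70% range quoted above.
print(round(conditional_probability(88.2, 0.24, 67.0), 2))
```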
A Darwinian approach to control-structure design
NASA Technical Reports Server (NTRS)
Zimmerman, David C.
1993-01-01
Genetic algorithms (GAs), as introduced by Holland (1975), are one form of directed random search, where the direction is based on Darwin's 'survival of the fittest' theories. GAs are radically different from the more traditional design optimization techniques. GAs work with a coding of the design variables, as opposed to working with the design variables directly. The search is conducted from a population of designs (i.e., from a large number of points in the design space), unlike the traditional algorithms which search from a single design point. The GA requires only objective function information, as opposed to gradient or other auxiliary information. Finally, the GA is based on probabilistic transition rules, as opposed to deterministic rules. These features allow the GA to attack problems with multiple local and global minima, discontinuous design spaces, and mixed variables, all in a single, consistent framework.
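A minimal GA sketch illustrating the four features just listed (bit-string coding, a population of designs, objective-function-only evaluation, and probabilistic selection, crossover, and mutation rules); all parameter values are illustrative:

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=40, generations=100,
                      p_cross=0.8, p_mut=0.02):
    """Minimal GA: fitness-proportionate selection, one-point crossover,
    and bitwise mutation over a population of bit-string codings."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]              # objective info only
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = random.choices(pop, weights=scores, k=2)  # selection
            if random.random() < p_cross:                    # crossover
                cut = random.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            new_pop += [[bit ^ (random.random() < p_mut) for bit in child]
                        for child in (a, b)]                 # mutation
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# Toy objective: maximize the number of ones in the coding (kept strictly
# positive so it can double as a selection weight).
best = genetic_algorithm(lambda ind: sum(ind) + 1)
print(sum(best))
```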
NASA Technical Reports Server (NTRS)
Ryan, Robert S.; Townsend, John S.
1993-01-01
The prospective improvement of probabilistic methods for space program analysis/design entails the further development of theories, codes, and tools which match specific areas of application, the drawing of lessons from previous uses of probability and statistics data bases, the enlargement of data bases (especially in the field of structural failures), and the education of engineers and managers on the advantages of these methods. An evaluation is presently made of the current limitations of probabilistic engineering methods. Recommendations are made for specific applications.
Generalized probabilistic scale space for image restoration.
Wong, Alexander; Mishra, Akshaya K
2010-10-01
A novel generalized sampling-based probabilistic scale space theory is proposed for image restoration. We explore extending the definition of scale space to better account for both noise and observation models, which is important for producing accurately restored images. A new class of scale-space realizations based on sampling and probability theory is introduced to realize this extended definition in the context of image restoration. Experimental results using 2-D images show that generalized sampling-based probabilistic scale-space theory can be used to produce more accurate restored images when compared with state-of-the-art scale-space formulations, particularly under situations characterized by low signal-to-noise ratios and image degradation.
Verifying the geographic origin of mahogany (Swietenia macrophylla King) with DNA-fingerprints.
Degen, B; Ward, S E; Lemes, M R; Navarro, C; Cavers, S; Sebbenn, A M
2013-01-01
Illegal logging is one of the main causes of ongoing worldwide deforestation and needs to be eradicated. The trade in illegal timber and wood products creates market disadvantages for products from sustainable forestry. Although various measures have been established to counter illegal logging and the subsequent trade, there is a lack of practical mechanisms for identifying the origin of timber and wood products. In this study, six nuclear microsatellites were used to generate DNA fingerprints for a genetic reference database characterising the populations of origin of a large set of mahogany (Swietenia macrophylla King, Meliaceae) samples. For the database, leaves and/or cambium from 1971 mahogany trees sampled in 31 stands from Mexico to Bolivia were genotyped. A total of 145 different alleles were found, showing strong genetic differentiation (δ(Gregorius)=0.52, F(ST)=0.18, G(ST(Hedrick))=0.65) and clear correlation between genetic and spatial distances among stands (r=0.82, P<0.05). We used the genetic reference database and Bayesian assignment testing to determine the geographic origins of two sets of mahogany wood samples, based on their multilocus genotypes. In both cases the wood samples were assigned to the correct country of origin. We discuss the overall applicability of this methodology to tropical timber trading. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Probabilistic dual heuristic programming-based adaptive critic
NASA Astrophysics Data System (ADS)
Herzallah, Randa
2010-02-01
Adaptive critic (AC) methods have common roots as generalisations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, non-linear and non-stationary environments. In this study, a novel probabilistic dual heuristic programming (DHP)-based AC controller is proposed. Distinct from current approaches, the proposed probabilistic DHP AC method takes uncertainties of the forward model and inverse controller into consideration. It is therefore suitable for deterministic and stochastic control problems characterised by functional uncertainty. Theoretical development of the proposed method is validated by analytically evaluating the correct value of the cost function which satisfies the Bellman equation in a linear quadratic control problem. The target value of the probabilistic critic network is then calculated and shown to be equal to the analytically derived correct value. A full derivation of the Riccati solution for this non-standard stochastic linear quadratic control problem is also provided. Moreover, the performance of the proposed probabilistic controller is demonstrated on linear and non-linear control examples.
NASA Astrophysics Data System (ADS)
Song, Lu-Kai; Wen, Jie; Fei, Cheng-Wei; Bai, Guang-Chen
2018-05-01
To improve the computing efficiency and precision of probabilistic design for multi-failure structures, a distributed collaborative probabilistic design method based on a fuzzy neural network of regression (FR), called DCFRM, is proposed by integrating the distributed collaborative response surface method with a fuzzy neural network regression model. The mathematical model of DCFRM is established and the probabilistic design approach based on DCFRM is introduced. The probabilistic analysis of a turbine blisk involving multiple failure modes (deformation failure, stress failure and strain failure) was investigated with the proposed method, considering fluid-structure interaction. The distribution characteristics, reliability degree, and sensitivity degree of each failure mode and of the overall failure mode of the turbine blisk are obtained, providing a useful reference for improving the performance and reliability of aeroengines. A comparison of methods shows that DCFRM reshapes the probabilistic analysis of multi-failure structures and improves computing efficiency while keeping acceptable computational precision. Moreover, the proposed method offers useful insight for reliability-based design optimization of multi-failure structures and thereby enriches the theory and methods of mechanical reliability design.
New Proofs of Some q-Summation and q-Transformation Formulas
Liu, Xian-Fang; Bi, Ya-Qing; Luo, Qiu-Ming
2014-01-01
We obtain an expectation formula and give probabilistic proofs of some summation and transformation formulas of q-series based on our expectation formula. Although these formulas are not themselves probabilistic results, the proofs given are based on probabilistic concepts. PMID:24895675
Probabilistic modeling of the evolution of gene synteny within reconciled phylogenies
2015-01-01
Background Most models of genome evolution concern either genetic sequences, gene content or gene order. They sometimes integrate two of the three levels, but rarely the three of them. Probabilistic models of gene order evolution usually have to assume constant gene content or adopt a presence/absence coding of gene neighborhoods which is blind to complex events modifying gene content. Results We propose a probabilistic evolutionary model for gene neighborhoods, allowing genes to be inserted, duplicated or lost. It uses reconciled phylogenies, which integrate sequence and gene content evolution. We are then able to optimize parameters such as phylogeny branch lengths, or probabilistic laws depicting the diversity of susceptibility of syntenic regions to rearrangements. We reconstruct a structure for ancestral genomes by optimizing a likelihood, keeping track of all evolutionary events at the level of gene content and gene synteny. Ancestral syntenies are associated with a probability of presence. We implemented the model with the restriction that at most one gene duplication separates two gene speciations in reconciled gene trees. We reconstruct ancestral syntenies on a set of 12 drosophila genomes, and compare the evolutionary rates along the branches and along the sites. We compare with a parsimony method and find a significant number of results not supported by the posterior probability. The model is implemented in the Bio++ library. It thus benefits from and enriches the classical models and methods for molecular evolution. PMID:26452018
Green, Nancy
2005-04-01
We developed a Bayesian network coding scheme for annotating biomedical content in layperson-oriented clinical genetics documents. The coding scheme supports the representation of probabilistic and causal relationships among concepts in this domain, at a high enough level of abstraction to capture commonalities among genetic processes and their relationship to health. We are using the coding scheme to annotate a corpus of genetic counseling patient letters as part of the requirements analysis and knowledge acquisition phase of a natural language generation project. This paper describes the coding scheme and presents an evaluation of intercoder reliability for its tag set. In addition to giving examples of use of the coding scheme for analysis of discourse and linguistic features in this genre, we suggest other uses for it in analysis of layperson-oriented text and dialogue in medical communication.
Probabilistic simulation of uncertainties in thermal structures
NASA Technical Reports Server (NTRS)
Chamis, Christos C.; Shiao, Michael
1990-01-01
Development of probabilistic structural analysis methods for hot structures is a major activity at Lewis Research Center. It consists of five program elements: (1) probabilistic loads; (2) probabilistic finite element analysis; (3) probabilistic material behavior; (4) assessment of reliability and risk; and (5) probabilistic structural performance evaluation. Recent progress includes: (1) quantification of the effects of uncertainties for several variables on high pressure fuel turbopump (HPFT) blade temperature, pressure, and torque of the Space Shuttle Main Engine (SSME); (2) the evaluation of the cumulative distribution function for various structural response variables based on assumed uncertainties in primitive structural variables; (3) evaluation of the failure probability; (4) reliability and risk-cost assessment, and (5) an outline of an emerging approach for eventual hot structures certification. Collectively, the results demonstrate that the structural durability/reliability of hot structural components can be effectively evaluated in a formal probabilistic framework. In addition, the approach can be readily extended to computationally simulate certification of hot structures for aerospace environments.
An improved probabilistic account of counterfactual reasoning.
Lucas, Christopher G; Kemp, Charles
2015-10-01
When people want to identify the causes of an event, assign credit or blame, or learn from their mistakes, they often reflect on how things could have gone differently. In this kind of reasoning, one considers a counterfactual world in which some events are different from their real-world counterparts and considers what else would have changed. Researchers have recently proposed several probabilistic models that aim to capture how people do (or should) reason about counterfactuals. We present a new model and show that it accounts better for human inferences than several alternative models. Our model builds on the work of Pearl (2000), and extends his approach in a way that accommodates backtracking inferences and that acknowledges the difference between counterfactual interventions and counterfactual observations. We present 6 new experiments and analyze data from 4 experiments carried out by Rips (2010), and the results suggest that the new model provides an accurate account of both mean human judgments and the judgments of individuals. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
Samberg, Leah H; Fishman, Lila; Allendorf, Fred W
2013-01-01
Conservation strategies are increasingly driven by our understanding of the processes and patterns of gene flow across complex landscapes. The expansion of population genetic approaches into traditional agricultural systems requires understanding how social factors contribute to that landscape, and thus to gene flow. This study incorporates extensive farmer interviews and population genetic analysis of barley landraces (Hordeum vulgare) to build a holistic picture of farmer-mediated gene flow in an ancient, traditional agricultural system in the highlands of Ethiopia. We analyze barley samples at 14 microsatellite loci across sites at varying elevations and locations across a contiguous mountain range, and across farmer-identified barley types and management strategies. Genetic structure is analyzed using population-based and individual-based methods, including measures of population differentiation and genetic distance, multivariate Principal Coordinate Analysis, and Bayesian assignment tests. Phenotypic analysis links genetic patterns to traits identified by farmers. We find that differential farmer management strategies lead to markedly different patterns of population structure across elevation classes and barley types. The extent to which farmer seed management appears as a stronger determinant of spatial structure than the physical landscape highlights the need for incorporation of social, landscape, and genetic data for the design of conservation strategies in human-influenced landscapes. PMID:24478796
None of the above: A Bayesian account of the detection of novel categories.
Navarro, Daniel J; Kemp, Charles
2017-10-01
Every time we encounter a new object, action, or event, there is some chance that we will need to assign it to a novel category. We describe and evaluate a class of probabilistic models that detect when an object belongs to a category that has not previously been encountered. The models incorporate a prior distribution that is influenced by the distribution of previous objects among categories, and we present 2 experiments that demonstrate that people are also sensitive to this distributional information. Two additional experiments confirm that distributional information is combined with similarity when both sources of information are available. We compare our approach to previous models of unsupervised categorization and to several heuristic-based models, and find that a hierarchical Bayesian approach provides the best account of our data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
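One standard way to build a prior with exactly this property, offered only as an illustration (the hierarchical Bayesian models evaluated in the paper are richer), is the Chinese restaurant process, under which the prior probability of a never-before-seen category depends on how many objects have been observed and on a concentration parameter alpha:

```python
def category_prior(category_counts, alpha=1.0):
    """Prior over known categories plus a final 'none of the above' slot,
    under a Chinese restaurant process: an existing category with c members
    gets mass c / (n + alpha); a brand-new category gets alpha / (n + alpha)."""
    n = sum(category_counts)
    return [c / (n + alpha) for c in category_counts] + [alpha / (n + alpha)]

print(category_prior([8, 1, 1]))   # the last entry is the novelty prior
```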
Instruction in information structuring improves Bayesian judgment in intelligence analysts.
Mandel, David R
2015-01-01
An experiment was conducted to test the effectiveness of brief instruction in information structuring (i.e., representing and integrating information) for improving the coherence of probability judgments and binary choices among intelligence analysts. Forty-three analysts were presented with comparable sets of Bayesian judgment problems before and immediately after instruction. After instruction, analysts' probability judgments were more coherent (i.e., more additive and compliant with Bayes theorem). Instruction also improved the coherence of binary choices regarding category membership: after instruction, subjects were more likely to invariably choose the category to which they assigned the higher probability of a target's membership. The research provides a rare example of evidence-based validation of effectiveness in instruction to improve the statistical assessment skills of intelligence analysts. Such instruction could also be used to improve the assessment quality of other types of experts who are required to integrate statistical information or make probabilistic assessments.
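A minimal sketch of the kind of problem involved and the coherence check at issue (the base rate, hit rate, and false-alarm rate are illustrative numbers, not stimuli from the study):

```python
def bayes_posterior(prior, likelihood_h, likelihood_not_h):
    """Posterior P(H|E) via Bayes theorem from P(H), P(E|H), and P(E|~H)."""
    evidence = prior * likelihood_h + (1 - prior) * likelihood_not_h
    return prior * likelihood_h / evidence

# Base rate 10%, hit rate 80%, false-alarm rate 20%.
p = bayes_posterior(0.10, 0.80, 0.20)
print(round(p, 3), round(1 - p, 3))   # coherent complementary judgments sum to 1
```

Additivity (complementary judgments summing to 1) and agreement with the computed posterior are precisely the coherence criteria the instruction targeted.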
Procelewska, Joanna; Galilea, Javier Llamas; Clerc, Frederic; Farrusseng, David; Schüth, Ferdi
2007-01-01
The objective of this work is the construction of a correlation between characteristics of heterogeneous catalysts, encoded in a descriptor vector, and their experimentally measured performance in the propene oxidation reaction. In this paper the key issue in the modeling process, namely the selection of adequate input variables, is explored. Several data-driven feature selection strategies were applied in order to estimate the differences in variance and information content of various attributes, and to compare their relative importance. Quantitative property-activity relationship techniques using probabilistic neural networks were used to create various semi-empirical models. Finally, a robust classification model was obtained that assigns selected attributes of solid compounds, taken as input, to an appropriate performance class in the model reaction. It became evident that mathematical support for the primary attribute set proposed by chemists can be highly desirable.
Behavior and neural basis of near-optimal visual search
Ma, Wei Ji; Navalpakkam, Vidhya; Beck, Jeffrey M; van den Berg, Ronald; Pouget, Alexandre
2013-01-01
The ability to search efficiently for a target in a cluttered environment is one of the most remarkable functions of the nervous system. This task is difficult under natural circumstances, as the reliability of sensory information can vary greatly across space and time and is typically a priori unknown to the observer. In contrast, visual-search experiments commonly use stimuli of equal and known reliability. In a target detection task, we randomly assigned high or low reliability to each item on a trial-by-trial basis. An optimal observer would weight the observations by their trial-to-trial reliability and combine them using a specific nonlinear integration rule. We found that humans were near-optimal, regardless of whether distractors were homogeneous or heterogeneous and whether reliability was manipulated through contrast or shape. We present a neural-network implementation of near-optimal visual search based on probabilistic population coding. The network matched human performance. PMID:21552276
Seismic hazard in the Nation's breadbasket
Boyd, Oliver; Haller, Kathleen; Luco, Nicolas; Moschetti, Morgan P.; Mueller, Charles; Petersen, Mark D.; Rezaeian, Sanaz; Rubinstein, Justin L.
2015-01-01
The USGS National Seismic Hazard Maps were updated in 2014 and included several important changes for the central United States (CUS). Background seismicity sources were improved using a new moment-magnitude-based catalog; a new adaptive, nearest-neighbor smoothing kernel was implemented; and maximum magnitudes for background sources were updated. Areal source zones developed by the Central and Eastern United States Seismic Source Characterization for Nuclear Facilities project were simplified and adopted. The weighting scheme for ground motion models was updated, giving more weight to models with a faster attenuation with distance compared to the previous maps. Overall, hazard changes (2% probability of exceedance in 50 years, across a range of ground-motion frequencies) were smaller than 10% in most of the CUS relative to the 2008 USGS maps despite new ground motion models and their assigned logic tree weights that reduced the probabilistic ground motions by 5–20%.
The neutral emergence of error minimized genetic codes superior to the standard genetic code.
Massey, Steven E
2016-11-07
The standard genetic code (SGC) assigns amino acids to codons in such a way that the impact of point mutations is reduced, this is termed 'error minimization' (EM). The occurrence of EM has been attributed to the direct action of selection, however it is difficult to explain how the searching of alternative codes for an error minimized code can occur via codon reassignments, given that these are likely to be disruptive to the proteome. An alternative scenario is that EM has arisen via the process of genetic code expansion, facilitated by the duplication of genes encoding charging enzymes and adaptor molecules. This is likely to have led to similar amino acids being assigned to similar codons. Strikingly, we show that if during code expansion the most similar amino acid to the parent amino acid, out of the set of unassigned amino acids, is assigned to codons related to those of the parent amino acid, then genetic codes with EM superior to the SGC easily arise. This scheme mimics code expansion via the gene duplication of charging enzymes and adaptors. The result is obtained for a variety of different schemes of genetic code expansion and provides a mechanistically realistic manner in which EM has arisen in the SGC. These observations might be taken as evidence for self-organization in the earliest stages of life. Copyright © 2016 Elsevier Ltd. All rights reserved.
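The EM score itself is straightforward to compute. A minimal sketch that scores the SGC by the mean squared change in Woese's polar requirement (the amino acid property most EM studies use) over all single-nucleotide substitutions, ignoring changes to or from stop codons; the same function can score alternative codes produced by an expansion scheme:

```python
# Standard codon table in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Woese's polar requirement values for the 20 amino acids.
PR = {"A": 7.0, "C": 4.8, "D": 13.0, "E": 12.5, "F": 5.0, "G": 7.9,
      "H": 8.4, "I": 4.9, "K": 10.1, "L": 4.9, "M": 5.3, "N": 10.0,
      "P": 6.6, "Q": 8.6, "R": 9.1, "S": 7.5, "T": 6.6, "V": 5.6,
      "W": 5.2, "Y": 5.4}

def error_value(code):
    """Mean squared change in polar requirement over all single-nucleotide
    substitutions; mutations to or from stop codons are skipped."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant == "*":
                    continue
                total += (PR[aa] - PR[mutant]) ** 2
                count += 1
    return total / count

print(round(error_value(CODE), 2))   # lower means more error minimized
```

Scoring many codes generated under an expansion scheme of the kind described above, and comparing their error values against the SGC's, is the essence of the reported analysis.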
A Probabilistic Approach to Data Integration in Biomedical Research: The IsBIG Experiments
ERIC Educational Resources Information Center
Anand, Vibha
2010-01-01
Biomedical research has produced vast amounts of new information in the last decade but has been slow to find its use in clinical applications. Data from disparate sources such as genetic studies and summary data from published literature have been amassed, but there is a significant gap, primarily due to a lack of normative methods, in combining…
A probabilistic watershed-based framework was developed to encompass wadeable streams within all three ecoregions of West Virginia, with the exclusion noted below. In Phase I of the project (year 2001), we developed and applied a probabilistic watershed-based sampling framework ...
Algebraic and Probabilistic Bases for Fuzzy Sets and the Development of Fuzzy Conditioning
1991-08-01
results; and also recently, among others, Bruno & Gilio (1985) bringing forth the basic issue of combining implicatives compatible with conditional... 7. Bruno, G. & Gilio, A. (1985), Confronto fra eventi condizionati di probabilità nulla nell'inferenza statistica [Comparison of conditional events of zero probability in statistical inference]
Amino acid fermentation at the origin of the genetic code.
de Vladar, Harold P
2012-02-10
There is evidence that the genetic code was established prior to the existence of proteins, when metabolism was powered by ribozymes. Also, early proto-organisms had to rely on simple anaerobic bioenergetic processes. In this work I propose that amino acid fermentation powered metabolism in the RNA world, and that this was facilitated by proto-adapters, the precursors of the tRNAs. Amino acids were used as carbon sources rather than as catalytic or structural elements. In modern bacteria, amino acid fermentation is known as the Stickland reaction. This pathway involves two amino acids: the first undergoes oxidative deamination, and the second acts as an electron acceptor through reductive deamination. This redox reaction results in two keto acids that are employed to synthesise ATP via substrate-level phosphorylation. The Stickland reaction is the basic bioenergetic pathway of some bacteria of the genus Clostridium. Two other facts support Stickland fermentation in the RNA world. First, several Stickland amino acid pairs are synthesised in abiotic amino acid synthesis. This suggests that amino acids that could be used as an energy substrate were freely available. Second, anticodons that have complementary sequences often correspond to amino acids that form Stickland pairs. The main hypothesis of this paper is that pairs of complementary proto-adapters were assigned to Stickland amino acid pairs. There are signatures of this hypothesis in the genetic code. Furthermore, it is argued that the proto-adapters formed double strands that brought amino acid pairs into proximity to facilitate their mutual redox reaction, structurally constraining the anticodon pairs that are assigned to these amino acid pairs. Significance tests which randomise the code are performed to study the extent of the variability of the energetic (ATP) yield. Random assignments can lead to a substantial yield of ATP and maintain enough variability, thus selection can act and refine the assignments into a proto-code that optimises the energetic yield. Monte Carlo simulations are performed to evaluate the establishment of these simple proto-codes, based on amino acid substitutions and codon swapping. In all cases, donor amino acids are assigned to anticodons composed of U+G, and have low redundancy (1-2 codons), whereas acceptor amino acids are assigned to the remaining codons. These bioenergetic and structural constraints allow for a metabolic role for amino acids before their co-option as catalyst cofactors.
Kalafut, Tim; Schuerman, Curt; Sutton, Joel; Faris, Tom; Armogida, Luigi; Bright, Jo-Anne; Buckleton, John; Taylor, Duncan
2018-03-31
Modern probabilistic genotyping (PG) software is capable of modeling stutter as part of the profile weighting statistic. This allows for peaks in stutter positions to be considered as allelic, stutter, or both. However, prior to running any sample through a PG calculator, the examiner must first interpret the sample, considering such things as artifacts and the number of contributors (NOC or N). Stutter can play a major role both during the assignment of the number of contributors and during the assessment of inclusion and exclusion. If stutter peaks are not filtered when they should be, an additional contributor may be assigned, causing N contributors to be assigned as N + 1. If peaks in the stutter position of a major contributor are filtered using a threshold that is too high, true alleles of minor contributors can be lost. Until now, the stutter filters in the software used to view electropherograms have been based on a locus-specific model. Combined stutter peaks occur when a peak could be the result of both back stutter (stutter one repeat unit shorter than the allele) and forward stutter (stutter one repeat unit larger than the allele). This can challenge existing filters. We present here a novel stutter filter model in the ArmedXpert™ software package that uses a linear model based on allele for back stutter and applies an additive filter for combined stutter. We term this the allele specific stutter model (AM). We compared AM with a traditional model based on locus specific stutter filters (termed LM). The improved stutter model has two benefits: instances of over-filtering were reduced by 78%, from 101 for the traditional model (LM) to 22 for the allele specific model (AM), when the two were scored against each other; and instances of under-filtering were reduced by 80%, from 85 (LM) to 17 (AM), when scored against ground-truth mixtures. Published by Elsevier B.V.
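A minimal sketch of the allele-specific idea, assuming a back-stutter expectation that is linear in allele repeat number and an additive threshold for combined stutter. The slope, intercept, and forward-stutter ratio below are invented placeholders, not ArmedXpert's calibrated values.

```python
# Hypothetical allele-specific stutter filter (AM) sketch: back-stutter ratio
# is linear in allele repeat number; the combined-stutter filter is additive.

def back_ratio(allele_repeats, slope=0.012, intercept=-0.05):
    """Expected back-stutter ratio as a linear function of allele length (invented fit)."""
    return max(slope * allele_repeats + intercept, 0.0)

def is_filtered(peak_height, parent_height, allele_repeats,
                forward_ratio=0.02, position="back"):
    """Filter a candidate stutter peak given its parent allele's height (RFU)."""
    if position == "back":
        threshold = back_ratio(allele_repeats) * parent_height
    elif position == "forward":
        threshold = forward_ratio * parent_height
    else:  # combined: back stutter of one allele plus forward stutter of another
        threshold = (back_ratio(allele_repeats) + forward_ratio) * parent_height
    return peak_height <= threshold

# A 60-RFU peak below a 1000-RFU, 20-repeat parent allele:
print(is_filtered(60, 1000, 20, position="back"))      # compares 60 RFU vs ~190 RFU
print(is_filtered(60, 1000, 20, position="combined"))  # additive threshold
```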
Wasser, S K; Brown, L; Mailand, C; Mondol, S; Clark, W; Laurie, C; Weir, B S
2015-07-03
Poaching of elephants is now occurring at rates that threaten African populations with extinction. Identifying the number and location of Africa's major poaching hotspots may assist efforts to end poaching and facilitate recovery of elephant populations. We genetically assign origin to 28 large ivory seizures (≥0.5 metric tons) made between 1996 and 2014, also testing assignment accuracy. Results suggest that the major poaching hotspots in Africa may be currently concentrated in as few as two areas. Increasing law enforcement in these two hotspots could help curtail future elephant losses across Africa and disrupt this organized transnational crime. Copyright © 2015, American Association for the Advancement of Science.
Probabilistic structural analysis methods of hot engine structures
NASA Technical Reports Server (NTRS)
Chamis, C. C.; Hopkins, D. A.
1989-01-01
Development of probabilistic structural analysis methods for hot engine structures at Lewis Research Center is presented. Three elements of the research program are: (1) composite load spectra methodology; (2) probabilistic structural analysis methodology; and (3) probabilistic structural analysis application. Recent progress includes: (1) quantification of the effects of uncertainties for several variables on high pressure fuel turbopump (HPFT) turbine blade temperature, pressure, and torque of the space shuttle main engine (SSME); (2) the evaluation of the cumulative distribution function for various structural response variables based on assumed uncertainties in primitive structural variables; and (3) evaluation of the failure probability. Collectively, the results demonstrate that the structural durability of hot engine structural components can be effectively evaluated in a formal probabilistic/reliability framework.
NASA Technical Reports Server (NTRS)
Singhal, Surendra N.
2003-01-01
The SAE G-11 RMSL Division and Probabilistic Methods Committee meeting during October 6-8 at the Best Western Sterling Inn, Sterling Heights (Detroit), Michigan is co-sponsored by US Army Tank-automotive & Armaments Command (TACOM). The meeting will provide an industry/government/academia forum to review RMSL technology; reliability and probabilistic technology; reliability-based design methods; software reliability; and maintainability standards. With over 100 members including members with national/international standing, the mission of the G-11's Probabilistic Methods Committee is to "enable/facilitate rapid deployment of probabilistic technology to enhance the competitiveness of our industries by better, faster, greener, smarter, affordable and reliable product development."
A Markov Chain Approach to Probabilistic Swarm Guidance
NASA Technical Reports Server (NTRS)
Acikmese, Behcet; Bayard, David S.
2012-01-01
This paper introduces a probabilistic guidance approach for the coordination of swarms of autonomous agents. The main idea is to drive the swarm to a prescribed density distribution in a prescribed region of the configuration space. In its simplest form, the probabilistic approach is completely decentralized and does not require communication or collaboration between agents. Agents make statistically independent probabilistic decisions, based solely on their own state, that ultimately guide the swarm to the desired density distribution in the configuration space. In addition to being completely decentralized, the probabilistic guidance approach has a novel autonomous self-repair property: once the desired swarm density distribution is attained, the agents automatically repair any damage to the distribution without collaborating and without any knowledge about the damage.
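A minimal sketch of such a guidance law, assuming a Metropolis-Hastings construction of the transition matrix, which is one standard way to realize a prescribed stationary density; the paper's own synthesis may differ. The bins, target density, and proposal below are invented.

```python
import numpy as np

# Probabilistic swarm guidance on a ring of bins: build a Markov matrix whose
# stationary distribution is the target density, then let agents transition
# independently using only their own current bin.
rng = np.random.default_rng(0)
n_bins = 8
target = np.array([1, 1, 2, 4, 4, 2, 1, 1], dtype=float)
target /= target.sum()

# Symmetric proposal: hop to either neighboring bin with probability 1/2 each.
P = np.zeros((n_bins, n_bins))
for i in range(n_bins):
    for j in ((i - 1) % n_bins, (i + 1) % n_bins):
        P[i, j] = 0.5 * min(1.0, target[j] / target[i])  # MH acceptance
    P[i, i] = 1.0 - P[i].sum()                            # rejected moves stay put

agents = rng.integers(0, n_bins, size=2000)               # arbitrary initial swarm
for _ in range(100):
    agents = np.array([rng.choice(n_bins, p=P[b]) for b in agents])

print(np.round(np.bincount(agents, minlength=n_bins) / agents.size, 3))
print(np.round(target, 3))   # achieved density vs prescribed density
```

Because the stationary distribution of P is the target itself, removing or displacing agents and letting the remainder keep transitioning restores the prescribed density, which is the self-repair property described above.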
Augmenting superpopulation capture-recapture models with population assignment data
Wen, Zhi; Pollock, Kenneth; Nichols, James; Waser, Peter
2011-01-01
Ecologists applying capture-recapture models to animal populations sometimes have access to additional information about individuals' populations of origin (e.g., information about genetics, stable isotopes, etc.). Tests that assign an individual's genotype to its most likely source population are increasingly used. Here we show how to augment a superpopulation capture-recapture model with such information. We consider a single superpopulation model without age structure, and split each entry probability into separate components due to births in situ and immigration. We show that it is possible to estimate these two probabilities separately. We first consider the case of perfect information about population of origin, where we can distinguish individuals born in situ from immigrants with certainty. Then we consider the more realistic case of imperfect information, where we use genetic or other information to assign probabilities to each individual's origin as in situ or outside the population. We use a resampling approach to impute the true population of origin from imperfect assignment information. The integration of data on population of origin with capture-recapture data allows us to determine the contributions of immigration and in situ reproduction to the growth of the population, an issue of importance to ecologists. We illustrate our new models with capture-recapture and genetic assignment data from a population of banner-tailed kangaroo rats Dipodomys spectabilis in Arizona.
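The resampling step can be sketched directly, assuming each entrant carries a genetic-assignment probability of in-situ origin; the probabilities and iteration count below are invented stand-ins.

```python
import numpy as np

# Resampling under imperfect origin assignment: each new entrant has a
# probability p_i of being born in situ; repeatedly impute origins and
# average the resulting split of entry into births vs. immigration.
rng = np.random.default_rng(42)
p_in_situ = rng.beta(2, 2, size=120)   # stand-in for per-individual assignment probs

draws = []
for _ in range(1000):                  # resampling iterations
    origins = rng.random(p_in_situ.size) < p_in_situ   # True = born in situ
    draws.append(origins.mean())       # fraction of entries from local recruitment

draws = np.array(draws)
print(f"in-situ fraction: {draws.mean():.3f} "
      f"(resampling 95% interval {np.percentile(draws, 2.5):.3f}-"
      f"{np.percentile(draws, 97.5):.3f})")
```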
Learning Probabilistic Logic Models from Probabilistic Examples
Chen, Jianzhong; Muggleton, Stephen; Santos, José
2009-01-01
We revisit an application developed originally using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data was derived from studies of the effects of toxins on rats using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches - abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM) to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible worlds semantics to SLPs through abduction. Instead of learning logic models from non-probabilistic examples as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples, and the PILP models learned from probabilistic examples lead to a significant decrease in error accompanied by improved insight from the learned results compared with the PILP models learned from non-probabilistic examples. PMID:19888348
An alternative to Rasch analysis using triadic comparisons and multi-dimensional scaling
NASA Astrophysics Data System (ADS)
Bradley, C.; Massof, R. W.
2016-11-01
Rasch analysis is a principled approach for estimating the magnitude of some shared property of a set of items when a group of people assign ordinal ratings to them. In the general case, Rasch analysis not only estimates person and item measures on the same invariant scale, but also estimates the average thresholds used by the population to define rating categories. However, Rasch analysis fails when there is insufficient variance in the observed responses because it assumes a probabilistic relationship between person measures, item measures and the rating assigned by a person to an item. When only a single person is rating all items, there may be cases where the person assigns the same rating to many items no matter how many times he rates them. We introduce an alternative to Rasch analysis for precisely these situations. Our approach leverages multi-dimensional scaling (MDS) and requires only rank orderings of items and rank orderings of pairs of distances between items to work. Simulations show one variant of this approach - triadic comparisons with non-metric MDS - provides highly accurate estimates of item measures in realistic situations.
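A sketch of the pipeline under stated assumptions: a simulated rater ranks the three pairwise distances in every triad, the ranks are accumulated into a dissimilarity matrix (the accumulation rule here is an invented simplification), and non-metric MDS recovers one-dimensional item measures.

```python
import numpy as np
from itertools import combinations
from sklearn.manifold import MDS

# Triads -> ordinal dissimilarities -> non-metric MDS. The latent item
# measures and the rank-accumulation rule are invented for illustration.
rng = np.random.default_rng(7)
true = np.sort(rng.uniform(0, 10, size=9))        # latent item measures
n = true.size
D = np.zeros((n, n))

for i, j, k in combinations(range(n), 3):
    pairs = [(i, j), (i, k), (j, k)]
    gaps = [abs(true[a] - true[b]) for a, b in pairs]
    for rank, idx in enumerate(np.argsort(gaps)):  # rank 0 = most similar pair
        a, b = pairs[idx]
        D[a, b] += rank                            # accumulate ordinal evidence
        D[b, a] += rank

mds = MDS(n_components=1, metric=False, dissimilarity="precomputed",
          random_state=0, n_init=8)
est = mds.fit_transform(D).ravel()
# Sign of an MDS solution is arbitrary, so report absolute correlation:
print(round(abs(np.corrcoef(est, true)[0, 1]), 3))
```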
Soft Mixer Assignment in a Hierarchical Generative Model of Natural Scene Statistics
Schwartz, Odelia; Sejnowski, Terrence J.; Dayan, Peter
2010-01-01
Gaussian scale mixture models offer a top-down description of signal generation that captures key bottom-up statistical characteristics of filter responses to images. However, the pattern of dependence among the filters for this class of models is prespecified. We propose a novel extension to the gaussian scale mixture model that learns the pattern of dependence from observed inputs and thereby induces a hierarchical representation of these inputs. Specifically, we propose that inputs are generated by gaussian variables (modeling local filter structure), multiplied by a mixer variable that is assigned probabilistically to each input from a set of possible mixers. We demonstrate inference of both components of the generative model, for synthesized data and for different classes of natural images, such as a generic ensemble and faces. For natural images, the mixer variable assignments show invariances resembling those of complex cells in visual cortex; the statistics of the gaussian components of the model are in accord with the outputs of divisive normalization models. We also show how our model helps interrelate a wide range of models of image statistics and cortical processing. PMID:16999575
Groth, Katrina M.; Smith, Curtis L.; Swiler, Laura P.
2014-04-05
In the past several years, several international agencies have begun to collect data on human performance in nuclear power plant simulators [1]. These data provide a valuable opportunity to improve human reliability analysis (HRA), but the improvements will not be realized without implementation of Bayesian methods. Bayesian methods are widely used to incorporate sparse data into models in many parts of probabilistic risk assessment (PRA), but they have not been adopted by the HRA community. In this article, we provide a Bayesian methodology to formally use simulator data to refine the human error probabilities (HEPs) assigned by existing HRA methods. We demonstrate the methodology with a case study, wherein we use simulator data from the Halden Reactor Project to update the probability assignments from the SPAR-H method. The case study demonstrates the ability to use performance data, even sparse data, to improve existing HRA methods. Furthermore, this paper also serves as a demonstration of the value of Bayesian methods to improve the technical basis of HRA.
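The refinement step admits a simple conjugate sketch, assuming the existing method's HEP is encoded as the mean of a Beta prior; the prior strength and simulator counts below are invented, not Halden data.

```python
from scipy import stats

# Conjugate beta-binomial sketch of HEP refinement: the prior mean is the HEP
# assigned by an existing HRA method; simulator error counts update it.
hep_prior_mean = 0.01      # e.g., an HEP assigned by an existing method
prior_strength = 50        # pseudo-observations; confidence in that method
a0 = hep_prior_mean * prior_strength
b0 = (1 - hep_prior_mean) * prior_strength

errors, trials = 3, 120    # hypothetical simulator campaign results
a1, b1 = a0 + errors, b0 + (trials - errors)

posterior = stats.beta(a1, b1)
print(f"posterior mean HEP: {posterior.mean():.4f}")
print(f"90% credible interval: {posterior.ppf(0.05):.4f}-{posterior.ppf(0.95):.4f}")
```

Even a modest simulator campaign moves the posterior noticeably when the data conflict with the assigned HEP, which is exactly the "sparse data still helps" point the abstract makes.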
To help address the Food Quality Protection Act of 1996, a physically-based, two-stage Monte Carlo probabilistic model has been developed to quantify and analyze aggregate exposure and dose to pesticides via multiple routes and pathways. To illustrate model capabilities and ide...
A ligand predication tool based on modeling and reasoning with imprecise probabilistic knowledge.
Liu, Weiru; Yue, Anbu; Timson, David J
2010-04-01
Ligand prediction has been driven by a fundamental desire to understand more about how biomolecules recognize their ligands and by the commercial imperative to develop new drugs. Most of the current available software systems are very complex and time-consuming to use. Therefore, developing simple and efficient tools to perform initial screening of interesting compounds is an appealing idea. In this paper, we introduce our tool for very rapid screening for likely ligands (either substrates or inhibitors) based on reasoning with imprecise probabilistic knowledge elicited from past experiments. Probabilistic knowledge is input to the system via a user-friendly interface showing a base compound structure. A prediction of whether a particular compound is a substrate is queried against the acquired probabilistic knowledge base and a probability is returned as an indication of the prediction. This tool will be particularly useful in situations where a number of similar compounds have been screened experimentally, but information is not available for all possible members of that group of compounds. We use two case studies to demonstrate how to use the tool. 2009 Elsevier Ireland Ltd. All rights reserved.
Assigning breed origin to alleles in crossbred animals.
Vandenplas, Jérémie; Calus, Mario P L; Sevillano, Claudia A; Windig, Jack J; Bastiaansen, John W M
2016-08-22
For some species, animal production systems are based on the use of crossbreeding to take advantage of the increased performance of crossbred compared to purebred animals. Effects of single nucleotide polymorphisms (SNPs) may differ between purebred and crossbred animals for several reasons: (1) differences in linkage disequilibrium between SNP alleles and a quantitative trait locus; (2) differences in genetic backgrounds (e.g., dominance and epistatic interactions); and (3) differences in environmental conditions, which result in genotype-by-environment interactions. Thus, SNP effects may be breed-specific, which has led to the development of genomic evaluations for crossbred performance that take such effects into account. However, to estimate breed-specific effects, it is necessary to know breed origin of alleles in crossbred animals. Therefore, our aim was to develop an approach for assigning breed origin to alleles of crossbred animals (termed BOA) without information on pedigree and to study its accuracy by considering various factors, including distance between breeds. The BOA approach consists of: (1) phasing genotypes of purebred and crossbred animals; (2) assigning breed origin to phased haplotypes; and (3) assigning breed origin to alleles of crossbred animals based on a library of assigned haplotypes, the breed composition of crossbred animals, and their SNP genotypes. The accuracy of allele assignments was determined for simulated datasets that include crosses between closely-related, distantly-related and unrelated breeds. Across these scenarios, the percentage of alleles of a crossbred animal that were correctly assigned to their breed origin was greater than 90 %, and increased with increasing distance between breeds, while the percentage of incorrectly assigned alleles was always less than 2 %. For the remaining alleles, i.e. 0 to 10 % of all alleles of a crossbred animal, breed origin could not be assigned. The BOA approach accurately assigns breed origin to alleles of crossbred animals, even if their pedigree is not recorded.
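Step 3 of BOA can be sketched as a haplotype-library lookup; the window length, library contents, and crossbred haplotypes below are invented.

```python
# BOA lookup sketch: breed origin is assigned to each phased haplotype of a
# crossbred animal by matching it against a library of haplotypes observed in
# the purebred parental populations. All entries are invented 5-SNP windows.

library = {
    "AACGT": "BreedA", "AACGA": "BreedA",
    "GTCGT": "BreedB", "GTCAA": "BreedB",
    # a haplotype seen in both breeds cannot be assigned unambiguously:
    "AAAAA": "shared",
}

def assign_origin(haplotype):
    """Return breed of origin, or None when the haplotype is absent or shared."""
    breed = library.get(haplotype)
    return breed if breed in ("BreedA", "BreedB") else None

crossbred_phase = ["AACGT", "GTCAA", "AAAAA", "CCCCC"]  # phased windows
assignments = {h: assign_origin(h) for h in crossbred_phase}
print(assignments)   # two assigned; one shared and one unseen stay unassigned
```

The unassigned remainder in this sketch corresponds to the 0-10% of alleles reported above for which breed origin could not be determined.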
Hanson, Stanley L.; Perkins, David M.
1995-01-01
The construction of a probabilistic ground-motion hazard map for a region follows a sequence of analyses beginning with the selection of an earthquake catalog and ending with the mapping of calculated probabilistic ground-motion values (Hanson and others, 1992). An integral part of this process is the creation of sources used for the calculation of earthquake recurrence rates and ground motions. These sources consist of areas and lines that are representative of geologic or tectonic features and faults. After the design of the sources, it is necessary to arrange the coordinate points in a particular order compatible with the input format for the SEISRISK-III program (Bender and Perkins, 1987). Source zones are usually modeled as a point-rupture source. Where applicable, linear rupture sources are modeled with articulated lines, representing known faults, or a field of parallel lines, representing a generalized distribution of hypothetical faults. Based on the distribution of earthquakes throughout the individual source zones (or a collection of several sources), earthquake recurrence rates are computed for each of the sources, and a minimum and maximum magnitude is assigned. Over a period of time from 1978 to 1980 several conferences were held by the USGS to solicit information on regions of the United States for the purpose of creating source zones for computation of probabilistic ground motions (Thenhaus, 1983). As a result of these regional meetings and previous work in the Pacific Northwest, (Perkins and others, 1980), California continental shelf, (Thenhaus and others, 1980), and the Eastern outer continental shelf, (Perkins and others, 1979) a consensus set of source zones was agreed upon and subsequently used to produce a national ground motion hazard map for the United States (Algermissen and others, 1982). In this report and on the accompanying disk we provide a complete list of source areas and line sources as used for the 1982 and later 1990 seismic hazard maps for the conterminous U.S. and Alaska. These source zones are represented in the input form required for the hazard program SEISRISK-III, and they include the attenuation table and several other input parameter lines normally found at the beginning of an input data set for SEISRISK-III.
Kropatsch, Regina; Melis, Claudia; Stronen, Astrid V; Jensen, Henrik; Epplen, Joerg T
2015-01-01
The Norwegian Lundehund breed of dog has undergone a severe loss of genetic diversity as a result of inbreeding and epizootics of canine distemper. As a consequence, the breed is extremely homogeneous and accurate sex identification is not always possible by standard screening of X-chromosomal loci. To improve our genetic understanding of the breed we genotyped 17 individuals using a genome-wide array of 170 000 single nucleotide polymorphisms (SNPs). Standard analyses based on expected homozygosity of X-chromosomal loci failed in assigning individuals to the correct sex, as determined initially by physical examination and confirmed with the Y-chromosomal marker, amelogenin. This demonstrates that identification of sex using standard SNP assays can be erroneous in highly inbred individuals. © The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Libiger, Ondrej; Schork, Nicholas J.
2013-01-01
The determination of the ancestry and genetic backgrounds of the subjects in genetic and general epidemiology studies is a crucial component in the analysis of relevant outcomes or associations. Although there are many methods for differentiating ancestral subgroups among individuals based on genetic markers only a few of these methods provide actual estimates of the fraction of an individual’s genome that is likely to be associated with different ancestral populations. We propose a method for assigning ancestry that works in stages to refine estimates of ancestral population contributions to individual genomes. The method leverages genotype data in the public domain obtained from individuals with known ancestries. Although we showcase the method in the assessment of ancestral genome proportions leveraging largely continental populations, the strategy can be used for assessing within-continent or more subtle ancestral origins with the appropriate data. PMID:23335941
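One way to sketch the proportion-estimation idea: regress an individual's genotype dosages on reference-panel allele frequencies under a non-negativity constraint. This is a simplification of the staged procedure described above; the panels and the admixed genotype are simulated.

```python
import numpy as np
from scipy.optimize import nnls

# Ancestry fractions by constrained regression: E[genotype dosage] = 2 F q,
# where F holds reference-panel allele frequencies and q the ancestry vector.
rng = np.random.default_rng(13)
n_snp, k = 20_000, 3
F = rng.uniform(0.05, 0.95, size=(n_snp, k))      # reference allele frequencies
true_q = np.array([0.6, 0.3, 0.1])                # true ancestry proportions
geno = rng.binomial(2, F @ true_q)                # admixed genotype dosages

q_hat, _ = nnls(2 * F, geno.astype(float))        # non-negative least squares
q_hat /= q_hat.sum()                              # normalize to proportions
print("estimated ancestry fractions:", np.round(q_hat, 3))
```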
2014-01-01
Background Pedigree reconstruction using genetic analysis provides a useful means to estimate fundamental population biology parameters relating to population demography, trait heritability and individual fitness when combined with other sources of data. However, there remain limitations to pedigree reconstruction in wild populations, particularly in systems where parent-offspring relationships cannot be directly observed, there is incomplete sampling of individuals, or molecular parentage inference relies on low quality DNA from archived material. While much can still be inferred from incomplete or sparse pedigrees, it is crucial to evaluate the quality and power of available genetic information a priori to testing specific biological hypotheses. Here, we used microsatellite markers to reconstruct a multi-generation pedigree of wild Atlantic salmon (Salmo salar L.) using archived scale samples collected with a total trapping system within a river over a 10 year period. Using a simulation-based approach, we determined the optimal microsatellite marker number for accurate parentage assignment, and evaluated the power of the resulting partial pedigree to investigate important evolutionary and quantitative genetic characteristics of salmon in the system. Results We show that at least 20 microsatellites (ave. 12 alleles/locus) are required to maximise parentage assignment and to improve the power to estimate reproductive success and heritability in this study system. We also show that 1.5 fold differences can be detected between groups simulated to have differing reproductive success, and that it is possible to detect moderate heritability values for continuous traits (h2 ~ 0.40) with more than 80% power when using 28 moderately to highly polymorphic markers. Conclusion The methodologies and work flow described provide a robust approach for evaluating archived samples for pedigree-based research, even where only a proportion of the total population is sampled. The results demonstrate the feasibility of pedigree-based studies to address challenging ecological and evolutionary questions in free-living populations, where genealogies can be traced only using molecular tools, and that significant increases in pedigree assignment power can be achieved by using higher numbers of markers. PMID:24684698
Peakall, Rod; Smouse, Peter E.
2012-01-01
Summary: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G′ST, G′′ST, Jost’s Dest and F′ST through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. Availability and implementation: GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, and supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. Contact: rod.peakall@anu.edu.au PMID:22820204
Genetic structure of cougar populations across the Wyoming basin: Metapopulation or megapopulation
Anderson, C.R.; Lindzey, F.G.; McDonald, D.B.
2004-01-01
We examined the genetic structure of 5 Wyoming cougar (Puma concolor) populations surrounding the Wyoming Basin, as well as a population from southwestern Colorado. When using 9 microsatellite DNA loci, observed heterozygosity was similar among populations (HO = 0.49-0.59) and intermediate to that of other large carnivores. Estimates of genetic structure (FST = 0.028, RST = 0.029) and number of migrants per generation (Nm) suggested high gene flow. Nm was lowest between distant populations and highest among adjacent populations. Examination of these data, plus Mantel test results of genetic versus geographic distance (P ≤ 0.01), suggested both isolation by distance and an effect of habitat matrix. Bayesian assignment to population based on individual genotypes showed that cougars in this region were best described as a single panmictic population. Total effective population size for cougars in this region ranged from 1,797 to 4,532 depending on mutation model and analytical method used. Based on measures of gene flow, extinction risk in the near future appears low. We found no support for the existence of metapopulation structure among cougars in this region.
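For orientation, the reported F_ST translates into an approximate number of migrants per generation under Wright's island model, Nm ≈ (1 − F_ST)/(4 F_ST); the equilibrium assumptions behind this relation rarely hold exactly, so it is a rough gauge rather than the study's estimator.

```python
# Wright's island-model relation applied to the study's structure estimates.
def wright_nm(fst):
    return (1.0 - fst) / (4.0 * fst)

for label, fst in (("F_ST", 0.028), ("R_ST", 0.029)):
    print(f"{label} = {fst}: Nm ≈ {wright_nm(fst):.1f} migrants per generation")
```

Values near 8-9 migrants per generation are consistent with the "high gene flow" interpretation in the abstract.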
Sentence-Based Attentional Mechanisms in Word Learning: Evidence from a Computational Model
Alishahi, Afra; Fazly, Afsaneh; Koehne, Judith; Crocker, Matthew W.
2012-01-01
When looking for the referents of novel nouns, adults and young children are sensitive to cross-situational statistics (Yu and Smith, 2007; Smith and Yu, 2008). In addition, the linguistic context that a word appears in has been shown to act as a powerful attention mechanism for guiding sentence processing and word learning (Landau and Gleitman, 1985; Altmann and Kamide, 1999; Kako and Trueswell, 2000). Koehne and Crocker (2010, 2011) investigate the interaction between cross-situational evidence and guidance from the sentential context in an adult language learning scenario. Their studies reveal that these learning mechanisms interact in a complex manner: they can be used in a complementary way when context helps reduce referential uncertainty; they influence word learning about equally strongly when cross-situational and contextual evidence are in conflict; and contextual cues block aspects of cross-situational learning when both mechanisms are independently applicable. To address this complex pattern of findings, we present a probabilistic computational model of word learning which extends a previous cross-situational model (Fazly et al., 2010) with an attention mechanism based on sentential cues. Our model uses a framework that seamlessly combines the two sources of evidence in order to study their emerging pattern of interaction during the process of word learning. Simulations of the experiments of (Koehne and Crocker, 2010, 2011) reveal an overall pattern of results that are in line with their findings. Importantly, we demonstrate that our model does not need to explicitly assign priority to either source of evidence in order to produce these results: learning patterns emerge as a result of a probabilistic interaction between the two clue types. Moreover, using a computational model allows us to examine the developmental trajectory of the differential roles of cross-situational and sentential cues in word learning. PMID:22783211
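The no-explicit-priority point can be sketched with a product-of-evidence toy example; the referent scores below are invented, and the model's actual incremental update is considerably richer.

```python
import numpy as np

# Combining two evidence sources without assigning either priority: the
# posterior over candidate referents is proportional to the product of a
# cross-situational score and a sentential-context score (toy numbers).
referents = ["ball", "dog", "cup"]
cross_situational = np.array([0.5, 0.3, 0.2])   # co-occurrence-based evidence
sentential = np.array([0.2, 0.7, 0.1])          # e.g., the verb restricts referents

posterior = cross_situational * sentential
posterior /= posterior.sum()
for r, p in zip(referents, posterior):
    print(f"{r}: {p:.3f}")   # the product lets whichever cue is decisive dominate
```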
2014-01-01
Background To undertake an economic evaluation of rivaroxaban relative to the standard of care for stroke prevention in patients with non-valvular atrial fibrillation (AF) in Greece. Methods An existing Markov model designed to reflect the natural progression of AF patients through different health states, in the course of three-month cycles, was adapted to the Greek setting. The analysis was undertaken from a payer perspective. Baseline event rates and efficacy data were obtained from the ROCKET-AF trial for rivaroxaban and vitamin-K-antagonists (VKAs). Utility values for events were based on literature. A treatment-related disutility of 0.05 was applied to the VKA arm. Costs assigned to each health state reflect the year 2013. An incremental cost-effectiveness ratio (ICER) was calculated where the outcomes were quality-adjusted life-years (QALYs) and life-years gained. Probabilistic analysis was undertaken to deal with uncertainty. The horizon of analysis was the patient lifetime, and both costs and outcomes were discounted at 3.5%. Results Based on safety-on-treatment data, rivaroxaban was associated with a 0.22 increment in QALYs compared to VKA. The average total lifetime cost of rivaroxaban-treated patients was €239 lower compared to VKA. Rivaroxaban was associated with additional drug acquisition cost (€4,033) and reduced monitoring cost (-€3,929). Therefore, rivaroxaban was a dominant alternative over VKA. Probabilistic analysis revealed that there is a 100% probability of rivaroxaban being cost-effective versus VKA at a willingness to pay threshold of €30,000/QALY gained. Conclusion Rivaroxaban may represent for payers a dominant option for the prevention of thromboembolic events in moderate to high risk AF patients in Greece. PMID:24512351
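The mechanics of such an evaluation reduce to a discounted Markov cohort trace; the sketch below uses invented transition probabilities, costs, and utilities rather than the study's ROCKET-AF-derived inputs.

```python
import numpy as np

# Quarterly Markov cohort trace: discounted costs and QALYs per arm, then an
# incremental comparison. All inputs are invented stand-ins.
states = ["well", "stroke", "dead"]
P = {  # 3-month transition matrices per arm (rows: from-state)
    "rivaroxaban": np.array([[0.985, 0.005, 0.010],
                             [0.000, 0.970, 0.030],
                             [0.000, 0.000, 1.000]]),
    "vka":         np.array([[0.982, 0.008, 0.010],
                             [0.000, 0.970, 0.030],
                             [0.000, 0.000, 1.000]]),
}
cost = {"rivaroxaban": [250.0, 900.0, 0.0],   # per-cycle cost by state (EUR)
        "vka":         [120.0, 900.0, 0.0]}   # cheaper drug, monitoring included
util = {"rivaroxaban": [0.80, 0.55, 0.0],
        "vka":         [0.75, 0.55, 0.0]}     # 0.05 treatment disutility on VKA

def run(arm, cycles=160, annual_disc=0.035):
    x = np.array([1.0, 0.0, 0.0])             # cohort starts in 'well'
    d = (1 + annual_disc) ** -0.25            # per-quarter discount factor
    c = q = 0.0
    for t in range(cycles):                   # 160 quarters = 40 years
        c += d**t * (x @ cost[arm])
        q += d**t * (x @ util[arm]) * 0.25    # utility accrues per quarter-year
        x = x @ P[arm]
    return c, q

(c_r, q_r), (c_v, q_v) = run("rivaroxaban"), run("vka")
print(f"incremental cost: {c_r - c_v:+.0f} EUR, incremental QALYs: {q_r - q_v:+.3f}")
```

When incremental cost is negative and incremental QALYs are positive, the comparator is dominated, which is the form of the study's headline result.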
Fu, Yong-Bi
2014-01-01
Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. Although some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data, with up to 90% of observations missing. Here we performed an empirical assessment of accuracy in genetic diversity analysis of highly incomplete single nucleotide polymorphism genotypes with imputations. Three large single-nucleotide polymorphism genotype data sets for corn, wheat, and rice were acquired, and missing data with up to 90% of missing observations were randomly generated and then imputed for missing genotypes with three map-independent imputation methods. Estimating heterozygosity and inbreeding coefficient from original, missing, and imputed data revealed variable patterns of bias from assessed levels of missingness and genotype imputation, but the estimation biases were smaller for missing data without genotype imputation. The estimates of genetic differentiation were rather robust up to 90% of missing observations but became substantially biased when missing genotypes were imputed. The estimates of topology accuracy for four representative samples of interested groups generally were reduced with increased levels of missing genotypes. Probabilistic principal component analysis based imputation performed better in terms of topology accuracy than those analyses of missing data without genotype imputation. These findings are not only significant for understanding the reliability of the genetic diversity analysis with respect to large missing data and genotype imputation but also are instructive for performing a proper genetic diversity analysis of highly incomplete GBS or other genotype data. PMID:24626289
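The masking experiment behind these comparisons can be sketched directly; the simulated genotypes below are stand-ins for the corn, wheat, and rice panels.

```python
import numpy as np

# Mask genotype calls at increasing rates and compare heterozygosity estimated
# from the remaining (non-imputed) calls against the complete-data value.
rng = np.random.default_rng(3)
n_ind, n_snp = 200, 5000
p = rng.uniform(0.05, 0.5, size=n_snp)                 # per-SNP allele frequencies
geno = rng.binomial(2, p, size=(n_ind, n_snp)).astype(float)
true_het = (geno == 1).mean()

for miss in (0.3, 0.6, 0.9):
    g = geno.copy()
    g[rng.random(g.shape) < miss] = np.nan             # mask observations
    obs_het = (g == 1).sum() / np.isfinite(g).sum()    # ignore missing calls
    print(f"missing {miss:.0%}: het = {obs_het:.4f} (complete data {true_het:.4f})")
```

With random missingness the non-imputed estimates stay close to the complete-data values, consistent with the finding that biases were smaller without genotype imputation.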
Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E
2014-01-01
Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338
A Unified Probabilistic Framework for Dose-Response Assessment of Human Health Effects.
Chiu, Weihsueh A; Slob, Wout
2015-12-01
When chemical health hazards have been identified, probabilistic dose-response assessment ("hazard characterization") quantifies uncertainty and/or variability in toxicity as a function of human exposure. Existing probabilistic approaches differ for different types of endpoints or modes-of-action, lacking a unifying framework. We developed a unified framework for probabilistic dose-response assessment. We established a framework based on four principles: a) individual and population dose responses are distinct; b) dose-response relationships for all (including quantal) endpoints can be recast as relating to an underlying continuous measure of response at the individual level; c) for effects relevant to humans, "effect metrics" can be specified to define "toxicologically equivalent" sizes for this underlying individual response; and d) dose-response assessment requires making adjustments and accounting for uncertainty and variability. We then derived a step-by-step probabilistic approach for dose-response assessment of animal toxicology data similar to how nonprobabilistic reference doses are derived, illustrating the approach with example non-cancer and cancer datasets. Probabilistically derived exposure limits are based on estimating a "target human dose" (HDMI), which requires risk management-informed choices for the magnitude (M) of individual effect being protected against, the remaining incidence (I) of individuals with effects ≥ M in the population, and the percent confidence. In the example datasets, probabilistically derived 90% confidence intervals for HDMI values span a 40- to 60-fold range, where I = 1% of the population experiences ≥ M = 1%-10% effect sizes. Although some implementation challenges remain, this unified probabilistic framework can provide substantially more complete and transparent characterization of chemical hazards and support better-informed risk management decisions.
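A sketch of the probabilistic derivation under stated assumptions: a point of departure and two lognormal adjustment factors combined by Monte Carlo, with the exposure limit read from a lower percentile. All distribution parameters are invented, not the paper's calibrated values.

```python
import numpy as np

# Monte Carlo combination of uncertain dose-response quantities into a
# "target human dose" distribution, from which confidence bounds are read.
rng = np.random.default_rng(11)
n = 100_000
pod = rng.lognormal(np.log(10.0), 0.3, size=n)      # point of departure, mg/kg-d
interspecies = rng.lognormal(np.log(3.0), 0.4, n)   # animal-to-human adjustment
human_var = rng.lognormal(np.log(4.0), 0.5, n)      # median-to-sensitive human

hd = pod / (interspecies * human_var)               # candidate HD_MI per draw
lo, med, hi = np.percentile(hd, [5, 50, 95])
print(f"HD_MI: median {med:.2f}, 90% CI {lo:.2f}-{hi:.2f} mg/kg-d "
      f"({hi / lo:.0f}-fold range)")
```

The multi-fold width of the interval in this toy run mirrors the 40- to 60-fold ranges reported for the example datasets.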
Peer Influence, Genetic Propensity, and Binge Drinking: A Natural Experiment and a Replication.
Guo, Guang; Li, Yi; Wang, Hongyu; Cai, Tianji; Duncan, Greg J
2015-11-01
The authors draw data from the College Roommate Study (ROOM) and the National Longitudinal Study of Adolescent Health to investigate gene-environment interaction effects on youth binge drinking. In ROOM, the environmental influence was measured by the precollege drinking behavior of randomly assigned roommates. Random assignment safeguards against friend selection and removes the threat of gene-environment correlation that makes gene-environment interaction effects difficult to interpret. On average, being randomly assigned a drinking peer as opposed to a nondrinking peer increased college binge drinking by 0.5-1.0 episodes per month, or 20%-40% of the average amount of binge drinking. However, this peer influence was found only among youths with a medium level of genetic propensity for alcohol use; those with either a low or high genetic propensity were not influenced by peer drinking. A replication of the findings is provided in data drawn from Add Health. The study shows that gene-environment interaction analysis can uncover social-contextual effects likely to be missed by traditional sociological approaches.
Reliability and risk assessment of structures
NASA Technical Reports Server (NTRS)
Chamis, C. C.
1991-01-01
Development of reliability and risk assessment of structural components and structures is a major activity at Lewis Research Center. It consists of five program elements: (1) probabilistic loads; (2) probabilistic finite element analysis; (3) probabilistic material behavior; (4) assessment of reliability and risk; and (5) probabilistic structural performance evaluation. Recent progress includes: (1) the evaluation of the various uncertainties in terms of cumulative distribution functions for various structural response variables based on known or assumed uncertainties in primitive structural variables; (2) evaluation of the failure probability; (3) reliability and risk-cost assessment; and (4) an outline of an emerging approach for eventual certification of man-rated structures by computational methods. Collectively, the results demonstrate that the structural durability/reliability of man-rated structural components and structures can be effectively evaluated by using formal probabilistic methods.
Supermultiplicative Speedups of Probabilistic Model-Building Genetic Algorithms
2009-02-01
physicists as well as practitioners in evolutionary computation. The project was later extended to the one-dimensional SK spin glass with power-law couplings... The remainder of the record lists project participants, including Yuji Sato (Hosei University, Japan), Shunsuke Saruwatari (Tokyo University, Japan), and Jian-Hung Chen (Feng Chia University, Taiwan), and cites: A. Tiwari, J. Knowles, E. Avineri, K. Dahal, and R. Roy (Eds.), Applications of Soft Computing: Recent Trends. Berlin: Springer (2006).
Nagarajan, Mahesh B; Raman, Steven S; Lo, Pechin; Lin, Wei-Chan; Khoshnoodi, Pooria; Sayre, James W; Ramakrishna, Bharath; Ahuja, Preeti; Huang, Jiaoti; Margolis, Daniel J A; Lu, David S K; Reiter, Robert E; Goldin, Jonathan G; Brown, Matthew S; Enzmann, Dieter R
2018-02-19
We present a method for generating a T2 MR-based probabilistic model of tumor occurrence in the prostate to guide the selection of anatomical sites for targeted biopsies and serve as a diagnostic tool to aid radiological evaluation of prostate cancer. In our study, the prostate and any radiological findings within were segmented retrospectively on 3D T2-weighted MR images of 266 subjects who underwent radical prostatectomy. Subsequent histopathological analysis determined both the ground truth and the Gleason grade of the tumors. A randomly chosen subset of 19 subjects was used to generate a multi-subject-derived prostate template. Subsequently, a cascading registration algorithm involving both affine and non-rigid B-spline transforms was used to register the prostate of every subject to the template. Corresponding transformation of radiological findings yielded a population-based probabilistic model of tumor occurrence. The quality of our probabilistic model building approach was statistically evaluated by measuring the proportion of correct placements of tumors in the prostate template, i.e., the number of tumors that maintained their anatomical location within the prostate after their transformation into the prostate template space. Probabilistic model built with tumors deemed clinically significant demonstrated a heterogeneous distribution of tumors, with higher likelihood of tumor occurrence at the mid-gland anterior transition zone and the base-to-mid-gland posterior peripheral zones. Of 250 MR lesions analyzed, 248 maintained their original anatomical location with respect to the prostate zones after transformation to the prostate. We present a robust method for generating a probabilistic model of tumor occurrence in the prostate that could aid clinical decision making, such as selection of anatomical sites for MR-guided prostate biopsies.
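After registration, the model itself is simple: the voxelwise fraction of subjects whose transformed segmentation covers each template voxel. A sketch with random stand-in masks in place of the registered lesion segmentations:

```python
import numpy as np

# Population tumor-occurrence map: accumulate registered binary lesion masks
# in template space and divide by the number of subjects.
rng = np.random.default_rng(5)
n_subjects, shape = 266, (32, 32, 16)        # coarse toy template grid

counts = np.zeros(shape)
for _ in range(n_subjects):
    mask = np.zeros(shape, dtype=bool)
    # invented "registered lesion": a random box inside the template
    x, y, z = rng.integers(0, 24), rng.integers(0, 24), rng.integers(0, 10)
    mask[x:x + 8, y:y + 8, z:z + 4] = True
    counts += mask

prob_map = counts / n_subjects               # P(tumor) per template voxel
print("peak tumor probability:", prob_map.max().round(3))
print("voxels with P > 0.2:", int((prob_map > 0.2).sum()))
```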
Stewart, Kelly R; James, Michael C; Roden, Suzanne; Dutton, Peter H
2013-07-01
Investigating migratory connectivity between breeding and foraging areas is critical to effective management and conservation of highly mobile marine taxa, particularly threatened, endangered, or economically important species that cross through regional, national and international boundaries. The leatherback turtle (Dermochelys coriacea, Vandelli 1761) is one such transboundary species that spends time at breeding areas at low latitudes in the northwest Atlantic during spring and summer. From there, they migrate widely throughout the North Atlantic, but many show fidelity to one region off eastern Canada, where critical foraging habitat has been proposed. Our goal was to identify nesting beach origins for turtles foraging here. Using genetics, we identified natal beaches for 288 turtles that were live-captured off the coast of Nova Scotia, Canada. Turtles were sampled (skin or blood) and genotyped using 17 polymorphic microsatellite markers. Results from three assignment testing programs (ONCOR, GeneClass2 and Structure) were compared. Our nesting population reference data set included 1417 individuals from nine Atlantic nesting assemblages. A supplementary data set for 83 foraging turtles traced to nesting beaches using flipper tags and/or PIT tags (n = 72), or inferred from satellite telemetry (n = 11), enabled ground-truthing of the assignments. We first assigned turtles using only genetic information and then used the supplementary recapture information to verify assignments. ONCOR performed best, assigning 64 of the 83 recaptured turtles to natal beaches (77·1%). Turtles assigned to Trinidad (164), French Guiana (72), Costa Rica (44), St. Croix (7), and Florida (1) reflect the relative size of those nesting populations, although none of the turtles were assigned to four other potential source nesting assemblages. Our results demonstrate the utility of genetic approaches for determining source populations of foraging marine animals and include the first identification of natal rookeries of male leatherbacks, identified through satellite telemetry and verified with genetics. This work highlights the importance of long-term monitoring and tagging programmes in nesting and high-use foraging areas. Moreover, it provides a scientific basis for evaluating stock-specific effects of fisheries on migratory marine species, thus identifying where coordinated international recovery efforts may be most effective. © 2013 NOAA ‐ National Marine Fisheries Service. Journal of Animal Ecology © 2013 British Ecological Society.
Janes, J K; Roe, A D; Rice, A V; Gorrell, J C; Coltman, D W; Langor, D W; Sperling, F A H
2016-01-01
An understanding of mating systems and fine-scale spatial genetic structure is required to effectively manage forest pest species such as Dendroctonus ponderosae (mountain pine beetle). Here we used genome-wide single-nucleotide polymorphisms to assess the fine-scale genetic structure and mating system of D. ponderosae collected from a single stand in Alberta, Canada. Fine-scale spatial genetic structure was absent within the stand and the majority of genetic variation was best explained at the individual level. Relatedness estimates support previous reports of pre-emergence mating. Parentage assignment tests indicate that a polygamous mating system better explains the relationships among individuals within a gallery than the previously reported female monogamous/male polygynous system. Furthermore, there is some evidence to suggest that females may exploit the galleries of other females, at least under epidemic conditions. Our results suggest that current management models are likely to be effective across large geographic areas based on the absence of fine-scale genetic structure. PMID:26286666
NASA Astrophysics Data System (ADS)
Loher, Timothy; Woods, Monica A.; Jimenez-Hidalgo, Isadora; Hauser, Lorenz
2016-01-01
Declines in size at age of Pacific halibut Hippoglossus stenolepis, in concert with sexually-dimorphic growth and a constant minimum commercial size limit, have led to the expectation that the sex composition of commercial catches should be increasingly female-biased. Sensitivity analyses suggest that variance in sex composition of landings may be the most influential source of uncertainty affecting current understanding of spawning stock biomass. However, there is no reliable way to determine sex at landing because all halibut are eviscerated at sea. In 2014, a statistical method based on survey data was developed to estimate the probability that fish of any given length at age (LAA) would be female, derived from the fundamental observation that large, young fish are likely female whereas small, old fish have a high probability of being male. Here, we examine variability in age-specific sex composition using at-sea commercial and closed-season survey catches, and compare the accuracy of the survey-based LAA technique to genetic markers for reconstructing the sex composition of catches. Sexing by LAA performed best for summer-collected samples, consistent with the hypothesis that the ability to characterize catches can be influenced by seasonal demographic shifts. Additionally, differences between survey and commercial selectivity that allow fishers to harvest larger fish within cohorts may generate important mismatch between survey and commercial datasets. Length-at-age-based estimates ranged from 4.7% underestimation of female proportion to 12.0% overestimation, with mean error of 5.8 ± 1.5%. Ratios determined by genetics were closer to true sample proportions and displayed less variability; estimation to within < 1% of true ratios was limited to genetics. Genetic estimation of female proportions ranged from 4.9% underestimation to 2.5% overestimation, with a mean absolute error of 1.2 ± 1.2%. Males were generally more difficult to assign than females: 6.7% of males and 3.4% of females were incorrectly assigned. Although nuclear microsatellites proved more consistent at partitioning catches by sex, we recommend that SNP assays be developed to allow for rapid, cost-effective, and accurate sex identification.
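The LAA method can be sketched as a logistic model for P(female | length, age) fitted on sexed survey fish and averaged over an unsexed landing; the dimorphic growth parameters below are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Length-at-age sexing sketch: fit P(female | length, age) on known-sex fish,
# then estimate the female fraction of a landing as the mean predicted
# probability over its fish.
rng = np.random.default_rng(9)
n = 4000
age = rng.integers(8, 20, size=n)
sex = rng.random(n) < 0.5                       # True = female
# females grow faster (invented growth model plus noise):
length = 30 + age * np.where(sex, 5.5, 4.0) + rng.normal(0, 6, n)

model = LogisticRegression(max_iter=1000).fit(np.column_stack([length, age]), sex)

catch_idx = rng.choice(n, 500, replace=False)   # pretend this is a landing
X_catch = np.column_stack([length[catch_idx], age[catch_idx]])
est = model.predict_proba(X_catch)[:, 1].mean()
print(f"estimated female fraction: {est:.3f}, true: {sex[catch_idx].mean():.3f}")
```

The residual error of such a model is concentrated in mid-sized fish of intermediate age, which is where the genetic assay retains its advantage.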
A Novel Mittag-Leffler Kernel Based Hybrid Fault Diagnosis Method for Wheeled Robot Driving System.
Yuan, Xianfeng; Song, Mumin; Zhou, Fengyu; Chen, Zhumin; Li, Yan
2015-01-01
The wheeled robots have been successfully applied in many aspects, such as industrial handling vehicles, and wheeled service robots. To improve the safety and reliability of wheeled robots, this paper presents a novel hybrid fault diagnosis framework based on Mittag-Leffler kernel (ML-kernel) support vector machine (SVM) and Dempster-Shafer (D-S) fusion. Using sensor data sampled under different running conditions, the proposed approach initially establishes multiple principal component analysis (PCA) models for fault feature extraction. The fault feature vectors are then applied to train the probabilistic SVM (PSVM) classifiers that arrive at a preliminary fault diagnosis. To improve the accuracy of preliminary results, a novel ML-kernel based PSVM classifier is proposed in this paper, and the positive definiteness of the ML-kernel is proved as well. The basic probability assignments (BPAs) are defined based on the preliminary fault diagnosis results and their confidence values. Eventually, the final fault diagnosis result is achieved by the fusion of the BPAs. Experimental results show that the proposed framework not only is capable of detecting and identifying the faults in the robot driving system, but also has better performance in stability and diagnosis accuracy compared with the traditional methods.
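The fusion step can be sketched with Dempster's rule over a three-fault frame; the BPA values below are invented stand-ins for confidence-discounted PSVM outputs.

```python
from itertools import product

# Dempster-Shafer fusion of two basic probability assignments (BPAs) over a
# frame of three fault hypotheses; mass on the full frame encodes ignorance.
frame = frozenset({"F1", "F2", "F3"})

def dempster(m1, m2):
    """Combine two BPAs (dicts: frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                 # mass assigned to empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two classifiers, each leaving some mass on the whole frame:
m1 = {frozenset({"F1"}): 0.6, frozenset({"F2"}): 0.2, frame: 0.2}
m2 = {frozenset({"F1"}): 0.5, frozenset({"F3"}): 0.3, frame: 0.2}
for s, v in sorted(dempster(m1, m2).items(), key=lambda kv: -kv[1]):
    print(set(s), round(v, 3))                  # F1 emerges with dominant mass
```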
Tracking the genetic stability of a honeybee breeding program with genetic markers
USDA-ARS?s Scientific Manuscript database
A genetic stock identification (GSI) assay was developed in 2008 to distinguish Russian honey bees from other honey bee stocks that are commercially produced in the United States. Probability of assignment (POA) values have been collected and maintained since the stock release in 2008 to the Russian...
Performance of different SNP panels for parentage testing in two East Asian cattle breeds.
Strucken, E M; Gudex, B; Ferdosi, M H; Lee, H K; Song, K D; Gibson, J P; Kelly, M; Piper, E K; Porto-Neto, L R; Lee, S H; Gondro, C
2014-08-01
The International Society for Animal Genetics (ISAG) proposed a panel of single nucleotide polymorphisms (SNPs) for parentage testing in cattle (a core panel of 100 SNPs and an additional list of 100 SNPs). However, markers specific to East Asian taurine cattle breeds were not included, and no information is available as to whether the ISAG panel performs adequately for these breeds. We tested ISAG's core (100 SNP) and full (200 SNP) panels on two East Asian taurine breeds: the Korean Hanwoo and the Japanese Wagyu, the latter from the Australian herd. Even though the power of exclusion was high at 0.99 for both ISAG panels, the core panel performed poorly with 3.01% false-positive assignments in the Hanwoo population and 3.57% in the Wagyu. The full ISAG panel identified all sire-offspring relations correctly in both populations with 0.02% of relations wrongly excluded in the Hanwoo population. Based on these results, we created and tested two population-specific marker panels: one for the Wagyu population, which showed no false-positive assignments with either 100 or 200 SNPs, and a second panel for the Hanwoo, which still had some false-positive assignments with 100 SNPs but no false positives using 200 SNPs. In conclusion, for parentage assignment in East Asian cattle breeds, only the full ISAG panel is adequate for parentage testing. If fewer markers should be used, it is advisable to use population-specific markers rather than the ISAG panel. © 2014 Stichting International Foundation for Animal Genetics.
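Panel power of this kind is straightforward to estimate by simulation: generate Mendelian trios at biallelic SNPs and count how often a random non-sire shows at least one opposite-homozygote conflict with the offspring, a conservative exclusion criterion that ignores the dam's genotype. Allele frequencies and panel size below are invented, not the ISAG markers.

```python
import numpy as np

# Monte Carlo power-of-exclusion for a SNP parentage panel.
rng = np.random.default_rng(17)
n_snp, n_trials = 100, 20_000
p = rng.uniform(0.2, 0.5, size=n_snp)          # minor allele frequencies

def genotype(size):
    """Hardy-Weinberg genotypes (0/1/2 allele counts) for `size` animals."""
    return (rng.binomial(1, p, size=(size, n_snp))
            + rng.binomial(1, p, size=(size, n_snp)))

def gamete(g):
    """Transmit one allele per locus; heterozygotes transmit at random."""
    het = g == 1
    allele = np.where(g == 2, 1, 0)
    allele[het] = rng.integers(0, 2, size=het.sum())
    return allele

sire, dam, stranger = genotype(n_trials), genotype(n_trials), genotype(n_trials)
child = gamete(sire) + gamete(dam)

# Exclusion at a locus: child and candidate male are opposite homozygotes.
conflict = ((child == 0) & (stranger == 2)) | ((child == 2) & (stranger == 0))
excluded = conflict.any(axis=1)
print(f"power of exclusion for a {n_snp}-SNP panel: {excluded.mean():.4f}")
```

Because population-specific allele frequencies enter directly, the same simulation run with East Asian frequencies would show the drop in power that motivated the breed-specific panels above.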
Probabilistic Reverse dOsimetry Estimating Exposure Distribution (PROcEED)
PROcEED is a web-based application used to conduct probabilistic reverse dosimetry calculations. The tool is used for estimating a distribution of exposure concentrations likely to have produced biomarker concentrations measured in a population.
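A sketch of the reverse-dosimetry idea, assuming the forward model can be summarized as a distribution of biomarker-per-unit-exposure ratios; all numbers are invented and this is not PROcEED's actual algorithm.

```python
import numpy as np

# Reverse dosimetry by inverting a forward-model ratio distribution:
# each exposure draw is a measured biomarker divided by a sampled ratio.
rng = np.random.default_rng(2)
biomarkers = rng.lognormal(np.log(2.0), 0.6, size=1000)   # e.g., ug/L in blood
ratio = rng.lognormal(np.log(0.5), 0.4, size=1000)        # (ug/L) per (ug/kg-d)

exposure = biomarkers / ratio                             # ug/kg-d per pairing
print("exposure percentiles (5/50/95):",
      np.round(np.percentile(exposure, [5, 50, 95]), 2))
```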
Wang, Yue; Adalý, Tülay; Kung, Sun-Yuan; Szabo, Zsolt
2007-01-01
This paper presents a probabilistic neural network based technique for unsupervised quantification and segmentation of brain tissues from magnetic resonance images. It is shown that this problem can be solved by distribution learning and relaxation labeling, resulting in an efficient method that may be particularly useful in quantifying and segmenting abnormal brain tissues where the number of tissue types is unknown and the distributions of tissue types heavily overlap. The new technique uses suitable statistical models for both the pixel and context images and formulates the problem in terms of model-histogram fitting and global consistency labeling. The quantification is achieved by probabilistic self-organizing mixtures and the segmentation by a probabilistic constraint relaxation network. The experimental results show the efficient and robust performance of the new algorithm and that it outperforms the conventional classification based approaches. PMID:18172510
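The distribution-learning step can be sketched with a standard Gaussian mixture in place of the paper's probabilistic self-organizing mixtures; the intensities below are synthetic stand-ins for MR pixel values.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit overlapping intensity distributions, then read off soft tissue labels;
# the three synthetic components mimic heavily overlapping tissue classes.
rng = np.random.default_rng(4)
intensities = np.concatenate([
    rng.normal(60, 8, 3000),    # CSF-like
    rng.normal(100, 10, 4000),  # gray-matter-like
    rng.normal(140, 9, 3000),   # white-matter-like
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(intensities)
post = gmm.predict_proba(intensities)        # soft (probabilistic) labels
print("estimated tissue fractions:", np.round(gmm.weights_, 3))
print("mean label uncertainty:", np.round(1 - post.max(axis=1).mean(), 3))
```

The soft posteriors are what a subsequent relaxation-labeling stage would refine using spatial context.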
Probabilistic population projections with migration uncertainty
Azose, Jonathan J.; Ševčíková, Hana; Raftery, Adrian E.
2016-01-01
We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations’ Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find that uncertainty in migration projection is a substantial contributor to uncertainty in population projections for many countries. Prediction intervals for the populations of Northern America and Europe are over 70% wider, whereas prediction intervals for the populations of Africa, Asia, and the world as a whole are nearly unchanged. Out-of-sample validation shows that the model is reasonably well calibrated. PMID:27217571
A hybrid quantum-inspired genetic algorithm for multiobjective flow shop scheduling.
Li, Bin-Bin; Wang, Ling
2007-06-01
This paper proposes a hybrid quantum-inspired genetic algorithm (HQGA) for the multiobjective flow shop scheduling problem (FSSP), a typical NP-hard combinatorial optimization problem with a strong engineering background. On the one hand, a quantum-inspired GA (QGA) based on Q-bit representation is applied for exploration in the discrete 0-1 hyperspace using the updating operator of the quantum gate and genetic operators on Q-bits. Random-key representation is used to convert the Q-bit representation to a job permutation for evaluating the objective values of the schedule solution. On the other hand, a permutation-based GA (PGA) is applied both for exploration in the permutation-based scheduling space and for stressing exploitation of good schedule solutions. To evaluate solutions in a multiobjective sense, a randomly weighted linear-sum function is used in the QGA, and a nondominated sorting technique, including classification of Pareto fronts and fitness assignment, is applied in the PGA with regard to both proximity and diversity of solutions. To maintain the diversity of the population, two population trimming techniques are proposed. The proposed HQGA is tested on several multiobjective FSSPs. Simulation results and comparisons based on several performance metrics demonstrate the effectiveness of the proposed HQGA.
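A minimal sketch of two of the ingredients described above, Q-bit observation and random-key decoding into a job permutation, applied to a toy permutation flow shop. The instance, the choice of 4 Q-bits per job, and the omission of the quantum-gate rotation update are illustrative simplifications, not the paper's full HQGA.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(theta):
    # Collapse each Q-bit to a classical bit: P(bit = 1) = sin^2(theta).
    return (rng.random(theta.shape) < np.sin(theta) ** 2).astype(int)

def random_key_to_permutation(bits, n_jobs):
    # Group the bits into one integer key per job; sorting the keys
    # yields a job permutation (the random-key decoding step).
    keys = bits.reshape(n_jobs, -1) @ (2 ** np.arange(bits.size // n_jobs))
    return np.argsort(keys, kind="stable")

def makespan(perm, proc):
    # Completion-time recursion for a permutation flow shop
    # (proc is a jobs x machines matrix of processing times).
    n_jobs, n_mach = proc.shape
    c = np.zeros((n_jobs, n_mach))
    for i, j in enumerate(perm):
        for m in range(n_mach):
            c[i, m] = max(c[i - 1, m] if i else 0.0,
                          c[i, m - 1] if m else 0.0) + proc[j, m]
    return c[-1, -1]

proc = rng.integers(1, 10, size=(5, 3))   # toy instance: 5 jobs, 3 machines
theta = np.full(5 * 4, np.pi / 4)         # 4 Q-bits per job, unbiased superposition
perm = random_key_to_permutation(observe(theta), n_jobs=5)
print(perm, makespan(perm, proc))
```

In the full algorithm, the quantum-gate operator would rotate each theta toward the bit values of the best solutions found so far before the next observation.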
Probabilistic Meteorological Characterization for Turbine Loads
NASA Astrophysics Data System (ADS)
Kelly, M.; Larsen, G.; Dimitrov, N. K.; Natarajan, A.
2014-06-01
Beyond the existing, limited IEC prescription for describing fatigue loads on wind turbines, we look towards probabilistic characterization of the loads via analogous characterization of the atmospheric flow, particularly for today's "taller" turbines with rotors well above the atmospheric surface layer. Based on data from multiple sites as well as on theoretical bases from boundary-layer meteorology and atmospheric turbulence, we offer probabilistic descriptions of shear and turbulence intensity, elucidating the connection of each to the other as well as to atmospheric stability and terrain. These are used as input to loads calculations, and together with a statistical description of the loads output, they allow for improved design and loads calculations.
Probabilistic sizing of laminates with uncertainties
NASA Technical Reports Server (NTRS)
Shah, A. R.; Liaw, D. G.; Chamis, C. C.
1993-01-01
A reliability-based design methodology for laminate sizing and configuration for a special case of composite structures is described. The methodology combines probabilistic composite mechanics with probabilistic structural analysis. Uncertainties in the constituent materials (fiber and matrix) are simulated using probabilistic theory in order to predict macroscopic behavior. Uncertainties in the degradation of composite material properties are included in this design methodology. A multi-factor interaction equation is used to evaluate load- and environment-dependent degradation of the composite material properties at the micromechanics level. The methodology is integrated into the computer code IPACS (Integrated Probabilistic Assessment of Composite Structures). The versatility of this design approach is demonstrated by performing a multi-level probabilistic analysis to size the laminates for the design structural reliability of random-type structures. The results show that laminate configurations can be selected to improve the structural reliability from three failures in 1000 to no failures in one million. Results also show that the laminates with the highest reliability are the least sensitive to the loading conditions.
NASA Astrophysics Data System (ADS)
Fei, Cheng-Wei; Bai, Guang-Chen
2014-12-01
To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assemblies such as the blade-tip radial running clearance (BTRRC) of gas turbines, a distribution collaborative probabilistic design method based on support vector machine regression (called DCSRM) is proposed by integrating the distribution collaborative response surface method with a support vector machine regression model. The mathematical model of DCSRM is established and its probabilistic design concept is introduced. The dynamic assembly probabilistic design of an aeroengine high-pressure turbine (HPT) BTRRC is carried out to verify the proposed DCSRM. The analysis yields the optimal static blade-tip clearance of the HPT for BTRRC design, improving the performance and reliability of the aeroengine. A comparison of methods shows that DCSRM achieves both high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assemblies and enriches mechanical reliability theory and methods.
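The surrogate-plus-Monte-Carlo idea at the heart of such methods can be sketched briefly. The stand-in response function, the input distributions, and the allowable clearance limit of 0.2 below are all invented for illustration; the distribution collaborative aspect (coordinating several disciplinary response surfaces) is not shown.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

def expensive_model(x):
    # Stand-in for the costly dynamic-assembly simulation: clearance as a
    # function of two input parameters (purely illustrative).
    return 1.2 - 0.5 * x[:, 0] ** 2 + 0.3 * np.sin(3 * x[:, 1])

X_train = rng.normal(size=(80, 2))            # design-of-experiments samples
y_train = expensive_model(X_train)

# Support vector regression response surface fitted to the few expensive runs.
surrogate = SVR(kernel="rbf", C=10.0).fit(X_train, y_train)

# Monte Carlo on the cheap surrogate: probability that the clearance
# falls below an assumed allowable limit of 0.2.
X_mc = rng.normal(size=(100_000, 2))
p_fail = float(np.mean(surrogate.predict(X_mc) < 0.2))
print(f"estimated failure probability: {p_fail:.4f}")
```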
Inference and Analysis of Population Structure Using Genetic Data and Network Theory.
Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli
2016-04-01
Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions about prior conditions, such as the subpopulations being at Hardy-Weinberg equilibrium. Here we present a distance-based approach for inference about population structure using genetic data, defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network into dense subgraphs, is equated with population structure, a partition of the population into genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population into subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) of an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and the identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient, model-free method for inference about population structure that does not entail assumptions about prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/). Copyright © 2016 by the Genetics Society of America.
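The pipeline shape (similarity network, community detection, modularity permutation test) can be sketched as follows. This is not NetStruct's implementation: networkx's greedy modularity maximization stands in for the paper's community-detection algorithms, and the toy similarity matrix is invented.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

rng = np.random.default_rng(2)

def communities_and_modularity(sim):
    # Build a weighted network from the pairwise similarity matrix and
    # partition it into dense subgraphs (the putative population structure).
    G = nx.from_numpy_array(sim)
    parts = greedy_modularity_communities(G, weight="weight")
    return parts, modularity(G, parts, weight="weight")

# Toy data: 20 individuals, two 10-member subpopulations with higher
# within-group genetic similarity.
n = 20
sim = rng.uniform(0.1, 0.3, size=(n, n))
sim[:10, :10] += 0.4
sim[10:, 10:] += 0.4
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 0.0)

parts, q_obs = communities_and_modularity(sim)

# Permutation test: shuffle the pairwise similarities to destroy any
# structure and compare the observed modularity against the null.
iu = np.triu_indices(n, k=1)
null_q = []
for _ in range(99):
    s = np.zeros_like(sim)
    s[iu] = rng.permutation(sim[iu])
    s += s.T
    null_q.append(communities_and_modularity(s)[1])

p_value = (1 + sum(q >= q_obs for q in null_q)) / (1 + len(null_q))
print(f"{len(parts)} communities, modularity {q_obs:.3f}, p = {p_value:.3f}")
```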
Yeo, Matthew; Mauricio, Isabel L; Messenger, Louisa A; Lewis, Michael D; Llewellyn, Martin S; Acosta, Nidia; Bhattacharyya, Tapan; Diosque, Patricio; Carrasco, Hernan J; Miles, Michael A
2011-06-01
Multilocus sequence typing (MLST) is a powerful and highly discriminatory method for analysing pathogen population structure and epidemiology. Trypanosoma cruzi, the protozoan agent of American trypanosomiasis (Chagas disease), has remarkable genetic and ecological diversity. A standardised MLST protocol that is suitable for assignment of T. cruzi isolates to genetic lineage and for higher resolution diversity studies has not been developed. We have sequenced and diplotyped nine single copy housekeeping genes and assessed their value as part of a systematic MLST scheme for T. cruzi. A minimum panel of four MLST targets (Met-III, RB19, TcGPXII, and DHFR-TS) was shown to provide unambiguous assignment of isolates to the six known T. cruzi lineages (Discrete Typing Units, DTUs TcI-TcVI). In addition, we recommend six MLST targets (Met-II, Met-III, RB19, TcMPX, DHFR-TS, and TR) for more in depth diversity studies on the basis that diploid sequence typing (DST) with this expanded panel distinguished 38 out of 39 reference isolates. Phylogenetic analysis implies a subdivision between North and South American TcIV isolates. Single Nucleotide Polymorphism (SNP) data revealed high levels of heterozygosity among DTUs TcI, TcIII, TcIV and, for three targets, putative corresponding homozygous and heterozygous loci within DTUs TcI and TcIII. Furthermore, individual gene trees gave incongruent topologies at inter- and intra-DTU levels, inconsistent with a model of strict clonality. We demonstrate the value of systematic MLST diplotyping for describing inter-DTU relationships and for higher resolution diversity studies of T. cruzi, including presence of recombination events. The high levels of heterozygosity will facilitate future population genetics analysis based on MLST haplotypes.
Failed rib region prediction in a human body model during crash events with precrash braking.
Guleyupoglu, B; Koya, B; Barnard, R; Gayzik, F S
2018-02-28
The objective of this study is 2-fold. We used a validated human body finite element model to study the predicted chest injury (focusing on rib fracture as a function of element strain) based on varying levels of simulated precrash braking. Furthermore, we compare deterministic and probabilistic methods of rib injury prediction in the computational model. The Global Human Body Models Consortium (GHBMC) M50-O model was gravity settled in the driver position of a generic interior equipped with an advanced 3-point belt and airbag. Twelve cases were investigated with permutations for failure, precrash braking system, and crash severity. The severities used were median (17 kph), severe (34 kph), and New Car Assessment Program (NCAP; 56.4 kph). Cases with failure enabled removed rib cortical bone elements once 1.8% effective plastic strain was exceeded. Alternatively, a probabilistic framework found in the literature was used to predict rib failure. Both the probabilistic and deterministic methods take location (anterior, lateral, and posterior) into consideration. The deterministic method is based on a rubric that defines failed rib regions via a threshold on contiguous failed elements. The probabilistic method depends on age-based strain and failure functions. Kinematics between both methods were similar (peak max deviation: ΔX head = 17 mm; ΔZ head = 4 mm; ΔX thorax = 5 mm; ΔZ thorax = 1 mm). Seat belt forces at the time of probabilistic failed region initiation were lower than those at deterministic failed region initiation. The probabilistic method predicted more failed regions in the rib (an analog for fracture) than the deterministic method in all but 1 case, where they were equal. The failed region patterns between models are similar; however, differences arise because element elimination redistributes stress, causing probabilistic failed regions to continue to accumulate after the point at which no further deterministic failed regions would be predicted. Both the probabilistic and deterministic methods indicate similar trends with regard to the effect of precrash braking; however, there are tradeoffs. The deterministic failed region method is more spatially sensitive to failure and more sensitive to belt loads. The probabilistic failed region method allows increased postprocessing capability with respect to age, and it predicted more failed regions than the deterministic method due to differences in force distribution.
Comparison of probabilistic and deterministic fiber tracking of cranial nerves.
Zolal, Amir; Sobottka, Stephan B; Podlesek, Dino; Linn, Jennifer; Rieger, Bernhard; Juratli, Tareq A; Schackert, Gabriele; Kitzler, Hagen H
2017-09-01
OBJECTIVE The depiction of cranial nerves (CNs) using diffusion tensor imaging (DTI) is of great interest in skull base tumor surgery, and DTI used with deterministic tracking methods has been reported previously. However, there are still no good methods for eliminating noise from the resulting depictions. The authors hypothesized that probabilistic tracking could lead to more accurate results, because it extracts information from the underlying data more efficiently. Moreover, the authors adapted a previously described technique for noise elimination using gradual threshold increases to probabilistic tracking. To evaluate the utility of this new approach, this work provides a comparison of the gradual threshold increase method in probabilistic and in deterministic tracking of CNs. METHODS Both tracking methods were used to depict CNs II, III, V, and the VII+VIII bundle. Depiction of 240 CNs was attempted with each of the above methods in 30 healthy subjects, who were obtained from 2 public databases: the Kirby repository (KR) and the Human Connectome Project (HCP). Elimination of erroneous fibers was attempted by gradually increasing the respective thresholds (fractional anisotropy [FA] and probabilistic index of connectivity [PICo]). The results were compared with predefined ground truth images based on corresponding anatomical scans. Two label overlap measures (false-positive error and Dice similarity coefficient) were used to evaluate the success of both methods in depicting the CNs. Moreover, the differences between these parameters obtained from the KR and HCP (with higher angular resolution) databases were evaluated. Additionally, visualization of 10 CNs in 5 clinical cases was attempted with both methods and evaluated by comparing the depictions with intraoperative findings. RESULTS Maximum Dice similarity coefficients were significantly higher with probabilistic tracking (p < 0.001; Wilcoxon signed-rank test). The false-positive error of the last obtained depiction was also significantly lower in probabilistic than in deterministic tracking (p < 0.001). The HCP data yielded significantly better results in terms of the Dice coefficient in probabilistic tracking (p < 0.001, Mann-Whitney U-test) and in deterministic tracking (p = 0.02). The false-positive errors were smaller in HCP data in deterministic tracking (p < 0.001) and showed a strong trend toward significance in probabilistic tracking (p = 0.06). In the clinical cases, the probabilistic method visualized 7 of 10 attempted CNs accurately, compared with 3 correct depictions with deterministic tracking. CONCLUSIONS High angular resolution DTI scans are preferable for the DTI-based depiction of the cranial nerves. Probabilistic tracking with a gradual PICo threshold increase is more effective for this task than the previously described deterministic tracking with a gradual FA threshold increase, and it might represent a useful method for depicting cranial nerves with DTI since it eliminates erroneous fibers without manual intervention.
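A schematic of the gradual threshold increase on a probabilistic (PICo) map, under invented assumptions: the stopping rule (false-positive error below 5% against a ground-truth mask) and the synthetic volumes are illustrative stand-ins for the study's anatomically defined criteria.

```python
import numpy as np

def dice(a, b):
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def false_positive_error(seg, truth):
    return np.logical_and(seg, ~truth).sum() / max(seg.sum(), 1)

def gradual_threshold(pico_map, truth, start=0.01, step=0.01, max_fpe=0.05):
    # Raise the PICo threshold until the depiction's false-positive error
    # drops below max_fpe; return that threshold and the Dice coefficient.
    t = start
    while t < 1.0:
        seg = pico_map >= t
        if seg.sum() == 0:
            break
        if false_positive_error(seg, truth) <= max_fpe:
            return t, dice(seg, truth)
        t += step
    return None, 0.0

# Toy volume: a "nerve" column with elevated PICo values plus background noise.
rng = np.random.default_rng(3)
truth = np.zeros((32, 32, 32), dtype=bool)
truth[14:18, 14:18, :] = True
pico = rng.uniform(0.0, 0.2, truth.shape)
pico[truth] += rng.uniform(0.3, 0.9, truth.sum())

print(gradual_threshold(pico, truth))
```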
Awad, Lara; Fady, Bruno; Khater, Carla; Roig, Anne; Cheddadi, Rachid
2014-01-01
The threatened conifer Abies cilicica currently persists in Lebanon in geographically isolated forest patches. The impact of demographic and evolutionary processes on population genetic diversity and structure was assessed using 10 nuclear microsatellite loci. All 15 remnant local populations revealed low genetic variation but a high recent effective population size. FST-based measures of population genetic differentiation revealed a low spatial genetic structure, but Bayesian analysis of population structure identified a significant Northeast-Southwest population structure. Populations showed significant but weak isolation-by-distance, indicating non-equilibrium conditions between dispersal and genetic drift. Bayesian assignment tests detected an asymmetric Northeast-Southwest migration involving some long-distance dispersal events. We suggest that the persistence and Northeast-Southwest geographic structure of Abies cilicica in Lebanon are the result of at least two demographic processes during its recent evolutionary history: (1) recent migration to currently marginal populations and (2) local persistence through altitudinal shifts along a mountainous topography. These results might help us better understand the mechanisms involved in the species' response to expected climate change. PMID:24587219
Weighing costs and losses: A decision making game using probabilistic forecasts
NASA Astrophysics Data System (ADS)
Werner, Micha; Ramos, Maria-Helena; Wetterhall, Frederik; Cranston, Michael; van Andel, Schalk-Jan; Pappenberger, Florian; Verkade, Jan
2017-04-01
Probabilistic forecasts are increasingly recognised as an effective and reliable tool to communicate uncertainties. The economic value of probabilistic forecasts has been demonstrated by several authors, showing the benefit of using probabilistic forecasts over deterministic forecasts in several sectors, including flood and drought warning, hydropower, and agriculture. Probabilistic forecasting is also central to the emerging concept of risk-based decision making, and underlies emerging paradigms such as impact-based forecasting. Although the economic value of probabilistic forecasts is easily demonstrated in academic works, its evaluation in practice is more complex. The practical use of probabilistic forecasts requires decision makers to weigh the cost of an appropriate response to a probabilistic warning against the projected loss that would occur if the forecast event becomes reality. In this paper, we present the results of a simple game that aims to explore how decision makers are influenced by the costs required for taking a response and the potential losses they face in case the forecast flood event occurs. Participants play the role of one of three possible different shop owners. Each type of shop has losses of quite different magnitude, should a flood event occur. The shop owners are presented with several forecasts, each with a probability of a flood event occurring, which would inundate their shop and lead to those losses. In response, they have to decide if they want to do nothing, raise temporary defences, or relocate their inventory. Each action comes at a cost, and the different shop owners therefore have quite different cost/loss ratios. The game was played on four occasions. Players were attendees of the ensemble hydro-meteorological forecasting session of the 2016 EGU Assembly, professionals participating at two other conferences related to hydrometeorology, and a group of students. All audiences were familiar with the principles of forecasting and water-related risks, and one of the audiences comprised a group of experts in probabilistic forecasting. Results show that the different shop owners do take the costs of taking action and the potential losses into account in their decisions. Shop owners with a low cost/loss ratio were found to be more inclined to take actions based on the forecasts, though the absolute value of the losses also increased the willingness to take action. Little differentiation was found between the different groups of players.
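The cost-loss logic the game puts in front of players can be written down directly: pick the response minimising expected cost given the forecast probability. The numbers below are invented for illustration, not the game's actual payoffs.

```python
def best_action(p_flood, actions):
    # Minimise expected cost: up-front response cost plus the residual
    # loss weighted by the probability that the flood occurs.
    return min(actions, key=lambda a: actions[a][0] + p_flood * actions[a][1])

# Illustrative shop-owner numbers: (response cost, residual loss if flooded);
# doing nothing leaves the full loss of 10000 at stake.
actions = {
    "do nothing":     (0,    10000),
    "raise defences": (500,   4000),
    "relocate stock": (3000,     0),
}

for p in (0.05, 0.2, 0.9):
    print(f"P(flood) = {p:.2f} -> {best_action(p, actions)}")
```

With these numbers the optimal response shifts from doing nothing, to raising defences, to relocating as the forecast probability rises, which is exactly the sensitivity to cost/loss ratios that the game probes.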
Ancestry estimation and control of population stratification for sequence-based association studies.
Wang, Chaolong; Zhan, Xiaowei; Bragg-Gresham, Jennifer; Kang, Hyun Min; Stambolian, Dwight; Chew, Emily Y; Branham, Kari E; Heckenlively, John; Fulton, Robert; Wilson, Richard K; Mardis, Elaine R; Lin, Xihong; Swaroop, Anand; Zöllner, Sebastian; Abecasis, Gonçalo R
2014-04-01
Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.
Iglesias, María José; García López, Jesús; Collados Luján, Juan Fernando; López Ortiz, Fernando; Bojórquez Pereznieto, Humberto; Toresano, Fernando; Camacho, Francisco
2014-01-01
The effects of genetic, technological and environmental factors on the chemical composition of four marmande-type tomato varieties have been investigated. The study is based on the analysis of (1)H HRMAS NMR spectra of tomato purée using a combination of partial least squares (PLS) and assigned signal analysis (ASA). In agreement with the genetic, morphological and taste characteristics of the tomatoes studied, the analysis of the NMR data allows two groups of samples to be differentiated. The type of culture and climatic conditions can reduce the compositional differences. The extent of the compositional changes produced by climatic conditions is variety-dependent. Neither grafting nor perlite significantly affects the relative content of primary metabolites. This was not the case for tomatoes grown using the pure hydroponic production system based on the recirculation of nutrient solution, New Growing System NGS®, which seems to be an effective agricultural approach to improve tomato quality. Copyright © 2013 Elsevier Ltd. All rights reserved.
Haller, Toomas; Leitsalu, Liis; Fischer, Krista; Nuotio, Marja-Liisa; Esko, Tõnu; Boomsma, Dorothea Irene; Kyvik, Kirsten Ohm; Spector, Tim D; Perola, Markus; Metspalu, Andres
2017-01-01
Ancestry information at the individual level can be a valuable resource for personalized medicine; medical, demographic and historical research; and tracing back personal history. We report a new method for quantitatively determining personal genetic ancestry based on genome-wide data. Numerical ancestry component scores are assigned to individuals based on comparisons with reference populations. These comparisons are conducted with an existing analytical pipeline making use of genotype phasing, similarity matrix computation, and our addition: multidimensional best fitting by MixFit. The method is demonstrated by studying the Estonian and Finnish populations in geographical context. We show the main differences in the genetic composition of these otherwise close European populations and how they have influenced each other. The components of our analytical pipeline are freely available computer programs and scripts, one of which was developed in-house (available at: www.geenivaramu.ee/en/tools/mixfit).
Characterization of a novel variant of Mycobacterium chimaera.
van Ingen, J; Hoefsloot, W; Buijtels, P C A M; Tortoli, E; Supply, P; Dekhuijzen, P N R; Boeree, M J; van Soolingen, D
2012-09-01
In this study, nonchromogenic mycobacteria were isolated from pulmonary samples of three patients in the Netherlands. All isolates had identical, unique 16S rRNA gene and 16S-23S ITS sequences, which were closely related to those of Mycobacterium chimaera and Mycobacterium marseillense. The biochemical features of the isolates differed slightly from those of M. chimaera, suggesting that the isolates may represent a possible separate species within the Mycobacterium avium complex (MAC). However, the cell-wall mycolic acid pattern, analysed by HPLC, and the partial sequences of the hsp65 and rpoB genes were identical to those of M. chimaera. We concluded that the isolates represent a novel variant of M. chimaera. The results of this analysis have led us to question the currently used methods of species definition for members of the genus Mycobacterium, which are based largely on 16S rRNA or rpoB gene sequencing. Definitions based on a single genetic target are likely to be insufficient. Genetic divergence, especially in the MAC, yields strains that cannot be confidently assigned to a specific species based on the analysis of a single genetic target.
Feature generation using genetic programming with application to fault classification.
Guo, Hong; Jack, Lindsay B; Nandi, Asoke K
2005-02-01
One of the major challenges in pattern recognition problems is the feature extraction process, which derives new features from existing features or directly from raw data in order to reduce the cost of computation during the classification process, while improving classifier efficiency. Most current feature extraction techniques transform the original pattern vector into a new vector with increased discrimination capability but lower dimensionality. This is conducted within a predefined feature space and thus has limited searching power. Genetic programming (GP) can generate new features from the original dataset without prior knowledge of the probabilistic distribution. In this paper, a GP-based approach is developed for feature extraction from raw vibration data recorded from a rotating machine with six different conditions. The created features are then used as the inputs to a neural classifier for the identification of six bearing conditions. Experimental results demonstrate the ability of GP to automatically discover the different bearing conditions using features expressed in the form of nonlinear functions. Furthermore, four sets of results--using GP-extracted features with artificial neural networks (ANN) and support vector machines (SVM), as well as traditional features with ANN and SVM--have been obtained. This GP-based approach is used for bearing fault classification for the first time and exhibits superior searching power over other techniques. Additionally, it significantly reduces the computation time compared with a genetic algorithm (GA) and therefore offers a more practical realization of the solution.
Gene–Environment Correlation: Difficulties and a Natural Experiment–Based Strategy
Li, Jiang; Liu, Hexuan; Guo, Guang
2013-01-01
Objectives. We explored how gene–environment correlations can result in endogenous models, how natural experiments can protect against this threat, and if unbiased estimates from natural experiments are generalizable to other contexts. Methods. We compared a natural experiment, the College Roommate Study, which measured genes and behaviors of college students and their randomly assigned roommates in a southern public university, with observational data from the National Longitudinal Study of Adolescent Health in 2008. We predicted exposure to exercising peers using genetic markers and estimated environmental effects on alcohol consumption. A mixed-linear model estimated an alcohol consumption variance that was attributable to genetic markers and across peer environments. Results. Peer exercise environment was associated with respondent genotype in observational data, but not in the natural experiment. The effects of peer drinking and presence of a general gene–environment interaction were similar between data sets. Conclusions. Natural experiments, like random roommate assignment, could protect against potential bias introduced by gene–environment correlations. When combined with representative observational data, unbiased and generalizable causal effects could be estimated. PMID:23927502
Quantifying introgression risk with realistic population genetics.
Ghosh, Atiyo; Meirmans, Patrick G; Haccou, Patsy
2012-12-07
Introgression is the permanent incorporation of genes from the genome of one population into another. This can have severe consequences, such as extinction of endemic species, or the spread of transgenes. Quantification of the risk of introgression is an important component of genetically modified crop regulation. Most theoretical introgression studies aimed at such quantification disregard one or more of the most important factors concerning introgression: realistic genetical mechanisms, repeated invasions and stochasticity. In addition, the use of linkage as a risk mitigation strategy has not been studied properly yet with genetic introgression models. Current genetic introgression studies fail to take repeated invasions and demographic stochasticity into account properly, and use incorrect measures of introgression risk that can be manipulated by arbitrary choices. In this study, we present proper methods for risk quantification that overcome these difficulties. We generalize a probabilistic risk measure, the so-called hazard rate of introgression, for application to introgression models with complex genetics and small natural population sizes. We illustrate the method by studying the effects of linkage and recombination on transgene introgression risk at different population sizes.
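The hazard rate of introgression can be estimated from replicate stochastic simulations: among runs still transgene-free entering year t, the fraction in which introgression first occurs during year t. The memoryless toy model below (a constant establishment probability per invasion year) is only a device for checking the estimator; the paper's models have population-genetic dynamics that make the hazard time-varying.

```python
import numpy as np

rng = np.random.default_rng(4)

def first_introgression_time(p_establish, horizon):
    # Toy model: one invasion per year; the transgene permanently
    # establishes (introgression occurs) with probability p_establish.
    for t in range(horizon):
        if rng.random() < p_establish:
            return t
    return horizon          # censored: still transgene-free at the horizon

horizon, runs = 50, 20_000
times = np.array([first_introgression_time(0.04, horizon) for _ in range(runs)])

# Hazard rate at year t: events at t divided by the number of runs at risk.
at_risk = np.array([(times >= t).sum() for t in range(horizon)])
events = np.array([(times == t).sum() for t in range(horizon)])
hazard = events / np.maximum(at_risk, 1)
print(hazard[:5])   # should hover around the true per-year value 0.04
```

Unlike endpoint-based introgression probabilities, the hazard rate is not tied to an arbitrary observation window, which is why the authors argue for it as the risk measure.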
Strucken, Eva M; Al-Mamun, Hawlader A; Esquivelzeta-Rabell, Cecilia; Gondro, Cedric; Mwai, Okeyo A; Gibson, John P
2017-09-12
Smallholder dairy farming in much of the developing world is based on the use of crossbred cows that combine local adaptation traits of indigenous breeds with the high milk yield potential of exotic dairy breeds. Pedigree recording is rare in such systems, which means that it is impossible to make informed breeding decisions. High-density single nucleotide polymorphism (SNP) assays allow accurate estimation of breed composition and parentage assignment but are too expensive for routine application. Our aim was to determine the level of accuracy achieved with low-density SNP assays. We constructed subsets of 100 to 1500 SNPs from the 735k-SNP Illumina panel by selecting: (a) on high minor allele frequencies (MAF) in a crossbred population; (b) on large differences in allele frequency between ancestral breeds; (c) at random; or (d) with a differential evolution algorithm. These panels were tested on a dataset of 1933 crossbred dairy cattle from Kenya/Uganda and on crossbred populations from Ethiopia (N = 545) and Tanzania (N = 462). Dairy breed proportions were estimated by using the ADMIXTURE program, a regression approach, and SNP-best linear unbiased prediction, and tested against estimates obtained by ADMIXTURE based on the 735k-SNP panel. Performance for parentage assignment was based on opposing homozygotes, which were used to calculate the separation value (sv) between true and false assignments. Panels of SNPs based on the largest differences in allele frequency between European dairy breeds and a combined Nelore/N'Dama population gave the best predictions of dairy breed proportion (r² = 0.962 to 0.994 for 100 to 1500 SNPs) with an average absolute bias of 0.026. Panels of SNPs based on the highest MAF in the crossbred population (Kenya/Uganda) gave the most accurate parentage assignments (sv = -1 to 15 for 100 to 1500 SNPs). Due to the different required properties of the SNPs, panels that did well for breed composition did poorly for parentage assignment and vice versa. A combined panel of 400 SNPs was not able to assign parentages correctly; thus, we recommend using 200 SNPs independently for either breed proportion prediction or parentage assignment.
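A minimal sketch of the opposing-homozygote count that parentage assignment relies on. The 0/1/2 genotype coding and the crude inheritance construction are illustrative; real pipelines must also allow for genotyping errors, which is what the separation value between true-pair and unrelated-pair counts accounts for.

```python
import numpy as np

def opposing_homozygotes(geno_a, geno_b):
    # Loci where one animal is homozygous 0 and the other homozygous 2
    # (genotypes coded as 0/1/2 alternate-allele counts, -1 = missing).
    ok = (geno_a >= 0) & (geno_b >= 0)
    return int(np.sum(ok & (np.abs(geno_a - geno_b) == 2)))

rng = np.random.default_rng(5)
n_snps = 200

# Crude toy inheritance: the offspring receives one allele consistent with
# the sire and one consistent with the dam at every locus, so a true
# sire-offspring pair yields zero opposing homozygotes (absent errors).
sire = rng.integers(0, 3, n_snps)
dam = rng.integers(0, 3, n_snps)
offspring = (sire > 0).astype(int) + (dam > 1).astype(int)
unrelated = rng.integers(0, 3, n_snps)

print("true sire vs offspring:", opposing_homozygotes(sire, offspring))
print("unrelated vs offspring:", opposing_homozygotes(unrelated, offspring))
```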
USDA-ARS's Scientific Manuscript database
Genetic differentiation among 10 populations of boll weevil, Anthonomus grandis grandis, sampled in 2009, in Texas and Mexico, was determined using ten microsatellite loci. In addition, temporal changes in genetic composition were examined in the eight populations for which samples were available fr...
Steven C. McKelvey; William D. Smith; Frank Koch
2012-01-01
This project summary describes a probabilistic model developed with funding support from the Forest Health Monitoring Program of the Forest Service, U.S. Department of Agriculture (BaseEM Project SO-R-08-01). The model has been implemented in SODBuster, a standalone software package developed using the Java software development kit from Sun Microsystems.
Probabilistic Wind Power Ramp Forecasting Based on a Scenario Generation Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Qin; Florita, Anthony R; Krishnan, Venkat K
Wind power ramps (WPRs) are particularly important in the management and dispatch of wind power and are currently drawing the attention of balancing authorities. With the aim to reduce the impact of WPRs for power system operations, this paper develops a probabilistic ramp forecasting method based on a large number of simulated scenarios. An ensemble machine learning technique is first adopted to forecast the basic wind power forecasting scenario and calculate the historical forecasting errors. A continuous Gaussian mixture model (GMM) is used to fit the probability distribution function (PDF) of forecasting errors. The cumulative distribution function (CDF) is analytically deduced. The inverse transform method based on Monte Carlo sampling and the CDF is used to generate a massive number of forecasting error scenarios. An optimized swinging door algorithm is adopted to extract all the WPRs from the complete set of wind power forecasting scenarios. The probabilistic forecasting results of ramp duration and start-time are generated based on all scenarios. Numerical simulations on publicly available wind power data show that within a predefined tolerance level, the developed probabilistic wind power ramp forecasting method is able to predict WPRs with a high level of sharpness and accuracy.
Probabilistic Wind Power Ramp Forecasting Based on a Scenario Generation Method: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Qin; Florita, Anthony R; Krishnan, Venkat K
2017-08-31
Wind power ramps (WPRs) are particularly important in the management and dispatch of wind power, and they are currently drawing the attention of balancing authorities. With the aim to reduce the impact of WPRs for power system operations, this paper develops a probabilistic ramp forecasting method based on a large number of simulated scenarios. An ensemble machine learning technique is first adopted to forecast the basic wind power forecasting scenario and calculate the historical forecasting errors. A continuous Gaussian mixture model (GMM) is used to fit the probability distribution function (PDF) of forecasting errors. The cumulative distribution function (CDF) is analytically deduced. The inverse transform method based on Monte Carlo sampling and the CDF is used to generate a massive number of forecasting error scenarios. An optimized swinging door algorithm is adopted to extract all the WPRs from the complete set of wind power forecasting scenarios. The probabilistic forecasting results of ramp duration and start time are generated based on all scenarios. Numerical simulations on publicly available wind power data show that within a predefined tolerance level, the developed probabilistic wind power ramp forecasting method is able to predict WPRs with a high level of sharpness and accuracy.
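The error-scenario generation step (GMM fit, analytic mixture CDF, inverse transform sampling) can be sketched as below. The stand-in error data are invented, and the ensemble forecast and swinging-door ramp extraction steps are omitted.

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)

# Stand-in for historical wind power forecasting errors (per-unit).
errors = np.concatenate([rng.normal(-0.05, 0.02, 700),
                         rng.normal(0.08, 0.05, 300)])

gmm = GaussianMixture(n_components=3, random_state=0).fit(errors.reshape(-1, 1))
w = gmm.weights_
mu = gmm.means_.ravel()
sd = np.sqrt(gmm.covariances_.ravel())

# Analytic mixture CDF on a grid, then inverse transform sampling:
# draw u ~ U(0, 1) and map it through the interpolated inverse CDF.
grid = np.linspace(errors.min() - 0.2, errors.max() + 0.2, 2000)
cdf = sum(wk * stats.norm.cdf(grid, mk, sk) for wk, mk, sk in zip(w, mu, sd))
u = rng.random(10_000)
scenarios = np.interp(u, cdf, grid)   # forecasting-error scenarios

print(f"scenario mean {scenarios.mean():+.4f} vs data mean {errors.mean():+.4f}")
```

Adding each error scenario to the basic forecast yields the large set of wind power scenarios from which ramps are then extracted.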
Job shop scheduling problem with late work criterion
NASA Astrophysics Data System (ADS)
Piroozfard, Hamed; Wong, Kuan Yew
2015-05-01
Scheduling is considered a key task in many industries, such as project-based scheduling, crew scheduling, flight scheduling, and machine scheduling. In the machine scheduling area, job shop scheduling problems are considered important and highly complex; they are characterized as NP-hard. Job shop scheduling problems with the late work criterion and non-preemptive jobs are addressed in this paper. The late work criterion is a fairly new objective function; it is a qualitative measure concerned with the late parts of jobs, unlike classical objective functions, which are quantitative measures. In this work, simulated annealing is presented to solve the scheduling problem. In addition, an operation-based representation is used to encode the solution, and a neighbourhood search structure is employed to search for new solutions. The case studies are Lawrence instances taken from the Operations Research Library. Computational results of this probabilistic meta-heuristic algorithm were compared with those of a conventional genetic algorithm, and conclusions were drawn based on the algorithm and the problem.
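A compact sketch of simulated annealing with an operation-based encoding and a late work objective, on an invented three-job instance. Here the late work of an operation is taken to be the part of its processing interval falling after the job's due date; the cooling schedule and neighbourhood (a position swap) are illustrative choices, not the paper's exact configuration.

```python
import math, random

random.seed(7)

# Toy instance: jobs as lists of (machine, processing_time); due dates per job.
jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 3)], [(0, 2), (1, 3)]]
due = [6, 7, 5]

def late_work(seq):
    # Decode an operation-based chromosome (job IDs; the k-th occurrence of
    # job j is j's k-th operation) and sum work done past each due date.
    next_op = [0] * len(jobs)
    job_ready = [0] * len(jobs)
    mach_ready = {}
    total = 0
    for j in seq:
        m, p = jobs[j][next_op[j]]
        start = max(job_ready[j], mach_ready.get(m, 0))
        end = start + p
        total += max(0, end - max(start, due[j]))   # late part of this operation
        job_ready[j], mach_ready[m] = end, end
        next_op[j] += 1
    return total

seq = [j for j, ops in enumerate(jobs) for _ in ops]
random.shuffle(seq)
best, cost, T = seq[:], late_work(seq), 5.0
while T > 0.01:
    cand = seq[:]
    i, k = random.sample(range(len(cand)), 2)   # neighbour: swap two positions
    cand[i], cand[k] = cand[k], cand[i]
    delta = late_work(cand) - cost
    if delta <= 0 or random.random() < math.exp(-delta / T):
        seq, cost = cand, late_work(cand)
        if cost < late_work(best):
            best = seq[:]
    T *= 0.995                                  # geometric cooling
print(best, late_work(best))
```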
Hybrid Rendering with Scheduling under Uncertainty
Tamm, Georg; Krüger, Jens
2014-01-01
As scientific data of increasing size is generated by today’s simulations and measurements, utilizing dedicated server resources to process the visualization pipeline becomes necessary. In a purely server-based approach, requirements on the client-side are minimal as the client only displays results received from the server. However, the client may have a considerable amount of hardware available, which is left idle. Further, the visualization is put at the whim of possibly unreliable server and network conditions. Server load, bandwidth and latency may substantially affect the response time on the client. In this paper, we describe a hybrid method, where visualization workload is assigned to server and client. A capable client can produce images independently. The goal is to determine a workload schedule that enables a synergy between the two sides to provide rendering results to the user as fast as possible. The schedule is determined based on processing and transfer timings obtained at runtime. Our probabilistic scheduler adapts to changing conditions by shifting workload between server and client, and accounts for the performance variability in the dynamic system. PMID:25309115
Measurement level AIS/radar fusion for maritime surveillance
NASA Astrophysics Data System (ADS)
Habtemariam, Biruk K.; Tharmarasa, R.; Meger, Eric; Kirubarajan, T.
2012-05-01
Using the Automatic Identification System (AIS), ships identify themselves intermittently by broadcasting their location information. However, radars are traditionally used as the primary source of surveillance, and AIS is considered a supplement, with little interaction between these data sets. The data from AIS are much more accurate than radar data, with practically no false alarms. But unlike radar data, the AIS measurements arrive unpredictably, depending on the type and behavior of a ship. The AIS data include target IDs that can be associated with initialized tracks. In a multitarget maritime surveillance environment, the revisit interval from the AIS could be very large for some targets. In addition, the revisit intervals for various targets can differ. In this paper, we propose a joint probabilistic data association based tracking algorithm that addresses the aforementioned issues to fuse radar measurements with AIS data. Multiple AIS IDs are assigned to a track, with probabilities updated by both AIS and radar measurements to resolve the ambiguity in the AIS ID source. Experimental results based on simulated data demonstrate the performance of the proposed technique.
Andromeda: a peptide search engine integrated into the MaxQuant environment.
Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias
2011-04-01
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
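Andromeda's scoring is binomial at its core; the sketch below computes such a score for a single spectrum. The per-fragment chance-match probability is an assumption for illustration (roughly, allowing the top 4 peaks per 100 Da window suggests a value near 0.04), and the real engine layers peak-depth optimization and modification handling on top of this.

```python
import math
from scipy.stats import binom

def binomial_fragment_score(n_theoretical, n_matched, p_chance):
    # Probability of matching at least n_matched of the n_theoretical
    # fragment ions purely by chance, reported as -10 * log10(P).
    p = binom.sf(n_matched - 1, n_theoretical, p_chance)
    return math.inf if p <= 0 else -10.0 * math.log10(p)

# Example: 12 of 20 theoretical b/y ions matched with an assumed 4%
# chance-match probability per fragment.
print(binomial_fragment_score(20, 12, 0.04))
```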
Parallel approach for bioinspired algorithms
NASA Astrophysics Data System (ADS)
Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina
2018-05-01
In this paper, a probabilistic parallel approach based on a population heuristic, such as a genetic algorithm, is suggested. The authors propose using a multithreading approach at the micro level, at which new alternative solutions are generated. In each iteration, several threads can be started that independently use the same population to generate new solutions. After all threads have finished, a selection operator combines the obtained results into the new population. To confirm the effectiveness of the suggested approach, the authors have developed software with which experimental computations can be carried out. The authors consider a classic optimization problem: finding a Hamiltonian cycle in a graph. Experiments show that, due to the parallel approach at the micro level, an increase in running speed can be obtained on graphs with 250 and more vertices.
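A sketch of the micro-level threading scheme on a tour (Hamiltonian cycle) problem like the one used in the experiments. The mutation-only breeding and the parameters are illustrative, and note that CPython's GIL limits true parallelism for pure-Python work, so this shows the structure rather than the speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def tour_length(tour, dist):
    return sum(dist[tour[i - 1]][tour[i]] for i in range(len(tour)))

def make_offspring(population, n_children, seed):
    # One worker thread: independently breed children from the shared
    # parent population by swap mutation of randomly chosen parents.
    thread_rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        child = thread_rng.choice(population)[:]
        i, k = thread_rng.sample(range(len(child)), 2)
        child[i], child[k] = child[k], child[i]
        children.append(child)
    return children

n_cities, pop_size, n_threads = 30, 40, 4
rng = random.Random(8)
dist = [[rng.random() for _ in range(n_cities)] for _ in range(n_cities)]
population = [rng.sample(range(n_cities), n_cities) for _ in range(pop_size)]

for generation in range(50):
    with ThreadPoolExecutor(n_threads) as pool:
        futures = [pool.submit(make_offspring, population,
                               pop_size // n_threads, seed)
                   for seed in range(generation * n_threads,
                                     (generation + 1) * n_threads)]
        offspring = [c for f in futures for c in f.result()]
    # Selection operator: combine parents and offspring, keep the best.
    population = sorted(population + offspring,
                        key=lambda t: tour_length(t, dist))[:pop_size]

print(round(tour_length(population[0], dist), 3))
```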
PubMed related articles: a probabilistic topic-based model for content similarity
Lin, Jimmy; Wilbur, W John
2007-01-01
Background We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance, but rather our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®. Results The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. Conclusion Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search. PMID:17971238
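A toy rendition of the Poisson "aboutness" idea: each term's weight is the posterior probability that the document is about that term's topic, and relatedness sums weight products over shared terms. The rates and prior are invented, not the fitted PubMed parameters, and the published pmra weighting differs in detail.

```python
import math

def term_weight(tf, doc_len, lam=0.022, mu=0.013, prior=0.5):
    # Posterior probability that the document is "about" the term's topic,
    # with term frequency tf modeled as Poisson under "about" (rate lam per
    # unit length) vs "not about" (rate mu). The tf! factor cancels in the
    # ratio, so it is omitted. Rates and prior are illustrative only.
    l_about, l_other = lam * doc_len, mu * doc_len
    p_about = math.exp(-l_about) * l_about ** tf * prior
    p_other = math.exp(-l_other) * l_other ** tf * (1 - prior)
    return p_about / (p_about + p_other)

def relatedness(doc_a, doc_b, len_a, len_b):
    # Similarity as a sum over shared terms of the product of topic weights.
    shared = set(doc_a) & set(doc_b)
    return sum(term_weight(doc_a[t], len_a) * term_weight(doc_b[t], len_b)
               for t in shared)

a = {"microstate": 4, "eeg": 6, "clustering": 2}   # term -> frequency
b = {"eeg": 3, "clustering": 5, "kmeans": 2}
print(round(relatedness(a, b, 120, 95), 4))
```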
Probabilistic Risk Assessment to Inform Decision Making: Frequently Asked Questions
General concepts and principles of Probabilistic Risk Assessment (PRA), describe how PRA can improve the bases of Agency decisions, and provide illustrations of how PRA has been used in risk estimation and in describing the uncertainty in decision making.
Preliminary Earthquake Hazard Map of Afghanistan
Boyd, Oliver S.; Mueller, Charles S.; Rukstales, Kenneth S.
2007-01-01
Introduction Earthquakes represent a serious threat to the people and institutions of Afghanistan. As part of a United States Agency for International Development (USAID) effort to assess the resource potential and seismic hazards of Afghanistan, the Seismic Hazard Mapping group of the United States Geological Survey (USGS) has prepared a series of probabilistic seismic hazard maps that help quantify the expected frequency and strength of ground shaking nationwide. To construct the maps, we do a complete hazard analysis for each of ~35,000 sites in the study area. We use a probabilistic methodology that accounts for all potential seismic sources and their rates of earthquake activity, and we incorporate modeling uncertainty by using logic trees for source and ground-motion parameters. See the Appendix for an explanation of probabilistic seismic hazard analysis and discussion of seismic risk. Afghanistan occupies a southward-projecting, relatively stable promontory of the Eurasian tectonic plate (Ambraseys and Bilham, 2003; Wheeler and others, 2005). Active plate boundaries, however, surround Afghanistan on the west, south, and east. To the west, the Arabian plate moves northward relative to Eurasia at about 3 cm/yr. The active plate boundary trends northwestward through the Zagros region of southwestern Iran. Deformation is accommodated throughout the territory of Iran; major structures include several north-south-trending, right-lateral strike-slip fault systems in the east and, farther to the north, a series of east-west-trending reverse- and strike-slip faults. This deformation apparently does not cross the border into relatively stable western Afghanistan. In the east, the Indian plate moves northward relative to Eurasia at a rate of about 4 cm/yr. A broad, transpressional plate-boundary zone extends into eastern Afghanistan, trending southwestward from the Hindu Kush in northeast Afghanistan, through Kabul, and along the Afghanistan-Pakistan border. Deformation here is expressed as a belt of major, north-northeast-trending, left-lateral strike-slip faults and abundant seismicity. The seismicity intensifies farther to the northeast and includes a prominent zone of deep earthquakes associated with northward subduction of the Indian plate beneath Eurasia that extends beneath the Hindu Kush and Pamirs Mountains. Production of the seismic hazard maps is challenging because the geological and seismological data required to produce a seismic hazard model are limited. The data that are available for this project include historical seismicity and poorly constrained slip rates on only a few of the many active faults in the country. Much of the hazard is derived from a new catalog of historical earthquakes: from 1964 to the present, with magnitude equal to or greater than about 4.5, and with depth between 0 and 250 kilometers. We also include four specific faults in the model: the Chaman fault with an assigned slip rate of 10 mm/yr, the Central Badakhshan fault with an assigned slip rate of 12 mm/yr, the Darvaz fault with an assigned slip rate of 7 mm/yr, and the Hari Rud fault with an assigned slip rate of 2 mm/yr. For these faults and for shallow seismicity less than 50 km deep, we incorporate published ground-motion estimates from tectonically active regions of western North America, Europe, and the Middle East. Ground-motion estimates for deeper seismicity are derived from data in subduction environments. 
We apply estimates derived for tectonic regions where subduction is the main tectonic process for intermediate-depth seismicity between 50- and 250-km depth. Within the framework of these limitations, we have developed a preliminary probabilistic seismic-hazard assessment of Afghanistan, the type of analysis that underpins the seismic components of modern building codes in the United States. The assessment includes maps of estimated peak ground-acceleration (PGA), 0.2-second spectral acceleration (SA), and 1.0-second SA.
Li, Xuehui; Wei, Yanling; Acharya, Ananta; Jiang, Qingzhen; Kang, Junmei; Brummer, E. Charles
2014-01-01
A genetic linkage map is a valuable tool for quantitative trait locus mapping, map-based gene cloning, comparative mapping, and whole-genome assembly. Alfalfa, one of the most important forage crops in the world, is autotetraploid, allogamous, and highly heterozygous, characteristics that have impeded the construction of a high-density linkage map using traditional genetic marker systems. Using genotyping-by-sequencing (GBS), we constructed low-cost, reasonably high-density linkage maps for both maternal and paternal parental genomes of an autotetraploid alfalfa F1 population. The resulting maps contain 3591 single-nucleotide polymorphism markers on 64 linkage groups across both parents, with an average density of one marker per 1.5 and 1.0 cM for the maternal and paternal haplotype maps, respectively. Chromosome assignments were made based on homology of markers to the M. truncatula genome. Four linkage groups representing the four haplotypes of each alfalfa chromosome were assigned to each of the eight Medicago chromosomes in both the maternal and paternal parents. The alfalfa linkage groups were highly syntenous with M. truncatula, and clearly identified the known translocation between Chromosomes 4 and 8. In addition, a small inversion on Chromosome 1 was identified between M. truncatula and M. sativa. GBS enabled us to develop a saturated linkage map for alfalfa that greatly improved genome coverage relative to previous maps and that will facilitate investigation of genome structure. GBS could be used in breeding populations to accelerate molecular breeding in alfalfa. PMID:25147192
Berry, Kristin H.; Edwards, Taylor
2013-01-01
The conservation of tortoises poses a unique situation because several threatened species are commonly kept as pets within their native ranges. Thus, there is potential for captive populations to be a reservoir for repatriation efforts. We assess the utility of captive populations of the threatened Agassiz’s desert tortoise (Gopherus agassizii) for recovery efforts based on genetic affinity to local areas. We collected samples from 130 captive desert tortoises from three desert communities: two in California (Ridgecrest and Joshua Tree) and the Desert Tortoise Conservation Center (Las Vegas) in Nevada. We tested all samples for 25 short tandem repeats and sequenced 1,109 bp of the mitochondrial genome. We compared captive genotypes to a database of 1,258 Gopherus samples, including 657 wild caught G. agassizii spanning the full range of the species. We conducted population assignment tests to determine the genetic origins of the captive individuals. For our total sample set, only 44 % of captive individuals were assigned to local populations based on genetic units derived from the reference database. One individual from Joshua Tree, California, was identified as being a Morafka’s desert tortoise, G. morafkai, a cryptic species which is not native to the Mojave Desert. Our data suggest that captive desert tortoises kept within the native range of G. agassizii cannot be presumed to have a genealogical affiliation to wild tortoises in their geographic proximity. Precautions should be taken before considering the release of captive tortoises into the wild as a management tool for recovery.
Probabilistic brain tissue segmentation in neonatal magnetic resonance imaging.
Anbeek, Petronella; Vincken, Koen L; Groenendaal, Floris; Koeman, Annemieke; van Osch, Matthias J P; van der Grond, Jeroen
2008-02-01
A fully automated method has been developed for segmentation of four different structures in the neonatal brain: white matter (WM), central gray matter (CEGM), cortical gray matter (COGM), and cerebrospinal fluid (CSF). The segmentation algorithm is based on information from T2-weighted (T2-w) and inversion recovery (IR) scans. The method uses a K nearest neighbor (KNN) classification technique with features derived from spatial information and voxel intensities. Probabilistic segmentations of each tissue type were generated. By applying thresholds on these probability maps, binary segmentations were obtained. These final segmentations were evaluated by comparison with a gold standard. The sensitivity, specificity, and Dice similarity index (SI) were calculated for quantitative validation of the results. High sensitivity and specificity with respect to the gold standard were reached: sensitivity >0.82 and specificity >0.9 for all tissue types. Tissue volumes were calculated from the binary and probabilistic segmentations. The probabilistic segmentation volumes of all tissue types accurately estimated the gold standard volumes. The KNN approach offers a valuable tool for neonatal brain segmentation. The probabilistic outcomes provide a useful basis for accurate volume measurements. The described method is based on routine diagnostic magnetic resonance imaging (MRI) and is suitable for large population studies.
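A sketch of the KNN step with spatial-plus-intensity features. The synthetic features and labels below stand in for registered T2-w and IR scans with manual annotations, and the 0.5 threshold is an illustrative choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(9)

# Features per voxel: spatial coordinates (x, y, z) plus T2-w and IR
# intensities; synthetic stand-ins for registered scan data.
n = 5000
X_train = rng.normal(size=(n, 5))
y_train = np.digitize(X_train[:, 3] + 0.5 * X_train[:, 4],
                      bins=[-1.0, 0.0, 1.0])   # labels 0..3: WM, CEGM, COGM, CSF

knn = KNeighborsClassifier(n_neighbors=15).fit(X_train, y_train)

X_new = rng.normal(size=(1000, 5))
prob = knn.predict_proba(X_new)      # probabilistic segmentation per voxel

wm_binary = prob[:, 0] >= 0.5        # binary map by thresholding WM probability
print("binary WM voxels:", int(wm_binary.sum()))
print("probabilistic WM volume (voxel units):", float(prob[:, 0].sum()))
```

Summing per-voxel probabilities, rather than counting thresholded voxels, is what gives the probabilistic volume estimates evaluated above.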
Probabilistic evaluation of on-line checks in fault-tolerant multiprocessor systems
NASA Technical Reports Server (NTRS)
Nair, V. S. S.; Hoskote, Yatin V.; Abraham, Jacob A.
1992-01-01
The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components. Often, the evaluation results do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluate the fault detecting and locating capabilities of on-line checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.
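The matrix-based idea can be pictured with a small numeric sketch. The detection probabilities below are illustrative placeholders, and the independence assumption across checks is ours, not necessarily the paper's model.

```python
# Hedged sketch: entry D[i, j] is the probability that on-line check j
# detects a fault in module i (illustrative values). Assuming independent
# checks, module i's detection coverage is 1 - prod_j (1 - D[i, j]).
import numpy as np

D = np.array([[0.9, 0.3, 0.0],
              [0.0, 0.8, 0.5]])
coverage = 1.0 - np.prod(1.0 - D, axis=1)  # per-module detection probability
```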
A Multiatlas Segmentation Using Graph Cuts with Applications to Liver Segmentation in CT Scans
2014-01-01
An atlas-based segmentation approach is presented that combines low-level operations, an affine probabilistic atlas, and a multiatlas-based segmentation. The proposed combination provides highly accurate segmentation due to registrations and atlas selections based on the regions of interest (ROIs) and coarse segmentations. Our approach shares the following common elements between the probabilistic atlas and multiatlas segmentation: (a) the spatial normalisation and (b) the segmentation method, which is based on minimising a discrete energy function using graph cuts. The method is evaluated for the segmentation of the liver in computed tomography (CT) images. Low-level operations define a ROI around the liver from an abdominal CT. We generate a probabilistic atlas using an affine registration based on geometry moments from manually labelled data. Next, a coarse segmentation of the liver is obtained from the probabilistic atlas with low computational effort. Then, a multiatlas segmentation approach improves the accuracy of the segmentation. Both the atlas selections and the nonrigid registrations of the multiatlas approach use a binary mask defined by coarse segmentation. We experimentally demonstrate that this approach performs better than atlas selections and nonrigid registrations in the entire ROI. The segmentation results are comparable to those obtained by human experts and to other recently published results. PMID:25276219
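The energy-minimisation step can be sketched as a binary s-t minimum cut. This is the generic graph-cuts construction (shown with networkx for brevity rather than speed), not the authors' implementation; `unary0`, `unary1`, and `pairs` are assumed inputs, with atlas-derived terms folded into the unary costs.

```python
# Binary graph-cut segmentation sketch: minimise
#   E(l) = sum_p D_p(l_p) + w * sum_{p,q neighbours} [l_p != l_q]
import networkx as nx

def graph_cut(unary0, unary1, pairs, w):
    """unary0/unary1: dicts pixel -> cost of label 0/1;
    pairs: neighbouring pixel pairs; w: smoothness weight."""
    G = nx.DiGraph()
    for p in unary0:
        G.add_edge("s", p, capacity=unary1[p])  # cut if p takes label 1
        G.add_edge(p, "t", capacity=unary0[p])  # cut if p takes label 0
    for p, q in pairs:
        G.add_edge(p, q, capacity=w)
        G.add_edge(q, p, capacity=w)
    _, (S, T) = nx.minimum_cut(G, "s", "t")
    return {p: (0 if p in S else 1) for p in unary0}
```

The value of the minimum cut equals the energy of the returned labelling, which is what makes max-flow a global minimiser for this binary case.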
An Effective Evolutionary Approach for Bicriteria Shortest Path Routing Problems
NASA Astrophysics Data System (ADS)
Lin, Lin; Gen, Mitsuo
Routing is one of the important research issues in the field of communication networks. In this paper, we consider a bicriteria shortest path routing (bSPR) model dedicated to calculating nondominated paths for (1) the minimum total cost and (2) the minimum transmission delay. To solve this bSPR problem, we propose a new multiobjective genetic algorithm (moGA) with: (1) an efficient chromosome representation using the priority-based encoding method; (2) a new operator for auto-tuning GA parameters, which adaptively regulates exploration and exploitation based on the change in average fitness between parents and offspring at each generation; and (3) an interactive adaptive-weight fitness assignment mechanism that assigns weights to each objective and combines the weighted objectives into a single objective function. Numerical experiments with various scales of network design problems show the effectiveness and efficiency of our approach in comparison with recent research.
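The adaptive-weight fitness idea is the most directly codeable piece of this design. The sketch below derives weights from the current population's objective ranges, so the scalarised fitness adapts each generation; the path encoding, decoding, and genetic operators are omitted, and `cost`/`delay` are assumed callables.

```python
# Hedged sketch of adaptive-weight fitness for bicriteria paths: weights
# are the inverses of the current objective ranges, so each generation's
# scalarisation reflects the population's spread on each objective.
def adaptive_weight_fitness(paths, cost, delay):
    costs = [cost(p) for p in paths]
    delays = [delay(p) for p in paths]
    cmin, cmax = min(costs), max(costs)
    dmin, dmax = min(delays), max(delays)
    # Guard against division by zero on a converged population.
    wc = 1.0 / max(cmax - cmin, 1e-9)
    wd = 1.0 / max(dmax - dmin, 1e-9)
    # Larger fitness = closer to the current ideal point (cmin, dmin).
    return [wc * (cmax - c) + wd * (dmax - d)
            for c, d in zip(costs, delays)]
```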
Mathew, Boby; Holand, Anna Marie; Koistinen, Petri; Léon, Jens; Sillanpää, Mikko J
2016-02-01
A novel reparametrization-based INLA approach is presented as a fast alternative to MCMC for the Bayesian estimation of genetic parameters in multivariate animal models. Multi-trait genetic parameter estimation is a relevant topic in animal and plant breeding programs because multi-trait analysis can take into account the genetic correlation between different traits, which significantly improves the accuracy of the genetic parameter estimates. Generally, multi-trait analysis is computationally demanding and requires initial estimates of genetic and residual correlations among the traits, which are difficult to obtain. In this study, we illustrate how to reparametrize the covariance matrices of multivariate animal models using modified Cholesky decompositions. This reparametrization-based approach is used within the Integrated Nested Laplace Approximation (INLA) methodology to estimate the genetic parameters of a multivariate animal model. Immediate benefits are: (1) the difficulty of finding good starting values for the analysis, a known problem for example in Restricted Maximum Likelihood (REML), is avoided; and (2) Bayesian estimation of (co)variance components using INLA is faster to execute than Markov Chain Monte Carlo (MCMC), especially when realized relationship matrices are dense. A slight drawback is that priors for covariance matrices are assigned to elements of the Cholesky factor rather than directly to the covariance matrix elements, as in MCMC. Additionally, we illustrate the concordance of the INLA results with traditional methods such as MCMC and REML. We also present results obtained from simulated data sets with replicates and from field data in rice.
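The generic device behind such reparametrizations can be shown in a few lines: an unconstrained parameter vector is mapped to a symmetric positive-definite covariance matrix through a Cholesky factor with log-transformed diagonal. This is a minimal sketch of the idea, assuming a log-Cholesky variant; the paper's exact modified decomposition may differ.

```python
# Map an unconstrained vector theta (length k*(k+1)/2) to a k x k SPD
# covariance matrix via a Cholesky factor with exponentiated diagonal.
import numpy as np

def theta_to_cov(theta, k):
    L = np.zeros((k, k))
    L[np.tril_indices(k)] = theta
    # Exponentiating the diagonal keeps it strictly positive, so L is a
    # valid Cholesky factor and L @ L.T is positive definite by construction.
    L[np.diag_indices(k)] = np.exp(np.diag(L))
    return L @ L.T

# Example for 2 traits: priors can be placed on the 3 unconstrained
# elements of theta instead of directly on the covariance entries.
cov = theta_to_cov(np.array([0.1, 0.3, -0.2]), k=2)
```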
Pettengill, James B; Moeller, David A
2012-09-01
The origins of hybrid zones between parapatric taxa have been of particular interest for understanding the evolution of reproductive isolation and the geographic context of species divergence. One challenge has been to distinguish between allopatric divergence (followed by secondary contact) versus primary intergradation (parapatric speciation) as alternative divergence histories. Here, we use complementary phylogeographic and population genetic analyses to investigate the recent divergence of two subspecies of Clarkia xantiana and the formation of a hybrid zone within the narrow region of sympatry. We tested alternative phylogeographic models of divergence using approximate Bayesian computation (ABC) and found strong support for a secondary contact model and little support for a model allowing for gene flow throughout the divergence process (i.e. primary intergradation). Two independent methods for inferring the ancestral geography of each subspecies, one based on probabilistic character state reconstructions and the other on palaeo-distribution modelling, also support a model of divergence in allopatry and range expansion leading to secondary contact. The membership of individuals to genetic clusters suggests geographic substructure within each taxon where allopatric and sympatric samples are primarily found in separate clusters. We also observed coincidence and concordance of genetic clines across three types of molecular markers, which suggests that there is a strong barrier to gene flow. Taken together, our results provide evidence for allopatric divergence followed by range expansion leading to secondary contact. The location of refugial populations and the directionality of range expansion are consistent with expectations based on climate change since the last glacial maximum. Our approach also illustrates the utility of combining phylogeographic hypothesis testing with species distribution modelling and fine-scale population genetic analyses for inferring the geography of the divergence process. © 2012 Blackwell Publishing Ltd.
Amino acid fermentation at the origin of the genetic code
2012-01-01
There is evidence that the genetic code was established prior to the existence of proteins, when metabolism was powered by ribozymes. Also, early proto-organisms had to rely on simple anaerobic bioenergetic processes. In this work I propose that amino acid fermentation powered metabolism in the RNA world, and that this was facilitated by proto-adapters, the precursors of the tRNAs. Amino acids were used as carbon sources rather than as catalytic or structural elements. In modern bacteria, amino acid fermentation is known as the Stickland reaction. This pathway involves two amino acids: the first undergoes oxidative deamination, and the second acts as an electron acceptor through reductive deamination. This redox reaction results in two keto acids that are employed to synthesise ATP via substrate-level phosphorylation. The Stickland reaction is the basic bioenergetic pathway of some bacteria of the genus Clostridium. Two other facts support Stickland fermentation in the RNA world. First, several Stickland amino acid pairs are synthesised in abiotic amino acid synthesis. This suggests that amino acids that could be used as an energy substrate were freely available. Second, anticodons that have complementary sequences often correspond to amino acids that form Stickland pairs. The main hypothesis of this paper is that pairs of complementary proto-adapters were assigned to Stickland amino acid pairs. There are signatures of this hypothesis in the genetic code. Furthermore, it is argued that the proto-adapters formed double strands that brought amino acid pairs into proximity to facilitate their mutual redox reaction, structurally constraining the anticodon pairs that are assigned to these amino acid pairs. Significance tests which randomise the code are performed to study the extent of the variability of the energetic (ATP) yield. Random assignments can lead to a substantial yield of ATP and maintain enough variability, thus selection can act and refine the assignments into a proto-code that optimises the energetic yield. Monte Carlo simulations are performed to evaluate the establishment of these simple proto-codes, based on amino acid substitutions and codon swapping. In all cases, donor amino acids are assigned to anticodons composed of U+G, and have low redundancy (1-2 codons), whereas acceptor amino acids are assigned to the remaining codons. These bioenergetic and structural constraints allow for a metabolic role for amino acids before their co-option as catalyst cofactors. Reviewers: this article was reviewed by Prof. William Martin, Prof. Eörs Szathmáry (nominated by Dr. Gáspár Jékely) and Dr. Ádám Kun (nominated by Dr. Sandor Pongor) PMID:22325238
The case for probabilistic forecasting in hydrology
NASA Astrophysics Data System (ADS)
Krzysztofowicz, Roman
2001-08-01
That forecasts should be stated in probabilistic, rather than deterministic, terms has been argued from common sense and decision-theoretic perspectives for almost a century. Yet most operational hydrological forecasting systems produce deterministic forecasts and most research in operational hydrology has been devoted to finding the 'best' estimates rather than quantifying the predictive uncertainty. This essay presents a compendium of reasons for probabilistic forecasting of hydrological variates. Probabilistic forecasts are scientifically more honest, enable risk-based warnings of floods, enable rational decision making, and offer additional economic benefits. The growing demand for information about risk and the rising capability to quantify predictive uncertainties create an unparalleled opportunity for the hydrological profession to dramatically enhance the forecasting paradigm.
NASA Astrophysics Data System (ADS)
Yeh, Cheng-Ta; Lin, Yi-Kuei; Yang, Jo-Yun
2018-07-01
Network reliability is an important performance index for many real-life systems, such as electric power systems, computer systems and transportation systems. These systems can be modelled as stochastic-flow networks (SFNs) composed of arcs and nodes. Most system supervisors respect the network reliability maximization by finding the optimal multi-state resource assignment, which is one resource to each arc. However, a disaster may cause correlated failures for the assigned resources, affecting the network reliability. This article focuses on determining the optimal resource assignment with maximal network reliability for SFNs. To solve the problem, this study proposes a hybrid algorithm integrating the genetic algorithm and tabu search to determine the optimal assignment, called the hybrid GA-TS algorithm (HGTA), and integrates minimal paths, recursive sum of disjoint products and the correlated binomial distribution to calculate network reliability. Several practical numerical experiments are adopted to demonstrate that HGTA has better computational quality than several popular soft computing algorithms.
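A skeleton of the hybrid loop might look as follows. This is a hedged illustration of the GA-plus-tabu-search idea only: the reliability evaluation (minimal paths, recursive sum of disjoint products, correlated binomial distribution) is abstracted behind a supplied `reliability` callable, and all operators and parameters are simplifications, not the paper's HGTA.

```python
import random

def hgta(arcs, resources, reliability, pop_size=30, generations=100,
         tabu_len=10, moves=20):
    """arcs: list of arc ids; reliability: assignment dict -> float."""
    pop = [{a: random.choice(resources) for a in arcs}
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=reliability, reverse=True)
        # Elitist GA step: children mix arc assignments of two elite parents.
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p, q = random.sample(elite, 2)
            children.append({a: random.choice((p[a], q[a])) for a in arcs})
        # Tabu-search refinement of the incumbent: single-arc reassignments,
        # with recently changed arcs declared tabu.
        best, tabu = dict(pop[0]), []
        for _ in range(moves):
            a = random.choice([x for x in arcs if x not in tabu] or arcs)
            cand = dict(best)
            cand[a] = random.choice(resources)
            if reliability(cand) > reliability(best):
                best = cand
            tabu = (tabu + [a])[-tabu_len:]
        pop = ([best] + elite + children)[:pop_size]
    return max(pop, key=reliability)
```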
Remais, Justin V; Xiao, Ning; Akullian, Adam; Qiu, Dongchuan; Blair, David
2011-04-01
For many pathogens with environmental stages, or those carried by vectors or intermediate hosts, disease transmission is strongly influenced by pathogen, host, and vector movements across complex landscapes, and thus quantitative measures of movement rate and direction can reveal new opportunities for disease management and intervention. Genetic assignment methods are a set of powerful statistical approaches useful for establishing population membership of individuals. Recent theoretical improvements allow these techniques to be used to cost-effectively estimate the magnitude and direction of key movements in infectious disease systems, revealing important ecological and environmental features that facilitate or limit transmission. Here, we review the theory, statistical framework, and molecular markers that underlie assignment methods, and we critically examine recent applications of assignment tests in infectious disease epidemiology. Research directions that capitalize on use of the techniques are discussed, focusing on key parameters needing study for improved understanding of patterns of disease.
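The statistical core of the classic assignment test is compact enough to sketch: an individual is assigned to the reference population under which its multilocus genotype is most probable, assuming Hardy-Weinberg and linkage equilibrium. The data structures below are assumptions for illustration; real implementations (e.g., GeneClass-style software) add leave-one-out corrections and exclusion tests.

```python
# Likelihood-based population assignment sketch.
# freqs[pop][locus][allele] holds reference allele frequencies;
# a genotype is a dict locus -> (allele1, allele2).
import math

def log_likelihood(genotype, pop_freqs, eps=1e-4):
    ll = 0.0
    for locus, (a1, a2) in genotype.items():
        p = pop_freqs[locus].get(a1, eps)  # eps guards unseen alleles
        q = pop_freqs[locus].get(a2, eps)
        # Hardy-Weinberg genotype probability: 2pq if het, p^2 if hom.
        ll += math.log(2 * p * q if a1 != a2 else p * p)
    return ll

def assign(genotype, freqs):
    return max(freqs, key=lambda pop: log_likelihood(genotype, freqs[pop]))
```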
Integration of Evidence Base into a Probabilistic Risk Assessment
NASA Technical Reports Server (NTRS)
Saile, Lyn; Lopez, Vilma; Bickham, Grandin; Kerstman, Eric; FreiredeCarvalho, Mary; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei
2011-01-01
INTRODUCTION: A probabilistic decision support model such as the Integrated Medical Model (IMM) utilizes an immense amount of input data that necessitates a systematic, integrated approach for data collection and management. As a result of this approach, IMM is able to forecast medical events, resource utilization, and crew health during space flight. METHODS: Inflight data is the most desirable input for the Integrated Medical Model. Non-attributable inflight data is collected from the Lifetime Surveillance of Astronaut Health study as well as from engineers, flight surgeons, and the astronauts themselves. When inflight data is unavailable, cohort studies, other models, and Bayesian analyses are used, supplemented on occasion by subject matter expert input. To determine the quality of evidence for a medical condition, the data source is categorized and assigned a level of evidence from 1 to 5, with 1 being the highest. The collected data reside and are managed in a relational SQL database with a web-based interface for data entry and review. The database is also capable of interfacing with outside applications, which expands capabilities within the database itself. Via the public interface, customers can access a formatted Clinical Findings Form (CLiFF) that outlines the model input and evidence base for each medical condition. Changes to the database are tracked using a documented Configuration Management process. DISCUSSION: This strategic approach provides a comprehensive data management plan for IMM. The IMM Database's structure and architecture have proven to support additional usages, as seen in the analysis of resource utilization across medical conditions. In addition, the IMM Database's web-based interface provides a user-friendly format for customers to browse and download the clinical information for medical conditions. It is this type of functionality that will provide Exploratory Medicine Capabilities the evidence base for their medical condition list. CONCLUSION: The IMM Database, in conjunction with the IMM, is helping the NASA aerospace program improve health care and reduce risk for astronaut crews. Both the database and the model will continue to expand to meet customer needs through their multi-disciplinary evidence-based approach to managing data. Future expansion could serve as a platform for a Space Medicine Wiki of medical conditions.
A Markov chain model for reliability growth and decay
NASA Technical Reports Server (NTRS)
Siegrist, K.
1982-01-01
A mathematical model is developed to describe a complex system undergoing a sequence of trials in which there is interaction between the internal states of the system and the outcomes of the trials. For example, the model might describe a system undergoing testing that is redesigned after each failure. The basic assumptions for the model are that the state of the system after a trial depends probabilistically only on the state before the trial and on the outcome of the trial, and that the outcome of a trial depends probabilistically only on the state of the system before the trial. It is shown that under these basic assumptions, the successive states form a Markov chain and the successive states and outcomes jointly form a Markov chain. General results are obtained for the transition probabilities, steady-state distributions, etc. A special case studied in detail describes a system that has two possible states ('repaired' and 'unrepaired') undergoing trials that have three possible outcomes ('inherent failure', 'assignable-cause failure', and 'success'). For this model, the reliability function is computed explicitly and an optimal repair policy is obtained.
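The two-state special case can be made concrete numerically. In the sketch below, the outcome probabilities and the state-update rule are illustrative placeholders (not values from the paper); the point is how marginalising over outcomes yields a state transition matrix whose stationary distribution gives long-run reliability.

```python
import numpy as np

# Outcome order: 0 = inherent failure, 1 = assignable-cause failure, 2 = success.
outcome_probs = np.array([[0.05, 0.25, 0.70],   # state 0: unrepaired
                          [0.05, 0.05, 0.90]])  # state 1: repaired

def next_state(s, o):
    if o == 1:   # assignable-cause failure triggers a redesign/repair
        return 1
    if o == 0:   # inherent failure: illustrative regression to unrepaired
        return 0
    return s     # success leaves the state unchanged

# State transition matrix, marginalised over outcomes.
P = np.zeros((2, 2))
for s in range(2):
    for o in range(3):
        P[s, next_state(s, o)] += outcome_probs[s, o]

# Stationary distribution: normalised left eigenvector for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
long_run_reliability = pi @ outcome_probs[:, 2]  # steady-state P(success)
```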
Genetic stock identification of Russian honey bees.
Bourgeois, Lelania; Sheppard, Walter S; Sylvester, H Allen; Rinderer, Thomas E
2010-06-01
A genetic stock certification assay was developed to distinguish Russian honey bees from other European (Apis mellifera L.) stocks that are commercially produced in the United States. In total, 11 microsatellite and five single-nucleotide polymorphism loci were used. Loci were selected for relatively high levels of homogeneity within each group and for differences in allele frequencies between groups. A baseline sample consisted of the 18 lines of Russian honey bees released to the Russian Bee Breeders Association and bees from 34 queen breeders representing commercially produced European honey bee stocks. Suitability tests of the baseline sample pool showed high levels of accuracy. The probability of correct assignment was 94.2% for non-Russian bees and 93.3% for Russian bees. A neighbor-joining phenogram representing genetic distance data showed clear distinction of Russian and non-Russian honey bee stocks. Furthermore, a test of appropriate sample size showed that a sample of eight bees per colony maximizes accuracy and consistency of the results. An additional 34 samples were tested as blind samples (origin unknown to those collecting data) to determine the accuracy of individual assignment tests. Only one of these samples was incorrectly assigned. The 18 current breeding lines were represented in the 2009 blind samples, demonstrating temporal stability of the genetic stock identification assay. The certification assay will be offered through a service laboratory and used by the Russian Bee Breeders Association to genetically certify their stock. The genetic certification will be used in conjunction with continued selection for favorable traits, such as honey production and varroa and tracheal mite resistance.
Probabilistic structural analysis to quantify uncertainties associated with turbopump blades
NASA Technical Reports Server (NTRS)
Nagpal, Vinod K.; Rubinstein, Robert; Chamis, Christos C.
1988-01-01
A probabilistic study of turbopump blades has been in progress at NASA Lewis Research Center for over two years. The objectives of this study are to evaluate the effects of uncertainties in geometry and material properties on the structural response of the turbopump blades, and to evaluate the tolerance limits of the design. A methodology based on a probabilistic approach was developed to quantify the effects of the random uncertainties. The results indicate that only the variations in geometry have significant effects.
NASA Astrophysics Data System (ADS)
Győri, Erzsébet; Gráczer, Zoltán; Tóth, László; Bán, Zoltán; Horváth, Tibor
2017-04-01
Liquefaction potential evaluations are generally made to assess the hazard from specific scenario earthquakes. These evaluations may estimate the potential in a binary fashion (yes/no), define a factor of safety, or predict the probability of liquefaction given a scenario event. Usually the level of ground shaking is obtained from the results of PSHA. Although it is determined probabilistically, a single level of ground shaking is selected and used within the liquefaction potential evaluation. In contrast, fully probabilistic liquefaction potential assessment methods provide a complete picture of liquefaction hazard, namely by taking into account the joint probability distribution of PGA and magnitude of earthquake scenarios, both of which are key inputs in the stress-based simplified methods. Kramer and Mayfield (2007) developed a fully probabilistic liquefaction potential evaluation method using a performance-based earthquake engineering (PBEE) framework. The results of the procedure are a direct estimate of the return period of liquefaction and liquefaction hazard curves as a function of depth. The method combines the disaggregation matrices computed for different exceedance frequencies during probabilistic seismic hazard analysis with one of the recent models for the conditional probability of liquefaction. We have developed software for the assessment of performance-based liquefaction triggering on the basis of the Kramer and Mayfield method. Originally, the SPT-based probabilistic method of Cetin et al. (2004) was built into the procedure of Kramer and Mayfield to compute the conditional probability; however, there is no professional consensus about its applicability. Therefore, we have included in our computer program not only Cetin's method but also the SPT-based procedure of Idriss and Boulanger (2012) and the CPT-based procedure of Boulanger and Idriss (2014). In 1956, a damaging earthquake of magnitude 5.6 occurred in Dunaharaszti, Hungary. Its epicenter was located about 5 km from the southern boundary of Budapest. The quake caused serious damage in the epicentral area and in the southern districts of the capital. The epicentral area of the earthquake is located along the Danube River. Sand boils were observed in some locations, indicating the occurrence of liquefaction. Because their exact locations were recorded at the time of the earthquake, in situ geotechnical measurements (CPT and SPT) could be performed at two sites (Dunaharaszti and Taksony). The different types of measurements enabled probabilistic liquefaction hazard computations at the two studied sites. We have compared the return periods of liquefaction computed using the different built-in simplified stress-based methods.
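The performance-based calculation reduces to a single integration: the annual rate of liquefaction is the conditional probability of liquefaction, summed over the joint PGA-magnitude hazard increments from PSHA disaggregation. The sketch below is a hedged schematic of that Kramer-and-Mayfield-style step; `p_liq` stands in for any of the simplified-method probability models named above.

```python
# Annual rate of liquefaction at one depth:
#   rate = sum over (pga, M) bins of P(liquefaction | pga, M, depth) * d_rate
def liquefaction_return_period(disagg, p_liq, depth):
    """disagg: iterable of (pga, magnitude, annual_rate_increment) tuples
    from PSHA disaggregation; p_liq: conditional probability model."""
    rate = sum(p_liq(pga, mag, depth) * d_rate
               for pga, mag, d_rate in disagg)
    return 1.0 / rate if rate > 0 else float("inf")
```

Repeating this over depths yields the liquefaction hazard curves as a function of depth that the abstract describes.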
NASA Astrophysics Data System (ADS)
Faybishenko, B.; Flach, G. P.
2012-12-01
The objectives of this presentation are: (a) to illustrate the application of Monte Carlo and fuzzy-probabilistic approaches for uncertainty quantification (UQ) in predictions of potential evapotranspiration (PET), actual evapotranspiration (ET), and infiltration (I), using uncertain hydrological or meteorological time series data, and (b) to compare the results of these calculations with those from field measurements at the U.S. Department of Energy Savannah River Site (SRS), near Aiken, South Carolina, USA. The UQ calculations include the evaluation of aleatory (parameter uncertainty) and epistemic (model) uncertainties. The effect of aleatory uncertainty is expressed by assigning probability distributions to input parameters, using historical monthly averaged data from the meteorological station at the SRS. The combined effect of aleatory and epistemic uncertainties on the UQ of PET, ET, and I is then expressed by aggregating the results of calculations from multiple models using a p-box and fuzzy numbers. The uncertainty in PET is calculated using the Baier-Robertson, Blaney-Criddle, Caprio, Hargreaves-Samani, Hamon, Jensen-Haise, Linacre, Makkink, Priestley-Taylor, Penman, Penman-Monteith, Thornthwaite, and Turc models. Then, ET is calculated from the modified Budyko model, followed by calculations of I from the water balance equation. We show that probabilistic and fuzzy-probabilistic calculations using multiple models generate PET, ET, and I distributions that are well within the range of field measurements. We also show that a subset of models can be selected to constrain the uncertainty quantification of PET, ET, and I.
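The two uncertainty types map naturally onto a nested Monte Carlo loop: aleatory uncertainty enters through sampled meteorological inputs, epistemic uncertainty through the ensemble of PET models. The sketch below assumes `pet_models` is a list of callables (Penman, Thornthwaite, etc., not implemented here) and `sample_met` draws one set of monthly inputs; it is an illustration of the aggregation logic, not the authors' code.

```python
import numpy as np

def pet_uncertainty(pet_models, sample_met, n=10_000, seed=0):
    """pet_models: list of callables met_dict -> PET value."""
    rng = np.random.default_rng(seed)
    draws = {m.__name__: np.empty(n) for m in pet_models}
    for i in range(n):
        met = sample_met(rng)    # aleatory: one draw of meteorological inputs
        for m in pet_models:     # epistemic: spread across PET models
            draws[m.__name__][i] = m(met)
    # The envelope of the per-model distributions plays the role of the
    # p-box bounds described above.
    return {name: np.percentile(d, [5, 50, 95]) for name, d in draws.items()}
```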
Lebowitz, Matthew S; Ahn, Woo-Kyoung
2017-11-01
Depression, like other mental disorders and health conditions generally, is increasingly construed as genetically based. This research sought to determine whether merely telling people that they have a genetic predisposition to depression can cause them to retroactively remember having experienced it. U.S. adults (men and women) were recruited online to participate (Experiment 1: N = 288; Experiment 2: N = 599). After conducting a test disguised as genetic screening, we randomly assigned some participants to be told that they carried elevated genetic susceptibility to depression, whereas others were told that they did not carry this genetic liability or were told that they carried elevated susceptibility to a different disorder. Participants then rated their experience of depressive symptoms over the prior 2 weeks on a modified version of the Beck Depression Inventory-II. Participants who were told that their genes predisposed them to depression generally reported higher levels of depressive symptomatology over the previous 2 weeks, compared to those who did not receive this feedback. Given the central role of self-report in psychiatric diagnosis, these findings highlight potentially harmful consequences of personalized genetic testing in mental health. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Probabilistic thinking and death anxiety: a terror management based study.
Hayslip, Bert; Schuler, Eric R; Page, Kyle S; Carver, Kellye S
2014-01-01
Terror Management Theory has been utilized to understand how death can change behavioral outcomes and social dynamics. One area that is not well researched is why individuals willingly engage in risky behavior that could accelerate their mortality. One method of distancing a potential life threatening outcome when engaging in risky behaviors is through stacking probability in favor of the event not occurring, termed probabilistic thinking. The present study examines the creation and psychometric properties of the Probabilistic Thinking scale in a sample of young, middle aged, and older adults (n = 472). The scale demonstrated adequate internal consistency reliability for each of the four subscales, excellent overall internal consistency, and good construct validity regarding relationships with measures of death anxiety. Reliable age and gender effects in probabilistic thinking were also observed. The relationship of probabilistic thinking as part of a cultural buffer against death anxiety is discussed, as well as its implications for Terror Management research.
Do probabilistic forecasts lead to better decisions?
NASA Astrophysics Data System (ADS)
Ramos, M. H.; van Andel, S. J.; Pappenberger, F.
2012-12-01
The last decade has seen growing research in producing probabilistic hydro-meteorological forecasts and increasing their reliability. This followed the promise that, supplied with information about uncertainty, people would take better risk-based decisions. In recent years, therefore, research and operational developments have also started paying attention to ways of communicating the probabilistic forecasts to decision makers. Communicating probabilistic forecasts includes preparing tools and products for visualization, but also requires understanding how decision makers perceive and use uncertainty information in real-time. At the EGU General Assembly 2012, we conducted a laboratory-style experiment in which several cases of flood forecasts and a choice of actions to take were presented as part of a game to participants, who acted as decision makers. Answers were collected and analyzed. In this paper, we present the results of this exercise and discuss if indeed we make better decisions on the basis of probabilistic forecasts.
Do probabilistic forecasts lead to better decisions?
NASA Astrophysics Data System (ADS)
Ramos, M. H.; van Andel, S. J.; Pappenberger, F.
2013-06-01
The last decade has seen growing research in producing probabilistic hydro-meteorological forecasts and increasing their reliability. This followed the promise that, supplied with information about uncertainty, people would take better risk-based decisions. In recent years, therefore, research and operational developments have also started focusing attention on ways of communicating the probabilistic forecasts to decision-makers. Communicating probabilistic forecasts includes preparing tools and products for visualisation, but also requires understanding how decision-makers perceive and use uncertainty information in real time. At the EGU General Assembly 2012, we conducted a laboratory-style experiment in which several cases of flood forecasts and a choice of actions to take were presented as part of a game to participants, who acted as decision-makers. Answers were collected and analysed. In this paper, we present the results of this exercise and discuss if we indeed make better decisions on the basis of probabilistic forecasts.
A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds
Kijas, James W.; Townley, David; Dalrymple, Brian P.; Heaton, Michael P.; Maddox, Jillian F.; McGrath, Annette; Wilson, Peter; Ingersoll, Roxann G.; McCulloch, Russell; McWilliam, Sean; Tang, Dave; McEwan, John; Cockett, Noelle; Oddy, V. Hutton; Nicholas, Frank W.; Raadsma, Herman
2009-01-01
The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability. PMID:19270757
Tackling the Achilles' heel of genetic testing.
Watkins, Hugh
2015-01-14
Assigning pathogenicity to rare genetic variants is at its hardest with the enormous titin gene, but comprehensive genomic analysis makes the task more tractable (Roberts et al., this issue). Copyright © 2015, American Association for the Advancement of Science.
Predicting mining activity with parallel genetic algorithms
Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.
2005-01-01
We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
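Since the Kappa statistic is the calibration objective here, a standard formulation of Cohen's kappa is worth showing; this is the textbook computation over categorical cells, with `truth` and `pred` assumed to be integer class labels.

```python
# Cohen's kappa: chance-corrected agreement between ground truth and
# model-predicted categories.
import numpy as np

def kappa(truth, pred, n_classes):
    confusion = np.zeros((n_classes, n_classes))
    for t, p in zip(truth, pred):
        confusion[t, p] += 1
    n = confusion.sum()
    observed = np.trace(confusion) / n              # observed agreement
    expected = (confusion.sum(0) @ confusion.sum(1)) / n**2  # chance agreement
    return (observed - expected) / (1 - expected)
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is preferred over raw accuracy for spatial predictions with unbalanced classes.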
Privacy rules for DNA databanks. Protecting coded 'future diaries'.
Annas, G J
1993-11-17
In privacy terms, genetic information is like medical information. But the information contained in the DNA molecule itself is more sensitive because it contains an individual's probabilistic "future diary," is written in a code that has only partially been broken, and contains information about an individual's parents, siblings, and children. Current rules for protecting the privacy of medical information cannot protect either genetic information or identifiable DNA samples stored in DNA databanks. A review of the legal and public policy rationales for protecting genetic privacy suggests that specific enforceable privacy rules for DNA databanks are needed. Four preliminary rules are proposed to govern the creation of DNA databanks, the collection of DNA samples for storage, limits on the use of information derived from the samples, and continuing obligations to those whose DNA samples are in the databanks.
Single-tier city logistics model for single product
NASA Astrophysics Data System (ADS)
Saragih, N. I.; Nur Bahagia, S.; Suprayogi; Syabri, I.
2017-11-01
This research develops a single-tier city logistics model consisting of suppliers, UCCs, and retailers. The problem addressed is how to determine the locations of UCCs, allocate retailers to opened UCCs, assign suppliers to opened UCCs, control inventory at the three entities involved, and determine vehicle routes from opened UCCs to retailers. This model has not been developed before. All decisions are optimized simultaneously. Demand is probabilistic, following a normal distribution, and a single product is considered.
Zaikin, Alexey; Míguez, Joaquín
2017-01-01
We compare three state-of-the-art Bayesian inference methods for the estimation of the unknown parameters in a stochastic model of a genetic network. In particular, we introduce a stochastic version of the paradigmatic synthetic multicellular clock model proposed by Ullner et al., 2007. By introducing dynamical noise in the model and assuming that the partial observations of the system are contaminated by additive noise, we enable a principled mechanism to represent experimental uncertainties in the synthesis of the multicellular system and pave the way for the design of probabilistic methods for the estimation of any unknowns in the model. Within this setup, we tackle the Bayesian estimation of a subset of the model parameters. Specifically, we compare three Monte Carlo based numerical methods for the approximation of the posterior probability density function of the unknown parameters given a set of partial and noisy observations of the system. The schemes we assess are the particle Metropolis-Hastings (PMH) algorithm, the nonlinear population Monte Carlo (NPMC) method and the approximate Bayesian computation sequential Monte Carlo (ABC-SMC) scheme. We present an extensive numerical simulation study, which shows that while the three techniques can effectively solve the problem there are significant differences both in estimation accuracy and computational efficiency. PMID:28797087
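For orientation, the simplest relative of the ABC-SMC scheme compared here is plain ABC rejection, sketched below. The `simulate`, `distance`, and `prior_sample` callables are assumed supplied (e.g., a run of the stochastic clock model and a summary-statistic discrepancy); ABC-SMC improves on this by shrinking the tolerance over a sequence of weighted particle populations.

```python
# Minimal ABC rejection sampler: keep parameter draws whose simulated
# data fall within tolerance eps of the observations.
import numpy as np

def abc_rejection(observed, simulate, distance, prior_sample,
                  eps=1.0, n_accept=500, seed=0):
    rng = np.random.default_rng(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)  # samples approximating the ABC posterior
```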
VizieR Online Data Catalog: Proper motions of PM2000 open clusters (Krone-Martins+, 2010)
NASA Astrophysics Data System (ADS)
Krone-Martins, A.; Soubiran, C.; Ducourant, C.; Teixeira, R.; Le Campion, J. F.
2010-04-01
We present lists of proper motions and kinematic membership probabilities in the region of 49 open clusters or possible open clusters. The stellar proper motions were taken from the Bordeaux PM2000 catalogue. The segregation between cluster and field stars and the assignment of membership probabilities were accomplished by applying a fully automated method based on parametrisations of the probability distribution functions and genetic-algorithm optimisation heuristics associated with a derivative-based hill-climbing algorithm for the likelihood optimisation. (3 data files).
Wang, Junbai; Wu, Qianqian; Hu, Xiaohua Tony; Tian, Tianhai
2016-11-01
Investigating the dynamics of genetic regulatory networks through high throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulations based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness property are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness property of the two predicted networks using the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach reduces computational cost significantly for inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation. Copyright © 2016 Elsevier Inc. All rights reserved.
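The bottom-up step can be illustrated with a toy ODE model whose connectivity comes from the (top-down) inferred network. The Hill kinetics, parameter values, and two-gene wiring below are illustrative assumptions, not the paper's model of the p53-regulated pathway.

```python
# Toy ODE gene network: W[i, j] > 0 means gene j activates gene i,
# W[i, j] < 0 means repression; production uses Hill-type saturation.
import numpy as np
from scipy.integrate import solve_ivp

def make_rhs(W, k_deg, hill_n=2.0):
    def rhs(t, x):
        drive = np.ones_like(x)
        for i in range(len(x)):
            for j in range(len(x)):
                h = x[j]**hill_n / (1.0 + x[j]**hill_n)
                if W[i, j] > 0:            # activation term
                    drive[i] *= h * W[i, j]
                elif W[i, j] < 0:          # repression term
                    drive[i] *= (1.0 - h) * -W[i, j]
        return drive - k_deg * x           # production minus degradation
    return rhs

# First gene activates the second; the second represses the first.
W = np.array([[0.0, -2.0],
              [1.5,  0.0]])
sol = solve_ivp(make_rhs(W, k_deg=0.8), (0, 50), [0.1, 0.1], max_step=0.1)
```

Parameter estimation then amounts to minimising the discrepancy between `sol.y` and measured expression profiles, which is where the reduced networks from the top-down step cut the number of unknowns.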
Accurate continuous geographic assignment from low- to high-density SNP data.
Guillot, Gilles; Jónsson, Hákon; Hinge, Antoine; Manchih, Nabil; Orlando, Ludovic
2016-04-01
Large-scale genotype datasets can help track the dispersal patterns of epidemiological outbreaks and predict the geographic origins of individuals. Such genetically-based geographic assignments also show a range of possible applications in forensics for profiling both victims and criminals, and in wildlife management, where poaching hotspot areas can be located. They, however, require fast and accurate statistical methods to handle the growing amount of genetic information made available from genotype arrays and next-generation sequencing technologies. We introduce a novel statistical method for geopositioning individuals of unknown origin from genotypes. Our method is based on a geostatistical model trained with a dataset of georeferenced genotypes. Statistical inference under this model can be implemented within the theoretical framework of Integrated Nested Laplace Approximation, which represents one of the major recent breakthroughs in statistics, as it does not require Monte Carlo simulations. We compare the performance of our method and an alternative method for geospatial inference, SPA, in a simulation framework. We highlight the accuracy and limits of continuous spatial assignment methods at various scales by analyzing genotype datasets from a diversity of species, including Florida Scrub-jay birds Aphelocoma coerulescens, Arabidopsis thaliana and humans, representing 41-197,146 SNPs. Our method appears to be best suited for the analysis of medium-sized datasets (a few tens of thousands of loci), such as reduced-representation sequencing data that become increasingly available in ecology. Availability: http://www2.imm.dtu.dk/~gigu/Spasiba/. Contact: gilles.b.guillot@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Integration of Advanced Probabilistic Analysis Techniques with Multi-Physics Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cetiner, Mustafa Sacit; none,; Flanagan, George F.
2014-07-30
An integrated simulation platform that couples probabilistic analysis-based tools with model-based simulation tools can provide valuable insights for reactive and proactive responses to plant operating conditions. The objective of this work is to demonstrate the benefits of a partial implementation of the Small Modular Reactor (SMR) Probabilistic Risk Assessment (PRA) Detailed Framework Specification through the coupling of advanced PRA capabilities and accurate multi-physics plant models. Coupling a probabilistic model with a multi-physics model will aid in design, operations, and safety by providing a more accurate understanding of plant behavior. This represents the first attempt at actually integrating these two types of analyses for a control system used for operations, on a faster than real-time basis. This report documents the development of the basic communication capability to exchange data with the probabilistic model using Reliability Workbench (RWB) and the multi-physics model using Dymola. The communication pathways from injecting a fault (i.e., failing a component) to the probabilistic and multi-physics models were successfully completed. This first version was tested with prototypic models represented in both RWB and Modelica. First, a simple event tree/fault tree (ET/FT) model was created to develop the software code to implement the communication capabilities between the dynamic-link library (dll) and RWB. A program, written in C#, successfully communicates faults to the probabilistic model through the dll. A systems model of the Advanced Liquid-Metal Reactor–Power Reactor Inherently Safe Module (ALMR-PRISM) design developed under another DOE project was upgraded using Dymola to include proper interfaces to allow data exchange with the control application (ConApp). A program, written in C+, successfully communicates faults to the multi-physics model. The results of the example simulation were successfully plotted.
Schlaier, Juergen R; Beer, Anton L; Faltermeier, Rupert; Fellner, Claudia; Steib, Kathrin; Lange, Max; Greenlee, Mark W; Brawanski, Alexander T; Anthofer, Judith M
2017-06-01
This study compared tractography approaches for identifying cerebellar-thalamic fiber bundles relevant to planning target sites for deep brain stimulation (DBS). In particular, probabilistic and deterministic tracking of the dentate-rubro-thalamic tract (DRTT) and differences between the spatial courses of the DRTT and the cerebello-thalamo-cortical (CTC) tract were compared. Six patients with movement disorders were examined by magnetic resonance imaging (MRI), including two sets of diffusion-weighted images (12 and 64 directions). Probabilistic and deterministic tractography was applied on each diffusion-weighted dataset to delineate the DRTT. Results were compared with regard to their sensitivity in revealing the DRTT and additional fiber tracts, and with regard to processing time. Two sets of regions-of-interest (ROIs) guided deterministic tractography of the DRTT or the CTC, respectively. Tract distances to an atlas-based reference target were compared. Probabilistic fiber tracking with 64 orientations detected the DRTT in all twelve hemispheres. Deterministic tracking detected the DRTT in nine (12 directions) and in only two (64 directions) hemispheres. Probabilistic tracking was more sensitive in detecting additional fibers (e.g. ansa lenticularis and medial forebrain bundle) than deterministic tracking. Probabilistic tracking took substantially longer than deterministic tracking. Deterministic tracking was more sensitive in detecting the CTC than the DRTT. CTC tracts were located adjacent but consistently more posterior to DRTT tracts. These results suggest that probabilistic tracking is more sensitive and robust in detecting the DRTT but harder to implement than deterministic approaches. Although the sensitivity of deterministic tracking is higher for the CTC than the DRTT, targets for DBS based on these tracts likely differ. © 2017 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Fully probabilistic control for stochastic nonlinear control systems with input dependent noise.
Herzallah, Randa
2015-03-01
Robust controllers for nonlinear stochastic systems with functional uncertainties can be consistently designed using probabilistic control methods. In this paper a generalised probabilistic controller design for the minimisation of the Kullback-Leibler divergence between the actual joint probability density function (pdf) of the closed loop control system, and an ideal joint pdf is presented emphasising how the uncertainty can be systematically incorporated in the absence of reliable systems models. To achieve this objective all probabilistic models of the system are estimated from process data using mixture density networks (MDNs) where all the parameters of the estimated pdfs are taken to be state and control input dependent. Based on this dependency of the density parameters on the input values, explicit formulations to the construction of optimal generalised probabilistic controllers are obtained through the techniques of dynamic programming and adaptive critic methods. Using the proposed generalised probabilistic controller, the conditional joint pdfs can be made to follow the ideal ones. A simulation example is used to demonstrate the implementation of the algorithm and encouraging results are obtained. Copyright © 2014 Elsevier Ltd. All rights reserved.
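One building block is easy to make concrete: the KL divergence between a model of the closed-loop state pdf and the ideal pdf, which is the quantity the generalised probabilistic controller minimises. In the paper these pdfs come from mixture density networks; the univariate Gaussians below are an assumption used only to keep the sketch self-contained.

```python
# KL( N(mu0, var0) || N(mu1, var1) ) for univariate Gaussians -- the
# divergence a probabilistic controller would drive toward zero.
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1)**2) / var1 - 1.0)

# e.g., with a (hypothetical) model mu(u), var(u) of the closed-loop state
# under input u, pick the input closest to the ideal pdf N(0, 0.1):
# u_star = min(candidate_us, key=lambda u: kl_gauss(mu(u), var(u), 0.0, 0.1))
```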
Inference and Analysis of Population Structure Using Genetic Data and Network Theory
Greenbaum, Gili; Templeton, Alan R.; Bar-David, Shirli
2016-01-01
Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition’s modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/). PMID:26888080
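A hedged sketch of this pipeline using networkx is shown below: build a weighted graph from the pairwise genetic-similarity matrix, detect communities, and score the partition's modularity against permuted matrices. The similarity computation and the paper's exact community-detection algorithm are abstracted away; greedy modularity maximisation stands in here.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

def detect_structure(similarity, n_perm=200, seed=0):
    """similarity: symmetric individuals x individuals similarity matrix."""
    rng = np.random.default_rng(seed)
    G = nx.from_numpy_array(similarity)
    parts = community.greedy_modularity_communities(G, weight="weight")
    q_obs = community.modularity(G, parts, weight="weight")
    # Null distribution: permute the pairwise similarities, destroying any
    # community structure while preserving the weight distribution.
    iu = np.triu_indices_from(similarity, k=1)
    q_null = []
    for _ in range(n_perm):
        S = np.zeros_like(similarity)
        S[iu] = rng.permutation(similarity[iu])
        S += S.T
        Gp = nx.from_numpy_array(S)
        parts_p = community.greedy_modularity_communities(Gp, weight="weight")
        q_null.append(community.modularity(Gp, parts_p, weight="weight"))
    p_value = float(np.mean([q >= q_obs for q in q_null]))
    return parts, q_obs, p_value
```

An individual's strength of association could then be scored, for example, as the drop in modularity when it is moved out of its assigned community, in the spirit of the SA measure described above.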
Unique Applications for Artificial Neural Networks. Phase 1
1991-08-08
significance. For the VRP, a problem that has received considerable attention in the literature, the new NGO-VRP methodology generates better solutions...represent the stop assignments of each route. The effect of the genetic recombinations is to make simple local exchanges to the relative positions of the...technique for representing a computer-based associative memory [Arbib, 1987]. In our routing system, the basic job of the neural network system is to accept
Balkanization and Unification of Probabilistic Inferences
ERIC Educational Resources Information Center
Yu, Chong-Ho
2005-01-01
Many research-related classes in social sciences present probability as a unified approach based upon mathematical axioms, but neglect the diversity of various probability theories and their associated philosophical assumptions. Although currently the dominant statistical and probabilistic approach is the Fisherian tradition, the use of Fisherian…
W-tree indexing for fast visual word generation.
Shi, Miaojing; Xu, Ruixin; Tao, Dacheng; Xu, Chao
2013-03-01
The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is the visual word generation, i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently, structures based on multibranch trees and forests have been adopted to reduce the time cost. However, these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we can significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a KD-tree and a K-means tree), in order to re-direct the searching path to be close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with the fast library for approximate nearest neighbors and the random KD-trees on the Oxford data set. Thorough experimental results suggest the efficiency and effectiveness of the new scheme.
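A toy version of the co-occurrence idea: candidate words returned by an approximate nearest-neighbour search are re-scored by how often they co-occur with words already assigned in the same image. The data structures, blending weight, and scoring rule below are hypothetical simplifications of the paper's per-node tree weighting.

```python
# Re-rank ANN candidates by blending geometric distance with a
# co-occurrence prior over words already assigned in this image.
def rescore(candidates, assigned_words, cooccur, alpha=0.5):
    """candidates: list of (word_id, nn_distance);
    cooccur[w1][w2]: co-occurrence probability of words w1, w2."""
    def prior(w):
        if not assigned_words:
            return 1.0
        return sum(cooccur[a].get(w, 1e-6)
                   for a in assigned_words) / len(assigned_words)
    # Lower score is better: small distance, high co-occurrence prior.
    return min(candidates,
               key=lambda wd: alpha * wd[1] - (1 - alpha) * prior(wd[0]))
```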
Probabilistic structural analysis methods for improving Space Shuttle engine reliability
NASA Technical Reports Server (NTRS)
Boyce, L.
1989-01-01
Probabilistic structural analysis methods are particularly useful in the design and analysis of critical structural components and systems that operate in very severe and uncertain environments. These methods have recently found application in space propulsion systems to improve the structural reliability of Space Shuttle Main Engine (SSME) components. A computer program, NESSUS, based on a deterministic finite-element program and a method of probabilistic analysis (fast probability integration) provides probabilistic structural analysis for selected SSME components. While computationally efficient, it considers both correlated and nonnormal random variables as well as an implicit functional relationship between independent and dependent variables. The program is used to determine the response of a nickel-based superalloy SSME turbopump blade. Results include blade tip displacement statistics due to the variability in blade thickness, modulus of elasticity, Poisson's ratio or density. Modulus of elasticity significantly contributed to blade tip variability while Poisson's ratio did not. Thus, a rational method for choosing parameters to be modeled as random is provided.
Pajak, Bozena; Fine, Alex B; Kleinschmidt, Dave F; Jaeger, T Florian
2016-12-01
We present a framework of second and additional language (L2/Ln) acquisition motivated by recent work on socio-indexical knowledge in first language (L1) processing. The distribution of linguistic categories covaries with socio-indexical variables (e.g., talker identity, gender, dialects). We summarize evidence that implicit probabilistic knowledge of this covariance is critical to L1 processing, and propose that L2/Ln learning uses the same type of socio-indexical information to probabilistically infer latent hierarchical structure over previously learned and new languages. This structure guides the acquisition of new languages based on their inferred place within that hierarchy, and is itself continuously revised based on new input from any language. This proposal unifies L1 processing and L2/Ln acquisition as probabilistic inference under uncertainty over socio-indexical structure. It also offers a new perspective on crosslinguistic influences during L2/Ln learning, accommodating gradient and continued transfer (both negative and positive) from previously learned to novel languages, and vice versa.
Pajak, Bozena; Fine, Alex B.; Kleinschmidt, Dave F.; Jaeger, T. Florian
2015-01-01
We present a framework of second and additional language (L2/Ln) acquisition motivated by recent work on socio-indexical knowledge in first language (L1) processing. The distribution of linguistic categories covaries with socio-indexical variables (e.g., talker identity, gender, dialects). We summarize evidence that implicit probabilistic knowledge of this covariance is critical to L1 processing, and propose that L2/Ln learning uses the same type of socio-indexical information to probabilistically infer latent hierarchical structure over previously learned and new languages. This structure guides the acquisition of new languages based on their inferred place within that hierarchy, and is itself continuously revised based on new input from any language. This proposal unifies L1 processing and L2/Ln acquisition as probabilistic inference under uncertainty over socio-indexical structure. It also offers a new perspective on crosslinguistic influences during L2/Ln learning, accommodating gradient and continued transfer (both negative and positive) from previously learned to novel languages, and vice versa. PMID:28348442
Hefke, Gwynneth; Davison, Sean; D'Amato, Maria Eugenia
2015-12-01
The utilization of binary markers in human individual identification is gaining ground in forensic genetics. We analyzed the polymorphisms from the first commercial indel kit Investigator DIPplex (Qiagen) in 512 individuals of Afrikaner, Indian, admixed Cape Colored, and native Bantu Xhosa and Zulu origin in South Africa and evaluated forensic and population genetics parameters for their forensic application in South Africa. The levels of genetic diversity in population and forensic parameters in South Africa are similar to other published data, with lower diversity values for the native Bantu. Departures from Hardy-Weinberg expectations were observed in HLD97 in Indians, Admixed and Bantus, along with 6.83% null homozygotes in the Bantu populations. Sequencing of the flanking regions showed a previously reported transition G>A in rs17245568. Strong population structure was detected with Fst, AMOVA, and the Bayesian unsupervised clustering method in STRUCTURE. Therefore we evaluated the efficiency of individual assignments to population groups using the ancestral membership proportions from STRUCTURE and the Bayesian classification algorithm in the Snipper App Suite. Both methods showed low cross-assignment error (0-4%) between Bantus and either Afrikaners or Indians. The differentiation between populations seems to be driven by four loci under positive selection pressure. Based on these results, we draw recommendations for the application of this kit in South Africa. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Genetic screening and testing in an episode-based payment model: preserving patient autonomy.
Sutherland, Sharon; Farrell, Ruth M; Lockwood, Charles
2014-11-01
The State of Ohio is implementing an episode-based payment model for perinatal care. All costs of care will be tabulated for each live birth and assigned to the delivering provider, creating a three-tiered model for reimbursement for care. Providers will be reimbursed as usual for care that is average in cost and quality, while instituting rewards or penalties for those outside the expected range in either domain. There are few exclusions, and all methods of genetic screening and diagnostic testing are included in the episode cost calculation as proposed. Prenatal ultrasonography, genetic screening, and diagnostic testing are critical components of the delivery of high-quality, evidence-based prenatal care. These tests provide pregnant women with key information about the pregnancy, which, in turn, allows them to work closely with their health care provider to determine optimal prenatal care. The concepts of informed consent and decision-making, cornerstones of the ethical practice of medicine, are founded on the principles of autonomy and respect for persons. These principles recognize that patients' rights to make choices and take actions are based on their personal beliefs and values. Given the personal nature of such decisions, it is critical that patients have unbarred access to prenatal genetic tests if they elect to use them as part of their prenatal care. The proposed restructuring of reimbursement creates a clear conflict between patient autonomy and physician financial incentives.
Genetic methods improve accuracy of gender determination in beaver
Williams, C.L.; Breck, S.W.; Baker, B.W.
2004-01-01
Gender identification of sexually monomorphic mammals can be difficult. We used analysis of zinc-finger protein (Zfx and Zfy) DNA regions to determine gender of 96 beavers (Castor canadensis) from 3 areas and used these results to verify gender determined in the field. Gender was correctly determined for 86 (89.6%) beavers. Incorrect assignments were not attributed to errors in any one age or sex class. Although methods that can be used in the field (such as morphological methods) can provide reasonably accurate gender assignments in beavers, the genetic method might be preferred in certain situations.
Evolutionary squeaky wheel optimization: a new framework for analysis.
Li, Jingpeng; Parkes, Andrew J; Burke, Edmund K
2011-01-01
Squeaky wheel optimization (SWO) is a relatively new metaheuristic that has been shown to be effective for many real-world problems. At each iteration, SWO performs a complete construction of a solution starting from the empty assignment. Although the construction uses information from previous iterations, the complete rebuilding does mean that SWO is generally effective at diversification but can suffer from relatively weak intensification. Evolutionary SWO (ESWO) is a recent extension to SWO that is designed to improve the intensification by keeping the good components of solutions and only using SWO to reconstruct other, poorer components of the solution. In such algorithms a standard challenge is to understand how the various parameters affect the search process. In order to support the future study of such issues, we propose a formal framework for the analysis of ESWO. The framework is based on Markov chains, and the main novelty arises because ESWO moves through the space of partial assignments. This makes it significantly different from the analyses used in local search (such as simulated annealing), which only move through complete assignments. Generally, the exact details of ESWO will depend on various heuristics; so we focus our approach on a case of ESWO that we call ESWO-II and that has probabilistic as opposed to heuristic selection and construction operators. For ESWO-II, we study a simple problem instance and explicitly compute the stationary distribution probability over the states of the search space. We find interesting properties of the distribution. In particular, we find that the probabilities of states generally, but not always, increase with their fitness. This nonmonotonicity is quite different from the monotonicity expected in algorithms such as simulated annealing.
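For intuition, a stationary distribution of the kind computed in the analysis can be obtained numerically for any small Markov chain. The sketch below uses a toy 3-state transition matrix (invented here), not an actual ESWO-II search space.

```python
import numpy as np

# Toy 3-state transition matrix (rows sum to 1); the states could be
# partial assignments visited by an ESWO-II-like search.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# Stationary distribution: the left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()
print(pi)            # long-run probability of each state
print(pi @ P - pi)   # ~0: stationarity check
```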
McIntyre, Chloe L.; Knowles, Nick J.
2013-01-01
Human rhinoviruses (HRVs) frequently cause mild upper respiratory tract infections and more severe disease manifestations such as bronchiolitis and asthma exacerbations. HRV is classified into three species within the genus Enterovirus of the family Picornaviridae. HRV species A and B contain 75 and 25 serotypes identified by cross-neutralization assays, although the use of such assays for routine HRV typing is hampered by the large number of serotypes, replacement of virus isolation by molecular methods in HRV diagnosis and the poor or absent replication of HRV species C in cell culture. To address these problems, we propose an alternative, genotypic classification of HRV based on genetic relatedness, analogous to that used for enteroviruses. Nucleotide distances between 384 complete VP1 sequences of currently assigned HRV (sero)types identified divergence thresholds of 13, 12 and 13% for species A, B and C, respectively, that divided inter- and intra-type comparisons. These were paralleled by 10, 9.5 and 10% thresholds in the larger dataset of >3800 VP4 region sequences. Assignments based on VP1 sequences led to minor revisions of existing type designations (such as the reclassification of serotype pairs, e.g. A8/A95 and A29/A44, as single serotypes) and the designation of new HRV types A101–106, B101–103 and C34–C51. A protocol for assignment and numbering of new HRV types using VP1 sequences and the restriction of VP4 sequence comparisons to type identification and provisional type assignments is proposed. Genotypic assignment and identification of HRV types will be of considerable value in the future investigation of type-associated differences in disease outcomes, transmission and epidemiology. PMID:23677786
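A toy illustration of the threshold rule: compute the pairwise nucleotide p-distance between two aligned VP1 sequences and assign them to the same type when the distance falls below the species divergence threshold. The sequences are placeholders and alignment handling is simplified.

```python
def p_distance(a, b):
    """Proportion of mismatched sites between two aligned sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def same_type(a, b, threshold=0.13):
    """Same-type call for two VP1 sequences using the species divergence
    threshold (13% for HRV-A and HRV-C, 12% for HRV-B in the study)."""
    return p_distance(a, b) < threshold

print(same_type("ACGTACGTAC", "ACGTACGTAA"))  # 10% distance -> same type
```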
Dominating Scale-Free Networks Using Generalized Probabilistic Methods
Molnár, F.; Derzsy, N.; Czabarka, É.; Székely, L.; Szymanski, B. K.; Korniss, G.
2014-01-01
We study ensemble-based graph-theoretical methods aiming to approximate the size of the minimum dominating set (MDS) in scale-free networks. We analyze both analytical upper bounds of dominating sets and numerical realizations for applications. We propose two novel probabilistic dominating set selection strategies that are applicable to heterogeneous networks. One of them obtains the smallest probabilistic dominating set and also outperforms the deterministic degree-ranked method. We show that a degree-dependent probabilistic selection method becomes optimal in its deterministic limit. In addition, we also find the precise limit where selecting high-degree nodes exclusively becomes inefficient for network domination. We validate our results on several real-world networks, and provide highly accurate analytical estimates for our methods. PMID:25200937
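A sketch of one degree-dependent probabilistic selection strategy of the general kind studied above (the paper's exact strategies differ): nodes are sampled with a probability that grows with degree, and any node left undominated is patched greedily. The `alpha` parameter and the patch rule are assumptions for illustration.

```python
import random
import networkx as nx

def probabilistic_dominating_set(G, alpha=1.0, seed=0):
    """Select nodes with probability increasing in degree, then greedily
    patch any node left undominated. alpha tunes the degree dependence."""
    rng = random.Random(seed)
    dmax = max(d for _, d in G.degree)
    D = {v for v, d in G.degree if rng.random() < (d / dmax) ** (1 / alpha)}
    for v in G.nodes:
        if v not in D and not any(u in D for u in G[v]):
            D.add(max(G[v], key=G.degree))  # patch: highest-degree neighbor
    return D

G = nx.barabasi_albert_graph(200, 2, seed=1)  # scale-free test graph
D = probabilistic_dominating_set(G)
print(len(D), "of", G.number_of_nodes())
```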
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dickson, T.L.; Simonen, F.A.
1992-05-01
Probabilistic fracture mechanics analysis is a major element of comprehensive probabilistic methodology on which current NRC regulatory requirements for pressurized water reactor vessel integrity evaluation are based. Computer codes such as OCA-P and VISA-II perform probabilistic fracture analyses to estimate the increase in vessel failure probability that occurs as the vessel material accumulates radiation damage over the operating life of the vessel. The results of such analyses, when compared with limits of acceptable failure probabilities, provide an estimation of the residual life of a vessel. Such codes can be applied to evaluate the potential benefits of plant-specific mitigating actions designed to reduce the probability of failure of a reactor vessel. 10 refs.
Lentz, Erika E.; Stippa, Sawyer R.; Thieler, E. Robert; Plant, Nathaniel G.; Gesch, Dean B.; Horton, Radley M.
2014-02-13
The U.S. Geological Survey is examining effects of future sea-level rise on the coastal landscape from Maine to Virginia by producing spatially explicit, probabilistic predictions using sea-level projections, vertical land movement rates (due to isostacy), elevation data, and land-cover data. Sea-level-rise scenarios used as model inputs are generated by using multiple sources of information, including Coupled Model Intercomparison Project Phase 5 models following representative concentration pathways 4.5 and 8.5 in the Intergovernmental Panel on Climate Change Fifth Assessment Report. A Bayesian network is used to develop a predictive coastal response model that integrates the sea-level, elevation, and land-cover data with assigned probabilities that account for interactions with coastal geomorphology as well as the corresponding ecological and societal systems it supports. The effects of sea-level rise are presented as (1) level of landscape submergence and (2) coastal response type characterized as either static (that is, inundation) or dynamic (that is, landform or landscape change). Results are produced at a spatial scale of 30 meters for four decades (the 2020s, 2030s, 2050s, and 2080s). The probabilistic predictions can be applied to landscape management decisions based on sea-level-rise effects as well as on assessments of the prediction uncertainty and need for improved data or fundamental understanding. This report describes the methods used to produce predictions, including information on input datasets; the modeling approach; model outputs; data-quality-control procedures; and information on how to access the data and metadata online.
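To make the Bayesian-network idea concrete, here is a minimal discrete network evaluated by enumeration: a sea-level scenario and elevation determine the probability of submergence, which in turn determines the probability of a dynamic coastal response. The structure and probabilities are invented placeholders, not the USGS model.

```python
# Tiny discrete Bayesian network by enumeration: P(response | slr, elev).
P_sub = {  # P(submerged | sea-level scenario, elevation) -- illustrative
    ("high", "low"): 0.9, ("high", "high"): 0.3,
    ("low",  "low"): 0.4, ("low",  "high"): 0.05,
}
P_dynamic = {True: 0.6, False: 0.1}  # P(dynamic response | submerged)

def p_dynamic_given(slr, elev):
    ps = P_sub[(slr, elev)]
    return ps * P_dynamic[True] + (1 - ps) * P_dynamic[False]

print(p_dynamic_given("high", "low"))  # 0.9*0.6 + 0.1*0.1 = 0.55
```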
Preliminary Seismic Probabilistic Tsunami Hazard Map for Italy
NASA Astrophysics Data System (ADS)
Lorito, Stefano; Selva, Jacopo; Basili, Roberto; Grezio, Anita; Molinari, Irene; Piatanesi, Alessio; Romano, Fabrizio; Tiberti, Mara Monica; Tonini, Roberto; Bonini, Lorenzo; Michelini, Alberto; Macias, Jorge; Castro, Manuel J.; González-Vida, José Manuel; de la Asunción, Marc
2015-04-01
We present a preliminary release of the first seismic probabilistic tsunami hazard map for Italy. The map aims to become an important tool for the Italian Department of Civil Protection (DPC), as well as a support tool for the NEAMTWS Tsunami Service Provider, the Centro Allerta Tsunami (CAT) at INGV, Rome. The map shows the offshore maximum tsunami elevation expected for several average return periods. Both crustal and subduction earthquakes are considered. The probability for each scenario (location, depth, mechanism, source size, magnitude and temporal rate) is defined on a uniform grid covering the entire Mediterranean for crustal earthquakes and on the plate interface for subduction earthquakes. Activity rates are assigned from seismic catalogues and based on a tectonic regionalization of the Mediterranean area. The methodology explores the associated aleatory uncertainty through the innovative application of an Event Tree. Main sources of epistemic uncertainty are also addressed, although in a preliminary way. The whole procedure relies on a database of pre-calculated Gaussian-shaped Green's functions for the sea level elevation, to be used also as a real time hazard assessment tool by CAT. Tsunami simulations are performed using the non-linear shallow water multi-GPU code HySEA, over a 30 arcsec bathymetry (from the SRTM30+ dataset) and the maximum elevations are stored at the 50-meter isobath and then extrapolated through the Green's law at 1 meter depth. This work is partially funded by project ASTARTE - Assessment, Strategy And Risk Reduction for Tsunamis in Europe - FP7-ENV2013 6.4-3, Grant 603839, and by the Italian flagship project RITMARE.
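The Green's law extrapolation mentioned at the end has a simple closed form: under linear shoaling, amplitude scales with depth to the -1/4 power. A small sketch (the 0.8 m input value is arbitrary):

```python
def greens_law(amplitude, depth_from, depth_to):
    """Shoal a tsunami amplitude between depths using Green's law:
    A2 = A1 * (d1 / d2) ** 0.25 (energy-flux conservation, linear theory)."""
    return amplitude * (depth_from / depth_to) ** 0.25

# Max elevation stored at the 50 m isobath, extrapolated to 1 m depth:
print(greens_law(0.8, 50.0, 1.0))  # ~2.13 m
```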
Unifying framework for multimodal brain MRI segmentation based on Hidden Markov Chains.
Bricq, S; Collet, Ch; Armspach, J P
2008-12-01
In the frame of 3D medical imaging, accurate segmentation of multimodal brain MR images is of interest for many brain disorders. However, due to several factors such as noise, imaging artifacts, intrinsic tissue variation and partial volume effects, tissue classification remains a challenging task. In this paper, we present a unifying framework for unsupervised segmentation of multimodal brain MR images including partial volume effect, bias field correction, and information given by a probabilistic atlas. The proposed method takes into account neighborhood information using a Hidden Markov Chain (HMC) model. Due to the limited resolution of imaging devices, voxels may be composed of a mixture of different tissue types; this partial volume effect is included to achieve an accurate segmentation of brain tissues. Instead of assigning each voxel to a single tissue class (i.e., hard classification), we compute the relative amount of each pure tissue class in each voxel (mixture estimation). Further, a bias field estimation step is added to the proposed algorithm to correct intensity inhomogeneities. Furthermore, atlas priors were incorporated using a probabilistic brain atlas containing prior expectations about the spatial localization of different tissue classes. This atlas is considered as a complementary sensor and the proposed method is extended to multimodal brain MRI without any user-tunable parameter (unsupervised algorithm). To validate this new unifying framework, we present experimental results on both synthetic and real brain images, for which the ground truth is available. Comparison with other often used techniques demonstrates the accuracy and the robustness of this new Markovian segmentation scheme.
Is urbanisation scrambling the genetic structure of human populations? A case study
Ashrafian-Bonab, Maziar; Handley, Lori Lawson; Balloux, François
2007-01-01
Recent population expansion and increased migration linked to urbanisation are assumed to be eroding the genetic structure of human populations. We investigated change in population structure over three generations by analysing both demographic and mitochondrial DNA (mtDNA) data from a random sample of 2351 men from twenty-two Iranian populations. Potential changes in genetic diversity (θ) and genetic distance (FST) over the last three generations were analysed by assigning mtDNA sequences to populations based on the individual's place of birth or that of their mother or grandmother. Despite the fact that several areas included cities of over one million inhabitants, we detected no change in genetic diversity, and only a small decrease in population structure, except in the capital city (Tehran), which was characterised by massive immigration, increased θ and a large decrease in FST over time. Our results suggest that recent erosion of human population structure might not be as important as previously thought, except in some large conurbations, and this clearly has important implications for future sampling strategies. PMID:17106453
Exploiting the functional and taxonomic structure of genomic data by probabilistic topic modeling.
Chen, Xin; Hu, Xiaohua; Lim, Tze Y; Shen, Xiajiong; Park, E K; Rosen, Gail L
2012-01-01
In this paper, we present a method that enables both the homology-based approach and the composition-based approach to further study the functional core (i.e., the microbial core and the gene core, correspondingly). In the proposed method, the identification of major functionality groups is achieved by generative topic modeling, which is able to extract useful information from unlabeled data. We first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and study the microbial core. The model considers each sample as a "document," which has a mixture of functional groups, while each functional group (also known as a "latent topic") is a weighted mixture of species. Therefore, estimating the generative topic model for taxon abundance data will uncover the distribution over latent functions (latent topics) in each sample. Second, we show that a generative topic model can also be used to study the genome-level composition of "N-mer" features (DNA subreads obtained by composition-based approaches). The model considers each genome as a mixture of latent genetic patterns (latent topics), while each functional pattern is a weighted mixture of the "N-mer" features; thus the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of the uncovered latent genetic patterns. The experimental results demonstrate the effectiveness of the proposed method.
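As a sketch of this approach, a sample-by-taxon count matrix can be fed to an off-the-shelf topic model; the fitted document-topic matrix is then the per-sample mixture over latent functional groups. The counts below are synthetic, and scikit-learn's LDA stands in for whichever generative topic model is used.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Rows: metagenomic samples ("documents"); columns: taxa ("words");
# entries: abundance counts. Values are synthetic for illustration.
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(20, 50))

lda = LatentDirichletAllocation(n_components=4, random_state=0)
theta = lda.fit_transform(X)  # per-sample mixture over latent functions
phi = lda.components_         # per-topic weights over taxa
print(theta[0].round(2))      # functional-group mixture of sample 0
```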
[Early warning on measles through the neural networks].
Yu, Bin; Ding, Chun; Wei, Shan-bo; Chen, Bang-hua; Liu, Pu-lin; Luo, Tong-yong; Wang, Jia-gang; Pan, Zhi-wei; Lu, Jun-an
2011-01-01
To discuss the effects of neural networks on early warning for measles, based on available data from monthly and weekly measles reports from January 1986 to August 2006 in Wuhan city, a model was developed using neural networks to predict and analyze the prevalence and incidence of measles. When the dynamic time-series model was established with back propagation (BP) networks consisting of two layers, if p was assigned as 9, the convergence speed was acceptable and the correlation coefficient was equal to 0.85. Monthly forecasting of the specific value was more acceptable, while weekly forecasting of the classification was better under probabilistic neural networks (PNN). When data were sufficient for the purpose, early warning using the two-layer BP networks seemed more feasible; however, when data were not sufficient, PNN could be used for prediction. This method seems feasible for use in an early warning system.
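A probabilistic neural network of the kind mentioned is essentially a Parzen-window classifier: one Gaussian kernel per training pattern, with classification by the largest class-averaged density. A minimal sketch on toy data (the measles time-series features are not reproduced here):

```python
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=1.0):
    """Probabilistic neural network: one Gaussian Parzen kernel per
    training pattern; classify by the largest class-averaged density."""
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d2 = np.sum((Xc - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2 * sigma ** 2)))
    return max(scores, key=scores.get)

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print(pnn_predict(X, y, np.array([0.9, 1.0])))  # -> 1
```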
Forecasting the impact of transport improvements on commuting and residential choice
NASA Astrophysics Data System (ADS)
Elhorst, J. Paul; Oosterhaven, Jan
2006-03-01
This paper develops a probabilistic, competing-destinations, assignment model that predicts changes in the spatial pattern of the working population as a result of transport improvements. The choice of residence is explained by a new non-parametric model, which represents an alternative to the popular multinomial logit model. Travel times between zones are approximated by a normal distribution function with a different mean and variance for each pair of zones, whereas previous models only use average travel times. The model's forecast error of the spatial distribution of the Dutch working population is 7% when tested on 1998 base-year data. To incorporate endogenous changes in its causal variables, an almost ideal demand system is estimated to explain the choice of transport mode, and a new economic geography inter-industry model (RAEM) is estimated to explain the spatial distribution of employment. In the application, the model is used to forecast the impact of six mutually exclusive Dutch core-periphery railway proposals in the projection year 2020.
NASA Technical Reports Server (NTRS)
Singhal, Surendra N.
2003-01-01
The SAE G-11 RMSL (Reliability, Maintainability, Supportability, and Logistics) Division activities include identification and fulfillment of joint industry, government, and academia needs for development and implementation of RMSL technologies. Four projects in the Probabilistic Methods area and two in the area of RMSL have been identified. These are: (1) Evaluation of Probabilistic Technology - progress has been made toward the selection of probabilistic application cases. Future effort will focus on assessment of multiple probabilistic software packages in solving selected engineering problems using probabilistic methods. Relevance to Industry & Government - Case studies of typical problems encountering uncertainties, results of solutions to these problems run by different codes, and recommendations on which code is applicable for what problems; (2) Probabilistic Input Preparation - progress has been made in identifying problem cases such as those with no data, little data, and sufficient data. Future effort will focus on developing guidelines for preparing input for probabilistic analysis, especially with no or little data. Relevance to Industry & Government - Too often, we get bogged down thinking we need a lot of data before we can quantify uncertainties. Not true. There are ways to do credible probabilistic analysis with little data; (3) Probabilistic Reliability - a probabilistic reliability literature search has been completed, along with what differentiates it from statistical reliability. Work on computation of reliability based on quantification of uncertainties in primitive variables is in progress. Relevance to Industry & Government - Correct reliability computations at both the component and system level are needed so one can design an item based on its expected usage and life span; (4) Real World Applications of Probabilistic Methods (PM) - A draft of volume 1, comprising aerospace applications, has been released. Volume 2, a compilation of real-world applications of probabilistic methods with essential information demonstrating application type and time/cost savings by the use of probabilistic methods for generic applications, is in progress. Relevance to Industry & Government - Too often, we say, 'The proof is in the pudding.' With help from many contributors, we hope to produce such a document. The problem is that not many people are coming forward due to the proprietary nature of the material, so we are asking contributors to document only minimum information, including the problem description, what method was used, whether it resulted in any savings, and how much; (5) Software Reliability - software reliability concept, program, implementation, guidelines, and standards are being documented. Relevance to Industry & Government - Software reliability is a complex issue that must be understood and addressed in all facets of business in industry, government, and other institutions. We address issues, concepts, ways to implement solutions, and guidelines for maximizing software reliability; (6) Maintainability Standards - maintainability/serviceability industry standards/guidelines and industry best practices and methodologies used in performing maintainability/serviceability tasks are being documented. Relevance to Industry & Government - Any industry or government process, project, and/or tool must be maintained and serviced to realize the life and performance it was designed for. We address issues and develop guidelines for optimum performance and life.
A probabilistic maintenance model for diesel engines
NASA Astrophysics Data System (ADS)
Pathirana, Shan; Abeygunawardane, Saranga Kumudu
2018-02-01
In this paper, a probabilistic maintenance model is developed for inspection-based preventive maintenance of diesel engines, based on the practical model concepts discussed in the literature. The developed model is solved using real data obtained from inspection and maintenance histories of diesel engines and experts' views. Reliability indices and costs were calculated for the present maintenance policy of diesel engines. A sensitivity analysis is conducted to observe the effect of inspection-based preventive maintenance on the life cycle cost of diesel engines.
Probabilistic Analysis of Radiation Doses for Shore-Based Individuals in Operation Tomodachi
2013-05-01
Martínez-Díaz, Yesenia; González-Rodríguez, Antonio; Rico-Ponce, Héctor Rómulo; Rocha-Ramírez, Víctor; Ovando-Medina, Isidro; Espinosa-García, Francisco J
2017-01-01
Jatropha curcas L. (Euphorbiaceae) is a shrub native to Mexico and Central America, which produces seeds with a high oil content that can be converted to biodiesel. The genetic diversity of this plant has been widely studied, but it is not known whether the diversity of the seed oil chemical composition correlates with neutral genetic diversity. The total seed oil content and the diversity of fatty acid and phorbol ester profiles were quantified, and the genetic diversity obtained from simple sequence repeats was analyzed in native populations of J. curcas in Mexico. Using the fatty acid profiles, a discriminant analysis recognized three groups of individuals according to geographical origin. Bayesian assignment analysis revealed two genetic groups, while the genetic structure of the populations could not be explained by isolation-by-distance. Genetic and fatty acid profile data were not correlated based on a Mantel test. Phorbol ester content and genetic diversity were also not associated. Multiple linear regression analysis showed that total oil content was associated with altitude and seasonality of temperature. The content of unsaturated fatty acids was associated with altitude. Therefore, the cultivation planning of J. curcas should take into account chemical variation related to environmental factors. © 2017 Wiley-VHCA AG, Zurich, Switzerland.
Debbi, Ali; Boureghda, Houda; Monte, Enrique; Hermosa, Rosa
2018-01-01
Fifty fungal isolates were sampled from diseased tomato plants as a result of a survey conducted in seven tomato crop areas in Algeria from 2012 to 2015. Morphological criteria and PCR-based identification, using the primers PF02 and PF03, assigned 29 out of 50 isolates to Fusarium oxysporum (Fo). The banding patterns amplified for the genes SIX1, SIX3 and SIX4 served to identify races 2 and 3 of Fo f. sp. lycopersici (FOL), and Fo f. sp. radicis lycopersici (FORL), among the Algerian isolates. All FOL isolates showed pathogenicity on the susceptible tomato cv. "Super Marmande," while nine out of 10 Algerian FORL isolates were pathogenic on tomato cv. "Rio Grande." Inter simple sequence repeat (ISSR) fingerprints showed high genetic diversity among the Algerian Fo isolates. Seventeen Algerian Trichoderma isolates were also obtained and assigned to the species T. asperellum (12 isolates), T. harzianum (four isolates) and T. ghanense (one isolate) based on ITS and tef1α gene sequences. Different in vitro tests identified the antagonistic potential of native Trichoderma isolates against FORL and FOL. Greenhouse biocontrol assays performed on "SM" tomato plants with T. ghanense T8 and T. asperellum T9 and T17, and three Fo isolates, showed that isolate T8 performed well against FORL and FOL. This finding was based on an incidence reduction of crown and root rot and Fusarium wilt diseases by 53.1 and 48.3%, respectively.
Cerda-Flores, R M; Barton, S A; Marty-Gonzalez, L F; Rivas, F; Chakraborty, R
1999-07-01
A method for estimating the general rate of nonpaternity in a population was validated using phenotype data on seven blood groups (A1A2BO, MNSs, Rh, Duffy, Lutheran, Kidd, and P) on 396 mother, child, and legal father trios from Nuevo León, Mexico. In all, 32 legal fathers were excluded as the possible father based on genetic exclusions at one or more loci (combined average exclusion probability of 0.694 for specific mother-child phenotype pairs). The maximum likelihood estimate of the general nonpaternity rate in the population was 0.118 +/- 0.020. The nonpaternity rates in Nuevo León were also seen to be inversely related with the socioeconomic status of the families, i.e., the highest in the low and the lowest in the high socioeconomic class. We further argue that with the moderately low (69.4%) power of exclusion for these seven blood group systems, the traditional critical values of paternity index (PI > or = 19) were not good indicators of true paternity, since a considerable fraction (307/364) of nonexcluded legal fathers had a paternity index below 19 based on the seven markers. Implications of these results in the context of genetic-epidemiological studies as well as for detection of true fathers for child-support adjudications are discussed, implying the need to employ a battery of genetic markers (possibly DNA-based tests) that yield a higher power of exclusion. We conclude that even though DNA markers are more informative, the probabilistic approach developed here would still be needed to estimate the true rate of nonpaternity in a population or to evaluate the precision of detecting true fathers.
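The likelihood logic here can be sketched in a few lines: if nonpaternity occurs at rate p and the marker battery excludes a false father with probability E, the expected number of observed exclusions among n trios is n·p·E, giving the estimator below. This is a simplified method-of-moments version of the paper's maximum likelihood estimate; with the published inputs it reproduces the reported figures closely.

```python
# With exclusion power E, a nonpaternity rate p yields expected
# exclusions n*p*E, so p_hat = x / (n * E).
x, n, E = 32, 396, 0.694
p_hat = x / (n * E)
se = (p_hat * (1 - p_hat * E) / (n * E)) ** 0.5  # binomial-based SE
print(round(p_hat, 3), round(se, 3))  # ~0.116 +/- 0.020, near the reported 0.118
```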
Probabilistic Tractography of the Cranial Nerves in Vestibular Schwannoma.
Zolal, Amir; Juratli, Tareq A; Podlesek, Dino; Rieger, Bernhard; Kitzler, Hagen H; Linn, Jennifer; Schackert, Gabriele; Sobottka, Stephan B
2017-11-01
Multiple recent studies have reported on diffusion tensor-based fiber tracking of cranial nerves in vestibular schwannoma, with conflicting results as to the accuracy of the method and the occurrence of cochlear nerve depiction. Probabilistic nontensor-based tractography might offer advantages in terms of better extraction of directional information from the underlying data in cranial nerves, which are of subvoxel size. Twenty-one patients with large vestibular schwannomas were recruited. The probabilistic tracking was run preoperatively and the position of the potential depictions of the facial and cochlear nerves was estimated postoperatively by 3 independent observers in a blinded fashion. The true position of the nerve was determined intraoperatively by the surgeon. Thereafter, the imaging-based estimated position was compared with the intraoperatively determined position. Tumor size, cystic appearance, and postoperative House-Brackmann score were analyzed with regard to the accuracy of the depiction of the nerves. The probabilistic tracking showed a connection that correlated to the position of the facial nerve in 81% of the cases and to the position of the cochlear nerve in 33% of the cases. Altogether, the resulting depiction did not correspond to the intraoperative position of any of the nerves in 3 cases. In a majority of cases, the position of the facial nerve, but not of the cochlear nerve, could be estimated by evaluation of the probabilistic tracking results. However, false depictions not corresponding to any nerve do occur and cannot be discerned as such from the image only. Copyright © 2017 Elsevier Inc. All rights reserved.
Biophysical connectivity explains population genetic structure in a highly dispersive marine species
NASA Astrophysics Data System (ADS)
Truelove, Nathan K.; Kough, Andrew S.; Behringer, Donald C.; Paris, Claire B.; Box, Stephen J.; Preziosi, Richard F.; Butler, Mark J.
2017-03-01
Connectivity, the exchange of individuals among locations, is a fundamental ecological process that explains how otherwise disparate populations interact. For most marine organisms, dispersal occurs primarily during a pelagic larval phase that connects populations. We paired population structure from comprehensive genetic sampling and biophysical larval transport modeling to describe how spiny lobster (Panulirus argus) population differentiation is related to biological oceanography. A total of 581 lobsters were genotyped with 11 microsatellites from ten locations around the greater Caribbean. The overall FST of 0.0016 (P = 0.005) suggested low yet significant levels of structuring among sites. An isolation by geographic distance model did not explain spatial patterns of genetic differentiation in P. argus (P = 0.19; Mantel r = 0.18), whereas a biophysical connectivity model provided a significant explanation of population differentiation (P = 0.04; Mantel r = 0.47). Thus, even for a widely dispersing species, dispersal occurs over a continuum where basin-wide larval retention creates genetic structure. Our study provides a framework for future explorations of wide-scale larval dispersal and marine connectivity by integrating empirical genetic research and probabilistic modeling.
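The Mantel tests reported above compare two distance matrices by correlating their off-diagonal entries and permuting one matrix to obtain a null distribution. A compact permutation implementation, run here on random symmetric matrices standing in for the genetic and oceanographic distance matrices:

```python
import numpy as np

def mantel(A, B, n_perm=999, seed=0):
    """Permutation-based Mantel test between two distance matrices."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(A, k=1)
    r_obs = np.corrcoef(A[iu], B[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(A.shape[0])
        r = np.corrcoef(A[np.ix_(p, p)][iu], B[iu])[0, 1]
        count += r >= r_obs
    return r_obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
D = rng.random((10, 10)); D = (D + D.T) / 2; np.fill_diagonal(D, 0)
E = D + 0.1 * rng.random((10, 10)); E = (E + E.T) / 2; np.fill_diagonal(E, 0)
print(mantel(D, E))  # (Mantel r, permutation p-value)
```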
Development of probabilistic emission inventories of air toxics for Jacksonville, Florida, USA.
Zhao, Yuchao; Frey, H Christopher
2004-11-01
Probabilistic emission inventories were developed for 1,3-butadiene, mercury (Hg), arsenic (As), benzene, formaldehyde, and lead for Jacksonville, FL. To quantify inter-unit variability in empirical emission factor data, the Maximum Likelihood Estimation (MLE) method or the Method of Matching Moments was used to fit parametric distributions. For data sets that contain nondetected measurements, a method based upon MLE was used for parameter estimation. To quantify the uncertainty in urban air toxic emission factors, parametric bootstrap simulation and empirical bootstrap simulation were applied to uncensored and censored data, respectively. The probabilistic emission inventories were developed based on the product of the uncertainties in the emission factors and in the activity factors. The uncertainties in the urban air toxics emission inventories range from as small as -25 to +30% for Hg to as large as -83 to +243% for As. The key sources of uncertainty in the emission inventory for each toxic are identified based upon sensitivity analysis. Typically, uncertainty in the inventory of a given pollutant can be attributed primarily to a small number of source categories. Priorities for improving the inventories and for refining the probabilistic analysis are discussed.
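A parametric-bootstrap sketch of how such inventory uncertainty ranges arise: sample the emission factor and activity factor from fitted distributions, propagate their product, and report percentile bounds around a central estimate. The lognormal and normal parameters below are placeholders, not the Jacksonville fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
ef = rng.lognormal(mean=np.log(2.0), sigma=0.4, size=n)  # g per unit activity
act = rng.normal(loc=1e5, scale=1e4, size=n)             # units per year
inventory = ef * act / 1e6                               # tonnes per year

central = np.median(inventory)
lo, hi = np.percentile(inventory, [2.5, 97.5])
print(f"{central:.1f} t/yr, -{(1 - lo / central) * 100:.0f}% "
      f"/ +{(hi / central - 1) * 100:.0f}%")
```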
ProbCD: enrichment analysis accounting for categorization uncertainty.
Vêncio, Ricardo Z N; Shmulevich, Ilya
2007-10-12
As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R-based software for probabilistic categorical data analysis, ProbCD, which does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
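The core idea, building the contingency table from expectations of a Bernoulli scheme rather than from hard counts, fits in a few lines. The membership and selection probabilities below are invented; ProbCD itself is R-based, so this Python fragment is only a conceptual sketch.

```python
import numpy as np

# Expected 2x2 contingency table when category membership is probabilistic:
# gene i belongs to the category with probability q[i] and is in the
# selected list with probability s[i]; independence gives each expected cell.
q = np.array([0.9, 0.7, 0.2, 0.1])  # P(gene belongs to GO category)
s = np.array([1.0, 0.8, 0.9, 0.1])  # P(gene is in the selected list)

table = np.array([
    [np.sum(q * s),       np.sum(q * (1 - s))],
    [np.sum((1 - q) * s), np.sum((1 - q) * (1 - s))],
])
print(table)  # rows: in/out of category; cols: selected/not selected
```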
Probabilistic structural analysis to quantify uncertainties associated with turbopump blades
NASA Technical Reports Server (NTRS)
Nagpal, Vinod K.; Rubinstein, Robert; Chamis, Christos C.
1987-01-01
A probabilistic study of turbopump blades has been in progress at NASA Lewis Research Center over the last two years. The objectives of this study are to evaluate the effects of uncertainties in geometry and material properties on the structural response of the turbopump blades, and to evaluate the tolerance limits on the design. A methodology based on a probabilistic approach has been developed to quantify the effects of the random uncertainties. The results of this study indicate that only the variations in geometry have significant effects.
The fourfold way of the genetic code.
Jiménez-Montaño, Miguel Angel
2009-11-01
We describe a compact representation of the genetic code that factorizes the table in quartets. It represents a "least grammar" for the genetic language. It is justified by the Klein-4 group structure of RNA bases and codon doublets. The matrix of the outer product between the column-vector of bases and the corresponding row-vector V^T = (C G U A), considered as signal vectors, has a block structure consisting of the four cosets of the K×K group of base transformations acting on doublet AA. This matrix, translated into weak/strong (W/S) and purine/pyrimidine (R/Y) nucleotide classes, leads to a code table with mixed and unmixed families in separate regions. A basic difference between them is the non-commuting (R/Y) doublets: AC/CA, GU/UG. We describe the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24, employing modulo multiplication groups. We illustrate binary sub-codes characterizing mutations in the quartets. We introduce a decision-tree to predict the mode of tRNA recognition corresponding to each codon, and compare our result with related findings by Jestin and Soulé [Jestin, J.-L., Soulé, C., 2007. Symmetries by base substitutions in the genetic code predict 2' or 3' aminoacylation of tRNAs. J. Theor. Biol. 247, 391-394], and the rearrangements of the table by Delarue [Delarue, M., 2007. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13, 161-169] and Rodin and Rodin [Rodin, S.N., Rodin, A.S., 2008. On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100, 341-355], respectively.
Zhang, Li; Luo, Jiang-Tao; Hao, Ming; Zhang, Lian-Quan; Yuan, Zhong-Wei; Yan, Ze-Hong; Liu, Ya-Xi; Zhang, Bo; Liu, Bao-Long; Liu, Chun-Ji; Zhang, Huai-Gang; Zheng, You-Liang; Liu, Deng-Cai
2012-08-13
A synthetic doubled-haploid hexaploid wheat population, SynDH1, derived from the spontaneous chromosome doubling of triploid F1 hybrid plants obtained from the cross of hybrids Triticum turgidum ssp. durum line Langdon (LDN) and ssp. turgidum line AS313, with Aegilops tauschii ssp. tauschii accession AS60, was previously constructed. SynDH1 is a tetraploidization-hexaploid doubled haploid (DH) population because it contains recombinant A and B chromosomes from two different T. turgidum genotypes, while all the D chromosomes from Ae. tauschii are homogenous across the whole population. This paper reports the construction of a genetic map using this population. Of the 606 markers used to assemble the genetic map, 588 (97%) were assigned to linkage groups. These included 513 Diversity Arrays Technology (DArT) markers, 72 simple sequence repeat (SSR), one insertion site-based polymorphism (ISBP), and two high-molecular-weight glutenin subunit (HMW-GS) markers. These markers were assigned to the 14 chromosomes, covering 2048.79 cM, with a mean distance of 3.48 cM between adjacent markers. This map showed good coverage of the A and B genome chromosomes, apart from 3A, 5A, 6A, and 4B. Compared with previously reported maps, most shared markers showed highly consistent orders. This map was successfully used to identify five quantitative trait loci (QTL), including two for spikelet number on chromosomes 7A and 5B, two for spike length on 7A and 3B, and one for 1000-grain weight on 4B. However, differences in crossability QTL between the two T. turgidum parents may explain the segregation distortion regions on chromosomes 1A, 3B, and 6B. A genetic map of T. turgidum including 588 markers was constructed using a synthetic doubled haploid (SynDH) hexaploid wheat population. Five QTLs for three agronomic traits were identified from this population. However, more markers are needed to increase the density and resolution of this map in future studies.
Wolfe, Christopher R.; Reyna, Valerie F.; Widmer, Colin L.; Cedillos, Elizabeth M.; Fisher, Christopher R.; Brust-Renck, Priscila G.; Weil, Audrey M.
2014-01-01
Background Many healthy women consider genetic testing for breast cancer risk, yet BRCA testing issues are complex. Objective Determining whether an intelligent tutor, BRCA Gist, grounded in fuzzy-trace theory (FTT), increases gist comprehension and knowledge about genetic testing for breast cancer risk, improving decision-making. Design In two experiments, 410 healthy undergraduate women were randomly assigned to one of three groups: an online module using a web-based tutoring system (BRCA Gist) that uses artificial intelligence technology, a second group that read highly similar content from the NCI web site, and a third that completed an unrelated tutorial. Intervention BRCA Gist applied fuzzy-trace theory and was designed to help participants develop gist comprehension of topics relevant to decisions about BRCA genetic testing, including how breast cancer spreads, inherited genetic mutations, and base rates. Measures We measured content knowledge, gist comprehension of decision-relevant information, interest in testing, and genetic risk and testing judgments. Results Control knowledge scores ranged from 54% to 56%, NCI improved significantly to 65% and 70%, and BRCA Gist improved significantly more, to 75% and 77%, p<.0001. BRCA Gist scored higher on gist comprehension than NCI and control, p<.0001. The control genetic risk-assessment mean was 48% correct; BRCA Gist (61%) and NCI (56%) were significantly higher, p<.0001. BRCA Gist participants recommended less testing for women without risk factors (not good candidates) (24% and 19%) than controls (50%, both experiments) and NCI (32%, Experiment 2), p<.0001. BRCA Gist testing interest was lower than controls, p<.0001. Limitations BRCA Gist has not been tested with older women from diverse groups. Conclusions Intelligent tutors, such as BRCA Gist, are scalable, cost-effective ways of helping people understand complex issues, improving decision-making. PMID:24829276
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization-based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization-based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number of classes, and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class-conditional probability density functions (likelihoods) and prior probabilities. A probabilistic neural network (PNN), a nonparametric method that can compute Bayes-optimal boundaries, and a k-nearest neighbor (KNN) classifier are used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
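One simple way to realize an MLC-SVM combination of the kind described above (a sketch, not necessarily the authors' exact formulation): fit an SVM, model each class's decision scores with a Gaussian (the MLC step), and convert new scores to posteriors via Bayes' rule with equal priors.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(Xtr, ytr)
f_tr = svm.decision_function(Xtr)

# MLC step: fit a Gaussian to each class's SVM scores, then turn a new
# score into a posterior via Bayes' rule (equal priors assumed).
g = {c: norm(f_tr[ytr == c].mean(), f_tr[ytr == c].std()) for c in (0, 1)}
f_te = svm.decision_function(Xte)
post1 = g[1].pdf(f_te) / (g[0].pdf(f_te) + g[1].pdf(f_te))
print("accuracy:", ((post1 > 0.5) == yte).mean())
```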
Just, Rebecca S; Irwin, Jodi A
2018-05-01
Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate - in the near term - probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
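Deriving an LUS-based designator is mechanical once a reference motif is chosen: count repeat units for the length-based part and take the longest uninterrupted run for the LUS part. The helper names and the example sequence below are hypothetical, and the repeat-unit count is a simple length-based stand-in.

```python
import re

def lus_length(sequence, motif):
    """Length (in repeat units) of the longest uninterrupted stretch
    of `motif` within an STR sequence string."""
    runs = re.findall(f"(?:{motif})+", sequence)
    return max((len(r) // len(motif) for r in runs), default=0)

def lus_designation(sequence, motif):
    """Allele designator: repeat-unit count plus LUS length, e.g. '12_7'."""
    total = sequence.count(motif)  # simple length-based repeat count
    return f"{total}_{lus_length(sequence, motif)}"

seq = "TCTA" * 5 + "TCTG" + "TCTA" * 7  # interrupted [TCTA] repeat
print(lus_designation(seq, "TCTA"))     # -> 12_7
```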
Scalable DB+IR Technology: Processing Probabilistic Datalog with HySpirit.
Frommholz, Ingo; Roelleke, Thomas
2016-01-01
Probabilistic Datalog (PDatalog, proposed in 1995) is a probabilistic variant of Datalog and a nice conceptual idea to model Information Retrieval in a logical, rule-based programming paradigm. Making PDatalog work in real-world applications requires more than probabilistic facts and rules, and the semantics associated with the evaluation of the programs. We report in this paper some of the key features of the HySpirit system required to scale the execution of PDatalog programs. Firstly, there is the requirement to express probability estimation in PDatalog. Secondly, fuzzy-like predicates are required to model vague predicates (e.g. vague match of attributes such as age or price). Thirdly, to handle large data sets there are scalability issues to be addressed, and therefore, HySpirit provides probabilistic relational indexes and parallel and distributed processing. The main contribution of this paper is a consolidated view on the methods of the HySpirit system to make PDatalog applicable in real-scale applications that involve a wide range of requirements typical for data (information) management and analysis.
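The flavor of evaluating a probabilistic rule can be sketched directly: probabilistic facts combine by products under an independence assumption, and alternative derivations of the same goal combine by noisy-OR. The facts and rule below are invented, and real PDatalog semantics (for dependent subgoals, for example) is more subtle than this fragment.

```python
# Minimal PDatalog-flavored evaluation: probabilistic facts, one rule,
# independence between facts, noisy-OR across alternative derivations.
term = {("doc1", "ir"): 0.8, ("doc2", "ir"): 0.5}  # 0.8 term(doc1, ir)
link = {("q", "ir"): 0.9, ("q", "db"): 0.4}        # query-term links

# retrieve(D) :- link(q, T) & term(D, T).
def retrieve(doc):
    p_not = 1.0
    for (q, t), p_link in link.items():
        p_fact = term.get((doc, t), 0.0)
        p_not *= 1.0 - p_link * p_fact  # each derivation assumed independent
    return 1.0 - p_not

print(retrieve("doc1"))  # 1 - (1 - 0.9*0.8) = 0.72
```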
Probabilistic interpretation of Peelle's pertinent puzzle and its resolution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hanson, Kenneth M.; Kawano, T.; Talou, P.
2004-01-01
Peelle's Pertinent Puzzle (PPP) states a seemingly plausible set of measurements with their covariance matrix, which produce an implausible answer. To answer the PPP question, we describe a reasonable experimental situation that is consistent with the PPP solution. The confusion surrounding the PPP arises in part because of its imprecise statement, which permits a variety of interpretations and resulting answers, some of which seem implausible. We emphasize the importance of basing the analysis on an unambiguous probabilistic model that reflects the experimental situation. We present several different models of how the measurements quoted in the PPP problem could be obtained, and interpret their solution in terms of a detailed probabilistic analysis. We suggest a probabilistic approach to handling uncertainties about which model to use.
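The puzzle itself is reproducible in a few lines with the commonly quoted numbers (two measurements of the same quantity, 1.5 and 1.0, with 10% uncorrelated errors and a fully correlated 20% normalization error): generalized least squares returns a combined value below both measurements, which is exactly the "implausible" answer.

```python
import numpy as np

m = np.array([1.5, 1.0])
stat = 0.10 * m                       # uncorrelated statistical errors
norm_err = 0.20 * m                   # fully correlated normalization error
C = np.diag(stat ** 2) + np.outer(norm_err, norm_err)

# GLS combination: x_hat = (1' C^-1 m) / (1' C^-1 1)
Cinv = np.linalg.inv(C)
one = np.ones(2)
x_hat = one @ Cinv @ m / (one @ Cinv @ one)
se = (one @ Cinv @ one) ** -0.5
print(f"{x_hat:.2f} +/- {se:.2f}")    # 0.88 +/- 0.22, below both inputs
```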
Probabilistic drug connectivity mapping
2014-01-01
Background The aim of connectivity mapping is to match drugs using drug-treatment gene expression profiles from multiple cell lines. This can be viewed as an information retrieval task, with the goal of finding the most relevant profiles for a given query drug. We infer the relevance for retrieval by data-driven probabilistic modeling of the drug responses, resulting in probabilistic connectivity mapping, and further consider the available cell lines as different data sources. We use a special type of probabilistic model to separate what is shared and specific between the sources, in contrast to earlier connectivity mapping methods that have intentionally aggregated all available data, neglecting information about the differences between the cell lines. Results We show that the probabilistic multi-source connectivity mapping method is superior to alternatives in finding functionally and chemically similar drugs from the Connectivity Map data set. We also demonstrate that an extension of the method is capable of retrieving combinations of drugs that match different relevant parts of the query drug response profile. Conclusions The probabilistic modeling-based connectivity mapping method provides a promising alternative to earlier methods. Principled integration of data from different cell lines helps to identify relevant responses for specific drug repositioning applications. PMID:24742351
NASA Astrophysics Data System (ADS)
Bin, Che; Ruoying, Yu; Dongsheng, Dang; Xiangyan, Wang
2017-05-01
Integrating Distributed Generation (DG) into the network causes harmonic pollution, which can damage electrical devices and affect the normal operation of the power system. Moreover, owing to the randomness of wind and solar irradiation, the DG output is itself random, which makes the harmonics generated by the DG uncertain. Thus, probabilistic methods are needed to analyse the impacts of DG integration. In this work we studied the probabilistic distribution of harmonic voltage and the harmonic distortion in a distribution network after integration of a distributed photovoltaic (DPV) system under different weather conditions: sunny, cloudy, rainy, and snowy days. The probability distribution function of the DPV output power in different typical weather conditions was obtained via maximum likelihood parameter estimation. A Monte Carlo simulation was used to calculate the probabilistic distribution of harmonic voltage content at different harmonic orders, as well as the total harmonic distortion (THD), in typical weather conditions. The case study was based on the IEEE 33-bus system, and the probabilistic distributions of harmonic voltage content and THD in typical weather conditions were compared.
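A Monte Carlo sketch in the spirit of the study: draw the DPV output from a weather-dependent distribution (a Beta stands in for the fitted output model), map output to harmonic voltages with assumed network response factors, and read off percentiles of THD. All numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
weather = {"sunny": (5, 2), "cloudy": (2, 3), "rainy": (1, 5)}
h_factor = {5: 0.04, 7: 0.025, 11: 0.01}  # p.u. harmonic V per p.u. output

for name, (a, b) in weather.items():
    p_out = rng.beta(a, b, size=10_000)   # p.u. DPV output samples
    v_h = np.sqrt(sum((k * p_out) ** 2 for k in h_factor.values()))
    thd = 100 * v_h                       # % of fundamental voltage
    print(f"{name}: THD 95th percentile = {np.percentile(thd, 95):.2f}%")
```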
A probabilistic and continuous model of protein conformational space for template-free modeling.
Zhao, Feng; Peng, Jian; Debartolo, Joe; Freed, Karl F; Sosnick, Tobin R; Xu, Jinbo
2010-06-01
One of the major challenges in protein template-free modeling is an efficient sampling algorithm that can explore a huge conformation space quickly. The popular fragment assembly method constructs a conformation by stringing together short fragments extracted from the Protein Data Bank (PDB). The discrete nature of this method may limit generated conformations to a subspace to which the native fold does not belong. Another concern is that a protein with a genuinely novel fold may contain fragments not represented in the PDB. This article presents a probabilistic model of protein conformational space to overcome these two limitations. The model employs directional statistics to describe the distribution of backbone angles and second-order Conditional Random Fields (CRFs) to describe the sequence-angle relationship. Using this probabilistic model, we can sample protein conformations in a continuous space, as opposed to the widely used fragment assembly and lattice model methods that work in a discrete space. We show that when coupled with a simple energy function, this probabilistic method compares favorably with the fragment assembly method in the blind CASP8 evaluation, especially on alpha or small beta proteins. To our knowledge, this is the first probabilistic method that can search conformations in a continuous space and achieve favorable performance. Our method also generated three-dimensional (3D) models better than template-based methods for a couple of CASP8 hard targets. The method described in this article can also be applied to protein loop modeling, model refinement, and even RNA tertiary structure prediction.
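The directional-statistics ingredient can be illustrated with von Mises draws over backbone torsion angles. This is a minimal sketch assuming fixed per-residue (mu, kappa) parameters; in the actual model these distributions are conditioned on sequence through the second-order CRF.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-residue von Mises parameters (mean angle mu, concentration
# kappa, in radians) that a sequence-conditioned model would emit.
residue_params = [
    {"phi": (-1.05, 8.0), "psi": (-0.79, 6.0)},   # helix-like residue
    {"phi": (-2.36, 4.0), "psi": (2.36, 4.0)},    # sheet-like residue
]

def sample_backbone(params):
    """Draw one continuous (phi, psi) conformation, one pair per residue."""
    return [(rng.vonmises(mu_p, k_p), rng.vonmises(mu_s, k_s))
            for (mu_p, k_p), (mu_s, k_s) in
            ((r["phi"], r["psi"]) for r in params)]

print(sample_backbone(residue_params))
```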
Korolev, Igor O.; Symonds, Laura L.; Bozoki, Andrea C.
2016-01-01
Background: Individuals with mild cognitive impairment (MCI) have a substantially increased risk of developing dementia due to Alzheimer's disease (AD). In this study, we developed a multivariate prognostic model for predicting MCI-to-dementia progression at the individual patient level. Methods: Using baseline data from 259 MCI patients and a probabilistic, kernel-based pattern classification approach, we trained a classifier to distinguish between patients who progressed to AD-type dementia (n = 139) and those who did not (n = 120) during a three-year follow-up period. More than 750 variables across four data sources were considered as potential predictors of progression. These data sources included risk factors, cognitive and functional assessments, structural magnetic resonance imaging (MRI) data, and plasma proteomic data. Predictive utility was assessed using a rigorous cross-validation framework. Results: Cognitive and functional markers were most predictive of progression, while plasma proteomic markers had limited predictive utility. The best performing model incorporated a combination of cognitive/functional markers and morphometric MRI measures and predicted progression with 80% accuracy (83% sensitivity, 76% specificity, AUC = 0.87). Predictors of progression included scores on the Alzheimer's Disease Assessment Scale, Rey Auditory Verbal Learning Test, and Functional Activities Questionnaire, as well as volume/cortical thickness of three brain regions (left hippocampus, middle temporal gyrus, and inferior parietal cortex). Calibration analysis revealed that the model is capable of generating probabilistic predictions that reliably reflect the actual risk of progression. Finally, we found that the predictive accuracy of the model varied with patient demographic, genetic, and clinical characteristics and could be further improved by taking into account the confidence of the predictions. Conclusions: We developed an accurate prognostic model for predicting MCI-to-dementia progression over a three-year period. The model utilizes widely available, cost-effective, non-invasive markers and can be used to improve patient selection in clinical trials and identify high-risk MCI patients for early treatment. PMID:26901338
Tracing the source of campylobacteriosis.
Wilson, Daniel J; Gabriel, Edith; Leatherbarrow, Andrew J H; Cheesbrough, John; Gee, Steven; Bolton, Eric; Fox, Andrew; Fearnhead, Paul; Hart, C Anthony; Diggle, Peter J
2008-09-26
Campylobacter jejuni is the leading cause of bacterial gastro-enteritis in the developed world. It is thought to infect 2-3 million people a year in the US alone, at a cost to the economy in excess of US $4 billion. C. jejuni is a widespread zoonotic pathogen that is carried by animals farmed for meat and poultry. A connection with contaminated food is recognized, but C. jejuni is also commonly found in wild animals and water sources. Phylogenetic studies have suggested that genotypes pathogenic to humans bear greatest resemblance to non-livestock isolates. Moreover, seasonal variation in campylobacteriosis bears the hallmarks of water-borne disease, and certain outbreaks have been attributed to contamination of drinking water. As a result, the relative importance of these reservoirs to human disease is controversial. We use multilocus sequence typing to genotype 1,231 cases of C. jejuni isolated from patients in Lancashire, England. By modeling the DNA sequence evolution and zoonotic transmission of C. jejuni between host species and the environment, we assign human cases probabilistically to source populations. Our novel population genetics approach reveals that the vast majority (97%) of sporadic disease can be attributed to animals farmed for meat and poultry. Chicken and cattle are the principal sources of C. jejuni pathogenic to humans, whereas wild animal and environmental sources are responsible for just 3% of disease. Our results imply that the primary transmission route is through the food chain, and suggest that incidence could be dramatically reduced by enhanced on-farm biosecurity or preventing food-borne transmission.
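The probabilistic assignment of a case to candidate source populations amounts to Bayes' rule over per-source likelihoods. A toy sketch with made-up numbers follows; the study's likelihoods come from its model of sequence evolution and zoonotic transmission, not from this simplified form.

```python
import numpy as np

sources = ["chicken", "cattle", "sheep", "wild/environment"]
prior = np.full(4, 0.25)                      # uniform prior over sources

# Hypothetical likelihoods P(genotype | source) for one human isolate;
# in the study these come from the population-genetic model.
likelihood = np.array([3.2e-4, 1.9e-4, 0.6e-4, 0.1e-4])

posterior = prior * likelihood                # Bayes' rule, then normalize
posterior /= posterior.sum()
for s, p in zip(sources, posterior):
    print(f"{s:18s} {p:.3f}")
```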
Khosravi, Rasoul; Rezaei, Hamid Reza; Kaboli, Mohammad
2013-01-01
The genetic threat posed by hybridization with free-ranging dogs is a major concern in wolf conservation. Identifying hybrids and the extent of hybridization is important in the conservation and management of wolf populations. Genetic variation was analyzed at 15 unlinked loci in 28 dogs, 28 wolves, four known hybrids, two black wolves, and one dog with abnormal traits in Iran. Pritchard's model, multivariate ordination by principal component analysis, and neighbor-joining clustering were used for population clustering and individual assignment. Analysis of genetic variation showed that genetic variability is high in both wolf and dog populations in Iran. Values of H(E) in dog and wolf samples ranged from 0.75-0.92 and 0.77-0.92, respectively. The results of AMOVA showed that the two groups of dog and wolf were significantly different (F(ST) = 0.05 and R(ST) = 0.36; P < 0.001). In each of the three methods, wolf and dog samples separated into two distinct clusters. The two dark wolves were assigned to the wolf cluster. The models also flagged D32 (the dog with abnormal traits) and several other samples that were assigned to more than one cluster and could therefore be hybrids. This study marks the beginning of genetic research on wolf populations in Iran, and our results reveal that, as in other countries, hybridization between wolves and dogs is sporadic in Iran and can become a threat to wolf populations if human perturbations increase.
Bourgeois, Lelania; Beaman, Lorraine
2017-08-01
A genetic stock identification (GSI) assay was developed in 2008 to distinguish Russian honey bees from other honey bee stocks that are commercially produced in the United States. Probability of assignment (POA) values have been collected and maintained since the stock release in 2008 to the Russian Honey Bee Breeders Association. These data were used to assess stability of the breeding program and the diversity levels of the contemporary breeding stock through comparison of POA values and genetic diversity parameters from the initial release to current values. POA values fluctuated throughout 2010-2016, but have recovered to statistically similar levels in 2016 (POA(2010) = 0.82, POA(2016) = 0.74; P = 0.33). Genetic diversity parameters (i.e., allelic richness and gene diversity) in 2016 also remained at similar levels when compared to those in 2010. Estimates of genetic structure revealed stability (FST(2009/2016) = 0.0058) with a small increase in the estimate of the inbreeding coefficient (FIS(2010) = 0.078, FIS(2016) = 0.149). The relationship among breeding lines, based on genetic distance measurement, was similar in 2008 and 2016 populations, but with increased homogeneity among lines (i.e., decreased genetic distance). This was expected based on the closed breeding system used for Russian honey bees. The successful application of the GSI assay in a commercial breeding program demonstrates the utility and stability of such technology to contribute to and monitor the genetic integrity of a breeding stock of an insect species. Published by Oxford University Press on behalf of Entomological Society of America 2017. This work is written by US Government employees and is in the public domain in the US.
Quantum annealing for combinatorial clustering
NASA Astrophysics Data System (ADS)
Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph
2018-02-01
Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-cluster distances between points. The straightforward approach involves examining all possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. To circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms, which are then implemented on commercially available quantum annealing hardware as well as on the purely classical solver "qbsolv." The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
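The mapping to a quadratic binary optimization problem can be sketched as follows: one-hot binary variables assign each point to a cluster, within-cluster distances enter as quadratic couplings, and a penalty term enforces one cluster per point. This is a generic construction under those assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def clustering_qubo(points, k, penalty=10.0):
    """Build a QUBO matrix for assigning N points to k clusters.

    Binary variable x[i,c] = 1 iff point i is in cluster c, flattened to
    index i * k + c. Same-cluster pairwise distances are penalized, and a
    quadratic penalty (sum_c x[i,c] - 1)^2 enforces one cluster per point.
    """
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    q = np.zeros((n * k, n * k))
    for i in range(n):
        for j in range(i + 1, n):
            for c in range(k):                 # same-cluster distance cost
                q[i * k + c, j * k + c] += d[i, j]
    for i in range(n):                         # expand the one-hot penalty
        for c in range(k):
            q[i * k + c, i * k + c] -= penalty
            for c2 in range(c + 1, k):
                q[i * k + c, i * k + c2] += 2 * penalty
    return q

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
print(clustering_qubo(points, k=2).shape)      # (8, 8), ready for an annealer
```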
Probabilistic Assessment of National Wind Tunnel
NASA Technical Reports Server (NTRS)
Shah, A. R.; Shiao, M.; Chamis, C. C.
1996-01-01
A preliminary probabilistic structural assessment of the critical section of the National Wind Tunnel (NWT) is performed using the NESSUS (Numerical Evaluation of Stochastic Structures Under Stress) computer code, demonstrating the code's capability to address reliability issues of the NWT. Uncertainties in the geometry, material properties, loads, and stiffener location of the NWT are considered in the reliability assessment. Probabilistic stress, frequency, buckling, fatigue, and proof-load analyses are performed, covering the major global and some local design requirements. Based on the assumed uncertainties, the results indicate a minimum reliability of 0.999 for the NWT. Preliminary life prediction results show that the life of the NWT is governed by fatigue of the welds. A reliability-based proof-test assessment is also performed.
Sukumaran, Jeet; Knowles, L Lacey
2018-06-01
The development of process-based probabilistic models for historical biogeography has transformed the field by grounding it in modern statistical hypothesis testing. However, most of these models abstract away biological differences, reducing species to interchangeable lineages. We present here the case for reintegration of biology into probabilistic historical biogeographical models, allowing a broader range of questions about biogeographical processes beyond ancestral range estimation or simple correlation between a trait and a distribution pattern, as well as allowing us to assess how inferences about ancestral ranges themselves might be impacted by differential biological traits. We show how new approaches to inference might cope with the computational challenges resulting from the increased complexity of these trait-based historical biogeographical models. Copyright © 2018 Elsevier Ltd. All rights reserved.
UQTools: The Uncertainty Quantification Toolbox - Introduction and Tutorial
NASA Technical Reports Server (NTRS)
Kenny, Sean P.; Crespo, Luis G.; Giesy, Daniel P.
2012-01-01
UQTools is the short name for the Uncertainty Quantification Toolbox, a software package designed to efficiently quantify the impact of parametric uncertainty on engineering systems. UQTools is a MATLAB-based software package and was designed to be discipline independent, employing very generic representations of the system models and uncertainty. Specifically, UQTools accepts linear and nonlinear system models and permits arbitrary functional dependencies between the system's measures of interest and the probabilistic or non-probabilistic parametric uncertainty. One of the most significant features incorporated into UQTools is the theoretical development centered on homothetic deformations and their application to set bounding and approximating failure probabilities. Beyond the set bounding technique, UQTools provides a wide range of probabilistic and uncertainty-based tools to solve key problems in science and engineering.
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
Distinct Roles of Dopamine and Subthalamic Nucleus in Learning and Probabilistic Decision Making
ERIC Educational Resources Information Center
Coulthard, Elizabeth J.; Bogacz, Rafal; Javed, Shazia; Mooney, Lucy K.; Murphy, Gillian; Keeley, Sophie; Whone, Alan L.
2012-01-01
Even simple behaviour requires us to make decisions based on combining multiple pieces of learned and new information. Making such decisions requires both learning the optimal response to each given stimulus as well as combining probabilistic information from multiple stimuli before selecting a response. Computational theories of decision making…
Tabachnick, W J; Wallis, G P; Aitken, T H; Miller, B R; Amato, G D; Lorenz, L; Powell, J R; Beaty, B J
1985-11-01
Twenty-eight populations representing a worldwide distribution of Aedes aegypti were tested for their ability to become orally infected with yellow fever virus (YFV). Populations had been analyzed for genetic variations at 11 isozyme loci and assigned to one of 8 genetic geographic groups of Ae. aegypti. Infection rates suggest that populations showing isozyme genetic relatedness also demonstrate similarity to oral infection rates with YFV. The findings support the hypothesis that genetic variation exists for oral susceptibility to YFV in Ae. aegypti.
NASA Technical Reports Server (NTRS)
Sobel, Larry; Buttitta, Claudio; Suarez, James
1993-01-01
Probabilistic predictions based on the Integrated Probabilistic Assessment of Composite Structures (IPACS) code are presented for the material and structural response of unnotched and notched IM6/3501-6 Gr/Ep laminates. Comparisons of predicted and measured modulus and strength distributions are given for unnotched unidirectional, cross-ply, and quasi-isotropic laminates. The predicted modulus distributions were found to correlate well with the test results for all three unnotched laminates. Correlations of strength distributions for the unnotched laminates are judged good for the unidirectional laminate and fair for the cross-ply laminate, whereas the strength correlation for the quasi-isotropic laminate is deficient because IPACS did not yet have a progressive failure capability. The paper also presents probabilistic and structural reliability analysis predictions for the strain concentration factor (SCF) for an open-hole, quasi-isotropic laminate subjected to longitudinal tension. A special procedure was developed to adapt IPACS for the structural reliability analysis. The reliability results show the importance of identifying the most significant random variables upon which the SCF depends, and of having accurate scatter values for these variables.
Kindermans, Pieter-Jan; Verschore, Hannes; Schrauwen, Benjamin
2013-10-01
In recent years, in an attempt to maximize performance, machine learning approaches for event-related potential (ERP) spelling have become more and more complex. In this paper, we have taken a step back as we wanted to improve the performance without building an overly complex model, that cannot be used by the community. Our research resulted in a unified probabilistic model for ERP spelling, which is based on only three assumptions and incorporates language information. On top of that, the probabilistic nature of our classifier yields a natural dynamic stopping strategy. Furthermore, our method uses the same parameters across 25 subjects from three different datasets. We show that our classifier, when enhanced with language models and dynamic stopping, improves the spelling speed and accuracy drastically. Additionally, we would like to point out that as our model is entirely probabilistic, it can easily be used as the foundation for complex systems in future work. All our experiments are executed on publicly available datasets to allow for future comparison with similar techniques.
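The dynamic stopping idea follows directly from the probabilistic formulation: per-letter posteriors are accumulated over stimulation iterations until one exceeds a confidence threshold. A hedged toy sketch follows, with a language-model prior and simulated per-iteration scores standing in for the actual ERP classifier outputs.

```python
import numpy as np

def spell_letter(flash_scores, prior, threshold=0.95, max_iters=10):
    """Accumulate evidence over stimulation iterations until confident.

    flash_scores[t, a] is a hypothetical log-likelihood that letter a is
    the target on iteration t; prior would come from a language model.
    """
    log_post = np.log(prior)
    for t in range(max_iters):
        log_post = log_post + flash_scores[t]
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        if post.max() >= threshold:            # dynamic stopping rule
            return post.argmax(), t + 1, post.max()
    return post.argmax(), max_iters, post.max()

rng = np.random.default_rng(2)
prior = np.full(26, 1 / 26)                    # replace with language model
scores = rng.normal(0, 1, (10, 26))
scores[:, 7] += 1.5                            # simulated target letter 'H'
print(spell_letter(scores, prior))
```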
DOE Office of Scientific and Technical Information (OSTI.GOV)
Staschus, K.
1985-01-01
In this dissertation, efficient algorithms for electric-utility capacity expansion planning with renewable energy are developed. The algorithms include a deterministic phase that quickly finds a near-optimal expansion plan using derating and a linearized approximation to the time-dependent availability of nondispatchable energy sources. A probabilistic second phase needs comparatively few computer-time consuming probabilistic simulation iterations to modify this solution towards the optimal expansion plan. For the deterministic first phase, two algorithms, based on a Lagrangian Dual decomposition and a Generalized Benders Decomposition, are developed. The probabilistic second phase uses a Generalized Benders Decomposition approach. Extensive computational tests of the algorithms are reported. Among the deterministic algorithms, the one based on Lagrangian Duality proves fastest. The two-phase approach is shown to save up to 80% in computing time as compared to a purely probabilistic algorithm. The algorithms are applied to determine the optimal expansion plan for the Tijuana-Mexicali subsystem of the Mexican electric utility system. A strong recommendation to push conservation programs in the desert city of Mexicali results from this implementation.
NASA Astrophysics Data System (ADS)
Wellons, Sarah; Torrey, Paul
2017-06-01
Galaxy populations at different cosmic epochs are often linked by cumulative comoving number density in observational studies. Many theoretical works, however, have shown that the cumulative number densities of tracked galaxy populations not only evolve in bulk, but also spread out over time. We present a method for linking progenitor and descendant galaxy populations which takes both of these effects into account. We define probability distribution functions that capture the evolution and dispersion of galaxy populations in number density space, and use these functions to assign galaxies at redshift z_f probabilities of being progenitors/descendants of a galaxy population at another redshift z_0. These probabilities are used as weights for calculating distributions of physical progenitor/descendant properties such as stellar mass, star formation rate or velocity dispersion. We demonstrate that this probabilistic method provides more accurate predictions for the evolution of physical properties than the assumption of either a constant number density or an evolving number density in a bin of fixed width, by comparing predictions against galaxy populations directly tracked through a cosmological simulation. We find that the constant number density method performs least well at recovering galaxy properties, the evolving number density method slightly better, and the probabilistic method best of all. The improvement is present for predictions of stellar mass as well as inferred quantities such as star formation rate and velocity dispersion. We demonstrate that this method can also be applied robustly and easily to observational data, and provide a code package for doing so.
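A minimal sketch of the weighting step, assuming a Gaussian PDF in log number density with an illustrative drift and scatter in place of the paper's calibrated distribution functions:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical galaxies at z_f: cumulative comoving number densities (log10)
# and stellar masses (log10 Msun).
log_n = np.array([-3.2, -3.0, -2.8, -2.5, -2.3])
log_mstar = np.array([11.2, 11.0, 10.8, 10.5, 10.3])

# Placeholder for the calibrated PDF: probability that a population selected
# at log_n0 at z_0 evolves to each z_f galaxy, with median drift and scatter.
log_n0, drift, scatter = -3.0, 0.15, 0.25
weights = norm.pdf(log_n, loc=log_n0 + drift, scale=scatter)
weights /= weights.sum()

# Weighted descendant stellar-mass estimate instead of a fixed-bin average.
print("P(descendant):", np.round(weights, 3))
print("weighted <log M*>:", np.round(weights @ log_mstar, 3))
```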
Bao, Wenquan; Wuyun, Tana; Li, Tiezhu; Liu, Huimin; Jiang, Zhongmao; Zhu, Xuchun; Du, Hongyan; Bai, Yu-E
2017-01-01
Prunus mira Koehne, an important economic fruit crop with high breeding and medicinal values, and an ancestral species of many cultivated peach species, has recently been declared an endangered species. However, basic information about genetic diversity, population structure, and morphological variation is still limited for this species. In this study, we sampled 420 P. mira individuals from 21 wild populations in the Tibet plateau to conduct a comprehensive analysis of genetic and morphological characteristics. The results of molecular analyses based on simple sequence repeat (SSR) markers indicated moderate genetic diversity and inbreeding (A = 3.8, Ae = 2.5, He = 0.52, Ho = 0.44, I = 0.95, FIS = 0.17) within P. mira populations. STRUCTURE, GENELAND, and phylogenetic analyses assigned the 21 populations to three genetic clusters that were moderately correlated with geographic altitudes, and this may have resulted from significantly different climatic and environmental factors at different altitudinal ranges. Significant isolation-by-distance was detected across the entire distribution of P. mira populations, but geographic altitude might have more significant effects on genetic structure than geographic distance in partial small-scale areas. Furthermore, clear genetic structure, high genetic differentiation, and restricted gene flow were detected between pairwise populations from different geographic groups, indicating that geographic barriers and genetic drift have significant effects on P. mira populations. Analyses of molecular variance based on the SSR markers indicated high variation (83.7% and 81.7%), whereas morphological analyses revealed low variation (1.30%-36.17%) within the populations. Large and heavy fruits were better adapted than light fruits and nutlets to poor climate and environmental conditions at high altitudes. Based on the results of molecular and morphological analyses, we classified the area into three conservation units and proposed several conservation strategies for wild P. mira populations in the Tibet plateau.
Zulkifley, Mohd Asyraf; Rawlinson, David; Moran, Bill
2012-01-01
In video analytics, robust observation detection is very important as the content of the videos varies greatly, especially for tracking implementations. In contrast to the image processing field, video analytics routinely encounters the problems of blurring, moderate deformation, low-illumination surroundings, illumination change and homogeneous texture. Patch-Based Observation Detection (PBOD) is developed to improve detection robustness in complex scenes by fusing both feature- and template-based recognition methods. While feature-based detectors are more distinctive, matching between frames is best achieved by a collection of points, as in template-based detectors. Two modes of PBOD, deterministic and probabilistic, were tested to find the best mode of detection. Both algorithms start by building comparison vectors at each detected point of interest. The vectors are matched to build candidate patches based on their respective coordinates. In the deterministic method, patch matching is done in a 2-level test where threshold-based position and size smoothing are applied to the patch with the highest correlation value. In the probabilistic approach, patch matching is done by modelling the histograms of the patches with Poisson distributions for both RGB and HSV colour models. Maximum likelihood is then applied for position smoothing while a Bayesian approach is applied for size smoothing. The results showed that probabilistic PBOD outperforms the deterministic approach, with an average distance error of 10.03% compared with 21.03%. Due to its heavy processing requirements, this algorithm is best implemented as a complement to other, simpler detection methods. PMID:23202226
Uncertainty estimation of long-range ensemble forecasts of snowmelt flood characteristics
NASA Astrophysics Data System (ADS)
Kuchment, L.
2012-04-01
Long-range forecasts of snowmelt flood characteristics with a lead time of 2-3 months are very important for regulating flood runoff and mitigating flood damage on almost all large Russian rivers. At the same time, current forecasting techniques based on regression relationships between runoff volume and indexes of river basin conditions can produce serious forecast errors, resulting in large economic losses from incorrect flood regulation. Forecast errors can be caused by complicated processes of soil freezing and soil moisture redistribution, a high rate of snowmelt, large liquid precipitation before snowmelt, or meteorological conditions during the lead-time period that differ greatly from climatological ones. Analysis of economic losses has shown that the largest damages could, to a significant extent, be avoided if decision makers had an opportunity to take predictive uncertainty into account and could use more cautious strategies in runoff regulation. The development of a methodology for long-range ensemble forecasting of spring/summer floods based on distributed physically-based runoff generation models has created, in principle, a new basis for improving hydrological predictions as well as for estimating their uncertainty. This approach is illustrated by forecasting of the spring-summer floods in the Vyatka River and Seim River basins. The application of physically-based models of snowmelt runoff generation gives an essential improvement in the statistical estimates of the deterministic forecasts of flood volume in comparison with forecasts obtained from the regression relationships. These models were also used for probabilistic forecasts, assigning meteorological inputs during the lead-time periods from the available historical daily series and from series simulated using a weather generator and the Monte Carlo procedure. The weather generator consists of stochastic models of daily temperature and precipitation. The performance of the probabilistic forecasts was estimated with ranked probability skill scores. Monte Carlo simulation using the weather generator gave better results than using the historical meteorological series.
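The ranked probability skill score used to evaluate the probabilistic forecasts can be computed as below; the category probabilities and observations are illustrative.

```python
import numpy as np

def rps(forecast_probs, obs_category):
    """Ranked probability score for one forecast over ordered categories."""
    cum_f = np.cumsum(forecast_probs)
    cum_o = np.cumsum(np.eye(len(forecast_probs))[obs_category])
    return np.sum((cum_f - cum_o) ** 2)

def rpss(forecasts, observations, climatology):
    """Skill relative to climatological probabilities (1 = perfect, 0 = none)."""
    rps_f = np.mean([rps(f, o) for f, o in zip(forecasts, observations)])
    rps_c = np.mean([rps(climatology, o) for o in observations])
    return 1.0 - rps_f / rps_c

# Three flood-volume categories (below/near/above normal), illustrative values.
forecasts = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]])
observations = [0, 1, 2]
print(round(rpss(forecasts, observations, np.array([1/3, 1/3, 1/3])), 3))
```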
The ethics of disclosing genetic diagnosis for Alzheimer's disease: do we need a new paradigm?
Arribas-Ayllon, Michael
2011-01-01
Genetic testing for rare Mendelian disorders represents the dominant ethical paradigm in clinical and professional practice. Predictive testing for Huntington's disease is the model against which other kinds of genetic testing are evaluated, including testing for Alzheimer's disease. This paper retraces the historical development of ethical reasoning in relation to predictive genetic testing and reviews a range of ethical, sociological and psychological literature from the 1970s to the present. In the past, ethical reasoning has embodied a distinct style whereby normative principles are developed from a dominant disease exemplar. This reductionist approach to formulating ethical frameworks breaks down in the case of disease susceptibility. Recent developments in the genetics of Alzheimer's disease present a significant case for reconsidering the ethics of disclosing risk for common complex diseases. Disclosing the results of susceptibility testing for Alzheimer's disease has different social, psychological and behavioural consequences. Furthermore, what genetic susceptibility means to individuals and their families is diffuse and often mitigated by other factors and concerns. The ethics of disclosing a genetic diagnosis of susceptibility is contingent on whether professionals accept that probabilistic risk information is in fact 'diagnostic' and it will rely substantially on empirical evidence of how people actually perceive, recall and communicate complex risk information.
Probabilistic methods for rotordynamics analysis
NASA Technical Reports Server (NTRS)
Wu, Y.-T.; Torng, T. Y.; Millwater, H. R.; Fossum, A. F.; Rheinfurth, M. H.
1991-01-01
This paper summarizes the development of the methods and a computer program to compute the probability of instability of dynamic systems that can be represented by a system of second-order ordinary linear differential equations. Two instability criteria based upon the eigenvalues or Routh-Hurwitz test functions are investigated. Computational methods based on a fast probability integration concept and an efficient adaptive importance sampling method are proposed to perform efficient probabilistic analysis. A numerical example is provided to demonstrate the methods.
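As a point of contrast with the fast probability integration methods developed in the paper, a brute-force Monte Carlo estimate of the probability of instability using the eigenvalue criterion might look like the sketch below; the 2-DOF rotor model and the distribution of the cross-coupled stiffness are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def unstable(mass, damping, stiffness):
    """Eigenvalue criterion for M q'' + C q' + K q = 0 in state-space form."""
    m_inv = np.linalg.inv(mass)
    a = np.block([[np.zeros((2, 2)), np.eye(2)],
                  [-m_inv @ stiffness, -m_inv @ damping]])
    return np.max(np.linalg.eigvals(a).real) > 0.0

# Hypothetical 2-DOF rotor model: cross-coupled stiffness q is the random input.
trials, hits = 20_000, 0
for _ in range(trials):
    q = rng.normal(2.0, 0.6)                   # uncertain cross-coupling
    k = np.array([[10.0, q], [-q, 10.0]])
    hits += unstable(np.eye(2), 0.4 * np.eye(2), k)
print("P(instability) ~", hits / trials)
```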
Bayesian Networks Improve Causal Environmental Assessments for Evidence-Based Policy.
Carriger, John F; Barron, Mace G; Newman, Michael C
2016-12-20
Rule-based weight of evidence approaches to ecological risk assessment may not account for uncertainties and generally lack probabilistic integration of lines of evidence. Bayesian networks allow causal inferences to be made from evidence by including causal knowledge about the problem, using this knowledge with probabilistic calculus to combine multiple lines of evidence, and minimizing biases in predicting or diagnosing causal relationships. Too often, sources of uncertainty in conventional weight of evidence approaches are ignored that can be accounted for with Bayesian networks. Specifying and propagating uncertainties improve the ability of models to incorporate strength of the evidence in the risk management phase of an assessment. Probabilistic inference from a Bayesian network allows evaluation of changes in uncertainty for variables from the evidence. The network structure and probabilistic framework of a Bayesian approach provide advantages over qualitative approaches in weight of evidence for capturing the impacts of multiple sources of quantifiable uncertainty on predictions of ecological risk. Bayesian networks can facilitate the development of evidence-based policy under conditions of uncertainty by incorporating analytical inaccuracies or the implications of imperfect information, structuring and communicating causal issues through qualitative directed graph formulations, and quantitatively comparing the causal power of multiple stressors on valued ecological resources. These aspects are demonstrated through hypothetical problem scenarios that explore some major benefits of using Bayesian networks for reasoning and making inferences in evidence-based policy.
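The probabilistic calculus for combining lines of evidence can be made concrete with a two-node toy network; the conditional probability tables below are illustrative, not drawn from any real assessment.

```python
# Toy network: Impairment -> LabEvidence, Impairment -> FieldEvidence.
p_impaired = 0.3
p_lab = {True: 0.8, False: 0.15}     # P(lab evidence positive | impairment)
p_field = {True: 0.7, False: 0.25}   # P(field evidence positive | impairment)

def posterior_impairment(lab_pos, field_pos):
    """P(impairment | two lines of evidence) by enumeration over the parent."""
    num = den = 0.0
    for imp in (True, False):
        prior = p_impaired if imp else 1 - p_impaired
        lab = p_lab[imp] if lab_pos else 1 - p_lab[imp]
        field = p_field[imp] if field_pos else 1 - p_field[imp]
        joint = prior * lab * field
        den += joint
        if imp:
            num = joint
    return num / den

print(round(posterior_impairment(True, True), 3))    # both lines positive
print(round(posterior_impairment(True, False), 3))   # conflicting evidence
```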
Zhao, Zhenguo; Shi, Wenbo
2014-01-01
Probabilistic signature scheme has been widely used in modern electronic commerce since it could provide integrity, authenticity, and nonrepudiation. Recently, Wu and Lin proposed a novel probabilistic signature (PS) scheme using the bilinear square Diffie-Hellman (BSDH) problem. They also extended it to a universal designated verifier signature (UDVS) scheme. In this paper, we analyze the security of Wu et al.'s PS scheme and UDVS scheme. Through concrete attacks, we demonstrate both of their schemes are not unforgeable. The security analysis shows that their schemes are not suitable for practical applications.
An approximate methods approach to probabilistic structural analysis
NASA Technical Reports Server (NTRS)
Mcclung, R. C.; Millwater, H. R.; Wu, Y.-T.; Thacker, B. H.; Burnside, O. H.
1989-01-01
A probabilistic structural analysis method (PSAM) is described which makes an approximate calculation of the structural response of a system, including the associated probabilistic distributions, with minimal computation time and cost, based on a simplified representation of the geometry, loads, and material. The method employs the fast probability integration (FPI) algorithm of Wu and Wirsching. Typical solution strategies are illustrated by formulations for a representative critical component chosen from the Space Shuttle Main Engine (SSME) as part of a major NASA-sponsored program on PSAM. Typical results are presented to demonstrate the role of the methodology in engineering design and analysis.
Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex
Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel
2016-01-01
The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DNA-DNA hybridization (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a small number of strains was enough to explain most of the CDS diversity in the core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genomes and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR. PMID:26915094
NASA Technical Reports Server (NTRS)
Crespo, Luis G.; Bushnell, Dennis M. (Technical Monitor)
2002-01-01
This paper presents a study on the optimization of systems with structured uncertainties, whose inputs and outputs can be exhaustively described in the probabilistic sense. By propagating the uncertainty from the input to the output in the space of the probability density functions and the moments, optimization problems that pursue performance-, robustness- and reliability-based designs are studied. By specifying the desired outputs in terms of desired probability density functions and then in terms of meaningful probabilistic indices, we establish a computationally viable framework for solving practical optimization problems. Applications to static optimization and stability control are used to illustrate the relevance of incorporating uncertainty in the early stages of design. Several examples that admit a full probabilistic description of the output in terms of the design variables and the uncertain inputs are used to elucidate the main features of the generic problem and its solution. Extensions to problems that do not admit closed-form solutions are also evaluated. Concrete evidence of the importance of using a consistent probabilistic formulation of the optimization problem and a meaningful probabilistic description of its solution is provided in the examples. In the stability control problem, the analysis shows that standard deterministic approaches lead to designs with a high probability of running into instability. The implementation of such designs can indeed have catastrophic consequences.
NASA Astrophysics Data System (ADS)
Khawaldeh, Salem A. Al
2013-07-01
Background and purpose: The purpose of this study was to investigate the comparative effects of a prediction/discussion-based learning cycle (HPD-LC), conceptual change text (CCT) and traditional instruction on 10th grade students' understanding of genetics concepts. Sample: Participants were 112 10th basic grade male students in three classes of the same school located in an urban area. The three classes, taught by the same biology teacher, were randomly assigned as a prediction/discussion-based learning cycle class (n = 39), a conceptual change text class (n = 37) and a traditional class (n = 36). Design and method: A quasi-experimental pre-test/post-test non-equivalent control group design was adopted. Participants completed the Genetics Concept Test as pre-test and post-test, to examine the effects of the instructional strategies on their genetics understanding. Pre-test scores and Test of Logical Thinking scores were used as covariates. Results: The analysis of covariance showed a statistically significant difference between the experimental and control groups after treatment, in favor of the experimental groups. However, no statistically significant difference between the experimental groups (HPD-LC versus CCT instruction) was found. Conclusions: Overall, the findings of this study support the use of the prediction/discussion-based learning cycle and conceptual change texts in both research and teaching. The findings may be useful for improving classroom practices in teaching science concepts and for the development of suitable materials promoting students' understanding of science.
Impact of refining the assessment of dietary exposure to cadmium in the European adult population.
Ferrari, Pietro; Arcella, Davide; Heraud, Fanny; Cappé, Stefano; Fabiansson, Stefan
2013-01-01
Exposure assessment constitutes an important step in any risk assessment of potentially harmful substances present in food. The European Food Safety Authority (EFSA) first assessed dietary exposure to cadmium in Europe using a deterministic framework, resulting in mean values of exposure in the range of health-based guidance values. Since then, the characterisation of foods has been refined to better match occurrence and consumption data, and a new strategy to handle left-censoring in occurrence data was devised. A probabilistic assessment was performed and compared with deterministic estimates, using occurrence values at the European level and consumption data from 14 national dietary surveys. Mean estimates in the probabilistic assessment ranged from 1.38 (95% CI = 1.35-1.44) to 2.08 (1.99-2.23) µg kg⁻¹ bodyweight (bw) week⁻¹ across the different surveys, which were less than 10% lower than deterministic (middle bound) mean values that ranged from 1.50 to 2.20 µg kg⁻¹ bw week⁻¹. Probabilistic 95th percentile estimates of dietary exposure ranged from 2.65 (2.57-2.72) to 4.99 (4.62-5.38) µg kg⁻¹ bw week⁻¹, which were, with the exception of one survey, between 3% and 17% higher than middle-bound deterministic estimates. Overall, the proportion of subjects exceeding the tolerable weekly intake of 2.5 µg kg⁻¹ bw ranged from 14.8% (13.6-16.0%) to 31.2% (29.7-32.5%) according to the probabilistic assessment. The results of this work indicate that mean values of dietary exposure to cadmium in the European population are of similar magnitude under deterministic and probabilistic assessments. For higher exposure levels, probabilistic estimates were almost consistently larger than their deterministic counterparts, reflecting the impact of using the full distribution of occurrence values to determine exposure levels. It is considered prudent to use probabilistic methodology should exposure estimates be close to or exceed health-based guidance values.
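A probabilistic exposure assessment of this kind can be sketched as a Monte Carlo over consumption, occurrence and bodyweight distributions. The single aggregated food group and all distribution parameters below are illustrative, whereas the EFSA assessment combines many food groups with survey-specific consumption data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, twi = 100_000, 2.5                     # subjects; tolerable weekly intake

# Illustrative lognormal consumption (kg food/week) and cadmium occurrence
# (ug/kg food) for one aggregated food group, plus a bodyweight distribution.
consumption = rng.lognormal(mean=0.5, sigma=0.4, size=n)
occurrence = rng.lognormal(mean=-0.7, sigma=0.6, size=n)
bodyweight = rng.normal(70.0, 12.0, size=n).clip(40.0)

exposure = consumption * occurrence / bodyweight   # ug/kg bw per week
print("mean   :", round(float(exposure.mean()), 3))
print("p95    :", round(float(np.percentile(exposure, 95)), 3))
print("P > TWI:", round(float((exposure > twi).mean()), 3))
```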
Development of optimization-based probabilistic earthquake scenarios for the city of Tehran
NASA Astrophysics Data System (ADS)
Zolfaghari, M. R.; Peyghaleh, E.
2016-01-01
This paper presents the methodology and a practical example for the application of an optimization process to select earthquake scenarios which best represent probabilistic earthquake hazard in a given region. The method is based on simulation of a large dataset of potential earthquakes, representing the long-term seismotectonic characteristics of the region. The simulation process uses Monte-Carlo simulation and regional seismogenic source parameters to generate a synthetic earthquake catalogue consisting of a large number of earthquakes, each characterized by magnitude, location, focal depth and fault characteristics. Such a catalogue provides full distributions of events in time, space and size; however, it demands large computation power when used for risk assessment, particularly when other sources of uncertainty are involved in the process. To reduce the number of selected earthquake scenarios, a mixed-integer linear program formulation is developed in this study. This approach yields a reduced set of optimization-based probabilistic earthquake scenarios, while maintaining the shape of the hazard curves and the full probabilistic picture, by minimizing the error between hazard curves driven by the full and reduced sets of synthetic earthquake scenarios. To test the model, the regional seismotectonic and seismogenic characteristics of northern Iran are used to simulate 10,000 years' worth of events, consisting of some 84,000 earthquakes. The optimization model is then run multiple times with various input data, taking probabilistic seismic hazard for the city of Tehran as the main constraint. The sensitivity of the selected scenarios to the user-specified site/return-period error weight is also assessed. The methodology could substantially reduce run times for full probabilistic earthquake studies such as seismic hazard and risk assessment. The reduced set is representative of the contributions of all possible earthquakes, yet requires far less computation power. The authors have used this approach for risk assessment towards identifying the effectiveness-profitability of risk mitigation measures, using an optimization model for resource allocation. Based on the error-computation trade-off, 62 earthquake scenarios were chosen for this purpose.
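A minimal sketch of such a mixed-integer formulation, assuming the pulp package is available: binary variables select scenarios, continuous weights scale their hazard contributions, and auxiliary variables linearize the absolute error against the target hazard curve. All data here are random placeholders, not the paper's exact model.

```python
import numpy as np
import pulp

rng = np.random.default_rng(5)
n_scen, n_pts, max_kept = 30, 8, 5          # scenarios, hazard-curve points
h = rng.random((n_scen, n_pts))             # scenario hazard contributions
target = h.sum(axis=0) / 5.0                # target hazard curve (placeholder)

prob = pulp.LpProblem("scenario_reduction", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n_scen)]
w = [pulp.LpVariable(f"w{i}", lowBound=0) for i in range(n_scen)]
e = [pulp.LpVariable(f"e{j}", lowBound=0) for j in range(n_pts)]

prob += pulp.lpSum(e)                       # total absolute curve error
for j in range(n_pts):
    fit = pulp.lpSum(w[i] * float(h[i, j]) for i in range(n_scen))
    prob += fit - float(target[j]) <= e[j]  # linearized |fit - target|
    prob += float(target[j]) - fit <= e[j]
for i in range(n_scen):
    prob += w[i] <= 10.0 * x[i]             # weight only if scenario selected
prob += pulp.lpSum(x) <= max_kept

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("kept:", [i for i in range(n_scen) if x[i].value() > 0.5])
```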
Fan, Ming; Thongsri, Tepwitoon; Axe, Lisa; Tyson, Trevor A
2005-06-01
A probabilistic approach was applied in an ecological risk assessment (ERA) to characterize risk and address uncertainty employing Monte Carlo simulations for assessing parameter and risk probabilistic distributions. This simulation tool (ERA) includes a Window's based interface, an interactive and modifiable database management system (DBMS) that addresses a food web at trophic levels, and a comprehensive evaluation of exposure pathways. To illustrate this model, ecological risks from depleted uranium (DU) exposure at the US Army Yuma Proving Ground (YPG) and Aberdeen Proving Ground (APG) were assessed and characterized. Probabilistic distributions showed that at YPG, a reduction in plant root weight is considered likely to occur (98% likelihood) from exposure to DU; for most terrestrial animals, likelihood for adverse reproduction effects ranges from 0.1% to 44%. However, for the lesser long-nosed bat, the effects are expected to occur (>99% likelihood) through the reduction in size and weight of offspring. Based on available DU data for the firing range at APG, DU uptake will not likely affect survival of aquatic plants and animals (<0.1% likelihood). Based on field and laboratory studies conducted at APG and YPG on pocket mice, kangaroo rat, white-throated woodrat, deer, and milfoil, body burden concentrations observed fall into the distributions simulated at both sites.
Zhang, Kejiang; Achari, Gopal; Pei, Yuansheng
2010-10-01
Different types of uncertain information (linguistic, probabilistic, and possibilistic) exist in site characterization. Their representation and propagation significantly influence the management of contaminated sites. In the absence of a framework with which to properly represent and integrate these quantitative and qualitative inputs, decision makers cannot take full advantage of the available and necessary information to identify all the plausible alternatives. A systematic methodology was developed in the present work to incorporate linguistic, probabilistic, and possibilistic information into the Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE), a subgroup of Multi-Criteria Decision Analysis (MCDA) methods for ranking contaminated sites. The identification of criteria based on the paradigm of comparative risk assessment provides a rationale for risk-based prioritization. Uncertain linguistic, probabilistic, and possibilistic information identified in characterizing contaminated sites can be properly represented, according to its nature, as numerical values, intervals, probability distributions, fuzzy sets or possibility distributions, and linguistic variables. These different kinds of representation are first transformed into a 2-tuple linguistic representation domain. The propagation of hybrid uncertainties is then carried out in the same domain. This methodology uses the original site information directly as much as possible. The case study shows that this systematic methodology provides more reasonable results. © 2010 SETAC.
Broad-Scale Genetic Diversity of Cannabis for Forensic Applications.
Dufresnes, Christophe; Jan, Catherine; Bienert, Friederike; Goudet, Jérôme; Fumagalli, Luca
2017-01-01
Cannabis (hemp and marijuana) is an iconic yet controversial crop. On the one hand, it represents a growing market for pharmaceutical and agricultural sectors. On the other hand, plants synthesizing the psychoactive THC produce the most widespread illicit drug in the world. Yet, the difficulty to reliably distinguish between Cannabis varieties based on morphological or biochemical criteria impedes the development of promising industrial programs and hinders the fight against narcotrafficking. Genetics offers an appropriate alternative to characterize drug vs. non-drug Cannabis. However, forensic applications require rapid and affordable genotyping of informative and reliable molecular markers for which a broad-scale reference database, representing both intra- and inter-variety variation, is available. Here we provide such a resource for Cannabis, by genotyping 13 microsatellite loci (STRs) in 1 324 samples selected specifically for fibre (24 hemp varieties) and drug (15 marijuana varieties) production. We showed that these loci are sufficient to capture most of the genome-wide diversity patterns recently revealed by NGS data. We recovered strong genetic structure between marijuana and hemp and demonstrated that anonymous samples can be confidently assigned to either plant types. Fibres appear genetically homogeneous whereas drugs show low (often clonal) diversity within varieties, but very high genetic differentiation between them, likely resulting from breeding practices. Based on an additional test dataset including samples from 41 local police seizures, we showed that the genetic signature of marijuana cultivars could be used to trace crime scene evidence. To date, our study provides the most comprehensive genetic resource for Cannabis forensics worldwide.
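Likelihood-based assignment of an anonymous sample to a reference pool can be sketched as a multilocus genotype likelihood under Hardy-Weinberg proportions; the two loci and allele frequencies below are invented for illustration (the study genotypes 13 STRs against 39 reference varieties).

```python
import numpy as np

# Illustrative allele frequencies at two STR loci for hemp vs. marijuana
# reference pools (made-up numbers).
freqs = {
    "hemp":      [{"a1": 0.7, "a2": 0.3}, {"b1": 0.6, "b2": 0.4}],
    "marijuana": [{"a1": 0.2, "a2": 0.8}, {"b1": 0.1, "b2": 0.9}],
}

def log_likelihood(genotype, pop):
    """Multilocus log-likelihood of a diploid genotype under HWE."""
    ll = 0.0
    for locus, (al1, al2) in enumerate(genotype):
        p, q = freqs[pop][locus][al1], freqs[pop][locus][al2]
        ll += np.log(p * q * (2.0 if al1 != al2 else 1.0))
    return ll

# Anonymous seizure sample: assign it to the pool with the higher likelihood.
sample = [("a2", "a2"), ("b1", "b2")]
for pop in freqs:
    print(pop, round(log_likelihood(sample, pop), 3))
```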
Population-genetic nature of copy number variations in the human genome.
Kato, Mamoru; Kawaguchi, Takahisa; Ishikawa, Shumpei; Umeda, Takayoshi; Nakamichi, Reiichiro; Shapero, Michael H; Jones, Keith W; Nakamura, Yusuke; Aburatani, Hiroyuki; Tsunoda, Tatsuhiko
2010-03-01
Copy number variations (CNVs) are universal genetic variations, and their association with disease has been increasingly recognized. We designed high-density microarrays for CNVs, and detected 3000-4000 CNVs (4-6% of the genomic sequence) per population, including CNVs previously missed because they were smaller or resided in segmental duplications. The patterns of CNVs across individuals were surprisingly simple at the kilobase scale, suggesting the applicability of a simple genetic analysis for these loci. We utilized probability theory to determine integer copy numbers of CNVs and employed a recently developed phasing tool to estimate the population frequencies of integer copy number alleles and CNV-SNP haplotypes. The results showed a tendency toward a lower frequency of CNV alleles and that most of our CNVs were explained only by zero-, one- and two-copy alleles. Using the estimated population frequencies, we found several CNV regions with exceptionally high population differentiation. Investigation of CNV-SNP linkage disequilibrium (LD) for 500-900 bi- and multi-allelic CNVs per population revealed that previous conflicting reports on bi-allelic LD were unexpectedly consistent and explained by an LD increase correlated with deletion-allele frequencies. Typically, the bi-allelic LD was lower than SNP-SNP LD, whereas the multi-allelic LD was somewhat stronger than the bi-allelic LD. After further investigation of tag SNPs for CNVs, we conclude that the customary tagging strategy for disease association studies is applicable for common deletion CNVs, but direct interrogation is needed for other types of CNVs.
Probabilistic modeling of children's handwriting
NASA Astrophysics Data System (ADS)
Puri, Mukta; Srihari, Sargur N.; Hanson, Lisa
2013-12-01
There is little work done in the analysis of children's handwriting, which can be useful in developing automatic evaluation systems and in quantifying handwriting individuality. We consider the statistical analysis of children's handwriting in early grades. Samples of handwriting of children in Grades 2-4 who were taught the Zaner-Bloser style were considered. The commonly occurring word "and", written in cursive style as well as hand-print, was extracted from extended writing. The samples were assigned feature values by human examiners using a truthing tool. The human examiners looked at how the children constructed letter formations in their writing, looking for similarities and differences from the instructions taught in the handwriting copy book. These similarities and differences were measured using a feature space distance measure. Results indicate that the handwriting develops towards more conformity with the class characteristics of the Zaner-Bloser copybook which, with practice, is the expected result. Bayesian networks were learnt from the data to enable answering various probabilistic queries, such as identifying students who may continue to produce letter formations as taught during lessons in school, identifying students who will develop different forms or variations of those letter formations, and estimating the number of different types of letter formations.
Real-time adaptive aircraft scheduling
NASA Technical Reports Server (NTRS)
Kolitz, Stephan E.; Terrab, Mostafa
1990-01-01
One of the most important functions of any air traffic management system is the assignment of ground-holding times to flights, i.e., the determination of whether and by how much the take-off of a particular aircraft headed for a congested part of the air traffic control (ATC) system should be postponed in order to reduce the likelihood and extent of airborne delays. An analysis is presented for the fundamental case in which flights from many destinations must be scheduled for arrival at a single congested airport; the formulation is also useful in scheduling the landing of airborne flights within the extended terminal area. A set of approaches is described for addressing a deterministic and a probabilistic version of this problem. For the deterministic case, where airport capacities are known and fixed, several models were developed with associated low-order polynomial-time algorithms. For general delay cost functions, these algorithms find an optimal solution. Under a particular natural assumption regarding the delay cost function, an extremely fast (O(n ln n)) algorithm was developed. For the probabilistic case, using an estimated probability distribution of airport capacities, a model was developed with an associated low-order polynomial-time heuristic algorithm with useful properties.
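For intuition, the deterministic single-airport case can be sketched as a sort followed by first-come-first-served slot assignment; the models in the paper are more general (arbitrary delay cost functions), and the code below is only an illustration with invented data.

```python
# A minimal sketch of deterministic ground-hold assignment: flights are
# sorted by scheduled arrival period and assigned the earliest landing
# slot with spare capacity; the shortfall becomes a ground hold. This
# illustrates the flavour of the problem, not the authors' algorithms.

def assign_ground_holds(scheduled_arrivals, slots_per_period):
    """scheduled_arrivals: arrival period of each flight;
    slots_per_period: landing capacity of each period (assumed
    sufficient in total for all flights)."""
    capacity = list(slots_per_period)
    holds = []
    for t in sorted(scheduled_arrivals):  # O(n log n) sort dominates
        slot = t
        while capacity[slot] == 0:
            slot += 1  # push the flight to the next period with capacity
        capacity[slot] -= 1
        holds.append(slot - t)  # ground-holding time for this flight
    return holds

print(assign_ground_holds([0, 0, 0, 1, 2], [1, 1, 1, 2]))  # -> [0, 1, 2, 2, 1]
```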
USDA-ARS's Scientific Manuscript database
Dominant and co-dominant molecular markers are routinely used in plant genetic diversity research. In the present study we assessed the success-rate of three marker-systems for estimating genotypic diversity, clustering varieties into populations, and assigning a single variety into the expected pop...
Protein complexes are assemblies of subunits that have co-evolved to execute one or many coordinated functions in the cellular environment. Functional annotation of mammalian protein complexes is critical to understanding biological processes, as well as disease mechanisms. Here, we used genetic co-essentiality derived from genome-scale RNAi- and CRISPR-Cas9-based fitness screens performed across hundreds of human cancer cell lines to assign measures of functional similarity.
Methods Development for Spectral Simplification of Room-Temperature Rotational Spectra
NASA Astrophysics Data System (ADS)
Kent, Erin B.; Shipman, Steven
2014-06-01
Room-temperature rotational spectra are dense and difficult to assign, and so we have been working to develop methods to accelerate this process. We have tested two different methods with our waveguide-based spectrometer, which operates from 8.7 to 26.5 GHz. The first method, based on previous work by Medvedev and De Lucia, was used to estimate lower state energies of transitions by performing relative intensity measurements at a range of temperatures between -20 and +50 °C. The second method employed hundreds of microwave-microwave double resonance measurements to determine level connectivity between rotational transitions. The relative intensity measurements were not particularly successful in this frequency range (the reasons for this will be discussed), but the information gleaned from the double-resonance measurements can be incorporated into other spectral search algorithms (such as autofit or genetic algorithm approaches) via scoring or penalty functions to help with the spectral assignment process. I.R. Medvedev, F.C. De Lucia, Astrophys. J. 656, 621-628 (2007).
Application of a modified selection index for honey bees (Hymenoptera: Apidae).
van Engelsdorp, D; Otis, G W
2000-12-01
Nine different genetic families of honey bees (Apis mellifera L.) were compared using summed z-scores (phenotypic values) and a modified selection index (Imod). Imod values incorporated both the phenotypic scores of the different traits and the economic weightings of these traits, as determined by a survey of commercial Ontario beekeepers. Largely because of the high weight all beekeepers place on honey production, a distinct difference between line rankings based on phenotypic scores and Imod scores was apparent, thereby emphasizing the need to properly weight the traits being evaluated to select bee stocks most valuable for beekeepers. Furthermore, when beekeepers who made >10% of their income from queen and nucleus colony sales assigned relative values to the traits used in the Imod calculations, the results differed from those based on weightings assigned by honey producers. Our results underscore the difficulties the North American beekeeping industry must overcome to devise effective methods of evaluating colonies for breeding purposes.
Probabilistic modeling of discourse-aware sentence processing.
Dubey, Amit; Keller, Frank; Sturt, Patrick
2013-07-01
Probabilistic models of sentence comprehension are increasingly relevant to questions concerning human language processing. However, such models are often limited to syntactic factors. This restriction is unrealistic in light of experimental results suggesting interactions between syntax and other forms of linguistic information in human sentence processing. To address this limitation, this article introduces two sentence processing models that augment a syntactic component with information about discourse co-reference. The novel combination of probabilistic syntactic components with co-reference classifiers permits them to more closely mimic human behavior than existing models. The first model uses a deep linguistic model, based in part on probabilistic logic, allowing it to make qualitative predictions on experimental data; the second model uses shallow processing to make quantitative predictions on a broad-coverage reading-time corpus. Copyright © 2013 Cognitive Science Society, Inc.
Thematic clustering of text documents using an EM-based approach
2012-01-01
Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy lacks a comprehensive view for humans in general, since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or a pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to assign documents to subjects, converging to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides a competitive performance compared to other state-of-the-art approaches. We also show that the extracted themes from the MEDLINE® dataset represent the subjects of clusters reasonably well. PMID:23046528
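The E- and M-steps of this kind of probabilistic document clustering can be sketched with a mixture of multinomials over term counts. The code below is a minimal illustration with toy data, not the published algorithm, which operates on extracted subject terms.

```python
# A minimal EM sketch: documents as term-count vectors, clustered under
# a mixture of multinomials. Toy data, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[5, 1, 0], [4, 2, 0], [0, 1, 5], [0, 2, 4]], dtype=float)  # doc-term counts
K, (N, V) = 2, X.shape

pi = np.full(K, 1 / K)                       # mixing weights
theta = rng.dirichlet(np.ones(V), size=K)    # per-cluster term distributions

for _ in range(50):
    # E-step: responsibility of each cluster for each document (log domain)
    log_r = np.log(pi) + X @ np.log(theta).T
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights and term distributions
    pi = r.mean(axis=0)
    theta = (r.T @ X) + 1e-9
    theta /= theta.sum(axis=1, keepdims=True)

print(r.argmax(axis=1))  # hard assignments after convergence, e.g. [0 0 1 1]
```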
Cui, Jian; Liu, Jinghua; Li, Yuhua; Shi, Tieliu
2011-01-01
Mitochondria are major players on the production of energy, and host several key reactions involved in basic metabolism and biosynthesis of essential molecules. Currently, the majority of nucleus-encoded mitochondrial proteins are unknown even for model plant Arabidopsis. We reported a computational framework for predicting Arabidopsis mitochondrial proteins based on a probabilistic model, called Naive Bayesian Network, which integrates disparate genomic data generated from eight bioinformatics tools, multiple orthologous mappings, protein domain properties and co-expression patterns using 1,027 microarray profiles. Through this approach, we predicted 2,311 candidate mitochondrial proteins with 84.67% accuracy and 2.53% FPR performances. Together with those experimental confirmed proteins, 2,585 mitochondria proteins (named CoreMitoP) were identified, we explored those proteins with unknown functions based on protein-protein interaction network (PIN) and annotated novel functions for 26.65% CoreMitoP proteins. Moreover, we found newly predicted mitochondrial proteins embedded in particular subnetworks of the PIN, mainly functioning in response to diverse environmental stresses, like salt, draught, cold, and wound etc. Candidate mitochondrial proteins involved in those physiological acitivites provide useful targets for further investigation. Assigned functions also provide comprehensive information for Arabidopsis mitochondrial proteome. PMID:21297957
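Under the naive independence assumption, evidence sources combine by multiplying likelihood ratios onto the prior odds. A minimal sketch with invented sources and numbers (not the study's trained parameters):

```python
# A minimal sketch of Naive Bayes evidence integration: each source
# contributes an independent likelihood ratio for "mitochondrial" vs
# "not". All numbers are hypothetical, for illustration only.
import math

prior_mito = 0.10  # assumed prior fraction of mitochondrial proteins

# P(feature | mito) / P(feature | non-mito) per evidence source (invented)
likelihood_ratios = {
    "targeting-signal predictor": 6.0,
    "ortholog annotated mitochondrial": 8.0,
    "co-expression with known mito genes": 3.0,
}

log_odds = math.log(prior_mito / (1 - prior_mito))
for source, lr in likelihood_ratios.items():
    log_odds += math.log(lr)  # naive independence across sources

posterior = 1 / (1 + math.exp(-log_odds))
print(f"P(mitochondrial | evidence) = {posterior:.3f}")  # ~0.94
```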
Patient Electronic Health Records as a Means to Approach Genetic Research in Gastroenterology
Ananthakrishnan, Ashwin N; Lieberman, David
2015-01-01
Electronic health records (EHR) are being increasingly utilized and form a unique source of extensive data gathered during routine clinical care. Through use of codified and free text concepts identified using clinical informatics tools, disease labels can be assigned with a high degree of accuracy. Analysis linking such EHR-assigned disease labels to a biospecimen repository has demonstrated that genetic associations identified in prospective cohorts can be replicated with adequate statistical power, and novel phenotypic associations identified. In addition, genetic discovery research can be performed utilizing clinical, laboratory, and procedure data obtained during care. Challenges with such research include the need to tackle variability in quality and quantity of EHR data and importance of maintaining patient privacy and data security. With appropriate safeguards, this novel and emerging field of research offers considerable promise and potential to further scientific research in gastroenterology efficiently, cost-effectively, and with engagement of patients and communities. PMID:26073373
The Stag Hunt Game: An Example of an Excel-Based Probabilistic Game
ERIC Educational Resources Information Center
Bridge, Dave
2016-01-01
With so many role-playing simulations already in the political science education literature, the recent repeated calls for new games is both timely and appropriate. This article answers and extends those calls by advocating the creation of probabilistic games using Microsoft Excel. I introduce the example of the Stag Hunt Game--a short, effective,…
ERIC Educational Resources Information Center
Denison, Stephanie; Trikutam, Pallavi; Xu, Fei
2014-01-01
A rich tradition in developmental psychology explores physical reasoning in infancy. However, no research to date has investigated whether infants can reason about physical objects that behave probabilistically, rather than deterministically. Physical events are often quite variable, in that similar-looking objects can be placed in similar…
Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave
2014-01-01
We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
Sweet, Kevin; Sturm, Amy C; Rettig, Amy; McElroy, Joseph; Agnese, Doreen
2015-06-01
A descriptive retrospective study was performed using two separate user cohorts to determine the effectiveness of Family HealthLink as a clinical triage tool. Cohort 1 consisted of 2,502 users who accessed the public website. Cohort 2 consisted of 194 new patients in a Comprehensive Breast Center setting. For patient users, we assessed documentation of family history and genetics referral. For all users seen in a genetics clinic, the Family HealthLink assessment was compared with that performed by genetic counselors and with genetic testing outcomes. For general public users, the percentages meeting high-risk criteria were: for cancer only, 22.2%; for coronary heart disease only, 24.3%; and for both diseases, 10.4%. These risk stratification percentages were similar for the patient users. For the patient users, there was often documentation of family history of certain cancer types by oncology professionals, but age of onset and coronary heart disease family history were less complete. Of 142 users with high-risk assignments seen in a genetics clinic, 130 (91.5%) of these assignments were corroborated. Forty-two underwent genetic testing and 17 (40.5%) had new molecular diagnoses established. A significant percentage of individuals are at high familial risk and may require more intensive screening and referral. Interactive family history triage tools can aid this process. Genet Med 17(6), 493-500.
Summing up the noise in gene networks
NASA Astrophysics Data System (ADS)
Paulsson, Johan
2004-01-01
Random fluctuations in genetic networks are inevitable as chemical reactions are probabilistic and many genes, RNAs and proteins are present in low numbers per cell. Such `noise' affects all life processes and has recently been measured using green fluorescent protein (GFP). Two studies show that negative feedback suppresses noise, and three others identify the sources of noise in gene expression. Here I critically analyse these studies and present a simple equation that unifies and extends both the mathematical and biological perspectives.
The role of medical libraries in undergraduate education: a case study in genetics
Tennant, Michele R.; Miyamoto, Michael M.
2002-01-01
Between 1996 and 2001, the Health Science Center Libraries and Department of Zoology at the University of Florida partnered to provide a cohesive and comprehensive learning experience to undergraduate students in PCB3063, “Genetics.” During one semester each year, a librarian worked with up to 120 undergraduates, providing bibliographic and database instruction in the tools that practicing geneticists use (MEDLINE, GenBank, BLAST, etc.). Students learned to evaluate and synthesize the information that they retrieved, coupling it with information provided in classroom lectures, thus resulting in well-researched short papers on an assigned genetics topic. Exit surveys of students indicated that the majority found the library sessions and librarian's instruction to be useful. Responses also indicated that the project facilitated increased understanding of genetics concepts and appreciation for the scientific research process and the relevance of genetics to the real world. The library benefited from this partnership on a variety of fronts, including the development of skilled library users, pretrained future clientele, and increased visibility among campus research laboratories. The course and associated information instruction and assigned projects can be considered models for course-integrated instruction and the role of medical libraries in undergraduate education. PMID:11999176
Urwyler, S K; Glaubitz, J
2016-02-01
Fast microbial identification is becoming increasingly necessary in industry to improve microbial control and reduce biocide consumption. We compared the performances of two systems based on MALDI-TOF MS (VITEK MS and BIOTYPER) and two based on biochemical testing (BIOLOG, VITEK 2 Compact) with genetic methods for the identification of environmental bacteria. At the genus level, both MALDI-TOF MS-based systems showed the lowest rate of false identifications (4%) and approximately 60% correct identifications. In contrast, the biochemical-based systems assigned 25% of the genera incorrectly. The differences were even more apparent at the species level. The BIOTYPER was the most conservative in assigning a species, leading to the lowest percentage of species identifications (54%) but also to the fewest wrong assignments (4%). The other three systems showed higher levels of false assignments: 8.7, 40 and 46%, respectively. The genus-level identification performance of the BIOTYPER on four industrial products could be increased to 94.3% (average 88% across 167 isolates) by evolving the database in a product-specific manner. Comparison of the bacterial populations in the example of paints, and the raw materials used therein, at different production steps demonstrated unequivocally that the contamination of the final paint product did not originate from the main raw material. MALDI-TOF MS has revolutionized the speed and precision of microbial identification for clinical isolates, outperforming conventional methods. In contrast, few performance studies have been published so far focusing on suitability for industrial applications, geomicrobiology and environmental analytics. This study evaluates the performance of this proteomic phenotyping on industrial isolates in comparison with biochemical-based phenotyping and genotyping. Further, the study exemplifies the power of MALDI-TOF MS to cost-efficiently trace the dominant cultivable bacterial species throughout an industrial paint production process. Vital information can be retrieved to identify the most crucial contaminating source for the final product. © 2015 The Authors published by John Wiley & Sons Ltd on behalf of Society for Applied Microbiology.
Debbi, Ali; Boureghda, Houda; Monte, Enrique; Hermosa, Rosa
2018-01-01
Fifty fungal isolates were sampled from diseased tomato plants as a result of a survey conducted in seven tomato crop areas in Algeria from 2012 to 2015. Morphological criteria and PCR-based identification, using the primers PF02 and PF03, assigned 29 of the 50 isolates to Fusarium oxysporum (Fo). The banding patterns amplified for the genes SIX1, SIX3 and SIX4 served to identify races 2 and 3 of Fo f. sp. lycopersici (FOL), and Fo f. sp. radicis lycopersici (FORL), among the Algerian isolates. All FOL isolates were pathogenic on the susceptible tomato cv. "Super Marmande," while nine out of 10 Algerian FORL isolates were pathogenic on tomato cv. "Rio Grande." Inter simple sequence repeat (ISSR) fingerprints showed high genetic diversity among the Algerian Fo isolates. Seventeen Algerian Trichoderma isolates were also obtained and assigned to the species T. asperellum (12 isolates), T. harzianum (four isolates) and T. ghanense (one isolate) based on ITS and tef1α gene sequences. Different in vitro tests identified the antagonistic potential of native Trichoderma isolates against FORL and FOL. Greenhouse biocontrol assays performed on "Super Marmande" tomato plants with T. ghanense T8 and T. asperellum T9 and T17, and three Fo isolates, showed that isolate T8 performed well against both FORL and FOL, reducing the incidence of crown and root rot and of Fusarium wilt by 53.1 and 48.3%, respectively. PMID:29515557
Probabilistic Seismic Risk Model for Western Balkans
NASA Astrophysics Data System (ADS)
Stejskal, Vladimir; Lorenzo, Francisco; Pousse, Guillaume; Radovanovic, Slavica; Pekevski, Lazo; Dojcinovski, Dragi; Lokin, Petar; Petronijevic, Mira; Sipka, Vesna
2010-05-01
A probabilistic seismic risk model for insurance and reinsurance purposes is presented for an area of the Western Balkans covering the former Yugoslavia and Albania. This territory experienced many severe earthquakes during past centuries, producing significant damage to many population centres in the region. The highest hazard is related to the external Dinarides, namely to the collision zone of the Adriatic plate. The model is based on a unified catalogue for the region and a seismic source model consisting of more than 30 zones covering all three main structural units: the Southern Alps, the Dinarides and the south-western margin of the Pannonian Basin. A probabilistic methodology using Monte Carlo simulation was applied to generate the hazard component of the model. A unique set of damage functions, based on both loss experience and engineering assessments, is used to convert the modelled ground motion severity into monetary loss.
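The Monte Carlo hazard component can be sketched as repeated sampling of yearly seismicity from each source zone, followed by a toy ground-motion conversion. Everything below (rates, b-values, the attenuation function) is invented for illustration and is not taken from the model.

```python
# A minimal Monte Carlo hazard sketch: earthquakes are drawn per source
# zone from Poisson/Gutenberg-Richter assumptions and a toy attenuation
# converts magnitude and distance to site ground motion.
import numpy as np

rng = np.random.default_rng(1)
years = 100_000
zones = [  # annual rate of M>=5, G-R b-value, distance to site (km)
    {"rate": 0.20, "b": 1.0, "dist": 40.0},
    {"rate": 0.05, "b": 0.9, "dist": 15.0},
]

def pga(mag, dist):
    # toy attenuation relation (not a published GMPE)
    return np.exp(0.8 * mag - 1.3 * np.log(dist + 10.0) - 3.0)

annual_max = np.zeros(years)
for z in zones:
    n = rng.poisson(z["rate"], size=years)
    for year in np.nonzero(n)[0]:
        mags = 5.0 + rng.exponential(1 / (z["b"] * np.log(10)), size=n[year])
        annual_max[year] = max(annual_max[year], pga(mags, z["dist"]).max())

for g in (0.05, 0.10, 0.20):
    print(f"P(PGA > {g:.2f} g in a year) ~ {(annual_max > g).mean():.4f}")
```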
Fusar-Poli, P; Schultze-Lutter, F
2016-02-01
Prediction of psychosis in patients at clinical high risk (CHR) has become a mainstream focus of clinical and research interest worldwide. When using CHR instruments for clinical purposes, the predicted outcome is only a probability; consequently, any therapeutic action following the assessment is based on probabilistic prognostic reasoning. Yet probabilistic reasoning makes considerable demands on clinicians. We provide here a scholarly practical guide summarising the key concepts to support clinicians with probabilistic prognostic reasoning in the CHR state. We review the risk or cumulative incidence of psychosis, the person-time rate of psychosis, Kaplan-Meier estimates of psychosis risk, measures of prognostic accuracy, sensitivity and specificity in receiver operating characteristic curves, positive and negative predictive values, Bayes' theorem, likelihood ratios, and the potentials and limits of real-life applications of prognostic probabilistic reasoning in the CHR state. Understanding the basic measures used for prognostic probabilistic reasoning is a prerequisite for successfully implementing the early detection and prevention of psychosis in clinical practice. Future refinement of these measures for CHR patients may actually influence risk management, especially as regards initiating or withholding treatment. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
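The interplay of these measures is easy to make concrete. The snippet below works through predictive values and likelihood ratios for a hypothetical CHR instrument; the sensitivity, specificity and transition rate are illustrative, not values reported in the guide.

```python
# Worked example of Bayes'-theorem-based prognostic measures, using
# illustrative numbers: a CHR instrument with sensitivity 0.90 and
# specificity 0.50, applied where 20% of help-seekers would transition.
sens, spec, prev = 0.90, 0.50, 0.20

ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
lr_pos = sens / (1 - spec)   # positive likelihood ratio
lr_neg = (1 - sens) / spec   # negative likelihood ratio

print(f"PPV={ppv:.2f}  NPV={npv:.2f}  LR+={lr_pos:.1f}  LR-={lr_neg:.2f}")
# PPV=0.31: even a sensitive test yields mostly false positives at low prevalence
```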
Gunasekera, T S; Holland, R J; Gillings, M R; Briscoe, D A; Neethling, D C; Williams, K L; Nevalainen, K M
2000-09-01
Efficient selection of fungi for biological control of nematodes requires a series of screening assays. Assessment of genetic diversity in the candidate species maximizes the variety of the isolates tested and permits the identification of particular genotypes with high nematophagous potential using a rapid novel assay. Molecular analyses also facilitate separation between isolates, allowing the identification of proprietary strains and the tracing of biocontrol strains in the environment. The resistance of propagules to UV radiation is an important factor in the survival of a biocontrol agent. We analyzed 15 strains of the nematophagous fungus Paecilomyces lilacinus using these principles. Arbitrarily primed DNA and allozyme assays were applied to place the isolates into genetic clusters, and demonstrated that some genetically related P. lilacinus strains exhibit widespread geographic distributions. When exposed to UV radiation, weakly nematophagous strains were generally more susceptible than effective isolates. A microtitre tray-based assay used to screen the pathogenic activity of each isolate against Meloidogyne javanica egg masses revealed that nematophagous ability varied between 37% and 100%. However, there was no clear relationship between nematophagous ability and genetic clusters. Molecular characterizations revealed sufficient diversity to allow tracking of strains released into the environment.
An overview of STRUCTURE: applications, parameter settings, and supporting software
Porras-Hurtado, Liliana; Ruiz, Yarimar; Santos, Carla; Phillips, Christopher; Carracedo, Ángel; Lareu, Maria V.
2013-01-01
Objectives: We present an up-to-date review of the STRUCTURE software, one of the most widely used population analysis tools, which allows researchers to assess patterns of genetic structure in a set of samples. STRUCTURE can identify subsets of the whole sample by detecting allele frequency differences within the data and can assign individuals to those sub-populations based on analysis of likelihoods. The review covers STRUCTURE's most commonly used ancestry and frequency models, plus an overview of the main applications of the software in human genetics, including case-control association studies (CCAS), population genetics, and forensic analysis. The review is accompanied by supplementary material providing a step-by-step guide to running STRUCTURE. Methods: With reference to a worked example, we explore the effects of changing the principal analysis parameters on STRUCTURE results when analyzing a uniform set of human genetic data. Use of the supporting software CLUMPP and distruct is detailed, and we provide an overview and worked example of the STRAT software, applicable to CCAS. Conclusion: The guide offers a simplified view of how STRUCTURE, CLUMPP, distruct, and STRAT can be applied, providing researchers with an informed choice of parameter settings and supporting software when analyzing their own genetic data. PMID:23755071
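The likelihood-based assignment at the core of such tools can be illustrated compactly: score a multilocus genotype against each candidate population's allele frequencies under Hardy-Weinberg assumptions. The sketch below uses invented frequencies and genotypes and is not STRUCTURE's full admixture MCMC.

```python
# A minimal sketch of likelihood-based population assignment under
# Hardy-Weinberg proportions. Frequencies and genotypes are invented.
import numpy as np

# allele-1 frequency at 4 loci, for two candidate populations
freqs = {"pop_A": np.array([0.9, 0.8, 0.1, 0.7]),
         "pop_B": np.array([0.2, 0.3, 0.8, 0.4])}

genotype = np.array([2, 2, 0, 1])  # copies of allele 1 at each locus

def log_likelihood(p, g):
    # Hardy-Weinberg genotype probabilities: p^2, 2p(1-p), (1-p)^2
    probs = np.where(g == 2, p**2, np.where(g == 1, 2 * p * (1 - p), (1 - p)**2))
    return np.log(probs).sum()

ll = {pop: log_likelihood(p, genotype) for pop, p in freqs.items()}
z = np.array(list(ll.values()))
post = np.exp(z - z.max())
post /= post.sum()  # posterior under equal priors
print(dict(zip(ll, post)))  # individual assigned to pop_A with high probability
```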
NASA Astrophysics Data System (ADS)
Taner, M. U.; Ray, P.; Brown, C.
2016-12-01
Hydroclimatic nonstationarity due to climate change poses challenges for long-term water infrastructure planning in river basin systems. While designing strategies that are flexible or adaptive holds intuitive appeal, developing well-performing strategies requires rigorous quantitative analysis that addresses uncertainties directly while making the best use of scientific information on the expected evolution of future climate. Multi-stage robust optimization (RO) offers a potentially effective and efficient technique for staged basin-level planning under climate change; however, the necessity of assigning probabilities to future climate states or scenarios is an obstacle to implementation, since reliable methods for assigning such probabilities are not well developed. We present a method that overcomes this challenge through a bottom-up RO-based framework that decreases the dependency on probability distributions of future climate and instead employs them after optimization to aid selection among competing alternatives. The iterative process yields a vector of 'optimal' decision pathways, each under an associated set of probabilistic assumptions. In the final phase, the vector of optimal decision pathways is evaluated to identify the solutions that are least sensitive to the scenario probabilities and most likely conditional on the climate information. The framework is illustrated for the planning of new dam and hydro-agricultural expansion projects in the Niger River Basin over a 45-year planning period from 2015 to 2060.
Catastrophe loss modelling of storm-surge flood risk in eastern England.
Muir Wood, Robert; Drayton, Michael; Berger, Agnete; Burgess, Paul; Wright, Tom
2005-06-15
Probabilistic catastrophe loss modelling techniques, comprising a large stochastic set of potential storm-surge flood events, each assigned an annual rate of occurrence, have been employed for quantifying risk in the coastal flood plain of eastern England. Based on the tracks of the causative extratropical cyclones, historical storm-surge events are categorized into three classes, with distinct windfields and surge geographies. Extreme combinations of "tide with surge" are then generated from an extreme value distribution developed for each class. Fragility curves are used to determine the probability and magnitude of breaching relative to water levels and wave action for each section of sea defence. Based on the time-history of water levels in the surge and the simulated configuration of breaching, flow is time-stepped through the defences and propagated into the flood plain using a 50 m horizontal-resolution digital elevation model. Based on the values and locations of the building stock in the flood plain, losses are calculated using vulnerability functions linking flood depth and flood velocity to measures of property loss. The outputs from this model for a UK insurance industry portfolio include "loss exceedance probabilities" as well as "average annualized losses", which can be employed for calculating coastal flood risk premiums in each postcode.
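Given a stochastic event set of (loss, annual rate) pairs, both outputs fall out of a few lines of arithmetic if event occurrences are treated as independent Poisson processes. The event set below is invented for illustration.

```python
# A minimal sketch of turning a stochastic event set into a loss
# exceedance curve and an average annualized loss, assuming independent
# Poisson occurrences. The event set is invented.
import numpy as np

events = np.array([      # (loss in GBP millions, annual rate of occurrence)
    (5.0, 0.20), (20.0, 0.05), (80.0, 0.01), (300.0, 0.002),
])
losses, rates = events[:, 0], events[:, 1]

aal = float(losses @ rates)  # average annualized loss

for threshold in (10.0, 50.0, 250.0):
    lam = rates[losses > threshold].sum()  # total rate of exceeding events
    p_exceed = 1.0 - np.exp(-lam)          # P(at least one such event per year)
    print(f"P(some event loss > {threshold:>5.0f}M in a year) = {p_exceed:.4f}")

print(f"average annualized loss = {aal:.2f}M")
```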
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
Mørk, Søren; Holmes, Ian
2012-03-01
Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is the best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
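The common core of all these structures is an HMM decoded over a nucleotide sequence. As a point of reference, here is a minimal two-state (coding/non-coding) Viterbi decoder with toy parameters; the paper's models are richer and are expressed as PRISM programs rather than imperative code.

```python
# A minimal two-state HMM Viterbi decoder. States, transition and
# emission probabilities are invented toy values.
import numpy as np

states = ["noncoding", "coding"]
log_trans = np.log([[0.95, 0.05], [0.10, 0.90]])
# emission probabilities over nucleotides A,C,G,T (toy GC-content skew)
log_emit = np.log([[0.30, 0.20, 0.20, 0.30],   # noncoding
                   [0.20, 0.30, 0.30, 0.20]])  # coding
idx = {b: i for i, b in enumerate("ACGT")}

def viterbi(seq):
    obs = [idx[b] for b in seq]
    v = np.log([0.5, 0.5]) + log_emit[:, obs[0]]  # initial scores
    back = []
    for o in obs[1:]:
        scores = v[:, None] + log_trans        # prev-state -> next-state scores
        back.append(scores.argmax(axis=0))     # best predecessor per state
        v = scores.max(axis=0) + log_emit[:, o]
    path = [int(v.argmax())]
    for b in reversed(back):                   # trace best path backwards
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi("ATATACGCGCGGCCATAT"))
```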
Casjens, S.; Eppler, K.; Sampson, L.; Parr, R.; Wyckoff, E.
1991-01-01
The mechanism by which dsDNA is packaged by viruses is not yet understood in any system. Bacteriophage P22 has been a productive system in which to study the molecular genetics of virus particle assembly and DNA packaging. Only five phage encoded proteins, the products of genes 3, 2, 1, 8 and 5, are required for packaging the virus chromosome inside the coat protein shell. We report here the construction of a detailed genetic and physical map of these genes, the neighboring gene 4 and a portion of gene 10, in which 289 conditional lethal amber, opal, temperature sensitive and cold sensitive mutations are mapped into 44 small (several hundred base pair) intervals of known sequence. Knowledge of missense mutant phenotypes and information on the location of these mutations allows us to begin the assignment of partial protein functions to portions of these genes. The map and mapping strains will be of use in the further genetic dissection of the P22 DNA packaging and prohead assembly processes. PMID:2029965
Molecular Marker Systems for Oenothera Genetics
Rauwolf, Uwe; Golczyk, Hieronim; Meurer, Jörg; Herrmann, Reinhold G.; Greiner, Stephan
2008-01-01
The genus Oenothera has an outstanding scientific tradition. It has been a model for studying aspects of chromosome evolution and speciation, including the impact of plastid nuclear co-evolution. A large collection of strains analyzed during a century of experimental work and unique genetic possibilities allow the exchange of genetically definable plastids, individual or multiple chromosomes, and/or entire haploid genomes (Renner complexes) between species. However, molecular genetic approaches for the genus are largely lacking. In this study, we describe the development of efficient PCR-based marker systems for both the nuclear genome and the plastome. They allow distinguishing individual chromosomes, Renner complexes, plastomes, and subplastomes. We demonstrate their application by monitoring interspecific exchanges of genomes, chromosome pairs, and/or plastids during crossing programs, e.g., to produce plastome–genome incompatible hybrids. Using an appropriate partial permanent translocation heterozygous hybrid, linkage group 7 of the molecular map could be assigned to chromosome 9·8 of the classical Oenothera map. Finally, we provide the first direct molecular evidence that homologous recombination and free segregation of chromosomes in permanent translocation heterozygous strains is suppressed. PMID:18791241
Isolation and genetic diversity of endangered grey nurse shark (Carcharias taurus) populations.
Stow, Adam; Zenger, Kyall; Briscoe, David; Gillings, Michael; Peddemors, Victor; Otway, Nicholas; Harcourt, Robert
2006-06-22
Anthropogenic impacts are believed to be the primary threats to the eastern Australian population of grey nurse sharks (Carcharias taurus), which is listed as critically endangered, and the most threatened population globally. Analyses of 235 polymorphic amplified fragment length polymorphisms (AFLP) loci and 700 base pairs of mitochondrial DNA control region provide the first account of genetic variation and geographical partitioning (east and west coasts of Australia, South Africa) in C. taurus. Assignment tests, analysis of relatedness and Fst values all indicate that the Australian populations are isolated from South Africa, with negligible migration between the east and west Australian coasts. There are significant differences in levels of genetic variation among regions. Australian C. taurus, particularly the eastern population, has significantly less AFLP variation than the other sampling localities. Further, the eastern Australian sharks possess only a single mitochondrial haplotype, also suggesting a small number of founding individuals. Therefore, historical, rather than anthropogenic processes most likely account for their depauperate genetic variation. These findings have implications for the viability of the eastern Australian population of grey nurse sharks.
Improvement of the Threespine Stickleback Genome Using a Hi-C-Based Proximity-Guided Assembly.
Peichel, Catherine L; Sullivan, Shawn T; Liachko, Ivan; White, Michael A
2017-09-01
Scaffolding genomes into complete chromosome assemblies remains challenging, even with the rapidly increasing sequence coverage generated by current next-generation sequencing technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C-based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C-based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the two assemblies or to errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C-based PGA method to be used in combination with short-read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain the high-molecular-weight DNA required for other scaffolding methods. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Donnan, Jennifer R; Ungar, Wendy J; Mathews, Maria; Hancock-Howard, Rebecca L; Rahman, Proton
2011-08-01
An increased understanding of the genetic basis of disease creates a demand for personalized medicine and more genetic testing for diagnosis and treatment. The objective was to assess the incremental cost-effectiveness per life-month gained of thiopurine methyltransferase (TPMT) genotyping to guide doses of 6-mercaptopurine (6-MP) in children with acute lymphoblastic leukemia (ALL) compared to enzymatic testing and standard weight-based dosing. A cost-effectiveness analysis was conducted from a health care system perspective comparing costs and consequences over 3 months. Decision analysis was used to evaluate the impact of TPMT tests on preventing myelosuppression and improving survival in ALL patients receiving 6-MP. Direct medical costs included laboratory tests, medications, physician services, pharmacy and inpatient care. Probabilities were derived from published evidence. Survival was measured in life-months. The robustness of the results to variable uncertainty was tested in one-way sensitivity analyses. Probabilistic sensitivity analysis examined the impact of parameter uncertainty and generated confidence intervals around point estimates. Neither of the testing interventions showed a benefit in survival compared to weight-based dosing. Both test strategies were more costly compared to weight-based dosing. Incremental costs per child (95% confidence interval) were $277 ($112, $442) and $298 ($392, $421) for the genotyping and phenotyping strategies, respectively, compared to weight-based dosing. The present analysis suggests that screening for TPMT mutations using either genotype or enzymatic laboratory tests prior to the administration of 6-MP in pediatric ALL patients is not cost-effective. Copyright © 2011 Wiley-Liss, Inc.
Size Evolution and Stochastic Models: Explaining Ostracod Size through Probabilistic Distributions
NASA Astrophysics Data System (ADS)
Krawczyk, M.; Decker, S.; Heim, N. A.; Payne, J.
2014-12-01
The biovolume of animals has functioned as an important benchmark for measuring evolution throughout geologic time. In our project, we examined the observed average body size of ostracods over time in order to understand the mechanism of size evolution in these marine organisms. The body size of ostracods has varied since the beginning of the Ordovician, when the first true ostracods appeared. We created a stochastic branching model to generate possible evolutionary trees of ostracod size. Using stratigraphic ranges for ostracods compiled from over 750 genera in the Treatise on Invertebrate Paleontology, we calculated overall speciation and extinction rates for our model. At each timestep in our model, new lineages can evolve or existing lineages can become extinct. Newly evolved lineages are assigned sizes based on their parent genera. We parameterized our model to generate neutral and directional changes in ostracod size to compare with the observed data. New sizes were chosen via a normal distribution; the neutral model selected new size differentials centered on zero, allowing an equal chance of larger or smaller ostracods at each speciation, whereas the directional model centered the distribution on a negative value, giving a larger chance of smaller ostracods. Our data strongly suggest that ostracod evolution has followed a model that directionally pushes mean ostracod size down, rather than a neutral model. Our model matched the magnitude of the size decrease, but produced a constant linear decline, whereas the observed data show a much more rapid initial decrease followed by a constant size. This nuance in the observed trend ultimately suggests a more complex mechanism of size evolution. In conclusion, probabilistic methods can provide valuable insight into the possible evolutionary mechanisms determining size evolution in ostracods.
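The branching scheme described above is straightforward to simulate. The sketch below uses invented per-step speciation and extinction probabilities and step sizes, not the rates calculated from the Treatise data; it only illustrates how the neutral and directional variants diverge.

```python
# A minimal stochastic branching sketch: lineages speciate or go extinct
# with fixed per-step probabilities, and daughters draw a size change
# from a normal distribution with zero (neutral) or negative
# (directional) mean. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)

def simulate(mean_step, steps=200, p_spec=0.10, p_ext=0.09):
    sizes = [0.0]                      # log biovolume, arbitrary units
    trajectory = []
    for _ in range(steps):
        nxt = []
        for s in sizes:
            if rng.random() < p_ext:
                continue               # lineage extinction
            nxt.append(s)
            if rng.random() < p_spec:  # speciation: daughter gets a new size
                nxt.append(s + rng.normal(mean_step, 0.3))
        sizes = nxt or [0.0]           # re-seed if the clade dies out
        trajectory.append(np.mean(sizes))
    return trajectory

neutral = simulate(mean_step=0.0)
directional = simulate(mean_step=-0.05)
print(f"final mean log-size: neutral {neutral[-1]:+.2f}, "
      f"directional {directional[-1]:+.2f}")
```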
A probabilistic framework for identifying biosignatures using Pathway Complexity
NASA Astrophysics Data System (ADS)
Marshall, Stuart M.; Murray, Alastair R. G.; Cronin, Leroy
2017-11-01
One thing that discriminates living things from inanimate matter is their ability to generate similarly complex or non-random structures in a large abundance. From DNA sequences to folded protein structures, living cells, microbial communities and multicellular structures, the material configurations in biology can easily be distinguished from non-living material assemblies. Many complex artefacts, from ordinary bioproducts to human tools, though they are not living things, are ultimately produced by biological processes; whether those processes occur at the scale of cells or societies, such artefacts are the consequences of living systems. While these objects are not living, they cannot randomly form, as they are the product of a biological organism and hence are either technological or cultural biosignatures. A generalized approach that aims to evaluate complex objects as possible biosignatures could be useful to explore the cosmos for new life forms. However, it is not obvious how it might be possible to create such a self-contained approach. This would require us to prove rigorously that a given artefact is too complex to have formed by chance. In this paper, we present a new type of complexity measure, which we call `Pathway Complexity', that allows us not only to threshold the abiotic-biotic divide, but also to demonstrate a probabilistic approach based on object abundance and complexity which can be used to unambiguously assign complex objects as biosignatures. We hope that this approach will not only open up the search for biosignatures beyond the Earth, but also allow us to explore the Earth for new types of biology, and to determine when a complex chemical system discovered in the laboratory could be considered alive. This article is part of the themed issue 'Reconceptualizing the origins of life'.
NASA Astrophysics Data System (ADS)
Jackson, Andrew
2015-07-01
On launch, one of Swarm's absolute scalar magnetometers (ASMs) failed to function, leaving an asymmetrical arrangement of redundant spares on the different spacecraft. A decision was required concerning the deployment of individual satellites into the low-orbit pair or the higher "lonely" orbit. I analyse the probabilities for successful operation of two of the science components of the Swarm mission in terms of a classical probabilistic failure analysis, with a view to concluding a favourable assignment for the satellite with the single working ASM. I concentrate on the following two science aspects: the east-west gradiometer aspect of the lower pair of satellites, and the constellation aspect, which requires a working ASM in each of the two orbital planes. I use the so-called "expert solicitation" probabilities for instrument failure solicited from Mission Advisory Group (MAG) members. My conclusion from the analysis is that it is better to have redundancy of ASMs in the lonely satellite orbit. The opposite scenario, having redundancy (and thus four ASMs) in the lower orbit, increases the chance of a working gradiometer late in the mission, but it does so at the expense of a likely constellation. Although the results are presented based on actual MAG members' probabilities, they are rather generic, except in the case when the probability of individual ASM failure is very small; in this case, any arrangement will ensure a successful mission, since essentially no failure is expected at all. Since the very design of the lower pair is to enable common-mode rejection of external signals, it is likely that its work can be successfully achieved during the first 5 years of the mission.
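The trade-off is easy to reproduce with elementary reliability arithmetic. The sketch below assumes independent ASM failures with an illustrative failure probability (not the MAG-solicited values) and treats the gradiometer aspect as requiring a working ASM on both satellites of the lower pair.

```python
# A minimal reliability comparison for the two deployment options.
# q is an assumed per-ASM failure probability over the mission.
q = 0.3

def sat_ok(n_asm):
    """P(a satellite carrying n ASMs retains at least one working one)."""
    return 1 - q ** n_asm

# Option A: the single-ASM satellite flies in the lower pair
#   lower plane: sats with 2 and 1 ASMs; lonely satellite: 2 ASMs
grad_A = sat_ok(2) * sat_ok(1)            # both lower sats need an ASM
const_A = (1 - q**2 * q**1) * sat_ok(2)   # >=1 ASM alive in each plane
# Option B: the single-ASM satellite flies alone
#   lower plane: sats with 2 and 2 ASMs; lonely satellite: 1 ASM
grad_B = sat_ok(2) * sat_ok(2)
const_B = (1 - q**2 * q**2) * sat_ok(1)

print(f"A (redundancy lonely): gradiometer {grad_A:.3f}, constellation {const_A:.3f}")
print(f"B (redundancy low):    gradiometer {grad_B:.3f}, constellation {const_B:.3f}")
# B helps the late-mission gradiometer but costs the constellation,
# matching the qualitative conclusion of the analysis.
```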
Denovan, Andrew; Dagnall, Neil; Drinkwater, Kenneth; Parker, Andrew; Clough, Peter
2017-01-01
The present study assessed the degree to which probabilistic reasoning performance and thinking style influenced perception of risk and self-reported levels of terrorism-related behavior change. A sample of 263 respondents, recruited via convenience sampling, completed a series of measures comprising probabilistic reasoning tasks (perception of randomness, base rate, probability, and conjunction fallacy), the Reality Testing subscale of the Inventory of Personality Organization (IPO-RT), the Domain-Specific Risk-Taking Scale, and a terrorism-related behavior change scale. Structural equation modeling examined three progressive models. Firstly, the Independence Model assumed that probabilistic reasoning, perception of risk and reality testing independently predicted terrorism-related behavior change. Secondly, the Mediation Model supposed that probabilistic reasoning and reality testing correlated, and indirectly predicted terrorism-related behavior change through perception of risk. Lastly, the Dual-Influence Model proposed that probabilistic reasoning indirectly predicted terrorism-related behavior change via perception of risk, independent of reality testing. Results indicated that performance on probabilistic reasoning tasks most strongly predicted perception of risk, and preference for an intuitive thinking style (measured by the IPO-RT) best explained terrorism-related behavior change. The combination of perception of risk with probabilistic reasoning ability in the Dual-Influence Model enhanced the predictive power of the analytical-rational route, with conjunction fallacy having a significant indirect effect on terrorism-related behavior change via perception of risk. The Dual-Influence Model possessed superior fit and reported similar predictive relations between intuitive-experiential and analytical-rational routes and terrorism-related behavior change. The discussion critically examines these findings in relation to dual-processing frameworks. This includes considering the limitations of current operationalisations and recommendations for future research that align outcomes and subsequent work more closely to specific dual-process models. PMID:29062288
A continuum model of transcriptional bursting
Corrigan, Adam M; Tunnacliffe, Edward; Cannon, Danielle; Chubb, Jonathan R
2016-01-01
Transcription occurs in stochastic bursts. Early models based upon RNA hybridisation studies suggested that bursting dynamics arise from alternating inactive and permissive states. Here we investigate the bursting mechanism in live cells by quantitative imaging of actin gene transcription, combined with molecular genetics, stochastic simulation and probabilistic modelling. In contrast to the early models, our data indicate a continuum of transcriptional states, with a slowly fluctuating initiation rate converting the gene between different levels of activity, interspersed with extended periods of inactivity. We place an upper limit of 40 s on the lifetime of fluctuations in elongation rate, with initiation rate variations persisting an order of magnitude longer. TATA mutations reduce the accessibility of high-activity states, leaving the lifetimes of the on- and off-states unchanged. A continuum or spectrum of gene states potentially enables a wide dynamic range for cell responses to stimuli. DOI: http://dx.doi.org/10.7554/eLife.13051.001 PMID:26896676
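The distinction between the two pictures can be made concrete in simulation: a telegraph gene switches between fixed off and on rates, whereas a continuum gene has an initiation rate modulated by a slow stochastic process. The sketch below uses invented parameters purely to contrast the two.

```python
# A minimal simulation contrasting a two-state (telegraph) gene with a
# continuum gene whose log initiation rate follows a slow
# Ornstein-Uhlenbeck process. Parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
dt, T = 1.0, 2000          # seconds
t = np.arange(0, T, dt)

# telegraph model: rate jumps between off (0) and a fixed on-rate
state = (np.cumsum(rng.random(t.size) < 0.005) % 2).astype(float)
telegraph_rate = 0.5 * state

# continuum model: log-rate drifts slowly around its mean
x = np.zeros(t.size)
tau, sigma = 300.0, 1.0    # slow fluctuation timescale and amplitude
for i in range(1, t.size):
    x[i] = x[i-1] - (x[i-1] / tau) * dt + sigma * np.sqrt(2 * dt / tau) * rng.normal()
continuum_rate = 0.5 * np.exp(x - 0.5 * sigma**2)

for name, rate in (("telegraph", telegraph_rate), ("continuum", continuum_rate)):
    counts = rng.poisson(rate * dt)          # initiations per time step
    print(f"{name}: mean rate {rate.mean():.2f}/s, Fano of counts "
          f"{counts.var() / counts.mean():.2f}")
```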
Modified Mahalanobis Taguchi System for Imbalance Data Classification
2017-01-01
The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated based on minimizing the distance between the MTS Receiver Operating Characteristic (ROC) curve and the theoretical optimal point. To validate the MMTS classification efficacy, it has been benchmarked against Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms, especially when the imbalance ratio is greater than 400. A real-life case study in the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with a Mahalanobis Genetic Algorithm (MGA). PMID:28811820
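The thresholding idea can be sketched directly: score points by Mahalanobis distance to the majority ("normal") class and choose the cutoff whose (FPR, TPR) point is nearest the ideal ROC corner (0, 1). The toy below uses synthetic data and a grid search rather than the paper's nonlinear program.

```python
# A minimal sketch of ROC-corner thresholding on Mahalanobis distances.
# Synthetic imbalanced data; not the paper's optimization formulation.
import numpy as np

rng = np.random.default_rng(4)
normal = rng.normal(0, 1, size=(500, 3))     # majority class
abnormal = rng.normal(1.5, 1, size=(25, 3))  # rare class

mu = normal.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal, rowvar=False))

def md(x):
    # squared Mahalanobis distance to the normal-class centroid
    d = x - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

d_norm, d_abn = md(normal), md(abnormal)
# FPR = normals flagged, TPR = abnormals flagged; minimize distance to (0, 1)
best = min(
    (np.hypot((d_norm > t).mean(), 1 - (d_abn > t).mean()), t)
    for t in np.linspace(0, 30, 301)
)
print(f"distance to (0,1): {best[0]:.3f} at threshold {best[1]:.1f}")
```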