Are EMS call volume predictions based on demand pattern analysis accurate?
Brown, Lawrence H; Lerner, E Brooke; Larmon, Baxter; LeGassick, Todd; Taigman, Michael
2007-01-01
Most EMS systems determine the number of crews they will deploy in their communities, and when those crews will be scheduled, based on anticipated call volumes. Many systems use historical data to calculate their anticipated call volumes, a method of prediction known as demand pattern analysis. The objective of this study was to evaluate the accuracy of call volume predictions calculated using demand pattern analysis. Seven EMS systems provided 73 consecutive weeks of hourly call volume data. The first 20 weeks of data were used to calculate three common demand pattern analysis constructs for call volume prediction: average peak demand (AP), smoothed average peak demand (SAP), and 90th percentile rank (90%R). The 21st week served as a buffer. Actual call volumes in the last 52 weeks were then compared to the predicted call volumes using descriptive statistics. There were 61,152 hourly observations in the test period. All three constructs accurately predicted peaks and troughs in call volume but not exact call volume. Predictions were accurate (+/-1 call) 13% of the time using AP, 10% using SAP, and 19% using 90%R. Call volumes were overestimated 83% of the time using AP, 86% using SAP, and 74% using 90%R. When call volumes were overestimated, predictions exceeded actual call volume by a median (interquartile range) of 4 (2-6) calls for AP, 4 (2-6) for SAP, and 3 (2-5) for 90%R. Call volumes were underestimated 4% of the time using AP, 4% using SAP, and 7% using 90%R predictions. When call volumes were underestimated, call volumes exceeded predictions by a median (interquartile range; maximum underestimation) of 1 (1-2; 18) call for AP, 1 (1-2; 18) for SAP, and 2 (1-3; 20) for 90%R. Results did not vary between systems. Generally, demand pattern analysis estimated or overestimated call volume, making it a reasonable predictor for ambulance staffing patterns. However, it did underestimate call volume between 4% and 7% of the time.
Communities need to determine if these rates of over- and underestimation are acceptable given their resources and local priorities.
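The three constructs above can be sketched from a weeks-by-hours history matrix. The study does not publish exact formulas, so the choices below (rounding up, a 3-hour moving average for SAP) are illustrative assumptions:

```python
import numpy as np

def demand_constructs(history, smooth_window=3):
    """Demand-pattern-analysis constructs from a (weeks x 168) matrix of
    historical hourly call counts, one column per hour of the week.
    Rounding up and the 3-hour moving average are assumptions for
    illustration, not the study's published formulas."""
    counts = np.asarray(history, dtype=float)
    ap = np.ceil(counts.mean(axis=0))                   # average peak demand (AP)
    kernel = np.ones(smooth_window) / smooth_window
    sap = np.ceil(np.convolve(counts.mean(axis=0), kernel, mode="same"))  # smoothed AP
    p90 = np.ceil(np.percentile(counts, 90, axis=0))    # 90th percentile rank (90%R)
    return ap, sap, p90

# 20 weeks of synthetic history for the 168 hours of a week
rng = np.random.default_rng(0)
history = rng.poisson(lam=4.0, size=(20, 168))
ap, sap, p90 = demand_constructs(history)
```

Because 90%R sits above the bulk of the historical distribution, it naturally overestimates less often than it underestimates, matching the pattern reported above.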
A new method for enhancer prediction based on deep belief network.
Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong
2017-10-16
Studies have shown that enhancers are significant regulatory elements that play crucial roles in gene expression regulation. Because enhancer activity is independent of the orientation of, and distance to, the target genes, accurately predicting distal enhancers remains a challenging task. In recent years, with the development of high-throughput ChIP-seq technologies, several computational techniques have emerged to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN)-based computational method for enhancer prediction, called EnhancerDBN. This method combines diverse features, comprising DNA sequence compositional features, DNA methylation, and histone modifications. Our computational results indicate that (1) EnhancerDBN outperforms 13 existing methods in prediction and (2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
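As a trivial example of a sequence compositional feature found informative above, GC content is computed directly from the sequence (a generic helper, not code from EnhancerDBN):

```python
def gc_content(seq):
    """Fraction of G/C bases in a DNA sequence -- one of the
    compositional features reported as relevant for enhancer
    prediction."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# e.g. gc_content("ATGC") == 0.5
```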
Flight-Test Evaluation of Flutter-Prediction Methods
NASA Technical Reports Server (NTRS)
Lind, Rick; Brenner, Marty
2003-01-01
The flight-test community routinely spends considerable time and money to determine a range of flight conditions, called a flight envelope, within which an aircraft is safe to fly. The cost of determining a flight envelope could be greatly reduced if there were a method of safely and accurately predicting the speed associated with the onset of an instability called flutter. Several methods have been developed with the goal of predicting flutter speeds to improve the efficiency of flight testing. These methods include (1) data-driven methods, in which one relies entirely on information obtained from the flight tests, and (2) model-based approaches, in which one relies on a combination of flight data and theoretical models. The data-driven methods include one based on extrapolation of damping trends, one that involves an envelope function, one that involves the Zimmerman-Weissenburger flutter margin, and one that involves a discrete-time auto-regressive model. An example of a model-based approach is that of the flutterometer. These methods have all been shown to be theoretically valid and have been demonstrated on simple test cases; however, until now, they have not been thoroughly evaluated in flight tests. An experimental apparatus called the Aerostructures Test Wing (ATW) was developed to test these prediction methods.
Improved Method for Prediction of Attainable Wing Leading-Edge Thrust
NASA Technical Reports Server (NTRS)
Carlson, Harry W.; McElroy, Marcus O.; Lessard, Wendy B.; McCullers, L. Arnold
1996-01-01
Prediction of the loss of wing leading-edge thrust and the accompanying increase in drag due to lift, when flow is not completely attached, presents a difficult but commonly encountered problem. A method (called the previous method) for the prediction of attainable leading-edge thrust and the resultant effect on airplane aerodynamic performance has been in use for more than a decade. Recently, the method has been revised to enhance its applicability to current airplane design and evaluation problems. The improved method (called the present method) provides for a greater range of airfoil shapes from very sharp to very blunt leading edges. It is also based on a wider range of Reynolds numbers than was available for the previous method. The present method, when employed in computer codes for aerodynamic analysis, generally results in improved correlation with experimental wing-body axial-force data and provides reasonable estimates of the measured drag.
A Machine Learning Method for Power Prediction on the Mobile Devices.
Chen, Da-Ren; Chen, You-Shyang; Chen, Lin-Chih; Hsu, Ming-Yang; Chiang, Kai-Feng
2015-10-01
Energy profiling and estimation have been popular areas of research in multicore mobile architectures. While short sequences of system calls have been recognized by machine learning as pattern descriptions for anomalous detection, power consumption of running processes with respect to system-call patterns are not well studied. In this paper, we propose a fuzzy neural network (FNN) for training and analyzing process execution behaviour with respect to series of system calls, parameters and their power consumptions. On the basis of the patterns of a series of system calls, we develop a power estimation daemon (PED) to analyze and predict the energy consumption of the running process. In the initial stage, PED categorizes sequences of system calls as functional groups and predicts their energy consumptions by FNN. In the operational stage, PED is applied to identify the predefined sequences of system calls invoked by running processes and estimates their energy consumption.
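The short system-call sequences used as pattern descriptors above can be sketched as n-grams over a process trace (the paper's actual FNN feature encoding is not detailed in the abstract):

```python
def syscall_ngrams(trace, n=3):
    """Slide a window of length n over a process's system-call trace,
    yielding the short call sequences used as pattern descriptors."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

trace = ["open", "read", "read", "write", "close"]
patterns = syscall_ngrams(trace)
# -> [("open", "read", "read"), ("read", "read", "write"), ("read", "write", "close")]
```

Counting the frequency of each such pattern per functional group would give the kind of per-process feature vector a power-estimation model could consume.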
Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I
2016-08-26
Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. 
They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.
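Concordance between discovery methods, as quantified above, reduces to set overlap. A small sketch with made-up SNP identifiers (the method labels echo the tools named in the abstract but the data are invented):

```python
def concordance(call_sets):
    """Given a dict mapping discovery-method name -> set of SNP
    identifiers, return (fraction of distinct SNPs called by more
    than one method, total number of distinct SNPs)."""
    all_snps = set().union(*call_sets.values())
    multi = {s for s in all_snps
             if sum(s in calls for calls in call_sets.values()) > 1}
    return len(multi) / len(all_snps), len(all_snps)

calls = {
    "GATK_BWA":     {"snp1", "snp2", "snp3", "snp5"},
    "GATK_BOWTIE2": {"snp2", "snp3", "snp6"},
    "NEWBLER":      {"snp3", "snp4"},
    "SWAP454":      {"snp4", "snp7"},
}
frac, total = concordance(calls)  # here 3 of 7 SNPs are multi-method
```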
Model-Based and Model-Free Pavlovian Reward Learning: Revaluation, Revision and Revelation
Dayan, Peter; Berridge, Kent C.
2014-01-01
Evidence supports at least two methods for learning about reward and punishment and making predictions for guiding actions. One method, called model-free, progressively acquires cached estimates of the long-run values of circumstances and actions from retrospective experience. The other method, called model-based, uses representations of the environment, expectations and prospective calculations to make cognitive predictions of future value. Extensive attention has been paid to both methods in computational analyses of instrumental learning. By contrast, although a full computational analysis has been lacking, Pavlovian learning and prediction has typically been presumed to be solely model-free. Here, we revise that presumption and review compelling evidence from Pavlovian revaluation experiments showing that Pavlovian predictions can involve their own form of model-based evaluation. In model-based Pavlovian evaluation, prevailing states of the body and brain influence value computations, and thereby produce powerful incentive motivations that can sometimes be quite new. We consider the consequences of this revised Pavlovian view for the computational landscape of prediction, response and choice. We also revisit differences between Pavlovian and instrumental learning in the control of incentive motivation. PMID:24647659
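The cached, model-free predictions discussed above can be illustrated with a minimal TD(0) value update (a textbook construction, not code from the paper):

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference (TD(0)) update of a cached value table:
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    Model-free learning stores only these long-run estimates and never
    consults a model of the environment."""
    v = V.get(s, 0.0)
    target = r + gamma * V.get(s_next, 0.0)
    V[s] = v + alpha * (target - v)
    return V[s]

V = {}
# repeated cue -> outcome pairings: "tone" leads to "food", which is rewarded
for _ in range(300):
    td_update(V, "tone", 0.0, "food")   # the cue itself is not rewarded
    td_update(V, "food", 1.0, "end")    # the outcome delivers reward
# V["tone"] converges toward gamma * V["food"]
```

A revaluation experiment exposes the limitation: if the food were devalued, the cached V["tone"] would persist until the cue was re-experienced, whereas a model-based evaluation could update the prediction immediately.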
Analysis of a virtual memory model for maintaining database views
NASA Technical Reports Server (NTRS)
Kinsley, Kathryn C.; Hughes, Charles E.
1992-01-01
This paper presents an analytical model for predicting the performance of a new support strategy for database views. This strategy, called the virtual method, is compared with traditional methods for supporting views. The analytical model's predictions of improved performance by the virtual method are then validated by comparing these results with those achieved in an experimental implementation.
NASA Technical Reports Server (NTRS)
Gramoll, K. C.; Dillard, D. A.; Brinson, H. F.
1989-01-01
In response to the tremendous growth in the development of advanced materials, such as fiber-reinforced plastic (FRP) composite materials, a new numerical method is developed to analyze and predict the time-dependent properties of these materials. Basic concepts in viscoelasticity, laminated composites, and previous viscoelastic numerical methods are presented. A stable numerical method, called the nonlinear differential equation method (NDEM), is developed to calculate the in-plane stresses and strains over any time period for a general laminate constructed from nonlinear viscoelastic orthotropic plies. The method is implemented in an in-plane stress analysis computer program, called VCAP, to demonstrate its usefulness and to verify its accuracy. A number of actual experimental test results performed on Kevlar/epoxy composite laminates are compared to predictions calculated from the numerical method.
Predicting chaos in memristive oscillator via harmonic balance method.
Wang, Xin; Li, Chuandong; Huang, Tingwen; Duan, Shukai
2012-12-01
This paper studies the possible chaotic behaviors in a memristive oscillator with cubic nonlinearities via the harmonic balance method, also called the describing function method. This method was originally proposed to detect chaos in the classical Chua's circuit. We first transform the considered memristive oscillator system into a Lur'e model and present the prediction of the existence of chaotic behaviors. To ensure the prediction result is correct, the distortion index is also measured. Numerical simulations are presented to show the effectiveness of the theoretical results.
An evidential link prediction method and link predictability based on Shannon entropy
NASA Astrophysics Data System (ADS)
Yin, Likang; Zheng, Haoyang; Bian, Tian; Deng, Yong
2017-09-01
Predicting missing links is of both theoretical value and practical interest in network science. In this paper, we empirically investigate a new similarity-based link prediction method and compare nine well-known local similarity measures on nine real networks. Most previous studies focus on accuracy; however, it is also crucial to consider link predictability as an intrinsic property of the network itself. Hence, this paper proposes a new link prediction approach called the evidential measure (EM), based on Dempster-Shafer theory. Moreover, we propose a new method to measure link predictability via local information and Shannon entropy.
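One of the well-known local similarity measures compared in such studies, common neighbors, can be sketched as follows (the paper's own evidential measure is not reproduced here):

```python
def common_neighbors_score(adj, u, v):
    """Local similarity index: the number of neighbors shared by
    nodes u and v. Higher score = more likely missing link."""
    return len(adj[u] & adj[v])

# a small undirected network as an adjacency dict
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
# score every currently non-adjacent pair
scores = {(u, v): common_neighbors_score(adj, u, v)
          for u in adj for v in adj
          if u < v and v not in adj[u]}
# only ("a", "d") is non-adjacent; it shares neighbors b and c
```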
Li, John; Maclehose, Rich; Smith, Kirk; Kaehler, Dawn; Hedberg, Craig
2011-01-01
Foodborne illness surveillance based on consumer complaints detects outbreaks by finding common exposures among callers, but this process is often difficult. Laboratory testing of ill callers could also help identify potential outbreaks. However, collection of stool samples from all callers is not feasible. Methods to help screen calls for etiology are needed to increase the efficiency of complaint surveillance systems and increase the likelihood of detecting foodborne outbreaks caused by Salmonella. Data from the Minnesota Department of Health foodborne illness surveillance database (2000 to 2008) were analyzed. Complaints with identified etiologies were examined to create a predictive model for Salmonella. Bootstrap methods were used to internally validate the model. Seventy-one percent of complaints in the foodborne illness database with known etiologies were due to norovirus. The predictive model had a good discriminatory ability to identify Salmonella calls. Three cutoffs for the predictive model were tested: one that maximized sensitivity, one that maximized specificity, and one that maximized predictive ability, providing sensitivities and specificities of 32 and 96%, 100 and 54%, and 89 and 72%, respectively. Development of a predictive model for Salmonella could help screen calls for etiology. The cutoff that provided the best predictive ability for Salmonella corresponded to a caller reporting diarrhea and fever with no vomiting, and five or fewer people ill. Screening calls for etiology would help identify complaints for further follow-up and result in identifying Salmonella cases that would otherwise go unconfirmed; in turn, this could lead to the identification of more outbreaks.
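The best-performing cutoff reported above corresponds to a simple decision rule that can be written directly. This is only the screening rule; the fitted predictive model itself is not given in the abstract:

```python
def likely_salmonella(diarrhea, fever, vomiting, n_ill):
    """Screen a foodborne-illness complaint call for Salmonella using
    the best-predictive cutoff reported above: diarrhea and fever,
    no vomiting, and five or fewer people ill (sensitivity 89%,
    specificity 72% in the study data)."""
    return diarrhea and fever and not vomiting and n_ill <= 5

assert likely_salmonella(True, True, False, 3)
assert not likely_salmonella(True, False, False, 3)  # no fever
assert not likely_salmonella(True, True, True, 3)    # vomiting (more norovirus-like)
assert not likely_salmonella(True, True, False, 12)  # more than five ill
```

Calls flagged by the rule would be prioritized for stool-sample follow-up rather than treated as confirmed cases.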
The Development of MST Test Information for the Prediction of Test Performances
ERIC Educational Resources Information Center
Park, Ryoungsun; Kim, Jiseon; Chung, Hyewon; Dodd, Barbara G.
2017-01-01
The current study proposes novel methods to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to the simulation results to demonstrate the…
Cetacean population density estimation from single fixed sensors using passive acoustics.
Küsel, Elizabeth T; Mellinger, David K; Thomas, Len; Marques, Tiago A; Moretti, David; Ward, Jessica
2011-06-01
Passive acoustic methods are increasingly being used to estimate animal population density. Most density estimation methods are based on estimates of the probability of detecting calls as functions of distance. Typically these are obtained using receivers capable of localizing calls or from studies of tagged animals. However, both approaches are expensive to implement. The approach described here uses a Monte Carlo model to estimate the probability of detecting calls from single sensors. The passive sonar equation is used to predict signal-to-noise ratios (SNRs) of received clicks, which are then combined with a detector characterization that predicts probability of detection as a function of SNR. Input distributions for source level, beam pattern, and whale depth are obtained from the literature. Acoustic propagation modeling is used to estimate transmission loss. Other inputs for density estimation are call rate, obtained from the literature, and false positive rate, obtained from manual analysis of a data sample. The method is applied to estimate density of Blainville's beaked whales over a 6-day period around a single hydrophone located in the Tongue of the Ocean, Bahamas. Results are consistent with those from previous analyses, which use additional tag data. © 2011 Acoustical Society of America
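The passive sonar equation step can be sketched as a Monte Carlo loop. All distributions and parameter values below are illustrative assumptions, not the paper's inputs; a simple spreading-plus-absorption law replaces the propagation model, and a hard SNR threshold stands in for the detector characterization:

```python
import math
import random

def received_snr(source_level, distance_m, noise_level, alpha_db_per_km=10.0):
    """Passive sonar equation SNR = SL - TL - NL, with transmission loss
    modeled as spherical spreading plus absorption:
    TL = 20*log10(r) + alpha*r. Absorption is large at beaked-whale
    click frequencies (tens of kHz), hence the ~10 dB/km default."""
    tl = 20.0 * math.log10(distance_m) + alpha_db_per_km * distance_m / 1000.0
    return source_level - tl - noise_level

def detection_probability(n_trials=10_000, threshold_db=10.0, seed=1):
    """Monte Carlo estimate of the probability of detecting a click at a
    single sensor, drawing source level, range, and noise level from
    assumed input distributions (all values are made up for this sketch)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sl = rng.gauss(200.0, 5.0)       # source level, dB re 1 uPa @ 1 m
        r = rng.uniform(100.0, 8000.0)   # slant range, m
        nl = rng.gauss(60.0, 3.0)        # noise level in the detection band, dB
        if received_snr(sl, r, nl) >= threshold_db:
            hits += 1
    return hits / n_trials

p = detection_probability()
```

In the paper the hard threshold is replaced by a measured probability-of-detection curve as a function of SNR, and transmission loss comes from a propagation model rather than a closed-form law.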
Wang, Yan; Ma, Guangkai; An, Le; Shi, Feng; Zhang, Pei; Lalush, David S.; Wu, Xi; Pu, Yifei; Zhou, Jiliu; Shen, Dinggang
2017-01-01
Objective: To obtain a high-quality positron emission tomography (PET) image with low-dose tracer injection, this study attempts to predict the standard-dose PET (S-PET) image from both its low-dose PET (L-PET) counterpart and the corresponding magnetic resonance imaging (MRI). Methods: This was achieved by patch-based sparse representation (SR), using training samples with a complete set of MRI, L-PET and S-PET modalities for dictionary construction. However, the number of training samples with complete modalities is often limited. In practice, many samples generally have incomplete modalities (i.e., with one or two missing modalities) and thus cannot be used in the prediction process. In light of this, we develop a semi-supervised tripled dictionary learning (SSTDL) method for S-PET image prediction, which can utilize not only the samples with complete modalities (called complete samples) but also the samples with incomplete modalities (called incomplete samples), to take advantage of the large number of available training samples and thus further improve the prediction performance. Results: Validation was done on a real human brain dataset consisting of 18 subjects, and the results show that our method is superior to the SR and other baseline methods. Conclusion: This work proposed a new S-PET prediction method, which can significantly improve the PET image quality with low-dose injection. Significance: The proposed method is favorable in clinical application since it can decrease the potential radiation risk for patients. PMID:27187939
ERIC Educational Resources Information Center
Akerson, Valarie L.; Carter, Ingrid S.; Park Rogers, Meredith A.; Pongsanon, Khemmawadee
2018-01-01
In this mixed methods study, the researchers developed a video-based measure called a "Prediction Assessment" to determine preservice elementary teachers' abilities to predict students' scientific reasoning. The instrument is based on teachers' need to develop pedagogical content knowledge for teaching science. Developing a knowledge…
Agnihotri, Samira; Sundeep, P. V. D. S.; Seelamantula, Chandra Sekhar; Balakrishnan, Rohini
2014-01-01
Objective identification and description of mimicked calls is a primary component of any study on avian vocal mimicry, but few studies have adopted a quantitative approach. We used spectral feature representations commonly used in human speech analysis, in combination with various distance metrics, to distinguish between mimicked and non-mimicked calls of the greater racket-tailed drongo, Dicrurus paradiseus, and cross-validated the results with human assessment of spectral similarity. We found that the automated method and human subjects performed similarly in terms of the overall number of correct matches of mimicked calls to putative model calls. However, the two methods also misclassified different subsets of calls, and we achieved a maximum accuracy of ninety-five percent only when we combined the results of both methods. This study is the first to use Mel-frequency Cepstral Coefficients and Relative Spectral Amplitude-filtered Linear Predictive Coding coefficients to quantify vocal mimicry. Our findings also suggest that in spite of several advances in automated methods of song analysis, corresponding cross-validation by humans remains essential. PMID:24603717
Brasil, Christiane Regina Soares; Delbem, Alexandre Claudio Botazzo; da Silva, Fernando Luís Barroso
2013-07-30
This article focuses on the development of an approach for ab initio protein structure prediction (PSP) without using any earlier knowledge from similar protein structures, such as fragment-based statistics or inference of secondary structures. Such an approach is called purely ab initio prediction. The article shows that well-designed multiobjective evolutionary algorithms can predict relevant protein structures in a purely ab initio way. One challenge for purely ab initio PSP is the prediction of structures with β-sheets. To work with such proteins, this research has also developed procedures to efficiently estimate hydrogen bond and solvation contribution energies. Considering van der Waals, electrostatic, hydrogen bond, and solvation contribution energies, PSP is a problem with four energetic terms to be minimized. Each interaction energy term can be considered an objective of an optimization method. Combinatorial problems with four objectives have been considered too complex for the available multiobjective optimization (MOO) methods. The proposed approach, called "Multiobjective evolutionary algorithms with many tables" (MEAMT), can efficiently deal with four objectives by combining them, performing a more adequate sampling of the objective space. Therefore, this method can better map the promising regions in this space, predicting structures in a purely ab initio way. In other words, MEAMT is an efficient optimization method for MOO, which simultaneously explores the search space and the objective space. MEAMT can predict structures with one or two domains with RMSDs comparable to values obtained by recently developed ab initio methods (GAPFCG, I-PAES, and Quark) that use different levels of earlier knowledge. Copyright © 2013 Wiley Periodicals, Inc.
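With the four energy terms treated as objectives to be minimized, the basic building block of any multiobjective evolutionary algorithm is the Pareto dominance test (a standard construction, not MEAMT itself):

```python
def dominates(u, v):
    """Pareto dominance for minimization: u dominates v if it is no worse
    in every objective and strictly better in at least one. Here the
    objectives would be the van der Waals, electrostatic, hydrogen-bond,
    and solvation energies of a candidate conformation."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

# two candidate conformations, energies in arbitrary units (invented numbers)
conf_a = (-120.0, -45.0, -30.0, -12.0)
conf_b = (-110.0, -45.0, -25.0, -10.0)
assert dominates(conf_a, conf_b)
assert not dominates(conf_b, conf_a)
```

Non-dominated conformations form the Pareto front that the algorithm's many tables are designed to sample more adequately.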
Analysis Methods and Models for Small Unit Operations
2006-07-01
...is used in other studies to indicate how operational effectiveness can be qualified and quantified...the node 'Prediction' is called a child of the node 'Success', and the node 'Success' is called a parent of the node 'Prediction'. Figure C.2: A simple...event A is a child of event B and event B is a child of event C (C -> B -> A). The belief network or influence diagram must be a directed network
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing
Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.
2015-01-01
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151
Aircraft Noise Prediction Program theoretical manual: Propeller aerodynamics and noise
NASA Technical Reports Server (NTRS)
Zorumski, W. E. (Editor); Weir, D. S. (Editor)
1986-01-01
The prediction sequence used in the Aircraft Noise Prediction Program (ANOPP) is described. The elements of the sequence are called program modules. The first group of modules analyzes the propeller geometry, the aerodynamics, including both potential and boundary-layer flow, the propeller performance, and the surface loading distribution. This group of modules is based entirely on aerodynamic strip theory. The next group of modules performs the noise prediction, using the output of the first group. Predictions of periodic thickness and loading noise are determined with time-domain methods. Broadband noise is predicted by a semiempirical method. Near-field predictions of fuselage surface pressures include the effects of boundary-layer refraction and scattering. Far-field predictions include atmospheric and ground effects.
Small Area Variance Estimation for the Siuslaw NF in Oregon and Some Results
S. Lin; D. Boes; H.T. Schreuder
2006-01-01
The results of a small area prediction study for the Siuslaw National Forest in Oregon are presented. Predictions were made for total basal area, number of trees and mortality per ha on a 0.85 mile grid using data on a 1.7 mile grid and additional ancillary information from TM. A reliable method of estimating prediction errors for individual plot predictions called the...
Poly-Omic Prediction of Complex Traits: OmicKriging
Wheeler, Heather E.; Aquino-Michaels, Keston; Gamazon, Eric R.; Trubetskoy, Vassily V.; Dolan, M. Eileen; Huang, R. Stephanie; Cox, Nancy J.; Im, Hae Kyung
2014-01-01
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods. PMID:24799323
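The core Kriging step translates omic similarity into phenotypic prediction. A minimal sketch assuming a generic BLUP-style predictor y_test = K_ts (K_tt + lam*I)^-1 y_train, with a small ridge term `lam` for numerical stability; this is a standard construction, not the OmicKriging package itself:

```python
import numpy as np

def kriging_predict(K, y, train_idx, test_idx, lam=1e-3):
    """Predict phenotypes for test individuals from an omic similarity
    (kernel) matrix K over all individuals. K_tt is train-train
    similarity, K_ts is test-train similarity; lam is an assumed
    ridge term for numerical stability."""
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_ts = K[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y[train_idx])
    return K_ts @ alpha

# toy genetic-similarity example: individuals 0-3 are training samples,
# individual 4 is constructed to resemble individuals 0 and 1, so its
# prediction is pulled toward their phenotypes
rng = np.random.default_rng(2)
G = rng.normal(size=(5, 50))                  # e.g. standardized genotypes
G[4] = 0.5 * (G[0] + G[1]) + 0.1 * rng.normal(size=50)
K = G @ G.T / 50                              # genetic relationship matrix
y = np.array([1.0, 1.0, -1.0, -1.0, 0.0])     # phenotypes (test value unused)
pred = kriging_predict(K, y, [0, 1, 2, 3], [4])
```

Integrating several data types, as the paper does, amounts to replacing K with a weighted sum of similarity matrices (genetic, transcriptomic, and so on).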
A simple method to predict regional fish abundance: an example in the McKenzie River Basin, Oregon
D.J. McGarvey; J.M. Johnston
2011-01-01
Regional assessments of fisheries resources are increasingly called for, but tools with which to perform them are limited. We present a simple method that can be used to estimate regional carrying capacity and apply it to the McKenzie River Basin, Oregon. First, we use a macroecological model to predict trout densities within small, medium, and large streams in the...
Recommendation Techniques for Drug-Target Interaction Prediction and Drug Repositioning.
Alaimo, Salvatore; Giugno, Rosalba; Pulvirenti, Alfredo
2016-01-01
The usage of computational methods in drug discovery is a common practice. More recently, by exploiting the wealth of biological knowledge bases, a novel approach called drug repositioning has emerged. Several computational methods are available that attempt a high-level integration of this knowledge in order to discover unknown mechanisms. In this chapter, we review drug-target interaction prediction methods based on a recommendation system. We also give some extensions which go beyond the bipartite network case.
Polyadenylation site prediction using PolyA-iEP method.
Kavakiotis, Ioannis; Tzanis, George; Vlahavas, Ioannis
2014-01-01
This chapter presents a method called PolyA-iEP that has been developed for the prediction of polyadenylation sites. More precisely, PolyA-iEP is a method that recognizes mRNA 3'ends which contain polyadenylation sites. It is a modular system which consists of two main components. The first exploits the advantages of emerging patterns and the second is a distance-based scoring method. The outputs of the two components are finally combined by a classifier. The final results reach very high scores of sensitivity and specificity.
Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching
2014-01-01
Post-translational modification (PTM) of transcription factors and chromatin remodelling proteins is recognized as a major mechanism of transcriptional regulation. Chromatin immunoprecipitation in combination with high-throughput sequencing (ChIP-seq) is the gold standard for studying the genome-wide binding sites of transcription factors (TFs) and has greatly improved our understanding of protein-DNA interactions on a genome-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and cannot simultaneously identify post-translationally modified TFs from ChIP-seq analysis, largely due to the widespread presence of multiple modified TFs. Using SUMO-1 modification as an example, we describe here an improved approach that allows the simultaneous identification of the genomic binding regions of all TFs carrying SUMO-1 modification. Traditional peak calling methods are inadequate for identifying multiple TF binding sites that span long genomic regions, so we designed a ChIP-seq processing pipeline that detects peaks via a combinatorial fusion method. We then annotate the peaks with known transcription factor binding sites (TFBSs) using the Transfac Matrix Database (v7.0), which predicts potentially SUMOylated TFs. The peak calling results were further analyzed based on promoter proximity, TFBS annotation, and a literature review, and were validated by ChIP real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs can be pinpointed using our pipeline. In summary, we present a methodology that analyzes SUMO-1 ChIP-seq patterns and predicts the related TFs. Our analysis uses three peak calling tools; fusing these different tools increases the precision of the peak calling results, and the TFBS annotation method predicts potentially SUMOylated TFs.
Here, we offer a new approach that enhances ChIP-seq data analysis and allows the simultaneous identification of multiple SUMOylated TF binding sites, which can then be utilized for prediction of other functional PTM binding sites in the future.
High-energy evolution to three loops
NASA Astrophysics Data System (ADS)
Caron-Huot, Simon; Herranen, Matti
2018-02-01
The Balitsky-Kovchegov equation describes the high-energy growth of gauge theory scattering amplitudes as well as nonlinear saturation effects which stop it. We obtain the three-loop corrections to the equation in planar N = 4 super Yang-Mills theory. Our method exploits a recently established equivalence with the physics of soft wide-angle radiation, so-called non-global logarithms, and thus yields at the same time the three-loop evolution equation for non-global logarithms. As a by-product of our analysis, we develop a Lorentz-covariant method to subtract infrared and collinear divergences in cross-section calculations in the planar limit. We compare our result in the linear regime with a recent prediction for the so-called Pomeron trajectory, and compare its collinear limit with predictions from the spectrum of twist-two operators.
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.
Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P
2015-08-18
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Prediction of competitive diffusion on complex networks
NASA Astrophysics Data System (ADS)
Zhao, Jiuhua; Liu, Qipeng; Wang, Lin; Wang, Xiaofan
2018-10-01
In this paper, we study the problem of predicting diffusion processes on complex networks under competition. Solving this problem would let competitors intervene in the diffusion process in time to steer it toward a desired outcome. We consider a model with two groups of competitors spreading opposite opinions on a network. A prediction method based on the mutual influences among the agents, called the Influence Matrix (IM) method, is proposed, and simulations on real-world networks show that it predicts both the preference of any normal agent and the final competition result with high accuracy. For comparison, classic centrality measures are also used to predict the competition result. PageRank, degree, Katz centrality, and the IM method all prove suitable for this task. More precisely, in undirected networks the IM method outperforms these centrality measures when the competing group contains more than one agent; in directed networks the IM method performs second only to PageRank.
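The Influence Matrix construction itself is not given in the abstract, but the centrality-based comparison baseline it is measured against is easy to sketch: compute PageRank by power iteration and predict the competing group with the larger total centrality as the winner. The group-sum decision rule here is an assumption for illustration, not the paper's exact procedure:

```python
import numpy as np

def pagerank(A, d=0.85, iters=100):
    """PageRank by power iteration on adjacency matrix A (floats)."""
    n = A.shape[0]
    col = A.sum(axis=0)
    col[col == 0] = 1.0          # dangling nodes: avoid divide-by-zero
    M = A / col                  # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (M @ r)
    return r

def predict_winner(A, group1, group2):
    """Centrality-sum heuristic: the group with higher total PageRank
    is predicted to win the competition (illustrative rule)."""
    r = pagerank(A)
    return 1 if r[group1].sum() >= r[group2].sum() else 2
```

On a star graph the hub accumulates the most PageRank, so a group seeded at the hub is predicted to beat a group seeded at a leaf.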
Predicting links based on knowledge dissemination in complex network
NASA Astrophysics Data System (ADS)
Zhou, Wen; Jia, Yifan
2017-04-01
Link prediction is the task of mining missing links in a network or predicting the next vertex pair to be connected by a link. Many link prediction methods have been inspired by the evolutionary processes of networks. In this paper, a new mechanism for the formation of complex networks called knowledge dissemination (KD) is proposed, under the assumption that knowledge disseminates through the paths of a network. Accordingly, a new link prediction method, knowledge dissemination based link prediction (KDLP), is proposed to test KD. KDLP characterizes vertex similarity based on knowledge quantity (KQ), which measures the importance of a vertex through its H-index. Extensive numerical simulations on six real-world networks demonstrate that KDLP achieves higher prediction accuracy than four well-known similarity measures: common neighbors, the local path index, average commute time, and the matrix forest index. Furthermore, based on the common view that an excellent link prediction method reveals a good evolving mechanism, the experimental results suggest that KD is a plausible evolving mechanism for the formation of complex networks.
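The abstract says KQ measures vertex importance through the H-index. A plain reading of that, a vertex's H-index over its neighbours' degrees, as in the so-called lobby index, can be sketched as follows; the paper's exact KQ definition may differ:

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for i, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= i:
            h = i
        else:
            break
    return h

def knowledge_quantity(adj, v):
    """Assumed KQ proxy: H-index of vertex v's neighbour degrees.
    adj: dict mapping each vertex to a list of its neighbours."""
    return h_index([len(adj[u]) for u in adj[v]])
```

In a triangle every vertex has two neighbours of degree two, so each vertex's score is 2; a hub attached to many degree-one leaves scores only 1, which is the intended penalty for low-quality neighbourhoods.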
HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.
Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar
2017-01-01
DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been applied to the problem of predicting them from sequence. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder, which uses monogram and bigram features extracted from the HMM profiles of protein sequences. To the best of our knowledge, this is the first application of HMM profile based features to the DNA-binding protein prediction problem. We use Support Vector Machines (SVM) as the classification technique in HMMBinder. Tested on standard benchmark datasets, our method outperforms the state-of-the-art methods found in the literature.
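Monogram and bigram features from an HMM profile are commonly defined as the column averages and the averaged products of consecutive profile rows; a sketch under that assumption follows (the paper's exact normalization may differ):

```python
import numpy as np

def monogram_bigram_features(profile):
    """Monogram + bigram features from an HMM profile.

    profile: (L, 20) array, one row of emission values per residue.
    Returns a (420,) vector: 20 monogram features (column means) followed
    by 400 bigram features (averaged outer products of consecutive rows).
    """
    L = profile.shape[0]
    mono = profile.mean(axis=0)                    # (20,)
    bi = (profile[:-1].T @ profile[1:]) / (L - 1)  # (20, 20)
    return np.concatenate([mono, bi.ravel()])
```

The fixed 420-dimensional vector is what makes variable-length sequences usable with an SVM, as in the paper's setup.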
A Generalized Approach for Measuring Relationships Among Genes.
Wang, Lijun; Ahsan, Md Asif; Chen, Ming
2017-07-21
Several methods for identifying relationships among pairs of genes have been developed. In this article, we present a generalized approach for measuring relationships between any pairs of genes, which is based on statistical prediction. We derive two particular versions of the generalized approach, least squares estimation (LSE) and nearest neighbors prediction (NNP). According to mathematical proof, LSE is equivalent to the methods based on correlation; and NNP is approximate to one popular method called the maximal information coefficient (MIC) according to the performances in simulations and real dataset. Moreover, the approach based on statistical prediction can be extended from two-genes relationships to multi-genes relationships. This application would help to identify relationships among multi-genes.
Toward a standard in structural genome annotation for prokaryotes
Tripp, H. James; Sutton, Granger; White, Owen; ...
2015-07-25
In an effort to identify the best practice for finding genes in prokaryotic genomes and propose it as a standard for automated annotation pipelines, we collected 1,004,576 peptides from various publicly available resources, and these were used as a basis to evaluate various gene-calling methods. The peptides came from 45 bacterial replicons with average GC content ranging from 31% to 74%, biased toward higher GC content genomes. Automated, manual, and semi-manual methods were used to tally errors in three widely used gene calling methods, as evidenced by peptides mapped outside the boundaries of called genes. We found that the consensus set of identical genes predicted by the three methods constitutes only about 70% of the genes predicted by each individual method (with start and stop required to coincide). Peptide data was useful for evaluating some of the differences between gene callers, but not reliable enough to make the results conclusive, due to limitations inherent in any proteogenomic study. A single, unambiguous, unanimous best practice did not emerge from this analysis, since the available proteomics data were not adequate to provide an objective measurement of differences in accuracy between these methods. However, as a result of this study, software, reference data, and procedures have been better matched among participants, representing a step toward a much-needed standard. In the absence of sufficient experimental data to achieve a universal standard, our recommendation is that any of these methods can be used by the community, as long as a single method is employed across all datasets to be compared.
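The ~70% consensus figure comes from intersecting the callers' gene sets with identical coordinates; that bookkeeping is easy to sketch. Representing a gene call as a (start, stop, strand) tuple is an assumption for illustration:

```python
def consensus_fraction(callers):
    """Fraction of each caller's gene calls predicted identically by all callers.

    callers: list of collections of (start, stop, strand) gene calls,
             one collection per gene-calling method. Start and stop must
             coincide exactly for a call to count as consensus.
    """
    sets = [set(c) for c in callers]
    consensus = set.intersection(*sets)
    return [len(consensus) / len(s) for s in sets]
```

With three toy callers agreeing on two of their calls, the per-caller consensus fractions fall out directly.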
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Wang, Bingkun; Huang, Yongfeng; Li, Xing
2016-01-01
E-commerce is developing rapidly, and learning from the myriad reviews posted by online customers has become crucial to success, which calls for ever greater accuracy in sentiment classification of these reviews. Fine-grained review rating prediction is therefore preferred over rough binary sentiment classification. Current review rating prediction methods fall into two main types. The first is based on review text content, focusing almost exclusively on textual content and seldom relating reviews to the reviewers and items remarked on in other relevant reviews. The second is based on collaborative filtering, extracting information from previous records in the reviewer-item rating matrix while ignoring review text content. Here we propose a framework for review rating prediction that effectively combines the two, and we further propose three specific methods under this framework. Experiments on two movie review datasets demonstrate that our framework outperforms previous methods. PMID:26880879
Predicting β-turns and their types using predicted backbone dihedral angles and secondary structures
Kountouris, Petros; Hirst, Jonathan D
2010-07-31
β-turns are secondary structure elements usually classified as coil. Their prediction is important because of their role in protein folding and their frequent occurrence in protein chains. We have developed a novel method that predicts β-turns and their types using information from multiple sequence alignments, predicted secondary structures and, for the first time, predicted dihedral angles. Our method uses support vector machines, a supervised classification technique, and is trained and tested on three established datasets of 426, 547 and 823 protein chains. We achieve a Matthews correlation coefficient of up to 0.49 when predicting the location of β-turns, the highest reported value to date. Moreover, the additional dihedral information improves the prediction of β-turn types I, II, IV, VIII and "non-specific", achieving correlation coefficients up to 0.39, 0.33, 0.27, 0.14 and 0.38, respectively. Our results are more accurate than other methods. We have created an accurate predictor of β-turns and their types. Our method, called DEBT, is available online at http://comp.chem.nottingham.ac.uk/debt/. PMID:20673368
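The Matthews correlation coefficient reported here (up to 0.49) balances all four confusion-matrix cells, which matters because β-turns are a minority class where raw accuracy is misleading. For reference, it is computed as:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    returns 0.0 when any marginal is zero (the usual convention)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```

A perfect classifier scores 1, an inverted one scores -1, and coin-flip behaviour scores 0 regardless of class balance.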
PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction
Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.
2008-01-01
A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Reliable prediction intervals with regression neural networks.
Papadopoulos, Harris; Haralambous, Haris
2011-10-01
This paper proposes an extension to conventional regression neural networks (NNs) for replacing the point predictions they produce with prediction intervals that satisfy a required level of confidence. Our approach follows a novel machine learning framework, called Conformal Prediction (CP), for assigning reliable confidence measures to predictions without assuming anything more than that the data are independent and identically distributed (i.i.d.). We evaluate the proposed method on four benchmark datasets and on the problem of predicting Total Electron Content (TEC), which is an important parameter in trans-ionospheric links; for the latter we use a dataset of more than 60000 TEC measurements collected over a period of 11 years. Our experimental results show that the prediction intervals produced by our method are both well calibrated and tight enough to be useful in practice. Copyright © 2011 Elsevier Ltd. All rights reserved.
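The paper builds on inductive conformal prediction for neural networks; the core split-conformal recipe for wrapping any point predictor's outputs in intervals with the required coverage can be sketched generically. This omits the paper's normalized nonconformity measures, so it is a baseline sketch rather than their method:

```python
import numpy as np

def conformal_intervals(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split-conformal prediction intervals around point predictions.

    cal_pred, cal_true : predictions and targets on a held-out calibration set
    test_pred          : point predictions to wrap in intervals
    alpha              : miscoverage rate (0.1 -> ~90% coverage), valid
                         under the i.i.d. assumption the abstract states
    """
    scores = np.abs(cal_true - cal_pred)        # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))     # conservative quantile index
    q = np.sort(scores)[min(k, n) - 1]
    return test_pred - q, test_pred + q
```

If every calibration residual is 1.0, the 90% interval is simply the point prediction plus or minus 1, which makes the quantile bookkeeping easy to verify.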
The Application of FIA-based Data to Wildlife Habitat Modeling: A Comparative Study
Thomas C., Jr. Edwards; Gretchen G. Moisen; Tracey S. Frescino; Randall J. Schultz
2005-01-01
We evaluated the capability of two types of models, one based on spatially explicit variables derived from FIA data and one using so-called traditional habitat evaluation methods, for predicting the presence of cavity-nesting bird habitat in Fishlake National Forest, Utah. Both models performed equally well, in measures of predictive accuracy, with the FIA-based model...
NASA Astrophysics Data System (ADS)
Di Pasquale, Nicodemo; Davie, Stuart J.; Popelier, Paul L. A.
2018-06-01
Using the machine learning method kriging, we predict the energies of atoms in ion-water clusters, consisting of either Cl- or Na+ surrounded by a number of water molecules (i.e., without Na+Cl- interaction). These atomic energies are calculated following the topological energy partitioning method called Interacting Quantum Atoms (IQAs). Kriging predicts atomic properties (in this case IQA energies) by a model that has been trained over a small set of geometries with known property values. The results presented here are part of the development of an advanced type of force field, called FFLUX, which offers quantum mechanical information to molecular dynamics simulations without the limiting computational cost of ab initio calculations. The results reported for the prediction of the IQA components of the energy in the test set exhibit an accuracy of a few kJ/mol, corresponding to an average error of less than 5%, even when a large cluster of water molecules surrounding an ion is considered. Ions represent an important chemical system and this work shows that they can be correctly taken into account in the framework of the FFLUX force field.
Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi
2013-04-10
Sequencing of microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried on genomic islands, so a quick genomic island (GI) prediction method is useful for ongoing genome sequencing projects. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which returns functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information from their draft genomes, including possible GIs, coding/non-coding sequences, and functional analysis. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
Jelínek, Jan; Škoda, Petr; Hoksza, David
2017-12-06
Protein-protein interactions (PPIs) play a key role in many biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method, called INSPiRE, which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank, which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
THE PRACTICE OF STRUCTURE ACTIVITY RELATIONSHIPS (SAR) IN TOXICOLOGY
Both qualitative and quantitative modeling methods relating chemical structure to biological activity, called structure-activity relationship analyses or SAR, are applied to the prediction and characterization of chemical toxicity. This minireview will discuss some generic issue...
NASA Astrophysics Data System (ADS)
Wang, Hongcui; Kawahara, Tatsuya
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, achieving high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers, remains a challenge. Conventionally, possible error patterns based on linguistic knowledge are added to the lexicon and language model, or to the ASR grammar network, but this approach quickly runs into a trade-off between coverage of errors and increased perplexity. To solve the problem, we propose a decision-tree method that learns to predict the errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network for a given target sentence, achieving both better error coverage and lower perplexity, and resulting in a significant improvement in ASR accuracy.
BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.
Kaur, Harpreet; Raghava, G P S
2002-03-01
beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/
NASA Astrophysics Data System (ADS)
Feng, Shou; Fu, Ping; Zheng, Wenbin
2018-03-01
Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When local-approach methods are used to solve this problem, a method for processing the preliminary results is usually needed. This paper proposes a novel preliminary results processing method called the nodes interaction method, which revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. The method exploits label dependency and considers the hierarchical interaction between nodes when making decisions based on a Bayesian network in its first phase. In the second phase, it further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also improves HMC performance for gene function prediction based on the Gene Ontology (GO), whose hierarchy is a directed acyclic graph and therefore more difficult to tackle. The experimental results validate the promising performance of the proposed method compared to state-of-the-art methods on eight benchmark yeast data sets annotated by the GO.
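The hierarchy constraint the abstract refers to requires that a node's predicted probability never exceed any ancestor's (a gene annotated to a GO term is implicitly annotated to all its parents). A common way to enforce it over a DAG is to cap each node by its ancestors in topological order; this is a generic consistency sketch, not the paper's Bayesian-network first phase:

```python
from collections import deque

def enforce_hierarchy(probs, parents):
    """Make preliminary probabilities consistent with the hierarchy constraint.

    probs   : dict node -> preliminary probability
    parents : dict node -> list of parent nodes (DAG; roots map to [])
    Each node's probability is capped by the minimum over its ancestors,
    processed in a topological order computed here.
    """
    children = {n: [] for n in probs}
    indeg = {n: 0 for n in probs}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)
            indeg[n] += 1
    out = dict(probs)
    queue = deque(n for n in probs if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        for c in children[n]:
            out[c] = min(out[c], out[n])   # child cannot exceed parent
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return out
```

A child scored above its parent gets capped; consistent nodes pass through unchanged.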
Variable context Markov chains for HIV protease cleavage site prediction.
Oğul, Hasan
2009-06-01
Deciphering HIV protease specificity and developing computational tools for detecting its cleavage sites in polypeptide chains are highly desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model to predict their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify context equivalence based on the evolutionary similarities between individual amino acids. Applied to the HIV-1 protease cleavage site prediction problem, it outperforms existing methods in prediction accuracy on a common dataset. In general, the method is a promising tool for predicting the cleavage sites of all proteases and could also be applied to other peptide classification problems.
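VCMC generalizes variable-order Markov chains by merging contexts according to amino-acid similarity; the fixed-order baseline it builds on can be sketched as a smoothed Markov scorer trained on positive (cleavable) peptides. The alphabet and the add-alpha smoothing are assumptions for illustration:

```python
import math
from collections import defaultdict

def train_markov(seqs, order=1, alpha=1.0, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Train a smoothed order-k Markov chain on peptide sequences and
    return a log-probability scorer for new sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def log_prob(seq):
        lp = 0.0
        for i in range(order, len(seq)):
            ctx, sym = seq[i - order:i], seq[i]
            c = counts[ctx]
            total = sum(c.values())
            # add-alpha smoothing over the amino-acid alphabet
            lp += math.log((c[sym] + alpha) / (total + alpha * len(alphabet)))
        return lp

    return log_prob
```

Sequences resembling the training peptides score a higher log-probability than dissimilar ones, which is the basis for thresholding into cleavable vs non-cleavable.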
Intelligent monitoring and control of semiconductor manufacturing equipment
NASA Technical Reports Server (NTRS)
Murdock, Janet L.; Hayes-Roth, Barbara
1991-01-01
The use of AI methods to monitor and control semiconductor fabrication in a state-of-the-art manufacturing environment called the Rapid Thermal Multiprocessor is described. Semiconductor fabrication involves many complex processing steps with limited opportunities to measure process and product properties. By applying additional process and product knowledge to that limited data, AI methods augment classical control methods by detecting abnormalities and trends, predicting failures, diagnosing, planning corrective action sequences, explaining diagnoses or predictions, and reacting to anomalous conditions that classical control systems typically would not correct. Research methodology and issues are discussed, and two diagnosis scenarios are examined.
Ligand-protein docking using a quantum stochastic tunneling optimization method.
Mancera, Ricardo L; Källblad, Per; Todorov, Nikolay P
2004-04-30
A novel hybrid optimization method called quantum stochastic tunneling has recently been introduced. Here, we report its implementation within a new docking program called EasyDock and a validation with the CCDC/Astex data set of ligand-protein complexes, using the PLP score to represent the ligand-protein potential energy surface and ScreenScore to score the ligand-protein binding energies. When taking the top energy-ranked ligand binding mode pose, we were able to predict the correct crystallographic ligand binding mode in up to 75% of the cases. By using this novel optimization method, run times for typical docking simulations are significantly shortened. Copyright 2004 Wiley Periodicals, Inc. J Comput Chem 25: 858-864, 2004
NASA Astrophysics Data System (ADS)
Toropov, Andrey A.; Toropova, Alla P.
2018-06-01
A predictive model of logP for Pt(II) and Pt(IV) complexes, built with the Monte Carlo method using the CORAL software, has been validated with six different splits into training and validation sets. For all six splits, the predictive potential of the models was improved using the so-called index of ideality of correlation. The suggested models make it possible to extract the molecular features that cause an increase or, conversely, a decrease of logP.
Improved Method for Linear B-Cell Epitope Prediction Using Antigen’s Primary Sequence
Raghava, Gajendra P. S.
2013-01-01
One of the major challenges in designing a peptide-based vaccine is the identification of antigenic regions in an antigen that can stimulate a B-cell response, also called B-cell epitopes. In the past, several methods have been developed for the prediction of conformational and linear (or continuous) B-cell epitopes. However, the existing methods for predicting linear B-cell epitopes are far from perfect. In this study, an attempt has been made to develop an improved method for predicting linear B-cell epitopes. We retrieved experimentally validated B-cell epitopes as well as non-B-cell epitopes from the Immune Epitope Database and derived two types of datasets, called the Lbtope_Variable and Lbtope_Fixed length datasets. The Lbtope_Variable dataset contains 14876 B-cell epitopes and 23321 non-epitopes of variable length, whereas the Lbtope_Fixed length dataset contains 12063 B-cell epitopes and 20589 non-epitopes of fixed length. We also evaluated the performance of models on the above datasets after removing highly identical peptides. In addition, we derived a third dataset, Lbtope_Confirm, comprising 1042 epitopes and 1795 non-epitopes, where each epitope or non-epitope has been experimentally validated in at least two studies. A number of models have been developed to discriminate epitopes from non-epitopes using different machine-learning techniques such as Support Vector Machine and K-Nearest Neighbor. We achieved accuracies from ∼54% to 86% using diverse features such as binary profile, dipeptide composition, and AAP (amino acid pair) profile. In this study, for the first time, experimentally validated non-B-cell epitopes have been used for developing a method for predicting linear B-cell epitopes; in previous studies, random peptides were used as non-B-cell epitopes. In order to provide a service to the scientific community, a web server, LBtope, has been developed for predicting and designing B-cell epitopes (http://crdd.osdd.net/raghava/lbtope/). PMID:23667458
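Of the features listed above, the dipeptide composition is easy to make concrete: a 400-dimensional vector of normalized dipeptide frequencies. The sketch below shows only this feature-extraction step; the SVM or k-NN model that consumes the vectors is omitted.

```python
# Sketch of the dipeptide-composition feature used by LBtope-style models:
# a 400-dimensional vector of dipeptide fractions over the 20 amino acids.
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AA, repeat=2)]  # 400 pairs

def dipeptide_composition(peptide):
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(peptide) - 1):
        counts[peptide[i:i + 2]] += 1
    n = max(len(peptide) - 1, 1)
    return [counts[dp] / n for dp in DIPEPTIDES]  # fractions sum to 1

vec = dipeptide_composition("ACDACD")
print(len(vec))  # 400
```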
How the environment shapes animal signals: a test of the acoustic adaptation hypothesis in frogs.
Goutte, S; Dubois, A; Howard, S D; Márquez, R; Rowley, J J L; Dehling, J M; Grandcolas, P; Xiong, R C; Legendre, F
2018-01-01
Long-distance acoustic signals are widely used in animal communication systems and, in many cases, are essential for reproduction. The acoustic adaptation hypothesis (AAH) implies that acoustic signals should be selected for further transmission and better content integrity under the acoustic constraints of the habitat in which they are produced. In this study, we test predictions derived from the AAH in frogs. Specifically, we focus on the difference between torrent frogs and frogs calling in less noisy habitats. Torrents produce sounds that can mask frog vocalizations and constitute a major acoustic constraint on call evolution. We combine data collected in the field, material from scientific collections and the literature for a total of 79 primarily Asian species, of the families Ranidae, Rhacophoridae, Dicroglossidae and Microhylidae. Using phylogenetic comparative methods and including morphological and environmental potential confounding factors, we investigate putatively adaptive call features in torrent frogs. We use broad habitat categories as well as fine-scale habitat measurements and test their correlation with six call characteristics. We find mixed support for the AAH. Spectral features of torrent frog calls are different from those of frogs calling in other habitats and are related to ambient noise levels, as predicted by the AAH. However, temporal call features do not seem to be shaped by the frogs' calling habitats. Our results underline both the complexity of call evolution and the need to consider multiple factors when investigating this issue. © 2017 European Society For Evolutionary Biology.
Linear reduction methods for tag SNP selection.
He, Jingwu; Zelikovsky, Alex
2004-01-01
It is widely hoped that constructing a complete human haplotype map will help to associate complex diseases with certain SNPs. Unfortunately, the number of SNPs is huge, and it is very costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that must be sequenced to a considerably smaller number of informative representatives, so-called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. Our method is purely combinatorial and can be combined with linkage disequilibrium (LD) and block-based methods. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs linearly predicted from linearly chosen tag SNPs. We obtain extremely good compression and prediction rates. For example, for long haplotypes (>25000 SNPs), knowing only 0.4% of all SNPs, we predict the entire unknown haplotype with a 2% error rate, while the prediction method is based on a 10% sample of the population.
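The linear-algebra idea can be sketched as follows, under illustrative assumptions (a greedy rank-based column choice and least-squares reconstruction, which may differ in detail from the authors' algorithm): tag SNPs are columns that span the haplotype matrix, and every other SNP is recovered as a linear combination of them.

```python
# Sketch of linear-algebra tag SNP selection: keep columns (SNPs) that are
# linearly independent across the sample, then reconstruct the remaining
# SNPs as linear combinations of the tags. Rounding to integers stands in
# for the paper's prediction step.
import numpy as np

def select_tags(haplotypes):
    """Greedily keep SNP columns that raise the matrix rank."""
    tags = []
    for j in range(haplotypes.shape[1]):
        cand = haplotypes[:, tags + [j]]
        if np.linalg.matrix_rank(cand) > len(tags):
            tags.append(j)
    return tags

def predict(haplotypes, tags, tag_values):
    """Least-squares fit: express every SNP from the tag SNPs."""
    coef, *_ = np.linalg.lstsq(haplotypes[:, tags], haplotypes, rcond=None)
    return np.rint(tag_values @ coef).astype(int)

# Toy population: columns 2 and 3 duplicate columns 0 and 1
H = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
tags = select_tags(H)               # the first two columns suffice here
full = predict(H, tags, H[0, tags])  # recover individual 0's haplotype
print(tags, full)
```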
On some methods for assessing earthquake predictions
NASA Astrophysics Data System (ADS)
Molchan, G.; Romashkova, L.; Peresan, A.
2017-09-01
A regional approach to the problem of assessing earthquake predictions inevitably faces a deficit of data. We point out some basic limits of assessment methods reported in the literature, considering the practical case of the performance of the CN pattern recognition method in the prediction of large Italian earthquakes. Along with classical hypothesis testing, a new game approach, the so-called parimutuel gambling (PG) method, is examined. The PG, originally proposed for the evaluation of probabilistic earthquake forecasts, has recently been adapted for the case of 'alarm-based' CN prediction. The PG approach is a non-standard method; therefore it deserves careful examination and theoretical analysis. We show that the alarm-based PG version leads to an almost complete loss of information about predicted earthquakes (even for a large sample). As a result, any conclusions based on the alarm-based PG approach are not to be trusted. We also show that the original probabilistic PG approach does not necessarily identify the genuine forecast correctly among competing seismicity rate models, even when applied to extensive data.
Fluid mechanics of slurry flow through the grinding media in ball mills
DOE Office of Scientific and Technical Information (OSTI.GOV)
Songfack, P.K.; Rajamani, R.K.
1995-12-31
The slurry transport within the ball mill greatly influences the mill holdup, residence time, breakage rate, and hence the power draw and the particle size distribution of the mill product. However, residence-time distribution and holdup in industrial mills could not be predicted a priori. Indeed, it is impossible to determine the slurry loading in continuously operating mills by direct measurement, especially in industrial mills. In this paper, the slurry transport problem is solved using the principles of fluid mechanics. First, the motion of the ball charge and its expansion are predicted by a technique called the discrete element method. Then the slurry flow through the porous ball charge is tackled with a fluid-flow technique called the marker and cell method. This may be the only numerical technique capable of tracking the slurry free surface as it fluctuates with the motion of the ball charge. The result is a prediction of the slurry profile in both the radial and axial directions. Hence, it leads to a detailed description of slurry mass and ball charge within the mill. The model predictions are verified with pilot-scale experimental work. This novel approach based on the physics of fluid flow is devoid of any empiricism. It is shown that the holdup of industrial mills at a given feed percent solids can be predicted successfully.
Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.
Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin
2016-06-07
RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .
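As a hedged illustration of one of the descriptors, the sketch below builds a triplet propensity table as the log-odds of a residue triplet at binding interfaces versus the background; the paper's exact definition of the triplet interface propensity may differ, and the sequences and pseudocounts here are toy values.

```python
# Sketch of a triplet interface propensity table: the log-odds of a residue
# triplet appearing at RNA-binding interfaces versus anywhere in the
# training proteins. Definition and pseudocount are illustrative.
import math
from collections import Counter

def triplet_propensity(interface_seqs, all_seqs):
    def triplets(seqs):
        c = Counter()
        for s in seqs:
            for i in range(len(s) - 2):
                c[s[i:i + 3]] += 1
        return c
    iface, background = triplets(interface_seqs), triplets(all_seqs)
    n_i, n_b = sum(iface.values()), sum(background.values())
    return {t: math.log(((iface[t] + 1) / (n_i + 1)) /
                        ((background[t] + 1) / (n_b + 1)))
            for t in background}

prop = triplet_propensity(["RKR", "KRK"], ["RKRAG", "KRKAA", "GGGGG"])
# Arginine/lysine-rich triplets score above background-only triplets
print(prop["RKR"] > prop["GGG"])
```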
NASA Technical Reports Server (NTRS)
Zorumski, W. E.
1983-01-01
Analytic propeller noise prediction involves a sequence of computations culminating in the application of acoustic equations. The prediction sequence currently used by NASA in its ANOPP (aircraft noise prediction) program is described. The elements of the sequence are called program modules. The first group of modules analyzes the propeller geometry, the aerodynamics, including both potential and boundary-layer flow, the propeller performance, and the surface loading distribution. This group of modules is based entirely on aerodynamic strip theory. The next group of modules deals with the actual noise prediction, based on data from the first group. Deterministic predictions of periodic thickness and loading noise are made using Farassat's time-domain methods. Broadband noise is predicted by the semi-empirical Schlinker-Amiet method. Near-field predictions of fuselage surface pressures include the effects of boundary-layer refraction and (for a cylinder) scattering. Far-field predictions include atmospheric and ground effects. Experimental data from subsonic and transonic propellers are compared with predictions, and NASA's future directions in propeller noise technology development are indicated.
In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images.
Christiansen, Eric M; Yang, Samuel J; Ando, D Michael; Javaherian, Ashkan; Skibinski, Gaia; Lipnick, Scott; Mount, Elliot; O'Neil, Alison; Shah, Kevan; Lee, Alicia K; Goyal, Piyush; Fedus, William; Poplin, Ryan; Esteva, Andre; Berndl, Marc; Rubin, Lee L; Nelson, Philip; Finkbeiner, Steven
2018-04-19
Microscopy is a central method in life sciences. Many popular methods, such as antibody labeling, are used to add physical fluorescent labels to specific cellular constituents. However, these approaches have significant drawbacks, including inconsistency; limitations in the number of simultaneous labels because of spectral overlap; and necessary perturbations of the experiment, such as fixing the cells, to generate the measurement. Here, we show that a computational machine-learning approach, which we call "in silico labeling" (ISL), reliably predicts some fluorescent labels from transmitted-light images of unlabeled fixed or live biological samples. ISL predicts a range of labels, such as those for nuclei, cell type (e.g., neural), and cell state (e.g., cell death). Because prediction happens in silico, the method is consistent, is not limited by spectral overlap, and does not disturb the experiment. ISL generates biological measurements that would otherwise be problematic or impossible to acquire. Copyright © 2018 Elsevier Inc. All rights reserved.
Synthesized airfoil data method for prediction of dynamic stall and unsteady airloads
NASA Technical Reports Server (NTRS)
Gangwani, S. T.
1983-01-01
A detailed analysis of dynamic stall experiments has led to a set of relatively compact analytical expressions, called synthesized unsteady airfoil data, which accurately describe in the time-domain the unsteady aerodynamic characteristics of stalled airfoils. An analytical research program was conducted to expand and improve this synthesized unsteady airfoil data method using additional available sets of unsteady airfoil data. The primary objectives were to reduce these data to synthesized form for use in rotor airload prediction analyses and to generalize the results. Unsteady drag data were synthesized which provided the basis for successful expansion of the formulation to include computation of the unsteady pressure drag of airfoils and rotor blades. Also, an improved prediction model for airfoil flow reattachment was incorporated in the method. Application of this improved unsteady aerodynamics model has resulted in an improved correlation between analytic predictions and measured full scale helicopter blade loads and stress data.
NASA Astrophysics Data System (ADS)
Iwasaki, Ryosuke; Takagi, Ryo; Tomiyasu, Kentaro; Yoshizawa, Shin; Umemura, Shin-ichiro
2017-07-01
Targeting of the ultrasound beam and advance prediction of thermal lesion formation are requirements for monitoring high-intensity focused ultrasound (HIFU) treatment with safety and reproducibility. To visualize the HIFU focal zone, we utilized an acoustic radiation force impulse (ARFI) imaging-based method. After inducing displacements inside tissues with pulsed HIFU, called the push pulse exposure, the distribution of axial displacements started expanding and moving. To acquire RF data immediately after and during the HIFU push pulse exposure and thereby improve prediction accuracy, we attempted methods using extrapolation estimation and HIFU noise elimination. The distributions traced back in the time domain from the end of the push pulse exposure are in good agreement with tissue coagulation at the center. The results suggest that the proposed focal zone visualization, employing pulsed HIFU together with the high-speed ARFI imaging method, is useful for predicting thermal coagulation in advance.
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach, in which a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called the decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART), a population decision tree method, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision-path models is a new approach to predictive modeling that can perform better than a population approach. PMID:26098570
Ching, Travers; Zhu, Xun; Garmire, Lana X
2018-04-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high-throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival, and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high-throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
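The survival objective behind a Cox-nnet-style network can be sketched as the negative log partial likelihood, with the network's scalar output playing the role of the log hazard ratio. The implementation below is a naive stand-in (no handling of tied event times), and the data are illustrative.

```python
# Sketch of the Cox partial-likelihood loss that a Cox-nnet-style network
# minimizes: theta[i] is the predicted log-risk for patient i. A precomputed
# theta stands in for the network's forward pass.
import math

def neg_log_partial_likelihood(theta, times, events):
    """times[i]: follow-up time; events[i]: 1 if the event was observed."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    loss = 0.0
    for pos, i in enumerate(order):
        if events[i]:
            # risk set: everyone still under observation at times[i]
            risk = sum(math.exp(theta[j]) for j in order[pos:])
            loss -= theta[i] - math.log(risk)
    return loss

# Assigning higher risk to the patient who dies first gives a lower loss
good = neg_log_partial_likelihood([2.0, 0.0, -1.0], [1, 2, 3], [1, 1, 0])
bad = neg_log_partial_likelihood([-1.0, 0.0, 2.0], [1, 2, 3], [1, 1, 0])
print(good < bad)
```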
Linear reduction method for predictive and informative tag SNP selection.
He, Jingwu; Westbrooks, Kelly; Zelikovsky, Alexander
2005-01-01
Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs the proposed linear reduction method predicts an unknown haplotype with the error rate below 2% based on 10% of the population.
NASA Technical Reports Server (NTRS)
Kao, G. C.
1973-01-01
Method has been developed for predicting interaction between components and corresponding support structures subjected to acoustic excitations. Force environments determined in spectral form are called force spectra. Force-spectra equation is determined based on one-dimensional structural impedance model.
ExpoCast: Exposure Science for Prioritization and Toxicity Testing (S)
The US EPA is completing the Phase I pilot for a chemical prioritization research program, called ToxCast. Here EPA is developing methods for using computational chemistry, high-throughput screening, and toxicogenomic technologies to predict potential toxicity and prioritize limi...
ExpoCast: Exposure Science for Prioritization and Toxicity Testing
The US EPA is completing the Phase I pilot for a chemical prioritization research program, called ToxCastTM. Here EPA is developing methods for using computational chemistry, high-throughput screening, and toxicogenomic technologies to predict potential toxicity and prioritize l...
Pitchers, W. R.; Brooks, R.; Jennions, M. D.; Tregenza, T.; Dworkin, I.; Hunt, J.
2013-01-01
Phenotypic integration and plasticity are central to our understanding of how complex phenotypic traits evolve. Evolutionary change in complex quantitative traits can be predicted using the multivariate breeders’ equation, but such predictions are only accurate if the matrices involved are stable over evolutionary time. Recent work, however, suggests that these matrices are temporally plastic, spatially variable and themselves evolvable. The data available on phenotypic variance-covariance matrix (P) stability is sparse, and largely focused on morphological traits. Here we compared P for the structure of the complex sexual advertisement call of six divergent allopatric populations of the Australian black field cricket, Teleogryllus commodus. We measured a subset of calls from wild-caught crickets from each of the populations and then a second subset after rearing crickets under common-garden conditions for three generations. In a second experiment, crickets from each population were reared in the laboratory on high- and low-nutrient diets and their calls recorded. In both experiments, we estimated P for call traits and used multiple methods to compare them statistically (Flury hierarchy, geometric subspace comparisons and random skewers). Despite considerable variation in means and variances of individual call traits, the structure of P was largely conserved among populations, across generations and between our rearing diets. Our finding that P remains largely stable, among populations and between environmental conditions, suggests that selection has preserved the structure of call traits in order that they can function as an integrated unit. PMID:23530814
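One of the matrix-comparison methods named above, random skewers, is simple to sketch: apply the same random selection gradients (skewers) to both covariance matrices, as in the multivariate breeders' equation, and correlate the predicted response vectors; similar P matrices yield mean correlations near 1. The number of skewers and the toy matrices below are illustrative.

```python
# Sketch of the random skewers comparison of two covariance matrices:
# responses r = P @ beta to shared random unit-length selection gradients
# are compared by vector correlation, averaged over many skewers.
import numpy as np

def random_skewers(P1, P2, n_skewers=1000, seed=0):
    rng = np.random.default_rng(seed)
    corrs = []
    for _ in range(n_skewers):
        beta = rng.normal(size=P1.shape[0])
        beta /= np.linalg.norm(beta)
        r1, r2 = P1 @ beta, P2 @ beta   # multivariate breeders' equation
        corrs.append((r1 @ r2) / (np.linalg.norm(r1) * np.linalg.norm(r2)))
    return float(np.mean(corrs))

P = np.array([[1.0, 0.5], [0.5, 1.0]])
print(random_skewers(P, P))             # ~1.0 for identical matrices
print(random_skewers(P, np.eye(2)))     # lower for differing structure
```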
ACCEPT: Introduction of the Adverse Condition and Critical Event Prediction Toolbox
NASA Technical Reports Server (NTRS)
Martin, Rodney A.; Santanu, Das; Janakiraman, Vijay Manikandan; Hosein, Stefan
2015-01-01
The prediction of anomalies or adverse events is a challenging task, and there are a variety of methods which can be used to address the problem. In this paper, we introduce a generic framework developed in MATLAB® called ACCEPT (Adverse Condition and Critical Event Prediction Toolbox). ACCEPT is an architectural framework designed to compare and contrast the performance of a variety of machine learning and early warning algorithms, and tests the capability of these algorithms to robustly predict the onset of adverse events in any time-series data generating systems or processes.
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Heng; Ye, Hao; Ng, Hui Wen
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations to the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithm can predict not only peptides of different lengths and different types of HLAs, but also peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
Luo, Heng; Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Mendrick, Donna L.; Hong, Huixiao
2016-01-01
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. This algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system. PMID:27558848
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
Luo, Heng; Ye, Hao; Ng, Hui Wen; ...
2016-08-25
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations to the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithm can predict not only peptides of different lengths and different types of HLAs, but also peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
NASA Technical Reports Server (NTRS)
Putnam, L. E.
1979-01-01
A Neumann solution for inviscid external flow was coupled to a modified Reshotko-Tucker integral boundary-layer technique, the control volume method of Presz for calculating flow in the separated region, and an inviscid one-dimensional solution for the jet exhaust flow in order to predict axisymmetric nozzle afterbody pressure distributions and drag. The viscous and inviscid flows are solved iteratively until convergence is obtained. A computer algorithm of this procedure was written and is called DONBOL. A description of the computer program and a guide to its use is given. Comparisons of the predictions of this method with experiments show that the method accurately predicts the pressure distributions of boattail afterbodies which have the jet exhaust flow simulated by solid bodies. For nozzle configurations which have the jet exhaust simulated by high-pressure air, the present method significantly underpredicts the magnitude of nozzle pressure drag. This deficiency results because the method neglects the effects of jet plume entrainment. This method is limited to subsonic free-stream Mach numbers below that for which the flow over the body of revolution becomes sonic.
NASA Astrophysics Data System (ADS)
Zheng, Qin; Yang, Zubin; Sha, Jianxin; Yan, Jun
2017-02-01
In predictability problem research, the conditional nonlinear optimal perturbation (CNOP) describes the initial perturbation that satisfies a certain constraint condition and causes the largest prediction error at the prediction time. The CNOP has been successfully applied in estimation of the lower bound of maximum predictable time (LBMPT). Generally, CNOPs are calculated by a gradient descent algorithm based on the adjoint model, which is called ADJ-CNOP. This study, through the two-dimensional Ikeda model, investigates the impacts of the nonlinearity on ADJ-CNOP and the corresponding precision problems when using ADJ-CNOP to estimate the LBMPT. Our conclusions are that (1) when the initial perturbation is large or the prediction time is long, the strong nonlinearity of the dynamical model in the prediction variable will lead to failure of the ADJ-CNOP method, and (2) when the objective function has multiple extreme values, ADJ-CNOP has a large probability of producing local CNOPs, hence making a false estimation of the LBMPT. Furthermore, the particle swarm optimization (PSO) algorithm, one kind of intelligent algorithm, is introduced to solve this problem. The method using PSO to compute CNOP is called PSO-CNOP. The results of numerical experiments show that even with a large initial perturbation and long prediction time, or when the objective function has multiple extreme values, PSO-CNOP can always obtain the global CNOP. Since the PSO algorithm is a heuristic search algorithm based on the population, it can overcome the impact of nonlinearity and the disturbance from multiple extremes of the objective function. In addition, to check the estimation accuracy of the LBMPT presented by PSO-CNOP and ADJ-CNOP, we partition the constraint domain of initial perturbations into sufficiently fine grid meshes and take the LBMPT obtained by the filtering method as a benchmark. 
The results show that the estimate given by PSO-CNOP is closer to the true value than that given by ADJ-CNOP as the forecast time increases.
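The PSO-CNOP idea can be illustrated with a toy particle swarm search for a CNOP of the two-dimensional Ikeda map: maximize the prediction error at the forecast time over initial perturbations inside a norm ball. All constants below (radius, swarm size, PSO coefficients, base point) are illustrative, not the paper's settings.

```python
# Toy PSO search for a CNOP of the 2-D Ikeda map: find the initial
# perturbation inside a norm ball that maximizes the prediction error
# at the forecast time. Hyperparameters are illustrative.
import math, random

def ikeda(x, y, u=0.9, steps=10):
    for _ in range(steps):
        t = 0.4 - 6.0 / (1.0 + x * x + y * y)
        x, y = 1 + u * (x * math.cos(t) - y * math.sin(t)), \
               u * (x * math.sin(t) + y * math.cos(t))
    return x, y

def error(dx, dy, x0=0.5, y0=0.5, steps=10):
    ax, ay = ikeda(x0, y0, steps=steps)
    bx, by = ikeda(x0 + dx, y0 + dy, steps=steps)
    return math.hypot(bx - ax, by - ay)  # prediction error at time T

def pso_cnop(radius=0.01, n_particles=30, iters=100, seed=1):
    random.seed(seed)
    def clip(p):  # project back into the constraint ball
        r = math.hypot(*p)
        return p if r <= radius else (p[0] * radius / r, p[1] * radius / r)
    pos = [clip((random.uniform(-radius, radius),
                 random.uniform(-radius, radius))) for _ in range(n_particles)]
    vel = [(0.0, 0.0)] * n_particles
    pbest = list(pos)
    gbest = max(pos, key=lambda p: error(*p))
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            vel[i] = tuple(0.7 * v + 1.5 * r1 * (pb - p) + 1.5 * r2 * (gb - p)
                           for v, p, pb, gb in zip(vel[i], pos[i], pbest[i], gbest))
            pos[i] = clip(tuple(p + v for p, v in zip(pos[i], vel[i])))
            if error(*pos[i]) > error(*pbest[i]):
                pbest[i] = pos[i]
                if error(*pos[i]) > error(*gbest):
                    gbest = pos[i]
    return gbest, error(*gbest)

best, err = pso_cnop()
print(best, err)  # best perturbation found within the radius-0.01 ball
```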
Boosting compound-protein interaction prediction by deep learning.
Tian, Kai; Shao, Mingyu; Wang, Yang; Guan, Jihong; Zhou, Shuigeng
2016-11-01
The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, so computational approaches have been introduced. Among these, machine-learning based methods have achieved considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques have shown advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs a deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets. Copyright © 2016 Elsevier Inc. All rights reserved.
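As a loose sketch of the DL-CPI idea, the snippet below concatenates a toy compound fingerprint with a toy protein feature vector and trains a one-hidden-layer network on the pair in plain numpy; the real method uses a deeper network and real molecular/protein descriptors, and all data and dimensions here are synthetic.

```python
# Minimal numpy sketch of a DL-CPI-style classifier: concatenate a compound
# fingerprint with a protein feature vector and train a small feed-forward
# network on the pair. One hidden layer stands in for the deeper stack.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy pairs: 8-dim compound fingerprint + 8-dim protein features
X = rng.random((200, 16))
# Synthetic rule: "interaction" iff the compound part outweighs the protein part
y = (X[:, :8].mean(axis=1) > X[:, 8:].mean(axis=1)).astype(float)

W1 = rng.normal(0, 0.5, (16, 12)); b1 = np.zeros(12)
W2 = rng.normal(0, 0.5, 12);       b2 = 0.0

lr = 0.5
for _ in range(500):                 # plain full-batch gradient descent
    h = np.tanh(X @ W1 + b1)         # hidden representation of the pair
    p = sigmoid(h @ W2 + b2)         # predicted interaction probability
    g = (p - y) / len(y)             # d(cross-entropy)/d(logit)
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
    gh = np.outer(g, W2) * (1 - h ** 2)
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)

acc = ((p > 0.5) == y).mean()
print(acc)  # training accuracy should comfortably beat chance
```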
Improving prediction accuracy of cooling load using EMD, PSR and RBFNN
NASA Astrophysics Data System (ADS)
Shen, Limin; Wen, Yuanmei; Li, Xiaohong
2017-08-01
To increase the accuracy of cooling load demand prediction, this work presents an EMD (empirical mode decomposition)-PSR (phase space reconstruction) based RBFNN (radial basis function neural network) method. First, the chaotic nature of real cooling load demand is analyzed, and the non-stationary historical cooling load data are decomposed by EMD into several stationary intrinsic mode functions (IMFs). Second, the RBFNN prediction accuracies of the individual IMFs are compared, and an IMF combining scheme is proposed: the lower-frequency components are combined (called IMF4-IMF6 combined) while the higher-frequency components (IMF1, IMF2, IMF3) and the residual are kept unchanged. Third, the phase space of each combined component is reconstructed separately, the highest-frequency component (IMF1) is preprocessed by differencing, and predictions are made with RBFNN in the reconstructed phase spaces. Real cooling load data from a centralized ice-storage cooling system in Guangzhou are used for simulation. The results show that the proposed hybrid method outperforms the traditional methods.
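The phase space reconstruction step can be illustrated by a standard delay-coordinate embedding (the embedding dimension and delay below are placeholders; the abstract does not give the paper's actual values):

```python
import numpy as np

def phase_space_reconstruct(series, dim=3, tau=2):
    """Delay-coordinate embedding: each row of the result is
    [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)]."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (dim, tau)")
    return np.column_stack([series[i * tau : i * tau + n] for i in range(dim)])
```

Each embedded row then serves as one input vector to the RBFNN, whose target is the next value of the component being predicted.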
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant for prediction of protein structural class, and it mainly uses the protein primary sequence, the predicted secondary structure sequence, and the position-specific scoring matrix (PSSM). Currently, prediction based solely on the PSSM has played a key role in improving prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on the PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. A 700-dimensional (700D) feature vector is constructed, and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
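The 700D-to-224D reduction step can be sketched with a plain SVD-based PCA (random data stands in for the real feature matrix; the sample count is arbitrary):

```python
import numpy as np

def pca_reduce(X, n_components):
    # Center the features, then project onto the top principal axes
    # obtained from the SVD of the centered data matrix.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy stand-in for a 700D feature matrix (300 sequences here, not real data).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 700))
Z = pca_reduce(X, 224)   # 224D representation fed to the classifier
```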
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
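As a toy analogue of the generalized-linear-regression flavor of symbolic regression (this is ordinary least squares over a hand-picked candidate library, not the actual fast-function-extraction algorithm), the harmonic oscillator law x'' = -omega^2 x can be recovered from simulated measurements:

```python
import numpy as np

# Synthetic "measurements" of a harmonic oscillator x(t) = cos(2 t),
# so the true law is x'' = -4 x.
t = np.linspace(0, 10, 2001)
x = np.cos(2 * t)
xdd = np.gradient(np.gradient(x, t), t)   # numerical second derivative

# Candidate-function library; least squares identifies the active term.
library = np.column_stack([x, x**2, x**3, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(library, xdd, rcond=None)
# coef[0] should come out close to -4; the other coefficients near 0.
```

A genuine symbolic regression method additionally searches over the library itself (which functions to include and how to compose them) rather than fixing it in advance.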
An, Ji‐Yong; Meng, Fan‐Rong; Chen, Xing; Yan, Gui‐Ying; Hu, Ji‐Pu
2016-01-01
Abstract Predicting protein‐protein interactions (PPIs) is a challenging task and essential to constructing protein interaction networks, which is important for facilitating our understanding of the mechanisms of biological systems. Although a number of high‐throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM‐BiGP that combines the relevance vector machine (RVM) model and Bi‐gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using the Bi‐gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five‐fold cross‐validation experiments executed on the yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. Experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state‐of‐the‐art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM‐BiGP method is significantly better than the SVM‐based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset.
The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate extensive studies, we developed a freely available web server called RVM‐BiGP‐PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. PMID:27452983
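The bi-gram probability feature described above can be sketched as follows, assuming the PSSM has already been rescaled so that each row is a probability distribution over the 20 amino acids (a common convention; the paper's exact normalization is not given here):

```python
import numpy as np

def bigram_probabilities(pssm):
    """PSSM rows = sequence positions, columns = 20 amino acids
    (assumed already rescaled to probabilities). Returns the 400-D
    bi-gram feature B[j, k] = sum_i P[i, j] * P[i + 1, k], flattened."""
    P = np.asarray(pssm, dtype=float)
    B = P[:-1].T @ P[1:]      # 20 x 20 transition-like matrix
    return B.ravel()
```

The resulting 400-D vector is what PCA then compresses before it reaches the RVM classifier.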
NASA Astrophysics Data System (ADS)
Banabic, D.; Vos, M.; Paraianu, L.; Jurco, P.
2007-05-01
The experimental research on the formability of metal sheets has shown that there is a significant dispersion of the limit strains in an area delimited by two curves: a lower curve (LFLC) and an upper one (UFLC). The region between the two curves defines the so-called Forming Limit Band (FLB). So far, this forming band has only been determined experimentally. In this paper, the authors suggest a method to predict the Forming Limit Band. The proposed method is illustrated on the AA6111-T43 aluminium alloy.
Huang, Yi-Fei; Gulko, Brad; Siepel, Adam
2017-04-01
Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.
Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding
Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter
2013-01-01
Background In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. Results The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. Conclusions The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. 
Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies. PMID:24314298
NASA Astrophysics Data System (ADS)
Intarasothonchun, Silada; Thipchaksurat, Sakchai; Varakulsiripunth, Ruttikorn; Onozato, Yoshikuni
In this paper, we propose a modified scheme of MSODB and PMS, called Predictive User Mobility Behavior (PUMB), to improve the performance of resource reservation and call admission control for cellular networks. In this algorithm, bandwidth is allocated more efficiently to neighboring cells based on key mobility parameters in order to provide QoS guarantees for transferring traffic. Mobility probabilities are used to form a cluster of cells, the shadow cluster, that a mobile unit is likely to visit. When a mobile unit changes direction and migrates to a cell that does not belong to its shadow cluster, it can be supported by making efficient use of predicted nonconforming calls. Concomitantly, to ensure continuity of on-going calls with better utilization of resources, bandwidth is borrowed from predicted nonconforming calls and existing adaptive calls without affecting the minimum QoS guarantees. The performance of PUMB is demonstrated by simulation results in terms of new call blocking probability, handoff call dropping probability, bandwidth utilization, call success probability, and overhead message transmission when the arrival rate and speed of mobile units are varied. Our results show that PUMB provides better performance than MSODB and PMS under different traffic conditions.
NASA Technical Reports Server (NTRS)
Hartung, Lin C.
1991-01-01
A method for predicting radiation absorption and emission coefficients in thermochemical nonequilibrium flows is developed. The method is called the Langley optimized radiative nonequilibrium code (LORAN). It applies the smeared band approximation for molecular radiation to produce moderately detailed results and is intended to fill the gap between detailed but costly prediction methods and very fast but highly approximate methods. The optimization of the method to provide efficient solutions allowing coupling to flowfield solvers is discussed. Representative results are obtained and compared to previous nonequilibrium radiation methods, as well as to ground- and flight-measured data. Reasonable agreement is found in all cases. A multidimensional radiative transport method is also developed for axisymmetric flows. Its predictions for wall radiative flux are 20 to 25 percent lower than those of the tangent slab transport method, as expected, though additional investigation of the symmetry and outflow boundary conditions is indicated. The method was applied to the peak heating condition of the aeroassist flight experiment (AFE) trajectory, with results comparable to predictions from other methods. The LORAN method was also applied in conjunction with the computational fluid dynamics (CFD) code LAURA to study the sensitivity of the radiative heating prediction to various models used in nonequilibrium CFD. This study suggests that radiation measurements can provide diagnostic information about the detailed processes occurring in a nonequilibrium flowfield because radiation phenomena are very sensitive to these processes.
Young, Victoria; Rochon, Elizabeth; Mihailidis, Alex
2016-11-14
The purpose of this study was to derive data from real, recorded, personal emergency response call conversations to help improve the artificial intelligence and decision making capability of a spoken dialogue system in a smart personal emergency response system. The main study objectives were to: develop a model of personal emergency response; determine categories for the model's features; identify and calculate measures from call conversations (verbal ability, conversational structure, timing); and examine conversational patterns and relationships between measures and model features applicable for improving the system's ability to automatically identify call model categories and predict a target response. This study was exploratory and used mixed methods. Personal emergency response calls were pre-classified according to call model categories identified qualitatively from response call transcripts. The relationships between six verbal ability measures, three conversational structure measures, two timing measures and three independent factors: caller type, risk level, and speaker type, were examined statistically. Emergency medical response services were the preferred response for the majority of medium and high risk calls for both caller types. Older adult callers mainly requested non-emergency medical service responders during medium risk situations. By measuring the number of spoken words-per-minute and turn-length-in-words for the first spoken utterance of a call, older adult and care provider callers could be identified with moderate accuracy. Average call taker response time was calculated using the number-of-speaker-turns and time-in-seconds measures. Care providers and older adults used different conversational strategies when responding to call takers. The words 'ambulance' and 'paramedic' may hold different latent connotations for different callers. 
The data derived from the real personal emergency response recordings may help a spoken dialogue system classify incoming calls by caller type with moderate probability shortly after the initial caller utterance. Knowing the caller type, the target response for the call may be predicted with some degree of probability and the output dialogue could be tailored to this caller type. The average call taker response time measured from real calls may be used to limit the conversation length in a spoken dialogue system before defaulting to a live call taker.
Bulashevska, Alla; Eils, Roland
2006-06-14
The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is a need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six diverse datasets, among them a Gram-negative bacteria dataset, data for discriminating outer membrane proteins, and an apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of prediction for classes with few sequences in training and is therefore useful for datasets with an imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies, as empirical results indicate. The code for the software is available upon request.
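A single base classifier of the kind HensBC ensembles, a Bayesian classifier built on first-order Markov chain models of the amino acid sequence, might be sketched as follows (the smoothing constant and the toy class labels are illustrative assumptions):

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def train_markov(seqs, alpha=1.0):
    # Transition counts with Laplace smoothing -> row-stochastic matrix.
    T = np.full((20, 20), alpha)
    for s in seqs:
        for a, b in zip(s, s[1:]):
            T[IDX[a], IDX[b]] += 1
    return T / T.sum(axis=1, keepdims=True)

def log_likelihood(seq, T):
    return sum(np.log(T[IDX[a], IDX[b]]) for a, b in zip(seq, seq[1:]))

def classify(seq, models):
    # 'models' maps a location label to its transition matrix;
    # pick the class under which the sequence is most likely.
    return max(models, key=lambda c: log_likelihood(seq, models[c]))
```

HensBC's contribution is the recursive hierarchy built on top of such base classifiers, which this sketch does not attempt to reproduce.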
NASA Astrophysics Data System (ADS)
Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu
2016-06-01
To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.
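A minimal grey wolf optimizer of the kind used to tune the SVR can be sketched as follows (minimizing a toy quadratic stands in for tuning SVR hyperparameters; the pack size, iteration count, and decay schedule are illustrative assumptions):

```python
import numpy as np

def gwo_minimize(f, bounds, n_wolves=20, iters=200, seed=0):
    """Minimal grey wolf optimizer: the pack moves toward the three best
    wolves (alpha, beta, delta); the control parameter 'a' decays 2 -> 0."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_wolves, len(lo)))
    for it in range(iters):
        fit = np.apply_along_axis(f, 1, X)
        leaders = X[np.argsort(fit)[:3]]          # alpha, beta, delta
        a = 2.0 * (1.0 - it / iters)
        newX = np.zeros_like(X)
        for L in leaders:
            A = a * (2 * rng.random(X.shape) - 1)  # A = 2 a r1 - a
            C = 2 * rng.random(X.shape)            # C = 2 r2
            newX += L - A * np.abs(C * L - X)
        X = np.clip(newX / 3.0, lo, hi)
    fit = np.apply_along_axis(f, 1, X)
    return X[np.argmin(fit)], fit.min()
```

In the paper's setting, f would evaluate SVR cross-validation error as a function of the hyperparameter vector rather than a closed-form function.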
Bayesian model aggregation for ensemble-based estimates of protein pKa values
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.
2014-03-01
This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of assumptions that have inherent biases and sensitivities that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study, with improvements of 40-70% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.
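The aggregation idea can be illustrated with a heavily simplified BMA-style combiner that weights each method by its Gaussian likelihood on validation data (the full BMA machinery, e.g. EM-fitted variances and model priors, is omitted from this sketch):

```python
import numpy as np

def bma_combine(preds, y_true, sigma=1.0):
    """Weight each model by its Gaussian likelihood on held-out data,
    then return the weighted-average prediction (simplified BMA:
    fixed sigma, uniform model priors)."""
    preds = np.asarray(preds, dtype=float)   # shape (n_models, n_points)
    logL = -0.5 * ((preds - y_true) ** 2).sum(axis=1) / sigma**2
    w = np.exp(logL - logL.max())            # stabilized softmax weights
    w /= w.sum()
    return w, w @ preds
```

Models that track the measured pKa values closely dominate the weights, so the aggregate can outperform any single method without discarding the others entirely.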
Prediction of distribution coefficient from structure. 1. Estimation method.
Csizmadia, F; Tsantili-Kakoulidou, A; Panderi, I; Darvas, F
1997-07-01
A method has been developed for the estimation of the distribution coefficient (D), which considers the microspecies of a compound. D is calculated from the microscopic dissociation constants (microconstants), the partition coefficients of the microspecies, and the counterion concentration. A general equation for the calculation of D at a given pH is presented. The microconstants are calculated from the structure using Hammett and Taft equations. The partition coefficients of the ionic microspecies are predicted by empirical equations using the dissociation constants and the partition coefficient of the uncharged species, which are estimated from the structure by a Linear Free Energy Relationship method. The algorithm is implemented in a program module called PrologD.
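For the simplest case of a monoprotic acid with one neutral and one anionic microspecies, the distribution coefficient reduces to a mole-fraction-weighted sum of the microspecies' partition coefficients; a sketch (ignoring the counterion-concentration term the paper includes):

```python
import math

def log_d_monoprotic_acid(log_p_neutral, log_p_anion, pka, ph):
    """Distribution coefficient of a monoprotic acid as a
    mole-fraction-weighted sum of the two microspecies' partition
    coefficients (counterion effects ignored in this sketch)."""
    r = 10 ** (ph - pka)                 # ratio [A-] / [HA]
    d = (10 ** log_p_neutral + 10 ** log_p_anion * r) / (1 + r)
    return math.log10(d)
```

Well below the pKa, log D approaches log P of the neutral species; well above it, log D falls toward the anion's (much lower) partition coefficient, which is the familiar sigmoidal log D vs. pH profile.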
Kim, Min Kyung; Lane, Anatoliy; Kelley, James J; Lun, Desmond S
2016-01-01
Several methods have been developed to predict system-wide and condition-specific intracellular metabolic fluxes by integrating transcriptomic data with genome-scale metabolic models. While powerful in many settings, existing methods have several shortcomings, and it is unclear which method has the best accuracy in general because of limited validation against experimentally measured intracellular fluxes. We present a general optimization strategy for inferring intracellular metabolic flux distributions from transcriptomic data coupled with genome-scale metabolic reconstructions. It consists of two different template models called DC (determined carbon source model) and AC (all possible carbon sources model) and two different new methods called E-Flux2 (E-Flux method combined with minimization of the l2 norm) and SPOT (Simplified Pearson cOrrelation with Transcriptomic data), which can be chosen and combined depending on the availability of knowledge about the carbon source or objective function. This enables us to simulate a broad range of experimental conditions. We examined E. coli and S. cerevisiae as representative prokaryotic and eukaryotic microorganisms, respectively. The predictive accuracy of our algorithm was validated by calculating the uncentered Pearson correlation between predicted fluxes and measured fluxes. To this end, we compiled 20 experimental conditions (11 in E. coli and 9 in S. cerevisiae) of transcriptome measurements coupled with corresponding central carbon metabolism intracellular flux measurements determined by 13C metabolic flux analysis (13C-MFA), which is the largest dataset assembled to date for the purpose of validating inference methods for predicting intracellular fluxes. In both organisms, our method achieves an average correlation coefficient ranging from 0.59 to 0.87, outperforming a representative sample of competing methods.
Easy-to-use implementations of E-Flux2 and SPOT are available as part of the open-source package MOST (http://most.ccib.rutgers.edu/). Our method represents a significant advance over existing methods for inferring intracellular metabolic flux from transcriptomic data. It not only achieves higher accuracy, but it also combines into a single method a number of other desirable characteristics including applicability to a wide range of experimental conditions, production of a unique solution, fast running time, and the availability of a user-friendly implementation.
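The validation metric used above, the uncentered Pearson correlation between predicted and measured fluxes, is simply the cosine similarity of the raw (not mean-centered) flux vectors:

```python
import numpy as np

def uncentered_pearson(x, y):
    # Cosine of the angle between the raw vectors: sum(x*y) / (||x|| ||y||).
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```

Unlike the centered correlation, this metric rewards agreement in both the direction and relative magnitude of fluxes, which is why it suits flux-vector comparisons.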
NASA Astrophysics Data System (ADS)
Cai, Lei; Yuan, Wei; Zhang, Zhou; He, Lin; Chou, Kuo-Chen
2016-11-01
Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools returned poor consensus on candidates (only 20% of calls had multiple hits by the callers). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rate of dbSNP presence and high-alternative-alleles-in-control calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase the reliability of candidates, but narrows the list down to a very limited range of tumor read depth and variant allele frequency. Calling SNVs on the UDT-Seq data, which were of much higher read depth, discovered additional true-positive variations, albeit with an even greater growth in false-positive predictions. Our findings not only provide a valuable benchmark for state-of-the-art SNV calling methods, but also shed light on how to achieve more accurate SNV identification in the future.
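The multiple-hit consensus used to assess caller agreement can be sketched as a simple vote count over variant keys (the tuple key used here is an assumed convention, not the paper's exact matching rule):

```python
from collections import Counter

def consensus_calls(callsets, min_support=2):
    """Each callset is a set of (chrom, pos, ref, alt) tuples from one
    caller; keep variants reported by at least 'min_support' callers."""
    counts = Counter(v for s in callsets for v in set(s))
    return {v for v, n in counts.items() if n >= min_support}
```

Raising min_support trades sensitivity for specificity, which mirrors the paper's observation that combining callers shrinks the candidate list while increasing its reliability.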
A Performance Weighted Collaborative Filtering algorithm for personalized radiology education.
Lin, Hongli; Yang, Xuedong; Wang, Weisheng; Luo, Jiawei
2014-10-01
Devising an accurate prediction algorithm that can predict the difficulty level of cases for individuals and then selects suitable cases for them is essential to the development of a personalized training system. In this paper, we propose a novel approach, called Performance Weighted Collaborative Filtering (PWCF), to predict the difficulty level of each case for individuals. The main idea of PWCF is to assign an optimal weight to each rating used for predicting the difficulty level of a target case for a trainee, rather than using an equal weight for all ratings as in traditional collaborative filtering methods. The assigned weight is a function of the performance level of the trainee at which the rating was made. The PWCF method and the traditional method are compared using two datasets. The experimental data are then evaluated by means of the MAE metric. Our experimental results show that PWCF outperforms the traditional methods by 8.12% and 17.05%, respectively, over the two datasets, in terms of prediction precision. This suggests that PWCF is a viable method for the development of personalized training systems in radiology education. Copyright © 2014. Published by Elsevier Inc.
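The core of PWCF, replacing the equal weights of traditional collaborative filtering with performance-dependent weights, can be sketched as a weighted average (how the weight function maps a trainee's performance level to a weight is the paper's contribution and is not reproduced here):

```python
import numpy as np

def pwcf_predict(ratings, weights):
    """Predict a case's difficulty as a weighted average of the ratings,
    where each weight reflects the performance level of the trainee who
    produced that rating (uniform weights recover plain CF)."""
    ratings = np.asarray(ratings, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((weights * ratings).sum() / weights.sum())
```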
GAPIT: genome association and prediction integrated tool.
Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu
2012-09-15
Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.
Shabangu, Fannie W.; Yemane, Dawit; Stafford, Kathleen M.; Ensor, Paul; Findlay, Ken P.
2017-01-01
Harvested to perilously low numbers by commercial whaling during the past century, the large-scale response of Antarctic blue whales Balaenoptera musculus intermedia to environmental variability is poorly understood. This study uses acoustic data collected from 586 sonobuoys deployed in the austral summers of 1997 through 2009, south of 38°S, coupled with visual observations of blue whales during the IWC SOWER line-transect surveys. The characteristic Z-call and D-call of Antarctic blue whales were detected using an automated detection template and visual verification method. Using a random forest model, we characterized the environmental preferences, spatial occurrence and acoustic behaviour of Antarctic blue whales. Distance to the southern boundary of the Antarctic Circumpolar Current (SBACC), latitude and distance from the nearest Antarctic shores were the main geographic predictors of blue whale call occurrence. Satellite-derived sea surface height, sea surface temperature, and productivity (chlorophyll-a) were the most important environmental predictors of blue whale call occurrence. Call rates of D-calls were strongly predicted by the location of the SBACC, latitude and the visually detected number of whales in an area, while call rates of Z-calls were predicted by the SBACC, latitude and longitude. Satellite-derived sea surface height, wind stress, wind direction, water depth, sea surface temperature, chlorophyll-a and wind speed were important environmental predictors of blue whale call rates in the Southern Ocean. Blue whale call occurrence and call rates varied significantly in response to inter-annual and long-term variability of those environmental predictors. Our results identify the response of Antarctic blue whales to inter-annual variability in environmental conditions and highlight potential suitable habitats for this population.
Such emerging knowledge about the acoustic behaviour, environmental and habitat preferences of Antarctic blue whales is important in improving the management and conservation of this highly depleted species. PMID:28222124
Efficient prediction of human protein-protein interactions at a global scale.
Schoenrock, Andrew; Samanfar, Bahram; Pitre, Sylvain; Hooshyar, Mohsen; Jin, Ke; Phillips, Charles A; Wang, Hui; Phanse, Sadhna; Omidi, Katayoun; Gui, Yuan; Alamgir, Md; Wong, Alex; Barrenäs, Fredrik; Babu, Mohan; Benson, Mikael; Langston, Michael A; Green, James R; Dehne, Frank; Golshani, Ashkan
2014-12-10
Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network within 3 months. With a recall of 23% at a precision of 82.1%, we predicted 172,132 putative PPIs. We demonstrate the usefulness of these predictions through a range of experiments. The speed and accuracy associated with MP-PIPE can make this a potential tool to study individual human PPI networks (from genomic sequences alone) for personalized medicine.
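The reported precision and recall imply a rough size for the full human PPI network. This back-of-envelope calculation is ours, not a claim from the abstract:

```python
# Back-of-envelope implication of the reported figures: precision
# estimates true positives among the predictions, and recall scales
# that count up to an implied global total.
predicted = 172_132
precision = 0.821
recall = 0.23

true_positives = predicted * precision   # ~141,000 likely-real PPIs
implied_total = true_positives / recall  # ~614,000 human PPIs overall
```

That implied total of roughly six hundred thousand interactions is consistent with the abstract's framing that global coverage remains far from complete.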
NASA Astrophysics Data System (ADS)
Kayasith, Prakasith; Theeramunkong, Thanaruk
It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech with standard assessment methods based on human perception. This paper presents an automated approach to assessing the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signals for a given word and distinct speech signals for different words. As an application, it can be used to assess speech quality and to forecast the speech recognition rate for an individual dysarthric speaker before exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a predictor of recognition rate is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference, comparing its predicted recognition rates with those predicted by the standard articulatory and intelligibility tests on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were performed on a speech corpus composed of data from eight normal speakers and eight dysarthric speakers.
Predicting Robust Vocabulary Growth from Measures of Incremental Learning
ERIC Educational Resources Information Center
Frishkoff, Gwen A.; Perfetti, Charles A.; Collins-Thompson, Kevyn
2011-01-01
We report a study of incremental learning of new word meanings over multiple episodes. A new method called MESA (Markov Estimation of Semantic Association) tracked this learning through the automated assessment of learner-generated definitions. The multiple word learning episodes varied in the strength of contextual constraint provided by…
Confidence Wagering during Mathematics and Science Testing
ERIC Educational Resources Information Center
Jack, Brady Michael; Liu, Chia-Ju; Chiu, Hoan-Lin; Shymansky, James A.
2009-01-01
This proposal presents the results of a case study involving five 8th grade Taiwanese classes, two mathematics and three science classes. These classes used a new method of testing called confidence wagering. This paper advocates the position that confidence wagering can predict the accuracy of a student's test answer selection during…
Connecting clinical and actuarial prediction with rule-based methods.
Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H
2015-06-01
Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.
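A fast and frugal tree of the kind described is applied sequentially: cues are checked one at a time, so many cases are classified after the first cue. The cue names and cutoffs below are invented for illustration; they are not the two rules RuleFit actually derived from the Penninx et al. data.

```python
# Hypothetical two-rule fast-and-frugal tree. Cue names and thresholds
# are made up; the point is the sequential exit structure, in which a
# prediction may require only the first cue.

def predict_chronic_course(patient):
    if patient["baseline_severity"] >= 30:        # rule 1: severe at baseline
        return True
    if patient["symptom_duration_months"] >= 24:  # rule 2: long duration
        return True
    return False

print(predict_chronic_course({"baseline_severity": 35,
                              "symptom_duration_months": 3}))
```

A clinician using such a tree evaluates the second cue only when the first does not already trigger a classification, which is the source of the "3 cues on average" economy reported above.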
Prediction of protein secondary structure content for the twilight zone sequences.
Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke
2007-11-15
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amounts of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. 
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.
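Two of the named features, amino-acid composition and composition moment vectors, are simple to compute. The moment normalization below is one plausible formulation (positions weighted and scaled so the vector sums to one), not necessarily the exact definition used in PSSC-core.

```python
# Minimal sketch of two sequence features named above. The composition
# moment normalization is our assumption, not the paper's exact formula.

def composition(seq):
    """Fraction of the sequence taken by each residue type."""
    return {aa: seq.count(aa) / len(seq) for aa in set(seq)}

def composition_moment(seq, order=1):
    """Position-weighted composition: residues late in the chain count more."""
    L = len(seq)
    norm = sum(i ** order for i in range(1, L + 1))
    out = {}
    for pos, aa in enumerate(seq, start=1):
        out[aa] = out.get(aa, 0.0) + pos ** order / norm
    return out

seq = "MKVLAAK"
comp = composition(seq)
mom = composition_moment(seq)
```

Unlike plain composition, the moment vector distinguishes sequences with the same residue counts but different residue placement, which is why both appear in the feature set.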
Bhat, Somanath; Polanowski, Andrea M; Double, Mike C; Jarman, Simon N; Emslie, Kerry R
2012-01-01
Recent advances in nanofluidic technologies have enabled the use of Integrated Fluidic Circuits (IFCs) for high-throughput Single Nucleotide Polymorphism (SNP) genotyping (GT). In this study, we implemented and validated a relatively low-cost nanofluidic system for SNP-GT with and without Specific Target Amplification (STA). As proof of principle, we first validated the effect of input DNA copy number on genotype call rate using well-characterised human genomic DNA samples quantified by digital PCR (dPCR), and then implemented the validated method to genotype 45 SNPs in the humpback whale, Megaptera novaeangliae, nuclear genome. When STA was not incorporated, for a homozygous human DNA sample, reaction chambers containing, on average, 9 to 97 copies showed 100% call rate and accuracy. Below 9 copies, the call rate decreased, and at one copy it was 40%. For a heterozygous human DNA sample, the call rate decreased from 100% to 21% when predicted copies per reaction chamber decreased from 38 copies to one copy. The tightness of genotype clusters on a scatter plot also decreased. In contrast, when the same samples were subjected to STA prior to genotyping, a call rate and a call accuracy of 100% were achieved. Our results demonstrate that low input DNA copy number affects the quality of data generated, in particular for a heterozygous sample. Similar to human genomic DNA, a call rate and a call accuracy of 100% were achieved with whale genomic DNA samples following multiplex STA using either 15 or 45 SNP-GT assays. These calls were 100% concordant with their true genotypes determined by an independent method, suggesting that the nanofluidic system is a reliable platform for generating genotype calls with high accuracy and concordance from genomic DNA derived from biological tissue.
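The collapse of heterozygous call rates at low input has a simple sampling explanation: a chamber with few template copies may, by chance, contain only one of the two alleles. The binomial model below is our illustration of that mechanism, not an analysis from the study itself.

```python
# Toy allele-dropout model: if a chamber holds n template copies and each
# copy is independently either allele with probability 1/2, then both
# alleles are present with probability 1 - 2*(1/2)**n. This model is our
# assumption for illustration.

def p_both_alleles(n_copies):
    if n_copies < 1:
        return 0.0
    return 1.0 - 2.0 * 0.5 ** n_copies

for n in (1, 2, 5, 38):
    print(n, p_both_alleles(n))
```

The model reproduces the qualitative pattern reported above: at one copy a correct heterozygous call is essentially impossible, while at 38 copies both alleles are virtually always present.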
Prediction and Validation of Disease Genes Using HeteSim Scores.
Zeng, Xiangxiang; Liao, Yuanlu; Liu, Yuansheng; Zou, Quan
2017-01-01
Deciphering gene-disease associations is an important goal in biomedical research. In this paper, we use a novel relevance measure, called HeteSim, to prioritize candidate disease genes. Two methods based on heterogeneous networks, constructed using protein-protein interactions, gene-phenotype associations, and phenotype-phenotype similarity, are presented. In HeteSim_MultiPath (HSMP), HeteSim scores of different paths are combined with a constant that dampens the contributions of longer paths. In HeteSim_SVM (HSSVM), HeteSim scores are combined with a machine learning method. Three-fold cross-validation experiments show that our non-machine-learning method HSMP performs better than existing non-machine-learning methods, while our machine-learning method HSSVM achieves accuracy similar to that of the best existing machine-learning method, CATAPULT. From the analysis of the top 10 predicted genes for different diseases, we found that HSSVM avoids the disadvantage of existing machine-learning-based methods, which tend to predict similar genes for different diseases. The data sets and Matlab code for the two methods are freely available for download at http://lab.malab.cn/data/HeteSim/index.jsp.
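The HSMP combination step described above reduces to a damped sum over path scores. Reading the damping as beta raised to the path length is our interpretation; the scores and beta value below are made up.

```python
# Sketch of combining HeteSim scores across paths with a constant that
# dampens longer paths. The form beta**path_length and all numbers here
# are illustrative assumptions.

def combine_path_scores(scores_by_length, beta=0.5):
    """scores_by_length: iterable of (path_length, hetesim_score) pairs."""
    return sum(beta ** length * score for length, score in scores_by_length)

paths = [(2, 0.9), (3, 0.6), (4, 0.4)]   # shorter paths contribute more
print(combine_path_scores(paths))
```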
Evaluating the evaluation of cancer driver genes
Tokheim, Collin J.; Papadopoulos, Nickolas; Kinzler, Kenneth W.; Vogelstein, Bert; Karchin, Rachel
2016-01-01
Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning–based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future. PMID:27911828
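The p-value sanity check implied above can be made concrete: under a correct null model, p-values for non-driver genes should be uniform on [0, 1], and a Kolmogorov-Smirnov-style statistic flags departures. The p-values below are synthetic, purely to show the check.

```python
# KS-style uniformity check for p-values (synthetic data). A large
# statistic indicates the null model's assumptions are violated.

def ks_uniform(pvals):
    p = sorted(pvals)
    n = len(p)
    return max(max(abs(p[i] - i / n), abs(p[i] - (i + 1) / n))
               for i in range(n))

calibrated = [i / 100 for i in range(1, 100)]   # near-uniform p-values
inflated = [x ** 3 for x in calibrated]         # skewed toward zero
print(ks_uniform(calibrated), ks_uniform(inflated))
```

An inflated set like the second one would produce an excess of "significant" genes, i.e. false-positive driver calls, which is exactly the failure mode the evaluation framework probes.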
Li, Han; Liu, Yashu; Gong, Pinghua; Zhang, Changshui; Ye, Jieping
2014-01-01
Identifying patients with Mild Cognitive Impairment (MCI) who are likely to convert to dementia has recently attracted increasing attention in Alzheimer's disease (AD) research. An accurate prediction of conversion from MCI to AD can aid clinicians to initiate treatments at early stage and monitor their effectiveness. However, existing prediction systems based on the original biosignatures are not satisfactory. In this paper, we propose to fit the prediction models using pairwise biosignature interactions, thus capturing higher-order relationship among biosignatures. Specifically, we employ hierarchical constraints and sparsity regularization to prune the high-dimensional input features. Based on the significant biosignatures and underlying interactions identified, we build classifiers to predict the conversion probability based on the selected features. We further analyze the underlying interaction effects of different biosignatures based on the so-called stable expectation scores. We have used 293 MCI subjects from Alzheimer's Disease Neuroimaging Initiative (ADNI) database that have MRI measurements at the baseline to evaluate the effectiveness of the proposed method. Our proposed method achieves better classification performance than state-of-the-art methods. Moreover, we discover several significant interactions predictive of MCI-to-AD conversion. These results shed light on improving the prediction performance using interaction features. PMID:24416143
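The feature-expansion step described above is mechanical: augment the original biosignatures with all pairwise products so a linear model can capture second-order interactions. The hierarchical sparsity constraints (an interaction enters only alongside its main effects) are what the paper adds on top and are not implemented here; the feature names are invented.

```python
# Sketch of pairwise-interaction feature expansion. Feature names and
# values are hypothetical.

from itertools import combinations

def with_pairwise_interactions(x):
    """x: dict of feature name -> value; returns an augmented copy."""
    out = dict(x)
    for a, b in combinations(sorted(x), 2):
        out[a + "*" + b] = x[a] * x[b]
    return out

features = {"hippocampus_vol": 0.8, "ventricle_vol": 1.2, "age": 0.5}
aug = with_pairwise_interactions(features)
```

With d original features this yields d + d(d-1)/2 columns, which is why the paper needs sparsity regularization to prune the expanded space.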
PPCM: Combining multiple classifiers to improve protein-protein interaction prediction
Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan
2015-08-01
Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), which combines output from two PPI prediction tools, GO2PPI and Phyloprof, using the Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross-species PPCM could achieve competitive and even better prediction accuracy compared to the single-species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using the Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.
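The merging scheme is a form of stacking: each base classifier's score becomes a feature for a second-level learner. In the sketch below the second-level model is a trivial hand-set threshold on the average score rather than the Random Forests used in the paper, and all scores are invented.

```python
# Stacking in miniature: base classifier scores become meta-features.
# The averaging-plus-threshold meta-model and the scores are our
# stand-ins for the paper's Random Forests combiner.

def ppcm_like_merge(go2ppi_score, phyloprof_score, threshold=0.5):
    meta_feature = (go2ppi_score + phyloprof_score) / 2.0
    return meta_feature >= threshold

pairs = {("P1", "P2"): (0.9, 0.7),   # both tools agree: interacting
         ("P1", "P3"): (0.2, 0.4)}   # both tools doubtful
calls = {pair: ppcm_like_merge(*scores) for pair, scores in pairs.items()}
```

A learned combiner improves on this fixed rule precisely when the base classifiers' errors are partly independent, which is the premise stated in the abstract.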
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data
Ching, Travers; Zhu, Xun
2018-01-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719
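The Cox-family comparison methods named above all optimize the Cox partial likelihood; in Cox-nnet the linear risk score is replaced by a network output feeding the same loss. A plain-Python sketch of that loss (ignoring ties, with risk scores simply given as numbers) is:

```python
# Negative log partial likelihood for Cox-style models. theta is the
# per-patient risk score (in Cox-nnet, the network output; here fixed
# numbers). Tie handling is omitted for simplicity.

import math

def neg_log_partial_likelihood(times, events, theta):
    nll = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue                      # censored: contributes only to risk sets
        risk_set = [j for j in range(len(times)) if times[j] >= times[i]]
        log_denom = math.log(sum(math.exp(theta[j]) for j in risk_set))
        nll -= theta[i] - log_denom
    return nll

times = [5.0, 8.0, 12.0]      # follow-up times
events = [1, 0, 1]            # 1 = event observed, 0 = censored
nll = neg_log_partial_likelihood(times, events, [1.0, 0.2, -0.5])
```

Because the loss depends only on risk-score *rankings* within risk sets, any differentiable model producing theta can be trained against it by gradient descent.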
Modifications of the PCPT method for HJB equations
NASA Astrophysics Data System (ADS)
Kossaczký, I.; Ehrhardt, M.; Günther, M.
2016-10-01
In this paper we revisit a modification of the piecewise constant policy timestepping (PCPT) method for solving Hamilton-Jacobi-Bellman (HJB) equations. This modification is called the piecewise predicted policy timestepping (PPPT) method and, if properly used, it may be significantly faster. We briefly recapitulate the algorithms of the PCPT and PPPT methods and of the classical implicit method, and apply them to a passport option pricing problem with non-standard payoff. We present the modifications needed to solve this problem effectively with the PPPT method and compare its performance with the PCPT method and the classical implicit method.
Unified treatment of the luminosity distance in cosmology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Jaiyul; Scaccabarozzi, Fulvio, E-mail: jyoo@physik.uzh.ch, E-mail: fulvio@physik.uzh.ch
Comparing the luminosity distance measurements to its theoretical predictions is one of the cornerstones in establishing the modern cosmology. However, as shown in Biern and Yoo, its theoretical predictions in literature are often plagued with infrared divergences and gauge-dependences. This trend calls into question the sanity of the methods used to derive the luminosity distance. Here we critically investigate four different methods—the geometric approach, the Sachs approach, the Jacobi mapping approach, and the geodesic light cone (GLC) approach to modeling the luminosity distance, and we present a unified treatment of such methods, facilitating the comparison among the methods and checking their sanity. All of these four methods, if exercised properly, can be used to reproduce the correct description of the luminosity distance.
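At the background (unperturbed) level, all four approaches must reduce to the standard FRW result before perturbative corrections are added; for a spatially flat universe this benchmark is:

```latex
% Background luminosity distance in a spatially flat FRW universe:
% comoving distance to redshift z, stretched by one factor of (1+z).
d_L(z) \;=\; (1+z)\int_0^z \frac{c\,\mathrm{d}z'}{H(z')}
```

The divergences and gauge dependences discussed above arise only in the perturbed corrections to this expression, not in the background term itself.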
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Ray -Bing; Wang, Weichung; Jeff Wu, C. F.
A numerical method, called OBSM, was recently proposed which employs overcomplete basis functions to achieve sparse representations. While the method can handle non-stationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach which first imposes a normal prior on the large space of linear coefficients, then applies the MCMC algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. As a result, numerical studies show that the proposed schemes are capable of solving problems of positive point identification, optimization, and surrogate fitting.
Efficient biprediction decision scheme for fast high efficiency video coding encoding
NASA Astrophysics Data System (ADS)
Park, Sang-hyo; Lee, Seung-ho; Jang, Euee S.; Jun, Dongsan; Kang, Jung-Won
2016-11-01
An efficient biprediction decision scheme of high efficiency video coding (HEVC) is proposed for fast-encoding applications. For low-delay video applications, bidirectional prediction can be used to increase compression performance efficiently with previous reference frames. However, at the same time, the computational complexity of the HEVC encoder is significantly increased due to the additional biprediction search. Although some research has attempted to reduce this complexity, whether the prediction is strongly related to both motion complexity and prediction modes in a coding unit has not yet been investigated. A method that avoids most compression-inefficient search points is proposed so that the computational complexity of the motion estimation process can be dramatically decreased. To determine if biprediction is critical, the proposed method exploits the stochastic correlation of the context of prediction units (PUs): the direction of a PU and the accuracy of a motion vector. Experimental results show that the time complexity of biprediction can be reduced to 30% on average, outperforming existing methods in view of encoding time, number of function calls, and memory access.
Lin, Hongli; Yang, Xuedong; Wang, Weisheng
2014-08-01
Devising a method that can select cases based on the performance levels of trainees and the characteristics of cases is essential for developing a personalized training program in radiology education. In this paper, we propose a novel hybrid prediction algorithm called content-boosted collaborative filtering (CBCF) to predict the difficulty level of each case for each trainee. The CBCF utilizes a content-based filtering (CBF) method to enhance existing trainee-case ratings data and then provides final predictions through a collaborative filtering (CF) algorithm. The CBCF algorithm incorporates the advantages of both CBF and CF, while not inheriting the disadvantages of either. The CBCF method is compared with the pure CBF and pure CF approaches using three datasets. The experimental data are then evaluated in terms of the MAE metric. Our experimental results show that the CBCF outperforms the pure CBF and CF methods by 13.33 and 12.17 %, respectively, in terms of prediction precision. This also suggests that the CBCF can be used in the development of personalized training systems in radiology education.
Scared and less noisy: glucocorticoids are associated with alarm call entropy
Blumstein, Daniel T.; Chi, Yvonne Y.
2012-01-01
The nonlinearity and arousal hypothesis predicts that highly aroused mammals will produce nonlinear, noisy vocalizations. We tested this prediction by measuring faecal glucocorticoid metabolites (GCMs) in adult yellow-bellied marmots (Marmota flaviventris), and asking if variation in GCMs was positively correlated with Wiener entropy—a measure of noise. Contrary to our prediction, we found a significant negative relationship: marmots with more faecal GCMs produced calls with less noise than those with lower levels of GCMs. A previous study suggested that glucocorticoids modulate the probability that a marmot will emit a call. This study suggests that, like some other species, calls emitted from highly aroused individuals are less noisy. Glucocorticoids thus play an important, yet underappreciated role, in alarm call production. PMID:21976625
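Wiener entropy, the noise measure used above, is the spectral flatness of a power spectrum: the ratio of its geometric to arithmetic mean, near 1 for broadband noise and near 0 for a tonal call (it is often reported on a log scale). The spectra below are toy vectors, not marmot recordings.

```python
# Wiener entropy (spectral flatness): geometric mean / arithmetic mean
# of a power spectrum. Toy spectra for illustration.

import math

def wiener_entropy(power_spectrum):
    n = len(power_spectrum)
    geo = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    arith = sum(power_spectrum) / n
    return geo / arith

noisy = [1.0] * 16                 # flat spectrum: broadband noise
tonal = [1e-6] * 15 + [1.0]        # energy in one band: tonal call
print(wiener_entropy(noisy), wiener_entropy(tonal))
```

Under this measure, the study's finding is that high-GCM marmots produced calls closer to the tonal end of the scale.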
Improved model predictive control of resistive wall modes by error field estimator in EXTRAP T2R
NASA Astrophysics Data System (ADS)
Setiadi, A. C.; Brunsell, P. R.; Frassinetti, L.
2016-12-01
Many implementations of model-based control for toroidal plasmas have shown better performance than conventional feedback controllers. One prerequisite of model-based control is the availability of a control-oriented model. This model can be obtained empirically through a systematic procedure called system identification. Such a model is used in this work to design a model predictive controller to stabilize multiple resistive wall modes in the EXTRAP T2R reversed-field pinch. Model predictive control is an advanced control method that can optimize the future behaviour of a system. Furthermore, this paper discusses an additional use of the empirical model: estimating the error field in EXTRAP T2R. Two potential methods for estimating the error field are discussed. The error field estimator is then combined with model predictive control and yields better radial magnetic field suppression.
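The model predictive control loop described above can be shown in miniature: given an identified linear model, choose the input sequence minimizing a quadratic cost over a short horizon, apply only the first input, and repeat at the next sample. The scalar model and weights below are toy values, not the identified EXTRAP T2R model.

```python
# Toy one-dimensional MPC step. Model x' = a*x + b*u and the cost
# weights are invented; real controllers solve this optimization over a
# multivariable identified model rather than by enumeration.

from itertools import product

def mpc_step(x, a, b, horizon=3, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    best_u, best_cost = None, float("inf")
    for seq in product(candidates, repeat=horizon):
        xi, cost = x, 0.0
        for u in seq:
            xi = a * xi + b * u          # predict with the identified model
            cost += xi * xi + 0.1 * u * u  # penalize state and control effort
        if cost < best_cost:
            best_cost, best_u = cost, seq[0]   # keep only the first input
    return best_u

# Unstable mode (a > 1): the controller should push against the state.
u0 = mpc_step(x=1.0, a=1.2, b=0.5)
```

Applying only the first input and re-optimizing each sample is what gives MPC its feedback character despite the open-loop optimization inside.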
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
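The core subproblem any gene finder starts from is scanning a fragment for open reading frames. Real tools like the one described score candidate starts with translation-initiation-site models rather than taking the first ATG; the toy below just enumerates forward-strand ORFs.

```python
# Toy ORF scanner (forward strand only, standard genetic code, first
# ATG per frame). Real metagenomic gene finders also score starts,
# handle reverse strands and alternate codes.

STOPS = {"TAA", "TAG", "TGA"}

def forward_orfs(seq, min_codons=2):
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOPS and start is not None:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))   # half-open [start, end)
                start = None
    return orfs

print(forward_orfs("ATGAAATTTTAGCCATGCCCTAA"))
```

On fragments shorter than a typical gene, ORFs are often truncated at the fragment edge, which is exactly why confidence values per gene call matter in the metagenomic setting.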
miRNAFold: a web server for fast miRNA precursor prediction in genomes.
Tav, Christophe; Tempel, Sébastien; Poligny, Laurent; Tahi, Fariza
2016-07-08
Computational methods are required for prediction of non-coding RNAs (ncRNAs), which are involved in many biological processes, especially at the post-transcriptional level. Among these ncRNAs, miRNAs have been widely studied and biologists need efficient and fast tools for their identification. In particular, ab initio methods are usually required when predicting novel miRNAs. Here we present a web server dedicated to large-scale identification of miRNA precursors in genomes. It is based on an algorithm called miRNAFold that allows predicting miRNA hairpin structures quickly with high sensitivity. miRNAFold is implemented as a web server with an intuitive and user-friendly interface, as well as a standalone version. The web server is freely available at: http://EvryRNA.ibisc.univ-evry.fr/miRNAFold. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
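The structural signal such tools search for is a hairpin: a run at the 5' end that base-pairs with the 3' end, leaving a loop between. Real hairpin finders allow mismatches, bulges and G-U wobble pairs; the toy below requires perfect Watson-Crick pairing and is only meant to show the shape of the problem.

```python
# Toy hairpin-stem detector for RNA: longest perfect Watson-Crick stem
# closing a loop of at least min_loop bases. Mismatches, bulges and
# G-U pairs (all allowed by real predictors) are ignored.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def stem_length(seq, min_loop=3):
    best = 0
    for k in range(1, (len(seq) - min_loop) // 2 + 1):
        if all(PAIR[seq[i]] == seq[len(seq) - 1 - i] for i in range(k)):
            best = k
    return best

print(stem_length("GGGGAAAACCCC"))   # 4-bp stem closing a 4-base loop
```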
Epidemiologic research using probabilistic outcome definitions.
Cai, Bing; Hennessy, Sean; Lo Re, Vincent; Small, Dylan S
2015-01-01
Epidemiologic studies using electronic healthcare data often define the presence or absence of binary clinical outcomes by using algorithms with imperfect specificity, sensitivity, and positive predictive value. This results in misclassification and bias in study results. We describe and evaluate a new method called probabilistic outcome definition (POD) that uses logistic regression to estimate the probability of a clinical outcome using multiple potential algorithms and then uses multiple imputation to make valid inferences about the risk ratio or other epidemiologic parameters of interest. We conducted a simulation to evaluate the performance of the POD method with two variables that can predict the true outcome and compared the POD method with the conventional method. The simulation results showed that when the true risk ratio is equal to 1.0 (null), the conventional method based on a binary outcome provides unbiased estimates. However, when the risk ratio is not equal to 1.0, the traditional method, either using one predictive variable or both predictive variables to define the outcome, is biased when the positive predictive value is <100%, and the bias is very severe when the sensitivity or positive predictive value is poor (less than 0.75 in our simulation). In contrast, the POD method provides unbiased estimates of the risk ratio both when this measure of effect is equal to 1.0 and not equal to 1.0. Even when the sensitivity and positive predictive value are low, the POD method continues to provide unbiased estimates of the risk ratio. The POD method provides an improved way to define outcomes in database research. This method has a major advantage over the conventional method in that it provides unbiased estimates of risk ratios and is easy to use. Copyright © 2014 John Wiley & Sons, Ltd.
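The POD idea can be sketched directly: instead of a hard 0/1 outcome, each subject carries a model-estimated outcome probability; outcomes are imputed by sampling from those probabilities several times, the risk ratio is computed in each imputed dataset, and results are pooled. The probabilities below are invented, and proper variance pooling (Rubin's rules) is omitted.

```python
# Miniature multiple-imputation risk-ratio estimate in the POD spirit.
# Outcome probabilities are hypothetical; only point estimates are
# pooled (no Rubin's-rules variance).

import random

def pooled_risk_ratio(p_exposed, p_unexposed, n_imputations=200, seed=1):
    rng = random.Random(seed)
    rrs = []
    for _ in range(n_imputations):
        risk_e = sum(rng.random() < p for p in p_exposed) / len(p_exposed)
        risk_u = sum(rng.random() < p for p in p_unexposed) / len(p_unexposed)
        if risk_u > 0:
            rrs.append(risk_e / risk_u)
    return sum(rrs) / len(rrs)

exposed = [0.6] * 50     # outcome probabilities from the logistic model
unexposed = [0.3] * 50
rr = pooled_risk_ratio(exposed, unexposed)
```

Because the imputation draws respect the estimated probabilities rather than a thresholded label, the pooled estimate stays close to the true ratio (here about 2) even when no single algorithm classifies outcomes well.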
Text Mining Improves Prediction of Protein Functional Sites
Cohn, Judith D.; Ravikumar, Komandur E.
2012-01-01
We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388
Ramirez, Magaly; Jin, Haomiao; Ell, Kathleen; Gross-Schulman, Sandra; Myerchin Sklaroff, Laura; Guterman, Jeffrey
2016-01-01
Background: Remote patient monitoring is increasingly integrated into health care delivery to expand access and increase effectiveness. Automation can add efficiency to remote monitoring, but patient acceptance of automated tools is critical for success. From 2010 to 2013, the Diabetes-Depression Care-management Adoption Trial (DCAT), a quasi-experimental comparative effectiveness research trial aimed at accelerating the adoption of collaborative depression care in a safety-net health care system, tested a fully automated telephonic assessment (ATA) depression monitoring system serving low-income patients with diabetes. Objective: The aim of this study was to determine patient acceptance of ATA calls over time, and to identify factors predicting long-term patient acceptance of ATA calls. Methods: We conducted two analyses using data from the DCAT technology-facilitated care arm, in which for 12 months the ATA system periodically assessed depression symptoms, monitored treatment adherence, prompted self-care behaviors, and inquired about patients’ needs for provider contact. Patients received assessments at 6, 12, and 18 months using Likert-scale measures of willingness to use ATA calls, preferred mode of reach, perceived ease of use, usefulness, nonintrusiveness, privacy/security, and long-term usefulness. For the first analysis (patient acceptance over time), we computed descriptive statistics of these measures. In the second analysis (predictive factors), we collapsed patients into two groups: those reporting “high” versus “low” willingness to use ATA calls. To compare them, we used independent t tests for continuous variables and Pearson chi-square tests for categorical variables. Next, we jointly entered independent factors found to be significantly associated with 18-month willingness to use ATA calls at the univariate level into a logistic regression model with backward selection to identify predictive factors.
We performed a final logistic regression model with the identified significant predictive factors and reported the odds ratio estimates and 95% confidence intervals. Results: At 6 and 12 months, respectively, 89.6% (69/77) and 63.7% (49/77) of patients “agreed” or “strongly agreed” that they would be willing to use ATA calls in the future. At 18 months, 51.0% (64/125) of patients perceived ATA calls as useful and 59.7% (46/77) were willing to use the technology. Moreover, in the first 6 months, most patients reported that ATA calls felt private/secure (75.9%, 82/108) and were easy to use (86.2%, 94/109), useful (65.1%, 71/109), and nonintrusive (87.2%, 95/109). Perceived usefulness, however, decreased to 54.1% (59/109) in the second 6 months of the trial. Factors predicting willingness to use ATA calls at the 18-month follow-up were perceived privacy/security and long-term perceived usefulness of ATA calls. No patient characteristics were significant predictors of long-term acceptance. Conclusions: In the short term, patients are generally accepting of ATA calls for depression monitoring, with ATA call design and the care management intervention being primary factors influencing patient acceptance. Acceptance over the long term requires that the system be perceived as private/secure, and that it be consistently useful for patients’ needs of awareness of feelings, self-care reminders, and connectivity with health care providers. Trial Registration: ClinicalTrials.gov NCT01781013; https://clinicaltrials.gov/ct2/show/NCT01781013 (Archived by WebCite at http://www.webcitation.org/6e7NGku56) PMID:26810139
Representing climate, disturbance, and vegetation interactions in landscape models
Robert E. Keane; Donald McKenzie; Donald A. Falk; Erica A.H. Smithwick; Carol Miller; Lara-Karena B. Kellogg
2015-01-01
The prospect of rapidly changing climates over the next century calls for methods to predict their effects on myriad, interactive ecosystem processes. Spatially explicit models that simulate ecosystem dynamics at fine (plant, stand) to coarse (regional, global) scales are indispensable tools for meeting this challenge under a variety of possible futures. A special...
NASA Technical Reports Server (NTRS)
Everhart, J. L.
1983-01-01
A program called FLEXWAL for calculating wall modifications for solid, adaptive-wall wind tunnels is presented. The method used is the iterative technique of NASA TP-2081 and is applicable to subsonic and transonic test conditions. The program usage, program listing, and a sample case are given.
Development and evaluation of the photoload sampling technique
Robert E. Keane; Laura J. Dickinson
2007-01-01
Wildland fire managers need better estimates of fuel loading so they can accurately predict potential fire behavior and effects of alternative fuel and ecosystem restoration treatments. This report presents the development and evaluation of a new fuel sampling method, called the photoload sampling technique, to quickly and accurately estimate loadings for six common...
Prediction and analysis of beta-turns in proteins by support vector machine.
Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao
2003-01-01
The tight turn has long been recognized as one of the three important features of proteins, after the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns. Analysis and prediction of beta-turns in particular, and tight turns in general, are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to the prediction and analysis of beta-turns. We have investigated two aspects of applying SVMs to the prediction and analysis of beta-turns. First, we developed a new SVM method, called BTSVM, which predicts the beta-turns of a protein from its sequence. The prediction results on a dataset of 426 non-homologous protein chains, obtained by a sevenfold cross-validation technique, showed that our method is superior to previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of beta-turns based on the "multivariable" classification model of a linear SVM. This model is more general than those of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.
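The two aspects above (prediction from sequence windows, and reading off position preferences from a linear model's weights) can be illustrated with a toy linear SVM. The sketch below trains a soft-margin SVM with the Pegasos subgradient method on synthetic four-residue windows in which positives are enriched in Pro/Gly at the central positions, a crude stand-in for real beta-turn preferences; it is an assumed setup for illustration, not the BTSVM implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
AA = "ACDEFGHIKLMNPQRSTVWY"
WIN = 4                                   # a beta-turn spans four residues

def encode(window):
    """One-hot encode a 4-residue window into an 80-dimensional vector."""
    v = np.zeros(WIN * len(AA))
    for i, aa in enumerate(window):
        v[i * len(AA) + AA.index(aa)] = 1.0
    return v

def sample(label, n):
    """Synthetic windows; positives get turn-favoring residues at positions 2-3."""
    out = []
    for _ in range(n):
        w = list(rng.choice(list(AA), WIN))
        if label == 1:
            w[1], w[2] = rng.choice(list("PG")), rng.choice(list("PGND"))
        out.append(encode("".join(w)))
    return out

X = np.array(sample(1, 300) + sample(0, 300))
y = np.array([1.0] * 300 + [-1.0] * 300)

# Pegasos-style subgradient training of a linear soft-margin SVM
lam, w = 0.01, np.zeros(X.shape[1])
for t in range(1, 5001):
    i = rng.integers(len(X))
    eta = 1.0 / (lam * t)
    if y[i] * (w @ X[i]) < 1:
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
    else:
        w = (1 - eta * lam) * w

acc = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {acc:.2f}")
```

After training, inspecting the learned weight vector per (position, residue) pair is the "multivariable" analysis the abstract alludes to: large positive weights land on Pro/Gly at the central positions.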
Social Communication and Vocal Recognition in Free-Ranging Rhesus Monkeys
NASA Astrophysics Data System (ADS)
Rendall, Christopher Andrew
Kinship and individual identity are key determinants of primate sociality, and the capacity for vocal recognition of individuals and kin is hypothesized to be an important adaptation facilitating intra-group social communication. Research was conducted on adult female rhesus monkeys on Cayo Santiago, Puerto Rico to test this hypothesis for three acoustically distinct calls characterized by varying selective pressures on communicating identity: coos (contact calls), grunts (close-range social calls), and noisy screams (agonistic recruitment calls). Vocalization playback experiments confirmed a capacity for both individual and kin recognition of coos, but not screams (grunts were not tested). Acoustic analyses, using traditional spectrographic methods as well as linear predictive coding techniques, indicated that coos (but not grunts or screams) were highly distinctive, and that the effects of vocal tract filtering (formants) contributed more to statistical discriminations of both individuals and kin groups than did temporal or laryngeal source features. Formants were identified from very short (23 ms) segments of coos and were stable within calls, indicating that formant cues to individual and kin identity were available throughout a call. This aspect of formant cues is predicted to be an especially important design feature for signaling identity efficiently in complex acoustic environments. Results of playback experiments involving manipulated coo stimuli provided preliminary perceptual support for the statistical inference that formant cues take precedence in facilitating vocal recognition. The similarity of formants among female kin suggested a mechanism for the development of matrilineal vocal signatures from the genetic and environmental determinants of vocal tract morphology shared among relatives.
The fact that screams, calls strongly expected to communicate identity, were neither individually distinctive nor recognized suggested that their acoustic structure and role in signaling identity might be constrained by functional or morphological design requirements associated with their role in signaling submission.
MHC2NNZ: A novel peptide binding prediction approach for HLA DQ molecules
NASA Astrophysics Data System (ADS)
Xie, Jiang; Zeng, Xu; Lu, Dongfang; Liu, Zhixiang; Wang, Jiao
2017-07-01
The major histocompatibility complex class II (MHC-II) molecule plays a crucial role in immunology. Computational prediction of MHC-II binding peptides can help researchers understand the mechanism of immune systems and design vaccines. Most MHC-II prediction algorithms to date have focused on human leukocyte antigen (HLA, the human MHC) molecules encoded in the DR locus. However, HLA DQ molecules are equally important; less progress has been made on them because they are more difficult to handle experimentally. In this study, we propose an artificial neural network-based approach called MHC2NNZ to predict peptides binding to HLA DQ molecules. Unlike previous artificial neural network-based methods, MHC2NNZ not only considers sequence similarity features but also captures chemical and physical properties, and a novel method incorporating these properties is proposed to represent peptide flanking regions (PFR). Furthermore, MHC2NNZ improves prediction accuracy by incorporating amino acid preferences at specific positions of the peptide binding core. Evaluated on 3549 peptides binding to the six most frequent HLA DQ molecules, MHC2NNZ is demonstrated to outperform other state-of-the-art MHC-II prediction methods.
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A
2006-01-01
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third, orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly adept at detecting genes with long introns and little sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
Sun, Xiaochun; Ma, Ping; Mumm, Rita H
2012-01-01
Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu
2016-10-01
Predicting protein-protein interactions (PPIs) is a challenging task and essential for constructing protein interaction networks, which is important for facilitating our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to detect PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using a Bi-gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments executed on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. These experimental results are significantly better than those of previous methods. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset.
The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate such studies, we developed a freely available web server, called RVM-BiGP-PPIs, in Hypertext Preprocessor (PHP) for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/. © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Tracking employment shocks using mobile phone data
Toole, Jameson L.; Lin, Yu-Ru; Muehlegger, Erich; Shoag, Daniel; González, Marta C.; Lazer, David
2015-01-01
Can data from mobile phones be used to observe economic shocks and their consequences at multiple scales? Here we present novel methods to detect mass layoffs, identify individuals affected by them, and predict changes in aggregate unemployment rates using call detail records (CDRs) from mobile phones. Using the closure of a large manufacturing plant as a case study, we first describe a structural break model to correctly detect the date of a mass layoff and estimate its size. We then use a Bayesian classification model to identify affected individuals by observing changes in calling behaviour following the plant's closure. For these affected individuals, we observe significant declines in social behaviour and mobility following job loss. Using the features identified at the micro level, we show that the same changes in these calling behaviours, aggregated at the regional level, can improve forecasts of macro unemployment rates. These methods and results highlight the promise of new data resources to measure microeconomic behaviour and improve estimates of critical economic indicators. PMID:26018965
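A minimal version of the structural break step (locating the layoff date as the split that minimizes squared error around two regime means) might look like the Python sketch below; the series and break location are simulated, not CDR data, and the single-mean-shift model is a simplification of the paper's approach:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily aggregate call counts: a mass layoff at day 120 shifts the mean
series = np.concatenate([rng.poisson(500, 120), rng.poisson(430, 80)]).astype(float)

def structural_break(x, margin=10):
    """Return the index that best splits x into two constant-mean regimes
    (least total squared error), plus the estimated level shift."""
    best, best_sse = None, np.inf
    for k in range(margin, len(x) - margin):
        sse = ((x[:k] - x[:k].mean()) ** 2).sum() + ((x[k:] - x[k:].mean()) ** 2).sum()
        if sse < best_sse:
            best, best_sse = k, sse
    return best, x[best:].mean() - x[:best].mean()

day, shift = structural_break(series)
print(day, round(shift, 1))   # break near day 120, negative shift in call volume
```

The same least-squares search generalizes to piecewise-linear regimes or multiple breaks; the fixed `margin` simply keeps each regime long enough to estimate a mean.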
Fletcher, Timothy L; Popelier, Paul L A
2016-06-14
A machine learning method called kriging is applied to the set of all 20 naturally occurring amino acids. Kriging models are built that predict electrostatic multipole moments for all topological atoms in any amino acid based on molecular geometry only. These models then predict molecular electrostatic interaction energies. On the basis of 200 unseen test geometries for each amino acid, no amino acid shows a mean prediction error above 5.3 kJ mol(-1), while the lowest error observed is 2.8 kJ mol(-1). The mean error across the entire set is only 4.2 kJ mol(-1) (or 1 kcal mol(-1)). Charged systems are created by protonating or deprotonating selected amino acids, and these show no significant deviation in prediction error over their neutral counterparts. Similarly, the proposed methodology can also handle amino acids with aromatic side chains, without the need for modification. Thus, we present a generic method capable of accurately capturing multipolar polarizable electrostatics in amino acids.
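Kriging in this sense is equivalent to Gaussian-process interpolation: fit covariances between training geometries, solve a linear system, and predict unseen geometries from cross-covariances. The numpy sketch below uses a one-dimensional stand-in for molecular geometry and an RBF covariance; both are assumptions for illustration, not the authors' descriptors or kernel:

```python
import numpy as np

rng = np.random.default_rng(3)

def target(x):
    """Hidden property of 'geometry' x, standing in for a multipole moment."""
    return np.sin(3 * x) + 0.5 * x

X_train = rng.uniform(0, 2, 30)           # 30 training geometries
y_train = target(X_train)

def rbf(a, b, length=0.4):
    """Squared-exponential covariance between two sets of 1-D points."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

# Simple kriging: solve K w = y, then predict with cross-covariances
K = rbf(X_train, X_train) + 1e-6 * np.eye(len(X_train))   # jitter for stability
weights = np.linalg.solve(K, y_train)

X_test = np.linspace(0.1, 1.9, 50)        # stay inside the training range
y_pred = rbf(X_test, X_train) @ weights
mae = np.abs(y_pred - target(X_test)).mean()
print(f"mean absolute prediction error: {mae:.3f}")
```

The kernel length scale and jitter here are arbitrary; in practice they would be tuned (e.g., by maximizing marginal likelihood), which is where most of the modeling effort in a real kriging pipeline goes.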
Object-color-signal prediction using wraparound Gaussian metamers.
Mirzaei, Hamidreza; Funt, Brian
2014-07-01
Alexander Logvinenko introduced an object-color atlas based on idealized reflectances called rectangular metamers in 2009. For a given color signal, the atlas specifies a unique reflectance that is metameric to it under the given illuminant. The atlas is complete and illuminant invariant, but not possible to implement in practice. He later introduced a parametric representation of the object-color atlas based on smoother "wraparound Gaussian" functions. In this paper, these wraparound Gaussians are used to predict illuminant-induced color signal changes. The proposed method computationally "relights" the atlas reflectance corresponding to a color signal to determine what that signal would become under any other illuminant. Since the reflectance is in the metamer set, the prediction is also physically realizable, which cannot be guaranteed for predictions obtained via von Kries scaling. Testing on Munsell spectra and a multispectral image shows that the proposed method outperforms predictions based on both von Kries scaling and the Bradford transform.
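The relighting computation itself is an integral of reflectance times illuminant times sensor sensitivity. The sketch below illustrates it with toy illuminant spectra and Gaussian sensor curves (not CIE data), and a schematic wraparound Gaussian reflectance whose parameterization is assumed, not Logvinenko's:

```python
import numpy as np

wl = np.linspace(400, 700, 61)              # wavelengths, nm

def wraparound_gaussian(mu, sigma, period=300.0):
    """Smooth reflectance that wraps around the visible range; a stand-in
    for Logvinenko's wraparound Gaussian metamers (parameters illustrative)."""
    d = np.abs(wl - mu)
    d = np.minimum(d, period - d)           # wraparound distance
    return np.exp(-d ** 2 / (2 * sigma ** 2))

# Two illustrative illuminant power distributions (not CIE illuminants)
illum_bluish = 1.0 + (700 - wl) / 300.0     # more power at short wavelengths
illum_reddish = 1.0 + (wl - 400) / 300.0    # more power at long wavelengths

# Toy Gaussian "cone" sensitivities peaking at long/medium/short wavelengths
peaks = np.array([560.0, 530.0, 420.0])
sensors = np.exp(-(wl[None, :] - peaks[:, None]) ** 2 / (2 * 40.0 ** 2))

def color_signal(reflectance, illuminant):
    """Sensor responses: sum reflectance x illuminant x sensitivity over wl."""
    return sensors @ (reflectance * illuminant)

r = wraparound_gaussian(mu=600, sigma=50)   # reddish surface
cs_old = color_signal(r, illum_bluish)
cs_new = color_signal(r, illum_reddish)     # "relit" prediction under new light
print(np.round(cs_old, 2), np.round(cs_new, 2))
```

Relighting shifts the long-wavelength sensor response up and the short-wavelength response down for this surface, and because the prediction goes through a physical reflectance, it can never produce an unrealizable color signal.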
Advances in the use of observed spatial patterns of catchment hydrological response
NASA Astrophysics Data System (ADS)
Grayson, Rodger B.; Blöschl, Günter; Western, Andrew W.; McMahon, Thomas A.
Over the past two decades there have been repeated calls for the collection of new data for use in developing hydrological science. The last few years have begun to bear fruit from the seeds sown by these calls, through increases in the availability and utility of remote sensing data, as well as the execution of campaigns in research catchments aimed at providing new data for advancing hydrological understanding and predictive capability. In this paper we discuss some philosophical considerations related to model complexity, data availability and predictive performance, highlighting the potential of observed patterns in moving the science and practice of catchment hydrology forward. We then review advances that have arisen from recent work on spatial patterns, including in the characterisation of spatial structure and heterogeneity, and the use of patterns for developing, calibrating and testing distributed hydrological models. We illustrate progress via examples using observed patterns of snow cover, runoff occurrence and soil moisture. Methods for the comparison of patterns are presented, illustrating how they can be used to assess hydrologically important characteristics of model performance. These methods include point-to-point comparisons, spatial relationships between errors and landscape parameters, transects, and optimal local alignment. It is argued that the progress made to date augurs well for future developments, but there is scope for improvements in several areas. These include better quantitative methods for pattern comparisons, better use of pattern information in data assimilation and modelling, and a call for improved archiving of data from field studies to assist in comparative studies for generalising results and developing fundamental understanding.
NASA Astrophysics Data System (ADS)
Prasetyo, S. Y. J.; Hartomo, K. D.
2018-01-01
The Spatial Plan of the Province of Central Java 2009-2029 identifies that most regencies or cities in Central Java Province are very vulnerable to landslide disaster. This is also supported by data from the 2013 Indonesian Disaster Risk Index (in Indonesian, Indeks Risiko Bencana Indonesia), which suggests that some areas in Central Java Province exhibit a high risk of natural disasters. This research aims to develop an application architecture and analysis methodology in GIS to predict and map rainfall distribution. We propose our GIS architectural application of “Multiplatform Architectural Spatiotemporal” and the data analysis methods of “Triple Exponential Smoothing” (TES) and “Spatial Interpolation” as our main scientific contribution. This research consists of two parts, namely attribute data prediction using the TES method and spatial data prediction using the Inverse Distance Weighting (IDW) method. We conducted our research in 19 subdistricts of Boyolali Regency, Central Java Province, Indonesia. Our main research data are the biweekly rainfall data for 2000-2016 from the Meteorology, Climatology, and Geophysics Agency (in Indonesian, Badan Meteorologi, Klimatologi, dan Geofisika) of Central Java Province and the Laboratory of Plant Disease Observations Region V Surakarta, Central Java. The application architecture and analytical methodology of “Multiplatform Architectural Spatiotemporal” and the spatial data analysis methods of “Triple Exponential Smoothing” and “Spatial Interpolation” can be developed into a GIS application framework for rainfall distribution in various applied fields. The comparison between the TES and IDW methods shows that, relative to time series prediction, spatial interpolation yields values closer to the actual data. Spatial interpolation is closer to the actual data because the computed values draw on the rainfall data of the nearest locations, i.e., the neighbours of the sample values.
However, the IDW method’s main weakness is that some areas may exhibit a rainfall value of 0. Such zeros in the spatial interpolation are mainly caused by the absence of rainfall at the nearest sample points, or by sample points so distant that they receive very small weights.
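The IDW step, including the way nearby zero-valued stations dominate the interpolated surface, can be illustrated with a short numpy sketch (station coordinates and rainfall values are invented, not BMKG data):

```python
import numpy as np

# Sample rainfall stations: (x, y) coordinates and biweekly rainfall totals (mm)
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
rainfall = np.array([120.0, 80.0, 100.0, 0.0])   # one station recorded no rain

def idw(point, coords, values, power=2.0, eps=1e-12):
    """Inverse Distance Weighted interpolation at a single query point."""
    d = np.linalg.norm(coords - point, axis=1)
    if d.min() < eps:                     # exactly on a station: use its value
        return float(values[d.argmin()])
    w = 1.0 / d ** power                  # closer stations get larger weights
    return float(w @ values / w.sum())

# A point nearer the wet stations gets a high estimate...
est = idw(np.array([2.0, 3.0]), stations, rainfall)
print(round(est, 1))
# ...while a point beside the zero-rainfall station inherits its 0 value,
# the weakness noted above
near_zero = idw(np.array([9.9, 9.9]), stations, rainfall)
print(round(near_zero, 1))
```

The `power` parameter controls how sharply influence decays with distance; raising it makes the surface even more dominated by the single nearest station.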
Habitat of calling blue and fin whales in the Southern California Bight
NASA Astrophysics Data System (ADS)
Sirovic, A.; Chou, E.; Roch, M. A.
2016-02-01
Northeast Pacific blue whale B calls and fin whale 20 Hz calls were detected from passive acoustic data collected over seven years at 16 sites in the Southern California Bight (SCB). Calling blue whales were most common in the coastal areas during the summer and fall months. Fin whales began calling in fall and continued through winter in the southcentral SCB. These data were used to develop habitat models of calling blue and fin whales in areas of high and low abundance in the SCB, using remotely sensed variables such as sea surface temperature, sea surface height, chlorophyll a, and primary productivity as model covariates. A random forest framework was used for variable selection, and generalized additive models were developed to explain functional relationships, evaluate the relative contribution of each significant variable, and investigate the predictive abilities of models of calling whales. A seasonal component was an important feature of all models. Additionally, areas of high calling blue and fin whale abundance both had a positive relationship with sea surface temperature. In areas of lower abundance, chlorophyll a concentration and primary productivity were important variables for blue whale models, and sea surface height and primary productivity were significant covariates in fin whale models. Predictive models were generally better at predicting general trends than absolute values, but there was a large degree of variation in year-to-year predictability across different sites.
Yavaş, Gökhan; Koyutürk, Mehmet; Gould, Meetha P; McMahon, Sarah; LaFramboise, Thomas
2014-03-05
With the advent of paired-end high-throughput sequencing, it is now possible to identify various types of structural variation on a genome-wide scale. Although many methods have been proposed for structural variation detection, most do not provide precise boundaries for identified variants. In this paper, we propose a new method, Distribution Based detection of Duplication Boundaries (DB2), for accurate detection of tandem duplication breakpoints, an important class of structural variation, with high precision and recall. Our computational experiments on simulated data show that DB2 outperforms state-of-the-art methods in terms of finding breakpoints of tandem duplications, with a higher positive predictive value (precision) in calling the duplications' presence. In particular, DB2's prediction of tandem duplications is correct 99% of the time even for very noisy data, while narrowing down the space of possible breakpoints within a margin of 15 to 20 bps on average. Most existing methods provide boundaries in ranges that extend to hundreds of bases, with lower precision values. Our method is also highly robust to varying properties of the sequencing library and to the sizes of the tandem duplications, as shown by its stable precision, recall, and mean boundary mismatch performance. We demonstrate our method's efficacy using both simulated paired-end reads and reads generated from a melanoma sample and two ovarian cancer samples. Newly discovered tandem duplications were validated using PCR and Sanger sequencing. Our method, DB2, uses discordantly aligned reads, taking into account the distribution of fragment length, to predict tandem duplications along with their breakpoints on a donor genome. The proposed method fine-tunes the breakpoint calls by applying a novel probabilistic framework that incorporates the empirical fragment length distribution to score each feasible breakpoint.
DB2 is implemented in Java programming language and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php.
A comparison of fatigue life prediction methodologies for rotorcraft
NASA Technical Reports Server (NTRS)
Everett, R. A., Jr.
1990-01-01
Because of the current U.S. Army requirement that all new rotorcraft be designed to a 'six nines' reliability on fatigue life, this study was undertaken to assess the accuracy of the current safe-life philosophy, which uses the nominal-stress Palmgren-Miner linear cumulative damage rule, in predicting the fatigue life of rotorcraft dynamic components. It has been shown that this methodology can predict fatigue lives that differ from test lives by more than two orders of magnitude. A further objective of this work was to compare the accuracy of this methodology to another safe-life method, called the local strain approach, as well as to a method which predicts fatigue life based solely on crack growth data. Spectrum fatigue tests were run on notched (k(sub t) = 3.2) specimens made of 4340 steel using the Felix/28 loading spectrum. The local strain approach predicted the test lives fairly well, being slightly on the unconservative side of the test data. The crack growth method, which is based on 'small crack' crack growth data and a crack-closure model, also predicted the fatigue lives very well, with the predicted lives being slightly longer than the mean test lives but within the experimental scatter band. The crack growth model was also able to predict the change in test lives produced by the rainflow-reconstructed spectra.
Prediction task guided representation learning of medical codes in EHR.
Cui, Liwen; Xie, Xiaolei; Shen, Zuojun
2018-06-18
There have been rapidly growing applications using machine learning models for predictive analytics in Electronic Health Records (EHR) to improve the quality of hospital services and the efficiency of healthcare resource utilization. A fundamental and crucial step in developing such models is to convert medical codes in EHR to feature vectors. These medical codes are used to represent diagnoses or procedures. Their vector representations have a tremendous impact on the performance of machine learning models. Recently, some researchers have utilized representation learning methods from Natural Language Processing (NLP) to learn vector representations of medical codes. However, most previous approaches are unsupervised, i.e. the generation of medical code vectors is independent of prediction tasks. Thus, the obtained feature vectors may be inappropriate for a specific prediction task. Moreover, unsupervised methods often require many samples to obtain reliable results, but most practical problems have very limited patient samples. In this paper, we develop a new method called Prediction Task Guided Health Record Aggregation (PTGHRA), which aggregates health records guided by prediction tasks, to construct training corpora for various representation learning models. Compared with unsupervised approaches, representation learning models integrated with PTGHRA yield a significant improvement in the predictive capability of generated medical code vectors, especially for limited training samples.
Skoraczyński, G; Dittwald, P; Miasojedow, B; Szymkuć, S; Gajewska, E P; Grzybowski, B A; Gambin, A
2017-06-15
As machine learning/artificial intelligence algorithms are defeating chess masters and, most recently, GO champions, there is interest - and hope - that they will prove equally useful in assisting chemists in predicting outcomes of organic reactions. This paper demonstrates, however, that the applicability of machine learning to the problems of chemical reactivity over diverse types of chemistries remains limited - in particular, with the currently available chemical descriptors, fundamental mathematical theorems impose upper bounds on the accuracy with which reaction yields and times can be predicted. Improving the performance of machine-learning methods calls for the development of fundamentally new chemical descriptors.
Acoustic radiosity for computation of sound fields in diffuse environments
NASA Astrophysics Data System (ADS)
Muehleisen, Ralph T.; Beamer, C. Walter
2002-05-01
The use of image and ray tracing methods (and variations thereof) for the computation of sound fields in rooms is relatively well developed. In their regime of validity, both methods work well for prediction in rooms with small amounts of diffraction and mostly specular reflection at the walls. While extensions to these methods to include diffuse reflections and diffraction have been made, they are limited at best. In the fields of illumination and computer graphics, the ray tracing and image methods are joined by another method called luminous radiative transfer, or radiosity. In radiosity, an energy balance between surfaces is computed assuming diffuse reflection at the reflective surfaces. Because the interaction between surfaces is constant, much of the computation required for sound field prediction with multiple or moving source and receiver positions can be reduced. In acoustics the radiosity method has had little attention because of the problems of diffraction and specular reflection. The utility of radiosity in acoustics and an approach to a useful development of the method for acoustics will be presented. The method looks especially useful for sound level prediction in industrial and office environments. [Work supported by NSF.]
NASA Astrophysics Data System (ADS)
Oh, Jung Hun; Kerns, Sarah; Ostrer, Harry; Powell, Simon N.; Rosenstein, Barry; Deasy, Joseph O.
2017-02-01
The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis are used to identify key biological processes and proteins that were plausible based on other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
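The core SLT loop the abstract describes, choosing model complexity by minimizing cross-validated expected prediction error rather than within-sample fit, can be sketched with ridge regression on synthetic data. The penalty grid and data below are illustrative, not the personality-item application from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: solve (X'X + lam*I) beta = X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, k=5, seed=0):
    """Estimate expected prediction error (EPE) for one penalty value by
    k-fold cross-validation -- the model-selection loop the abstract
    describes, here on synthetic data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errs)

# Many predictors, few samples, only 3 truly informative coefficients
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 30))
beta_true = np.zeros(30); beta_true[:3] = 2.0
y = X @ beta_true + rng.normal(size=80)
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda l: cv_error(X, y, l))
```

The chosen penalty trades off fit against complexity out-of-sample, which is exactly what maximizing the within-sample likelihood cannot do.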
Roff, D A; Crnokrak, P; Fairbairn, D J
2003-07-01
Quantitative genetic theory assumes that trade-offs are best represented by bivariate normal distributions. This theory predicts that selection will shift the trade-off function itself and not just move the mean trait values along a fixed trade-off line, as is generally assumed in optimality models. As a consequence, quantitative genetic theory predicts that the trade-off function will vary among populations in which at least one of the component traits itself varies. This prediction is tested using the trade-off between call duration and flight capability, as indexed by the mass of the dorsolateral flight muscles, in the macropterous morph of the sand cricket. We use four different populations of crickets that vary in the proportion of macropterous males (Lab = 33%, Florida = 29%, Bermuda = 72%, South Carolina = 80%). We find, as predicted, that there is significant variation in the intercept of the trade-off function but not the slope, supporting the hypothesis that trade-off functions are better represented as bivariate normal distributions rather than single lines. We also test the prediction from a quantitative genetical model of the evolution of wing dimorphism that the mean call duration of macropterous males will increase with the percentage of macropterous males in the population. This prediction is also supported. Finally, we estimate the probability of a macropterous male attracting a female, P, as a function of the relative time spent calling (P = time spent calling by the macropterous male/(total time spent calling by both the micropterous and macropterous males)). We find that in the Lab and Florida populations the probability of a female selecting the macropterous male is equal to P, indicating that preference is due simply to relative call duration.
But in the Bermuda and South Carolina populations the probability of a female selecting a macropterous male is less than P, indicating a preference for the micropterous male even after differences in call duration are accounted for.
Improving real-time efficiency of case-based reasoning for medical diagnosis.
Park, Yoon-Joo
2014-01-01
Conventional case-based reasoning (CBR) does not perform efficiently for high-volume datasets because of case-retrieval time. Some previous studies overcome this problem by clustering the case base into several small groups and retrieving neighbors within the group corresponding to a target case. However, this approach generally produces less accurate predictions than conventional CBR. This paper proposes a new case-based reasoning method, called Clustering-Merging CBR (CM-CBR), which achieves a level of predictive performance similar to that of conventional CBR at significantly lower computational cost.
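The general clustered-retrieval idea behind such methods can be sketched as follows. This is the generic cluster-then-retrieve scheme, not the specific Clustering-Merging rule of CM-CBR, and all data here are synthetic.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means used to partition the case base into small groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def retrieve(x, X, y, centers, labels, n_neighbors=3, n_clusters=1):
    """Retrieve neighbors only from the cluster(s) nearest the target
    case, cutting retrieval cost relative to scanning the full case base."""
    near = np.argsort(((centers - x) ** 2).sum(-1))[:n_clusters]
    pool = np.isin(labels, near)
    cand, cand_y = X[pool], y[pool]
    order = np.argsort(((cand - x) ** 2).sum(-1))[:n_neighbors]
    return cand_y[order].mean()  # mean of neighbor outcomes as prediction

# Toy case base: two well-separated clusters, outcome = cluster identity
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
centers, labels = kmeans(X, k=2)
pred = retrieve(np.array([0.05]), X, y, centers, labels)
```

Searching only a couple of clusters is what saves time; the accuracy loss the paper addresses comes from neighbors that fall just outside the chosen cluster's boundary.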
Windowed multitaper correlation analysis of multimodal brain monitoring parameters.
Faltermeier, Rupert; Proescholdt, Martin A; Bele, Sylvia; Brawanski, Alexander
2015-01-01
Although multimodal monitoring sets the standard in daily practice of neurocritical care, problem-oriented analysis tools to interpret the huge amount of data are lacking. Recently a mathematical model was presented that simulates the cerebral perfusion and oxygen supply in case of a severe head trauma, predicting the appearance of distinct correlations between arterial blood pressure and intracranial pressure. In this study we present a set of mathematical tools that reliably detect the predicted correlations in data recorded at a neurocritical care unit. The time-resolved correlations are identified by a windowing technique combined with Fourier-based coherence calculations. The phasing of the data is detected by means of the Hilbert phase difference within the above-mentioned windows. A statistical testing method is introduced that allows tuning the parameters of the windowing method in such a way that a predefined accuracy is reached. With this method the data of fifteen patients were examined, and the predicted correlation was found in each patient. Additionally, it could be shown that the occurrence of a distinct correlation parameter, called scp, is a high-quality predictor of patient outcome.
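As a simplified stand-in for the paper's windowed coherence and Hilbert-phase machinery, a sliding-window correlation between two monitoring channels already shows how time-resolved coupling can be detected. The signals below are synthetic, not patient data.

```python
import numpy as np

def windowed_correlation(abp, icp, win, step):
    """Slide a window over two monitoring channels and compute the
    Pearson correlation inside each window -- a simplified stand-in for
    the Fourier-based windowed coherence described in the abstract."""
    out = []
    for start in range(0, len(abp) - win + 1, step):
        a = abp[start:start + win]
        b = icp[start:start + win]
        out.append(np.corrcoef(a, b)[0, 1])
    return np.array(out)

# Synthetic 100 Hz signals: the coupling flips sign halfway through
t = np.arange(2000) / 100.0
abp = np.sin(2 * np.pi * 1.0 * t)
icp = np.concatenate([abp[:1000], -abp[1000:]])
r = windowed_correlation(abp, icp, win=200, step=200)
```

The correlation trace jumps from +1 to -1 at the halfway point, which is the kind of time-localized change a single whole-record correlation would average away.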
A Simplified and Reliable Damage Method for the Prediction of Composite Pieces
NASA Astrophysics Data System (ADS)
Viale, R.; Coquillard, M.; Seytre, C.
2012-07-01
Structural engineers are often confronted with test results on composite structures that are considerably tougher than predicted. To narrow this frequent gap, a survey of extensive synthesis works on prediction methods and failure criteria was carried out; the inquiry dealt with the plane stress state only. All classical methods have strong and weak points with respect to practicality and reliability. The main conclusion is that, in the plane stress case, the best usual industrial methods give rather similar predictions, but very generally they do not explain the often large discrepancies with respect to the tests, mainly in cases of strong stress gradients or of bi-axial laminate loadings. It seems that only methods that account for the complexity of composite damage (so-called physical methods, or Continuum Damage Mechanics, "CDM") bring a clear improvement over the usual methods. The main drawback of these methods is their relative intricacy, particularly under industrial time pressure. A method offering an approximate but simplified representation of the CDM phenomenology is presented. Compared to tests and to other methods, it brings a fair improvement in correlation with tests over the usual industrial methods, and it gives results very similar to those of the painstaking CDM methods and very close to the test results. Several examples are provided. In addition, the method is economical with respect to material characterization as well as modeling and computational effort.
Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil
2014-08-01
We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of the protein backbone was recently improved with groups of structural fragments called Structural Alphabets, instead of the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe every region of the protein backbone, including the coil. According to de Brevern (2000), the Protein Blocks alphabet has 16 structural fragments, each five residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pair-wise alignment. Alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at the Protein Block Export server.
Integrating linear optimization with structural modeling to increase HIV neutralization breadth.
Sevy, Alexander M; Panda, Swetasudha; Crowe, James E; Meiler, Jens; Vorobeychik, Yevgeniy
2018-02-01
Computational protein design has been successful in modeling fixed backbone proteins in a single conformation. However, when modeling large ensembles of flexible proteins, current methods in protein design have been insufficient. Large barriers in the energy landscape are difficult to traverse while redesigning a protein sequence, and as a result current design methods only sample a fraction of available sequence space. We propose a new computational approach that combines traditional structure-based modeling using the Rosetta software suite with machine learning and integer linear programming to overcome limitations in the Rosetta sampling methods. We demonstrate the effectiveness of this method, which we call BROAD, by benchmarking the performance on increasing predicted breadth of anti-HIV antibodies. We use this novel method to increase predicted breadth of naturally-occurring antibody VRC23 against a panel of 180 divergent HIV viral strains and achieve 100% predicted binding against the panel. In addition, we compare the performance of this method to state-of-the-art multistate design in Rosetta and show that we can outperform the existing method significantly. We further demonstrate that sequences recovered by this method recover known binding motifs of broadly neutralizing anti-HIV antibodies. Finally, our approach is general and can be extended easily to other protein systems. Although our modeled antibodies were not tested in vitro, we predict that these variants would have greatly increased breadth compared to the wild-type antibody.
2008-03-01
to predict its exact position. To locate Ceres, Carl Friedrich Gauss, a mere 24 years old at the time, developed a method called least-squares...dividend to produce the quotient. This method converges to the reciprocal quadratically [11]. For the special case of 1/(H × P(:, :, k) × H′ + R) (Eq. 3.9) the...high-speed computation of reciprocals within the overall system. The Newton-Raphson method is also expanded for use in calculating square-roots in
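The Newton-Raphson reciprocal iteration this fragment refers to is x_{n+1} = x_n(2 - a x_n): the error roughly squares each step (quadratic convergence) and the update needs no division, which is why it suits high-speed hardware. A minimal sketch:

```python
def reciprocal(a, x0, iters=6):
    """Newton-Raphson iteration for 1/a without division:
        x_{n+1} = x_n * (2 - a * x_n)
    Quadratically convergent provided the initial guess satisfies
    0 < x0 < 2/a."""
    x = x0
    for _ in range(iters):
        x = x * (2.0 - a * x)
    return x

approx = reciprocal(7.0, 0.1)  # converges toward 1/7 = 0.142857...
```

Because each iteration doubles the number of correct digits, a handful of steps from even a crude initial guess reaches machine precision.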
Yi, Faliu; Moon, Inkyu; Javidi, Bahram
2017-10-01
In this paper, we present two models for automatically extracting red blood cells (RBCs) from RBCs holographic images based on a deep learning fully convolutional neural network (FCN) algorithm. The first model, called FCN-1, only uses the FCN algorithm to carry out RBCs prediction, whereas the second model, called FCN-2, combines the FCN approach with the marker-controlled watershed transform segmentation scheme to achieve RBCs extraction. Both models achieve good segmentation accuracy. In addition, the second model has much better performance in terms of cell separation than traditional segmentation methods. In the proposed methods, the RBCs phase images are first numerically reconstructed from RBCs holograms recorded with off-axis digital holographic microscopy. Then, some RBCs phase images are manually segmented and used as training data to fine-tune the FCN. Finally, each pixel in new input RBCs phase images is predicted into either foreground or background using the trained FCN models. The RBCs prediction result from the first model is the final segmentation result, whereas the result from the second model is used as the internal markers of the marker-controlled transform algorithm for further segmentation. Experimental results show that the given schemes can automatically extract RBCs from RBCs phase images and much better RBCs separation results are obtained when the FCN technique is combined with the marker-controlled watershed segmentation algorithm.
Predicting financial trouble using call data—On social capital, phone logs, and financial trouble
Agarwal, Rishav Raj; Lin, Chia-Ching; Chen, Kuan-Ta; Singh, Vivek Kumar
2018-01-01
An ability to understand and predict financial wellbeing for individuals is of interest to economists, policy designers, financial institutions, and the individuals themselves. According to the Nilson reports, there were more than 3 billion credit cards in use in 2013, accounting for purchases exceeding US$ 2.2 trillion, and according to the Federal Reserve report, 39% of American households were carrying credit card debt from month to month. Prior literature has connected individual financial wellbeing with social capital. However, as yet, there is limited empirical evidence connecting social interaction behavior with financial outcomes. This work reports results from one of the largest known studies connecting financial outcomes and phone-based social behavior (180,000 individuals; 2 years’ time frame; 82.2 million monthly bills, and 350 million call logs). Our methodology tackles highly imbalanced dataset, which is a pertinent problem with modelling credit risk behavior, and offers a novel hybrid method that yields improvements over, both, a traditional transaction data only approach, and an approach that uses only call data. The results pave way for better financial modelling of billions of unbanked and underbanked customers using non-traditional metrics like phone-based credit scoring. PMID:29474411
SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method.
Vasylenko, Tamara; Liou, Yi-Fan; Chen, Hong-An; Charoenkwan, Phasit; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Photosynthetic proteins (PSPs) greatly differ in their structure and function as they are involved in numerous subprocesses that take place inside an organelle called a chloroplast. Few studies predict PSPs from sequences due to their high variety of sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods. A novel bioinformatics method of predicting and characterizing PSPs based on a scoring card method (SCMPSP) was used. First, a dataset was established consisting of 649 PSPs, identified using the Gene Ontology term GO:0015979, and 649 non-PSPs from the SwissProt database with sequence identity <= 25%. Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features performed well and yielded a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides as PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour hydrophobic side chain amino acids; 2) PSPs are composed of the amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer to be composed of the amino acids of electron-reactive side chains. The SCMPSP method not only estimates the propensity of a sequence to be a PSP, it also discovers characteristics that further improve understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.
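The scoring-card idea, summing learned dipeptide propensities over a sequence, can be sketched in a few lines. The propensity table below is a tiny made-up fragment, not the 400-dipeptide card estimated in the paper, and the threshold rule is likewise illustrative.

```python
def scm_score(seq, propensity):
    """Scoring card method sketch: a sequence's score is the mean
    propensity of its overlapping dipeptides. Unknown dipeptides score
    zero here; the real method has a learned score for all 400."""
    dips = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(propensity.get(d, 0.0) for d in dips) / len(dips)

# Hypothetical propensity fragment (not the paper's estimated card)
card = {"AL": 800.0, "LL": 650.0, "KE": 120.0}
score = scm_score("ALLKE", card)  # dipeptides: AL, LL, LK, KE
```

Classification then reduces to comparing the score against a threshold tuned on training data, which is what makes the method interpretable: each dipeptide's contribution is explicit.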
VAN method of short-term earthquake prediction shows promise
NASA Astrophysics Data System (ADS)
Uyeda, Seiya
Although optimism prevailed in the 1970s, the present consensus on earthquake prediction appears to be quite pessimistic. However, short-term prediction based on geoelectric potential monitoring has stood the test of time in Greece for more than a decade [Varotsos and Kulhanek, 1993; Lighthill, 1996]. The method used is called the VAN method. The geoelectric potential changes constantly due to causes such as magnetotelluric effects, lightning, rainfall, leakage from manmade sources, and electrochemical instabilities of electrodes. All of this noise must be eliminated before preseismic signals are identified, if they exist at all. The VAN group apparently accomplished this task for the first time. They installed multiple short (100-200 m) dipoles with different lengths in both north-south and east-west directions, and long (1-10 km) dipoles in appropriate orientations, at their stations (one of their mega-stations, Ioannina, for example, now has 137 dipoles in operation) and found that practically all of the noise could be eliminated by applying a set of criteria to the data.
Taxi Time Prediction at Charlotte Airport Using Fast-Time Simulation and Machine Learning Techniques
NASA Technical Reports Server (NTRS)
Lee, Hanbong
2016-01-01
Accurate taxi time prediction is required for enabling efficient runway scheduling that can increase runway throughput and reduce taxi times and fuel consumption on the airport surface. Currently, NASA and American Airlines are jointly developing a decision-support tool called the Spot and Runway Departure Advisor (SARDA) that assists airport ramp controllers in making gate pushback decisions and improves the overall efficiency of airport surface traffic. In this presentation, we propose to use Linear Optimized Sequencing (LINOS), a discrete-event fast-time simulation tool, to predict taxi times and provide the estimates to the runway scheduler in real-time airport operations. To assess its prediction accuracy, we also introduce a data-driven analytical method using machine learning techniques. These two taxi time prediction methods are evaluated with actual taxi time data obtained from the SARDA human-in-the-loop (HITL) simulation for Charlotte Douglas International Airport (CLT) using various performance measurement metrics. Based on the taxi time prediction results, we also discuss how the prediction accuracy can be affected by the operational complexity at this airport and how we can improve the fast-time simulation model before implementing it with an airport scheduling algorithm in a real-time environment.
Li, Longhai; Feng, Cindy X; Qiu, Shi
2017-06-30
An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because one needs to rerun a Markov chain Monte Carlo analysis for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values with only Markov chain samples drawn from the posterior based on the full data set. The key step in iIS is that we integrate away the latent variables associated with the test observation with respect to their conditional distribution, without reference to the actual observation. Following the general theory of importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS with three other existing methods in the literature on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to the predictive p-values estimated with actual LOOCV and outperform those given by the three existing methods, namely, posterior predictive checking, ordinary importance sampling, and the ghosting method of Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.
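For contrast with iIS, the ordinary importance-sampling baseline the paper compares against can be written compactly: reweight full-data posterior draws by the inverse likelihood of the held-out observation so they target the leave-one-out posterior. A sketch with illustrative toy inputs (this is the baseline, not iIS itself):

```python
import numpy as np

def loo_predictive_pvalue(lik, tail):
    """Ordinary importance-sampling LOOCV predictive p-value from
    posterior draws conditioned on the FULL data.

    lik[s]  = p(y_i | theta_s), likelihood of the held-out observation
    tail[s] = P(Y_rep <= y_i | theta_s), predictive tail probability

    Weighting each draw by 1/lik[s] retargets the full-data posterior
    to the leave-one-out posterior (up to normalization)."""
    w = 1.0 / np.asarray(lik, dtype=float)
    return float(np.sum(w * np.asarray(tail, dtype=float)) / np.sum(w))

# Toy check: with equal likelihoods the weights cancel and the
# p-value reduces to the average tail probability.
p = loo_predictive_pvalue([0.2, 0.2], [0.3, 0.5])
```

The known weakness of this baseline, which motivates iIS, is that the weights 1/lik[s] can have huge variance when the held-out observation is influential.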
Suresh, V; Parthasarathy, S
2014-01-01
We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models developed were tested using three different benchmarking tests, namely: (i) self-consistency, (ii) seven-fold cross-validation and (iii) independent case tests. The maximum possible prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of both the LI1264 and SP1577 datasets, where the PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.
ERIC Educational Resources Information Center
Max, Jeffrey E.; Schachar, Russell J.; Levin, Harvey S.; Ewing-Cobbs, Linda; Chapman, Sandra B.; Dennis, Maureen; Saunders, Ann; Landis, Julie
2005-01-01
Objective: To assess the phenomenology and predictive factors of attention-deficit/hyperactivity disorder (ADHD) after traumatic brain injury (TBI), also called secondary ADHD (SADHD). Method: Children without preinjury ADHD 5-14 years old with TBI from consecutive admissions (n = 143) to five trauma centers were observed prospectively from 6 to…
Robert E. Keane; Laura J. Dickinson
2007-01-01
Fire managers need better estimates of fuel loading so they can more accurately predict the potential fire behavior and effects of alternative fuel and ecosystem restoration treatments. This report presents a new fuel sampling method, called the photoload sampling technique, to quickly and accurately estimate loadings for six common surface fuel components (1 hr, 10 hr...
Simulating the electrohydrodynamics of a viscous droplet
NASA Astrophysics Data System (ADS)
Theillard, Maxime; Saintillan, David
2016-11-01
We present a novel numerical approach for the simulation of a viscous drop placed in an electric field in two and three spatial dimensions. Our method is constructed as a stable projection method on Quad/Octree grids. Using a modified pressure correction, we were able to alleviate the standard time step restriction incurred by capillary forces. In weak electric fields, our results match remarkably well with the predictions of the Taylor-Melcher leaky dielectric model. In strong electric fields, the so-called Quincke rotation is correctly reproduced.
NASA Astrophysics Data System (ADS)
Ha, Sangwoo; Lee, Gyoungho; Kalman, Calvin S.
2013-06-01
Hermeneutics is useful in science and science education because it emphasizes the process of understanding. The purpose of this study was to construct a workshop based upon hermeneutical principles and to interpret students' learning in the workshop through a hermeneutical perspective. The history of Newtonian mechanics suggests two methods of approaching it: one is called the 'prediction approach', and the other is called the 'explanation approach'. The 'prediction approach' refers to the application of the principles of Newtonian mechanics. We commonly use the prediction approach because its logical process is natural to us. However, its use is correct only when a force, such as gravitation, is exactly known. On the other hand, the 'explanation approach' can be used when the nature of a force is not exactly known. In the workshop, students read a short text offering contradicting ideas about whether to analyze a friction situation using the explanation approach or the prediction approach. Twenty-two college students taking an upper-level mechanics course wrote their ideas about the text. The participants then discussed their ideas within six groups, each composed of three or four students. Through the group discussion, students were able to clarify their preconceptions about friction, and they responded to the group discussion positively. Students started to think about their learning from a holistic perspective. As students thought about and discussed the friction problems in the manner of hermeneutical circles, they moved toward a better understanding of friction.
Bao, Yu; Hayashida, Morihiro; Akutsu, Tatsuya
2016-11-25
Dicer is necessary for the process of mature microRNA (miRNA) formation because the Dicer enzyme cleaves pre-miRNA correctly to generate miRNA with correct seed regions. Nonetheless, the mechanism underlying the selection of a Dicer cleavage site is still not fully understood. To date, several studies have been conducted to solve this problem, for example, a recent discovery indicates that the loop/bulge structure plays a central role in the selection of Dicer cleavage sites. In accordance with this breakthrough, a support vector machine (SVM)-based method called PHDCleav was developed to predict Dicer cleavage sites which outperforms other methods based on random forest and naive Bayes. PHDCleav, however, tests only whether a position in the shift window belongs to a loop/bulge structure. In this paper, we used the length of loop/bulge structures (in addition to their presence or absence) to develop an improved method, LBSizeCleav, for predicting Dicer cleavage sites. To evaluate our method, we used 810 empirically validated sequences of human pre-miRNAs and performed fivefold cross-validation. In both 5p and 3p arms of pre-miRNAs, LBSizeCleav showed greater prediction accuracy than PHDCleav did. This result suggests that the length of loop/bulge structures is useful for prediction of Dicer cleavage sites. We developed a novel algorithm for feature space mapping based on the length of a loop/bulge for predicting Dicer cleavage sites. The better performance of our method indicates the usefulness of the length of loop/bulge structures for such predictions.
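The paper's key feature, the length of the loop/bulge structure at each position, can be extracted from a dot-bracket secondary structure string. The function below is an illustrative reconstruction of that idea, not LBSizeCleav's actual code:

```python
def loop_bulge_lengths(dot_bracket):
    """For each position of a dot-bracket string, return the length of the
    unpaired run ('.') containing it, or 0 if the position is paired.
    A stand-in for LBSizeCleav's loop/bulge-size feature."""
    n = len(dot_bracket)
    out = [0] * n
    i = 0
    while i < n:
        if dot_bracket[i] == '.':
            j = i
            while j < n and dot_bracket[j] == '.':
                j += 1          # extend to the end of the unpaired run
            for k in range(i, j):
                out[k] = j - i  # every position in the run gets its length
            i = j
        else:
            i += 1
    return out
```

PHDCleav's presence/absence feature is recovered by thresholding these lengths at zero; LBSizeCleav keeps the length itself.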
Quantum Chemical Topology: Knowledgeable atoms in peptides
NASA Astrophysics Data System (ADS)
Popelier, Paul L. A.
2012-06-01
The need to improve atomistic biomolecular force fields remains acute. Fortunately, the abundance of contemporary computing power enables an overhaul of the architecture of current force fields, which typically base their electrostatics on fixed atomic partial charges. We discuss the principles behind the electrostatics of a more realistic force field under construction, called QCTFF. At the heart of QCTFF lies the so-called topological atom, which is a malleable box whose shape and electrostatics change in response to a changing environment. This response is captured by a machine learning method called Kriging. Kriging directly predicts each multipole moment of a given atom (i.e. the output) from the coordinates of the nuclei surrounding this atom (i.e. the input). This procedure yields accurate interatomic electrostatic energies, which form the basis for future-proof progress in force field design.
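Kriging in this nuclei-to-multipole setting is Gaussian-process interpolation. A minimal sketch, assuming a squared-exponential kernel and made-up hyperparameters (QCTFF's actual kernel and descriptors are not specified in the abstract):

```python
import numpy as np

def kriging_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-8):
    """Minimal Kriging (Gaussian-process) interpolator: predicts an atomic
    multipole moment y from surrounding-nuclei coordinates X."""
    def k(a, b):
        # Squared-exponential covariance between two coordinate sets
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)   # weights for the training targets
    return k(X_test, X_train) @ alpha
```

At the training configurations the interpolator reproduces the training moments almost exactly, which is the property that makes Kriging attractive for tabulating atomic responses.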
The prediction of intelligence in preschool children using alternative models to regression.
Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E
2011-12-01
Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children are used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than did the more traditional regression approach. Implications of these results are discussed.
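The simplest regression tree is a single split (a stump): choose the threshold that minimizes squared error and predict the mean on each side. This toy sketch (illustrative data, not the study's) shows why trees can beat a linear fit when the outcome changes abruptly with a predictor:

```python
def stump_fit(x, y):
    """One-split regression tree, the simplest CART: pick the threshold
    minimizing the sum of squared errors, predict each side's mean."""
    best = None
    for t in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((yi - ml) ** 2 for yi in left)
               + sum((yi - mr) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi < t else mr
```

A full CART model recursively applies the same split search to each side, which is what gives it the flexibility the abstract credits.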
Modification and Validation of Conceptual Design Aerodynamic Prediction Method HASC95 With VTXCHN
NASA Technical Reports Server (NTRS)
Albright, Alan E.; Dixon, Charles J.; Hegedus, Martin C.
1996-01-01
A conceptual/preliminary design level subsonic aerodynamic prediction code HASC (High Angle of Attack Stability and Control) has been improved in several areas, validated, and documented. The improved code includes improved methodologies for increased accuracy and robustness, and simplified input/output files. An engineering method called VTXCHN (Vortex Chine) for predicting nose vortex shedding from circular and non-circular forebodies with sharp chine edges has been improved and integrated into the HASC code. This report contains a summary of modifications, a description of the code, a user's guide, and validation of HASC. Appendices include discussion of a new HASC utility code, listings of sample input and output files, and a discussion of the application of HASC to buffet analysis.
Feature selection for examining behavior by pathology laboratories.
Hawkins, S; Williams, G; Baxter, R
2001-08-01
Australia has a universal health insurance scheme called Medicare, which is managed by Australia's Health Insurance Commission. Medicare payments for pathology services generate voluminous transaction data on patients, doctors and pathology laboratories. The Health Insurance Commission (HIC) currently uses predictive models to monitor compliance with regulatory requirements. The HIC commissioned a project to investigate the generation of new features from the data; feature generation has not previously appeared as an important step in the knowledge discovery in databases (KDD) literature. Interesting new features for use in predictive modeling were generated, then summarized, visualized and used as inputs for clustering and outlier detection methods. Data organization and data transformation methods are described for the efficient access and manipulation of these new features.
Predictive Thermal Control Applied to HabEx
NASA Technical Reports Server (NTRS)
Brooks, Thomas E.
2017-01-01
Exoplanet science can be accomplished with a telescope that has an internal coronagraph or with an external starshade. An internal coronagraph architecture requires extreme wavefront stability (10 pm change/10 minutes for 10^-10 contrast), so every source of wavefront error (WFE) must be controlled. Analysis has been done to estimate the thermal stability required to meet the wavefront stability requirement. This paper illustrates the potential of a new thermal control method called predictive thermal control (PTC) to achieve the required thermal stability. A simple development test using PTC indicates that PTC may meet the thermal stability requirements. Further testing of the PTC method in flight-like environments will be conducted in the X-ray and Cryogenic Facility (XRCF) at Marshall Space Flight Center (MSFC).
Predictive thermal control applied to HabEx
NASA Astrophysics Data System (ADS)
Brooks, Thomas E.
2017-09-01
Exoplanet science can be accomplished with a telescope that has an internal coronagraph or with an external starshade. An internal coronagraph architecture requires extreme wavefront stability (10 pm change/10 minutes for 10^-10 contrast), so every source of wavefront error (WFE) must be controlled. Analysis has been done to estimate the thermal stability required to meet the wavefront stability requirement. This paper illustrates the potential of a new thermal control method called predictive thermal control (PTC) to achieve the required thermal stability. A simple development test using PTC indicates that PTC may meet the thermal stability requirements. Further testing of the PTC method in flight-like environments will be conducted in the X-ray and Cryogenic Facility (XRCF) at Marshall Space Flight Center (MSFC).
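The defining idea of predictive control is to invert a model of the plant inside the control loop. The abstract does not describe PTC's actual thermal model or actuators, so the following is only a toy first-order sketch of that idea, with every name and parameter invented for illustration:

```python
def ptc_drive(T_now, T_goal, a, dt):
    """One step of a toy predictive thermal controller for the first-order
    model T_next = T + a*dt*(u - T): invert the model to find the heater
    drive temperature u whose *predicted* next temperature hits the goal."""
    return T_now + (T_goal - T_now) / (a * dt)
```

Feeding the returned drive back into the model lands exactly on the setpoint one step later, which is what distinguishes model-based prediction from reactive (e.g. purely proportional) control.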
High Ringxiety: Attachment Anxiety Predicts Experiences of Phantom Cell Phone Ringing.
Kruger, Daniel J; Djerf, Jaikob M
2016-01-01
Mobile cell phone users have reported experiencing ringing and/or vibrations associated with incoming calls and messages, only to find that no call or message had actually registered. We believe this phenomenon can be understood as a human signal detection issue, with potentially important influences from psychological attributes. We hypothesized that individuals higher in attachment anxiety would report more frequent phantom cell phone experiences, whereas individuals higher in attachment avoidance would report less frequent experiences. If these experiences are primarily psychologically related to attributes of interpersonal relationships, associations with attachment style should be stronger than those for general sensation seeking. We also predicted that certain contexts would interact with attachment style to increase or decrease the likelihood of experiencing phantom cell phone calls and messages. Attachment anxiety directly predicted the frequency of phantom ringing and notification experiences, whereas attachment avoidance and sensation seeking did not directly predict frequency. Attachment anxiety and attachment avoidance interacted with contextual factors (expectations for a call or message, and concern about an issue that one may be contacted about) in the expected directions for predicting phantom cell phone experiences.
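In signal detection terms, a phantom ring is a false alarm. The standard sensitivity index d' separates discrimination ability from response bias; the sketch below is the textbook formula, not an analysis the authors report:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity d' = z(H) - z(FA): how well incoming
    alerts are discriminated from background sensations. Rates must lie
    strictly between 0 and 1 (apply a correction for 0 or 1 in practice)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)
```

Under this framing, higher attachment anxiety would correspond to a more liberal response criterion (more false alarms) rather than necessarily a change in d'.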
Machine Learning Techniques for Prediction of Early Childhood Obesity.
Dugan, T M; Mukhopadhyay, S; Carroll, A; Downs, S
2015-01-01
This paper aims to predict childhood obesity after age two, using only data collected prior to the second birthday by a clinical decision support system called CHICA. Analyses of six different machine learning methods (RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes) trained on CHICA data show that an accurate, sensitive model can be created. Of the methods analyzed, the ID3 model trained on the CHICA dataset showed the best overall performance, with an accuracy of 85% and a sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors seen in the ID3 modeling of the CHICA dataset have been independently validated in the literature as correlated with obesity, thereby supporting the validity of the model. This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.
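The four figures reported for the ID3 model (accuracy, sensitivity, PPV, NPV) all derive from the confusion matrix. A minimal helper, with invented example counts, shows the definitions:

```python
def classifier_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, PPV and NPV from confusion-matrix counts
    (tp/fp/tn/fn = true/false positives/negatives)."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall of the obese class
        "ppv":         tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
    }
```

Reporting PPV and NPV alongside accuracy matters here because obesity prevalence is far from 50%, so accuracy alone can flatter a trivial classifier.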
Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping
2017-12-12
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a definitive solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences alone, a new set of features, called RS (RNA-Seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
Predicting beta-turns in proteins using support vector machines with fractional polynomials
2013-01-01
Background β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, as it can provide valuable insights and inputs for fold recognition and drug design. Results We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. Conclusions In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods. PMID:24565438
Predicting beta-turns in proteins using support vector machines with fractional polynomials.
Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng
2013-11-07
β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, as it can provide valuable insights and inputs for fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.
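The LR half of H-SVM-LR fits a logistic model on fractional-polynomial transforms of the inputs. This sketch shows only that half on toy one-dimensional data; the power set, learning rate, and data are invented for illustration, and the SVM component is omitted:

```python
import math

def frac_poly_features(x):
    """Fractional-polynomial basis for a non-negative predictor x
    (powers drawn from the conventional FP candidate set)."""
    return [1.0, x ** 0.5, x, x * x, math.log(x + 1.0)]

def fit_fp_logistic(xs, ys, lr=0.1, epochs=2000):
    """Logistic regression on FP features via plain stochastic gradient ascent."""
    w = [0.0] * 5
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            f = frac_poly_features(x)
            p = 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
            w = [wi + lr * (y - p) * fi for wi, fi in zip(w, f)]
    return w

def predict_prob(w, x):
    f = frac_poly_features(x)
    return 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
```

In the hybrid method, the fitted probability would be combined with the SVM decision value to make the final β-turn call.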
Multiplex visibility graphs to investigate recurrent neural network dynamics
NASA Astrophysics Data System (ADS)
Bianchi, Filippo Maria; Livi, Lorenzo; Alippi, Cesare; Jenssen, Robert
2017-03-01
A recurrent neural network (RNN) is a universal approximator of dynamical systems, whose performance often depends on sensitive hyperparameters. Tuning them properly may be difficult and is typically based on a trial-and-error approach. In this work, we adopt a graph-based framework to interpret and characterize the internal dynamics of a class of RNNs called echo state networks (ESNs). We design principled unsupervised methods to derive hyperparameter configurations yielding maximal ESN performance, expressed in terms of prediction error and memory capacity. In particular, we propose to model the time series generated by each neuron's activations with a horizontal visibility graph, whose topological properties have been shown to be related to the underlying system dynamics. Successively, the horizontal visibility graphs associated with all neurons become layers of a larger structure called a multiplex. We show that topological properties of such a multiplex reflect important features of ESN dynamics that can be used to guide the tuning of its hyperparameters. Results obtained on several benchmarks and a real-world dataset of telephone call data records show the effectiveness of the proposed methods.
Multiplex visibility graphs to investigate recurrent neural network dynamics
Bianchi, Filippo Maria; Livi, Lorenzo; Alippi, Cesare; Jenssen, Robert
2017-01-01
A recurrent neural network (RNN) is a universal approximator of dynamical systems, whose performance often depends on sensitive hyperparameters. Tuning them properly may be difficult and is typically based on a trial-and-error approach. In this work, we adopt a graph-based framework to interpret and characterize the internal dynamics of a class of RNNs called echo state networks (ESNs). We design principled unsupervised methods to derive hyperparameter configurations yielding maximal ESN performance, expressed in terms of prediction error and memory capacity. In particular, we propose to model the time series generated by each neuron's activations with a horizontal visibility graph, whose topological properties have been shown to be related to the underlying system dynamics. Successively, the horizontal visibility graphs associated with all neurons become layers of a larger structure called a multiplex. We show that topological properties of such a multiplex reflect important features of ESN dynamics that can be used to guide the tuning of its hyperparameters. Results obtained on several benchmarks and a real-world dataset of telephone call data records show the effectiveness of the proposed methods. PMID:28281563
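The horizontal visibility criterion is simple to state: two samples see each other if every sample strictly between them is lower than both. A brute-force sketch (the published algorithm is faster, but the criterion is the same):

```python
def horizontal_visibility_graph(series):
    """Edges (i, j) of the horizontal visibility graph of a time series:
    i and j are connected iff every value strictly between them is lower
    than both endpoints. Adjacent samples are always connected."""
    edges = []
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            if all(series[k] < min(series[i], series[j])
                   for k in range(i + 1, j)):
                edges.append((i, j))
    return edges
```

In the paper's pipeline, one such graph per reservoir neuron becomes one layer of the multiplex whose topology is then summarized.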
Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.
Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe
2018-02-19
Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is, however, an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions: the Min kernel and two pairwise kernels, the Metric Learning Pairwise Kernel (MLPK) and the Tensor Product Pairwise Kernel (TPPK). We also consider normalized forms of the Min kernel. We then combine the Min kernel or one of its normalized forms with one of the pairwise kernels by plugging the former into the latter. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. We then evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of the normalized Min kernel and MLPK leads to the best F-measure and improves on the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs. non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state of the art.
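Two of the named kernels have short standard definitions: the Min kernel sums elementwise minima, and the TPPK builds a kernel on protein *pairs* from a kernel on proteins, symmetrized over the order within each pair. A minimal sketch with toy feature vectors:

```python
def min_kernel(x, y):
    """Min (histogram-intersection) kernel on two feature vectors."""
    return sum(min(a, b) for a, b in zip(x, y))

def tppk(pair1, pair2, k=min_kernel):
    """Tensor Product Pairwise Kernel between protein pairs (a,b) and (c,d):
    K((a,b),(c,d)) = k(a,c)k(b,d) + k(a,d)k(b,c), symmetric in pair order."""
    (a, b), (c, d) = pair1, pair2
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)
```

Plugging the (possibly normalized) Min kernel in as `k` is the kind of combination the paper evaluates; MLPK has a different closed form not shown here.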
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
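The core operation the abstract describes, estimating expected prediction error by cross-validation rather than within-sample fit, can be sketched generically. The function names and the squared-error loss are illustrative choices:

```python
def kfold_indices(n, k):
    """(train, test) index splits for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(f)), f) for f in folds]

def cv_epe(xs, ys, fit, predict, k=5):
    """Estimate expected prediction error (squared loss) by k-fold CV:
    fit on k-1 folds, score on the held-out fold, average over all points."""
    err = n_test = 0.0
    for train, test in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            err += (predict(model, xs[i]) - ys[i]) ** 2
            n_test += 1
    return err / n_test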
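placeholder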
Smith, Adam S.; Birnie, Andrew K.; Lane, Kent R.; French, Jeffrey A.
2010-01-01
Males and females from many species produce distinct acoustic variations of functionally identical call types. Social behavior may be primed by sex-specific variation in acoustic features of calls. We present a series of acoustic analyses and playback experiments as methods for investigating this subject. Acoustic parameters of phee calls produced by Wied's black-tufted-ear marmosets (Callithrix kuhlii) were analyzed for sex differences. Discriminant function analyses showed that calls contained sufficient acoustic variation to predict the sex of the caller. Several frequency variables differed significantly between the sexes. Natural and synthesized calls were presented to male–female pairs. Calls elicited differential behavioral responses based on the sex of the caller. Marmosets became significantly more vigilant following the playback of male phee calls (both natural and synthetic) than following female phee calls. In a second playback experiment, synthesized calls were modified by independently manipulating three parameters that were known to differ between the sexes (low-, peak-, and end-frequency). When end-frequency-modified calls were presented, responsiveness was differentiable by sex of caller but did not differ from responses to natural calls. This suggests that marmosets did not use end-frequency to determine the sex of the caller. Manipulation of peak- and low-frequency parameters eliminated the discrete behavioral responses to male and female calls. Together, these parameters may be important features that encode the sex-specific signal. Recognition of sex by acoustic cues seems to be a multivariate process that depends on the congruency of acoustic features. PMID:19090554
Leveraging Call Center Logs for Customer Behavior Prediction
NASA Astrophysics Data System (ADS)
Parvathy, Anju G.; Vasudevan, Bintu G.; Kumar, Abhishek; Balakrishnan, Rajesh
Most major businesses use business process outsourcing for performing a process or a part of a process, including financial services like mortgage processing, loan origination, finance and accounting, and transaction processing. Call centers are used for the purpose of receiving and transmitting a large volume of requests through outbound and inbound calls to customers on behalf of a business. In this paper we deal specifically with call center notes from banks. Banks, as financial institutions, provide loans to non-financial businesses and individuals. Their call centers act as the nuclei of their client service operations and log the transactions between the customer and the bank. This crucial conversational information can be exploited for predicting a customer's behavior, which will in turn help these businesses to decide on the next action to be taken. Thus the banks save considerable time and effort in tracking delinquent customers to ensure a minimum of subsequent defaulters. The majority of the time, the call center notes are very concise and brief, and often the notes are misspelled and use many domain-specific acronyms. In this paper we introduce a novel domain-specific spelling correction algorithm which corrects the misspelled words in the call center logs to meaningful ones. We also discuss a procedure that builds behavioral history sequences for the customers by categorizing the logs into one of the predefined behavioral states. We then describe a pattern-based predictive algorithm that uses temporal behavioral patterns mined from these sequences to predict the customer's next behavioral state.
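A common baseline for domain-specific spelling correction is nearest-neighbor lookup in a domain vocabulary under edit distance. The sketch below is such a baseline, not the paper's algorithm; the vocabulary and threshold are invented for illustration:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,       # deletion
                                     dp[j - 1] + 1,   # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def correct(word, vocabulary, max_dist=2):
    """Map a (possibly misspelled) token to the nearest domain term,
    leaving it unchanged if nothing in the vocabulary is close enough."""
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_dist else word
```

A production version would additionally expand domain acronyms and weight candidates by corpus frequency, as the abstract implies.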
Spatial-temporal forecasting the sunspot diagram
NASA Astrophysics Data System (ADS)
Covas, Eurico
2017-09-01
Aims: We attempt to forecast the Sun's sunspot butterfly diagram in both space (i.e. in latitude) and time, instead of the usual one-dimensional time series forecasts prevalent in the scientific literature. Methods: We use a prediction method based on the non-linear embedding of data series in high dimensions. We use this method to forecast both in latitude (space) and in time, using a full spatial-temporal series of the sunspot diagram from 1874 to 2015. Results: The analysis of the results shows that it is indeed possible to reconstruct the overall shape and amplitude of the spatial-temporal pattern of sunspots, but that the method in its current form does not have real predictive power. We also apply a metric called structural similarity to compare the forecasted and the observed butterfly cycles, showing that this metric can be a useful addition to the usual root mean square error metric when analysing the efficiency of different prediction methods. Conclusions: We conclude that it is in principle possible to reconstruct the full sunspot butterfly diagram for at least one cycle using this approach, and that this method and others should be explored, since just looking at metrics such as sunspot count number or sunspot total area coverage is too reductive given the spatial-temporal dynamical complexity of the sunspot butterfly diagram. However, more data and/or an improved approach is probably necessary to achieve true predictive power.
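The structural similarity metric used to compare forecast and observation has a simple single-window form. The full SSIM averages this over local windows; the constants below are arbitrary small stabilizers, and this is a generic implementation, not the paper's code:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window structural similarity between two arrays/images:
    compares mean (luminance), variance (contrast) and covariance
    (structure); equals 1 for identical inputs."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))
```

Unlike RMS error, this score rewards getting the *pattern* of the butterfly wings right even when pointwise amplitudes drift, which is why the authors pair the two metrics.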
NASA Technical Reports Server (NTRS)
Howland, G. R.; Durno, J. A.; Twomey, W. J.
1990-01-01
Sikorsky Aircraft, together with the other major helicopter airframe manufacturers, is engaged in a study to improve the use of finite element analysis to predict the dynamic behavior of helicopter airframes, under a rotorcraft structural dynamics program called DAMVIBS (Design Analysis Methods for VIBrationS), sponsored by the NASA-Langley. The test plan and test results are presented for a shake test of the UH-60A BLACK HAWK helicopter. A comparison is also presented of test results with results obtained from analysis using a NASTRAN finite element model.
Regnier, D.; Litaize, O.; Serot, O.
2015-12-23
Numerous nuclear processes involve the deexcitation of a compound nucleus through the emission of several neutrons, gamma-rays and/or conversion electrons. The characteristics of such a deexcitation are commonly derived from a total statistical framework often called the "Hauser–Feshbach" method. In this work, we highlight a numerical limitation of this kind of method in the case of the deexcitation of a high spin initial state. To circumvent this issue, an improved technique called the Fluctuating Structure Properties (FSP) method is presented. Two FSP algorithms are derived and benchmarked on the calculation of the total radiative width for a thermal neutron capture on 238U. We compare the standard method with these FSP algorithms for the prediction of particle multiplicities in the deexcitation of a high spin level of 143Ba. The gamma multiplicity turns out to be very sensitive to the numerical method. The bias between the two techniques can reach 1.5 γ/cascade. Lastly, the uncertainty of these calculations coming from the lack of knowledge of nuclear structure is estimated via the FSP method.
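The quantity at stake, gamma multiplicity per cascade, can be illustrated with a toy statistical deexcitation: draw gamma energies from an assumed spectrum until the excitation energy is exhausted and count the emissions. Real Hauser-Feshbach codes sample level densities and transmission coefficients; everything below is a deliberately crude stand-in:

```python
import random

def cascade_multiplicity(E0, temperature, rng):
    """Toy statistical cascade: emit gammas with energies drawn from an
    exponential spectrum of the given temperature until the excitation
    energy E0 is exhausted; return the number of gammas emitted."""
    E, count = E0, 0
    while E > 0:
        e = min(E, rng.expovariate(1.0 / temperature))  # last gamma takes the rest
        E -= e
        count += 1
    return count
```

Averaging this count over many cascades gives the multiplicity whose numerical sensitivity the paper investigates.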
Inductive matrix completion for predicting gene-disease associations.
Natarajan, Nagarajan; Dhillon, Inderjit S
2014-06-15
Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best), which has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e., genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature.
Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.
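The bilinear scoring rule at the heart of inductive matrix completion can be sketched as follows. The dimensions and the factor matrices W and H below are hypothetical stand-ins (in the actual method they are learned from the known gene-disease associations); the point is the inductive property: scoring a new gene needs only its feature vector, not a row in the training matrix.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: 100 genes with 20 features, 50 diseases with 10
# features, and a rank-5 factorization (W, H are what IMC learns).
X = rng.random((100, 20))   # gene feature matrix
Y = rng.random((50, 10))    # disease feature matrix
W = rng.random((20, 5))     # latent factors on the gene side
H = rng.random((10, 5))     # latent factors on the disease side

def imc_score(x_gene, y_disease, W, H):
    """Predicted association score for one (gene, disease) pair:
    the bilinear form x^T W H^T y used by inductive matrix completion."""
    return float(x_gene @ W @ H.T @ y_disease)

# Score every disease for a previously unseen gene: only its feature
# vector is required, which is what makes the method inductive.
new_gene = rng.random(20)
scores = np.array([imc_score(new_gene, Y[j], W, H) for j in range(Y.shape[0])])
top10 = np.argsort(scores)[::-1][:10]  # top-10 candidate diseases
```

Ranking the scores per query disease (or gene) is how "top 100 predictions" style evaluations are produced.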
A link prediction method for heterogeneous networks based on BP neural network
NASA Astrophysics Data System (ADS)
Li, Ji-chao; Zhao, Dan-ling; Ge, Bing-Feng; Yang, Ke-Wei; Chen, Ying-Wu
2018-04-01
Most real-world systems, composed of different types of objects connected via many interconnections, can be abstracted as various complex heterogeneous networks. Link prediction for heterogeneous networks is of great significance for mining missing links and reconfiguring networks according to observed information, with considerable applications in, for example, friend and location recommendations and disease-gene candidate detection. In this paper, we put forward a novel integrated framework, called MPBP (Meta-Path feature-based BP neural network model), to predict multiple types of links for heterogeneous networks. More specifically, the concept of the meta-path is introduced, followed by the extraction of meta-path features for heterogeneous networks. Next, based on the extracted meta-path features, a supervised link prediction model is built with a three-layer BP neural network. Then, a solution algorithm for the proposed link prediction model is put forward to obtain predicted results by iteratively training the network. Last, numerical experiments on datasets from a gene-disease network and a combat network are conducted to verify the effectiveness and feasibility of the proposed MPBP. The results show that MPBP performs very well and is superior to the baseline methods.
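The second stage described above (a supervised three-layer BP network over link features) can be sketched minimally. The feature matrix below is a synthetic stand-in for meta-path features, and the labels are synthetic; the paper's actual meta-path extraction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for meta-path features: each candidate link is
# described by a few path-count features (hypothetical data).
X = rng.random((200, 6))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)  # synthetic link labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three-layer BP network (input -> hidden -> output), trained by
# iteratively backpropagating the cross-entropy gradient.
W1 = rng.normal(0, 0.5, (6, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.5
for _ in range(2000):
    h = sigmoid(X @ W1 + b1)              # hidden-layer activations
    p = sigmoid(h @ W2 + b2).ravel()      # predicted link probabilities
    d_out = (p - y)[:, None] / len(y)     # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)    # backpropagated hidden error
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()
accuracy = float(np.mean((p > 0.5) == (y > 0.5)))
```

A thresholded output probability then classifies each candidate pair as a predicted link or non-link.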
Normal mode-guided transition pathway generation in proteins
Lee, Byung Ho; Seo, Sangjae; Kim, Min Hyeok; Kim, Youngjin; Jo, Soojin; Choi, Moon-ki; Lee, Hoomin; Choi, Jae Boong
2017-01-01
The biological function of proteins is closely related to their structural motion. For instance, structurally misfolded proteins do not function properly. Although we are able to experimentally obtain structural information on proteins, it is still challenging to capture their dynamics, such as transition processes. Therefore, we need a simulation method to predict the transition pathways of a protein in order to understand and study large functional deformations. Here, we present a new simulation method called normal mode-guided elastic network interpolation (NGENI) that performs normal mode analysis iteratively to predict transition pathways of proteins. To be more specific, NGENI obtains displacement vectors that determine intermediate structures by interpolating the distance between two end-point conformations, similar to a morphing method called elastic network interpolation. However, the displacement vector is regarded as a linear combination of the normal mode vectors of each intermediate structure, in order to enhance the physical sense of the proposed pathways. As a result, we can generate transition pathways that are more reasonable both geometrically and thermodynamically. Using not only all normal modes but also only a subset of the lowest normal modes, NGENI can still generate reasonable pathways for large deformations in proteins. This study shows that global protein transitions are dominated by collective motion, which means that a few of the lowest normal modes play an important role in this process. NGENI has considerable merit in terms of computational cost because it can generate transition pathways using only a subset of the degrees of freedom, which conventional methods cannot. PMID:29020017
Predicting Drug-Target Interactions With Multi-Information Fusion.
Peng, Lihong; Liao, Bo; Zhu, Wen; Li, Zejun; Li, Keqin
2017-03-01
Identifying potential associations between drugs and targets is a critical prerequisite for modern drug discovery and repurposing. However, predicting these associations is difficult because of the limitations of existing computational methods. Most models consider only chemical structures and protein sequences, and other models are oversimplified. Moreover, datasets used for analysis contain only true-positive interactions, and experimentally validated negative samples are unavailable. To overcome these limitations, we developed a semi-supervised learning framework called NormMulInf based on collaborative filtering theory, using both labeled and unlabeled interaction information. The proposed method initially determines similarity measures, such as similarities among samples and local correlations among the labels of the samples, by integrating biological information. The similarity information is then integrated into a robust principal component analysis model, which is solved using augmented Lagrange multipliers. Experimental results on four classes of drug-target interaction networks suggest that the proposed approach can accurately classify and predict drug-target interactions. Some of the predicted interactions are reported in public databases. The proposed method can also predict possible targets for new drugs and can be used to determine whether atropine may interact with alpha1B- and beta1-adrenergic receptors. Furthermore, the developed technique identifies potential drugs for new targets and can be used to assess whether olanzapine and propiomazine may target 5HT2B. Finally, the proposed method can potentially address limitations in studies of multitarget drugs and multidrug targets.
ERIC Educational Resources Information Center
Lueder, E. J.
The rationale, design, and data collection methods of a study of factors that may influence the effectiveness of adult instructional groups that included interaction are discussed. Two aspects to be considered when studying instructional groups are called work and emotionality. The Work-Emotionality Theory is discussed. Six types of…
Patrol force allocation for law enforcement: An introductory planning guide
NASA Technical Reports Server (NTRS)
Sohn, R. L.; Kennedy, R. D.
1976-01-01
Previous and current methods for analyzing police patrol forces are reviewed and discussed. The steps in developing an allocation analysis procedure are defined, including the prediction of the rate of calls for service, determination of the number of patrol units needed, designing sectors, and analyzing dispatch strategies. Existing computer programs used for this purpose are briefly described, and some results of their application are given.
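The guide's actual allocation procedures are not reproduced here, but the "number of patrol units needed" step it describes is commonly approximated with an Erlang loss model from queueing theory. The sketch below makes that assumption explicit: calls arrive at a given rate, each occupies a unit for a mean service time, and units are added until the probability that all units are busy falls below a chosen threshold.

```python
import math  # kept for parity with other numeric sketches; not strictly needed

def erlang_b(offered_load, servers):
    """Erlang B blocking probability via the standard recursion
    B(E, 0) = 1;  B(E, m) = E*B(E, m-1) / (m + E*B(E, m-1))."""
    b = 1.0
    for m in range(1, servers + 1):
        b = offered_load * b / (m + offered_load * b)
    return b

def units_needed(calls_per_hour, service_minutes, max_blocking=0.10):
    """Smallest number of patrol units keeping the probability that all
    units are simultaneously busy below max_blocking (loss-system
    approximation; a hypothetical stand-in for the guide's procedure)."""
    offered = calls_per_hour * service_minutes / 60.0  # workload in erlangs
    m = 1
    while erlang_b(offered, m) >= max_blocking:
        m += 1
    return m

# e.g. 4 calls/hour at 30 minutes per call is 2 erlangs of workload
```

With 2 erlangs of workload and a 10% blocking target, the sketch calls for 4 units; real dispatch analyses would additionally model queueing of calls and sector geometry.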
Acoustic Radiation From Rotating Blades: The Kirchhoff Method in Aeroacoustics
NASA Technical Reports Server (NTRS)
Farassat, F.
2000-01-01
This paper reviews the current status of discrete frequency noise prediction for rotating blade machinery in the time domain. There are two major approaches both of which can be classified as the Kirchhoff method. These methods depend on the solution of two linear wave equations called the K and FW-H equations. The solutions of these equations for subsonic and supersonic surfaces are discussed and some important results of the research in the past years are presented. This paper is analytical in nature and emphasizes the work of the author and coworkers at NASA Langley Research Center.
Sea surface temperature predictions using a multi-ocean analysis ensemble scheme
NASA Astrophysics Data System (ADS)
Zhang, Ying; Zhu, Jieshun; Li, Zhongxian; Chen, Haishan; Zeng, Gang
2017-08-01
This study examined global sea surface temperature (SST) predictions by a so-called multiple-ocean analysis ensemble (MAE) initialization method applied in the National Centers for Environmental Prediction (NCEP) Climate Forecast System Version 2 (CFSv2). Unlike most operational climate prediction practices, which are initialized from a single ocean analysis system, the MAE method is based on multiple ocean analyses. In the paper, the MAE method was first justified by analyzing the ocean temperature variability in four ocean analyses, all of which are or were applied for operational climate predictions either at the European Centre for Medium-Range Weather Forecasts or at NCEP. It was found that these systems exhibit substantial uncertainties in estimating the ocean states, especially at the deep layers. Further, a set of MAE hindcasts was conducted based on the four ocean analyses with CFSv2, starting from each April during 1982-2007. The MAE hindcasts were verified against a subset of hindcasts from the NCEP CFS Reanalysis and Reforecast (CFSRR) Project. Comparisons suggested that MAE gives better SST predictions than CFSRR over most regions where ocean dynamics plays a vital role in SST evolution, such as the El Niño and Atlantic Niño regions. Furthermore, significant improvements were also found in summer precipitation predictions over the equatorial eastern Pacific and Atlantic oceans, which should be attributable to the improved local SST predictions. The prediction improvements by MAE point to a problem with most current climate predictions based on a single ocean analysis system: their predictions drift towards states biased by errors inherent in the ocean initialization system, and thus have large prediction errors. In contrast, MAE arguably has an advantage by sampling such structural uncertainties, and could efficiently cancel these errors out in its predictions.
Support vector machines for prediction and analysis of beta and gamma-turns in proteins.
Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao
2005-04-01
Tight turns have long been recognized as one of the three important features of proteins, together with the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns, and most of the rest are gamma-turns. Analysis and prediction of beta-turns and gamma-turns is very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper we investigated two aspects of applying the support vector machine (SVM), a promising machine learning method for bioinformatics, to the prediction and analysis of beta-turns and gamma-turns. First, we developed two SVM-based methods, called BTSVM and GTSVM, which predict beta-turns and gamma-turns in a protein from its sequence. When compared with other methods, BTSVM has superior performance and GTSVM is competitive. Second, we used SVMs with a linear kernel to estimate the support of amino acids for the formation of beta-turns and gamma-turns depending on their position in a protein. Our analysis results are more comprehensive and easier to use than the previous results in designing turns in proteins.
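Sequence-based predictors of this kind typically feed the classifier a one-hot encoding of a residue window; with a linear kernel, the learned weight attached to each (position, amino acid) cell is what yields the per-position "support" analysis described above. A minimal sketch of that input representation (the window size of 7 is an assumption, not taken from the paper):

```python
# The 20 standard amino acids; each residue is one-hot encoded by its
# index in this string.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(sequence, center, window=7):
    """One-hot encode a window of residues centred on `center`, the usual
    input representation for sequence-based SVM predictors. Positions
    falling outside the sequence are encoded as all zeros."""
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            one_hot[AMINO_ACIDS.index(sequence[pos])] = 1
        features.extend(one_hot)
    return features

vec = encode_window("ACDEFGHIK", center=4, window=7)
```

Each window yields a 140-dimensional binary vector (7 positions x 20 residue types); a linear SVM's weight vector then has one interpretable coefficient per cell.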
A probabilistic neural network based approach for predicting the output power of wind turbines
NASA Astrophysics Data System (ADS)
Tabatabaei, Sajad
2017-03-01
Reliable tools for quantifying the uncertainty of wind speed forecasts are increasingly required as the penetration of wind power sources grows. Traditional models that generate only point forecasts are no longer adequate. Thus, the present paper utilises the concept of prediction intervals (PIs) to assess the uncertainty of wind power generation in power systems. Moreover, this paper uses a recently introduced non-parametric approach called lower upper bound estimation (LUBE) to build the PIs, since the forecasting errors cannot be modelled properly by probability distribution functions. In the proposed LUBE method, a PI combination-based fuzzy framework is used to overcome the performance instability of the neural networks (NNs) used in LUBE. In comparison with other methods, this formulation better satisfies the PI coverage probability and the PI normalised average width (PINAW). Since this non-linear problem is highly complex, a new heuristic optimisation algorithm with a novel modification is introduced to solve it. Based on data sets taken from a wind farm in Australia, the feasibility and satisfactory performance of the suggested method have been demonstrated.
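The two interval-quality measures named above have standard definitions that can be sketched directly: coverage is the fraction of observations falling inside their interval, and normalised average width divides the mean interval width by the range of the observations.

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: fraction of observed
    values that fall inside their prediction interval."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Prediction interval normalised average width: mean interval width
    divided by the range of the observations."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))
```

LUBE-style methods train the interval bounds to trade these two objectives off: high PICP with the smallest attainable PINAW.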
Windowed Multitaper Correlation Analysis of Multimodal Brain Monitoring Parameters
Proescholdt, Martin A.; Bele, Sylvia; Brawanski, Alexander
2015-01-01
Although multimodal monitoring sets the standard in daily practice of neurocritical care, problem-oriented analysis tools to interpret the huge amount of data are lacking. Recently a mathematical model was presented that simulates cerebral perfusion and oxygen supply in case of a severe head trauma, predicting the appearance of distinct correlations between arterial blood pressure and intracranial pressure. In this study we present a set of mathematical tools that reliably detect the predicted correlations in data recorded at a neurocritical care unit. The time-resolved correlations are identified by a windowing technique combined with Fourier-based coherence calculations. The phasing of the data is detected by means of the Hilbert phase difference within the above-mentioned windows. A statistical testing method is introduced that allows tuning the parameters of the windowing method in such a way that a predefined accuracy is reached. With this method, the data of fifteen patients were examined, and the predicted correlation was found in each patient. Additionally, it could be shown that the occurrence of a distinct correlation parameter, called scp, is a high-quality predictor of patient outcome. PMID:25821507
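A simplified stand-in for the windowing idea above (Pearson correlation inside a sliding window, rather than the paper's multitaper Fourier coherence and Hilbert phase machinery) might look like this; the signals below are synthetic illustrations, not patient data.

```python
import numpy as np

def windowed_correlation(x, y, window, step):
    """Pearson correlation of two signals inside a sliding window: a
    time-resolved correlation estimate in the spirit of the windowing
    technique (the actual method uses Fourier-based coherence)."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([np.corrcoef(x[s:s + window], y[s:s + window])[0, 1]
                     for s in starts])

# Synthetic example: an intracranial-pressure-like signal linearly
# coupled to an arterial-blood-pressure-like signal.
t = np.linspace(0, 60, 600)
abp = np.sin(2 * np.pi * t / 10) + 0.1 * np.random.default_rng(1).normal(size=600)
icp = 0.5 * abp + 2.0   # exactly linearly coupled (plus an offset)
corr = windowed_correlation(abp, icp, window=100, step=50)
```

Because the coupling here is exactly linear, every window reports a correlation of essentially 1; in real monitoring data the per-window values vary, which is precisely the time-resolved information the method exploits.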
Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords.
Koyabu, Shun; Phan, Thi Thanh Thuy; Ohkawa, Takenao
2015-01-01
For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. A specific keyword directly related to interaction, such as "bind" or "interact", plays an important role in training classifiers. We call such a keyword a dominant keyword, as it affects the capability of the classifier. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is tentatively assumed to be a dominant keyword initially. Then the classifiers are separately trained from the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers. The assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction.
Advanced Online Survival Analysis Tool for Predictive Modelling in Clinical Data Science.
Montes-Torres, Julio; Subirats, José Luis; Ribelles, Nuria; Urda, Daniel; Franco, Leonardo; Alba, Emilio; Jerez, José Manuel
2016-01-01
One of the prevailing applications of machine learning is the use of predictive modelling in clinical survival analysis. In this work, we present our view of the current situation of computer tools for survival analysis, stressing the need of transferring the latest results in the field of machine learning to biomedical researchers. We propose a web based software for survival analysis called OSA (Online Survival Analysis), which has been developed as an open access and user friendly option to obtain discrete time, predictive survival models at individual level using machine learning techniques, and to perform standard survival analysis. OSA employs an Artificial Neural Network (ANN) based method to produce the predictive survival models. Additionally, the software can easily generate survival and hazard curves with multiple options to personalise the plots, obtain contingency tables from the uploaded data to perform different tests, and fit a Cox regression model from a number of predictor variables. In the Materials and Methods section, we depict the general architecture of the application and introduce the mathematical background of each of the implemented methods. The study concludes with examples of use showing the results obtained with public datasets.
Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang
2018-01-05
DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
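The two headline metrics reported for DHSpred have standard closed forms, sketched here from a 2x2 confusion matrix (this is the generic definition, not the authors' code).

```python
import math

def mcc_and_accuracy(tp, fp, tn, fn):
    """Matthews correlation coefficient and accuracy from the counts of
    true positives, false positives, true negatives, false negatives."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, acc
```

MCC is preferred alongside accuracy because it stays near zero for a classifier that is no better than chance, even on imbalanced data such as genome-wide DHS labels.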
NASA Astrophysics Data System (ADS)
Wang, Gaili; Yang, Ji; Wang, Dan; Liu, Liping
2016-11-01
Extrapolation techniques and storm-scale Numerical Weather Prediction (NWP) models are two primary approaches for short-term precipitation forecasts. The primary objective of this study is to verify precipitation forecasts and compare the performances of two nowcasting schemes: the Beijing Auto-Nowcast system (BJ-ANC), based on extrapolation techniques, and a storm-scale NWP model called the Advanced Regional Prediction System (ARPS). The verification and comparison take into account six heavy precipitation events that occurred in the summers of 2014 and 2015 in Jiangsu, China. The forecast performances of the two schemes were evaluated for the next 6 h at 1-h intervals using gridpoint-based measures (critical success index, bias, index of agreement, and root mean square error) and an object-based verification method called the Structure-Amplitude-Location (SAL) score. Regarding gridpoint-based measures, BJ-ANC outperforms ARPS at first, but its forecast accuracy decreases rapidly with lead time, performing worse than ARPS 4-5 h after the initial forecast. Regarding the object-based verification method, most forecasts produced by BJ-ANC fall near the center of the diagram at the 1-h lead time, indicating high-quality forecasts. As the lead time increases, BJ-ANC overestimates precipitation amount and produces widespread precipitation, especially at the 6-h lead time. The ARPS model overestimates precipitation at all lead times, particularly at first.
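The gridpoint-based measures mentioned above are computed from a 2x2 contingency table of forecast versus observed precipitation occurrence at each grid point. Two of them, the critical success index and the frequency bias, have standard forms:

```python
def csi_and_bias(hits, misses, false_alarms):
    """Critical success index and frequency bias from a gridpoint
    contingency table (standard precipitation-verification measures).
    hits: forecast yes / observed yes; misses: forecast no / observed yes;
    false_alarms: forecast yes / observed no."""
    csi = hits / (hits + misses + false_alarms)
    bias = (hits + false_alarms) / (hits + misses)
    return csi, bias
```

A bias above 1 indicates the overforecasting of precipitation area reported for both schemes at long lead times; CSI of 1 would mean a perfect yes/no forecast.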
NASA Astrophysics Data System (ADS)
Asadpour-Zeynali, Karim; Bastami, Mohammad
2010-02-01
In this work a new modification of the standard addition method, called the "net analyte signal standard addition method (NASSAM)", is presented for simultaneous spectrofluorimetric and spectrophotometric analysis. The proposed method combines the advantages of the standard addition method with those of the net analyte signal concept. The method can be applied for the determination of an analyte in the presence of known interferents. Unlike the H-point standard addition method, the accuracy of the predictions does not depend on the shapes of the analyte and interferent spectra. The method was successfully applied to the simultaneous spectrofluorimetric and spectrophotometric determination of pyridoxine (PY) and melatonin (MT) in synthetic mixtures and in a pharmaceutical formulation.
Ling, Ying; Zhang, Minqiang; Locke, Kenneth D; Li, Guangming; Li, Zonglong
2016-01-01
The Circumplex Scales of Interpersonal Values (CSIV) is a 64-item self-report measure of goals from each octant of the interpersonal circumplex. We used item response theory methods to compare whether dominance models or ideal point models best described how people respond to CSIV items. Specifically, we fit a polytomous dominance model called the generalized partial credit model and an ideal point model of similar complexity called the generalized graded unfolding model to the responses of 1,893 college students. The results of both graphical comparisons of item characteristic curves and statistical comparisons of model fit suggested that an ideal point model best describes the process of responding to CSIV items. The different models produced different rank orderings of high-scoring respondents, but overall the models did not differ in their prediction of criterion variables (agentic and communal interpersonal traits and implicit motives).
Learning Instance-Specific Predictive Models
Visweswaran, Shyam; Cooper, Gregory F.
2013-01-01
This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures, and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
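The model-averaging step described above can be sketched generically: each selected model's predictive probability is combined with a weight proportional to that model's posterior probability. This is a minimal sketch of Bayesian model averaging in general, not of the ISMB algorithm's Markov blanket search; the log-space normalisation is shown for numerical stability.

```python
import math

def model_average(predictions, log_marginal_likelihoods):
    """Bayesian model averaging: combine each model's predictive
    probability using weights proportional to exp(log marginal
    likelihood), normalised over the model set."""
    # Subtract the max before exponentiating to avoid overflow/underflow.
    m = max(log_marginal_likelihoods)
    weights = [math.exp(l - m) for l in log_marginal_likelihoods]
    z = sum(weights)
    weights = [w / z for w in weights]
    averaged = sum(w * p for w, p in zip(weights, predictions))
    return averaged, weights
```

With equal model evidence the weights reduce to a plain mean; as one model's evidence dominates, the averaged prediction approaches that model's prediction alone.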
PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants.
Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G; Hardoim, Pablo R; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M T
2017-03-04
Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large amount of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few available computational methods are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques; in particular, supervised learning methods have been explored for lincRNA prediction in the recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs, in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially-expressed lincRNAs in sugarcane and maize plants submitted to pathogenic and beneficial microorganisms.
Open Rotor Noise Prediction Methods at NASA Langley- A Technology Review
NASA Technical Reports Server (NTRS)
Farassat, F.; Dunn, Mark H.; Tinetti, Ana F.; Nark, Douglas M.
2009-01-01
Open rotors are once again under consideration for propulsion of the future airliners because of their high efficiency. The noise generated by these propulsion systems must meet the stringent noise standards of today to reduce community impact. In this paper we review the open rotor noise prediction methods available at NASA Langley. We discuss three codes called ASSPIN (Advanced Subsonic-Supersonic Propeller Induced Noise), FW - Hpds (Ffowcs Williams-Hawkings with penetrable data surface) and the FSC (Fast Scattering Code). The first two codes are in the time domain and the third code is a frequency domain code. The capabilities of these codes and the input data requirements as well as the output data are presented. Plans for further improvements of these codes are discussed. In particular, a method based on equivalent sources is outlined to get rid of spurious signals in the FW - Hpds code.
Daniell, J F; Herbert, C M; Repp, J; Torbit, C A; Wentz, A C
1982-08-01
A new method for separating X and Y human spermatozoa called convection counter streaming galvanization was evaluated. The method was independently performed by this semenology laboratory with the use of the special separation equipment and extending media provided by its developer, Dr. Bhairab C. Bhattacharya. The mean number of Y spermatozoa increased from 48% to 77% in the separated fraction predicted to be Y-enriched. The fraction predicted to be X-enriched increased from a mean of 52% to 77%. The one separation process allowed accumulation of both enriched fractions simultaneously. The separated portions of spermatozoa maintained good motility and penetration of cervical mucus but produced a mean recovery concentration in the X- and Y-enriched fractions of only 15% to 16% of the preseparation concentration.
A direct-inverse method for transonic and separated flows about airfoils
NASA Technical Reports Server (NTRS)
Carlson, Leland A.
1990-01-01
A direct-inverse technique and computer program called TAMSEP that can be used for the analysis of the flow about airfoils at subsonic and low transonic freestream velocities is presented. The method is based upon a direct-inverse nonconservative full potential inviscid method, a Thwaites laminar boundary layer technique, and the Barnwell turbulent momentum integral scheme; and it is formulated using Cartesian coordinates. Since the method utilizes inverse boundary conditions in regions of separated flow, it is suitable for predicting the flow field about airfoils having trailing edge separated flow under high lift conditions. Comparisons with experimental data indicate that the method should be a useful tool for applied aerodynamic analyses.
Shea, Judy A.; Bellini, Lisa M.; Dinges, David F.; Curtis, Meredith L.; Tao, Yuanyuan; Zhu, Jingsan; Small, Dylan S.; Basner, Mathias; Norton, Laurie; Novak, Cristina; Dine, C. Jessica; Rosen, Ilene M.; Volpp, Kevin G.
2014-01-01
Background Patient safety and sleep experts advocate a protected sleep period for residents. Objective We examined whether interns scheduled for a protected sleep period during overnight call would have better end-of-rotation burnout, depression, and empathy scores compared with interns without protected sleep periods, and whether the amount of sleep obtained while on call predicted end-of-rotation assessments. Methods We conducted a randomized, controlled trial with internal medicine interns at the Philadelphia Veterans Affairs Medical Center (PVAMC) and the Hospital of the University of Pennsylvania (HUP) in academic year 2009–2010. Four-week blocks were randomly assigned to either overnight call permitted under the 2003 duty hour standards or a protected sleep period from 12:30 am to 5:30 am. Participants wore wrist actigraphs. At the beginning and end of the rotations, they completed the Beck Depression Inventory (BDI-II), Maslach Burnout Inventory (MBI-HSS), and Interpersonal Reactivity Index (IRI). Results A total of 106 interns participated. There were no significant differences between groups in end-of-rotation BDI-II, MBI-HSS, or IRI scores at either location (P > .05). Amount of sleep while on call significantly predicted lower MBI-Emotional Exhaustion (P < .003), MBI-Depersonalization (P < .003), and IRI-Personal Distress (P < .006) at PVAMC, and higher IRI-Perspective Taking (P < .008) at HUP. Conclusions A protected sleep period produced few consistent improvements in depression, burnout, or empathy, although depression was already low at baseline. Possibly the amount of protected time was too small to affect these emotional states, or sleep may not be directly related to these scores. PMID:24949128
A Study of Pattern Prediction in the Monitoring Data of Earthen Ruins with the Internet of Things.
Xiao, Yun; Wang, Xin; Eshragh, Faezeh; Wang, Xuanhong; Chen, Xiaojiang; Fang, Dingyi
2017-05-11
Understanding changes in the rammed earth temperature of earthen ruins is important for the protection of such ruins. To predict the rammed earth temperature pattern from the air temperature pattern in monitoring data of earthen ruins, a pattern prediction method based on interesting pattern mining and correlation, called PPER, is proposed in this paper. PPER first finds the interesting patterns in the air temperature sequence and the rammed earth temperature sequence. To reduce the processing time, two pruning rules and a new data structure based on an R-tree are also proposed. Correlation rules between the air temperature patterns and the rammed earth temperature patterns are then mined and merged into predictive rules for the rammed earth temperature pattern. Experiments were conducted to show the accuracy of the presented method and the power of the pruning rules. Moreover, the Ming Dynasty Great Wall dataset was used to evaluate the algorithm; six predictive rules from air temperature to rammed earth temperature based on the interesting patterns were obtained, with the average hit rate reaching 89.8%. PPER and its predictive rules will be useful for rammed earth temperature prediction in the protection of earthen ruins.
[Methods of artificial intelligence: a new trend in pharmacy].
Dohnal, V; Kuca, K; Jun, D
2005-07-01
Artificial neural networks (ANN) and genetic algorithms form one group of methods collectively called artificial intelligence. Applying ANN to pharmaceutical data can lead to an understanding of the inner structure of the data and the possibility of building a model (adaptation). In addition, in certain cases it is possible to extract rules from the data. The adapted ANN is then prepared for predicting properties of compounds that were not used in the adaptation phase. Applications of ANN have great potential in the pharmaceutical industry and in the interpretation of analytical, pharmacokinetic, or toxicological data.
Adaptive ingredients against food spoilage in Japanese cuisine.
Ohtsubo, Yohsuke
2009-12-01
Billing and Sherman proposed the antimicrobial hypothesis to explain the worldwide spice use pattern. The present study explored whether two antimicrobial ingredients (i.e. spices and vinegar) are used in ways consistent with the antimicrobial hypothesis. Four specific predictions were tested: meat-based recipes would call for more spices/vinegar than vegetable-based recipes; summer recipes would call for more spices/vinegar than winter recipes; recipes in hotter regions would call for more spices/vinegar; and recipes including unheated ingredients would call for more spices/vinegar. Spice/vinegar use patterns were compiled from two types of traditional Japanese cookbooks. Dataset I included recipes provided by elderly Japanese housewives. Dataset II included recipes provided by experts in traditional Japanese foods. The analyses of Dataset I revealed that the vinegar use pattern conformed to the predictions. In contrast, analyses of Dataset II generally supported the predictions in terms of spices, but not vinegar.
Meta-path based heterogeneous combat network link prediction
NASA Astrophysics Data System (ADS)
Li, Jichao; Ge, Bingfeng; Yang, Kewei; Chen, Yingwu; Tan, Yuejin
2017-09-01
The combat system-of-systems in high-tech informative warfare, composed of many interconnected combat systems of different types, can be regarded as a type of complex heterogeneous network. Link prediction for heterogeneous combat networks (HCNs) is of significant military value, as it facilitates reconfiguring combat networks to represent the complex real-world network topology as appropriate with observed information. This paper proposes a novel integrated methodology framework called HCNMP (HCN link prediction based on meta-path) to predict multiple types of links simultaneously for an HCN. More specifically, the concept of HCN meta-paths is introduced, through which the HCNMP can accumulate information by extracting different features of HCN links for all six defined types. Next, an HCN link prediction model, based on meta-path features, is built to predict all types of links of the HCN simultaneously. Then, the solution algorithm for the HCN link prediction model is proposed, in which the prediction results are obtained by iteratively updating with the newly predicted results until the results in the HCN converge or reach a maximum number of iterations. Finally, numerical experiments on the dataset of a real HCN are conducted to demonstrate the feasibility and effectiveness of the proposed HCNMP, in comparison with 30 baseline methods. The results show that the performance of the HCNMP is superior to those of the baseline methods.
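A meta-path feature for a node pair is, at its simplest, the number of path instances following a typed node sequence. A minimal sketch of that counting step, assuming a toy typed network (the node names and relation types are made up, and this is a simplified stand-in for the HCNMP features, not the published algorithm):

```python
# Count meta-path instances by composing typed adjacency relations.

def compose(rel_ab, rel_bc):
    """Compose two relations given as {node: set(neighbors)} into
    path-instance counts per (start, end) pair."""
    counts = {}
    for a, bs in rel_ab.items():
        for b in bs:
            for c in rel_bc.get(b, ()):
                counts[(a, c)] = counts.get((a, c), 0) + 1
    return counts

# Toy heterogeneous network: sensors -> deciders -> influencers
sensor_to_decider = {"S1": {"D1", "D2"}, "S2": {"D1"}}
decider_to_influencer = {"D1": {"I1"}, "D2": {"I1"}}

# Meta-path S -> D -> I: number of path instances per (sensor, influencer)
path_counts = compose(sensor_to_decider, decider_to_influencer)
```

Counts like these (one per meta-path type) would serve as the link features fed to the prediction model.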
Garnier, A; Poncet, F; Billette De Villemeur, A; Exbrayat, C; Bon, M F; Chevalier, A; Salicru, B; Tournegros, J M
2009-06-01
The screening program guidelines specify that the call back rate of women for additional imaging (positive mammogram) should not exceed 7% at initial screening and 5% at subsequent screening. Materials and methods: The call back rate in the Isere region (12%) prompted a review of the correlation between the call back rate and quality indicators (detection rate, sensitivity, specificity, positive predictive value) for the radiologists providing interpretations during that time period. Three groups of radiologists were identified: the group with a call back rate of 10% achieved the best results (sensitivity: 92%; detection rate: 0.53%; specificity: 90%). The group with the lowest call back rate (7.7%) showed insufficient sensitivity (58%). The last group, with a call back rate of 18.3%, showed no improvement in sensitivity (82%) or detection rate (0.53%), but showed reduced specificity (82%). The protocol update in 2001 does not resolve this problematic situation, and national results continue to demonstrate a high percentage of positive screening mammograms. A significant increase in the number of positive screening examinations relative to recommended guidelines is not advantageous and leads to an overall decrease in the quality of the screening.
Impact of the mass media on calls to the CDC National AIDS Hotline.
Fan, D P
1996-06-01
This paper considers new computer methodologies for assessing the impact of different types of public health information. The example used public service announcements (PSAs) and mass media news to predict the volume of attempts to call the CDC National AIDS Hotline from December 1992 through to the end of 1993. The analysis relied solely on data from electronic databases. Newspaper stories and television news transcripts were obtained from the NEXIS electronic database and were scored by machine for AIDS coverage. The PSA database was generated by computer monitoring of advertising distributed by the Centers for Disease Control and Prevention (CDC) and by others. The volume of call attempts was collected automatically by the public branch exchange (PBX) of the Hotline telephone system. The call attempts, the PSAs and the news story data were related to each other using both a standard time series method and the statistical model of ideodynamics. The analysis indicated that the only significant explanatory variable for the call attempts was PSAs produced by the CDC. One possible explanation was that these commercials all included the Hotline telephone number while the other information sources did not.
The Deterministic Information Bottleneck
NASA Astrophysics Data System (ADS)
Strouse, D. J.; Schwab, David
2015-03-01
A fundamental and ubiquitous task that all organisms face is prediction of the future based on past sensory experience. Since an individual's memory resources are limited and costly, however, there is a tradeoff between memory cost and predictive payoff. The information bottleneck (IB) method (Tishby, Pereira, & Bialek 2000) formulates this tradeoff as a mathematical optimization problem using an information theoretic cost function. IB encourages storing as few bits of past sensory input as possible while selectively preserving the bits that are most predictive of the future. Here we introduce an alternative formulation of the IB method, which we call the deterministic information bottleneck (DIB). First, we argue for an alternative cost function, which better represents the biologically-motivated goal of minimizing required memory resources. Then, we show that this seemingly minor change has the dramatic effect of converting the optimal memory encoder from stochastic to deterministic. Next, we propose an iterative algorithm for solving the DIB problem. Additionally, we compare the IB and DIB methods on a variety of synthetic datasets, and examine the performance of retinal ganglion cell populations relative to the optimal encoding strategy for each problem.
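The tradeoff described above can be stated compactly. In the original IB one minimizes over the encoder $q(t\mid x)$; the DIB swaps the compression term $I(X;T)$ for the representation entropy $H(T)$ (notation assumed standard, following the IB literature):

```latex
% Information bottleneck (IB) objective:
\min_{q(t\mid x)} \; I(X;T) \;-\; \beta\, I(T;Y)

% Deterministic information bottleneck (DIB): replace the compression
% cost I(X;T) with the representation entropy H(T):
\min_{q(t\mid x)} \; H(T) \;-\; \beta\, I(T;Y)
```

Since $I(X;T) = H(T) - H(T\mid X)$, the DIB cost equals the IB cost plus an extra penalty $H(T\mid X)$ on encoder noise, which is why the optimal encoder becomes deterministic.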
Mining the key predictors for event outbreaks in social networks
NASA Astrophysics Data System (ADS)
Yi, Chengqi; Bao, Yuanyuan; Xue, Yibo
2016-04-01
It will be beneficial to devise a method to predict a so-called event outbreak. Existing works mainly focus on exploring effective methods for improving the accuracy of predictions, while ignoring the underlying causes: What makes an event go viral? What factors significantly influence the prediction of an event outbreak in social networks? In this paper, we proposed a novel definition for an event outbreak, taking into account the structural changes to a network during the propagation of content. In addition, we investigated features that were sensitive to predicting an event outbreak. In order to investigate the universality of these features at different stages of an event, we split the entire lifecycle of an event into 20 equal segments according to the proportion of the propagation time. We extracted 44 features, including features related to content, users, structure, and time, from each segment of the event. Based on these features, we proposed a prediction method using supervised classification algorithms to predict event outbreaks. Experimental results indicate that, as time goes by, our method is highly accurate, with a precision rate ranging from 79% to 97% and a recall rate ranging from 74% to 97%. In addition, after applying a feature-selection algorithm, the top five selected features can considerably improve the accuracy of the prediction. Data-driven experimental results show that the entropy of the eigenvector centrality, the entropy of the PageRank, the standard deviation of the betweenness centrality, the proportion of re-shares without content, and the average path length are the key predictors for an event outbreak. Our findings are especially useful for further exploring the intrinsic characteristics of outbreak prediction.
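Two of the reported key predictors are entropies of centrality distributions. A minimal sketch of that computation, assuming Shannon entropy over normalized scores (the centrality values below are made up, not taken from the paper's data):

```python
# Entropy of a centrality distribution: low entropy means influence is
# concentrated on a few hub nodes.
import math

def entropy(scores):
    """Shannon entropy (bits) of normalised non-negative scores."""
    total = sum(scores)
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)

uniform = entropy([1.0, 1.0, 1.0, 1.0])   # evenly spread influence
skewed = entropy([10.0, 0.1, 0.1, 0.1])   # one dominant hub
```

Applied to eigenvector-centrality or PageRank scores per propagation segment, this yields one feature value per segment.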
NWP model forecast skill optimization via closure parameter variations
NASA Astrophysics Data System (ADS)
Järvinen, H.; Ollinaho, P.; Laine, M.; Solonen, A.; Haario, H.
2012-04-01
We present results of a novel approach to tune predictive skill of numerical weather prediction (NWP) models. These models contain tunable parameters which appear in parameterizations schemes of sub-grid scale physical processes. The current practice is to specify manually the numerical parameter values, based on expert knowledge. We developed recently a concept and method (QJRMS 2011) for on-line estimation of the NWP model parameters via closure parameter variations. The method called EPPES ("Ensemble prediction and parameter estimation system") utilizes ensemble prediction infra-structure for parameter estimation in a very cost-effective way: practically no new computations are introduced. The approach provides an algorithmic decision making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating an ensemble of predictions so that each member uses different model parameter values, drawn from a proposal distribution, and (ii) feeding-back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In this presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameters values, and leads to an improved forecast skill. Second, results with an ensemble prediction system emulator, based on the ECHAM5 atmospheric GCM show that the model tuning capability of EPPES scales up to realistic models and ensemble prediction systems. Finally, preliminary results of EPPES in the context of ECMWF forecasting system are presented.
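The two-step loop described in the abstract, drawing parameters from a proposal and feeding back their relative merits, can be caricatured in a few lines. This is a schematic toy in the spirit of EPPES, not the published scheme; the model, likelihood, and update rule here are all simplifying assumptions:

```python
# Toy ensemble parameter estimation: sample parameters from a Gaussian
# proposal, score each member against an observation, and re-centre the
# proposal with likelihood weights.
import math
import random

random.seed(1)

def run_model(theta):
    """Hypothetical stand-in for one ensemble forecast."""
    return theta  # the 'forecast' is just the parameter here

def log_likelihood(forecast, obs):
    """Gaussian log-likelihood of the forecast against verification."""
    return -0.5 * (forecast - obs) ** 2

mu, sigma, obs = 0.0, 1.0, 2.0
for cycle in range(50):                      # forecast/verification cycles
    thetas = [random.gauss(mu, sigma) for _ in range(20)]   # ensemble
    weights = [math.exp(log_likelihood(run_model(t), obs)) for t in thetas]
    total = sum(weights)
    mu = sum(w * t for w, t in zip(weights, thetas)) / total  # re-centre
# mu drifts toward the parameter value with the best forecast skill (obs)
```

The point of the design, as in EPPES, is that the ensemble members that must be run anyway double as the parameter samples, so essentially no extra computation is introduced.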
[Effect of leader-member exchange on nurses' sense of calling in the workplace].
Zhang, L G; Ma, H L; Wang, Z J; Zhou, Y Y; Jin, T T
2017-12-20
Objective: To investigate the effect of leader-member exchange on nurses' sense of calling in the workplace, based on self-determination theory. Methods: A total of 381 nurses were randomly selected from five tertiary general hospitals in Zhejiang province, China, from October to December 2016. They were surveyed using the Leader-Member Exchange Scale, Job Autonomy Scale, Core Self-Evaluation Scale, and Calling Scale. Mediating-effect test procedures were applied, and the data were subjected to hierarchical regression analysis. Results: Leader-member exchange was positively correlated with job autonomy, core self-evaluation, and sense of calling (r = 0.471, P < 0.001; r = 0.373, P < 0.001; r = 0.475, P < 0.001); leader-member exchange had a positive predictive effect on job autonomy and sense of calling (β = 0.47, P < 0.001; β = 0.48, P < 0.001); job autonomy had a partial mediating effect on the relationship between leader-member exchange and sense of calling (F = 66.50, P < 0.001); core self-evaluation negatively moderated the positive relationship between leader-member exchange and job autonomy (F = 27.81, P < 0.001). Conclusion: High-quality leader-member exchange enhances the sense of calling by improving staff job autonomy, and core self-evaluation weakens the positive relationship between leader-member exchange and job autonomy.
A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins
Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.
2013-01-01
The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a publicly, freely available web tool to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Altzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to the control of protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595
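The core of a consensus predictor is aggregation of per-residue calls from the individual tools. A minimal sketch, assuming simple majority voting over binary calls (AMYLPRED2's actual aggregation and the toy data below are not reproduced here):

```python
# Consensus over per-residue 0/1 predictions from several methods.

def consensus(predictions, threshold):
    """predictions: list of equal-length 0/1 lists, one per method.
    A residue is flagged when at least `threshold` methods agree."""
    length = len(predictions[0])
    votes = [sum(p[i] for p in predictions) for i in range(length)]
    return [1 if v >= threshold else 0 for v in votes]

# Three hypothetical tools scoring a 5-residue peptide
method_calls = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
]
flags = consensus(method_calls, threshold=2)  # >= 2 of 3 methods agree
```

Raising `threshold` trades sensitivity for specificity, which is the main tuning knob of any consensus scheme.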
Comparison of simplified models in the prediction of two phase flow in pipelines
NASA Astrophysics Data System (ADS)
Jerez-Carrizales, M.; Jaramillo, J. E.; Fuentes, D.
2014-06-01
Prediction of two phase flow in pipelines is a common task in engineering. It is a complex phenomenon, and many models have been developed to find an approximate solution to the problem. Some older models, such as the Hagedorn & Brown (HB) model, have been highlighted by many authors as giving very good performance, and many modifications have been applied to improve its predictions. In this work two simplified models based on empiricism (HB and Mukherjee & Brill, MB) are considered. One mechanistic model (AN), which is based on the physics of the phenomenon but still requires correlations called closure relations, is also used. Moreover, a drift flux model defined in steady state that is flow pattern dependent (the HK model) is implemented. The implementation of these methods was tested against published data in the scientific literature for vertical upward flows. Furthermore, the predictive performance of the four models is compared against a well from Campo Escuela Colorado. The differences among the four models are smaller than their differences with the experimental data from the well in Campo Escuela Colorado.
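To illustrate the drift-flux family the HK model belongs to, a minimal calculation using the generic textbook drift-flux relation (the coefficients and flow rates below are assumed illustrative values, not the HK model's closure):

```python
# Generic drift-flux relation: in-situ gas velocity v_g = C0 * j + v_d,
# where j is the total (mixture) superficial velocity.

def gas_velocity(j_gas, j_liquid, c0=1.2, v_drift=0.35):
    """Drift-flux estimate of the in-situ gas velocity [m/s]."""
    j = j_gas + j_liquid          # mixture superficial velocity
    return c0 * j + v_drift

def void_fraction(j_gas, j_liquid, c0=1.2, v_drift=0.35):
    """Void fraction alpha = j_gas / v_g from the drift-flux relation."""
    return j_gas / gas_velocity(j_gas, j_liquid, c0, v_drift)

alpha = void_fraction(j_gas=0.5, j_liquid=1.0)   # made-up flow rates
```

Flow-pattern-dependent models such as HK swap in different `c0` and `v_drift` closures per flow regime.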
NASA Astrophysics Data System (ADS)
Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi
2018-04-01
Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral intervals selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining the clustering concept and partial least square (PLS) methods to improve model performance with synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and the full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.
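The clustering step that precedes the per-cluster PLS fits can be sketched with a bare-bones k-means. This is a generic illustration only; the PLS regression itself is omitted, and the data below are made up (each "wavelength" is described by its response across three samples):

```python
# Bare-bones k-means to group wavelength channels by their response vectors.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Cluster points (tuples) into k groups; returns a label per point."""
    centroids = [points[i * len(points) // k] for i in range(k)]  # spread init
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels

wavelengths = [(1.0, 1.1, 0.9), (1.0, 1.0, 1.0),
               (5.0, 5.2, 4.9), (5.1, 5.0, 5.0)]
labels = kmeans(wavelengths, k=2)
```

In CL-PLS, each resulting group of wavelengths would then receive its own sub-PLS model, and groups are added one by one to form the final model.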
Zhang, Xiaotian; Yin, Jian; Zhang, Xu
2018-03-02
Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Currently, many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict underlying miRNA-disease association types, a semi-supervised model called the network-based label propagation algorithm is proposed to infer multiple types of miRNA-disease associations (NLPMMDA) by mutual information derived from the heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity information of miRNAs and diseases to construct a heterogeneous network. NLPMMDA is a semi-supervised model which does not require verified negative samples. Leave-one-out cross validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed effective performance of NLPMMDA to predict novel miRNA-disease associations and their association types.
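Label propagation on a similarity graph, the family NLPMMDA belongs to, follows a simple iterative update. A tiny sketch, assuming the generic single-type update F ← αWF + (1−α)Y on a row-normalized graph (not the multi-type heterogeneous formulation of the paper):

```python
# Generic semi-supervised label propagation on a small similarity graph.

def propagate(W, Y, alpha=0.5, iters=100):
    """Iterate F <- alpha * W_norm * F + (1 - alpha) * Y, with W row-normalised.
    W: n x n similarity matrix; Y: n x c initial label matrix."""
    n = len(W)
    norm = [[W[i][j] / (sum(W[i]) or 1.0) for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(norm[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(len(Y[0]))] for i in range(n)]
    return F

# 3 nodes: node 0 labelled class 0, node 2 labelled class 1, node 1 unlabelled.
# Node 1 is connected only to node 0, so it should inherit class 0.
W = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]
Y = [[1, 0], [0, 0], [0, 1]]
F = propagate(W, Y)
predicted = max(range(2), key=lambda c: F[1][c])
```

Because labels spread only through graph edges, no verified negative samples are needed, which is the semi-supervised property the abstract highlights.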
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ecale Zhou, Carol L.
2016-07-05
Compare Gene Calls (CGC) is a Python code for combining and comparing gene calls from any number of gene callers. A gene caller is a computer program that predicts the extents of open reading frames within the genomes of biological organisms.
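The comparison such a tool performs can be sketched by treating each gene call as a (start, end, strand) triple. This is an illustrative simplification, assuming agreement means identical coordinates (the real code supports more nuanced comparisons, and the caller names and coordinates are made up):

```python
# Combine and compare gene calls from multiple callers.

def compare_calls(callers):
    """callers: {name: set of (start, end, strand) calls}.
    Returns calls shared by every caller and calls unique to each caller."""
    all_sets = list(callers.values())
    shared = set.intersection(*all_sets)
    unique = {name: calls - set.union(*(s for n, s in callers.items()
                                        if n != name))
              for name, calls in callers.items()}
    return shared, unique

callers = {
    "callerA": {(10, 400, "+"), (500, 900, "-")},
    "callerB": {(10, 400, "+"), (950, 1200, "+")},
}
shared, unique = compare_calls(callers)
```

The shared set gives high-confidence genes; the unique sets flag calls needing manual review.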
Resistance gene identification from Larimichthys crocea with machine learning techniques
NASA Astrophysics Data System (ADS)
Cai, Yinyin; Liao, Zhijun; Ju, Ying; Liu, Juan; Mao, Yong; Liu, Xiangrong
2016-12-01
Research on resistance genes (R-genes) plays a vital role in bioinformatics, as these genes confer the capability of coping with adverse changes in the external environment by forming the corresponding resistance proteins through transcription and translation. It is meaningful to identify and predict R-genes of Larimichthys crocea (L. crocea), with benefits for breeding and the marine environment alike. Much of L. crocea's immune machinery has been explored by biological methods, but a great deal remains unclear. In order to move beyond this limited understanding of L. crocea's immune mechanisms and to detect new R-genes and R-gene-like genes, this paper proposes a combined prediction method that extracts and classifies features from available genomic data by machine learning. The effectiveness of the feature extraction and classification methods in identifying potential novel R-genes was evaluated, and different statistical analyses were utilized to explore the reliability of the prediction method, which can help us further understand the immune mechanisms of L. crocea against pathogens. A webserver called LCRG-Pred is available at http://server.malab.cn/rg_lc/.
Muley, Vijaykumar Yogesh; Ranjan, Akash
2012-01-01
Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein coding genes in the same cluster or operon, collectively known as genomic context methods. Another method, called mirrortree, is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and the relationships between organisms, from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc, and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, in order to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes.
Moreover, such sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
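Phylogenetic profiling, the first genomic context method named above, compares presence/absence vectors of homologs across the reference genomes. A minimal sketch, assuming Jaccard similarity as the comparison measure (the protein names and profiles below are made up):

```python
# Phylogenetic profiling: proteins with correlated presence/absence
# across reference genomes are candidate functional partners.

def jaccard(profile_a, profile_b):
    """Similarity of two presence/absence profiles (1 = homolog present)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

# Presence (1) or absence (0) of homologs in 8 reference genomes
profile_p1 = [1, 1, 0, 1, 0, 1, 1, 0]
profile_p2 = [1, 1, 0, 1, 0, 1, 0, 0]   # co-evolving with p1?
profile_p3 = [0, 0, 1, 0, 1, 0, 0, 1]

similar = jaccard(profile_p1, profile_p2)    # high -> candidate interaction
dissimilar = jaccard(profile_p1, profile_p3)
```

The study's point is that the choice and number of columns in these profiles (the reference genomes) strongly affects which pairs score highly.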
Geostatistics for spatial genetic structures: study of wild populations of perennial ryegrass.
Monestiez, P; Goulard, M; Charmet, G
1994-04-01
Methods based on geostatistics were applied to quantitative traits of agricultural interest measured on a collection of 547 wild populations of perennial ryegrass in France. The mathematical background of these methods, which resembles spatial autocorrelation analysis, is briefly described. When a single variable is studied, the spatial structure analysis is similar to spatial autocorrelation analysis, and a spatial prediction method, called "kriging", gives a filtered map of the spatial pattern over all the sampled area. When complex interactions of agronomic traits with different evaluation sites define a multivariate structure for the spatial analysis, geostatistical methods allow the spatial variations to be broken down into two main spatial structures with ranges of 120 km and 300 km, respectively. The predicted maps that corresponded to each range were interpreted as a result of the isolation-by-distance model and as a consequence of selection by environmental factors. Practical collecting methodology for breeders may be derived from such spatial structures.
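The first step of the geostatistical analysis described above is estimating spatial structure from the data, classically via the empirical semivariogram. A minimal sketch, assuming the classical estimator gamma(h) = mean of half squared differences within distance bins (the kriging step itself is omitted, and the coordinates and trait values are made up):

```python
# Empirical semivariogram: semivariance as a function of pair distance.

def semivariogram(points, values, bin_width):
    """points: (x, y) tuples; values: trait value per point.
    Returns {distance_bin_index: mean semivariance}."""
    sums, counts = {}, {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            h = (dx * dx + dy * dy) ** 0.5
            b = int(h // bin_width)
            sums[b] = sums.get(b, 0.0) + 0.5 * (values[i] - values[j]) ** 2
            counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}

points = [(0, 0), (1, 0), (0, 1), (5, 5)]
values = [1.0, 1.1, 0.9, 3.0]
gamma = semivariogram(points, values, bin_width=2.0)
```

A semivariance that rises with distance and levels off at some range (here 120 km and 300 km in the study) is what a fitted variogram model captures before kriging produces the filtered maps.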
Magrath, Robert D; Platzen, Dirk; Kondo, Junko
2006-09-22
Young birds and mammals are extremely vulnerable to predators and so should benefit from responding to parental alarm calls warning of danger. However, young often respond differently from adults. This difference may reflect: (i) an imperfect stage in the gradual development of adult behaviour or (ii) an adaptation to different vulnerability. Altricial birds provide an excellent model to test for adaptive changes with age in response to alarm calls, because fledglings are vulnerable to a different range of predators than nestlings. For example, a flying hawk is irrelevant to a nestling in an enclosed nest, but is dangerous to that individual once it has left the nest, so we predict that young develop a response to aerial alarm calls to coincide with fledging. Supporting our prediction, recently fledged white-browed scrubwrens, Sericornis frontalis, fell silent immediately after playback of their parents' aerial alarm call, whereas nestlings continued calling despite hearing the playback. Young scrubwrens are therefore exquisitely adapted to the changing risks faced during development.
Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.
Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V
2018-02-01
Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence exceptional if its frequency in a genome significantly differs from that predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods to predict the frequency of a short sequence in a genome, based on the actual frequencies of certain of its subsequences, are in use; the most popular are based on Markov chain models. However, no rigorous comparison of these methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses the geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences, including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of the 5% most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed higher precision than the other two methods, both on prokaryotic genomes and on randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
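The maximum-order Markov estimate named first can be sketched directly: the expected count of a site is approximated from the counts of its two longest proper subwords, N(w) ≈ N(prefix) · N(suffix) / N(core), where prefix and suffix drop the last and first base and core drops both. A minimal sketch (the toy genome below is made up; real analyses use complete genomes):

```python
# Maximum-order Markov estimate of the expected count of a site.

def count_kmer(genome, kmer):
    """Overlapping occurrences of kmer in genome."""
    return sum(1 for i in range(len(genome) - len(kmer) + 1)
               if genome[i:i + len(kmer)] == kmer)

def expected_count(genome, site):
    prefix, suffix, core = site[:-1], site[1:], site[1:-1]
    n_core = count_kmer(genome, core)
    if n_core == 0:
        return 0.0
    return count_kmer(genome, prefix) * count_kmer(genome, suffix) / n_core

genome = "GAATTCGGAATTAAGAATTC"   # toy sequence
ratio = count_kmer(genome, "GAATTC") / expected_count(genome, "GAATTC")
# ratio well below 1 would mark the site as under-represented (avoided)
```

The Burge et al. method preferred by the study extends this idea to all subsequences, including discontiguous ones, rather than just the two longest contiguous subwords.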
Life Extending Control [mechanical fatigue in reusable rocket engines]
NASA Technical Reports Server (NTRS)
Lorenzo, Carl F.; Merrill, Walter C.
1991-01-01
The concept of Life Extending Control is defined. Life is defined in terms of mechanical fatigue life. A brief description is given of the current approach to life prediction using a local, cyclic, stress-strain approach for a critical system component. An alternative approach to life prediction based on a continuous functional relationship to component performance is proposed. Based on cyclic life prediction, an approach to life extending control, called the Life Management Approach, is proposed. A second approach, also based on cyclic life prediction, called the implicit approach, is presented. Assuming the existence of the alternative functional life prediction approach, two additional concepts for Life Extending Control are presented.
Life extending control: A concept paper
NASA Technical Reports Server (NTRS)
Lorenzo, Carl F.; Merrill, Walter C.
1991-01-01
The concept of Life Extending Control is defined. Life is defined in terms of mechanical fatigue life. A brief description is given of the current approach to life prediction using a local, cyclic, stress-strain approach for a critical system component. An alternative approach to life prediction based on a continuous functional relationship to component performance is proposed. Based on cyclic life prediction, an approach to Life Extending Control, called the Life Management Approach, is proposed. A second approach, also based on cyclic life prediction, called the Implicit Approach, is presented. Assuming the existence of the alternative functional life prediction approach, two additional concepts for Life Extending Control are presented.
Examining predictors of healthcare utilization in youth with inflammatory bowel disease.
Wojtowicz, Andrea A; Plevinsky, Jill M; Poulopoulos, Natasha; Schurman, Jennifer V; Greenley, Rachel N
2016-04-01
Traditional definitions of healthcare utilization (HCU) emphasize clinical visits and procedures. Clinic calls, an understudied form of HCU, occur with high frequency. Understanding and examining predictors of HCU, such as disease activity and parent distress, may help reduce overutilization. A total of 68 adolescents with inflammatory bowel disease [IBD; mean (SD) = 14.18 (1.92) years] and their parents participated. Parent distress was assessed through parent report on the PedsQL Family Impact Module, and physicians provided ratings of patient disease activity using the Physician's Global Assessment index. Medical record reviews yielded HCU and clinic call information for 12 months after enrollment. HCU was operationalized as the total number of routine and sick gastrointestinal clinic visits, emergency room visits, and IBD-related hospitalizations. A call composite reflected the total number of calls related to IBD symptoms/illness. Disease activity and parent distress predicted 12% of the variance in calls and 12% of the variance in HCU. Disease activity was the only significant predictor of clinic calls after accounting for the impact of other predictors; however, parent distress was the only individual variable that contributed significant variance to the prediction of HCU after accounting for other predictors. Greater parent distress and disease activity together predicted HCU and clinic calls. Disease activity was the most salient predictor of calls, whereas parent distress was the most salient predictor of in-person HCU. Clinic calls should not be overlooked as a form of HCU, as communication that takes place outside of scheduled appointments utilizes resources and may indicate poorer disease control.
NASA Astrophysics Data System (ADS)
Tchitchekova, Deyana S.; Morthomas, Julien; Ribeiro, Fabienne; Ducher, Roland; Perez, Michel
2014-07-01
A novel method for accurate and efficient evaluation of the change in energy barriers for carbon diffusion in ferrite under heterogeneous stress is introduced. This method, called Linear Combination of Stress States, is based on the knowledge of the effects of simple stresses (uniaxial or shear) on these diffusion barriers. It is then assumed that the change in energy barriers under a complex stress can be expressed as a linear combination of these already known simple stress effects. The modifications of energy barriers by either uniaxial traction/compression or shear stress are determined by means of atomistic simulations with the Climbing Image-Nudged Elastic Band method and are stored as a set of functions. The results of this method are compared to the predictions of anisotropic elasticity theory. It is shown that linear anisotropic elasticity fails to predict the correct energy barrier variation with stress (especially with shear stress), whereas the proposed method provides the correct energy barrier variation for stresses up to ~3 GPa. This study provides a basis for the development of multiscale models of diffusion under non-uniform stress.
Tchitchekova, Deyana S; Morthomas, Julien; Ribeiro, Fabienne; Ducher, Roland; Perez, Michel
2014-07-21
A novel method for accurate and efficient evaluation of the change in energy barriers for carbon diffusion in ferrite under heterogeneous stress is introduced. This method, called Linear Combination of Stress States, is based on the knowledge of the effects of simple stresses (uniaxial or shear) on these diffusion barriers. It is then assumed that the change in energy barriers under a complex stress can be expressed as a linear combination of these already known simple stress effects. The modifications of energy barriers by either uniaxial traction/compression or shear stress are determined by means of atomistic simulations with the Climbing Image-Nudged Elastic Band method and are stored as a set of functions. The results of this method are compared to the predictions of anisotropic elasticity theory. It is shown that linear anisotropic elasticity fails to predict the correct energy barrier variation with stress (especially with shear stress), whereas the proposed method provides the correct energy barrier variation for stresses up to ~3 GPa. This study provides a basis for the development of multiscale models of diffusion under non-uniform stress.
Coupled loads analysis for Space Shuttle payloads
NASA Technical Reports Server (NTRS)
Eldridge, J.
1992-01-01
Described here is a method for determining the transient response of, and the resultant loads in, a system exposed to predicted external forces. In this case, the system consists of four racks mounted on the inside of a space station resource node module (SSRNMO) which is mounted in the payload bay of the space shuttle. The predicted external forces are forcing functions which envelope worst case forces applied to the shuttle during liftoff and landing. This analysis, called a coupled loads analysis, is used to couple the payload and shuttle models together, determine the transient response of the system, and then recover payload loads, payload accelerations, and payload to shuttle interface forces.
Mining HIV protease cleavage data using genetic programming with a sum-product function.
Yang, Zheng Rong; Dalby, Andrew R; Qiu, Jing
2004-12-12
In order to design effective HIV inhibitors, studying and understanding the mechanism of HIV protease cleavage specificity is critical. Various methods have been developed to explore the specificity of HIV protease cleavage activity. However, success in both extracting discriminant rules and maintaining high prediction accuracy is still challenging. An earlier study employed genetic programming with a min-max scoring function to extract discriminant rules with success. However, the decision ultimately degenerates to a single residue, making further improvement of the prediction accuracy difficult. The challenge of revising the min-max scoring function so as to improve the prediction accuracy motivated this study. This paper designs a new scoring function, called a sum-product function, for extracting HIV protease cleavage discriminant rules using genetic programming methods. The experiments show that the new scoring function is superior to the min-max scoring function. The software package can be obtained by request to Dr Zheng Rong Yang.
Paull, Evan O; Carlin, Daniel E; Niepel, Mario; Sorger, Peter K; Haussler, David; Stuart, Joshua M
2013-11-01
Identifying the cellular wiring that connects genomic perturbations to transcriptional changes in cancer is essential to gain a mechanistic understanding of disease initiation, progression and ultimately to predict drug response. We have developed a method called Tied Diffusion Through Interacting Events (TieDIE) that uses a network diffusion approach to connect genomic perturbations to gene expression changes characteristic of cancer subtypes. The method computes a subnetwork of protein-protein interactions, predicted transcription factor-to-target connections and curated interactions from literature that connects genomic and transcriptomic perturbations. Application of TieDIE to The Cancer Genome Atlas and a breast cancer cell line dataset identified key signaling pathways, with examples impinging on MYC activity. Interlinking genes are predicted to correspond to essential components of cancer signaling and may provide a mechanistic explanation of tumor character and suggest subtype-specific drug targets. Software is available from the Stuart lab's wiki: https://sysbiowiki.soe.ucsc.edu/tiedie. jstuart@ucsc.edu. Supplementary data are available at Bioinformatics online.
Generalized Predictive Control of Dynamic Systems with Rigid-Body Modes
NASA Technical Reports Server (NTRS)
Kvaternik, Raymond G.
2013-01-01
Numerical simulations to assess the effectiveness of Generalized Predictive Control (GPC) for active control of dynamic systems having rigid-body modes are presented. GPC is a linear, time-invariant, multi-input/multi-output predictive control method that uses an ARX model to characterize the system and to design the controller. Although the method can accommodate both embedded (implicit) and explicit feedforward paths for incorporation of disturbance effects, only the case of embedded feedforward in which the disturbances are assumed to be unknown is considered here. Results from numerical simulations using mathematical models of both a free-free three-degree-of-freedom mass-spring-dashpot system and the XV-15 tiltrotor research aircraft are presented. In regulation mode operation, which calls for zero system response in the presence of disturbances, the simulations showed reductions of nearly 100%. In tracking mode operations, where the system is commanded to follow a specified path, the GPC controllers produced the desired responses, even in the presence of disturbances.
Correlates of Gay-Related Name-Calling in Schools
ERIC Educational Resources Information Center
Slaatten, Hilde; Hetland, Jørn; Anderssen, Norman
2015-01-01
The aim of this study was to examine whether attitudes about gay-related name-calling, social norms concerning gay-related name-calling among co-students, teacher intervention, and school-related support would predict whether secondary school pupils had called another pupil a gay-related name during the last month. A total of 921 ninth-grade…
Kim, Seong Gon; Theera-Ampornpunt, Nawanol; Fang, Chih-Hao; Harwani, Mrudul; Grama, Ananth; Chaterji, Somali
2016-08-01
Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigation of epigenomic marks has indicated that enhancer elements can be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications. Our efforts in this paper are motivated by these recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, in this paper, we use recent state-of-the-art deep learning methods and develop a deep neural network (DNN)-based architecture, called EP-DNN, to predict the presence and types of enhancers in the human genome. It uses as features the expression levels of the histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and TSS and random non-DHS sites as non-enhancers. We perform EP-DNN predictions to quantify the validation rate for different levels of confidence in the predictions and also perform comparisons against two state-of-the-art computational models for enhancer prediction, DEEP-ENCODE and RFECS. We find that EP-DNN has superior accuracy and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task.
This analysis indicates that the important histone modifications were distinct for different cell types, with some overlaps; e.g., H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. We finally use the feature importance analysis to reduce the number of input features needed to train the DNN, thus reducing training time, which is often the computational bottleneck in the use of a DNN. In summary, we developed EP-DNN, which has high prediction accuracy, with validation rates above 90% in the operational region of enhancer prediction for all four cell lines that we studied, outperforming DEEP-ENCODE and RFECS. We then developed a method to analyze a trained DNN and determine which histone modifications are important and, within those, which features, proximal or distal to the enhancer site, are important.
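The feature-importance analysis described above can be approximated model-agnostically with permutation importance: shuffle one input column at a time and measure the drop in the trained model's accuracy. A minimal sketch on synthetic data (a logistic-regression stand-in replaces the DNN, and the "histone mark" features and labels are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic "histone mark" features: only the first three are informative.
X = rng.standard_normal((600, 10))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
base = clf.score(X, y)

drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break feature j's association with y
    drops.append(base - clf.score(Xp, y))

ranking = np.argsort(drops)[::-1]          # most important features first
```

The top of `ranking` recovers the informative features; discarding the rest before retraining is the same pruning idea used to cut DNN training time.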
An, Ji-Yong; Zhang, Lei; Zhou, Yong; Zhao, Yu-Jun; Wang, Da-Fu
2017-08-18
Self-interacting proteins (SIPs) are important for their biological activity owing to the inherent interaction amongst their secondary structures or domains. However, due to the limitations of experimental self-interaction detection, one major challenge in the study of SIPs is how to exploit computational approaches for SIP detection based on the evolutionary information contained in protein sequences. In this work, we present a novel computational approach named WELM-LAG, which combines the Weighted-Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) to predict SIPs based on protein sequence. The major improvement of our method lies in presenting an effective feature extraction method used to represent candidate self-interacting proteins by exploring the evolutionary information embedded in the PSI-BLAST-constructed position-specific scoring matrix (PSSM), and then employing a reliable and robust WELM classifier to carry out classification. In addition, the Principal Component Analysis (PCA) approach is used to reduce the impact of noise. The WELM-LAG method gave very high average accuracies of 92.94 and 96.74% on yeast and human datasets, respectively. Meanwhile, we compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on human and yeast datasets, respectively. Comparative results indicated that our approach is very promising and may provide a cost-effective alternative for predicting SIPs. In addition, we developed a freely available web server called WELM-LAG-SIPs to predict SIPs. The web server is available at http://219.219.62.123:8888/WELMLAG/.
NASA Astrophysics Data System (ADS)
Klotzsch, Stephan; Binder, Martin; Händel, Falk
2017-06-01
While planning tracer tests, uncertainties in geohydraulic parameters should be considered as an important factor. Neglecting these uncertainties can lead, for example, to missing the tracer breakthrough. One way to consider uncertainties during tracer test design is the so-called ensemble forecast. The applicability of this method to geohydrological problems is demonstrated by coupling the method with two analytical solute transport models. The algorithm presented in this article is suitable for prediction as well as parameter estimation. The parameter estimation function can be used during a tracer test to reduce the uncertainties in the measured data, which can improve the initial prediction. The algorithm was implemented in a software tool which is freely downloadable from the website of the Institute for Groundwater Management at TU Dresden, Germany.
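The ensemble-forecast idea is simple: sample the uncertain parameters from their assumed distributions, run the analytical transport model once per draw, and read off a band of plausible breakthrough times instead of a single best guess. A sketch using the leading term of the Ogata-Banks 1D advection-dispersion solution (the parameter ranges and well distance are invented for illustration; the actual tool couples two analytical models):

```python
import numpy as np
from math import erfc, sqrt

def breakthrough(t, x, v, D, c0=1.0):
    """Ogata-Banks 1D advection-dispersion solution (leading term)."""
    return 0.5 * c0 * erfc((x - v * t) / (2.0 * sqrt(D * t)))

def arrival_time(x, v, D, threshold=0.5, t_max=50.0, steps=5000):
    """First time the relative concentration at x reaches `threshold`."""
    for t in np.linspace(t_max / steps, t_max, steps):
        if breakthrough(t, x, v, D) >= threshold:
            return t
    return float("inf")

rng = np.random.default_rng(0)
x = 10.0                                    # observation well distance [m]
vs = rng.uniform(1.0, 3.0, 200)             # uncertain seepage velocity [m/d]
Ds = rng.uniform(0.2, 1.0, 200)             # uncertain dispersion coeff. [m^2/d]
times = [arrival_time(x, v, D) for v, D in zip(vs, Ds)]
band = np.percentile(times, [5, 50, 95])    # ensemble forecast band
```

Sampling the monitoring well over the whole 5-95% band, rather than only around the single deterministic prediction, is what keeps the breakthrough from being missed.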
Statistical physics in foreign exchange currency and stock markets
NASA Astrophysics Data System (ADS)
Ausloos, M.
2000-09-01
Problems in economy and finance have attracted the interest of statistical physicists all over the world. Fundamental problems pertain to the existence or not of long-, medium- or/and short-range power-law correlations in various economic systems, to the presence of financial cycles and on economic considerations, including economic policy. A method like the detrended fluctuation analysis is recalled emphasizing its value in sorting out correlation ranges, thereby leading to predictability at short horizon. The (m, k)-Zipf method is presented for sorting out short-range correlations in the sign and amplitude of the fluctuations. A well-known financial analysis technique, the so-called moving average, is shown to raise questions to physicists about fractional Brownian motion properties. Among spectacular results, the possibility of crash predictions has been demonstrated through the log-periodicity of financial index oscillations.
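Detrended fluctuation analysis itself is short enough to sketch: integrate the series, remove a local linear trend in boxes of size n, and read the scaling exponent from the log-log slope of the fluctuation function F(n). A minimal implementation (illustrative only; white noise should give an exponent near 0.5, persistent signals above it):

```python
import numpy as np

def dfa_exponent(x, box_sizes):
    """First-order detrended fluctuation analysis scaling exponent."""
    y = np.cumsum(x - np.mean(x))              # integrated profile
    fluctuations = []
    for n in box_sizes:
        n_boxes = len(y) // n
        t = np.arange(n)
        f2 = 0.0
        for b in range(n_boxes):
            seg = y[b * n:(b + 1) * n]
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear fit
            f2 += np.mean((seg - trend) ** 2)
        fluctuations.append(np.sqrt(f2 / n_boxes))
    # Scaling exponent alpha: slope of log F(n) versus log n
    return np.polyfit(np.log(box_sizes), np.log(fluctuations), 1)[0]
```

An exponent of roughly 0.5 indicates uncorrelated fluctuations; values clearly above 0.5 indicate the long-range power-law correlations that make short-horizon predictability possible.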
Statistical Analysis of Complexity Generators for Cost Estimation
NASA Technical Reports Server (NTRS)
Rowell, Ginger Holmes
1999-01-01
Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.
Extending Theory-Based Quantitative Predictions to New Health Behaviors.
Brick, Leslie Ann D; Velicer, Wayne F; Redding, Colleen A; Rossi, Joseph S; Prochaska, James O
2016-04-01
Traditional null hypothesis significance testing suffers many limitations and is poorly adapted to theory testing. A proposed alternative approach, called Testing Theory-based Quantitative Predictions, uses effect size estimates and confidence intervals to directly test predictions based on theory. This paper replicates findings from previous smoking studies and extends the approach to diet and sun protection behaviors using baseline data from a Transtheoretical Model behavioral intervention (N = 5407). Effect size predictions were developed using two methods: (1) applying refined effect size estimates from previous smoking research or (2) using predictions developed by an expert panel. Thirteen of 15 predictions were confirmed for smoking. For diet, 7 of 14 predictions were confirmed using smoking predictions and 6 of 16 using expert panel predictions. For sun protection, 3 of 11 predictions were confirmed using smoking predictions and 5 of 19 using expert panel predictions. Expert panel predictions and smoking-based predictions poorly predicted effect sizes for diet and sun protection constructs. Future studies should aim to use previous empirical data to generate predictions whenever possible. The best results occur when there have been several iterations of predictions for a behavior, such as with smoking, demonstrating that expected values begin to converge on the population effect size. Overall, the study supports necessity in strengthening and revising theory with empirical data.
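The core operation in Testing Theory-based Quantitative Predictions is comparing a theory-derived effect size against the interval estimated from data. A sketch for a two-group standardized mean difference, using Cohen's d with a common approximate standard error (illustrative only; the paper's constructs and predictions are not reproduced here):

```python
import numpy as np

def cohens_d_with_ci(x1, x2, z=1.96):
    """Cohen's d for two independent groups with an approximate 95% CI."""
    n1, n2 = len(x1), len(x2)
    sp = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1))
                 / (n1 + n2 - 2))              # pooled standard deviation
    d = (np.mean(x1) - np.mean(x2)) / sp
    se = np.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

def prediction_confirmed(predicted_d, x1, x2):
    """A theory-based prediction counts as confirmed if it falls in the CI."""
    _, (lo, hi) = cohens_d_with_ci(x1, x2)
    return lo <= predicted_d <= hi
```

Counting confirmations of this kind across constructs is what produces tallies such as "13 of 15 predictions confirmed".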
Interpretable Deep Models for ICU Outcome Prediction
Che, Zhengping; Purushotham, Sanjay; Khemani, Robinder; Liu, Yan
2016-01-01
Exponential surge in health care data, such as longitudinal data from electronic health records (EHR), sensor data from the intensive care unit (ICU), etc., is providing new opportunities to discover meaningful data-driven characteristics and patterns of diseases. Recently, deep learning models have been employed for many computational phenotyping and healthcare prediction tasks to achieve state-of-the-art performance. However, deep models lack interpretability, which is crucial for wide adoption in medical research and clinical decision-making. In this paper, we introduce a simple yet powerful knowledge-distillation approach called interpretable mimic learning, which uses gradient boosting trees to learn interpretable models while achieving prediction performance as strong as deep learning models. Experiment results on a pediatric ICU dataset for acute lung injury (ALI) show that our proposed method not only outperforms state-of-the-art approaches for mortality and ventilator-free-days prediction tasks but can also provide interpretable models to clinicians. PMID:28269832
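The mimic-learning recipe is: train a powerful teacher, then fit gradient boosting trees to the teacher's soft predictions rather than to the raw labels, so the trees inherit the teacher's decision surface in an inspectable form. A minimal sketch on synthetic data (a random-forest stand-in plays the teacher here; in the paper the teacher is a deep network, and the data are invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # nonlinear (XOR-like) target

# Teacher: a flexible black-box model (stand-in for the deep network).
teacher = RandomForestClassifier(random_state=0).fit(X, y)
soft = teacher.predict_proba(X)[:, 1]          # soft labels to distill

# Student: boosted trees regress on the soft labels, not on y itself.
student = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                    random_state=0).fit(X, soft)
agreement = np.mean((student.predict(X) > 0.5) == (soft > 0.5))
```

The student's trees can then be examined (feature splits, partial dependence) where the teacher cannot, which is the interpretability payoff the abstract describes.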
Computational prediction of protein hot spot residues.
Morrow, John Kenneth; Zhang, Shuxing
2012-01-01
Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues.
Computational Prediction of Hot Spot Residues
Morrow, John Kenneth; Zhang, Shuxing
2013-01-01
Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues. PMID:22316154
A Comparative Study to Predict Student’s Performance Using Educational Data Mining Techniques
NASA Astrophysics Data System (ADS)
Uswatun Khasanah, Annisa; Harwati
2017-06-01
Student performance prediction is essential for a university in order to prevent student failure. The number of student dropouts is one parameter that can be used to measure student performance, and it is an important point that must be evaluated in Indonesian university accreditation. Data mining has been widely used to predict student performance, and data mining applied in this field is usually called Educational Data Mining. This study conducted feature selection to select attributes with high influence on student performance in the Department of Industrial Engineering, Universitas Islam Indonesia. Then, two popular classification algorithms, Bayesian Network and Decision Tree, were implemented and compared to determine which gives the best prediction result. The outcome showed that student attendance and first-semester GPA were in the top rank across all feature selection methods, and that Bayesian Network outperformed Decision Tree with a higher accuracy rate.
rpiCOOL: A tool for In Silico RNA-protein interaction detection using random forest.
Akbaripour-Elahabad, Mohammad; Zahiri, Javad; Rafeh, Reza; Eslami, Morteza; Azari, Mahboobeh
2016-08-07
Understanding the principles of RNA-protein interactions (RPIs) is of critical importance for insights into post-transcriptional gene regulation and is useful for guiding studies of many complex diseases. The limitations and difficulties associated with experimental determination of RPIs create an urgent need for computational methods for RPI prediction. In this paper, we propose a machine learning method to detect RNA-protein interactions based on sequence information. We used motif information and repetitive patterns, extracted from experimentally validated RNA-protein interactions, in combination with sequence composition as descriptors to build a model for RPI prediction via a random forest classifier. About 20% of the "sequence motifs" and "nucleotide composition" features were selected as informative features by the feature selection methods. These results suggest that these two feature types contribute effectively to RPI detection. Results of 10-fold cross-validation experiments on three non-redundant benchmark datasets show a better performance of the proposed method in comparison with the current state-of-the-art methods in terms of various performance measures. In addition, the results revealed that the accuracy of RPI prediction methods can vary considerably across different organisms. We have implemented the proposed method, namely rpiCOOL, as a stand-alone tool with a user-friendly graphical user interface (GUI) that enables researchers to predict RNA-protein interactions. rpiCOOL is freely available at http://biocool.ir/rpicool.html for non-commercial use.
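The backbone of such sequence-based predictors is turning each sequence into a fixed-length composition vector and feeding it to a classifier. A minimal sketch of the nucleotide-composition step with a random forest on synthetic sequences (the sequences, class bias, and train/test split are invented for illustration; the real tool also uses motif features and protein-side descriptors):

```python
import numpy as np
from itertools import product
from sklearn.ensemble import RandomForestClassifier

def kmer_composition(seq, alphabet="ACGU", k=2):
    """Normalized k-mer frequency vector for an RNA sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(len(seq) - k + 1, 1)
    return np.array([counts[km] / total for km in kmers])

rng = np.random.default_rng(0)
def random_seq(bias):
    # `bias` skews the base composition so the two classes are separable
    return "".join(rng.choice(list("ACGU"), p=bias) for _ in range(80))

pos = [random_seq([0.4, 0.2, 0.2, 0.2]) for _ in range(60)]   # A-rich class
neg = [random_seq([0.2, 0.2, 0.4, 0.2]) for _ in range(60)]   # G-rich class
X = np.array([kmer_composition(s) for s in pos + neg])
y = np.array([1] * 60 + [0] * 60)

clf = RandomForestClassifier(random_state=0).fit(X[::2], y[::2])  # train half
acc = clf.score(X[1::2], y[1::2])                                 # test rest
```

In a real RPI predictor the RNA composition vector would be concatenated with the protein's amino-acid composition for each candidate pair before classification.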
NASA Astrophysics Data System (ADS)
Ansari, Hamid Reza
2014-09-01
In this paper we propose a new method for predicting rock porosity based on a combination of several artificial intelligence systems. The method focuses on one of the Iranian carbonate fields in the Persian Gulf. Because there is strong heterogeneity in carbonate formations, estimation of rock properties is more challenging than in sandstone. For this purpose, seismic colored inversion (SCI) and a new approach to committee machines are used in order to improve porosity estimation. The study comprises three major steps. First, a series of sample-based attributes is calculated from the 3D seismic volume. Acoustic impedance is an important attribute that is obtained by the SCI method in this study. Second, the porosity log is predicted from seismic attributes using common intelligent computation systems, including: probabilistic neural network (PNN), radial basis function network (RBFN), multi-layer feed forward network (MLFN), ε-support vector regression (ε-SVR), and adaptive neuro-fuzzy inference system (ANFIS). Finally, a power law committee machine (PLCM) is constructed based on the imperialist competitive algorithm (ICA) to combine the results of all previous predictions into a single solution. This technique is called PLCM-ICA in this paper. The results show that the PLCM-ICA model improved the results of the neural networks, support vector machine, and neuro-fuzzy system.
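The committee-machine step can be sketched independently of the expert networks: raise each expert's prediction to a power and combine the terms with weights chosen to minimize training error. In this sketch a plain least-squares fit stands in for the imperialist competitive algorithm used in the paper, and the exponents, noise levels, and "expert" predictions are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.uniform(0.05, 0.30, 200)               # "porosity" targets

# Three stand-in experts: noisy versions of the truth (playing PNN, SVR, ANFIS).
experts = np.stack([truth + rng.normal(0.0, s, 200)
                    for s in (0.02, 0.03, 0.05)], axis=1)

alphas = np.array([1.2, 1.0, 0.8])                 # assumed power-law exponents
P = np.clip(experts, 1e-6, None) ** alphas         # power-law terms

# Fit the combination weights by least squares (the paper optimizes with ICA).
w, *_ = np.linalg.lstsq(P, truth, rcond=None)
combined = P @ w

mse = lambda p: np.mean((p - truth) ** 2)
```

Because the fit can always fall back on any single expert, the combined prediction is never worse on the training data than that expert alone, which is the rationale for a committee machine.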
Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords
Koyabu, Shun; Phan, Thi Thanh Thuy; Ohkawa, Takenao
2015-01-01
For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. A classifier is generated from training data, represented using several features, to decide whether a protein pair in each sentence has an interaction. A specific keyword that is directly related to interaction, such as "bind" or "interact," plays an important role in training classifiers. We call it a dominant keyword, as it affects the capability of the classifier. Although it is important to identify the dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that derives imbalanced classification results is tentatively assumed to be a dominant keyword. Then, classifiers are separately trained from the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated by the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword. Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction. PMID:26783534
Comparison of vibrational conductivity and radiative energy transfer methods
NASA Astrophysics Data System (ADS)
Le Bot, A.
2005-05-01
This paper is concerned with the comparison of two methods well suited for the prediction of the wideband response of built-up structures subjected to high-frequency vibrational excitation. The first method is sometimes called the vibrational conductivity method and the second one is rather known as the radiosity method in the field of acoustics, or the radiative energy transfer method. Both are based on quite similar physical assumptions i.e. uncorrelated sources, mean response and high-frequency excitation. Both are based on analogies with some equations encountered in the field of heat transfer. However these models do not lead to similar results. This paper compares the two methods. Some numerical simulations on a pair of plates joined along one edge are provided to illustrate the discussion.
Browning, Brian L.; Yu, Zhaoxia
2009-01-01
We present a novel method for simultaneous genotype calling and haplotype-phase inference. Our method employs the computationally efficient BEAGLE haplotype-frequency model, which can be applied to large-scale studies with millions of markers and thousands of samples. We compare genotype calls made with our method to genotype calls made with the BIRDSEED, CHIAMO, GenCall, and ILLUMINUS genotype-calling methods, using genotype data from the Illumina 550K and Affymetrix 500K arrays. We show that our method has higher genotype-call accuracy and yields fewer uncalled genotypes than competing methods. We perform single-marker analysis of data from the Wellcome Trust Case Control Consortium bipolar disorder and type 2 diabetes studies. For bipolar disorder, the genotype calls in the original study yield 25 markers with apparent false-positive association with bipolar disorder at a p < 10−7 significance level, whereas genotype calls made with our method yield no associated markers at this significance threshold. Conversely, for markers with replicated association with type 2 diabetes, there is good concordance between genotype calls used in the original study and calls made by our method. Results from single-marker and haplotypic analysis of our method's genotype calls for the bipolar disorder study indicate that our method is highly effective at eliminating genotyping artifacts that cause false-positive associations in genome-wide association studies. Our new genotype-calling methods are implemented in the BEAGLE and BEAGLECALL software packages. PMID:19931040
Numerical and Experimental Studies on Impact Loaded Concrete Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saarenheimo, Arja; Hakola, Ilkka; Karna, Tuomo
2006-07-01
An experimental set-up has been constructed for medium-scale impact tests. The main objective of this effort is to provide data for the calibration and verification of numerical models of a loading scenario where an aircraft impacts a nuclear power plant. One goal is to develop and put into use numerical methods for predicting the response of reinforced concrete structures to impacts of deformable projectiles that may contain combustible liquid ('fuel'). The loading and the structural behaviour, such as the collapse mechanism and the damage grade, are predicted both by simple analytical methods and by the non-linear FE method. In the so-called Riera method the behavior of the missile material is assumed to be rigid plastic or rigid visco-plastic. Using elastic-plastic and elastic visco-plastic material models, calculations are carried out with the ABAQUS/Explicit finite element code, assuming an axisymmetric deformation mode for the missile. With both methods, typically, the impact force time history, the velocity of the missile rear end and the missile shortening during the impact were recorded for comparison. (authors)
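The Riera approach mentioned above idealizes the missile as rigid-plastic: at each instant the impact force is the crushing (buckling) force of the cross-section currently being crushed plus the momentum flux of the arriving material, while the uncrushed rear part decelerates under the crush force alone. A minimal sketch for a uniform missile against a rigid target (the uniform geometry, time step, and parameter names are illustrative assumptions, not the authors' implementation):

```python
def riera_force_history(length, mu, crush_force, v0, dt=1e-4):
    """Rigid-plastic Riera model for a uniform missile hitting a rigid wall.

    length      missile length [m]
    mu          mass per unit length [kg/m]
    crush_force constant crushing (buckling) force P_c [N]
    v0          impact velocity [m/s]

    Returns a list of (time, force) pairs with F(t) = P_c + mu * v(t)**2.
    """
    x = 0.0          # crushed length so far [m]
    v = v0           # velocity of the rigid (uncrushed) rear part [m/s]
    t = 0.0
    history = []
    while x < length and v > 0.0:
        remaining_mass = mu * (length - x)
        force = crush_force + mu * v * v   # buckling term + momentum-flux term
        history.append((t, force))
        # only the crush force decelerates the uncrushed rear part
        v -= (crush_force / remaining_mass) * dt
        x += v * dt
        t += dt
    return history
```

The peak force occurs at first contact, when the momentum-flux term mu*v**2 is largest, and the force decays as the missile decelerates, matching the qualitative shape of recorded impact force time histories.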
Ouzounoglou, Eleftherios; Kolokotroni, Eleni; Stanulla, Martin; Stamatakos, Georgios S
2018-02-06
Efficient use of Virtual Physiological Human (VPH)-type models for personalized treatment response prediction purposes requires a precise model parameterization. In cases where the available personalized data are not sufficient to fully determine the parameter values, an appropriate prediction task may be followed. In this study, a hybrid combination of computational optimization and machine learning methods with an already developed mechanistic model, the acute lymphoblastic leukaemia (ALL) Oncosimulator, which simulates ALL progression and treatment response, is presented. These methods are used in order for the parameters of the model to be estimated for retrospective cases and to be predicted for prospective ones. The parameter value prediction is based on a regression model trained on retrospective cases. The proposed Hybrid ALL Oncosimulator system has been evaluated when predicting the pre-phase treatment outcome in ALL. This has been correctly achieved for a significant percentage of patient cases tested (approx. 70% of patients). Moreover, the system is capable of declining to classify cases for which the results are not trustworthy enough. In that case, potentially misleading predictions for a number of patients are avoided, while the classification accuracy for the remaining patient cases further increases. The results obtained are particularly encouraging regarding the soundness of the proposed methodologies and their relevance to the process of achieving clinical applicability of the proposed Hybrid ALL Oncosimulator system and of VPH models in general.
A systematic investigation of computation models for predicting Adverse Drug Reactions (ADRs).
Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong
2014-01-01
Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computational models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have previously been used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and that the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, we found that the final formulas of these algorithms could all be converted to linear models in form; based on this finding, we propose a new algorithm, called the general weighted profile method, which yielded the best overall performance among the algorithms investigated in this paper. Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms.
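The role the Jaccard coefficient plays in profile-based ADR prediction can be illustrated with a toy scorer: a candidate ADR for a query drug is weighted by how similar (by Jaccard coefficient over known ADR sets) the query drug is to each drug already known to cause that ADR. This is a hedged sketch of the general idea only, not the paper's exact general weighted profile formula; drug and ADR names are invented:

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of ADR labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def score_candidate_adrs(query_adrs, known_profiles):
    """Score each ADR not yet known for the query drug by the normalised
    Jaccard similarity of the query drug's profile to the profiles of
    drugs already associated with that ADR."""
    sims = {d: jaccard(query_adrs, p) for d, p in known_profiles.items()}
    total = sum(sims.values()) or 1.0
    scores = {}
    for d, profile in known_profiles.items():
        for adr in profile:
            if adr not in query_adrs:
                scores[adr] = scores.get(adr, 0.0) + sims[d] / total
    return scores
```

Because every term here is linear in the similarity weights, the scorer is a linear model in form, the structural observation the abstract reports for the algorithms it compares.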
Predicting New Indications for Approved Drugs Using a Proteo-Chemometric Method
Dakshanamurthy, Sivanesan; Issa, Naiem T; Assefnia, Shahin; Seshasayee, Ashwini; Peters, Oakland J; Madhavan, Subha; Uren, Aykut; Brown, Milton L; Byers, Stephen W
2012-01-01
The most effective way to move from target identification to the clinic is to identify already approved drugs with the potential for activating or inhibiting unintended targets (repurposing or repositioning). This is usually achieved by high throughput chemical screening, transcriptome matching or simple in silico ligand docking. We now describe a novel rapid computational proteo-chemometric method called “Train, Match, Fit, Streamline” (TMFS) to map new drug-target interaction space and predict new uses. The TMFS method combines shape, topology and chemical signatures, including docking score and functional contact points of the ligand, to predict potential drug-target interactions with remarkable accuracy. Using the TMFS method, we performed extensive molecular fit computations on 3,671 FDA approved drugs across 2,335 human protein crystal structures. The TMFS method predicts drug-target associations with 91% accuracy for the majority of drugs. Over 58% of the known best ligands for each target were correctly predicted as top ranked, followed by 66%, 76%, 84% and 91% for agents ranked in the top 10, 20, 30 and 40, respectively, out of all 3,671 drugs. Drugs ranked in the top 1–40, that have not been experimentally validated for a particular target now become candidates for repositioning. Furthermore, we used the TMFS method to discover that mebendazole, an anti-parasitic with recently discovered and unexpected anti-cancer properties, has the structural potential to inhibit VEGFR2. We confirmed experimentally that mebendazole inhibits VEGFR2 kinase activity as well as angiogenesis at doses comparable with its known effects on hookworm. TMFS also predicted, and was confirmed with surface plasmon resonance, that dimethyl celecoxib and the anti-inflammatory agent celecoxib can bind cadherin-11, an adhesion molecule important in rheumatoid arthritis and poor prognosis malignancies for which no targeted therapies exist. 
We anticipate that expanding our TMFS method to the >27,000 clinically active agents available worldwide across all targets will be most useful in the repositioning of existing drugs for new therapeutic targets. PMID:22780961
Calls Forecast for the Moscow Ambulance Service. The Impact of Weather Forecast
NASA Astrophysics Data System (ADS)
Gordin, Vladimir; Bykov, Philipp
2015-04-01
We use the known statistics of calls for the current and previous days to predict the number of calls for tomorrow and the following days. The algorithm is intended to run operationally, cyclically updating the available information and moving the forecast horizon forward. Naturally, the accuracy of such forecasts depends on their lead time and on the choice of diagnosis group. For comparison we used the error of the inertial forecast (tomorrow there will be the same number of calls as today). Our technology demonstrated accuracy approximately two times better than the inertial forecast. We obtained the following result: the number of calls depends on the actual weather in the city as well as on its rate of change. We were interested in the accuracy of the forecast of the 12-hour sum of calls in real situations. We evaluated the impact of meteorological forecast errors [1] on the errors in forecasting the number of ambulance calls. The weather and the number of ambulance calls both have seasonal tendencies. Therefore, if we have medical information from only one city, we should separate the impacts of such predictors as "annual variation in the number of calls" and "weather". We need to consider the seasonal tendencies (associated, e.g., with seasonal migration of the population) and the impact of air temperature simultaneously, rather than sequentially. We forecasted separately the number of calls with diagnoses in the cardiovascular group, where the forecasting method showed a clear advantage when the maximum daily air temperature was used as a predictor. We thus have a chance to evaluate statistically the influence of meteorological factors on the dynamics of medical problems. In some cases this may be useful for understanding the physiology of disease and possible treatment options. We can also assimilate personal archives of medical parameters for individuals with particular diseases together with the corresponding meteorological archive.
As a result we hope to evaluate how weather can influence the intensity of disease. Thus, knowledge of the weather forecast for several days ahead will help to predict a person's state of health, and the person will be able to take proactive actions to avoid the anticipated worsening of his or her health. Literature: 1. A. N. Bagrov, F. L. Bykov, V. A. Gordin. Complex Forecast of Surface Meteorological Parameters. Meteorology and Hydrology, 2014, N 5, 5-16 (Russian), 283-291 (English). 2. Ph. L. Bykov, V. A. Gordin. Objective Analysis of the Structure of Three-Dimensional Atmospheric Fronts. Izvestia of Russian Academy of Sciences, Ser. The Physics of Atmosphere and Ocean, 48 (2) (2012), 172-188 (Russian), 152-168 (English), http://dx.doi.org/10.1134/S0001433812020053 3. V. A. Gordin. Mathematical Problems and Methods in Hydrodynamical Weather Forecasting. Amsterdam etc.: Gordon & Breach Publ. House, 2000. 4. V. A. Gordin. Mathematics, Computer, Weather Forecasting, and Other Mathematical Physics' Scenarios. Moscow, Fizmatlit, 2010, 2012 (Russian).
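The inertial (persistence) baseline the study compares against is simple to state in code; the skill ratio below divides a model's mean absolute error by the persistence error, so values under 1.0 beat the baseline, and roughly 0.5 would correspond to the factor-of-two improvement reported. The MAE metric and function names are illustrative choices, not the authors' exact error measure:

```python
def mean_abs_error(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def skill_vs_inertial(daily_calls, model_forecasts):
    """Compare one-day-ahead model forecasts against the inertial forecast
    ('tomorrow there will be the same number of calls as today').

    daily_calls      observed call counts, day 0 .. day T
    model_forecasts  model's forecast for day 1 .. day T

    Returns model MAE / inertial MAE; values below 1.0 mean the model wins.
    """
    actual = daily_calls[1:]        # the days being forecast
    inertial = daily_calls[:-1]     # persistence forecast for each of them
    return mean_abs_error(actual, model_forecasts) / mean_abs_error(actual, inertial)
```

Persistence is a deliberately hard-to-beat reference for short lead times, which is why it is the standard yardstick for operational call-volume forecasts.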
Zhao, Mingbo; Zhang, Zhao; Chow, Tommy W S; Li, Bing
2014-07-01
Dealing with high-dimensional data has always been a major problem in research on pattern recognition and machine learning, and Linear Discriminant Analysis (LDA) is one of the most popular methods for dimension reduction. However, it uses only labeled samples while neglecting unlabeled samples, which are abundant and can be easily obtained in the real world. In this paper, we propose a new dimension reduction method, called "SL-LDA", that uses unlabeled samples to enhance the performance of LDA. The new method first propagates label information from the labeled set to the unlabeled set via a label propagation process, where the predicted labels of unlabeled samples, called "soft labels", are obtained. It then incorporates the soft labels into the construction of the scatter matrices to find a transformation matrix for dimension reduction. In this way, the proposed method can preserve more discriminative information, which is preferable when solving the classification problem. We further propose an efficient approach for solving SL-LDA under a least squares framework, and a flexible variant of SL-LDA (FSL-LDA) to better cope with datasets sampled from a nonlinear manifold. Extensive simulations are carried out on several datasets, and the results show the effectiveness of the proposed method. Copyright © 2014 Elsevier Ltd. All rights reserved.
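The label propagation step that produces the "soft labels" can be sketched as a clamped iteration on a sample-similarity graph: unlabeled nodes repeatedly take the weighted average of their neighbours' class distributions while labeled nodes stay fixed. The adjacency representation and clamping scheme below are illustrative simplifications of the propagation process the paper builds on, not its exact formulation:

```python
def propagate_labels(adjacency, labels, n_classes, iters=50):
    """Iterative label propagation on a weighted graph.

    adjacency  n x n matrix of non-negative edge weights
    labels     per-node class index, or None for unlabeled nodes
    Returns an n x n_classes matrix of soft labels (class distributions).
    """
    n = len(adjacency)
    F = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            F[i][y] = 1.0                      # one-hot for labeled samples
    for _ in range(iters):
        newF = []
        for i in range(n):
            if labels[i] is not None:
                newF.append(F[i][:])           # clamp labeled samples
                continue
            row = [0.0] * n_classes
            w_total = sum(adjacency[i]) or 1.0
            for j, w in enumerate(adjacency[i]):
                for c in range(n_classes):
                    row[c] += w * F[j][c] / w_total
            newF.append(row)
        F = newF
    return F
```

The resulting rows for unlabeled samples are exactly the kind of soft (fractional) class memberships that SL-LDA then feeds into its scatter matrices.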
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, L.E.; Brown, J.R.
1977-01-01
Parameters of the mating call of spring peepers (Hyla crucifer) were best predicted by water temperature rather than air or body temperature. Thus, water temperature should most closely approach the true body temperature of the calling frogs.
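Choosing the temperature variable that best explains call parameters amounts to comparing candidate predictors by explained variance. A hedged sketch of that comparison (the variable names, toy data, and the r²-ranking criterion are assumptions for illustration; the original study's statistics are not reproduced here):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def best_predictor(call_param, predictors):
    """Return the name of the candidate predictor with the highest r^2
    (explained variance) against the measured call parameter."""
    return max(predictors, key=lambda name: pearson_r(predictors[name], call_param) ** 2)
```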
Non-parallel coevolution of sender and receiver in the acoustic communication system of treefrogs.
Schul, Johannes; Bush, Sarah L
2002-09-07
Advertisement calls of closely related species often differ in quantitative features such as the repetition rate of signal units. These differences are important in species recognition. Current models of signal-receiver coevolution predict two possible patterns in the evolution of the mechanism used by receivers to recognize the call: (i) classical sexual selection models (Fisher process, good genes/indirect benefits, direct benefits models) predict that close relatives use qualitatively similar signal recognition mechanisms tuned to different values of a call parameter; and (ii) receiver bias models (hidden preference, pre-existing bias models) predict that if different signal recognition mechanisms are used by sibling species, evidence of an ancestral mechanism will persist in the derived species, and evidence of a pre-existing bias will be detectable in the ancestral species. We describe qualitatively different call recognition mechanisms in sibling species of treefrogs. Whereas Hyla chrysoscelis uses pulse rate to recognize male calls, Hyla versicolor uses absolute measurements of pulse duration and interval duration. We found no evidence of either hidden preferences or pre-existing biases. The results are compared with similar data from katydids (Tettigonia sp.). In both taxa, the data are not adequately explained by current models of signal-receiver coevolution.
2017-01-01
Background Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic “big data” from diagnostic and prediagnostic sources in health care and public health settings permits advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. Objective The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. Methods An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as syndromic data source. Results The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). 
For detection modeling, exponential regression was used, based on the assumption that the beginning of a winter influenza season shows exponential growth in the number of infected individuals. For prediction modeling, linear regression was applied to 7-day periods at a time in order to find the peak timing, whereas a derivative of a normal distribution density function was used to find the peak intensity. We found that the integrated detection and prediction method detected the 2008-09 winter influenza season on its starting day (optimal timeliness 0 days), whereas the predicted peak was estimated to occur 7 days ahead of the actual peak and the predicted peak intensity was estimated to be 26% lower than the actual intensity (6.3 compared with 8.5 influenza-diagnosis cases/100,000). Conclusions Our detection and prediction method is one of the first integrated methods specifically designed for local application on influenza data electronically available for surveillance. The performance of the method in a retrospective study indicates that further prospective evaluations of the methods are justified. PMID:28619700
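The detection module's core idea, early-season case counts growing exponentially, can be sketched by fitting a line to log counts over a trailing window and alerting when the slope (the growth rate) exceeds a threshold. The window length and threshold below are illustrative placeholders, not the study's calibrated values:

```python
import math

def exponential_growth_rate(counts):
    """Least-squares slope of log(counts) against day index, i.e. the
    estimated exponential growth rate. Counts must be positive."""
    n = len(counts)
    days = range(n)
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    return sxy / sxx

def detect_onset(counts, window=7, threshold=0.1):
    """Alert when the growth rate over the trailing window exceeds threshold."""
    if len(counts) < window:
        return False
    return exponential_growth_rate(counts[-window:]) > threshold
```

A series doubling daily has a fitted growth rate of ln 2 per day and triggers the alert, while a flat series does not; this is the qualitative behaviour the detection module needs at season onset.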
Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin
2017-11-15
An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information
Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V
2007-01-01
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321
Exploring Mouse Protein Function via Multiple Approaches.
Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong
2016-01-01
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although this accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification.
Therefore, the accuracy of the presented method may be much higher in reality.
Spreco, Armin; Eriksson, Olle; Dahlström, Örjan; Cowling, Benjamin John; Timpka, Toomas
2017-06-15
Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic "big data" from diagnostic and prediagnostic sources in health care and public health settings permits advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as syndromic data source. The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). 
For detection modeling, exponential regression was used, based on the assumption that the beginning of a winter influenza season shows exponential growth in the number of infected individuals. For prediction modeling, linear regression was applied to 7-day periods at a time in order to find the peak timing, whereas a derivative of a normal distribution density function was used to find the peak intensity. We found that the integrated detection and prediction method detected the 2008-09 winter influenza season on its starting day (optimal timeliness 0 days), whereas the predicted peak was estimated to occur 7 days ahead of the actual peak and the predicted peak intensity was estimated to be 26% lower than the actual intensity (6.3 compared with 8.5 influenza-diagnosis cases/100,000). Our detection and prediction method is one of the first integrated methods specifically designed for local application on influenza data electronically available for surveillance. The performance of the method in a retrospective study indicates that further prospective evaluations of the methods are justified. ©Armin Spreco, Olle Eriksson, Örjan Dahlström, Benjamin John Cowling, Toomas Timpka. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.06.2017.
Mortality prediction system for heart failure with orthogonal relief and dynamic radius means.
Wang, Zhe; Yao, Lijuan; Li, Dongdong; Ruan, Tong; Liu, Min; Gao, Ju
2018-07-01
This paper constructs a mortality prediction system based on a real-world dataset. The system aims to predict mortality in heart failure (HF) patients. Effective mortality prediction can improve resource allocation and clinical outcomes, avoiding inappropriate overtreatment of low-mortality patients and premature discharge of high-mortality patients. The system covers three mortality prediction targets: prediction of in-hospital mortality, prediction of 30-day mortality and prediction of 1-year mortality. HF data were collected from the Shanghai Shuguang hospital; 10,203 in-patient records were extracted from encounters occurring between March 2009 and April 2016. The records involve 4682 patients, including 539 death cases. A feature selection method called the Orthogonal Relief (OR) algorithm is first used to reduce the dimensionality. Then, a classification algorithm named Dynamic Radius Means (DRM) is proposed to predict mortality in HF patients. The comparative experimental results demonstrate that with DRM the mortality prediction system achieves high performance on all targets. It is noteworthy that in-hospital mortality prediction reaches 87.3% in AUC (a 35.07% improvement). Moreover, the AUCs of 30-day and 1-year mortality prediction reach 88.45% and 84.84%, respectively. Notably, the system remains effective and does not deteriorate when the dimensionality of the samples is sharply reduced. The proposed system, with its own method DRM, can predict mortality in HF patients and achieves high performance on all three mortality targets. Furthermore, an effective feature selection strategy can boost the system. This system shows its importance in real-world applications, assisting clinicians in HF treatment by providing crucial decision information. Copyright © 2018 Elsevier B.V. All rights reserved.
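Orthogonal Relief is the paper's own variant, but the core Relief idea it extends is compact: a feature earns weight when it separates a sample from its nearest opposite-class neighbour (nearest miss) more than from its nearest same-class neighbour (nearest hit). A sketch of classic Relief, not the OR algorithm itself, with invented toy data:

```python
def relief_weights(X, y):
    """Classic Relief feature scoring over all samples.

    X  list of feature vectors, y  list of class labels.
    Returns one weight per feature; higher means more discriminative.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for k in range(p):
            # reward separation from the miss, penalise separation from the hit
            w[k] += abs(X[i][k] - X[m][k]) - abs(X[i][k] - X[h][k])
    return w
```

Features with near-zero weight can then be dropped before classification, which is how a Relief-style selector shrinks the dimensionality the abstract mentions.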
Johansen, Morten Bo; Izarzugaza, Jose M. G.; Brunak, Søren; Petersen, Thomas Nordahl; Gupta, Ramneek
2013-01-01
We have developed a sequence conservation-based artificial neural network predictor called NetDiseaseSNP which classifies nsSNPs as disease-causing or neutral. Our method uses the excellent alignment generation algorithm of SIFT to identify related sequences and a combination of 31 features assessing sequence conservation and the predicted surface accessibility to produce a single score which can be used to rank nsSNPs based on their potential to cause disease. NetDiseaseSNP successfully classifies disease-causing and neutral mutations. In addition, we show that NetDiseaseSNP discriminates cancer driver and passenger mutations satisfactorily. Our method outperforms other state-of-the-art methods on several disease/neutral datasets as well as on cancer driver/passenger mutation datasets and can thus be used to pinpoint and prioritize plausible disease candidates among nsSNPs for further investigation. NetDiseaseSNP is publicly available as an online tool as well as a web service: http://www.cbs.dtu.dk/services/NetDiseaseSNP PMID:23935863
Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David
2001-01-01
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
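The intuition behind PEA, rank candidate sites higher when consensus matches concentrate in the upstream regions of similarly expressed genes, can be sketched as an enrichment ratio: the cluster's share of all consensus hits divided by its share of all genes. Exact-string matching and this particular ratio are simplifications for illustration; the published metric is probabilistic, and the gene names and sequences below are invented:

```python
def count_matches(seq, consensus):
    """Number of exact occurrences of the consensus pattern in a sequence."""
    return sum(1 for i in range(len(seq) - len(consensus) + 1)
               if seq[i:i + len(consensus)] == consensus)

def enrichment(upstream, cluster_genes, consensus):
    """upstream: gene -> upstream sequence; cluster_genes: co-expressed genes.

    Returns the ratio of the cluster's share of consensus hits to its share
    of genes; values above 1.0 suggest the element is enriched among
    similarly expressed genes (the intuition behind PEA)."""
    hits_cluster = sum(count_matches(upstream[g], consensus) for g in cluster_genes)
    hits_all = sum(count_matches(s, consensus) for s in upstream.values())
    if hits_all == 0:
        return 0.0
    return (hits_cluster / hits_all) / (len(cluster_genes) / len(upstream))
```

Filtering candidate sites by such an expression-conditioned ratio is what cuts the false positives that plague a naive genome-wide consensus search.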
NASA Technical Reports Server (NTRS)
Cohn, S. E.
1982-01-01
Numerical weather prediction (NWP) is an initial-value problem for a system of nonlinear differential equations, in which initial values are known incompletely and inaccurately. Observational data available at the initial time must therefore be supplemented by data available prior to the initial time, a problem known as meteorological data assimilation. A further complication in NWP is that solutions of the governing equations evolve on two different time scales, a fast one and a slow one, whereas fast-scale motions in the atmosphere are not reliably observed. This leads to the so-called initialization problem: initial values must be constrained to result in a slowly evolving forecast. The theory of estimation of stochastic dynamic systems provides a natural approach to such problems. For linear stochastic dynamic models, the Kalman-Bucy (KB) sequential filter is the optimal data assimilation method; for such models, the optimal combined data assimilation-initialization method is a modified version of the KB filter.
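For a scalar linear model the sequential KB filter described above reduces to a few lines: a forecast step that advances the state and grows its error variance, and an analysis step that blends in each observation with the Kalman gain. The scalar setting and parameter names are illustrative only; operational NWP assimilation works with very high-dimensional states:

```python
def kalman_assimilate(x0, p0, obs, q, r, a=1.0, h=1.0):
    """Scalar Kalman filter for the linear model x_{k+1} = a*x_k + noise.

    x0, p0  initial state estimate and its error variance
    obs     sequence of observations z_k = h*x_k + noise (None = no obs)
    q, r    model-noise and observation-noise variances
    Returns the sequence of analysis states.
    """
    x, p = x0, p0
    states = []
    for z in obs:
        # forecast step: advance the state, inflate the error variance
        x, p = a * x, a * a * p + q
        # analysis step: blend forecast and observation via the Kalman gain
        if z is not None:                      # data may be missing
            k = p * h / (h * h * p + r)
            x = x + k * (z - h * x)
            p = (1.0 - k * h) * p
        states.append(x)
    return states
```

With an accurate observation (small r) the gain approaches 1 and the analysis follows the data; with no observation the filter simply propagates the model forecast, which is exactly the blending behaviour that makes the KB filter the optimal assimilation scheme for linear models.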
Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon
2015-01-01
Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these successful findings have substantially improved our understanding of complex diseases. However, in spite of these successes, the genetic effects found for many complex diseases are very small, which has been a major hurdle in building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and in this report we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.
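The contrast between the two penalties mentioned above can be seen in how each transforms a single coefficient (in the orthonormal-design special case, where both have closed forms). This is an illustrative sketch, not the paper's fitting procedure; the effect sizes are made up.

```python
def soft_threshold(z, lam):
    """LASSO proximal step: shrink toward zero and clip small values to zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_shrink(z, lam):
    """Ridge closed-form shrinkage of an (orthonormal-design) coefficient."""
    return z / (1.0 + lam)

# A weak SNP effect is zeroed out by LASSO but only shrunk by ridge,
# while a strong effect survives both penalties.
weak, strong = 0.3, 2.0
lasso_weak, lasso_strong = soft_threshold(weak, 0.5), soft_threshold(strong, 0.5)
ridge_weak, ridge_strong = ridge_shrink(weak, 0.5), ridge_shrink(strong, 0.5)
```

This is why LASSO performs variable selection (many coefficients exactly zero) while ridge keeps all predictors with reduced magnitudes — the relevant trade-off when effects are small but numerous.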
Porsa, Sina; Lin, Yi-Chung; Pandy, Marcus G
2016-08-01
The aim of this study was to compare the computational performances of two direct methods for solving large-scale, nonlinear, optimal control problems in human movement. Direct shooting and direct collocation were implemented on an 8-segment, 48-muscle model of the body (24 muscles on each side) to compute the optimal control solution for maximum-height jumping. Both algorithms were executed on a freely available musculoskeletal modeling platform called OpenSim. Direct collocation converged to essentially the same optimal solution up to 249 times faster than direct shooting when the same initial guess was assumed (3.4 h of CPU time for direct collocation vs. 35.3 days for direct shooting). The model predictions were in good agreement with the time histories of joint angles, ground reaction forces and muscle activation patterns measured for subjects jumping to their maximum achievable heights. Both methods converged to essentially the same solution when started from the same initial guess, but computation time was sensitive to the initial guess assumed. Direct collocation demonstrates exceptional computational performance and is well suited to performing predictive simulations of movement using large-scale musculoskeletal models.
Robust face alignment under occlusion via regional predictive power estimation.
Heng Yang; Xuming He; Xuhui Jia; Patras, Ioannis
2015-08-01
Face alignment has been well studied in recent years; however, when a face alignment model is applied to facial images with heavy partial occlusion, its performance deteriorates significantly. In this paper, instead of training an occlusion-aware model with visibility annotation, we address this issue via a model adaptation scheme that uses the result of a local regression forest (RF) voting method. In the proposed scheme, the consistency of the votes of the local RF in each of several oversegmented regions is used to determine the reliability of predicting the location of the facial landmarks. The latter is what we call regional predictive power (RPP). Subsequently, we adapt a holistic voting method (cascaded pose regression based on random ferns) by weighting the votes of each fern according to the RPP of the regions used in the fern tests. The proposed method shows superior performance over existing face alignment models on the most challenging data sets (COFW and 300-W). Moreover, it can also estimate with high accuracy (72.4% overlap ratio) which image areas belong to the face or to nonface objects on the heavily occluded images of the COFW data set, without explicit occlusion modeling.
NASA Astrophysics Data System (ADS)
Harudin, N.; Jamaludin, K. R.; Muhtazaruddin, M. Nabil; Ramlie, F.; Muhamad, Wan Zuki Azman Wan
2018-03-01
The T-Method is one of the techniques governed under the Mahalanobis Taguchi System, developed specifically for multivariate data prediction. Prediction using the T-Method is possible even with a very limited sample size. Users of the T-Method must clearly understand the trend of the population data, since the method does not account for the effect of outliers; outliers may cause apparent non-normality, under which classical methods break down. There exist robust parameter estimates that provide satisfactory results both when the data contain outliers and when they are free of them; among these are the robust location and scale estimators of Shamos-Bickel (SB) and Hodges-Lehmann (HL), which can be used in place of the classical mean and standard deviation. Embedding these into the normalization stage of the T-Method could feasibly enhance the accuracy of the T-Method, and doing so allows the robustness of the T-Method itself to be analyzed. However, the results of the higher-sample-size case study show that the T-Method has the lowest average error percentage (3.09%) on data with extreme outliers, while HL and SB have the lowest error percentage (4.67%) for data without extreme outliers, with minimal error differences compared to the T-Method. The trend in prediction error percentages is reversed for the lower-sample-size case study. The results show that with a minimal sample size, where outliers pose low risk, the T-Method performs well, and that with a larger sample size containing extreme outliers, the T-Method likewise shows better prediction than the alternatives. For the case studies conducted in this research, normalization using the T-Method shows satisfactory results, and it is not worthwhile to adapt HL and SB (or the ordinary mean and standard deviation) into it, since they provide only a minimal change in percentage error. Normalization using the T-Method is still considered to carry lower risk with respect to the effect of outliers.
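The robust estimators named above have compact definitions: the Hodges-Lehmann location estimate is the median of pairwise averages, and the Shamos scale estimate is the median of pairwise absolute differences. A minimal sketch (one common variant of each; the data are made up):

```python
import statistics
from itertools import combinations

def hodges_lehmann(data):
    """HL location estimate: median of the Walsh averages
    (all pairwise means, including each point paired with itself)."""
    pairs = [(x + y) / 2.0 for x, y in combinations(data, 2)]
    pairs += list(data)  # self-pairs: (x + x) / 2 == x
    return statistics.median(pairs)

def shamos_scale(data):
    """Shamos scale estimate: median of all pairwise absolute differences."""
    return statistics.median(abs(x - y) for x, y in combinations(data, 2))

# One extreme outlier barely moves the HL estimate,
# while the classical mean of the same data is 26.5.
data = [1, 2, 3, 100]
robust_center = hodges_lehmann(data)  # 2.75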
Alexander, Robert L.; Shafer, Paul; Mann, Nathan; Malarcher, Ann; Zhang, Lei
2015-01-01
Introduction We estimated changes in call volume in the United States in response to increases in advertising doses of the Tips From Former Smokers (Tips) campaign, the first federal national tobacco education campaign, which aired for 12 weeks from March 19 to June 10, 2012. We also measured the effectiveness of ad taglines that promoted calls directly with a quitline number (1-800-QUIT-NOW) and indirectly with a cessation help website (Smokefree.gov). Methods Multivariate regressions estimated the weekly number of calls to 1-800-QUIT-NOW by area code as a function of weekly market-level gross rating points (GRPs) from CDC’s Tips campaign in 2012. The number of quitline calls attributable solely to Tips was predicted. Results For quitline-tagged ads, an additional 100 television GRPs per week was associated with an increase of 89 calls per week in a typical area code in the United States (P < .001). The same unit increase in advertising GRPs for ads tagged with Smokefree.gov was associated with an increase of 29 calls per week in any given area code (P < .001). We estimated that the Tips campaign was responsible for more than 170,000 additional calls to 1-800-QUIT-NOW during the campaign and that it would have generated approximately 140,000 additional calls if all ads were tagged with 1-800-QUIT-NOW. Conclusion For campaign planners, these results make it possible to estimate 1) the likely impact of tobacco prevention media buys and 2) the additional quitline capacity needed at the national level should future campaigns of similar scale use 1-800-QUIT-NOW taglines exclusively. PMID:26542143
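The dose-response relationship estimated above can be illustrated with a simple one-predictor least-squares fit. The paper used multivariate regressions; this sketch is a deliberately simplified univariate version, and the weekly (GRPs, calls) pairs are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Simple least-squares fit y ≈ a + b*x (pure Python)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx        # calls gained per additional GRP
    return my - b * mx, b

# Hypothetical weekly (GRPs, quitline calls) for one area code
grps  = [0, 100, 200, 300, 400]
calls = [20, 110, 195, 290, 385]
a, b = ols_slope_intercept(grps, calls)
extra_calls_per_100_grps = 100 * b
```

Scaling the slope to a 100-GRP increment mirrors how the abstract reports its effect size (calls per week per 100 GRPs).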
Application of Large-Scale Database-Based Online Modeling to Plant State Long-Term Estimation
NASA Astrophysics Data System (ADS)
Ogawa, Masatoshi; Ogai, Harutoshi
Recently, attention has been drawn to a local modeling technique based on a new idea called “Just-In-Time (JIT) modeling”. To apply JIT modeling online to a large database, “Large-scale database-based Online Modeling (LOM)” has been proposed. LOM is a technique that makes the retrieval of neighboring data more efficient by using both “stepwise selection” and quantization. In order to predict the long-term state of the plant without using future data of manipulated variables, an Extended Sequential Prediction method of LOM (ESP-LOM) has been proposed. In this paper, the LOM and the ESP-LOM are introduced.
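The core of the JIT idea is to defer modeling until a query arrives, then fit a local model on the stored samples nearest to it. A minimal sketch (local model here is just the neighbors' mean; the database and the k=3 choice are hypothetical, and LOM's stepwise selection and quantization are omitted):

```python
def jit_predict(query, database, k=3):
    """Just-In-Time style prediction: retrieve the k stored samples
    nearest to the query and evaluate a local model (their mean)."""
    neighbors = sorted(database, key=lambda s: abs(s[0] - query))[:k]
    return sum(y for _, y in neighbors) / k

# Hypothetical stored (input, output) history; true relation y = x**2
db = [(x / 10.0, (x / 10.0) ** 2) for x in range(-20, 21)]
pred = jit_predict(1.5, db)
```

Because nothing is fitted until query time, the "model" automatically tracks whatever region of the database the plant is currently operating in — the efficiency problem LOM addresses is making that neighbor retrieval fast at scale.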
Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B.; Ben-Hur, Asa; Boucher, Christina
2016-01-01
Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805
Predicting Negative Emotions Based on Mobile Phone Usage Patterns: An Exploratory Study
Yang, Pei-Ching; Chang, Chia-Chi; Chiang, Jung-Hsien; Chen, Ying-Yeh
2016-01-01
Background Prompt recognition and intervention of negative emotions is crucial for patients with depression. Mobile phones and mobile apps are suitable technologies that can be used to recognize negative emotions and intervene if necessary. Objective Mobile phone usage patterns can be associated with concurrent emotional states. The objective of this study is to adapt machine-learning methods to analyze such patterns for the prediction of negative emotion. Methods We developed an Android-based app to capture emotional states and mobile phone usage patterns, which included call logs and app usage. Visual analog scales (VASs) were used to report negative emotions in the dimensions of depression, anxiety, and stress. In the system-training phase, participants were requested to tag their emotions for 14 consecutive days. Five feature-selection methods were used to determine individual usage patterns, and four machine-learning methods were tested. Finally, rank product scoring was used to select the best combination to construct the prediction model. In the system-evaluation phase, participants were then requested to verify the predicted negative emotions for at least 5 days. Results Out of 40 enrolled healthy participants, we analyzed data from the 28 participants with sufficient emotion tags, including 30% (9/28) women, with a mean (SD) age of 29.2 (5.1) years. The combination of 2-hour time slots, greedy forward selection, and the Naïve Bayes method was chosen for the prediction model. We further validated the personalized models in 18 participants who performed at least 5 days of model evaluation. Overall, the predictive accuracy for negative emotions was 86.17%. Conclusion We developed a system capable of predicting negative emotions based on mobile phone usage patterns.
This system has potential for ecological momentary intervention (EMI) for depressive disorders by automatically recognizing negative emotions and providing people with preventive treatments before they escalate to clinical depression. PMID:27511748
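The Naïve Bayes classifier selected above scores each class by multiplying a prior with per-feature likelihoods, assuming the features are independent given the class. A minimal Gaussian version (the features, class statistics, and priors below are hypothetical, not the study's fitted model):

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_predict(sample, stats, priors):
    """Score each class as prior * product of per-feature Gaussian
    likelihoods, and return the highest-scoring class."""
    scores = {}
    for label, feature_stats in stats.items():
        score = priors[label]
        for value, (mean, var) in zip(sample, feature_stats):
            score *= gaussian_pdf(value, mean, var)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical per-class (mean, variance) for two usage features,
# e.g. daily call count and night-time screen minutes
stats = {
    "negative": [(2.0, 1.0), (120.0, 400.0)],
    "neutral":  [(8.0, 4.0), (40.0, 100.0)],
}
priors = {"negative": 0.3, "neutral": 0.7}
label = naive_bayes_predict([3.0, 110.0], stats, priors)
```

A usage pattern of few calls and heavy night-time screen use lands in the "negative" class here despite the lower prior, because both likelihood terms favor it.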
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ranganathan, V; Kumar, P; Bzdusek, K
Purpose: We propose a novel data-driven method to predict the achievability of clinical objectives upfront before invoking the IMRT optimization. Methods: A new metric called “Geometric Complexity (GC)” is used to estimate the achievability of clinical objectives. Here, GC is the measure of the number of “unmodulated” beamlets or rays that intersect the Region-of-interest (ROI) and the target volume. We first compute the geometric complexity ratio (GCratio) between the GC of a ROI (say, parotid) in a reference plan and the GC of the same ROI in a given plan. The GCratio of a ROI indicates the relative geometric complexity of the ROI as compared to the same ROI in the reference plan. Hence GCratio can be used to predict if a defined clinical objective associated with the ROI can be met by the optimizer for a given case. Basically a higher GCratio indicates a lesser likelihood for the optimizer to achieve the clinical objective defined for a given ROI. Similarly, a lower GCratio indicates a higher likelihood for the optimizer to achieve the clinical objective defined for the given ROI. We have evaluated the proposed method on four Head and Neck cases using Pinnacle3 (version 9.10.0) Treatment Planning System (TPS). Results: Out of the total of 28 clinical objectives from four head and neck cases included in the study, 25 were in agreement with the prediction, which implies an agreement of about 85% between predicted and obtained results. The Pearson correlation test shows a positive correlation between predicted and obtained results (Correlation = 0.82, r2 = 0.64, p < 0.005). Conclusion: The study demonstrates the feasibility of the proposed method in head and neck cases for predicting the achievability of clinical objectives with reasonable accuracy.
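The ratio-based decision rule described above reduces to a few lines. Note the orientation of the ratio and the threshold of 1.0 are assumptions for illustration; the abstract only states that a higher relative complexity means a lower likelihood of meeting the objective.

```python
def gc_ratio(gc_reference, gc_current):
    """Relative geometric complexity of an ROI in the given plan
    vs. the same ROI in a reference plan (orientation assumed here)."""
    return gc_current / gc_reference

def objective_achievable(gc_reference, gc_current, threshold=1.0):
    """Hypothetical decision rule: flag the clinical objective as likely
    achievable when the ROI is no more complex than in the reference."""
    return gc_ratio(gc_reference, gc_current) <= threshold

# Hypothetical beamlet counts intersecting a parotid ROI and the target
likely = objective_achievable(100.0, 80.0)      # less complex than reference
unlikely = objective_achievable(100.0, 130.0)   # more complex than reference
```

The appeal of the metric is that both counts come from plan geometry alone, so the prediction is available before any optimization is run.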
NASA Astrophysics Data System (ADS)
Fikri Zanil, Muhamad; Nur Wahidah Nik Hashim, Nik; Azam, Huda
2017-11-01
Psychiatrists currently rely on questionnaires and interviews for psychological assessment. These conservative methods often miss true positives, which can lead to death, especially in cases where a patient experiencing suicidal predisposition is diagnosed only with major depressive disorder (MDD). With modern technology, an assessment tool might aid psychiatrists toward a more accurate diagnosis and thus help reduce casualties. This project explores the relationship between speech features of spoken audio (reading) in Bahasa Malaysia and Beck Depression Inventory (BDI-II) scores. The speech features used in this project were power spectral density (PSD), Mel-frequency cepstral coefficients (MFCC), transition parameters, formants, and pitch. According to the analysis, the optimum combination of speech features to predict BDI-II scores includes PSD, MFCC, and transition parameters. A linear regression approach with sequential forward/backward selection was used to predict BDI-II scores from reading speech. The results showed a mean absolute error (MAE) of 0.4096 for female reading speech. For male speech, BDI-II scores were successfully predicted to within a 1-point difference 100% of the time, with an MAE of 0.098437. A prediction system called the Depression Severity Evaluator (DSE) was developed. The DSE correctly predicted the score of one out of five subjects; although this exact-prediction rate was low, the system predicted each person's score within a maximum difference of 4.93 points. This demonstrates that the scores are not random numbers.
A Study of Water Wave Wakes of Washington State Ferries
NASA Astrophysics Data System (ADS)
Perfect, Bradley; Riley, James; Thomson, Jim; Fay, Endicott
2015-11-01
Washington State Ferries (WSF) operates a ferry route that travels through a 600m-wide channel called Rich Passage. Concerns of shoreline erosion in Rich Passage have prompted this study of the generation and propagation of surface wave wakes caused by WSF vessels. The problem was addressed in three ways: analytically, using an extension of the Kelvin wake model by Darmon et al. (J. Fluid Mech., 738, 2014); computationally, employing a RANS Navier-Stokes model in the CFD code OpenFOAM which uses the Volume of Fluid method to treat the free surface; and with field data taken in Sept-Nov, 2014, using a suite of surface wave measuring buoys. This study represents one of the first times that model predictions of ferry boat-generated wakes can be tested against measurements in open waters. The results of the models and the field data are evaluated using direct comparison of predicted and measured surface wave height as well as other metrics. Furthermore, the model predictions and field measurements suggest differences in wake amplitudes for different class vessels. Finally, the relative strengths and weaknesses of each prediction method as well as of the field measurements will be discussed. Washington State Department of Transportation.
Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; ...
2015-06-04
Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
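The Bag of Bonds representation can be sketched compactly: compute a Coulomb-like term for every atom pair, group the terms into per-bond-type "bags", sort each bag, and pad to a fixed length so every molecule maps to the same-sized vector. This is an illustrative simplification under assumed conventions (units and padding scheme are not from the paper), with a toy water-like geometry.

```python
import math
from collections import defaultdict

def bag_of_bonds(atoms, pad=3):
    """Group pairwise Coulomb-like terms Zi*Zj/|ri-rj| into per-bond-type
    'bags', sort each bag descending, and zero-pad to a fixed length."""
    bags = defaultdict(list)
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (zi, ri), (zj, rj) = atoms[i], atoms[j]
            d = math.dist(ri, rj)
            key = tuple(sorted((zi, zj)))  # bond type, e.g. (1, 8) = H-O
            bags[key].append(zi * zj / d)
    return {k: sorted(v, reverse=True) + [0.0] * (pad - len(v))
            for k, v in bags.items()}

# Toy water-like geometry: (nuclear charge, position in angstroms)
water = [(8, (0.0, 0.0, 0.0)), (1, (0.96, 0.0, 0.0)), (1, (-0.24, 0.93, 0.0))]
bags = bag_of_bonds(water)
```

Sorting within each bag is what makes the vector invariant to atom ordering, and the fixed padding is what lets bags from different molecules be compared entry by entry by a kernel machine.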
2015-01-01
Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies. PMID:26113956
NASA Astrophysics Data System (ADS)
Keating, Elizabeth H.; Doherty, John; Vrugt, Jasper A.; Kang, Qinjun
2010-10-01
Highly parameterized and CPU-intensive groundwater models are increasingly being used to understand and predict flow and transport through aquifers. Despite their frequent use, these models pose significant challenges for parameter estimation and predictive uncertainty analysis algorithms, particularly global methods which usually require very large numbers of forward runs. Here we present a general methodology for parameter estimation and uncertainty analysis that can be utilized in these situations. Our proposed method includes extraction of a surrogate model that mimics key characteristics of a full process model, followed by testing and implementation of a pragmatic uncertainty analysis technique, called null-space Monte Carlo (NSMC), that merges the strengths of gradient-based search and parameter dimensionality reduction. As part of the surrogate model analysis, the results of NSMC are compared with a formal Bayesian approach using the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm. Such a comparison has never been accomplished before, especially in the context of high parameter dimensionality. Despite the highly nonlinear nature of the inverse problem, the existence of multiple local minima, and the relatively large parameter dimensionality, both methods performed well and results compare favorably with each other. Experiences gained from the surrogate model analysis are then transferred to calibrate the full highly parameterized and CPU intensive groundwater model and to explore predictive uncertainty of predictions made by that model. The methodology presented here is generally applicable to any highly parameterized and CPU-intensive environmental model, where efficient methods such as NSMC provide the only practical means for conducting predictive uncertainty analysis.
Conditional Entropy-Constrained Residual VQ with Application to Image Coding
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Chung, Wilson C.; Smith, Mark J. T.
1996-01-01
This paper introduces an extension of entropy-constrained residual vector quantization (VQ) where intervector dependencies are exploited. The method, which we call conditional entropy-constrained residual VQ, employs a high-order entropy conditioning strategy that captures local information in the neighboring vectors. When applied to coding images, the proposed method is shown to achieve better rate-distortion performance than that of entropy-constrained residual vector quantization with less computational complexity and lower memory requirements. Moreover, it can be designed to support progressive transmission in a natural way. It is also shown to outperform some of the best predictive and finite-state VQ techniques reported in the literature. This is due partly to the joint optimization between the residual vector quantizer and a high-order conditional entropy coder as well as the efficiency of the multistage residual VQ structure and the dynamic nature of the prediction.
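The conditioning idea above rests on a standard quantity: the conditional entropy H(Y|X), the number of bits needed for a symbol once its context is known. A minimal plug-in estimate from (context, symbol) observations (the observation list is made up; the paper's high-order conditioning over neighboring vectors is far richer):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Plug-in estimate of H(Y|X) in bits from (x, y) observations:
    residual uncertainty about a symbol given its context."""
    n = len(pairs)
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    return -sum(c / n * math.log2((c / n) / (marginal[x] / n))
                for (x, y), c in joint.items())

# The previous symbol strongly predicts the next one here, so the
# conditional entropy falls well below the unconditional 1 bit.
obs = [("a", "a")] * 9 + [("a", "b")] + [("b", "b")] * 9 + [("b", "a")]
h = conditional_entropy(obs)
```

An entropy coder driven by the conditional distribution can approach h bits per symbol instead of the unconditional entropy, which is exactly the rate saving the conditioning strategy exploits.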
Feng, Biao; Levitas, Valery I
2017-04-21
The main principles of producing a region near the center of a sample, compressed in a diamond anvil cell (DAC), with a very high pressure gradient and, consequently, with high pressure are predicted theoretically. The revealed phenomenon of generating an extremely high pressure gradient is called the pressure self-focusing effect. Initial analytical predictions utilized a generalization of a simplified equilibrium equation. The results were then refined using our recent advanced model for elastoplastic material under high pressures in finite element method (FEM) simulations. The main points in producing the pressure self-focusing effect are to use beveled anvils and to reach a very thin sample thickness at the center. We find that the superposition of torsion in a rotational DAC (RDAC) offers drastic enhancement of the pressure self-focusing effect and allows one to reach the same pressure under a much lower force and deformation of anvils.
Adaptive LINE-P: An Adaptive Linear Energy Prediction Model for Wireless Sensor Network Nodes.
Ahmed, Faisal; Tamberg, Gert; Le Moullec, Yannick; Annus, Paul
2018-04-05
In the context of wireless sensor networks, energy prediction models are increasingly useful tools that can facilitate the power management of the wireless sensor network (WSN) nodes. However, most of the existing models suffer from the so-called fixed weighting parameter, which limits their applicability when it comes to, e.g., solar energy harvesters with varying characteristics. Thus, in this article we propose the Adaptive LINE-P (all cases) model that calculates adaptive weighting parameters based on the stored energy profiles. Furthermore, we also present a profile compression method to reduce the memory requirements. To determine the performance of our proposed model, we have used real data for the solar and wind energy profiles. The simulation results show that our model achieves 90-94% accuracy and that the compression method reduces memory overhead by 50% as compared to state-of-the-art models.
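The adaptive-weighting idea can be sketched as a blend between a stored harvest profile and the most recent observation, with the weight derived from how well the profile has matched reality lately. This is an illustrative stand-in, not the published LINE-P formulas; the profile values and the weight heuristic are assumptions.

```python
def adapt_weight(profile, observed):
    """Derive the weighting parameter from how closely the stored
    profile tracked recent observations (1.0 = trust profile fully)."""
    err = sum(abs(p - o) for p, o in zip(profile, observed)) / len(profile)
    scale = max(max(observed), 1e-9)
    return max(0.0, 1.0 - err / scale)

def predict_next(profile_next, last_observed, alpha):
    """Weighted blend of the stored-profile value for the next slot
    and the most recent observation."""
    return alpha * profile_next + (1.0 - alpha) * last_observed

# Hypothetical solar-harvest profile (mWh per slot) vs. a cloudy day
profile  = [50.0, 80.0, 100.0, 80.0]
observed = [40.0, 60.0, 80.0, 60.0]
alpha = adapt_weight(profile, observed)
pred = predict_next(50.0, observed[-1], alpha)
```

When the profile tracks the day well, alpha approaches 1 and the prediction follows the profile; on atypical days, the weight shifts toward recent observations, which is the flexibility a fixed weighting parameter lacks.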
Adaptive LINE-P: An Adaptive Linear Energy Prediction Model for Wireless Sensor Network Nodes
Ahmed, Faisal
2018-01-01
In the context of wireless sensor networks, energy prediction models are increasingly useful tools that can facilitate the power management of the wireless sensor network (WSN) nodes. However, most of the existing models suffer from the so-called fixed weighting parameter, which limits their applicability when it comes to, e.g., solar energy harvesters with varying characteristics. Thus, in this article we propose the Adaptive LINE-P (all cases) model that calculates adaptive weighting parameters based on the stored energy profiles. Furthermore, we also present a profile compression method to reduce the memory requirements. To determine the performance of our proposed model, we have used real data for the solar and wind energy profiles. The simulation results show that our model achieves 90–94% accuracy and that the compression method reduces memory overhead by 50% as compared to state-of-the-art models. PMID:29621169
Actividad solar del ciclo 23. Predicción del máximo y fase decreciente utilizando redes neuronales
NASA Astrophysics Data System (ADS)
Parodi, M. A.; Ceccatto, H. A.; Piacentini, R. D.; García, P. J.
Different methods have been proposed to predict the maximum amplitude of solar cycles, both because of the intrinsic importance of this event and because of its relation to solar storms and their possible effects on satellites, communication systems, etc. In this work, a neural network prediction of solar activity is presented, measured through the sunspot number (SSN). The 16-unit neural network, with a 12:3:1 architecture, was a feed-forward network trained with the so-called "back-propagation rule". The annual mean SSN data for the 1700-1975 and 1987-1998 periods were used as the training set. Solar cycle 21 (1976-1986) was taken as the cross-validation data set. After training the network we obtained a prediction of the maximum annual mean for the current solar cycle 23, SSNmax = 135 ± 17 at the year 2000, which is 13% smaller than the International Consensus Committee's mean maximum prediction obtained through "precursor techniques". On the other hand, our prediction is only about 4% smaller than the Consensus's neural network mean prediction. A "multiple step" prediction technique was also performed, and SSN annual mean predicted values for the near-maximum period (from the present year, 1999, to beyond the maximum) and the declining phase of solar cycle 23 are presented in this work. The sensitivity of the predictions is also tested: we changed the interval width and compared our results with those of a previous neural network prediction and with those of other authors using different methods.
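The 12:3:1 architecture above maps a window of 12 past annual means through 3 hidden units to one predicted value. A forward pass of such a network is a few lines; the weights below are hypothetical and untrained (the activation choice is also an assumption), so this shows only the shape of the computation, not the paper's trained predictor.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass of a 12:3:1 feed-forward network with tanh
    hidden units and a linear output."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# 12 inputs (e.g. 12 scaled past annual mean SSNs) -> 3 hidden -> 1 output
x = [0.1 * i for i in range(12)]
w_hidden = [[0.01 * (i + j) for j in range(12)] for i in range(3)]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.5, -0.25, 0.25]
b_out = 0.1
y = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Back-propagation then adjusts all 16 units' weights by gradient descent on the prediction error over the 1700-1975 and 1987-1998 training windows; the "multiple step" technique feeds each prediction back in as the newest input to forecast further ahead.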
Cross-sectional analysis of patient phone calls to an inflammatory bowel disease clinic.
Corral, Juan E; Yarur, Andres J; Diaz, Liege; Simmons, Okeefe L; Sussman, Daniel A
2015-01-01
Patients with inflammatory bowel disease (IBD) require close follow up and frequently utilize healthcare services. We aimed to identify the main reasons that prompted patient calls to gastroenterology providers and further characterize the "frequent callers". This retrospective cross-sectional study included all phone calls registered in medical records of IBD patients during 2012. Predictive variables included demographics, psychiatric history, IBD phenotype, disease complications and medical therapies. Primary outcome was the reason for call (symptoms, medication refill, procedures and appointment issues). Secondary outcome was the frequency of changes in management prompted by the call. 209 patients participated in 526 calls. The mean number of calls per patient was 2.5 (range 0-27); 49 (23.4%) patients met the criterion of "frequent caller". Frequent callers made or received 75.9% of all calls. Crohn's disease, anxiety, extra-intestinal manifestations and high sedimentation rate were significantly associated with higher call volume. 85.7% of frequent callers had at least one call that prompted a therapeutic intervention, compared to 18.9% of non-frequent callers (P<0.001). The most common interventions were ordering laboratory or imaging studies (15.4%), dose adjustments (12.1%), changes in medication class (8.4%), and expediting clinic visits (8.4%). Most phone calls originated from a minority of patients. Repeated calling by the same patient and new onset of gastrointestinal (GI) and non-GI symptoms were important factors predicting the order of diagnostic modalities or therapeutic changes in care. Triaging calls to IBD healthcare providers for patients more likely to require a change in management may improve healthcare delivery.
eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes.
Chang, Zheng; Wang, Zhenjia; Ashby, Cody; Zhou, Chuan; Li, Guojun; Zhang, Shuzhong; Huang, Xiuzhen
2014-01-01
Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise for providing specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed several effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements over MBI in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than that of another widely used matrix factorization method, nonnegative matrix factorization (NMF), and of hierarchical clustering, which is often the first choice of clinical analysts in practice.
eMBI: Boosting Gene Expression-based Clustering for Cancer Subtypes
Chang, Zheng; Wang, Zhenjia; Ashby, Cody; Zhou, Chuan; Li, Guojun; Zhang, Shuzhong; Huang, Xiuzhen
2014-01-01
Identifying clinically relevant subtypes of a cancer using gene expression data is a challenging and important problem in medicine, and is a necessary premise for providing specific and efficient treatments for patients of different subtypes. Matrix factorization provides a solution by finding checkerboard patterns in the matrices of gene expression data. In the context of gene expression profiles of cancer patients, these checkerboard patterns correspond to genes that are up- or down-regulated in patients with particular cancer subtypes. Recently, a new matrix factorization framework for biclustering called Maximum Block Improvement (MBI) was proposed; however, it still suffers from several problems when applied to cancer gene expression data analysis. In this study, we developed several effective strategies to improve MBI and designed a new program called enhanced MBI (eMBI), which is more effective and efficient at identifying cancer subtypes. Our tests on several gene expression profiling datasets of cancer patients consistently indicate that eMBI achieves significant improvements over MBI in terms of cancer subtype prediction accuracy, robustness, and running time. In addition, the performance of eMBI is much better than that of another widely used matrix factorization method, nonnegative matrix factorization (NMF), and of hierarchical clustering, which is often the first choice of clinical analysts in practice. PMID:25374455
Culture Representation in Human Reliability Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
David Gertman; Julie Marble; Steven Novack
Understanding human-system response is critical to being able to plan and predict mission success in the modern battlespace. Commonly, human reliability analysis has been used to predict failures of human performance in complex, critical systems. However, most human reliability methods fail to take culture into account. This paper takes an easily understood, state-of-the-art human reliability analysis method and extends it to account for the influence of culture, including acceptance of new technology, upon performance. The cultural parameters used to modify the human reliability analysis were determined from two standard industry approaches to cultural assessment: Hofstede's (1991) cultural factors and Davis' (1989) technology acceptance model (TAM). The result is called the Culture Adjustment Method (CAM). An example is presented that (1) reviews human reliability assessment with and without cultural attributes for a Supervisory Control and Data Acquisition (SCADA) system attack, (2) demonstrates how country-specific information can be used to increase the realism of HRA modeling, and (3) discusses the differences in human error probability estimates arising from cultural differences.
Mohr, Johannes A; Jain, Brijnesh J; Obermayer, Klaus
2008-09-01
Quantitative structure activity relationship (QSAR) analysis is traditionally based on extracting a set of molecular descriptors and using them to build a predictive model. In this work, we propose a QSAR approach based directly on the similarity between the 3D structures of a set of molecules measured by a so-called molecule kernel, which is independent of the spatial prealignment of the compounds. Predictors can be built using the molecule kernel in conjunction with the potential support vector machine (P-SVM), a recently proposed machine learning method for dyadic data. The resulting models make direct use of the structural similarities between the compounds in the test set and a subset of the training set and do not require an explicit descriptor construction. We evaluated the predictive performance of the proposed method on one classification and four regression QSAR datasets and compared its results to the results reported in the literature for several state-of-the-art descriptor-based and 3D QSAR approaches. In this comparison, the proposed molecule kernel method performed better than the other QSAR methods.
An empirical analysis of the corporate call decision
NASA Astrophysics Data System (ADS)
Carlson, Murray Dean
1998-12-01
In this thesis we provide insights into the behavior of financial managers of utility companies by studying their decisions to redeem callable preferred shares. In particular, we investigate whether or not an option pricing based model of the call decision, with managers who maximize shareholder value, does a better job of explaining callable preferred share prices and call decisions than do other models of the decision. In order to perform these tests, we extend an empirical technique introduced by Rust (1987) to include the use of information from preferred share prices in addition to the call decisions. The model we develop to value the option embedded in a callable preferred share differs from standard models in two ways. First, as suggested in Kraus (1983), we explicitly account for transaction costs associated with a redemption. Second, we account for state variables that are observed by the decision makers but not by the preferred shareholders. We interpret these unobservable state variables as the benefits and costs associated with a change in capital structure that can accompany a call decision. When we add this variable, our empirical model changes from one which predicts exactly when a share should be called to one which predicts the probability of a call as the function of the observable state. These two modifications of the standard model result in predictions of calls, and therefore of callable preferred share prices, that are consistent with several previously unexplained features of the data; we show that the predictive power of the model is improved in a statistical sense by adding these features to the model. The pricing and call probability functions from our model do a good job of describing call decisions and preferred share prices for several utilities. Using data from shares of the Pacific Gas and Electric Co. (PGE) we obtain reasonable estimates for the transaction costs associated with a call. 
Using a formal empirical test, we are able to conclude that the managers of the Pacific Gas and Electric Company clearly take into account the value of the option to delay the call when making their call decisions. Overall, the model seems to be robust to tests of its specification and does a better job of describing the data than do simpler models of the decision making process. Limitations in the data do not allow us to perform the same tests in a larger cross-section of utility companies. However, we are able to estimate transaction cost parameters for many firms and these do not seem to vary significantly from those of PGE. This evidence does not cause us to reject our hypothesis that managerial behavior is consistent with a model in which managers maximize shareholder value.
Intra- and Inter-Fractional Variation Prediction of Lung Tumors Using Fuzzy Deep Learning
Park, Seonyeong; Lee, Suk Jin; Weiss, Elisabeth
2016-01-01
Tumor movements should be accurately predicted to improve delivery accuracy and reduce unnecessary radiation exposure to healthy tissue during radiotherapy. The tumor movements pertaining to respiration are divided into intra-fractional variation occurring within a single treatment session and inter-fractional variation arising between different sessions. Most studies of patients' respiration movements deal with intra-fractional variation. Previous studies of inter-fractional variation are rarely formalized mathematically and cannot predict movements well because the variation is inconstant. Moreover, the computation time of the prediction should be reduced. To overcome these limitations, we propose a new predictor for intra- and inter-fractional data variation, called intra- and inter-fraction fuzzy deep learning (IIFDL), where FDL, equipped with breathing clustering, predicts the movement accurately and decreases the computation time. Experimental results showed that IIFDL improved the root-mean-square error (RMSE) by 29.98% and the prediction overshoot by 70.93% compared with existing methods. The results also showed that IIFDL enhanced the average RMSE and overshoot by 59.73% and 83.27%, respectively. In addition, the average computation time of IIFDL was 1.54 ms for both intra- and inter-fractional variation, much smaller than that of existing methods. Therefore, the proposed IIFDL might achieve real-time estimation as well as better tracking techniques in radiotherapy. PMID:27170914
Manikandan, Narayanan; Subha, Srinivasan
2016-01-01
The software development life cycle has been characterized by destructive disconnects between activities like planning, analysis, design, and programming. Software developed around prediction-based results is a particularly big challenge for designers. Time series data forecasting, as in currency exchange, stock prices, and weather reports, is an area where extensive research has been going on for the last three decades. Initially, the problems of financial analysis and prediction were solved by statistical models and methods. Over the last two decades, a large number of Artificial Neural Network based learning models have been proposed to solve the problems of financial data and obtain accurate results in predicting future trends and prices. This paper addresses some architectural design issues for performance improvement by combining the strengths of multivariate econometric time series models and Artificial Neural Networks. It provides an adaptive approach for predicting exchange rates, which can be called a hybrid methodology for predicting exchange rates. This framework is tested for the accuracy and performance of the parallel algorithms used. PMID:26881271
Roles for text mining in protein function prediction.
Verspoor, Karin M
2014-01-01
The Human Genome Project has provided science with a hugely valuable resource: the blueprints for life; the specification of all of the genes that make up a human. While the genes have all been identified and deciphered, it is proteins that are the workhorses of the human body: they are essential to virtually all cell functions and are the primary mechanism through which biological function is carried out. Hence in order to fully understand what happens at a molecular level in biological organisms, and eventually to enable development of treatments for diseases where some aspect of a biological system goes awry, we must understand the functions of proteins. However, experimental characterization of protein function cannot scale to the vast amount of DNA sequence data now available. Computational protein function prediction has therefore emerged as a problem at the forefront of modern biology (Radivojac et al., Nat Methods 10(13):221-227, 2013).Within the varied approaches to computational protein function prediction that have been explored, there are several that make use of biomedical literature mining. These methods take advantage of information in the published literature to associate specific proteins with specific protein functions. In this chapter, we introduce two main strategies for doing this: association of function terms, represented as Gene Ontology terms (Ashburner et al., Nat Genet 25(1):25-29, 2000), to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.
First-principles prediction of the softening of the silicon shock Hugoniot curve
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, S. X.; Militzer, B.; Collins, L. A.
Here, shock compression of silicon (Si) under extremely high pressures (>100 Mbar) was investigated by using two first-principles methods, orbital-free molecular dynamics (OFMD) and path integral Monte Carlo (PIMC). While pressures from the two methods agree very well, PIMC predicts a second compression maximum, caused by 1s electron ionization, that is absent in OFMD calculations since Thomas–Fermi-based theories lack inner-shell structure. The Kohn–Sham density functional theory is used to calculate the equation of state (EOS) of warm dense silicon for low-pressure loadings (P < 100 Mbar). Combining these first-principles EOS results, the principal Hugoniot curve of silicon for pressures varying from 0.80 Mbar to above ~10 Gbar was derived. We find that silicon is ~20% or more softer than predicted by EOS models based on the chemical picture of matter. Existing experimental data (P ≈ 1–2 Mbar) seem to indicate this softening behavior of Si, which calls for future strong-shock experiments (P > 10 Mbar) to benchmark our results.
First-principles prediction of the softening of the silicon shock Hugoniot curve
Hu, S. X.; Militzer, B.; Collins, L. A.; ...
2016-09-15
Here, shock compression of silicon (Si) under extremely high pressures (>100 Mbar) was investigated by using two first-principles methods, orbital-free molecular dynamics (OFMD) and path integral Monte Carlo (PIMC). While pressures from the two methods agree very well, PIMC predicts a second compression maximum, caused by 1s electron ionization, that is absent in OFMD calculations since Thomas–Fermi-based theories lack inner-shell structure. The Kohn–Sham density functional theory is used to calculate the equation of state (EOS) of warm dense silicon for low-pressure loadings (P < 100 Mbar). Combining these first-principles EOS results, the principal Hugoniot curve of silicon for pressures varying from 0.80 Mbar to above ~10 Gbar was derived. We find that silicon is ~20% or more softer than predicted by EOS models based on the chemical picture of matter. Existing experimental data (P ≈ 1–2 Mbar) seem to indicate this softening behavior of Si, which calls for future strong-shock experiments (P > 10 Mbar) to benchmark our results.
Li, Guanghui; Luo, Jiawei; Xiao, Qiu; Liang, Cheng; Ding, Pingjian
2018-05-12
Interactions between microRNAs (miRNAs) and diseases can yield important information for uncovering novel prognostic markers. Since experimental determination of disease-miRNA associations is time-consuming and costly, attention has been given to designing efficient and robust computational techniques for identifying undiscovered interactions. In this study, we present a label propagation model with linear neighborhood similarity, called LPLNS, to predict unobserved miRNA-disease associations. Additionally, a preprocessing step is performed to derive new interaction likelihood profiles that will contribute to the prediction since new miRNAs and diseases lack known associations. Our results demonstrate that the LPLNS model based on the known disease-miRNA associations could achieve impressive performance with an AUC of 0.9034. Furthermore, we observed that the LPLNS model based on new interaction likelihood profiles could improve the performance to an AUC of 0.9127. This was better than other comparable methods. In addition, case studies also demonstrated our method's outstanding performance for inferring undiscovered interactions between miRNAs and diseases, especially for novel diseases.
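Label propagation on a similarity graph, the general family to which LPLNS belongs, has a standard closed form: F = (1 − α)(I − αS)⁻¹Y for a normalized similarity matrix S and known-association matrix Y. The sketch below uses symmetric degree normalization and that fixed point; the LPLNS linear-neighborhood similarity itself is not reproduced here, so the normalization, α, and all names are assumptions.

```python
import numpy as np

def label_propagation(S, Y, alpha=0.5):
    """Closed-form label propagation on a similarity graph:
    F = (1 - alpha) * (I - alpha * S_norm)^(-1) @ Y.
    S is a symmetric, non-negative similarity matrix; Y holds known
    associations (e.g. miRNAs x diseases). A generic sketch, not the
    LPLNS linear-neighborhood weighting itself."""
    d = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S_norm = D_inv_sqrt @ S @ D_inv_sqrt  # symmetric normalization
    n = S.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S_norm, Y)
```

Scores spread from labeled nodes along strong edges, so an unlabeled node that is highly similar to a labeled one receives a higher predicted association score than a weakly connected node.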
Technical note: Combining quantile forecasts and predictive distributions of streamflows
NASA Astrophysics Data System (ADS)
Bogner, Konrad; Liechti, Katharina; Zappa, Massimiliano
2017-11-01
The enhanced availability of many different hydro-meteorological modelling and forecasting systems raises the issue of how to optimally combine this wealth of information. The use of deterministic and probabilistic forecasts with sometimes widely divergent predicted future streamflow values makes it even more complicated for decision makers to sift out the relevant information. In this study, multiple streamflow forecasts are aggregated based on several different predictive distributions and quantile forecasts. For this combination, the Bayesian model averaging (BMA) approach, non-homogeneous Gaussian regression (NGR), also known as the ensemble model output statistics (EMOS) technique, and a novel method called Beta-transformed linear pooling (BLP) are applied. With the help of the quantile score (QS) and the continuous ranked probability score (CRPS), the combination results for the Sihl River in Switzerland, with about 5 years of forecast data, are compared and the differences between the raw and optimally combined forecasts are highlighted. The results demonstrate the importance of applying proper forecast combination methods for decision makers in the field of flood and water resource management.
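The quantile score (QS) used above is the pinball loss, which penalizes under- and over-prediction asymmetrically according to the quantile level; averaged over many quantile levels it approximates the CRPS, which is what lets quantile forecasts and full predictive distributions be compared on a common scale. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def quantile_score(y_true, q_pred, tau):
    """Pinball loss for a tau-quantile forecast; lower is better.
    QS_tau(q, y) = tau * (y - q)        if y >= q
                 = (1 - tau) * (q - y)  if y <  q
    averaged over all forecast cases."""
    y_true = np.asarray(y_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    diff = y_true - q_pred
    return np.mean(np.where(diff >= 0, tau * diff, (tau - 1) * diff))
```

For a high quantile such as tau = 0.9, under-predicting an observed flood peak is penalized far more heavily than over-predicting it, which is exactly the asymmetry a flood-warning decision maker wants in the score.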
Bayesian Estimation of Thermonuclear Reaction Rates for Deuterium+Deuterium Reactions
NASA Astrophysics Data System (ADS)
Gómez Iñesta, Á.; Iliadis, C.; Coc, A.
2017-11-01
The study of d+d reactions is of major interest since their reaction rates affect the predicted abundances of D, 3He, and 7Li. In particular, recent measurements of primordial D/H ratios call for reduced uncertainties in the theoretical abundances predicted by Big Bang nucleosynthesis (BBN). Different authors have studied reactions involved in BBN by incorporating new experimental data and a careful treatment of systematic and probabilistic uncertainties. To analyze the experimental data, Coc et al. used results of ab initio models for the theoretical calculation of the energy dependence of S-factors in conjunction with traditional statistical methods based on χ 2 minimization. Bayesian methods have now spread to many scientific fields and provide numerous advantages in data analysis. Astrophysical S-factors and reaction rates using Bayesian statistics were calculated by Iliadis et al. Here we present a similar analysis for two d+d reactions, d(d, n)3He and d(d, p)3H, that has been translated into a total decrease of the predicted D/H value by 0.16%.
Scheeres, Korine; Knoop, Hans; Meer, van der Jos; Bleijenberg, Gijs
2009-04-01
Effective treatment of chronic fatigue syndrome (CFS) with cognitive behavioural therapy (CBT) relies on a correct classification of so-called 'fluctuating active' versus 'passive' patients. For successful treatment with CBT it is especially important to recognise the passive patients and give them a tailored treatment protocol. In the present study we evaluated whether CFS patients' physical activity patterns can be assessed most accurately with the 'Activity Pattern Interview' (API), the International Physical Activity Questionnaire (IPAQ), or the CFS-Activity Questionnaire (CFS-AQ). The three instruments were validated against actometers. Actometers are currently the best and most objective instruments for measuring physical activity, but they are too expensive and time-consuming for most clinical practice settings. In total, 226 CFS patients enrolled for CBT answered the API at intake and filled in the two questionnaires. Directly after intake they wore the actometer for two weeks. Based on receiver operating characteristic (ROC) curves, the validity of the three methods was assessed and compared. Both the API and the two questionnaires had acceptable validity (0.64 to 0.71). None of the three instruments was significantly better than the others. The proportion of false predictions was rather high for all three instruments. The IPAQ had the highest proportion of correct passive predictions (sensitivity 70.1%). The validity of all three instruments was thus only fair, and all showed rather high proportions of false classifications; in fact, none of the tested instruments could really be called satisfactory. Because the IPAQ proved best at correctly identifying 'passive' CFS patients, which is most essential for treatment results, we conclude that the IPAQ is the preferable alternative to an actometer when treating CFS patients in clinical practice.
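The ROC validation described above reduces to ranking patients by an instrument's score and asking how often a truly 'passive' patient outranks a truly 'active' one; that probability is the area under the ROC curve (the Mann-Whitney U interpretation). A minimal rank-based AUC sketch; the 0/1 label coding and the 0.5 tie credit are our assumptions, not details from the study.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    case (label 1, e.g. 'passive' per actometer) receives a higher
    score than a randomly chosen negative case (label 0). Ties count
    as 0.5. Equivalent to the area under the ROC curve."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC in the 0.64 to 0.71 range, as reported for the three instruments, means roughly a two-in-three chance that the instrument scores a passive patient above an active one.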
Is killer whale dialect evolution random?
Filatova, Olga A; Burdin, Alexandr M; Hoyt, Erich
2013-10-01
The killer whale is among the few species in which cultural change accumulates over many generations, leading to cumulative cultural evolution. Killer whales have group-specific vocal repertoires which are thought to be learned rather than being genetically coded. It is supposed that divergence between vocal repertoires of sister groups increases gradually over time due to random learning mistakes and innovations. In this case, the similarity of calls across groups must be correlated with pod relatedness and, consequently, with each other. In this study we tested this prediction by comparing the patterns of call similarity between matrilines of resident killer whales from Eastern Kamchatka. We calculated the similarity of seven components from three call types across 14 matrilines. In contrast to the theoretical predictions, matrilines formed different clusters on the dendrograms made by different calls and even by different components of the same call. We suggest three possible explanations for this phenomenon. First, the lack of agreement between similarity patterns of different components may be the result of constraints in the call structure. Second, it is possible that call components change in time with different speed and/or in different directions. Third, horizontal cultural transmission of call features may occur between matrilines.
Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun
2016-10-01
As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performance, with MCC = 0.526 and AUC = 0.818. We also provide a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
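The position-specific nucleotide propensity (PSNP) feature can be read as a per-position frequency contrast between positive (m6A-containing) and negative training windows. The sketch below is one simplified interpretation of the abstract: pseudo-counts, the window length, and the dinucleotide (PSDP) variant are omitted, and all names are assumptions rather than the TargetM6A code.

```python
import numpy as np

def psnp_matrix(pos_seqs, neg_seqs, alphabet="ACGU"):
    """Position-specific nucleotide propensity: for each position p and
    nucleotide n, the frequency of n at p in positive windows minus its
    frequency at p in negative windows. All windows share one length."""
    L = len(pos_seqs[0])
    idx = {n: i for i, n in enumerate(alphabet)}

    def freqs(seqs):
        F = np.zeros((len(alphabet), L))
        for s in seqs:
            for p, n in enumerate(s):
                F[idx[n], p] += 1
        return F / len(seqs)

    return freqs(pos_seqs) - freqs(neg_seqs)

def psnp_features(seq, P, alphabet="ACGU"):
    """Encode one window as its per-position propensity scores, i.e.
    the feature vector that would be fed to a downstream classifier
    such as an SVM."""
    idx = {n: i for i, n in enumerate(alphabet)}
    return np.array([P[idx[n], p] for p, n in enumerate(seq)])
```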
Cracking the Code of Faraway Worlds
NASA Technical Reports Server (NTRS)
2007-01-01
This infrared data from NASA's Spitzer Space Telescope - called a spectrum - tells astronomers that a distant gas planet, a so-called 'hot Jupiter' called HD 189733b, might be smothered with high clouds. It is one of the first spectra of an alien world. A spectrum is created when an instrument called a spectrograph cracks light from an object open into a rainbow of different wavelengths. Patterns or ripples within the spectrum indicate the presence, or absence, of molecules making up the object. Astronomers using Spitzer's spectrograph were able to obtain infrared spectra for two so-called 'transiting' hot-Jupiter planets using the 'secondary eclipse' technique. In this method, the spectrograph first collects the combined infrared light from the planet plus its star, then, as the planet is eclipsed by the star, the infrared light of just the star. Subtracting the latter from the former reveals the planet's own rainbow of infrared colors. Astronomers were perplexed when they first saw the infrared spectrum above. It doesn't look anything like what theorists had predicted. Theorists thought the spectra of hot, Jupiter-like planets like this one would be filled with the signatures of molecules in the planets' atmospheres. But the spectrum doesn't show any molecules, and is instead what astronomers call 'flat.' For example, theorists thought there'd be a strong signature of water in the form of a big drop in the wavelength range between 7 and 10 microns. The fact that water is not detected may indicate that it is hidden underneath a thick blanket of high, dry clouds. The average brightness of the spectrum is also a bit lower than theoretical predictions, suggesting that very high winds are rapidly moving the terrific heat of the noonday sun from the day side of HD 189733b to the night side. This spectrum was produced by Dr. Carl Grillmair of NASA's Spitzer Science Center at the California Institute of Technology in Pasadena, Calif., and his colleagues. 
The data were taken by Spitzer's infrared spectrograph on November 22, 2006.
Cracking the Code of Faraway Worlds
2007-02-21
This infrared data from NASA's Spitzer Space Telescope -- called a spectrum -- tells astronomers that a distant gas planet, a so-called "hot Jupiter" called HD 189733b, might be smothered with high clouds. It is one of the first spectra of an alien world. A spectrum is created when an instrument called a spectrograph cracks light from an object open into a rainbow of different wavelengths. Patterns or ripples within the spectrum indicate the presence, or absence, of molecules making up the object. Astronomers using Spitzer's spectrograph were able to obtain infrared spectra for two so-called "transiting" hot-Jupiter planets using the "secondary eclipse" technique. In this method, the spectrograph first collects the combined infrared light from the planet plus its star, then, as the planet is eclipsed by the star, the infrared light of just the star. Subtracting the latter from the former reveals the planet's own rainbow of infrared colors. Astronomers were perplexed when they first saw the infrared spectrum above. It doesn't look anything like what theorists had predicted. Theorists thought the spectra of hot, Jupiter-like planets like this one would be filled with the signatures of molecules in the planets' atmospheres. But the spectrum doesn't show any molecules, and is instead what astronomers call "flat." For example, theorists thought there'd be a strong signature of water in the form of a big drop in the wavelength range between 7 and 10 microns. The fact that water is not detected may indicate that it is hidden underneath a thick blanket of high, dry clouds. The average brightness of the spectrum is also a bit lower than theoretical predictions, suggesting that very high winds are rapidly moving the terrific heat of the noonday sun from the day side of HD 189733b to the night side. This spectrum was produced by Dr. Carl Grillmair of NASA's Spitzer Science Center at the California Institute of Technology in Pasadena, Calif., and his colleagues. 
The data were taken by Spitzer's infrared spectrograph on November 22, 2006. http://photojournal.jpl.nasa.gov/catalog/PIA09199
Geometrical optics in the near field: local plane-interface approach with evanescent waves.
Bose, Gaurav; Hyvärinen, Heikki J; Tervo, Jani; Turunen, Jari
2015-01-12
We show that geometrical models may provide useful information on light propagation in wavelength-scale structures even if evanescent fields are present. We apply so-called local plane-wave and local plane-interface methods to study a geometry that resembles a scanning near-field microscope. We show that fair agreement between the geometrical approach and rigorous electromagnetic theory can be achieved in the case where evanescent waves are required to predict any transmission through the structure.
A Systematic Investigation of Computation Models for Predicting Adverse Drug Reactions (ADRs)
Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong
2014-01-01
Background Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computational models to obtain general conclusions that can provide useful guidance for constructing more effective computational models to predict ADRs. Principal Findings In the current study, the main work is to compare and analyze the performance of existing computational methods for predicting ADRs, by implementing and evaluating additional algorithms that were earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and that the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, we found that the final formulas of these algorithms could all be converted to a linear form; based on this finding, we propose a new algorithm called the general weighted profile method, which yielded the best overall performance among the algorithms investigated in this paper. Conclusion Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms. PMID:25180585
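The linear, similarity-weighted form that the abstract says the compared algorithms reduce to can be illustrated with a Jaccard-weighted profile: score the candidate ADRs of a drug by averaging the known ADR profiles of the other drugs, weighted by Jaccard similarity. This is a sketch of that general idea under our own normalization, not the paper's exact general weighted profile method.

```python
import numpy as np

def jaccard(u, v):
    """Jaccard coefficient between two binary association profiles."""
    inter = np.logical_and(u, v).sum()
    union = np.logical_or(u, v).sum()
    return inter / union if union else 0.0

def weighted_profile_scores(A, target):
    """Score ADRs for the `target` drug as a Jaccard-weighted average
    of the known ADR profiles of all other drugs. A is the binary
    drug x ADR association matrix; `target` is a row index. The result
    is a linear function of the rows of A, weighted by similarity."""
    w = np.array([jaccard(A[target], A[i]) if i != target else 0.0
                  for i in range(A.shape[0])])
    if w.sum() == 0:
        return np.zeros(A.shape[1])
    return (w @ A) / w.sum()
```

A high score for an ADR the target drug is not yet annotated with flags a candidate association, driven entirely by drugs with overlapping ADR profiles.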
Skolarus, Lesli E.; Zimmerman, Marc A.; Murphy, Jillian; Brown, Devin L.; Kerber, Kevin A.; Bailey, Sarah; Fowlkes, Sophronia; Morgenstern, Lewis B.
2014-01-01
Background and Purpose Acute stroke treatments are underutilized primarily due to delayed hospital arrival. Using a community-based participatory research approach, we explored stroke self-efficacy, knowledge, and perceptions of stroke among a predominantly African American population in Flint, Michigan. Methods In March 2010, a survey was administered to youth and adults after religious services at three churches and one church health day. The survey consisted of vignettes (12 stroke, 4 non-stroke) to assess knowledge of stroke warning signs and behavioral intent to call 911. The survey also assessed stroke self-efficacy, personal knowledge of someone who had had a stroke, personal history of stroke, and barriers to calling 911. Linear regression models explored the association of stroke self-efficacy with behavioral intent to call 911 among adults. Results Two hundred forty-two adults and 90 youth completed the survey. Ninety-two percent of adult and 90% of youth respondents were African American. Responding to 12 stroke vignettes, adults would call 911 in 72% (sd=0.26) of the vignettes while youth would call 911 in 54% (sd=0.29) (p<0.001). Adults correctly identified stroke in 51% (sd=0.32) of the stroke vignettes and youth in 46% (sd=0.28) of the stroke vignettes (p=0.28). Stroke self-efficacy predicted behavioral intent to call 911 (p=0.046). Conclusion In addition to knowledge of stroke warning signs, behavioral interventions to increase both stroke self-efficacy and behavioral intent may be useful for helping people make appropriate 911 calls for stroke. A community-based participatory research approach may be effective in reducing stroke disparities. PMID:21617148
Application of the Hilbert space average method on heat conduction models.
Michel, Mathias; Gemmer, Jochen; Mahler, Günter
2006-01-01
We analyze closed one-dimensional chains of weakly coupled many-level systems by means of the so-called Hilbert space average method (HAM). Subject to some concrete conditions on the Hamiltonian of the system, our theory predicts energy diffusion with respect to a coarse-grained description for almost all initial states. Close to the respective equilibrium, we investigate this behavior in terms of heat transport and derive the heat conduction coefficient. Thus, we are able to show that both heat (energy) diffusive behavior and Fourier's law follow from, and are compatible with, a reversible Schrödinger dynamics on the complete level of description.
Discriminative components of data.
Peltonen, Jaakko; Kaski, Samuel
2005-01-01
A simple probabilistic model is introduced to generalize classical linear discriminant analysis (LDA) in finding components that are informative of or relevant for data classes. The components maximize the predictability of the class distribution, which is asymptotically equivalent to 1) maximizing mutual information with the classes, and 2) finding principal components in the so-called learning or Fisher metrics. The Fisher metric measures only distances that are relevant to the classes, that is, distances that cause changes in the class distribution. The components have applications in data exploration, visualization, and dimensionality reduction. In empirical experiments, the method outperformed, in addition to more classical methods, a Rényi entropy-based alternative, while having essentially equivalent computational cost.
Cracking the Code of Faraway Worlds
NASA Technical Reports Server (NTRS)
2007-01-01
This infrared data from NASA's Spitzer Space Telescope - called a spectrum - tells astronomers that a distant gas planet, a so-called 'hot Jupiter' called HD 209458b, might be smothered with high clouds. It is one of the first spectra of an alien world. A spectrum is created when an instrument called a spectrograph spreads light from an object apart into a rainbow of different wavelengths. Patterns or ripples within the spectrum indicate the presence, or absence, of molecules making up the object. Astronomers using Spitzer's spectrograph were able to obtain infrared spectra for two so-called 'transiting' hot-Jupiter planets using the 'secondary eclipse' technique. In this method, the spectrograph first collects the combined infrared light from the planet plus its star; then, as the planet is eclipsed by the star, the infrared light of just the star. Subtracting the latter from the former reveals the planet's own rainbow of infrared colors. When astronomers first saw the infrared spectrum above, they were shocked. It doesn't look anything like what theorists had predicted. Theorists thought the spectra for hot, Jupiter-like planets like this one would be filled with the signatures of molecules in the planets' atmospheres. But the spectrum doesn't show any molecules. It is what astronomers call 'flat.' For example, theorists thought there'd be signatures of water in the wavelength range of 8 to 9 microns. The fact that water is not seen there might indicate that the water is hidden under a thick blanket of high, dry clouds. This spectrum was produced by Dr. Mark R. Swain of NASA's Jet Propulsion Laboratory in Pasadena, Calif., using a complex set of mathematical tools. It was derived using two different methods, both of which led to the same result. The data were taken on July 6 and 13, 2005, by Dr. Jeremy Richardson of NASA's Goddard Space Flight Center and his team using Spitzer's infrared spectrograph.
Revealing protein functions based on relationships of interacting proteins and GO terms.
Teng, Zhixia; Guo, Maozu; Liu, Xiaoyan; Tian, Zhen; Che, Kai
2017-09-20
In recent years, numerous computational methods have predicted protein function based on the protein-protein interaction (PPI) network. These methods supposed that two proteins share the same function if they interact with each other. However, recent studies report that the functions of two interacting proteins may be merely related, which can mislead the prediction of protein function. Therefore, there is a need to investigate the functional relationship between interacting proteins. In this paper, the functional relationship between interacting proteins is studied and a novel method, called GoDIN, is proposed to annotate functions of interacting proteins in the Gene Ontology (GO) context. It is assumed that the functional difference between interacting proteins can be expressed by the semantic difference between a GO term and its relatives. Thus, the method uses a GO term and its relatives to annotate the interacting proteins separately according to their functional roles in the PPI network. The method is validated by a series of experiments and compared with related methods. The experimental results confirm the assumption and suggest that GoDIN is effective at predicting protein function. This study demonstrates that: (1) interacting proteins are not equal in the PPI network, and their functions may be the same, similar, or just related; (2) the functional difference between interacting proteins can be measured by their degrees in the PPI network; (3) the functional relationship between interacting proteins can be expressed by the relationship between a GO term and its relatives.
Harris, Scott H.; Johnson, Joel A.; Neiswanger, Jeffery R.; Twitchell, Kevin E.
2004-03-09
The present invention includes systems configured to distribute a telephone call, communication systems, communication methods and methods of routing a telephone call to a customer service representative. In one embodiment of the invention, a system configured to distribute a telephone call within a network includes a distributor adapted to connect with a telephone system, the distributor being configured to connect a telephone call using the telephone system and output the telephone call and associated data of the telephone call; and a plurality of customer service representative terminals connected with the distributor, a selected customer service representative terminal being configured to receive the telephone call and the associated data, and the distributor and the selected customer service representative terminal being configured to synchronize application of the telephone call and associated data from the distributor to the selected customer service representative terminal.
NASA Astrophysics Data System (ADS)
Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie
2014-05-01
Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting; it performs well in prediction and can give explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones, yet the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods, trained on imbalanced samples, that forecast well for this small proportion of failed enterprises while remaining accurate overall. Commonly used methods built on the assumption of balanced samples do not predict minority samples well on imbalanced samples consisting of the minority (failed) enterprises and the majority (non-failed) ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both the minority and the majority in CBR. In CBCBR, case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating the information of the cases in the same clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking the similarities between the target case and each clustered case class centre. Then, the nearest neighbours of the target case within the determined clustered case class are retrieved. Finally, the labels of the nearest experienced cases are used in prediction. In an empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis.
The results show that, compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the minority samples while also achieving high total accuracy. The proposed approach makes CBR useful in imbalanced forecasting.
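The retrieval scheme described above (cluster the case base, pick the nearest class centre, then run nearest-neighbour search only inside that class) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy two-cluster dataset, the Euclidean distance, and the majority vote are assumptions made for the sketch.

```python
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centre(cases):
    # Class centre: feature-wise mean of the cases in a clustered class.
    n = len(cases)
    return tuple(sum(c[i] for c in cases) / n for i in range(len(cases[0])))

def predict(clusters, target, k=3):
    # Step 1: retrieve the nearest clustered case class by centre distance.
    best = min(clusters,
               key=lambda cl: euclidean(centre([c for c, _ in cl]), target))
    # Step 2: retrieve k nearest neighbours inside that class only.
    neighbours = sorted(best, key=lambda case: euclidean(case[0], target))[:k]
    # Step 3: majority vote over the neighbours' labels.
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy example: one "failed" cluster and one "non-failed" cluster,
# standing in for classes produced by hierarchical clustering.
clusters = [
    [((0.1, 0.2), "failed"), ((0.2, 0.1), "failed"), ((0.15, 0.15), "failed")],
    [((0.9, 0.8), "ok"), ((0.8, 0.9), "ok"), ((0.85, 0.85), "ok")],
]
print(predict(clusters, (0.12, 0.18)))  # → failed
```

Restricting the neighbour search to one clustered class is what lets the minority cases compete on equal footing: a minority-class centre can win the first retrieval step even when majority cases vastly outnumber minority ones overall.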
Cook, Tessa S; Hernandez, Jessica; Scanlon, Mary; Langlotz, Curtis; Li, Chun-Der L
2016-07-01
Despite its increasing use in training other medical specialties, high-fidelity simulation to prepare diagnostic radiology residents for call remains an underused educational resource. To characterize the barriers to adoption of this technology, we conducted a survey of academic radiologists and radiology trainees. An Institutional Review Board-approved survey was distributed to Association of University Radiologists members via e-mail. Survey results were collected electronically, tabulated, and analyzed. A total of 68 survey responses representing 51 programs were received from program directors, department chairs, chief residents, and program administrators. The most common form of educational activity for resident call preparation was lectures. Faculty-supervised "baby call" was also widely reported. Actual simulated call environments were quite rare, with only three programs reporting this type of educational activity. Barriers to the use of simulation include lack of faculty time, lack of faculty expertise, and lack of perceived need. High-fidelity simulation can be used to mimic the high-stress, high-stakes independent call environment that the typical radiology resident encounters during the second year of training, and can provide objective data for program directors to assess the Accreditation Council for Graduate Medical Education milestones. We predict that this technology will begin to supplement traditional diagnostic radiology teaching methods and to improve patient care and safety in the next decade.
Scattering of sound by atmospheric turbulence predictions in a refractive shadow zone
NASA Technical Reports Server (NTRS)
Mcbride, Walton E.; Bass, Henry E.; Raspet, Richard; Gilbert, Kenneth E.
1990-01-01
According to ray theory, regions exist in an upward refracting atmosphere where no sound should be present. Experiments show, however, that appreciable sound levels penetrate these so-called shadow zones. Two mechanisms contribute to sound in the shadow zone: diffraction and turbulent scattering of sound. Diffractive effects can be pronounced at lower frequencies but are small at high frequencies. In the short wavelength limit, then, scattering due to turbulence should be the predominant mechanism involved in producing the sound levels measured in shadow zones. No existing analytical method includes turbulence effects in the prediction of sound pressure levels in upward refractive shadow zones. In order to obtain quantitative average sound pressure level predictions, a numerical simulation of the effect of atmospheric turbulence on sound propagation is performed. The simulation is based on scattering from randomly distributed scattering centers ('turbules'). Sound pressure levels are computed for many realizations of a turbulent atmosphere. Predictions from the numerical simulation are compared with existing theories and experimental data.
Characterization of essential proteins based on network topology in proteins interaction networks
NASA Astrophysics Data System (ADS)
Bakar, Sakhinah Abu; Taheri, Javid; Zomaya, Albert Y.
2014-06-01
The identification of essential proteins is theoretically and practically important because (1) it is essential for understanding the minimal survival requirements of cellular life, and (2) it provides a foundation for drug development. As conducting experimental studies to identify essential proteins is both time- and resource-consuming, here we present a computational approach for predicting them based on network topology properties from protein-protein interaction networks of Saccharomyces cerevisiae. The proposed method, EP3NN (Essential Proteins Prediction using Probabilistic Neural Network), employs a machine learning algorithm called a Probabilistic Neural Network as a classifier to identify essential proteins of the organism of interest; it uses the degree centrality, closeness centrality, local assortativity and local clustering coefficient of each protein in the network for such predictions. Results show that EP3NN successfully predicted essential proteins with an accuracy of 95% for our studied organism. Results also show that most of the essential proteins are close to other proteins, exhibit assortative behavior and form clusters/sub-graphs in the network.
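Two of the topology features named above are straightforward to compute from an adjacency representation of a PPI network. The sketch below shows degree and the local clustering coefficient on a hypothetical toy network; it is an illustration of the features, not the EP3NN classifier itself.

```python
def degree(graph, v):
    # Degree: number of direct interaction partners of protein v.
    return len(graph[v])

def local_clustering(graph, v):
    # Fraction of v's neighbour pairs that also interact with each other;
    # high values mean v sits inside a dense cluster/sub-graph.
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

# Toy PPI network as an adjacency dict (undirected: each edge listed twice).
ppi = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(degree(ppi, "A"), round(local_clustering(ppi, "A"), 3))  # → 3 0.333
```

A real pipeline would compute these (plus closeness centrality and local assortativity) for every protein and feed the feature vectors to the probabilistic neural network classifier.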
Cao, Qi; Leung, K M
2014-09-22
Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
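The coupling described above amounts to using differential evolution as an outer loop that searches the classifier's parameter space for the settings with the best validation score. The sketch below is a generic DE/rand/1/bin optimizer applied to a stand-in objective; the quadratic `cv_score` surrogate (peaking at hypothetical values C=10, gamma=0.1) replaces a real cross-validated SVC, and all names are illustrative assumptions.

```python
import random

def differential_evolution(score, bounds, pop_size=20, gens=60,
                           F=0.8, CR=0.9, seed=1):
    """Maximize score(params) with the classic DE/rand/1/bin strategy."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [score(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutate: combine three distinct vectors other than the target.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            # Crossover: mix mutant and target, clamped to the bounds.
            trial = [pop[i][d] if rng.random() > CR
                     else min(max(a[d] + F * (b[d] - c[d]),
                                  bounds[d][0]), bounds[d][1])
                     for d in range(dim)]
            f = score(trial)
            if f > fit[i]:  # greedy selection
                pop[i], fit[i] = trial, f
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Stand-in objective: pretend cross-validated accuracy peaks at C=10, gamma=0.1.
def cv_score(params):
    C, gamma = params
    return -((C - 10.0) ** 2 + (gamma - 0.1) ** 2)

best_params, _ = differential_evolution(cv_score, [(0.1, 100.0), (0.001, 1.0)])
print([round(x, 2) for x in best_params])
```

In the paper's setting, `cv_score` would be replaced by the cross-validated accuracy of an SVC trained with the candidate parameters, which is exactly where DE's derivative-free search pays off.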
Estimation of the Driving Style Based on the Users' Activity and Environment Influence.
Sysoev, Mikhail; Kos, Andrej; Guna, Jože; Pogačnik, Matevž
2017-10-21
New models and methods have been designed to predict the influence of the user's environment and activity information on driving style in standard automotive environments. For these purposes, an experiment was conducted providing two types of analysis: (i) the evaluation of a self-assessment of the driving style; (ii) the prediction of aggressive driving style based on drivers' activity and environment parameters. Sixty-seven hours of driving data from 10 drivers were collected for analysis in this study. The new parameters used in the experiment are the car door opening and closing manner, which were applied to improve the prediction accuracy. An Android application called Sensoric was developed to collect low-level smartphone data about the users' activity. The driving style was predicted from the user's environment and activity data collected before driving. The prediction was tested against the actual driving style, calculated from objective driving data. The prediction has shown encouraging results, with precision values ranging from 0.727 up to 0.909 for the aggressive driving recognition rate. The obtained results lend support to the hypothesis that a user's environment and activity data could be used to predict an aggressive driving style in advance, before driving starts.
NASA Astrophysics Data System (ADS)
Cai, Jiaxin; Chen, Tingting; Li, Yan; Zhu, Nenghui; Qiu, Xuan
2018-03-01
In order to analyze the fibrosis stage and inflammatory activity grade of chronic hepatitis C, a novel classification method based on collaborative representation (CR) with a smoothly clipped absolute deviation (SCAD) penalty term, called the CR-SCAD classifier, is proposed for pattern recognition. An auto-grading system based on the CR-SCAD classifier is then introduced for the prediction of the fibrosis stage and inflammatory activity grade of chronic hepatitis C. The proposed method has been tested on 123 clinical cases of chronic hepatitis C based on serological indexes. Experimental results show that the proposed method outperforms state-of-the-art baselines for the classification of the fibrosis stage and inflammatory activity grade of chronic hepatitis C.
An improved method to detect correct protein folds using partial clustering.
Zhou, Jianjun; Wishart, David S
2013-01-16
Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either C(α) RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
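The "partial" clustering idea above — extract representatives without assigning every decoy to a finished cluster — can be illustrated with a one-pass greedy sweep. This is a simplified sketch of the general strategy, not the HS-Forest algorithm: the 1-D "decoys" and absolute-difference distance stand in for structures compared by Cα RMSD or GDT-TS.

```python
def partial_cluster(decoys, dist, threshold):
    """Greedy, incomplete clustering: pick a representative, discard
    everything structurally similar to it, repeat. Only O(r*n) distance
    evaluations for r representatives, instead of all-pairs clustering."""
    remaining = list(decoys)
    representatives = []
    while remaining:
        rep = remaining[0]
        # Claim everything within the similarity threshold; never revisit it.
        remaining = [d for d in remaining[1:] if dist(rep, d) > threshold]
        representatives.append(rep)
    return representatives

# Toy 1-D "structures"; the distance stands in for C-alpha RMSD.
decoys = [0.0, 0.2, 0.3, 5.0, 5.1, 9.0]
reps = partial_cluster(decoys, lambda a, b: abs(a - b), threshold=1.0)
print(reps)  # → [0.0, 5.0, 9.0]
```

The speedup over full clustering comes from never computing distances between two non-representative decoys, which is what dominates the runtime of traditional all-pairs methods on large decoy sets.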
Parameter Optimization for Selected Correlation Analysis of Intracranial Pathophysiology.
Faltermeier, Rupert; Proescholdt, Martin A; Bele, Sylvia; Brawanski, Alexander
2015-01-01
Recently we proposed a mathematical tool set, called selected correlation analysis, that reliably detects positive and negative correlations between arterial blood pressure (ABP) and intracranial pressure (ICP). Such correlations are associated with severe impairment of cerebral autoregulation and intracranial compliance, as predicted by a mathematical model. The time-resolved selected correlation analysis is based on a windowing technique combined with Fourier-based coherence calculations and therefore depends on several parameters. For real-time application of this method in an ICU, it is essential to tune this mathematical tool for high sensitivity and distinct reliability. In this study, we introduce a method to optimize the parameters of the selected correlation analysis by correlating an index, called selected correlation positive (SCP), with patient outcome as represented by the Glasgow Outcome Scale (GOS). For that purpose, the data of twenty-five patients were used to calculate the SCP value for each patient and a multitude of feasible parameter sets of the selected correlation analysis. It could be shown that an optimized set of parameters improves the sensitivity of the method by a factor greater than four in comparison with our first analyses.
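The windowing idea behind the method can be sketched with a much simpler stand-in: sliding-window Pearson correlation of ABP against ICP. This is an assumption-laden simplification — the paper uses Fourier-based coherence, not plain correlation — and the signal values below are invented for illustration.

```python
from math import sqrt

def pearson(a, b):
    # Pearson correlation coefficient of two equal-length series.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def windowed_correlations(abp, icp, window, step):
    # Correlate ABP with ICP inside sliding windows; a sustained positive
    # correlation is the kind of event an index like SCP would count.
    return [pearson(abp[s:s + window], icp[s:s + window])
            for s in range(0, len(abp) - window + 1, step)]

abp = [80, 82, 84, 86, 88, 86, 84, 82, 80, 78]   # mmHg, illustrative
icp = [10, 11, 12, 13, 14, 13, 12, 11, 10, 9]    # mmHg, moving with ABP
print([round(c, 2) for c in windowed_correlations(abp, icp, 5, 5)])  # → [1.0, 1.0]
```

The parameters being optimized in the study (window length, overlap, thresholds) correspond to the `window` and `step` arguments here, plus the coherence cutoffs of the full method.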
Yang, Jing; Jin, Qi-Yu; Zhang, Biao; Shen, Hong-Bin
2016-08-15
Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts, especially long-range contacts, is important to the quality of ab initio structure modeling, since they enforce strong restraints on structure assembly. In this paper, we present a new residue-residue contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter, to enhance long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated, with better performance on short-range contacts, while for the correlated mutation analysis-based approach the predictions are widespread, with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantage of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains an interesting Gaussian noise, which has not been observed before. Different from recent studies that tried to further enhance the quality of the contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for long-range residue contact prediction. Availability: http://www.csbio.sjtu.edu.cn/bioinf/R2C/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
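A two-dimensional Gaussian filter applied to a contact-probability map can be sketched as a plain convolution. This is a generic illustration of the filtering idea, not R2C's actual filter (whose kernel size and parameters are not specified here); the 4x4 toy map is invented.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    # Normalized 2-D Gaussian kernel.
    half = size // 2
    k = [[math.exp(-(i * i + j * j) / (2 * sigma * sigma))
          for j in range(-half, half + 1)] for i in range(-half, half + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]

def smooth(cmap, kernel):
    # Convolve a predicted contact-probability map with the kernel
    # (zero padding at the borders).
    n, half = len(cmap), len(kernel) // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        acc += kernel[di + half][dj + half] * cmap[ii][jj]
            out[i][j] = acc
    return out

# A consistent 2x2 block of predicted contacts keeps a strong response,
# while cells far from any contact stay near zero.
cmap = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
sm = smooth(cmap, gaussian_kernel())
print(sm[1][1] > sm[0][0])  # → True
```

Smoothing helps long-range prediction because isolated false-positive spikes are damped, while spatially consistent neighbourhoods of true contacts reinforce each other.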
Graph wavelet alignment kernels for drug virtual screening.
Smalter, Aaron; Huan, Jun; Lushington, Gerald
2009-06-01
In this paper, we introduce a novel statistical modeling technique for target property prediction, with applications to virtual screening and drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to summarize features capturing local graph topology. We design a novel graph kernel function that uses these topology features to build predictive models for chemicals via a Support Vector Machine classifier. We call the new graph kernel a graph wavelet-alignment kernel. We have evaluated the efficacy of the wavelet-alignment kernel using a set of chemical structure-activity prediction benchmarks. Our results indicate that the kernel function yields performance profiles comparable to, and sometimes exceeding, those of existing state-of-the-art chemical classification approaches. In addition, our results also show that the use of wavelet functions significantly decreases the computational cost of graph kernel computation, with a more than tenfold speedup.
Cheung, Y M; Leung, W M; Xu, L
1997-01-01
We propose a prediction model combining Rival Penalized Competitive Learning (RPCL) with the Combined Linear Predictor (CLP) method, which involves a set of local linear predictors such that a prediction is made by combining some activated predictors through a gating network (Xu et al., 1994). Furthermore, we present an improved variant named Adaptive RPCL-CLP that includes an adaptive learning mechanism as well as a data pre- and post-processing scheme. We compare them with some existing models by demonstrating their performance on two real-world financial time series: a China stock price and an exchange-rate series of US Dollar (USD) versus Deutschmark (DEM). Experiments have shown that Adaptive RPCL-CLP not only outperforms the other approaches with the smallest prediction error and training cost, but also brings in considerably higher profits in the trading simulation of the foreign exchange market.
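The combined-linear-predictor idea — several local linear (autoregressive) predictors mixed through a gating network — can be sketched in a few lines. The two coefficient vectors and the fixed gating weights below are hypothetical stand-ins; a real system would learn both the predictors and the gate from data.

```python
def combined_prediction(history, predictors, gate):
    """Combine local linear (AR) predictors: each predictor is a coefficient
    vector over the last p values; gate() returns mixing weights summing to 1."""
    p = len(predictors[0])
    window = history[-p:]
    local_preds = [sum(w * x for w, x in zip(coeffs, window))
                   for coeffs in predictors]
    weights = gate(window)
    return sum(g * y for g, y in zip(weights, local_preds))

# Two hypothetical local predictors over the last two observations:
predictors = [[-1.0, 2.0],   # y ≈ 2*x[t] - x[t-1]  (extrapolate the trend)
              [0.5, 0.5]]    # y ≈ mean of last two  (revert to the mean)
gate = lambda window: (0.7, 0.3)  # fixed gating weights for the sketch
print(round(combined_prediction([1.0, 2.0, 3.0], predictors, gate), 2))  # → 3.55
```

In RPCL-CLP the gate is itself a learned network, and RPCL's rival-penalizing competition decides how many local predictors survive and which region of the input space each one specializes in.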
Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G
2016-08-25
The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local-in-sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences, which we call memories. The memory information from the Protein Data Bank (PDB) guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to locally frustrated free energy landscapes. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach, first put forward by Kwac and Wolynes, in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme, which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM), amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural prediction may well be optimal for many practical applications.
Improving Genomic Prediction in Cassava Field Experiments by Accounting for Interplot Competition
Elias, Ani A.; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc
2018-01-01
Plants competing for available resources is an unavoidable phenomenon in a field. We conducted studies in cassava (Manihot esculenta Crantz) in order to understand the pattern of this competition. Taking into account the competitive ability of genotypes while selecting parents for breeding advancement or commercialization can be very useful. We assumed that competition could occur at two levels: (i) the genotypic level, which we call interclonal, and (ii) the plot level irrespective of the type of genotype, which we call interplot competition or competition error. Modification in incidence matrices was applied in order to relate neighboring genotype/plot to the performance of a target genotype/plot with respect to its competitive ability. This was added into a genomic selection (GS) model to simultaneously predict the direct and competitive ability of a genotype. Predictability of the models was tested through a 10-fold cross-validation method repeated five times. The best model was chosen as the one with the lowest prediction root mean squared error (pRMSE) compared to that of the base model having no competitive component. Results from our real data studies indicated that <10% increase in accuracy was achieved with GS-interclonal competition model, but this value reached up to 25% with a GS-competition error model. We also found that the competitive influence of a cassava clone is not just limited to the adjacent neighbors but spreads beyond them. Through simulations, we found that a 26% increase of accuracy in estimating trait genotypic effect can be achieved even in the presence of high competitive variance. PMID:29358232
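The "modification in incidence matrices" described above amounts to attaching, to each target plot, a covariate built from its neighbouring plots. The sketch below computes the simplest such quantity — the mean phenotype of the adjacent plots in a 2-D field layout — as an illustrative assumption; the paper's genomic selection models relate neighbours at the genotype and plot levels rather than through a raw phenotype mean.

```python
def neighbour_means(field):
    """For each plot in a 2-D field layout, average the phenotypes of the
    adjacent plots — the quantity a modified incidence matrix would attach
    to the target plot to model interplot competition."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [field[r + dr][c + dc]
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = sum(nbrs) / len(nbrs)
    return out

# Invented yields (t/ha) on a 2x3 field grid.
field = [[10.0, 12.0, 11.0],
         [ 9.0, 15.0, 10.0]]
print(neighbour_means(field)[0][0])  # → 10.5
```

Since the study found that competitive influence spreads beyond the immediate neighbours, a fuller version would widen the neighbourhood (or weight plots by distance) rather than stop at the four adjacent cells.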
Exploring high dimensional free energy landscapes: Temperature accelerated sliced sampling
NASA Astrophysics Data System (ADS)
Awasthi, Shalini; Nair, Nisanth N.
2017-03-01
Biased sampling of collective variables is widely used to accelerate rare events in molecular simulations and to explore free energy surfaces. However, computational efficiency of these methods decreases with increasing number of collective variables, which severely limits the predictive power of the enhanced sampling approaches. Here we propose a method called Temperature Accelerated Sliced Sampling (TASS) that combines temperature accelerated molecular dynamics with umbrella sampling and metadynamics to sample the collective variable space in an efficient manner. The presented method can sample a large number of collective variables and is advantageous for controlled exploration of broad and unbound free energy basins. TASS is also shown to achieve quick free energy convergence and is practically usable with ab initio molecular dynamics techniques.
Mena, Luis J.; Orozco, Eber E.; Felix, Vanessa G.; Ostos, Rodolfo; Melgarejo, Jesus; Maestre, Gladys E.
2012-01-01
Machine learning has become a powerful tool for analysing medical domains, assessing the importance of clinical parameters, and extracting medical knowledge for outcomes research. In this paper, we present a machine learning method for extracting diagnostic and prognostic thresholds, based on a symbolic classification algorithm called REMED. We evaluated the performance of our method by determining new prognostic thresholds for well-known and potential cardiovascular risk factors that are used to support medical decisions in the prognosis of fatal cardiovascular diseases. Our approach predicted 36% of cardiovascular deaths with 80% specificity and 75% general accuracy. The new method provides an innovative approach that might be useful to support decisions about medical diagnoses and prognoses. PMID:22924062
Sea level forecasts using neural networks
NASA Astrophysics Data System (ADS)
Röske, Frank
1997-03-01
In this paper, a new method for predicting the sea level employing a neural network approach is introduced. It was designed to improve the prediction of the sea level along the German North Sea Coast under standard conditions. The sea level at any given time depends upon the tides as well as meteorological and oceanographic factors, such as the winds and external surges induced by air pressure. Since tidal predictions are already sufficiently accurate, they have been subtracted from the observed sea levels. The differences will be predicted up to 18 hours in advance. In this paper, the differences are called anomalies. The prediction of the sea level each hour is distinguished from its predictions at the times of high and low tide. For this study, Cuxhaven was selected as a reference site. The predictions made using neural networks were compared for accuracy with the prognoses prepared using six models: two hydrodynamic models, a statistical model, a nearest neighbor model, which is based on analogies, the persistence model, and the verbal forecasts that are broadcast and kept on record by the Sea Level Forecast Service of the Federal Maritime and Hydrography Agency (BSH) in Hamburg. Predictions were calculated for the year 1993 and compared with the actual levels measured. Artificial neural networks are capable of learning. Applying them to the prediction of sea levels is an attempt both to learn from past events and to make the experience of expert forecasters objective. Instead of using the widespread back-propagation networks, the self-organizing feature map of Kohonen, or “Kohonen network”, was applied. The fundamental principle of this network is the transformation of signal similarity into the neighborhood of the neurons while preserving the topology of the signal space. The self-organization procedure of Kohonen networks can be visualized.
To make predictions, these networks have been subdivided into a part describing the past state and another part describing the prediction. Both parts have been chosen according to methods of auto- and multiregression. A Kohonen network that has finished learning can be interpreted to be an adaptive table of such descriptions. To avoid overloading the Kohonen networks, the time series, made as complete as possible, were reduced to a learnable data set by means of two selection methods. The minimal distance method as a part of the cluster analysis was used, which selects representative temporal patterns. A novel method called circular group reduction was developed, which selects extreme patterns. This method is used as a supplement to the first one. To help the Kohonen network maintain its memory, the number of neurons and the maximum learning time were chosen according to the number of learning samples. To improve convergence, a combination of criteria was developed to break off learning, which could be shown to conform with the self-organization procedure. Kohonen networks were also applied in an autoregressive manner for the prediction of meteorological variables, especially wind. However, the quality of these predictions was inferior to those of the Marine Weather Service (SWA) in Hamburg, which is part of the German Weather Service (DWD) in Offenbach. High and low tide anomalies were predicted using Kohonen networks for multiregressions. The verbal predictions of high tide anomalies of the BSH Sea Level Forecasting Service were the most precise of all six comparison models. By using the Kohonen networks, it was even possible to improve these predictions and reduce their average error by 1 cm, from 15 to 14 cm. The precision of the Kohonen networks improved as their number of neurons increased and as their weight vectors became smaller.
Since there were no major changes in the statistical properties of measurements made over medium-range time scales, networks that had completed learning were installed at the Sea Level Forecast Service. However, over the long term, there can be changes in these properties due to climate change and deepening of the Elbe River. Therefore, the training process of the networks should be repeated periodically, taking longer time series into consideration.
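The core of the Kohonen network described above, mapping signal similarity onto neuron neighborhood while preserving topology, fits in a few lines for a 1-D map. This is a minimal illustrative sketch; the learning-rate and radius schedules are assumptions, not the parameters used in the study.

```python
import numpy as np

def train_som(data, n_units=20, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D Kohonen self-organizing map: each input is matched to its
    best-matching unit (BMU), and the BMU and its grid neighbours are moved
    toward the input, with shrinking learning rate and neighbourhood radius."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.normal(size=(n_units, dim))
    if radius0 is None:
        radius0 = n_units / 2
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(1.0, radius0 * frac)
        for x in data[rng.permutation(len(data))]:
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            d = np.abs(np.arange(n_units) - bmu)          # distance on the grid
            h = np.exp(-(d ** 2) / (2 * radius ** 2))      # neighbourhood kernel
            weights += lr * h[:, None] * (x - weights)
    return weights
```

A trained map acts as the "adaptive table" the abstract mentions: each unit's weight vector is a prototype pattern, and prediction amounts to looking up the BMU for a new input.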
Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.
Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan
2015-03-01
A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICUs). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICUs, and provide a superior method of risk adjustment compared to logistic regression.
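Once a patient-level risk model is fitted, the observed-versus-expected comparison at the heart of risk adjustment reduces to a simple ratio. The sketch below assumes the predicted probabilities come from any fitted model (logistic regression, random forest, or BART in the paper's comparison).

```python
import numpy as np

def oe_ratio(outcomes, predicted_probs):
    """Observed-to-expected outcome ratio for one hospital: observed events
    divided by the sum of patient-level predicted probabilities.
    Near 1: the hospital performs as its case mix predicts;
    above 1: worse than expected; below 1: better than expected."""
    outcomes = np.asarray(outcomes, dtype=float)
    predicted_probs = np.asarray(predicted_probs, dtype=float)
    return float(outcomes.sum() / predicted_probs.sum())
```

Because the expected count in the denominator depends entirely on the risk model, a better-calibrated model (the paper's argument for ensembles of trees) directly changes which hospitals look like outliers.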
Mihaljević, Bojan; Bielza, Concha; Benavides-Piccione, Ruth; DeFelipe, Javier; Larrañaga, Pedro
2014-01-01
Interneuron classification is an important and long-debated topic in neuroscience. A recent study provided a data set of digitally reconstructed interneurons classified by 42 leading neuroscientists according to a pragmatic classification scheme composed of five categorical variables, namely, of the interneuron type and four features of axonal morphology. From this data set we now learned a model which can classify interneurons, on the basis of their axonal morphometric parameters, into these five descriptive variables simultaneously. Because of differences in opinion among the neuroscientists, especially regarding neuronal type, for many interneurons we lacked a unique, agreed-upon classification, which we could use to guide model learning. Instead, we guided model learning with a probability distribution over the neuronal type and the axonal features, obtained, for each interneuron, from the neuroscientists' classification choices. We conveniently encoded such probability distributions with Bayesian networks, calling them label Bayesian networks (LBNs), and developed a method to predict them. This method predicts an LBN by forming a probabilistic consensus among the LBNs of the interneurons most similar to the one being classified. We used 18 axonal morphometric parameters as predictor variables, 13 of which we introduce in this paper as quantitative counterparts to the categorical axonal features. We were able to accurately predict interneuronal LBNs. Furthermore, when extracting crisp (i.e., non-probabilistic) predictions from the predicted LBNs, our method outperformed related work on interneuron classification. Our results indicate that our method is adequate for multi-dimensional classification of interneurons with probabilistic labels. Moreover, the introduced morphometric parameters are good predictors of interneuron type and the four features of axonal morphology and thus may serve as objective counterparts to the subjective, categorical axonal features.
A digital prediction algorithm for a single-phase boost PFC
NASA Astrophysics Data System (ADS)
Qing, Wang; Ning, Chen; Weifeng, Sun; Shengli, Lu; Longxing, Shi
2012-12-01
A novel digital control algorithm for digital control power factor correction is presented, which is called the prediction algorithm and has a feature of a higher PF (power factor) with lower total harmonic distortion, and a faster dynamic response with the change of the input voltage or load current. For a certain system, based on the current system state parameters, the prediction algorithm can estimate the track of the output voltage and the inductor current at the next switching cycle and get a set of optimized control sequences to perfectly track the trajectory of input voltage. The proposed prediction algorithm is verified at different conditions, and computer simulation and experimental results under multi-situations confirm the effectiveness of the prediction algorithm. Under the circumstances that the input voltage is in the range of 90-265 V and the load current in the range of 20%-100%, the PF value is larger than 0.998. The startup and the recovery times respectively are about 0.1 s and 0.02 s without overshoot. The experimental results also verify the validity of the proposed method.
The statistical mechanics of complex signaling networks: nerve growth factor signaling
NASA Astrophysics Data System (ADS)
Brown, K. S.; Hill, C. C.; Calero, G. A.; Myers, C. R.; Lee, K. H.; Sethna, J. P.; Cerione, R. A.
2004-10-01
The inherent complexity of cellular signaling networks and their importance to a wide range of cellular functions necessitates the development of modeling methods that can be applied toward making predictions and highlighting the appropriate experiments to test our understanding of how these systems are designed and function. We use methods of statistical mechanics to extract useful predictions for complex cellular signaling networks. A key difficulty with signaling models is that, while significant effort is being made to experimentally measure the rate constants for individual steps in these networks, many of the parameters required to describe their behavior remain unknown or at best represent estimates. To establish the usefulness of our approach, we have applied our methods toward modeling the nerve growth factor (NGF)-induced differentiation of neuronal cells. In particular, we study the actions of NGF and mitogenic epidermal growth factor (EGF) in rat pheochromocytoma (PC12) cells. Through a network of intermediate signaling proteins, each of these growth factors stimulates extracellular regulated kinase (Erk) phosphorylation with distinct dynamical profiles. Using our modeling approach, we are able to predict the influence of specific signaling modules in determining the integrated cellular response to the two growth factors. Our methods also raise some interesting insights into the design and possible evolution of cellular systems, highlighting an inherent property of these systems that we call 'sloppiness.'
Boomerang: A method for recursive reclassification.
Devlin, Sean M; Ostrovnaya, Irina; Gönen, Mithat
2016-09-01
While there are many validated prognostic classifiers used in practice, often their accuracy is modest and heterogeneity in clinical outcomes exists in one or more risk subgroups. Newly available markers, such as genomic mutations, may be used to improve the accuracy of an existing classifier by reclassifying patients from a heterogenous group into a higher or lower risk category. The statistical tools typically applied to develop the initial classifiers are not easily adapted toward this reclassification goal. In this article, we develop a new method designed to refine an existing prognostic classifier by incorporating new markers. The two-stage algorithm called Boomerang first searches for modifications of the existing classifier that increase the overall predictive accuracy and then merges to a prespecified number of risk groups. Resampling techniques are proposed to assess the improvement in predictive accuracy when an independent validation data set is not available. The performance of the algorithm is assessed under various simulation scenarios where the marker frequency, degree of censoring, and total sample size are varied. The results suggest that the method selects few false positive markers and is able to improve the predictive accuracy of the classifier in many settings. Lastly, the method is illustrated on an acute myeloid leukemia data set where a new refined classifier incorporates four new mutations into the existing three category classifier and is validated on an independent data set. © 2016, The International Biometric Society.
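When no independent validation set is available, the resampling assessment the abstract mentions can be approximated with a simple bootstrap of the accuracy gain. This generic sketch is not the Boomerang algorithm itself; the function name and inputs are hypothetical.

```python
import random

def bootstrap_improvement(y, pred_old, pred_new, n_boot=1000, seed=0):
    """Bootstrap the mean difference in accuracy between a refined classifier
    (pred_new) and the original (pred_old), as a stand-in for assessing
    improvement without an independent validation data set."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        acc_new = sum(pred_new[i] == y[i] for i in idx) / n
        acc_old = sum(pred_old[i] == y[i] for i in idx) / n
        diffs.append(acc_new - acc_old)
    return sum(diffs) / n_boot
```

A consistently positive bootstrap distribution of the difference suggests the refined classifier genuinely improved accuracy rather than fitting noise.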
A path-based measurement for human miRNA functional similarities using miRNA-disease associations
NASA Astrophysics Data System (ADS)
Ding, Pingjian; Luo, Jiawei; Xiao, Qiu; Chen, Xiangtao
2016-09-01
Compared with sequence and expression similarity, miRNA functional similarity is important for biological research and for many applications such as miRNA clustering, miRNA function prediction, miRNA synergism identification and disease miRNA prioritization. However, existing methods typically calculate miRNA functional similarity from predicted miRNA targets, which suffer from high false-positive and false-negative rates. Meanwhile, it is difficult to achieve high reliability of miRNA functional similarity with miRNA-disease associations. Therefore, there is a growing need to improve the measurement of miRNA functional similarity. In this study, we develop a novel path-based calculation method of miRNA functional similarity based on miRNA-disease associations, called MFSP. Compared with other methods, our method obtains higher average functional similarity for selected intra-family and intra-cluster groups, and lower average functional similarity for inter-family and inter-cluster miRNA pairs. In addition, smaller p-values are achieved when applying the Wilcoxon rank-sum test and Kruskal-Wallis test to different miRNA groups. The relationship between miRNA functional similarity and other information sources is exhibited. Furthermore, the miRNA functional network constructed on the basis of MFSP is a scale-free and small-world network. Moreover, the higher AUC for miRNA-disease prediction indicates the ability of MFSP to uncover miRNA functional similarity.
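As a contrast to the path-based MFSP measure, the simplest association-based proxy scores two miRNAs by the overlap of their associated disease sets. This Jaccard sketch is illustrative only and is not the method proposed in the paper.

```python
def jaccard_similarity(diseases_a, diseases_b):
    """Crude functional-similarity proxy for two miRNAs: Jaccard overlap of
    their associated disease sets (not the paper's path-based MFSP measure)."""
    a, b = set(diseases_a), set(diseases_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Path-based measures such as MFSP improve on this by exploiting the structure between diseases rather than treating associations as a flat set.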
Quasi-coarse-grained dynamics: modelling of metallic materials at mesoscales
NASA Astrophysics Data System (ADS)
Dongare, Avinash M.
2014-12-01
A computationally efficient modelling method called quasi-coarse-grained dynamics (QCGD) is developed to expand the capabilities of molecular dynamics (MD) simulations to model behaviour of metallic materials at the mesoscales. This mesoscale method is based on solving the equations of motion for a chosen set of representative atoms from an atomistic microstructure and using scaling relationships for the atomic-scale interatomic potentials in MD simulations to define the interactions between representative atoms. The scaling relationships retain the atomic-scale degrees of freedom and therefore energetics of the representative atoms as would be predicted in MD simulations. The total energetics of the system is retained by scaling the energetics and the atomic-scale degrees of freedom of these representative atoms to account for the missing atoms in the microstructure. This scaling of the energetics renders improved time steps for the QCGD simulations. The success of the QCGD method is demonstrated by the prediction of the structural energetics, high-temperature thermodynamics, deformation behaviour of interfaces, phase transformation behaviour, plastic deformation behaviour, heat generation during plastic deformation, as well as the wave propagation behaviour, as would be predicted using MD simulations for a reduced number of representative atoms. The reduced number of atoms and the improved time steps enables the modelling of metallic materials at the mesoscale in extreme environments.
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2016-09-01
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
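The pairwise-ranking view of AUC used by maximum-AUC training can be made concrete: empirical AUC is the fraction of correctly ordered (positive, negative) score pairs, and replacing the hard 0/1 comparison with a smooth surrogate makes it differentiable. The squared-hinge surrogate below is an illustrative choice, not necessarily the polynomial approximation used in DeepCNF.

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) score pairs ranked correctly;
    ties count half."""
    sp = np.asarray(scores_pos, dtype=float)
    sn = np.asarray(scores_neg, dtype=float)
    diff = sp[:, None] - sn[None, :]           # all pairwise score margins
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def pairwise_surrogate_loss(scores_pos, scores_neg):
    """Smooth pairwise surrogate for 1 - AUC (a squared hinge on the score
    margin), whose gradient can drive a gradient-based optimizer."""
    sp = np.asarray(scores_pos, dtype=float)
    sn = np.asarray(scores_neg, dtype=float)
    diff = sp[:, None] - sn[None, :]
    return float(np.mean(np.square(np.maximum(0.0, 1.0 - diff))))
```

Minimizing the surrogate pushes every positive score at least a margin above every negative score, which is exactly the ordering AUC rewards, independent of the class imbalance.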
Maltarollo, Vinícius G; Homem-de-Mello, Paula; Honorio, Káthia M
2011-10-01
Current research on treatments for metabolic diseases involves a class of biological receptors called peroxisome proliferator-activated receptors (PPARs), which control the metabolism of carbohydrates and lipids. A subclass of these receptors, PPARδ, regulates several metabolic processes, and the substances that activate them are being studied as new drug candidates for the treatment of diabetes mellitus and metabolic syndrome. In this study, several PPARδ agonists with experimental biological activity were selected for a structural and chemical study. Electronic, stereochemical, lipophilic and topological descriptors were calculated for the selected compounds using various theoretical methods, such as density functional theory (DFT). Fisher's weight and principal components analysis (PCA) methods were employed to select the most relevant variables for this study. The partial least squares (PLS) method was used to construct the multivariate statistical model, and the best model obtained had 4 PCs, q² = 0.80 and r² = 0.90, indicating a good internal consistency. The prediction residues calculated for the compounds in the test set had low values, indicating the good predictive capability of our PLS model. The model obtained in this study is reliable and can be used to predict the biological activity of new untested compounds. Docking studies have also confirmed the importance of the molecular descriptors selected for this system.
Tang, Hua; Chen, Wei; Lin, Hao
2016-04-01
Immunoglobulins, also called antibodies, are a group of cell surface proteins which are produced by the immune system in response to the presence of a foreign substance (called an antigen). They play key roles in many medical, diagnostic and biotechnological applications. Correct identification of immunoglobulins is crucial to the comprehension of humoral immune function. With the avalanche of protein sequences identified in the postgenomic age, it is highly desirable to develop computational methods to timely identify immunoglobulins. In view of this, we designed a predictor called "IGPred" by formulating protein sequences with the pseudo amino acid composition into which nine physiochemical properties of amino acids were incorporated. Jackknife cross-validated results showed that 96.3% of immunoglobulins and 97.5% of non-immunoglobulins can be correctly predicted, indicating that IGPred holds very high potential to become a useful tool for antibody analysis. For the convenience of most experimental scientists, a web-server for IGPred was established at http://lin.uestc.edu.cn/server/IGPred. We believe that the web-server will become a powerful tool to study immunoglobulins and to guide related experimental validations.
Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng
2014-01-01
Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA) and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used a ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154
Analysis of the origin of predictability in human communications
NASA Astrophysics Data System (ADS)
Zhang, Lin; Liu, Yani; Wu, Ye; Xiao, Jinghua
2014-01-01
Human behaviors in daily life can be traced by their communications via electronic devices. E-mails, short messages and cell-phone calls can be used to investigate the predictability of communication partners’ patterns, because these three are the most representative and common behaviors in daily communications. In this paper, we show that all the three manners have apparent predictability in partners’ patterns, and moreover, the short message users’ sequences have the highest predictability among the three. We also reveal that people with fewer communication partners have higher predictability. Finally, we investigate the origin of predictability, which comes from two aspects: one is the intrinsic pattern in the partners sequence, that is, people have the preference of communicating with a fixed partner after another fixed one. The other aspect is the burst, which is communicating with the same partner several times in a row. The high burst in short message communication pattern is one of the main reasons for its high predictability, the intrinsic pattern in e-mail partners sequence is the main reason for its predictability, and the predictability of cell-phone call partners sequence comes from both aspects.
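The two sources of predictability identified above, the intrinsic sequential preference and bursts of repeated contacts, can each be quantified directly from a partner sequence. A minimal sketch (the entropy here is the plain Shannon entropy of the partner distribution, an assumption, since the paper does not specify its exact estimator):

```python
from collections import Counter
import math

def partner_entropy(seq):
    """Shannon entropy (bits) of the partner distribution in a contact
    sequence; lower entropy means a more predictable set of partners."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def burstiness(seq):
    """Fraction of contacts that repeat the immediately preceding partner --
    the 'same partner several times in a row' effect."""
    return sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
```

On this view, short messages score high on `burstiness`, while e-mail predictability shows up as structure beyond what `partner_entropy` alone captures.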
Akdenur, B; Okkesum, S; Kara, S; Günes, S
2009-11-01
In this study, electromyography signals sampled from children undergoing orthodontic treatment were used to estimate the effect of an orthodontic trainer on the anterior temporal muscle. A novel data normalization method, called the correlation- and covariance-supported normalization method (CCSNM), based on correlation and covariance between features in a data set, is proposed to provide predictive guidance to the orthodontic technique. The method was tested in two stages: first, data normalization using the CCSNM; second, prediction of normalized values of anterior temporal muscles using an artificial neural network (ANN) with a Levenberg-Marquardt learning algorithm. The data set consists of electromyography signals from right anterior temporal muscles, recorded from 20 children aged 8-13 years with class II malocclusion. The signals were recorded at the start and end of a 6-month treatment. In order to train and test the ANN, two-fold cross-validation was used. The CCSNM was compared with four normalization methods: minimum-maximum normalization, z score, decimal scaling, and line base normalization. To assess the performance of the proposed method, several prevalent performance measures were examined: the mean square error and mean absolute error as mathematical measures, the statistical relation factor R2, and the average deviation. The results show that the CCSNM outperformed the other normalization methods for estimating the effect of the trainer.
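Three of the baseline normalizations CCSNM was compared against are standard and can be sketched in a few lines (CCSNM itself depends on inter-feature correlation and covariance and is not reproduced here):

```python
import numpy as np

def min_max(x):
    """Minimum-maximum normalization: rescale to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    """z-score normalization: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def decimal_scaling(x):
    """Decimal scaling: divide by 10**j so all magnitudes fall below 1."""
    j = int(np.ceil(np.log10(np.abs(x).max())))
    return x / (10.0 ** j)
```

Which normalization is best depends on the downstream learner; the paper's point is that a normalization informed by inter-feature structure can beat these feature-wise rescalings for ANN prediction.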
PSSMHCpan: a novel PSSM-based software for predicting class I peptide-HLA binding affinity
Liu, Geng; Li, Dongli; Li, Zhang; Qiu, Si; Li, Wenhui; Chao, Cheng-chi; Yang, Naibo; Li, Handong; Cheng, Zhen; Song, Xin; Cheng, Le; Zhang, Xiuqing; Wang, Jian; Yang, Huanming
2017-01-01
Predicting peptide binding affinity with human leukocyte antigen (HLA) is a crucial step in developing powerful antitumor vaccines for cancer immunotherapy. Currently available methods work quite well in predicting peptide binding affinity with HLA alleles such as HLA-A*0201, HLA-A*0101, and HLA-B*0702 in terms of sensitivity and specificity. However, quite a few types of HLA alleles that are present in the majority of human populations including HLA-A*0202, HLA-A*0203, HLA-A*6802, HLA-B*5101, HLA-B*5301, HLA-B*5401, and HLA-B*5701 still cannot be predicted with satisfactory accuracy using currently available methods. Furthermore, currently the most popularly used methods for predicting peptide binding affinity are inefficient in identifying neoantigens from a large quantity of whole genome and transcriptome sequencing data. Here we present a Position Specific Scoring Matrix (PSSM)-based software called PSSMHCpan to accurately and efficiently predict peptide binding affinity with a broad coverage of HLA class I alleles. We evaluated the performance of PSSMHCpan by analyzing 10-fold cross-validation on a training database containing 87 HLA alleles and obtained an average area under receiver operating characteristic curve (AUC) of 0.94 and accuracy (ACC) of 0.85. In an independent dataset (Peptide Database of Cancer Immunity) evaluation, PSSMHCpan is substantially better than the popularly used NetMHC-4.0, NetMHCpan-3.0, PickPocket, Nebula, and SMM with a sensitivity of 0.90, as compared to 0.74, 0.81, 0.77, 0.24, and 0.79. In addition, PSSMHCpan is more than 197 times faster than NetMHC-4.0, NetMHCpan-3.0, PickPocket, sNebula, and SMM when predicting neoantigens from 661 263 peptides from a breast tumor sample. Finally, we built a neoantigen prediction pipeline and identified 117 017 neoantigens from 467 cancer samples of various cancers from TCGA.
PSSMHCpan is superior to the currently available methods in predicting peptide binding affinity with a broad coverage of HLA class I alleles. PMID:28327987
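A PSSM-based predictor of the kind described reduces, at its core, to a per-position log-odds lookup summed over the peptide. The sketch below illustrates that scoring step only; the three-position matrix, reduced alphabet, and values are hypothetical placeholders, not PSSMHCpan's trained parameters:

```python
def score_peptide(peptide, pssm):
    """Sum per-position log-odds scores for a peptide against a PSSM.

    `pssm` is a list with one entry per peptide position, each a dict
    mapping an amino-acid letter to its log-odds score.
    """
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match PSSM length")
    return sum(pssm[i][aa] for i, aa in enumerate(peptide))

# Toy 3-position PSSM over a reduced alphabet (illustrative values only).
pssm = [
    {"A": 1.0, "L": 0.2, "K": -0.5},
    {"A": -0.3, "L": 1.5, "K": 0.1},
    {"A": 0.4, "L": -0.2, "K": 2.0},
]
print(score_peptide("ALK", pssm))  # 1.0 + 1.5 + 2.0 = 4.5
```

Higher scores would correspond to stronger predicted binding; a real tool calibrates such scores against measured affinities.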
Kim, Byoungjip; Kang, Seungwoo; Ha, Jin-Young; Song, Junehwa
2015-07-16
In this paper, we introduce a novel smartphone framework called VisitSense that automatically detects and predicts a smartphone user's place visits from ambient radio to enable behavioral targeting for mobile ads in large shopping malls. VisitSense enables mobile app developers to adopt visit-pattern-aware mobile advertising for shopping mall visitors in their apps. It also benefits mobile users by allowing them to receive highly relevant mobile ads that are aware of their place visit patterns in shopping malls. To achieve this goal, VisitSense employs accurate visit detection and prediction methods. For accurate visit detection, we develop a change-based detection method that takes into consideration the stability change of ambient radio and the mobility change of users. It performs well in large shopping malls, where ambient radio is quite noisy and causes existing algorithms to fail easily. In addition, we propose a causality-based visit prediction model to capture the causality in sequential visit patterns for effective prediction. We developed a VisitSense prototype system and a visit-pattern-aware mobile advertising application based on it. Furthermore, we deployed the system in the COEX Mall, one of the largest shopping malls in Korea, and conducted diverse experiments to show the effectiveness of VisitSense.
Microbial genomic island discovery, visualization and analysis.
Bertelli, Claire; Tilley, Keith E; Brinkman, Fiona S L
2018-06-03
Horizontal gene transfer (also called lateral gene transfer) is a major mechanism for microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc.). As large-scale analyses of microbial genomes increase, such as in genomic epidemiology investigations of infectious disease outbreaks in public health, there is increased appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for routine analysis of microbial genomes in this era of rapid whole-genome sequencing. An assessment is provided of 20 GI prediction software methods that use sequence-composition bias to identify GIs, using a reference GI data set from 104 genomes obtained using an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi
2007-02-15
Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
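The averaging step at the heart of the McCaskill-MEA method can be sketched as follows. The greedy pair selection here is a simplified stand-in for the full maximum-expected-accuracy dynamic program, and the matrices hold toy values, not real base pairing probabilities:

```python
def consensus_bpp(matrices):
    """Element-wise average of per-sequence base-pairing probability
    matrices (already mapped to alignment columns)."""
    n = len(matrices)
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(size)]
            for i in range(size)]

def greedy_pairs(bpp, threshold=0.5):
    """Greedily pick non-conflicting base pairs whose averaged
    probability exceeds `threshold`; a simplified stand-in for the
    full maximum-expected-accuracy dynamic program."""
    size = len(bpp)
    candidates = [(bpp[i][j], i, j)
                  for i in range(size) for j in range(i + 1, size)]
    pairs, used = [], set()
    for p, i, j in sorted(candidates, reverse=True):
        if p > threshold and i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs

# Two toy 4x4 per-sequence matrices: both support pairing columns 0 and 3.
m1 = [[0.0] * 4 for _ in range(4)]; m1[0][3] = 0.8
m2 = [[0.0] * 4 for _ in range(4)]; m2[0][3] = 1.0
avg = consensus_bpp([m1, m2])
print(greedy_pairs(avg))  # [(0, 3)]
```

Averaging before prediction is what makes the method robust: a pairing signal present in most sequences survives even if the alignment misplaces it in a few.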
Patel, Meenal J; Andreescu, Carmen; Price, Julie C; Edelman, Kathryn L; Reynolds, Charles F; Aizenstein, Howard J
2015-10-01
Currently, depression diagnosis relies primarily on behavioral symptoms and signs, and treatment is guided by trial and error instead of evaluating associated underlying brain characteristics. Unlike past studies, we attempted to estimate accurate prediction models for late-life depression diagnosis and treatment response using multiple machine learning methods with inputs of multi-modal imaging and non-imaging whole brain and network-based features. Late-life depression patients (medicated post-recruitment) (n = 33) and older non-depressed individuals (n = 35) were recruited. Their demographics and cognitive ability scores were recorded, and brain characteristics were acquired using multi-modal magnetic resonance imaging pretreatment. Linear and nonlinear learning methods were tested for estimating accurate prediction models. A learning method called alternating decision trees estimated the most accurate prediction models for late-life depression diagnosis (87.27% accuracy) and treatment response (89.47% accuracy). The diagnosis model included measures of age, Mini-Mental State Examination score, and structural imaging (e.g., whole brain atrophy and global white matter hyperintensity burden). The treatment response model included measures of structural and functional connectivity. Combinations of multi-modal imaging and/or non-imaging measures may help better predict late-life depression diagnosis and treatment response. As a preliminary observation, we speculate that the results may also suggest that different underlying brain characteristics defined by multi-modal imaging measures, rather than region-based differences, are associated with depression versus depression recovery, because to our knowledge this is the first depression study to accurately predict both using the same approach. These findings may help better understand late-life depression and identify preliminary steps toward personalized late-life depression treatment. 
Copyright © 2015 John Wiley & Sons, Ltd.
ESIF Call for High-Impact Integrated Projects | Energy Systems Integration
As a U.S. Department of Energy user facility, the Energy Systems Integration Facility supports development of the concepts, tools, and technologies needed to measure, analyze, predict, protect, and control the grid of the future.
ERIC Educational Resources Information Center
Mishara, Brian L.; Giroux, Guy
1993-01-01
Examined stress perceived by telephone intervention volunteers (N=80) at suicide prevention center. Only amount of experience in telephone intervention with suicidal persons predicted stress level before shift. Stress during high-urgency call was related to level of urgency of call; total length of all calls received; and coping mechanisms of…
ERIC Educational Resources Information Center
Poteat, V. Paul; O'Dwyer, Laura M.; Mereish, Ethan H.
2012-01-01
This longitudinal study tested for changes in how students used and were called homophobic epithets as they progressed through high school. Boys used and were called these epithets with increased frequency over time, whereas girls reported decreases on both. Distinct gender socialization processes may contribute to these different patterns for…
TarPmiR: a new approach for microRNA target site prediction.
Ding, Jun; Li, Xiaoman; Hu, Haiyan
2016-09-15
The identification of microRNA (miRNA) target sites is fundamentally important for studying gene regulation. There are dozens of computational methods available for miRNA target site prediction. Despite their existence, we still cannot reliably identify miRNA target sites, partially due to our limited understanding of the characteristics of miRNA target sites. The recently published CLASH (crosslinking ligation and sequencing of hybrids) data provide an unprecedented opportunity to study the characteristics of miRNA target sites and improve miRNA target site prediction methods. Applying four different machine learning approaches to the CLASH data, we identified seven new features of miRNA target sites. Combining these new features with those commonly used by existing miRNA target prediction algorithms, we developed an approach called TarPmiR for miRNA target site prediction. Testing on two human and one mouse non-CLASH datasets, we showed that TarPmiR predicted more than 74.2% of true miRNA target sites in each dataset. Compared with three existing approaches, we demonstrated that TarPmiR is superior to these existing approaches in terms of better recall and better precision. The TarPmiR software is freely available at http://hulab.ucf.edu/research/projects/miRNA/TarPmiR/. Contacts: haihu@cs.ucf.edu or xiaoman@mail.ucf.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Wagenbrenner, N. S.; Forthofer, J.; Butler, B.; Shannon, K.
2014-12-01
Near-surface wind predictions are important for a number of applications, including transport and dispersion, wind energy forecasting, and wildfire behavior. Researchers and forecasters would benefit from a wind model that could be readily applied to complex terrain for use in these various disciplines. Unfortunately, near-surface winds in complex terrain are not handled well by traditional modeling approaches. Numerical weather prediction models employ coarse horizontal resolutions which do not adequately resolve sub-grid terrain features important to the surface flow. Computational fluid dynamics (CFD) models are increasingly being applied to simulate atmospheric boundary layer (ABL) flows, especially in wind energy applications; however, the standard functionality provided in commercial CFD models is not suitable for ABL flows. Appropriate CFD modeling in the ABL requires modification of empirically-derived wall function parameters and boundary conditions to avoid erroneous streamwise gradients due to inconsistencies between inlet profiles and specified boundary conditions. This work presents a new version of a near-surface wind model for complex terrain called WindNinja. The new version of WindNinja offers two options for flow simulations: 1) the native, fast-running mass-consistent method available in previous model versions and 2) a CFD approach based on the OpenFOAM modeling framework and optimized for ABL flows. The model is described and evaluations of predictions with surface wind data collected from two recent field campaigns in complex terrain are presented. A comparison of predictions from the native mass-consistent method and the new CFD method is also provided.
Henry, Jade Vu; Magruder, S; Snyder, M
2004-09-24
Kaiser Permanente of the Mid-Atlantic States (KPMAS) is collaborating with the Electronic Surveillance System for Early Notification of Community-Based Epidemics II (ESSENCE II) program to understand how managed-care data can be effectively used for syndromic surveillance. This study examined whether KPMAS nurse advice hotline data would be able to predict the syndrome diagnoses made during subsequent KPMAS office visits. All nurse advice hotline calls during 2002 that were linked to an outpatient office visit were identified. By using International Classification of Diseases, Ninth Revision (ICD-9) codes, outpatient visits were categorized into seven ESSENCE II syndrome groups (coma, gastrointestinal, respiratory, neurologic, hemorrhagic, infectious dermatologic, and fever). Nurse advice hotline calls were categorized into ESSENCE II syndrome groups on the basis of the advice guidelines assigned. For each syndrome group, the sensitivity, specificity, and positive predictive value of hotline calls were calculated by using office visits as a diagnostic standard. For matching syndrome call-visit pairs, the lag (i.e., the number of hours that elapsed between the date and time the patient spoke to an advice nurse and the date and time the patient made an office visit) was calculated. Of all syndrome groups, the sensitivity of hotline calls for respiratory syndrome was highest (74.7%), followed by hotline calls for gastrointestinal syndrome (72.0%). The specificity of all nurse advice syndrome groups ranged from 88.9% to 99.9%. The mean lag between hotline calls and office visits ranged from 8.3 to 50 hours, depending on the syndrome group. The timeliness of hotline data capture compared with office visit data capture, as well as the sensitivity and specificity of hotline calls for detecting respiratory and gastrointestinal syndromes, indicate that KPMAS nurse advice hotline data can be used to predict KPMAS syndromic outpatient office visits.
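The screening metrics used in this study follow the standard 2x2 definitions, with office visits as the diagnostic standard. A minimal sketch, using illustrative counts rather than the study's data:

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value from
    the four cells of a calls-versus-visits 2x2 table."""
    sensitivity = tp / (tp + fn)   # syndromic visits flagged by a call
    specificity = tn / (tn + fp)   # non-syndromic visits with no call
    ppv = tp / (tp + fp)           # flagged calls confirmed by a visit
    return sensitivity, specificity, ppv

# Hypothetical counts for one syndrome group (not the study's data).
sens, spec, ppv = screening_metrics(tp=747, fp=100, fn=253, tn=8900)
print(round(sens, 3), round(spec, 3), round(ppv, 3))
```

The reported 74.7% sensitivity for respiratory syndrome, for example, corresponds to the tp / (tp + fn) cell ratio above.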
[Assessment of cervical intraepithelial neoplasia (CIN) lesions by DNA image cytometry].
Sun, Xiao-rong; Che, Dong-yuan; Tu, Hong-zhang; Li, Dan; Wang, Jian
2006-11-01
To compare the value of conventional cytology and DNA image cytometry (DNA-ICM) assisted cytology in the detection and prognostic assessment of cervical CIN lesions. 87 women were enrolled in this study. Cervical samples were collected employing cervix brushes, which were then washed in Sedfix. After preparing single cell suspensions by a mechanical procedure, cell monolayers were prepared by cyto-spinning the cells onto microscope slides. Two slides were prepared from each case: one slide was stained by Papanicolaou staining for conventional cytology, the other was stained by the Feulgen-Thionin method for measurement of the amount of DNA in the cell nuclei using an automated DNA imaging cytometer. Biopsies from the cervical lesions were also taken for histopathology and Ki-67 immunohistochemistry. Of the total of 20 ASCUS cases called by conventional cytology, no CIN or greater lesions were found. Among the 20 cases, 7 cases did not show any cells with a DNA amount greater than 5c, while CIN2 lesions were found in 11 of the other 13 cases that had some aneuploid cells with a DNA amount greater than 5c. Of 30 LSIL cases called by conventional cytology, CIN2 lesions were detected in 3 out of 7 cases that did not contain any aneuploid cells with DNA greater than 5c, but in 22 out of the other 23 cases that contained aneuploid cells with a DNA amount greater than 5c. Of the remaining 7 cases called HSIL by conventional cytology, all cases contained aneuploid cells with DNA greater than 5c. If cytology were used to refer all cases of LSIL and HSIL to a colposcopy procedure to detect potential CIN2 or greater lesions, the sensitivity, specificity, positive predictive value and negative predictive value were 58.2%, 84.4%, 86.5% and 54.0%, respectively. 
If DNA-ICM were used and all cases having 3 or more cells with a DNA amount greater than 5c were referred to pathology to detect potential CIN2 or greater lesions, the sensitivity, specificity, positive predictive value and negative predictive value were 72.7%, 87.5%, 90.9% and 65.1%, respectively. We also compared Ki-67 positive cells in these samples and found that the DNA-ICM results were comparable to this biomarker method. The study demonstrated that the DNA-ICM approach can be successfully used to detect significant (i.e., CIN2 or greater) lesions, and also provide a prognostic assessment of CIN lesions.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-05-21
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. In the first step, in order to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside a user's personal space by utilizing an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.
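Allen's temporal relations, which underpin the temporal ANN step described above, classify how two time intervals relate to each other. A minimal sketch covering a few of the thirteen relations (the intervals and endpoints are hypothetical examples):

```python
def allen_relation(a, b):
    """Classify a handful of Allen's interval relations between two
    activities a = (start, end) and b = (start, end)."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"        # a ends strictly before b starts
    if e1 == s2:
        return "meets"         # a ends exactly when b starts
    if s1 < s2 and s2 < e1 < e2:
        return "overlaps"      # a starts first, the two overlap
    if s1 >= s2 and e1 <= e2:
        return "during"        # a lies inside b (or equals it)
    return "other"             # remaining relations, omitted here

print(allen_relation((1, 3), (3, 5)))  # meets
print(allen_relation((1, 4), (3, 6)))  # overlaps
print(allen_relation((3, 4), (2, 6)))  # during
```

In an activity model, such relation labels between clustered activity intervals become the temporal features fed to the network.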
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore the effects of interaction among genes and the effects of higher-order nonlinearities. The deep belief network (DBN), one of the architectures in deep learning, is able to model data at a high level of abstraction that captures the nonlinear effects of the data. This study implemented a DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquired from CIMMYT's (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN outperformed the other methods, reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), in the case of allegedly non-additive traits. DBN achieves a correlation of 0.579 on the -1 to 1 scale.
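The Pearson correlation used to compare the methods is straightforward to compute between observed and predicted trait values. A minimal sketch with made-up numbers:

```python
import math

def pearson(x, y):
    """Pearson correlation between observed and predicted trait values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical observed vs. model-predicted trait values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(pearson(observed, predicted), 3))
```

A value near 1 means the model's ranking and scaling of individuals closely track the phenotypes, which is the criterion the study reports.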
Zhou, Hongyi; Skolnick, Jeffrey
2010-01-01
In this work, we develop a method called FTCOM for assessing the global quality of protein structural models for targets of medium and hard difficulty (remote homology) produced by structure prediction approaches such as threading or ab initio structure prediction. FTCOM requires the Cα coordinates of full length models and assesses model quality based on fragment comparison and a score derived from comparison of the model to top threading templates. On a set of 361 medium/hard targets, FTCOM was applied to and assessed for its ability to improve upon the results from the SP3, SPARKS, PROSPECTOR_3, and PRO-SP3-TASSER threading algorithms. The average TM-score improves by 5%–10% for the first selected model by the new method over models obtained by the original selection procedure in the respective threading methods. Moreover, the number of foldable targets (TM-score ≥0.4) increases from at least 7.6% for SP3 to 54% for SPARKS. Thus, FTCOM is a promising approach to template selection. PMID:20455261
Cejka, Pavel; Culík, Jiří; Horák, Tomáš; Jurková, Marie; Olšovská, Jana
2013-12-26
The rate of beer aging is affected by storage conditions, chiefly time and temperature. Although bottled beer is commonly stored for up to 1 year, sensory damage is quite frequent. Therefore, a method for retrospective determination of the storage temperature of beer was developed. The method is based on the determination of selected carbonyl compounds called "aging indicators", which are formed during beer aging. The aging indicators were determined using GC-MS after precolumn derivatization with O-(2,3,4,5,6-pentafluorobenzyl)hydroxylamine hydrochloride, and their profile was correlated with the development of old flavor evolving under defined conditions (temperature, time) using a mathematical and statistical apparatus. Three approaches, including calculation from a regression graph, multiple linear regression, and neural networks, were employed. The ultimate uncertainty of the method ranged from 3.0 to 11.0 °C depending on the approach used. Furthermore, the assay was extended to include prediction of a beer's tendency to sensory aging from freshly bottled beer.
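The "calculation from a regression graph" approach amounts to calibrating a least-squares line from indicator concentration to known storage temperature and applying it to new samples. A minimal sketch, assuming a single indicator and hypothetical calibration data:

```python
def fit_line(x, y):
    """Least-squares line y ≈ a + b*x relating one aging indicator's
    concentration (x) to known storage temperature (y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical calibration: indicator concentration vs. storage temp (°C).
conc = [0.2, 0.4, 0.6, 0.9]
temp = [5.0, 12.0, 20.0, 30.0]
a, b = fit_line(conc, temp)

def predict_temperature(c):
    """Retrospective storage-temperature estimate for a new sample."""
    return a + b * c

print(round(predict_temperature(0.5), 1))
```

The paper's multivariate and neural-network variants generalize this idea to the full profile of several indicators at once.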
Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics
NASA Astrophysics Data System (ADS)
Junghans, Christoph; Mniszewski, Susan; Voter, Arthur; Perez, Danny; Eidenbenz, Stephan
2014-03-01
We present an example of a new class of tools that we call application simulators, parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation (PDES). We demonstrate our approach with a TADSim application simulator that models the Temperature Accelerated Dynamics (TAD) method, which is an algorithmically complex member of the Accelerated Molecular Dynamics (AMD) family. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We further extend TADSim to model algorithm extensions to standard TAD, such as speculative spawning of the compute-bound stages of the algorithm, and predict performance improvements without having to implement such a method. Focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights into the TAD algorithm behavior and suggested extensions to the TAD method.
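A discrete event simulation of the kind TADSim builds on reduces, in its serial core, to a clock-ordered event queue whose handlers schedule future events. The toy loop below sketches that idea only; the "stage" event and its 2-unit duration are invented for illustration, not TADSim's actual event types or timings:

```python
import heapq

def run_des(initial_events, handlers, t_end):
    """Minimal discrete-event loop: pop the earliest event, let its
    handler schedule follow-up events, repeat until the clock passes
    t_end or the queue drains."""
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue:
        t, kind = heapq.heappop(queue)
        if t > t_end:
            break
        log.append((t, kind))
        # A handler maps the current time to (delay, event_kind) pairs.
        for dt, new_kind in handlers.get(kind, lambda t: [])(t):
            heapq.heappush(queue, (t + dt, new_kind))
    return log

# Toy model: a compute "stage" that takes 2 time units and respawns itself.
handlers = {"stage": lambda t: [(2.0, "stage")]}
log = run_des([(0.0, "stage")], handlers, t_end=5.0)
print(log)  # [(0.0, 'stage'), (2.0, 'stage'), (4.0, 'stage')]
```

An application simulator replaces the real computation with such parameterized delays, which is why it can scan parameter choices far faster than the full code.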
Sufficient Forecasting Using Factor Models
Fan, Jianqing; Xue, Lingzhou; Yao, Jiawei
2017-01-01
We consider forecasting a single time series when there is a large number of predictors and a possible nonlinear effect. The dimensionality was first reduced via a high-dimensional (approximate) factor model implemented by the principal component analysis. Using the extracted factors, we develop a novel forecasting method called the sufficient forecasting, which provides a set of sufficient predictive indices, inferred from high-dimensional predictors, to deliver additional predictive power. The projected principal component analysis will be employed to enhance the accuracy of inferred factors when a semi-parametric (approximate) factor model is assumed. Our method is also applicable to cross-sectional sufficient regression using extracted factors. The connection between the sufficient forecasting and the deep learning architecture is explicitly stated. The sufficient forecasting correctly estimates projection indices of the underlying factors even in the presence of a nonparametric forecasting function. The proposed method extends the sufficient dimension reduction to high-dimensional regimes by condensing the cross-sectional information through factor models. We derive asymptotic properties for the estimate of the central subspace spanned by these projection directions as well as the estimates of the sufficient predictive indices. We further show that the natural method of running multiple regression of target on estimated factors yields a linear estimate that actually falls into this central subspace. Our method and theory allow the number of predictors to be larger than the number of observations. We finally demonstrate that the sufficient forecasting improves upon the linear forecasting in both simulation studies and an empirical study of forecasting macroeconomic variables. PMID:29731537
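The first step, extracting factors from a high-dimensional predictor panel by principal component analysis, can be sketched as an SVD of the demeaned panel. The synthetic one-factor data below are illustrative only, not part of the paper's empirical study:

```python
import numpy as np

rng = np.random.default_rng(0)
# Panel of p = 50 predictors over T = 200 periods driven by one latent factor.
T, p = 200, 50
factor = rng.standard_normal(T)
loadings = rng.standard_normal(p)
X = np.outer(factor, loadings) + 0.1 * rng.standard_normal((T, p))

# Extract the leading factor by PCA (SVD of the demeaned panel).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
f_hat = U[:, 0] * S[0]  # estimated factor series

# The estimate should track the true factor up to sign and scale.
corr = np.corrcoef(f_hat, factor)[0, 1]
print(round(abs(corr), 2))
```

The sufficient forecasting then seeks predictive indices, linear combinations of such extracted factors, rather than regressing the target on the raw high-dimensional predictors.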
Guan, Hongjun; Dai, Zongli; Zhao, Aiwu; He, Jie
2018-01-01
In this paper, we propose a hybrid method to forecast stock prices, called the High-order-fuzzy-fluctuation-Trends-based Back Propagation (HTBP) neural network model. First, we compare each value of the historical training data with the previous day's value to obtain a fluctuation trend time series (FTTS). On this basis, the FTTS is fuzzified into a fuzzy time series (FFTS) based on the amplitude and direction of the fluctuations (increasing, equal, or decreasing). Since the relationship between the FFTS and future fluctuation trends is nonlinear, the HTBP neural network algorithm is used to find the mapping rules through self-learning. Finally, the results output by the algorithm are used to predict future fluctuations. The proposed model provides some innovative features: (1) It combines fuzzy set theory and a neural network algorithm to avoid the overfitting problems that exist in traditional models. (2) The BP neural network algorithm can intelligently explore the internal rules of actually occurring sequential data, without the need to analyze the influence factors of specific rules and the path of action. (3) The hybrid model can reasonably remove noise from the internal rules by proper fuzzy treatment. This paper takes the TAIEX data set of the Taiwan stock exchange as an example, and compares and analyzes the prediction performance of the model. The experimental results show that this method can predict the stock market in a very simple way. At the same time, we use this method to predict the Shanghai stock exchange composite index, and further verify the effectiveness and universality of the method. PMID:29420584
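The first two steps, building the fluctuation trend time series and fuzzifying it by amplitude and direction, can be sketched as below; the thresholds and prices are illustrative, not the paper's parameters:

```python
def fluctuation_trends(prices):
    """Day-over-day fluctuation trend series: each value minus the
    previous day's value (the FTTS)."""
    return [b - a for a, b in zip(prices, prices[1:])]

def fuzzify(trend, up=0.5, down=-0.5):
    """Map a numeric fluctuation to a coarse fuzzy label by amplitude
    and direction (thresholds are illustrative)."""
    if trend > up:
        return "up"
    if trend < down:
        return "down"
    return "equal"

prices = [100.0, 101.2, 101.3, 99.8]
ftts = fluctuation_trends(prices)
print([fuzzify(t) for t in ftts])  # ['up', 'equal', 'down']
```

The resulting fuzzy label sequence is what the BP network consumes to learn high-order mappings from recent trends to the next fluctuation.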
Ramirez, Magaly; Wu, Shinyi; Jin, Haomiao; Ell, Kathleen; Gross-Schulman, Sandra; Myerchin Sklaroff, Laura; Guterman, Jeffrey
2016-01-25
Remote patient monitoring is increasingly integrated into health care delivery to expand access and increase effectiveness. Automation can add efficiency to remote monitoring, but patient acceptance of automated tools is critical for success. From 2010 to 2013, the Diabetes-Depression Care-management Adoption Trial (DCAT)-a quasi-experimental comparative effectiveness research trial aimed at accelerating the adoption of collaborative depression care in a safety-net health care system-tested a fully automated telephonic assessment (ATA) depression monitoring system serving low-income patients with diabetes. The aim of this study was to determine patient acceptance of ATA calls over time, and to identify factors predicting long-term patient acceptance of ATA calls. We conducted two analyses using data from the DCAT technology-facilitated care arm, in which for 12 months the ATA system periodically assessed depression symptoms, monitored treatment adherence, prompted self-care behaviors, and inquired about patients' needs for provider contact. Patients received assessments at 6, 12, and 18 months using Likert-scale measures of willingness to use ATA calls, preferred mode of reach, perceived ease of use, usefulness, nonintrusiveness, privacy/security, and long-term usefulness. For the first analysis (patient acceptance over time), we computed descriptive statistics of these measures. In the second analysis (predictive factors), we collapsed patients into two groups: those reporting "high" versus "low" willingness to use ATA calls. To compare them, we used independent t tests for continuous variables and Pearson chi-square tests for categorical variables. Next, we jointly entered independent factors found to be significantly associated with 18-month willingness to use ATA calls at the univariate level into a logistic regression model with backward selection to identify predictive factors. 
We performed a final logistic regression model with the identified significant predictive factors and reported the odds ratio estimates and 95% confidence intervals. At 6 and 12 months, respectively, 89.6% (69/77) and 63.7% (49/77) of patients "agreed" or "strongly agreed" that they would be willing to use ATA calls in the future. At 18 months, 51.0% (64/125) of patients perceived ATA calls as useful and 59.7% (46/77) were willing to use the technology. Moreover, in the first 6 months, most patients reported that ATA calls felt private/secure (75.9%, 82/108) and were easy to use (86.2%, 94/109), useful (65.1%, 71/109), and nonintrusive (87.2%, 95/109). Perceived usefulness, however, decreased to 54.1% (59/109) in the second 6 months of the trial. Factors predicting willingness to use ATA calls at the 18-month follow-up were perceived privacy/security and long-term perceived usefulness of ATA calls. No patient characteristics were significant predictors of long-term acceptance. In the short term, patients are generally accepting of ATA calls for depression monitoring, with ATA call design and the care management intervention being primary factors influencing patient acceptance. Acceptance over the long term requires that the system be perceived as private/secure, and that it be constantly useful for patients' needs of awareness of feelings, self-care reminders, and connectivity with health care providers. ClinicalTrials.gov NCT01781013; https://clinicaltrials.gov/ct2/show/NCT01781013 (Archived by WebCite at http://www.webcitation.org/6e7NGku56).
Using Poison Center Exposure Calls to Predict Methadone Poisoning Deaths
Dasgupta, Nabarun; Davis, Jonathan; Jonsson Funk, Michele; Dart, Richard
2012-01-01
Purpose: There are more drug overdose deaths in the United States than motor vehicle fatalities. Yet the US vital statistics reporting system is of limited value because the data are delayed by four years. Poison centers report data within an hour of the event, but previous studies suggested that only a small proportion of poisoning deaths are reported to poison centers (PCs). In an era of improved electronic surveillance capabilities, exposure calls to PCs may be an alternate indicator of trends in overdose mortality. Methods: We used PC call counts for methadone that were reported to the Researched Abuse, Diversion and Addiction-Related Surveillance (RADARS®) System in 2006 and 2007. US death certificate data were used to identify deaths due to methadone. Linear regression was used to quantify the relationship between deaths and poison center calls. Results: Compared to decedents, poison center callers tended to be younger, more often female, at home, and less likely to require medical attention. A strong association was found between PC calls and methadone mortality (b = 0.88, se = 0.42, t = 9.5, df = 1, p<0.0001, R2 = 0.77). These findings were robust to large changes in a sensitivity analysis assessing the impact of underreporting of methadone overdose deaths. Conclusions: Our results suggest that calls to poison centers for methadone are correlated with poisoning mortality as identified on death certificates. Calls received by poison centers may be used for timely surveillance of mortality due to methadone. In the midst of the prescription opioid overdose epidemic, electronic surveillance tools that report in real time are powerful public health tools. PMID:22829925
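The linear regression relating call counts to deaths can be sketched with a plain least-squares fit; the counts below are hypothetical, not RADARS or vital statistics data:

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of deaths (y) on poison-center
    call counts (x): y ≈ a + b*x. Returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical per-period counts (not the study's data).
calls = [10, 20, 30, 40]
deaths = [12, 21, 29, 41]
a, b = ols_fit(calls, deaths)
print(round(a, 2), round(b, 2))  # 2.0 0.95
```

A slope near 1 with a high R² is the pattern that would let near-real-time call counts stand in for the years-delayed mortality data.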
Inter-species pathway perturbation prediction via data-driven detection of functional homology.
Hafemeister, Christoph; Romero, Roberto; Bilal, Erhan; Meyer, Pablo; Norel, Raquel; Rhrissorrakrai, Kahn; Bonneau, Richard; Tarca, Adi L
2015-02-15
Experiments in animal models are often conducted to infer how humans will respond to stimuli by assuming that the same biological pathways will be affected in both organisms. The limitations of this assumption were tested in the IMPROVER Species Translation Challenge, where 52 stimuli were applied to both human and rat cells and perturbed pathways were identified. In the Inter-species Pathway Perturbation Prediction sub-challenge, multiple teams proposed methods to use rat transcription data from 26 stimuli to predict human gene set and pathway activity under the same perturbations. Submissions were evaluated using three performance metrics on data from the remaining 26 stimuli. We present two approaches, ranked second in this challenge, that do not rely on sequence-based orthology between rat and human genes to translate pathway perturbation state but instead identify transcriptional response orthologs across a set of training conditions. The translation from rat to human accomplished by these so-called direct methods is not dependent on the particular analysis method used to identify perturbed gene sets. In contrast, machine learning-based methods require performing a pathway analysis initially and then mapping the pathway activity between organisms. Unlike most machine learning approaches, direct methods can be used to predict the activation of a human pathway for a new (test) stimulus, even when that pathway was never activated by a training stimulus. Gene expression data are available from ArrayExpress (accession E-MTAB-2091), while software implementations are available from http://bioinformaticsprb.med.wayne.edu?p=50 and http://goo.gl/hJny3h. christoph.hafemeister@nyu.edu or atarca@med.wayne.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation
Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...
2016-11-24
Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.
Bayesian assessment of the expected data impact on prediction confidence in optimal sampling design
NASA Astrophysics Data System (ADS)
Leube, P. C.; Geiges, A.; Nowak, W.
2012-02-01
Incorporating hydro(geo)logical data, such as head and tracer data, into stochastic models of (subsurface) flow and transport helps to reduce prediction uncertainty. Because of financial limitations for investigation campaigns, information needs toward modeling or prediction goals should be satisfied efficiently and rationally. Optimal design techniques find the best one among a set of investigation strategies. They optimize the expected impact of data on prediction confidence or related objectives prior to data collection. We introduce a new optimal design method, called PreDIA(gnosis) (Preposterior Data Impact Assessor). PreDIA derives the relevant probability distributions and measures of data utility within a fully Bayesian, generalized, flexible, and accurate framework. It extends the bootstrap filter (BF) and related frameworks to optimal design by marginalizing utility measures over the yet unknown data values. PreDIA is a strictly formal information-processing scheme free of linearizations. It works with arbitrary simulation tools, provides full flexibility concerning measurement types (linear, nonlinear, direct, indirect), allows for any desired task-driven formulations, and can account for various sources of uncertainty (e.g., heterogeneity, geostatistical assumptions, boundary conditions, measurement values, model structure uncertainty, a large class of model errors) via Bayesian geostatistics and model averaging. Existing methods fail to simultaneously provide these crucial advantages, which our method buys at a relatively higher computational cost. We demonstrate the applicability and advantages of PreDIA over conventional linearized methods in a synthetic example of subsurface transport. In the example, we show that informative data are often invisible to linearized methods that confuse zero correlation with statistical independence. Hence, PreDIA will often lead to substantially better sampling designs.
Finally, we extend our example to specifically highlight the consideration of conceptual model uncertainty.
GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.
Schaffter, Thomas; Marbach, Daniel; Floreano, Dario
2011-08-15
Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
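The precision-recall evaluation mentioned above scores a ranked list of predicted regulatory edges against a gold-standard edge set. A minimal sketch of that metric, with invented gene names and predictions:

```python
# Precision-recall curve over a ranked edge list, as used to evaluate network
# inference predictions against a known (gold-standard) network.
# Edges and ranking below are invented for illustration.

def precision_recall(ranked_edges, true_edges):
    """Return (precision, recall) after each prediction in a ranked edge list."""
    tp, points = 0, []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            tp += 1
        points.append((tp / k, tp / len(true_edges)))  # (precision, recall)
    return points

gold = {("G1", "G2"), ("G1", "G3"), ("G4", "G2")}
predicted = [("G1", "G2"), ("G2", "G4"), ("G1", "G3"), ("G3", "G4"), ("G4", "G2")]
curve = precision_recall(predicted, gold)
```

Averaging precision over the recall points (area under this curve) gives the single summary number typically reported in the DREAM challenges.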
A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction
Spencer, Matt; Eickholt, Jesse; Cheng, Jianlin
2014-01-01
Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80% and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test data set of 198 proteins, achieving a Q3 accuracy of 80.7% and a Sov accuracy of 74.2%. PMID:25750595
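The Q3 score reported above is simply the per-residue three-state accuracy of the secondary-structure assignment. A minimal sketch, with invented predicted and observed structure strings:

```python
# Q3 accuracy: the fraction of residues whose predicted three-state secondary
# structure (H = helix, E = strand, C = coil) matches the observed state.
# The sequences below are invented for illustration.

def q3(predicted, observed):
    assert len(predicted) == len(observed)
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)

pred = "CCHHHHHCCEEEECC"
obs  = "CCHHHHCCCEEEECC"
score = q3(pred, obs)  # 14 of 15 residues agree
```

The Sov score also reported above additionally rewards correct segment overlap rather than per-residue matches, so it is stricter about fragmented predictions.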
Beluga whale, Delphinapterus leucas, vocalizations from the Churchill River, Manitoba, Canada.
Chmelnitsky, Elly G; Ferguson, Steven H
2012-06-01
Classification of animal vocalizations is often done by a human observer using aural and visual analysis but more efficient, automated methods have also been utilized to reduce bias and increase reproducibility. Beluga whale, Delphinapterus leucas, calls were described from recordings collected in the summers of 2006-2008, in the Churchill River, Manitoba. Calls (n=706) were classified based on aural and visual analysis, and call characteristics were measured; calls were separated into 453 whistles (64.2%; 22 types), 183 pulsed/noisy calls (25.9%; 15 types), and 70 combined calls (9.9%; 7 types). Measured parameters varied within each call type but less variation existed in pulsed and noisy call types and some combined call types than in whistles. A more efficient and repeatable hierarchical clustering method was applied to 200 randomly chosen whistles using six call characteristics as variables; twelve groups were identified. Call characteristics varied less in cluster analysis groups than in whistle types described by visual and aural analysis and results were similar to the whistle contours described. This study provided the first description of beluga calls in Hudson Bay and using two methods provides more robust interpretations and an assessment of appropriate methods for future studies.
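The hierarchical clustering described above groups whistles by measured call characteristics. As a sketch of the idea, a single-linkage agglomerative clustering on two invented features per call (this is an illustration, not the authors' six-variable analysis):

```python
# Single-linkage agglomerative clustering of call measurements. Each call is
# reduced to two hypothetical features (start frequency in kHz, duration in s);
# clusters merge greedily by minimum inter-point distance until the requested
# number of groups remains. All feature values are invented.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, n_clusters):
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest minimum inter-point distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters

calls = [(2.1, 0.4), (2.3, 0.5), (7.9, 1.2), (8.2, 1.1), (2.2, 0.45), (8.0, 1.3)]
groups = single_linkage(calls, 2)
```

With clearly separated feature values like these, the low-frequency and high-frequency calls end up in distinct groups, mirroring how cluster-analysis groups showed less internal variation than aurally defined types.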
When is an error not a prediction error? An electrophysiological investigation.
Holroyd, Clay B; Krigolson, Olave E; Baker, Robert; Lee, Seung; Gibson, Jessica
2009-03-01
A recent theory holds that the anterior cingulate cortex (ACC) uses reinforcement learning signals conveyed by the midbrain dopamine system to facilitate flexible action selection. According to this position, the impact of reward prediction error signals on ACC modulates the amplitude of a component of the event-related brain potential called the error-related negativity (ERN). The theory predicts that ERN amplitude is monotonically related to the expectedness of the event: It is larger for unexpected outcomes than for expected outcomes. However, a recent failure to confirm this prediction has called the theory into question. In the present article, we investigated this discrepancy in three trial-and-error learning experiments. All three experiments provided support for the theory, but the effect sizes were largest when an optimal response strategy could actually be learned. This observation suggests that ACC utilizes dopamine reward prediction error signals for adaptive decision making when the optimal behavior is, in fact, learnable.
Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers
García-Gonzalo, Esperanza; Fernández-Muñiz, Zulima; García Nieto, Paulino José; Bernardo Sánchez, Antonio; Menéndez Fernández, Marta
2016-01-01
The mining industry relies heavily on empirical analysis for design and prediction. An empirical design method, called the critical span graph, was developed specifically for rock stability analysis in entry-type excavations, based on an extensive case-history database of cut and fill mining in Canada. This empirical span design chart plots the critical span against rock mass rating for the observed case histories and has been accepted by many mining operations for the initial span design of cut and fill stopes. Different types of analysis have been used to classify the observed cases into stable, potentially unstable and unstable groups. The main purpose of this paper is to present a new method for defining rock stability areas of the critical span graph, which applies machine learning classifiers (support vector machine and extreme learning machine). The results show a reasonable correlation with previous guidelines. These machine learning methods are good tools for developing empirical methods, since they make no assumptions about the regression function. With this software, it is easy to add new field observations to a previous database, improving prediction output with the addition of data that consider the local conditions for each mine. PMID:28773653
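The classifiers above learn a stability boundary on the span-design chart from labelled case histories. As a sketch of the idea, a plain perceptron (a simple stand-in for the SVM/ELM classifiers used in the paper) trained on invented (rock mass rating, span) cases:

```python
# Learning a linear stable/unstable boundary on the critical span graph.
# Each case is (rock mass rating RMR, span in m) with label +1 (stable) or
# -1 (unstable). The case data below are invented for illustration.

def train_perceptron(cases, epochs=50, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (rmr, span), label in cases:
            pred = 1 if w[0] * rmr + w[1] * span + b > 0 else -1
            if pred != label:  # mistake-driven update
                w[0] += lr * label * rmr
                w[1] += lr * label * span
                b += lr * label
    return w, b

cases = [((80, 5), 1), ((75, 8), 1), ((60, 20), -1), ((55, 25), -1),
         ((70, 10), 1), ((50, 30), -1)]
w, b = train_perceptron(cases)

def classify(rmr, span):
    return 1 if w[0] * rmr + w[1] * span + b > 0 else -1
```

As the abstract notes for the machine-learning approach, no regression function is assumed: new field observations can simply be appended to `cases` and the boundary retrained.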
Impact of cyclostationarity on fan broadband noise prediction
NASA Astrophysics Data System (ADS)
Wohlbrandt, A.; Kissner, C.; Guérin, S.
2018-04-01
One of the dominant noise sources of modern Ultra High Bypass Ratio (UHBR) engines is the interaction of the rotor wakes with the leading edges of the stator vanes in the fan stage. While the tonal components of this noise generation mechanism are fairly well understood by now, the broadband components are not. This calls for a deeper understanding of broadband noise generation in the fan stage. This article introduces a new extension to the Random Particle Mesh (RPM) method, which accommodates in-depth studies of the impact of cyclostationary wake characteristics on the broadband noise in the fan stage. The RPM method is used to synthesize a turbulence field in the stator domain using a URANS simulation characterized by time-periodic turbulence and mean flow. The rotor-stator interaction noise is predicted by a two-dimensional CAA computation of the stator cascade. The impact of cyclostationarity is decomposed into various effects, which are separately investigated. This leads to the finding that the periodic turbulent kinetic energy (TKE) and periodic flow have only a negligible effect on the radiated sound power. The impact of the periodic integral length scale (TLS) is, however, substantial. The limits of a stationary representation of the TLS are demonstrated, making this new extension to the RPM method indispensable when background and wake TKE are of comparable level. Good agreement of the predictions with measurements obtained from the 2015 AIAA Fan Broadband Noise Prediction Workshop is also shown.
An Adaptive Prediction-Based Approach to Lossless Compression of Floating-Point Volume Data.
Fout, N; Ma, Kwan-Liu
2012-12-01
In this work, we address the problem of lossless compression of scientific and medical floating-point volume data. We propose two prediction-based compression methods that share a common framework, which consists of a switched prediction scheme wherein the best predictor out of a preset group of linear predictors is selected. Such a scheme is able to adapt to different datasets as well as to varying statistics within the data. The first method, called APE (Adaptive Polynomial Encoder), uses a family of structured interpolating polynomials for prediction, while the second method, which we refer to as ACE (Adaptive Combined Encoder), combines predictors from previous work with the polynomial predictors to yield a more flexible, powerful encoder that is able to effectively decorrelate a wide range of data. In addition, in order to facilitate efficient visualization of compressed data, our scheme provides an option to partition floating-point values in such a way as to provide a progressive representation. We compare our two compressors to existing state-of-the-art lossless floating-point compressors for scientific data, with our data suite including both computer simulations and observational measurements. The results demonstrate that our polynomial predictor, APE, is comparable to previous approaches in terms of speed but achieves better compression rates on average. ACE, our combined predictor, while somewhat slower, is able to achieve the best compression rate on all datasets, with significantly better rates on most of the datasets.
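The switched-prediction framework described above selects, per block of samples, the predictor whose residuals are cheapest to encode, then stores the predictor id and the residuals. A minimal sketch with three illustrative predictors (not the actual APE/ACE predictor families):

```python
# Switched linear prediction for lossless compression: for each block, try a
# preset group of predictors and keep the one minimizing the residual cost.
# The decoder, knowing the predictor id, regenerates the same predictions and
# adds back the residuals. Predictors and sample values are illustrative.

PREDICTORS = {
    0: lambda prev: 0.0,                      # no prediction
    1: lambda prev: prev[-1],                 # previous-value (order 1)
    2: lambda prev: 2 * prev[-1] - prev[-2],  # linear extrapolation (order 2)
}

def encode_block(block, history):
    """Pick the predictor minimizing total absolute residual for this block."""
    best = None
    for pid, f in PREDICTORS.items():
        prev = list(history)
        residuals = []
        for x in block:
            residuals.append(x - f(prev))
            prev.append(x)
        cost = sum(abs(r) for r in residuals)
        if best is None or cost < best[0]:
            best = (cost, pid, residuals)
    return best[1], best[2]

history = [1.0, 2.0]
block = [3.0, 4.0, 5.0]   # perfectly linear: the extrapolating predictor wins
pid, res = encode_block(block, history)
```

Selecting per block is what lets the scheme adapt to varying statistics within a dataset: smooth regions favor the extrapolating predictor, noisy regions fall back to simpler ones.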
An, Ji-Yong; You, Zhu-Hong; Meng, Fan-Rong; Xu, Shu-Juan; Wang, Yin
2016-05-18
Protein-Protein Interactions (PPIs) play essential roles in most cellular processes. Knowledge of PPIs is becoming increasingly more important, which has prompted the development of technologies that are capable of discovering large-scale PPIs. Although many high-throughput biological technologies have been proposed to detect PPIs, they have unavoidable shortcomings, including cost, time intensity, and inherently high false positive and false negative rates. For these reasons, in silico methods are attracting much attention due to their good performances in predicting PPIs. In this paper, we propose a novel computational method known as RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the AB feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We performed five-fold cross-validation experiments on yeast and Helicobacter pylori datasets, and achieved very high accuracies of 92.98% and 95.58% respectively, which is significantly better than previous work. We also obtained good prediction accuracies of 88.31%, 89.46%, 91.08%, 91.55%, and 94.81% on five other independent datasets C. elegans, M. musculus, H. sapiens, H. pylori, and E. coli for cross-species prediction. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-AB method clearly outperforms the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool.
To facilitate extensive studies for future proteomics research, we developed a freely available web server called RVMAB-PPI in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/ppi_ab/.
Simultaneous fitting of genomic-BLUP and Bayes-C components in a genomic prediction model.
Iheshiulor, Oscar O M; Woolliams, John A; Svendsen, Morten; Solberg, Trygve; Meuwissen, Theo H E
2017-08-24
The rapid adoption of genomic selection is due to two key factors: availability of both high-throughput dense genotyping and statistical methods to estimate and predict breeding values. The development of such methods is still ongoing and, so far, there is no consensus on the best approach. Currently, the linear and non-linear methods for genomic prediction (GP) are treated as distinct approaches. The aim of this study was to evaluate the implementation of an iterative method (called GBC) that incorporates aspects of both linear [genomic-best linear unbiased prediction (G-BLUP)] and non-linear (Bayes-C) methods for GP. The iterative nature of GBC makes it less computationally demanding similar to other non-Markov chain Monte Carlo (MCMC) approaches. However, as a Bayesian method, GBC differs from both MCMC- and non-MCMC-based methods by combining some aspects of G-BLUP and Bayes-C methods for GP. Its relative performance was compared to those of G-BLUP and Bayes-C. We used an imputed 50 K single-nucleotide polymorphism (SNP) dataset based on the Illumina Bovine50K BeadChip, which included 48,249 SNPs and 3244 records. Daughter yield deviations for somatic cell count, fat yield, milk yield, and protein yield were used as response variables. GBC was frequently (marginally) superior to G-BLUP and Bayes-C in terms of prediction accuracy and was significantly better than G-BLUP only for fat yield. On average across the four traits, GBC yielded a 0.009 and 0.006 increase in prediction accuracy over G-BLUP and Bayes-C, respectively. Computationally, GBC was very much faster than Bayes-C and similar to G-BLUP. Our results show that incorporating some aspects of G-BLUP and Bayes-C in a single model can improve accuracy of GP over the commonly used method: G-BLUP. Generally, GBC did not statistically perform better than G-BLUP and Bayes-C, probably due to the close relationships between reference and validation individuals. 
Nevertheless, it is a flexible tool in the sense that it simultaneously incorporates aspects of linear and non-linear models for GP, thereby exploiting family relationships while also accounting for linkage disequilibrium between SNPs and genes with large effects. The application of GBC in GP merits further exploration.
The design of a joined wing flight demonstrator aircraft
NASA Technical Reports Server (NTRS)
Smith, S. C.; Cliff, S. E.; Kroo, I. M.
1987-01-01
A joined-wing flight demonstrator aircraft has been developed at the NASA Ames Research Center in collaboration with ACA Industries. The aircraft is designed to utilize the fuselage, engines, and undercarriage of the existing NASA AD-1 flight demonstrator aircraft. The design objectives, methods, constraints, and the resulting aircraft design, called the JW-1, are presented. A wind-tunnel model of the JW-1 was tested in the NASA Ames 12-foot wind tunnel. The test results indicate that the JW-1 has satisfactory flying qualities for a flight demonstrator aircraft. Good agreement of test results with design predictions confirmed the validity of the design methods used for application to joined-wing configurations.
Galilean invariant resummation schemes of cosmological perturbations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peloso, Marco; Pietroni, Massimo, E-mail: peloso@physics.umn.edu, E-mail: massimo.pietroni@unipr.it
2017-01-01
Many of the methods proposed so far to go beyond Standard Perturbation Theory break invariance under time-dependent boosts (denoted here as extended Galilean Invariance, or GI). This gives rise to spurious large scale effects which spoil the small scale predictions of these approximation schemes. By using consistency relations we derive fully non-perturbative constraints that GI imposes on correlation functions. We then introduce a method to quantify the amount of GI breaking of a given scheme, and to correct it by properly tailored counterterms. Finally, we formulate resummation schemes which are manifestly GI, discuss their general features, and implement them in the so-called Time-Flow, or TRG, equations.
V/STOL Aircraft Noise Prediction (Jet Propulsors)
1975-06-01
Frequency adaptive metadynamics for the calculation of rare-event kinetics
NASA Astrophysics Data System (ADS)
Wang, Yong; Valsson, Omar; Tiwary, Pratyush; Parrinello, Michele; Lindorff-Larsen, Kresten
2018-08-01
The ability to predict accurate thermodynamic and kinetic properties in biomolecular systems is of both scientific and practical utility. While both remain very difficult, predictions of kinetics are particularly difficult because rates, in contrast to free energies, depend on the route taken. For this reason, specific enhanced sampling methods are needed to calculate long-time scale kinetics. It has recently been demonstrated that it is possible to recover kinetics through the so-called "infrequent metadynamics" simulations, where the simulations are biased in a way that minimally corrupts the dynamics of moving between metastable states. This method, however, requires the bias to be added slowly, thus hampering applications to processes with only modest separations of time scales. Here we present a frequency-adaptive strategy which bridges normal and infrequent metadynamics. We show that this strategy can improve the precision and accuracy of rate calculations at fixed computational cost and should be able to extend rate calculations for much slower kinetic processes.
Parental Leave Legislation in the U.S. Senate: Toward a Model of Roll-Call Voting.
ERIC Educational Resources Information Center
Monroe, Pamela A.; Garand, James C.
1991-01-01
Developed and tested a model of roll-call voting by U.S. senators on a cloture motion relating to the Parental and Medical Leave Act of 1988. Senate roll-call voting was found to be dominated by policy liberalism and party, whereas the impact of contextual demand variables was relatively minor and indirect, and votes could be accurately predicted.…
A novel method to accelerate orthodontic tooth movement
Buyuk, S. Kutalmış; Yavuz, Mustafa C.; Genc, Esra; Sunar, Oguzhan
2018-01-01
This clinical case report presents fixed orthodontic treatment of a patient with moderately crowded teeth. It was performed with a new technique called ‘discision’. Discision method that was described for the first time by the present authors yielded predictable outcomes, and orthodontic treatment was completed in a short period of time. The total duration of orthodontic treatment was 4 months. Class I molar and canine relationships were established at the end of the treatment. Moreover, crowding in the mandible and maxilla was corrected, and optimal overjet and overbite were established. No scar tissue was observed in any gingival region on which discision was performed. The discision technique was developed as a minimally invasive alternative method to piezocision technique, and the authors suggest that this new method yields good outcomes in achieving rapid tooth movement. PMID:29436571
New computational tools for H/D determination in macromolecular structures from neutron data.
Siliqi, Dritan; Caliandro, Rocco; Carrozzini, Benedetta; Cascarano, Giovanni Luca; Mazzone, Annamaria
2010-11-01
Two new computational methods dedicated to neutron crystallography, called n-FreeLunch and DNDM-NDM, have been developed and successfully tested. The aim in developing these methods is to determine hydrogen and deuterium positions in macromolecular structures by using information from neutron density maps. Of particular interest is resolving cases in which the geometrically predicted hydrogen or deuterium positions are ambiguous. The methods are an evolution of approaches that are already applied in X-ray crystallography: extrapolation beyond the observed resolution (known as the FreeLunch procedure) and a difference electron-density modification (DEDM) technique combined with the electron-density modification (EDM) tool (known as DEDM-EDM). It is shown that the two methods are complementary to each other and are effective in finding the positions of H and D atoms in neutron density maps.
Prophetic Granger Causality to infer gene regulatory networks.
Carlin, Daniel E; Paull, Evan O; Graim, Kiley; Wong, Christopher K; Bivol, Adrian; Ryabinin, Peter; Ellrott, Kyle; Sokolov, Artem; Stuart, Joshua M
2017-01-01
We introduce a novel method called Prophetic Granger Causality (PGC) for inferring gene regulatory networks (GRNs) from protein-level time series data. The method uses an L1-penalized regression adaptation of Granger Causality to model protein levels as a function of time, stimuli, and other perturbations. When combined with a data-independent network prior, the framework outperformed all other methods submitted to the HPN-DREAM 8 breast cancer network inference challenge. Our investigations reveal that PGC provides complementary information to other approaches, raising the performance of ensemble learners, while on its own achieves moderate performance. Thus, PGC serves as a valuable new tool in the bioinformatics toolkit for analyzing temporal datasets. We investigate the general and cell-specific interactions predicted by our method and find several novel interactions, demonstrating the utility of the approach in charting new tumor wiring.
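The core of such an L1-penalized Granger regression can be sketched with a small proximal-gradient (ISTA) lasso on lagged predictors; this is an illustrative reconstruction of the idea, not the PGC implementation:

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, steps=2000):
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 by proximal gradient."""
    n, p = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n                          # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)   # soft-threshold
    return w

# Granger-style question: does protein 1's past level help predict protein 2?
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
noise = rng.normal(size=200)                      # an unrelated protein
y = 0.9 * np.roll(x1, 1) + 0.05 * rng.normal(size=200)
X = np.column_stack([np.roll(x1, 1), noise])[1:]  # lagged predictors
w = lasso_ista(X, y[1:], lam=0.05)
# w[0] is large (x1 "Granger-causes" y); w[1] is shrunk toward zero
```

Nonzero coefficients on lagged predictors are then read as candidate regulatory edges.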
Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe
2016-05-18
Prediction of compound-protein interactions (CPIs) aims to find new compound-protein pairs in which a protein is targeted by at least one compound, which is a crucial step in new drug design. Currently, a number of machine-learning-based methods have been developed in the literature to predict new CPIs. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine-learning-based approaches use unknown interactions (not validated CPIs), selected randomly, as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of methods that learns from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models.
Source code, datasets and related documents of PUCPI are available at: http://admis.fudan.edu.cn/projects/pucpi.html.
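A toy sketch of the biased-SVM idea: a single linear SVM trained by subgradient descent with a larger misclassification cost on the labeled positives than on the unlabeled pool (all names and constants are illustrative, not the PUCPI code):

```python
import numpy as np

def biased_linear_svm(X, y, c_pos=10.0, c_unl=1.0, lr=0.01, epochs=500):
    """Linear SVM with asymmetric hinge-loss costs (biased-SVM).

    y is +1 for known positives and -1 for unlabeled pairs; hinge
    violations on positives are weighted c_pos, on unlabeled c_unl,
    so unlabeled points are allowed to be misclassified more cheaply.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        cost = np.where(y > 0, c_pos, c_unl) * (margins < 1.0)
        w -= lr * (w - (cost * y) @ X / n)   # subgradient of L2 + weighted hinge
        b -= lr * (-(cost * y).sum() / n)
    return w, b

# Known positives cluster at (2, 2); the unlabeled pool mixes true
# negatives at (-2, -2) with a few hidden positives at (2, 2).
X = np.array([[2.0, 2.0]] * 5 + [[-2.0, -2.0]] * 5 + [[2.0, 2.0]] * 2)
y = np.array([1.0] * 5 + [-1.0] * 7)
w, b = biased_linear_svm(X, y)
```

Because positives carry the larger cost, the boundary sides with them even though some identical points sit in the unlabeled set.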
Prediction of specialty coffee cup quality based on near infrared spectra of green coffee beans.
Tolessa, Kassaye; Rademaker, Michael; De Baets, Bernard; Boeckx, Pascal
2016-04-01
The growing global demand for specialty coffee increases the need for improved coffee quality assessment methods. Green bean coffee quality analysis is usually carried out by physical (e.g. black beans, immature beans) and cup quality (e.g. acidity, flavour) evaluation. However, these evaluation methods are subjective, costly, time consuming, require sample preparation and may end up in poor grading systems. This calls for the development of a rapid, low-cost, reliable and reproducible analytical method to evaluate coffee quality attributes and eventually chemical compounds of interest (e.g. chlorogenic acid) in coffee beans. The aim of this study was to develop a model able to predict coffee cup quality based on NIR spectra of green coffee beans. NIR spectra of 86 samples of green Arabica beans of varying quality were analysed. A partial least squares (PLS) regression method was used to develop a model correlating spectral data to cupping score data (cup quality). The selected PLS model had a good predictive power for total specialty cup quality and its individual quality attributes (overall cup preference, acidity, body and aftertaste), showing high correlation coefficients, with r-values of 0.90, 0.90, 0.78, 0.72 and 0.72, respectively, between measured and predicted cupping scores for 20 out of 86 samples. The corresponding root mean square error of prediction (RMSEP) was 1.04, 0.22, 0.27, 0.24 and 0.27 for total specialty cup quality, overall cup preference, acidity, body and aftertaste, respectively. The results obtained suggest that NIR spectra of green coffee beans are a promising tool for fast and accurate prediction of coffee quality and for classifying green coffee beans into different specialty grades. However, the model should be further tested on coffee samples from different regions in Ethiopia, to establish whether one generic model or several region-specific models should be developed. Copyright © 2015 Elsevier B.V. All rights reserved.
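The validation statistics reported above are straightforward to reproduce; a small sketch of RMSEP and the measured-vs-predicted correlation (the PLS model itself is omitted; inputs are plain lists of cupping scores):

```python
import math

def rmsep(measured, predicted):
    """Root mean square error of prediction over an external test set."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def pearson_r(xs, ys):
    """Pearson correlation between measured and predicted scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Both are computed on held-out samples (here, the 20 of 86 kept for prediction), never on the calibration set.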
Bi-objective integer programming for RNA secondary structure prediction with pseudoknots.
Legendre, Audrey; Angel, Eric; Tahi, Fariza
2018-01-15
RNA structure prediction is an important field in bioinformatics, and numerous methods and tools have been proposed. Pseudoknots are specific motifs of RNA secondary structures that are difficult to predict. Almost all existing methods are based on a single model and return one solution, often missing the real structure. An alternative approach would be to combine different models and return a (small) set of solutions, maximizing its quality and diversity in order to increase the probability that it contains the real structure. We propose here an original method for predicting RNA secondary structures with pseudoknots, based on integer programming. We developed a generic bi-objective integer programming algorithm that returns optimal and sub-optimal solutions while optimizing two models simultaneously. This algorithm was then applied to the combination of two known models of RNA secondary structure prediction, namely MEA and MFE. The resulting tool, called BiokoP, is compared with the other methods in the literature. The results show that the best solution (the structure with the highest F1-score) is, in most cases, given by BiokoP. Moreover, the results of BiokoP are homogeneous, regardless of the pseudoknot type or the presence or absence of pseudoknots. Indeed, the F1-scores are always higher than 70% for any number of solutions returned. The results obtained by BiokoP show that combining the MEA and MFE models, as well as returning several optimal and several sub-optimal solutions, improves the prediction of secondary structures. One perspective of our work is to combine better mono-criterion models, in particular to combine a model based on the comparative approach with the MEA and MFE models. This will lead to the future development of a new multi-objective algorithm to combine more than two models. BiokoP is available on the EvryRNA platform: https://EvryRNA.ibisc.univ-evry.fr .
Blanche, Paul; Proust-Lima, Cécile; Loubère, Lucie; Berr, Claudine; Dartigues, Jean-François; Jacqmin-Gadda, Hélène
2015-03-01
Thanks to the growing interest in personalized medicine, joint modeling of longitudinal marker and time-to-event data has recently started to be used to derive dynamic individual risk predictions. Individual predictions are called dynamic because they are updated when information on the subject's health profile grows with time. We focus in this work on statistical methods for quantifying and comparing dynamic predictive accuracy of this kind of prognostic models, accounting for right censoring and possibly competing events. Dynamic area under the ROC curve (AUC) and Brier Score (BS) are used to quantify predictive accuracy. Nonparametric inverse probability of censoring weighting is used to estimate dynamic curves of AUC and BS as functions of the time at which predictions are made. Asymptotic results are established and both pointwise confidence intervals and simultaneous confidence bands are derived. Tests are also proposed to compare the dynamic prediction accuracy curves of two prognostic models. The finite sample behavior of the inference procedures is assessed via simulations. We apply the proposed methodology to compare various prediction models using repeated measures of two psychometric tests to predict dementia in the elderly, accounting for the competing risk of death. Models are estimated on the French Paquid cohort and predictive accuracies are evaluated and compared on the French Three-City cohort. © 2014, The International Biometric Society.
Psychometrics Matter in Health Behavior: A Long-term Reliability Generalization Study.
Pickett, Andrew C; Valdez, Danny; Barry, Adam E
2017-09-01
Despite numerous calls for increased understanding and reporting of reliability estimates, social science research, including the field of health behavior, has been slow to respond and adopt such practices. Therefore, we offer a brief overview of reliability and common reporting errors; we then perform analyses to examine and demonstrate the variability of reliability estimates by sample and over time. Using meta-analytic reliability generalization, we examined the variability of coefficient alpha scores for a well-designed, consistent, nationwide health study, covering a span of nearly 40 years. For each year and sample, reliability varied. Furthermore, reliability was predicted by a sample characteristic that differed among age groups within each administration. We demonstrated that reliability is influenced by the methods and individuals from which a given sample is drawn. Our work echoes previous calls that psychometric properties, particularly reliability of scores, are important and must be considered and reported before drawing statistical conclusions.
2010-11-01
Glo1 expression and anxiety-like behavior. PLoS One, 4(3), e4649. Glyoxalase 1 (Glo1) has been implicated in anxiety-like behavior in animal models...Glo1 expression and anxiety-like behavior in both inbred strain panels and outbred CD-1 mice. 12. Cirulli, E.T., Kasperavičiūtė, D., Attix, D.K...a new method for heart-rate variability (HRV) called CS-index. This index is the ratio of average cardio-intervals and standard cardio-intervals
Tadesse, Tsegaye; Brown, Jesslyn F.; Hayes, M.J.
2005-01-01
Droughts are normal climate episodes, yet they are among the most expensive natural disasters in the world. Knowledge about the timing, severity, and pattern of droughts on the landscape can be incorporated into effective planning and decision-making. In this study, we present a data mining approach to modeling vegetation stress due to drought and mapping its spatial extent during the growing season. Rule-based regression tree models were generated that identify relationships between satellite-derived vegetation conditions, climatic drought indices, and biophysical data, including land-cover type, available soil water capacity, percent of irrigated farm land, and ecological type. The data mining method builds numerical rule-based models that find relationships among the input variables. Because the models can be applied iteratively with input data from previous time periods, the method enables predictions of vegetation conditions to be made farther into the growing season based on earlier conditions. Visualizing the model outputs as mapped information (called VegPredict) provides a means to evaluate the model. We present prototype maps for the 2002 drought year for Nebraska and South Dakota and discuss potential uses for these maps.
Higher order alchemical derivatives from coupled perturbed self-consistent field theory.
Lesiuk, Michał; Balawender, Robert; Zachara, Janusz
2012-01-21
We present an analytical approach to treat higher order derivatives of Hartree-Fock (HF) and Kohn-Sham (KS) density functional theory energy in the Born-Oppenheimer approximation with respect to the nuclear charge distribution (so-called alchemical derivatives). Modified coupled perturbed self-consistent field theory is used to calculate the molecular system's response to the applied perturbation. Working equations for the second and third derivatives of HF/KS energy are derived. Similarly, analytical forms of the first and second derivatives of orbital energies are reported. The second derivative of Kohn-Sham energy and up to the third derivative of Hartree-Fock energy with respect to the nuclear charge distribution were calculated. Some issues of practical calculations, in particular the dependence of the basis set and Becke weighting functions on the perturbation, are considered. For selected series of isoelectronic molecules, values of the available alchemical derivatives were computed and a Taylor series expansion was used to predict energies of the "surrounding" molecules. Predicted values of energies are in unexpectedly good agreement with those computed using HF/KS methods. The presented method allows one to predict orbital energies with an error of less than 1%, or even smaller for valence orbitals. © 2012 American Institute of Physics.
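The prediction step is a plain Taylor expansion; a sketch assuming the alchemical derivatives at the reference molecule are already in hand (third order, matching the HF case above; names are illustrative):

```python
def taylor_energy(e0, d1, d2, d3, dz):
    """Predict the energy of a 'surrounding' molecule from alchemical
    derivatives of the reference:
    E(Z0 + dz) ~ E0 + E'*dz + E''*dz**2/2 + E'''*dz**3/6."""
    return e0 + d1 * dz + d2 * dz ** 2 / 2.0 + d3 * dz ** 3 / 6.0
```

For a toy energy surface that is exactly cubic in the charge perturbation, the third-order expansion reproduces it exactly, which is the sanity check used in the test below.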
Rios, Anthony; Kavuluru, Ramakanth
2017-11-01
The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
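The evaluation metric above is easy to reproduce; a sketch of macro-averaged MAE over the ordinal severity classes and the shared task's normalized score 100·(1−MMAE) as stated in the abstract:

```python
def mmae(y_true, y_pred):
    """Macro MAE: mean absolute error averaged over ordinal classes,
    so rare severity levels count as much as common ones."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        errs = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(per_class)

def ngrid_score(y_true, y_pred):
    """Normalized score 100*(1 - MMAE), as reported in the shared task."""
    return 100.0 * (1.0 - mmae(y_true, y_pred))
```

Severity labels are encoded 0–3 (absent, mild, moderate, severe), so each unit of error is one step on the ordinal scale.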
Run-Reversal Equilibrium for Clinical Trial Randomization
Grant, William C.
2015-01-01
In this paper, we describe a new restricted randomization method called run-reversal equilibrium (RRE), which is a Nash equilibrium of a game where (1) the clinical trial statistician chooses a sequence of medical treatments, and (2) clinical investigators make treatment predictions. RRE randomization counteracts each investigator's ability to observe treatment histories and forecast upcoming treatments. Computation of a run-reversal equilibrium reflects how the treatment history at a particular site is imperfectly correlated with the treatment imbalance for the overall trial. An attractive feature of RRE randomization is that treatment imbalance follows a random walk at each site, while treatment balance is tightly constrained and regularly restored for the overall trial. Less predictable and therefore more scientifically valid experiments can be facilitated by run-reversal equilibrium for multi-site clinical trials. PMID:26079608
Factors Associated with Clinician Participation in TF-CBT Post-workshop Training Components.
Pemberton, Joy R; Conners-Burrow, Nicola A; Sigel, Benjamin A; Sievers, Chad M; Stokes, Lauren D; Kramer, Teresa L
2017-07-01
For proficiency in an evidence-based treatment (EBT), mental health professionals (MHPs) need training activities extending beyond a one-time workshop. Using data from 178 MHPs participating in a statewide TF-CBT dissemination project, we used five variables assessed at the workshop, via multiple and logistic regression, to predict participation in three post-workshop training components. Perceived in-workshop learning and client-treatment mismatch were predictive of consultation call participation and case presentation respectively. Attitudes toward EBTs were predictive of trauma assessment utilization, although only with non-call participants removed from analysis. Productivity requirements and confidence in TF-CBT skills were not associated with participation in post-workshop activities.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block for protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, the prediction of domain boundaries, the designation of knowledge-based potentials and the prediction of protein binding sites.
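A minimal sketch of the Top-n-gram construction, assuming each column of the PSI-BLAST frequency profile is available as a mapping from amino acid to frequency (the data layout is an assumption for illustration):

```python
from collections import Counter

def top_n_gram_features(profile, n=2):
    """Convert a per-position frequency profile into Top-n-gram counts.

    At each position the n most frequent amino acids, in descending
    frequency order, form one 'word'; the feature vector records how
    many times each distinct word occurs along the sequence.
    """
    words = ["".join(sorted(col, key=col.get, reverse=True)[:n])
             for col in profile]
    return Counter(words)

profile = [{"A": 0.7, "G": 0.2, "L": 0.1},
           {"A": 0.6, "G": 0.3, "L": 0.1},
           {"G": 0.5, "A": 0.4, "L": 0.1}]
feats = top_n_gram_features(profile, n=1)  # Counter({'A': 2, 'G': 1})
```

The resulting sparse count vectors are what would then be fed to LSA and the SVM.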
Kinase Identification with Supervised Laplacian Regularized Least Squares
Zhang, He; Wang, Minghui
2015-01-01
Phosphorylation is catalyzed by protein kinases and is irreplaceable in regulating biological processes. Identification of phosphorylation sites with their corresponding kinases contributes to the understanding of molecular mechanisms. Mass spectrometry analysis of phosphor-proteomes generates a large number of phosphorylated sites. However, experimental methods are costly and time-consuming, and most phosphorylation sites determined by experimental methods lack kinase information. Therefore, computational methods are urgently needed to address the kinase identification problem. To this end, we propose a new kernel-based machine learning method called Supervised Laplacian Regularized Least Squares (SLapRLS), which adopts a new method to construct kernels based on the similarity matrix and minimizes both structure risk and overall inconsistency between labels and similarities. The results predicted using both Phospho.ELM and an additional independent test dataset indicate that SLapRLS can more effectively identify kinases compared to other existing algorithms. PMID:26448296
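The regularized least-squares backbone that SLapRLS extends has a simple closed form; a generic kernel-RLS sketch (the supervised-Laplacian kernel construction from the similarity matrix is not reproduced here):

```python
import numpy as np

def fit_rls(K, y, lam=0.1):
    """Solve (K + lam*n*I) alpha = y; predictions are f(x) = sum_i alpha_i k(x_i, x)."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), np.asarray(y, dtype=float))

# Linear kernel on toy data: the map y = x is recovered almost exactly.
X = np.array([[1.0], [2.0], [3.0]])
K = X @ X.T
alpha = fit_rls(K, [1.0, 2.0, 3.0], lam=1e-6)
pred_at_4 = (np.array([[4.0]]) @ X.T) @ alpha  # kernel row between x=4 and training points
```

SLapRLS replaces the plain kernel with one built from the label-aware similarity matrix, but the solve step is of this same form.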
Germline contamination and leakage in whole genome somatic single nucleotide variant detection.
Sendorek, Dorota H; Caloian, Cristian; Ellrott, Kyle; Bare, J Christopher; Yamaguchi, Takafumi N; Ewing, Adam D; Houlahan, Kathleen E; Norman, Thea C; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C
2018-01-31
The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNVs) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
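Measuring leakage is at heart a set operation; a sketch in the spirit of GermlineFilter (but not its actual code), assuming variants are keyed by (chromosome, position, alt allele):

```python
def germline_leakage(somatic_calls, germline_variants):
    """Return (leak_count, filtered_calls): predicted somatic SNVs that
    match known germline polymorphisms are counted and removed."""
    germline = set(germline_variants)
    leaked = [v for v in somatic_calls if v in germline]
    kept = [v for v in somatic_calls if v not in germline]
    return len(leaked), kept

calls = [("chr1", 100, "A"), ("chr2", 200, "T"), ("chr3", 300, "G")]
germline = {("chr2", 200, "T")}
n_leaked, cleaned = germline_leakage(calls, germline)  # 1 leaked, 2 kept
```

In a public-facing database the germline set would come from the patient's matched normal, so the comparison must run in a trusted environment rather than on the shared data.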
Genetic benefits of a female mating preference in gray tree frogs are context-dependent.
Welch, Allison M
2003-04-01
"Good genes" models of sexual selection predict that male courtship displays can advertise genetic quality and that, by mating with males with extreme displays, females can obtain genetic benefits for their offspring. However, because the relative performance of different genotypes can vary across environments, these genetic benefits may depend on the environmental context; in which case, static mating preferences may not be adaptive. To better understand how selection acts on the preference that female gray tree frogs (Hyla versicolor) express for long advertisement calls, I tested for genetic benefits in two realistic natural environments, by comparing the performance of half-sibling offspring sired by males with long versus short calls. Tadpoles from twelve such maternal half-sibships were raised in enclosures in their natal pond at two densities. In the low-density treatment, offspring of long-call males were larger at metamorphosis than were offspring of short-call males, whereas in the high-density treatment, offspring of males with long calls tended to metamorphose later than offspring of males with short calls. Thus, although the genes indicated by long calls were advantageous under low-density conditions, they were not beneficial under all conditions, suggesting that a static preference for long calls may not be adaptive in all environments. Such a genotype-by-environment interaction in the genetic consequences of mate choice predicts that when the environment is variable, selection may favor plasticity in female preferences or female selectivity among environments to control the conditions experienced by the offspring.
Classification of baseline toxicants for QSAR predictions to replace fish acute toxicity studies.
Nendza, Monika; Müller, Martin; Wenzel, Andrea
2017-03-22
Fish acute toxicity studies are required for environmental hazard and risk assessment of chemicals by national and international legislations such as REACH, the regulations of plant protection products and biocidal products, or the GHS (globally harmonised system) for classification and labelling of chemicals. Alternative methods like QSARs (quantitative structure-activity relationships) can replace many ecotoxicity tests. However, complete substitution of in vivo animal tests by in silico methods may not be realistic. For the so-called baseline toxicants, it is possible to predict fish acute toxicity with sufficient accuracy from log Kow and, hence, valid QSARs can replace in vivo testing. In contrast, excess toxicants and chemicals not reliably classified as baseline toxicants require further in silico, in vitro or in vivo assessments. Thus, the critical task is to discriminate between baseline and excess toxicants. For fish acute toxicity, we derived a scheme based on structural alerts and physicochemical property thresholds to classify chemicals as either baseline toxicants (=predictable by QSARs) or as potential excess toxicants (=not predictable by baseline QSARs). The step-wise approach identifies baseline toxicants (true negatives) in a precautionary way to avoid false negative predictions. Therefore, a certain fraction of false positives can be tolerated, i.e. baseline toxicants without specific effects that may be tested instead of predicted. Application of the classification scheme to a new heterogeneous dataset for diverse fish species results in 40% baseline toxicants, 24% excess toxicants and 36% compounds not classified. Thus, we can conclude that replacing about half of the fish acute toxicity tests with QSAR predictions is realistically achievable in the short term. The long-term goals are to derive classification criteria for further groups of toxicants and to replace as many in vivo fish acute toxicity tests as possible with valid QSAR predictions.
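A baseline (narcosis) QSAR is a one-variable linear relationship in log Kow; a sketch with illustrative coefficients (the slope and intercept below are placeholders, not the calibrated values from this work), together with the common "toxic ratio" check for excess toxicity:

```python
def baseline_log_inv_lc50(log_kow, slope=0.87, intercept=-1.87):
    """Baseline toxicity QSAR: log(1/LC50) = slope * logKow + intercept.
    Coefficients here are illustrative placeholders."""
    return slope * log_kow + intercept

def is_excess_toxicant(measured_lc50_mol_l, log_kow, ratio_cutoff=10.0):
    """Flag a compound whose measured LC50 is more than ratio_cutoff
    times lower than the baseline prediction (toxic ratio > cutoff)."""
    predicted_lc50 = 10.0 ** -baseline_log_inv_lc50(log_kow)
    return predicted_lc50 / measured_lc50_mol_l > ratio_cutoff
```

Compounds failing the structural-alert and property-threshold screen described above would be routed to this kind of excess-toxicity check rather than predicted directly.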
CaMELS: In silico prediction of calmodulin binding proteins and their binding sites.
Abbasi, Wajid Arshad; Asif, Amina; Andleeb, Saiqa; Minhas, Fayyaz Ul Amir Afsar
2017-09-01
Due to Ca2+-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet-lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state of the art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub-sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS together with a webserver implementation is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels. © 2017 Wiley Periodicals, Inc.
Rejali, Mehri; Mansourian, Marjan; Babaei, Zohre; Eshrati, Babak
2017-01-01
Background: In this study, we assessed factors associated with low birth weight (LBW) and used decision curve analysis (DCA) to define a scale for predicting the probability of delivering a LBW newborn. Methods: This hospital-based case–control study was conducted in Arak Hospital in Iran and included 470 mothers of LBW neonates and 470 mothers of normal-weight neonates. Data were gathered from hospital records and by interviewing mothers with a structured questionnaire prepared in advance. The estimated probabilities of detecting LBW were calculated using logistic regression, and DCA was used to quantify the clinical consequences of the model and to validate it. Results: Factors significantly associated with LBW were premature rupture of membranes (odds ratio [OR] = 3.18 [1.882–5.384]), a previous LBW infant (OR = 2.99 [1.510–5.932]), premature labor pain (OR = 2.70 [1.659–4.415]), hypertension in pregnancy (OR = 2.39 [1.429–4.019]), bleeding in the last trimester of pregnancy (OR = 2.58 [1.018–6.583]), and maternal age >30 (OR = 2.17 [1.350–3.498]). Under DCA, the prediction model built on these 15 variables had the highest net benefit (NB), 0.3110. NB has a simple clinical interpretation: using the model is equivalent to a strategy that identifies 31.1 LBW cases per 100 cases with no unnecessary false identifications. Conclusions: It is possible to predict LBW using a prediction model based on the significant risk factors associated with LBW. Most of these risk factors are preventable, and mothers can be referred during early pregnancy to a center equipped with facilities for management of high-risk pregnancy and LBW infants. PMID:28928911
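The net benefit statistic underlying decision curve analysis has a standard closed form (due to Vickers and Elkin). A minimal sketch with illustrative numbers, not data from this study:

```python
def net_benefit(tp, fp, n, pt):
    """Decision-curve net benefit at threshold probability pt:
    NB = TP/n - FP/n * pt / (1 - pt).
    Weighs true positives against false positives at the chosen threshold."""
    return tp / n - (fp / n) * (pt / (1.0 - pt))

# Hypothetical numbers: 100 cases, a model that flags 40 true LBW
# neonates and 20 false positives at threshold probability 0.2.
nb = net_benefit(tp=40, fp=20, n=100, pt=0.2)
print(nb)  # 0.35
```

A net benefit of 0.311, as reported above, reads as "equivalent to 31.1 net true positives per 100 cases".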
MemBrain: An Easy-to-Use Online Webserver for Transmembrane Protein Structure Prediction
NASA Astrophysics Data System (ADS)
Yin, Xi; Yang, Jing; Xiao, Feng; Yang, Yang; Shen, Hong-Bin
2018-03-01
Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% (ATMH) and 87.1% (AP), with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62%/64.1% prediction accuracy on the training and independent datasets for top-L/5 contact prediction, respectively, and MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 with a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin
2015-01-01
Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435
Open Rotor Aeroacoustic Modeling
NASA Technical Reports Server (NTRS)
Envia, Edmane
2012-01-01
Owing to their inherent fuel efficiency, there is renewed interest in developing open rotor propulsion systems that are both efficient and quiet. The major contributor to the overall noise of an open rotor system is the propulsor noise, which is produced as a result of the interaction of the airstream with the counter-rotating blades. As such, robust aeroacoustic prediction methods are an essential ingredient in any approach to designing low-noise open rotor systems. To that end, an effort has been underway at NASA to assess current open rotor noise prediction tools and develop new capabilities. Under this effort, high-fidelity aerodynamic simulations of a benchmark open rotor blade set were carried out and used to make noise predictions via existing NASA open rotor noise prediction codes. The results have been compared with the aerodynamic and acoustic data that were acquired for this benchmark open rotor blade set. The emphasis of this paper is on providing a summary of recent results from a NASA Glenn effort to validate an in-house open rotor noise prediction code called LINPROP, which is based on a high-blade-count asymptotic approximation to the Ffowcs Williams–Hawkings equation. The results suggest that while predicting the absolute levels may be difficult, the noise trends are reasonably well predicted by this approach.
Estimation of the Driving Style Based on the Users’ Activity and Environment Influence
Sysoev, Mikhail; Kos, Andrej; Guna, Jože; Pogačnik, Matevž
2017-01-01
New models and methods have been designed to predict the influence of the user’s environment and activity information on the driving style in standard automotive environments. For these purposes, an experiment was conducted providing two types of analysis: (i) the evaluation of a self-assessment of the driving style; (ii) the prediction of an aggressive driving style based on drivers’ activity and environment parameters. Sixty-seven hours of driving data from 10 drivers were collected for analysis in this study. The new parameters used in the experiment are the manner of car door opening and closing, which were applied to improve the prediction accuracy. An Android application called Sensoric was developed to collect low-level smartphone data about the users’ activity. The driving style was predicted from the user’s environment and activity data collected before driving. The prediction was tested against the actual driving style, calculated from objective driving data. The prediction has shown encouraging results, with precision values ranging from 0.727 up to 0.909 for the aggressive driving recognition rate. The obtained results lend support to the hypothesis that users’ environment and activity data could be used to predict an aggressive driving style in advance, before the driving starts. PMID:29065476
Short-term load and wind power forecasting using neural network-based prediction intervals.
Quan, Hao; Srinivasan, Dipti; Khosravi, Abbas
2014-02-01
Electrical power systems are evolving from today's centralized bulk systems to more decentralized systems. Penetrations of renewable energies, such as wind and solar power, significantly increase the level of uncertainty in power systems. Accurate load forecasting becomes more complex, yet more important for management of power systems. Traditional methods for generating point forecasts of load demands cannot properly handle uncertainties in system operations. To quantify potential uncertainties associated with forecasts, this paper implements a neural network (NN)-based method for the construction of prediction intervals (PIs). A newly introduced method, called lower upper bound estimation (LUBE), is applied and extended to develop PIs using NN models. A new problem formulation is proposed, which translates the primary multiobjective problem into a constrained single-objective problem. Compared with the cost function, this new formulation is closer to the primary problem and has fewer parameters. Particle swarm optimization (PSO) integrated with the mutation operator is used to solve the problem. Electrical demands from Singapore and New South Wales (Australia), as well as wind power generation from Capital Wind Farm, are used to validate the PSO-based LUBE method. Comparative results show that the proposed method can construct higher quality PIs for load and wind power generation forecasts in a short time.
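Prediction intervals of the kind LUBE constructs are conventionally scored by coverage probability (PICP) and normalized average width (PINAW). A minimal sketch of these two metrics on toy data (function and variable names are ours, not from the paper):

```python
import numpy as np

def pi_quality(y, lower, upper):
    """Prediction-interval quality metrics commonly paired with LUBE:
    PICP  = fraction of targets falling inside [lower, upper];
    PINAW = mean interval width, normalized by the target range."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    picp = np.mean((y >= lower) & (y <= upper))
    pinaw = np.mean(upper - lower) / (y.max() - y.min())
    return picp, pinaw

# Toy demand series with a constant-width interval around it.
y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
picp, pinaw = pi_quality(y, y - 1.0, y + 1.0)
print(picp, pinaw)  # 1.0 0.4
```

Higher-quality intervals have PICP near the nominal confidence level while keeping PINAW small; the multiobjective formulation in the paper trades these off.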
Probabilistic Risk Assessment of a Turbine Disk
NASA Astrophysics Data System (ADS)
Carter, Jace A.; Thomas, Michael; Goswami, Tarun; Fecke, Ted
Current Federal Aviation Administration (FAA) rotor design certification practices perform risk assessment using a probabilistic framework focused only on the life-limiting defect location of a component. This method generates conservative approximations of the operational risk. The first section of this article covers a discretization method that allows a transition from this relative risk to an absolute risk, in which the component is discretized into regions called zones. General guidelines were established for the zone-refinement process based on the stress gradient topology in order to reach risk convergence. The second section covers a risk assessment method for predicting the total fatigue life due to fatigue-induced damage. The total fatigue life incorporates a dual-mechanism approach including the crack initiation life and propagation life while simultaneously determining the associated initial flaw sizes. A microstructure-based model was employed to address uncertainties in material response and relate crack initiation life with crack size, while propagation life was characterized by large-crack growth laws. The two proposed methods were applied to a representative Inconel 718 turbine disk. The zone-based method reduces the conservatism of current approaches, while showing the effects of feature-based inspection on the risk assessment. In the fatigue damage assessment, the predicted initial crack distribution was found to be the most sensitive probabilistic parameter and can be used to establish enhanced inspection planning.
Ghosh, Soumen; Cramer, Christopher J; Truhlar, Donald G; Gagliardi, Laura
2017-04-01
Predicting ground- and excited-state properties of open-shell organic molecules by electronic structure theory can be challenging because an accurate treatment has to correctly describe both static and dynamic electron correlation. Strongly correlated systems, i.e. , systems with near-degeneracy correlation effects, are particularly troublesome. Multiconfigurational wave function methods based on an active space are adequate in principle, but it is impractical to capture most of the dynamic correlation in these methods for systems characterized by many active electrons. We recently developed a new method called multiconfiguration pair-density functional theory (MC-PDFT), that combines the advantages of wave function theory and density functional theory to provide a more practical treatment of strongly correlated systems. Here we present calculations of the singlet-triplet gaps in oligoacenes ranging from naphthalene to dodecacene. Calculations were performed for unprecedently large orbitally optimized active spaces of 50 electrons in 50 orbitals, and we test a range of active spaces and active space partitions, including four kinds of frontier orbital partitions. We show that MC-PDFT can predict the singlet-triplet splittings for oligoacenes consistent with the best available and much more expensive methods, and indeed MC-PDFT may constitute the benchmark against which those other models should be compared, given the absence of experimental data.
Estimating Lion Abundance using N-mixture Models for Social Species
Belant, Jerrold L.; Bled, Florent; Wilton, Clay M.; Fyumagwa, Robert; Mwampeta, Stanslaus B.; Beyer, Dean E.
2016-01-01
Declining populations of large carnivores worldwide, and the complexities of managing human-carnivore conflicts, require accurate population estimates of large carnivores to promote their long-term persistence through well-informed management. We used N-mixture models to estimate lion (Panthera leo) abundance from call-in and track surveys in southeastern Serengeti National Park, Tanzania. Because of potential habituation to broadcasted calls and social behavior, we developed a hierarchical observation process within the N-mixture model, conditioning lion detectability on their group response to call-ins and individual detection probabilities. We estimated 270 lions (95% credible interval = 170–551) using call-ins but were unable to estimate lion abundance from track data. We found a weak negative relationship between predicted track density and predicted lion abundance from the call-in surveys. Luminosity was negatively correlated with individual detection probability during call-in surveys. Lion abundance and track density were influenced by landcover, but the direction of the corresponding effects was undetermined. N-mixture models allowed us to incorporate multiple parameters (e.g., landcover, luminosity, observer effect) influencing lion abundance and probability of detection directly into abundance estimates. We suggest that N-mixture models employing a hierarchical observation process can be used to estimate abundance of other social, herding, and grouping species. PMID:27786283
Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina
2016-07-01
Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. 
In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.
Hidden Markov induced Dynamic Bayesian Network for recovering time evolving gene regulatory networks
NASA Astrophysics Data System (ADS)
Zhu, Shijia; Wang, Yadong
2015-12-01
Dynamic Bayesian Networks (DBN) have been widely used to recover gene regulatory relationships from time-series data in computational systems biology. Its standard assumption is ‘stationarity’, and therefore, several research efforts have been recently proposed to relax this restriction. However, those methods suffer from three challenges: long running time, low accuracy and reliance on parameter settings. To address these problems, we propose a novel non-stationary DBN model by extending each hidden node of Hidden Markov Model into a DBN (called HMDBN), which properly handles the underlying time-evolving networks. Correspondingly, an improved structural EM algorithm is proposed to learn the HMDBN. It dramatically reduces searching space, thereby substantially improving computational efficiency. Additionally, we derived a novel generalized Bayesian Information Criterion under the non-stationary assumption (called BWBIC), which can help significantly improve the reconstruction accuracy and largely reduce over-fitting. Moreover, the re-estimation formulas for all parameters of our model are derived, enabling us to avoid reliance on parameter settings. Compared to the state-of-the-art methods, the experimental evaluation of our proposed method on both synthetic and real biological data demonstrates more stably high prediction accuracy and significantly improved computation efficiency, even with no prior knowledge and parameter settings.
Scully, Erin N; Schuldhaus, Brenna C; Congdon, Jenna V; Hahn, Allison H; Campbell, Kimberley A; Wilson, David R; Sturdy, Christopher B
2018-06-08
Black-capped chickadees (Poecile atricapillus) use their namesake chick-a-dee call for multiple functions, altering the features of the call depending on context. For example, duty cycle (the proportion of time filled by vocalizations) and fine structure traits (e.g., number of D notes) can encode contextual factors, such as predator size and food quality. Wilson and Mennill [1] found that chickadees show stronger behavioral responses to playback of chick-a-dee calls with higher duty cycles, but not to the number of D notes. That is, independent of the number of D notes in a call, but dependent on the overall proportion of time filled with vocalization, birds responded more to higher duty cycle playback compared to lower duty cycle playback. Here we presented chickadees with chick-a-dee calls that contained either two D (referred to hereafter as 2 D) notes with a low duty cycle, 2 D notes with a high duty cycle, 10 D notes with a high duty cycle, or 2 D notes with a high duty cycle but played in reverse (a non-signaling control). We then measured ZENK expression in the auditory nuclei where perceptual discrimination is thought to occur. Based on the behavioral results of Wilson and Mennill [1], we predicted we would observe the highest ZENK expression in response to forward-playing calls with high duty cycles; we predicted we would observe no significant difference in ZENK expression between forward-playing high duty cycle playbacks (2 D or 10 D). We found no significant difference between forward-playing 2 D and 10 D high duty cycle playbacks. However, contrary to our predictions, we did not find any effects of altering the duty cycle or note number presented. Copyright © 2018 Elsevier B.V. All rights reserved.
Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call
Lameira, Adriano R.; Hardus, Madeleine E.; Bartlett, Adrian M.; Shumaker, Robert W.; Wich, Serge A.; Menken, Steph B. J.
2015-01-01
The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ∼5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity; (i) Great apes, our closest relatives, should likewise produce 5Hz-rhythm signals, (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels given that speech rhythm is the direct product of stringing together these two basic elements, and (iii) speech-like rhythm should be experience-based. Via cinematic analyses we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined “clicks” and “faux-speech.” Like voiceless consonants, clicks required no vocal fold action, but did involve independent manoeuvring over lips and tongue. In parallel to vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster, and contextually distinct, than any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and exclusion of potential alternative explanations, initial support is given to the third prediction. Irrespective of the putative origins of these calls and underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm.
Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents within the primate lineage, and highlight potential articulatory homologies between great ape calls and human consonants and vowels. PMID:25569211
Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I
2018-04-14
Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.
A Ranking Approach to Genomic Selection.
Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
2015-01-01
Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
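NDCG as described can be sketched as follows; this uses the linear-gain variant with a log2 position discount (some formulations instead use 2^rel − 1 gains):

```python
import numpy as np

def ndcg(y_true, y_score, k=None):
    """Normalized discounted cumulative gain: rank individuals by
    predicted score and reward placing high true breeding values
    near the top, with a log2 discount by rank position."""
    y_true = np.asarray(y_true, float)
    y_score = np.asarray(y_score, float)
    order = np.argsort(-y_score)          # predicted ranking, best first
    gains = y_true[order][:k]
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = np.sum(gains * discounts)
    ideal = np.sort(y_true)[::-1][:k]     # best possible ordering
    idcg = np.sum(ideal * discounts[:ideal.size])
    return dcg / idcg

# A model that ranks the individuals in the correct order scores 1.0.
val = ndcg([3.0, 1.0, 2.0], [0.9, 0.1, 0.5])
print(val)  # 1.0
```

Because only the induced ordering matters, NDCG can disagree sharply with Pearson correlation, which is the point the study makes.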
Boiret, Mathieu; Meunier, Loïc; Ginot, Yves-Michel
2011-02-20
A near infrared (NIR) method was developed for determination of tablet potency of active pharmaceutical ingredient (API) in a complex coated tablet matrix. The calibration set contained samples from laboratory and production scale batches. The reference values were obtained by high performance liquid chromatography (HPLC) and partial least squares (PLS) regression was used to establish a model. The model was challenged by calculating tablet potency of two external test sets. Root mean square errors of prediction were respectively equal to 2.0% and 2.7%. To use this model with a second spectrometer from the production field, a calibration transfer method called piecewise direct standardisation (PDS) was used. After the transfer, the root mean square error of prediction of the first test set was 2.4% compared to 4.0% without transferring the spectra. A statistical technique using bootstrap of PLS residuals was used to estimate confidence intervals of tablet potency calculations. This method requires an optimised PLS model, selection of the bootstrap number and determination of the risk. In the case of a chemical analysis, the tablet potency value will be included within the confidence interval calculated by the bootstrap method. An easy to use graphical interface was developed to easily determine if the predictions, surrounded by minimum and maximum values, are within the specifications defined by the regulatory organisation. Copyright © 2010 Elsevier B.V. All rights reserved.
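Piecewise direct standardization can be sketched as a banded least-squares mapping from the second (slave) instrument's spectra to the master's. A minimal version on synthetic spectra; it omits the additive background-correction term that full PDS implementations often include, and all names are ours:

```python
import numpy as np

def pds_transform(master, slave, half_window=2):
    """Piecewise direct standardization (sketch): for each master
    channel i, least-squares regress the transfer samples' slave
    channels in a window around i, and stack the coefficients into
    a banded transfer matrix F so that slave @ F approximates master."""
    n_samples, n_chan = master.shape
    F = np.zeros((n_chan, n_chan))
    for i in range(n_chan):
        lo, hi = max(0, i - half_window), min(n_chan, i + half_window + 1)
        coef, *_ = np.linalg.lstsq(slave[:, lo:hi], master[:, i], rcond=None)
        F[lo:hi, i] = coef
    return F

rng = np.random.default_rng(0)
master = rng.normal(size=(20, 30))                  # 20 transfer samples, 30 channels
slave = master + 0.01 * rng.normal(size=(20, 30))   # nearly identical instrument
F = pds_transform(master, slave)
err = np.abs(slave @ F - master).max()
```

After the transfer, spectra measured on the slave instrument are multiplied by F before being fed to the master-instrument PLS model.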
Similarity-based Regularized Latent Feature Model for Link Prediction in Bipartite Networks.
Wang, Wenjun; Chen, Xue; Jiao, Pengfei; Jin, Di
2017-12-05
Link prediction is an attractive research topic in the field of data mining and has significant applications in improving the performance of recommendation systems and exploring the evolving mechanisms of complex networks. A variety of complex systems in the real world should be abstractly represented as bipartite networks, in which there are two types of nodes and no links connect nodes of the same type. In this paper, we propose a framework for link prediction in bipartite networks by combining the similarity based structure and the latent feature model from a new perspective. The framework is called Similarity Regularized Nonnegative Matrix Factorization (SRNMF), which explicitly takes the local characteristics into consideration and encodes the geometrical information of the networks by constructing a similarity based matrix. We also develop an iterative scheme to solve the objective function based on gradient descent. Extensive experiments on a variety of real world bipartite networks show that the proposed framework of link prediction has a more competitive, preferable and stable performance in comparison with state-of-the-art methods.
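The combination of a similarity matrix with nonnegative factorization can be sketched with graph-regularized multiplicative updates. This is a generic sketch in the spirit of SRNMF, not the paper's exact objective or update rules:

```python
import numpy as np

def graph_reg_nmf(X, S, rank=2, lam=0.1, iters=200, seed=0):
    """Graph-regularized NMF sketch: factor the bipartite adjacency
    X ~ U @ V.T while a similarity matrix S over the column nodes pulls
    similar nodes toward similar latent features, via the regularizer
    lam * tr(V.T @ L @ V) with graph Laplacian L = D - S.
    Multiplicative updates keep U and V nonnegative."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    D = np.diag(S.sum(axis=1))
    eps = 1e-9
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (S @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

# Toy 3x3 bipartite adjacency with uniform column-node similarity.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
S = np.ones((3, 3)) - np.eye(3)
U, V = graph_reg_nmf(X, S)
scores = U @ V.T   # scores for all node pairs; high scores on absent links = predictions
```

Missing links are then ranked by their reconstructed scores, the usual latent-feature approach to bipartite link prediction.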
Wan, Shixiang; Duan, Yucong; Zou, Quan
2017-09-01
Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. First, most existing techniques are designed to deal with multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, multiple locations of particular proteins imply vital and unique biological significance that deserves special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary but have not previously been employed. To address these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied to multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred.
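One simple way to combine multi-label classification with imbalance handling is binary relevance with class-weighted logistic regression. The sketch below, on synthetic data, illustrates only that baseline idea; HPSLPred's actual ensemble strategy is more sophisticated and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy multi-label data: 200 proteins, 4 features, 3 location labels whose
# positives are deliberately rare (imbalanced)
X = rng.normal(size=(200, 4))
W_true = rng.normal(size=(4, 3))
Y = (X @ W_true + rng.normal(scale=0.5, size=(200, 3)) > 1.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Binary relevance: one logistic regression per label, with per-class weights
# inversely proportional to class frequency to counter the imbalance
W = np.zeros((4, 3))
b = np.zeros(3)
for j in range(3):
    pos = max(Y[:, j].mean(), 1e-6)
    sample_w = np.where(Y[:, j] == 1, 1 / pos, 1 / (1 - pos))
    for _ in range(300):                     # plain gradient descent
        p = sigmoid(X @ W[:, j] + b[j])
        g = sample_w * (p - Y[:, j])
        W[:, j] -= 0.05 * X.T @ g / len(X)
        b[j] -= 0.05 * g.mean()

P = sigmoid(X @ W + b)       # per-label membership probabilities
```

Thresholding each column of `P` yields the multi-label assignment; the class weights keep the rare positives from being swamped.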
Earthquake Prediction in a Big Data World
NASA Astrophysics Data System (ADS)
Kossobokov, V. G.
2016-12-01
The digital revolution, which started about 15 years ago, has already pushed global information storage capacity past 5000 exabytes (in optimally compressed bytes) per year. Open data in a Big Data World provides unprecedented opportunities for enhancing studies of the Earth System. However, it also opens wide avenues for deceptive associations in inter- and transdisciplinary data and for misleading predictions based on so-called "precursors". Earthquake prediction is not an easy task: it implies a delicate application of statistics. So far, none of the proposed short-term precursory signals has shown sufficient evidence to be used as a reliable precursor of catastrophic earthquakes. Regretfully, in many cases of seismic hazard assessment (SHA), from term-less to time-dependent (probabilistic PSHA or deterministic DSHA), and short-term earthquake forecasting (StEF), claims of a method's high potential are based on a flawed application of statistics and are therefore hardly suitable for communication to decision makers. Self-testing must be done before claiming prediction of hazardous areas and/or times. The necessity and possibility of applying simple tools of earthquake prediction strategies is evident; in particular, the error diagram introduced by G.M. Molchan in the early 1990s, and the Seismic Roulette null hypothesis as a metric of the alerted space. The set of errors, i.e. the rates of failure and of the alerted space-time volume, can easily be compared to random guessing, a comparison that permits evaluating a SHA method's effectiveness and determining the optimal choice of parameters with regard to a given cost-benefit function.
These and other results obtained from such simple testing may supply us with realistic estimates of the confidence and accuracy of SHA predictions and, if reliable though not necessarily perfect, with related recommendations on the level of risk for decision making in engineering design, insurance, and emergency management. Examples of independent expert evaluation of "seismic hazard maps", "precursors", and "forecast/prediction methods" are provided.
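The comparison against random guessing on a Molchan-style error diagram can be illustrated on a toy alarm set. Everything below (cell counts, hit rates, background alarm fraction) is synthetic; the point is only the bookkeeping of miss rate versus alerted fraction.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy space-time grid: 1000 cells, 20 target earthquakes
cells = 1000
quake_cells = rng.choice(cells, size=20, replace=False)

# A hypothetical alarm strategy: it alerts each quake cell with 70% chance
# (an informative "precursor") plus a 25% random background of false alarms
alarm = np.zeros(cells, dtype=bool)
alarm[quake_cells[rng.random(20) < 0.7]] = True
alarm[rng.choice(cells, size=250, replace=False)] = True

tau = alarm.mean()                        # alerted fraction of space-time
miss_rate = (~alarm[quake_cells]).mean()  # rate of failures-to-predict

# On the Molchan error diagram, random guessing lies on miss_rate = 1 - tau;
# a method is useful only if its point falls strictly below that diagonal
skill = (1 - tau) - miss_rate
```

A positive `skill` places the method below the random-guessing diagonal; the cost-benefit trade-off then determines which point on the method's error curve to operate at.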
DemQSAR: predicting human volume of distribution and clearance of drugs
NASA Astrophysics Data System (ADS)
Demir-Kavuk, Ozgur; Bentzien, Jörg; Muegge, Ingo; Knapp, Ernst-Walter
2011-12-01
In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationship (QSAR/QSPR) correlate structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data, which are not characteristic for the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach, which combines feature generation, feature selection, model building and control of overtraining into a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power. 
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VDss and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/.
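The combination of regularized fitting with recursive feature elimination can be sketched compactly. The snippet below is a hypothetical illustration on synthetic QSAR-style data, using ridge regression (a quadratic regularization term, as in DemQSAR) and dropping the weakest descriptor per round; the actual DemQSAR feature-selection criteria may differ.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical QSAR-style data: 60 compounds, 30 descriptors, 4 relevant
n, p = 60, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = [2.0, -1.5, 1.0, 0.8]
y = X @ beta_true + rng.normal(scale=0.3, size=n)

def ridge(X, y, lam=1.0):
    # least squares with a quadratic (L2) term to control overtraining
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Recursive feature elimination: refit and drop the weakest descriptor
# each round until only a few interpretable features remain
active = list(range(p))
while len(active) > 4:
    b = ridge(X[:, active], y)
    active.pop(int(np.argmin(np.abs(b))))

selected = sorted(active)
```

Ending with a handful of descriptors is what makes the final model chemically interpretable, as the abstract emphasizes.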
Predictions of runoff signatures in ungauged basins: Austrian case study
NASA Astrophysics Data System (ADS)
Viglione, A.; Parajka, J.; Salinas, J.; Rogger, M.; Sivapalan, M.; Bloeschl, G.
2012-12-01
Runoff variability can be broken up into several components, each of them meaningful for a certain class of applications of societal relevance: annual runoff, seasonal runoff, flow duration curve, low flows, floods and hydrographs. We call these runoff signatures and view them as manifestations of catchment functioning at different time scales, as emergent properties of the complex systems that catchments are. Just as a medical doctor has many different options for studying the state and functioning of a patient, we can infer the state and functioning of a catchment by observing its runoff signatures. But what can we do in the absence of runoff data? This study aims to understand how well one can predict runoff signatures in ungauged catchments. The comparison across signatures is based on one consistent data set (Austria) and one regionalisation method (Top-Kriging) in order to explore the relative performance of the predictions of each of the signatures. Results indicate that the performance, assessed by cross-validation, is best for annual and seasonal runoff, degrades as one moves to low flows and floods, and rises again to high values for runoff hydrographs. Also, dedicated regionalisation methods, i.e. those focusing on particular signatures and their characteristics, provide better predictions of the signatures than regionalisation of the entire hydrograph. These results suggest that the use of signatures in the calibration or assessment of process models can be valuable, in that it can lead to models predicting runoff correctly for the right reasons.
Slater, Graham J; Pennell, Matthew W
2014-05-01
A central prediction of much theory on adaptive radiations is that traits should evolve rapidly during the early stages of a clade's history and subsequently slow down in rate as niches become saturated--a so-called "Early Burst." Although a common pattern in the fossil record, evidence for early bursts of trait evolution in phylogenetic comparative data has been equivocal at best. We show here that this may not necessarily be due to the absence of this pattern in nature. Rather, commonly used methods to infer its presence perform poorly when the strength of the burst--the rate at which phenotypic evolution declines--is small, and when some morphological convergence is present within the clade. We present two modifications to existing comparative methods that allow greater power to detect early bursts in simulated datasets. First, we develop posterior predictive simulation approaches and show that they outperform maximum likelihood approaches at identifying early bursts of moderate strength. Second, we use a robust regression procedure that allows for the identification and down-weighting of convergent taxa, leading to moderate increases in method performance. We demonstrate the utility and power of these approaches by investigating the evolution of body size in cetaceans. Model fitting using maximum likelihood is equivocal with regard to the mode of cetacean body size evolution. However, posterior predictive simulation combined with a robust node height test returns low support for Brownian motion or rate-shift models, but not for the early burst model. While the jury is still out on whether early bursts are actually common in nature, our approach will hopefully facilitate more robust testing of this hypothesis. We advocate the adoption of similar posterior predictive approaches to improve the fit and assess the adequacy of macroevolutionary models in general.
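The early-burst model itself is easy to simulate: Brownian motion whose rate decays exponentially through time. The sketch below (synthetic, with arbitrary parameter values, and simulating independent lineages rather than a phylogeny) shows the model's signature property, that most trait disparity accumulates early in the clade's history.

```python
import numpy as np

rng = np.random.default_rng(5)

# Early burst: Brownian motion whose rate decays through time as
# sigma2 * exp(r * t) with r < 0 (r is the burst-strength parameter)
T, steps, n_lineages = 10.0, 1000, 200
dt = T / steps
sigma2, r = 1.0, -0.5

t = np.linspace(0.0, T, steps, endpoint=False)
rates = sigma2 * np.exp(r * t)
increments = rng.normal(scale=np.sqrt(rates * dt), size=(n_lineages, steps))
traits = increments.cumsum(axis=1)

# Under an early burst, most trait disparity accumulates early: compare
# variance across lineages at time T/2 versus time T
var_half = traits[:, steps // 2 - 1].var()
var_end = traits[:, -1].var()
frac_early = var_half / var_end
```

For these parameters the expected fraction is (1 - e^(rT/2)) / (1 - e^(rT)), roughly 0.92: the first half of the clade's history contributes the large majority of the disparity. Posterior predictive checks compare statistics like this between observed and simulated data.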
NegGOA: negative GO annotations selection using ontology structure.
Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian
2016-10-01
Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples--proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always only provides the knowledge that proteins do carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, negative examples of a protein can greatly help distinguish true positive examples of the protein within such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and the potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy over the competing algorithms across various evaluation metrics. In addition, NegGOA is less affected by incomplete annotations of proteins than the competing methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa (contact: gxyu@swu.edu.cn). Supplementary data are available at Bioinformatics online.
The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.
Tang, Zaixiang; Shen, Yueping; Zhang, Xinyan; Yi, Nengjun
2017-01-01
Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, the limited number of samples, and the small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchical GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach provides not only more accurate estimates of the parameters but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors and expression data for 4919 genes, and the ovarian cancer data set from TCGA with 362 tumors and expression data for 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
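The EM-within-coordinate-descent idea can be sketched as follows. This is a deliberately simplified, hypothetical illustration: a linear model stands in for the GLM, the residual variance is treated as known, and the spike/slab scales are arbitrary. The E-step assigns each coefficient a posterior slab probability; the M-step runs a soft-thresholding coordinate-descent sweep with a coefficient-specific penalty, which is what produces the selective (weak-on-large, strong-on-small) shrinkage. The BhGLM package implements the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy data: 100 samples, 20 unit-norm predictors, 3 true signals.  A linear
# model stands in for the GLM; sigma2 is treated as known for simplicity.
n, p = 100, 20
X = rng.normal(size=(n, p)); X /= np.linalg.norm(X, axis=0)
beta_true = np.zeros(p); beta_true[[0, 5, 10]] = [3.0, -2.0, 2.0]
y = X @ beta_true + rng.normal(scale=0.25, size=n)
sigma2 = 0.25 ** 2

s0, s1, theta = 0.5, 5.0, 0.1      # spike scale, slab scale, prior slab prob
beta = np.zeros(p)

def dexp(b, s):                     # double-exponential (Laplace) density
    return np.exp(-np.abs(b) / s) / (2 * s)

def soft(z, lam):                   # soft-threshold operator
    return np.sign(z) * max(abs(z) - lam, 0.0)

for _ in range(50):                 # EM wrapped around coordinate descent
    # E-step: posterior probability that each beta_j comes from the slab
    slab = theta * dexp(beta, s1)
    post = slab / (slab + (1 - theta) * dexp(beta, s0))
    lam = (1 - post) / s0 + post / s1      # coefficient-specific shrinkage
    # M-step: one sweep of cyclic coordinate descent with soft-thresholding
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]
        beta[j] = soft(X[:, j] @ r, sigma2 * lam[j])

support = set(int(j) for j in np.flatnonzero(np.abs(beta) > 0.5))
```

Large coefficients drift into the slab (small penalty 1/s1) while irrelevant ones stay in the spike (large penalty 1/s0), which is the selective-shrinkage behavior the abstract describes.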
NASA Astrophysics Data System (ADS)
Valtierra, Robert Daniel
Passive acoustic localization has benefited from many major developments and has become an increasingly important focus point in marine mammal research. Several challenges still remain. This work seeks to address several of these challenges such as tracking the calling depths of baleen whales. In this work, data from an array of widely spaced Marine Acoustic Recording Units (MARUs) was used to achieve three dimensional localization by combining the methods Time Difference of Arrival (TDOA) and Direct-Reflected Time Difference of Arrival (DRTD) along with a newly developed autocorrelation technique. TDOA was applied to data for two dimensional (latitude and longitude) localization and depth was resolved using DRTD. Previously, DRTD had been limited to pulsed broadband signals, such as sperm whale or dolphin echolocation, where individual direct and reflected signals are separated in time. Due to the length of typical baleen whale vocalizations, individual multipath signal arrivals can overlap making time differences of arrival difficult to resolve. This problem can be solved using an autocorrelation, which can extract reflection information from overlapping signals. To establish this technique, a derivation was made to model the autocorrelation of a direct signal and its overlapping reflection. The model was exploited to derive performance limits allowing for prediction of the minimum resolvable direct-reflected time difference for a known signal type. The dependence on signal parameters (sweep rate, call duration) was also investigated. The model was then verified using both recorded and simulated data from two analysis cases for North Atlantic right whales (NARWs, Eubalaena glacialis) and humpback whales (Megaptera noveaengliae). The newly developed autocorrelation technique was then combined with DRTD and tested using data from playback transmissions to localize an acoustic transducer at a known depth and location. 
The combined DRTD-autocorrelation methods enabled calling depth and range estimations of a vocalizing NARW and humpback whale in two separate cases. The DRTD-autocorrelation method was then combined with TDOA to create a three dimensional track of a NARW in the Stellwagen Bank National Marine Sanctuary. Results from these experiments illustrated the potential of the combined methods to successfully resolve baleen calling depths in three dimensions.
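The core of the autocorrelation technique can be demonstrated on a synthetic call. The signal parameters below (a 1 s downsweep, a 0.6-amplitude reflection at 80 ms) are invented, and the thesis's derivation and performance limits are not reproduced; the example only shows that the autocorrelation recovers the direct-reflected delay even when the two arrivals overlap.

```python
import numpy as np

fs = 1000.0
t = np.arange(0.0, 1.0, 1.0 / fs)

# Synthetic whale-call-like signal: a 1 s downsweep from 100 Hz to 60 Hz
sweep = np.sin(2 * np.pi * (100 * t - 20 * t ** 2))

# Direct arrival plus an overlapping, weaker surface-reflected copy 80 ms later
delay = 0.080
d = int(delay * fs)
x = np.zeros(len(t) + d)
x[:len(t)] += sweep
x[d:] += 0.6 * sweep

# The autocorrelation of the summed signal has a sidelobe at the lag of the
# direct-reflected time difference, even though the arrivals overlap in time
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
ac /= ac[0]
search = ac[20:200]                            # skip the zero-lag mainlobe
lag_hat = (20 + int(np.argmax(search))) / fs   # recovered delay, seconds
```

With the delay in hand, ray-geometry (as in DRTD) converts the direct-reflected time difference into a depth estimate.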
CrossLink: a novel method for cross-condition classification of cancer subtypes.
Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei
2016-08-22
We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. Conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution across different conditions, because the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address this limitation of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition prediction of PAM50 subtypes of breast cancer using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, as well as datasets with known and unknown PAM50 classifications. CL achieved prediction accuracy >73%, the highest among the methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from an Arabic population to assign a PAM50 classification to each tumor based on its gene expression profile. A novel algorithm, CrossLink, for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
Characteristics of fin whale vocalizations recorded on instruments in the northeast Pacific Ocean
NASA Astrophysics Data System (ADS)
Weirathmueller, Maria Michelle Josephine
This thesis focuses on fin whale vocalizations recorded on ocean bottom seismometers (OBSs) in the Northeast Pacific Ocean, using data collected between 2003 and 2013. OBSs are a valuable and largely untapped resource for the passive acoustic monitoring of large baleen whales. This dissertation is divided into three parts, each of which uses recordings of fin whale vocalizations to better understand their calling behaviors and distributions. The first study describes the development of a technique to extract source levels of fin whale vocalizations from OBS recordings. Source levels were estimated using data collected on a network of eight OBSs in the Northeast Pacific Ocean. The acoustic pressure levels measured at the instruments were adjusted for the propagation path between the calling whales and the instruments using the call location and estimating losses along the acoustic travel path. A total of 1241 calls were used to estimate an average source level of 189 ± 5.8 dB re 1 μPa at 1 m. This variability is largely attributed to uncertainties in the horizontal and vertical position of the fin whale at the time of each call, and to the effect of these uncertainties on subsequent calculations. The second study describes a semi-automated method for obtaining horizontal ranges to vocalizing fin whales using the timing and relative amplitude of multipath arrivals. A matched filter is used to detect fin whale calls and pick the relative times and amplitudes of multipath arrivals. Ray-based propagation models are used to predict multipath times and amplitudes as a function of range. Because the direct and first multiple arrivals are not always observed, three hypotheses for the paths of the observed arrivals are considered; the solution is the hypothesis and range that optimizes the fit to the data. Ray-theoretical amplitudes are not accurate, and solutions are improved by determining amplitudes from the observations using a bootstrap method.
Data from ocean bottom seismometers at two locations are used to assess the method: one on the Juan de Fuca Ridge, a bathymetrically complex mid-ocean ridge environment, and the other at a flat, sedimented location in the Cascadia Basin. At both sites, the method is reliable up to a 4 km range, which is sufficient to enable estimates of call density. The third study explores spatial and temporal trends in fin whale calling patterns. The frequency and inter-pulse interval of fin whale 20 Hz vocalizations were observed over 10 years, from 2003 to 2013, on bottom-mounted hydrophones and OBSs in the northeast Pacific Ocean. The instrument locations extended from 40°N and 130°W to 125°W, with water depths ranging from 1500-4000 m. The inter-pulse interval (IPI) of fin whale song sequences was observed to increase at a rate of 0.59 seconds/year over the decade of observation. During the same time period, peak frequency decreased at a rate of 0.16 Hz/year. Two primary call patterns were observed. During the earlier years, the more commonly observed pattern had a single frequency and single IPI. In later years, a doublet pattern emerged, with two dominant frequencies and two IPIs. Many call sequences in the intervening years appeared to represent a transitional state between the two patterns. The overall trend was consistent across the entire geographical span, although some regional differences exist.
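The source-level arithmetic from the first study is simple to illustrate. The received levels and ranges below are invented (chosen so the numbers land near the reported ~189 dB), and only simple spherical spreading (20 log10 r) is assumed, whereas the thesis uses located calls and modeled path losses.

```python
import numpy as np

# Invented received levels (dB re 1 uPa) for calls at known horizontal ranges
ranges_m = np.array([800.0, 1500.0, 2300.0, 3100.0])
received_db = np.array([131.0, 125.5, 121.8, 119.2])

# Refer each measurement back to 1 m from the source by adding the
# spherical-spreading transmission loss, TL = 20 * log10(r)
transmission_loss = 20 * np.log10(ranges_m)
source_levels = received_db + transmission_loss

sl_mean = source_levels.mean()     # ~189 dB re 1 uPa at 1 m
sl_std = source_levels.std(ddof=1)
```

The spread of `source_levels` across calls is where the reported ±5.8 dB variability would come from once position uncertainty enters the range estimates.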
NASA Astrophysics Data System (ADS)
D'Souza, Adora M.; Abidin, Anas Zainul; Nagarajan, Mahesh B.; Wismüller, Axel
2016-03-01
We investigate the applicability of a computational framework, called mutual connectivity analysis (MCA), for directed functional connectivity analysis in both synthetic and resting-state functional MRI data. This framework comprises first evaluating non-linear cross-predictability between every pair of time series prior to recovering the underlying network structure using community detection algorithms. We obtain the non-linear cross-prediction score between time series using Generalized Radial Basis Function (GRBF) neural networks. These cross-prediction scores characterize the underlying functionally connected networks within the resting brain, which can be extracted using non-metric clustering approaches, such as the Louvain method. We first test our approach on synthetic models with known directional influence and network structure. Our method is able to capture the directional relationships between time series (with an area under the ROC curve = 0.92 ± 0.037) as well as the underlying network structure (Rand index = 0.87 ± 0.063) with high accuracy. Furthermore, we test this method for network recovery on resting-state fMRI data, where results are compared to the motor cortex network recovered from a motor stimulation sequence, resulting in strong agreement between the two (Dice coefficient = 0.45). We conclude that our MCA approach is effective in analyzing non-linear directed functional connectivity and in revealing underlying functional network structure in complex systems.
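A pairwise cross-prediction matrix of the kind MCA builds can be sketched with a simple radial-basis-function ridge regression standing in for the GRBF networks. The toy series and the one-step-lag prediction scheme below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(7)

# Three toy time series: y2 is a nonlinear function of y1's past; y3 is noise
n = 300
y1 = np.sin(np.linspace(0, 30, n)) + 0.1 * rng.normal(size=n)
y2 = np.roll(y1, 1) ** 2 + 0.1 * rng.normal(size=n)
y3 = rng.normal(size=n)

def cross_predict(src, dst, width=0.5, lam=1e-2, m=15):
    """RBF ridge regression predicting dst[t] from src[t-1]
    (a simple stand-in for the GRBF neural networks)."""
    x, y = src[:-1], dst[1:]
    centers = np.linspace(x.min(), x.max(), m)
    Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
    resid = y - Phi @ w
    return 1 - resid.var() / y.var()    # R^2-like cross-prediction score

series = [y1, y2, y3]
A = np.array([[cross_predict(s, d) for d in series] for s in series])
# A is asymmetric: A[0, 1] (y1 -> y2) is high, A[2, 1] (y3 -> y2) is low;
# community detection (e.g. the Louvain method) is then run on such a matrix
```

The asymmetry of `A` is what carries the directional information that MCA exploits.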
Smart Extraction and Analysis System for Clinical Research.
Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung
2017-05-01
With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issue of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for a descriptive and predictive analysis to compile the survival statistics and predict the future chance of survivability. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS is evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system for semistructured text is recorded at 99%, while that for unstructured text is 97%. Furthermore, the automated, unstructured information extraction has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients are found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one of the lifestyle risk factors was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic extraction of information and survival prediction methods. SEAS has reduced the time and energy of human resources spent unnecessarily on manual tasks.
Luboz, Vincent; Chabanas, Matthieu; Swider, Pascal; Payan, Yohan
2005-08-01
This paper addresses an important issue raised for the clinical relevance of Computer-Assisted Surgical applications, namely the methodology used to automatically build patient-specific finite element (FE) models of anatomical structures. From this perspective, a method is proposed, based on a technique called the mesh-matching method, followed by a process that corrects mesh irregularities. The mesh-matching algorithm generates patient-specific volume meshes from an existing generic model. The mesh regularization process is based on the Jacobian matrix transform related to the FE reference element and the current element. This method for generating patient-specific FE models is first applied to computer-assisted maxillofacial surgery, and more precisely, to the FE elastic modelling of patient facial soft tissues. For each patient, the planned bone osteotomies (mandible, maxilla, chin) are used as boundary conditions to deform the FE face model, in order to predict the aesthetic outcome of the surgery. Seven FE patient-specific models were successfully generated by our method. For one patient, the prediction of the FE model is qualitatively compared with the patient's post-operative appearance, measured from a computer tomography scan. Then, our methodology is applied to computer-assisted orbital surgery. It is, therefore, evaluated for the generation of 11 patient-specific FE poroelastic models of the orbital soft tissues. These models are used to predict the consequences of the surgical decompression of the orbit. More precisely, an average law is extrapolated from the simulations carried out for each patient model. This law links the size of the osteotomy (i.e. the surgical gesture) and the backward displacement of the eyeball (the consequence of the surgical gesture).
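The element-validity check underlying the mesh regularization step can be illustrated with the simplest case: for a tetrahedral element, the mapping from the reference element is affine, and a non-positive Jacobian determinant flags an inverted or degenerate element. This is a minimal sketch of that criterion, not the paper's full regularization procedure.

```python
import numpy as np

# After mesh-matching, each element can be checked through the Jacobian of
# the mapping from the FE reference element; a non-positive determinant
# flags an inverted or degenerate element that regularization must correct
def tet_jacobian_det(nodes):
    """Jacobian determinant of the affine map from the reference tetrahedron
    to the element with the given (4, 3) vertex coordinates."""
    return np.linalg.det(nodes[1:] - nodes[0])

regular = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
inverted = regular[[0, 2, 1, 3]]     # swapping two vertices flips orientation

det_ok = tet_jacobian_det(regular)   # positive: valid element
det_bad = tet_jacobian_det(inverted) # negative: must be repaired
```

For higher-order or hexahedral elements the Jacobian varies over the element, so the check is applied at multiple integration points rather than once per element.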
A comparison of measured and theoretical predictions for STS ascent and entry sonic booms
NASA Technical Reports Server (NTRS)
Garcia, F., Jr.; Jones, J. H.; Henderson, H. R.
1983-01-01
Sonic boom measurements were obtained during the flights of STS-1 through STS-5. During STS-1, 2, and 4, entry sonic boom measurements were obtained, and ascent measurements were made on STS-5. The objectives of this measurement program were (1) to define the sonic boom characteristics of the Space Transportation System (STS), (2) to provide a realistic assessment of the validity of existing theoretical prediction techniques, and (3) to establish a level of confidence for predicting future STS configuration sonic boom environments. Detailed evaluation and reporting of the results of this program are in progress. This paper addresses only the significant results, mainly the data obtained during the entry of STS-1 at Edwards Air Force Base (EAFB) and the ascent of STS-5 from Kennedy Space Center (KSC). The theoretical prediction technique employed in this analysis is the so-called Thomas Program, a semi-empirical method that requires definition of the near-field signatures, detailed trajectory characteristics, and the prevailing meteorological characteristics as input. This analytical procedure then extrapolates the near-field signatures from the flight altitude to an altitude consistent with each measurement location.
NASA Astrophysics Data System (ADS)
Ouyang, Qin; Chen, Quansheng; Zhao, Jiewen
2016-02-01
The approach presented herein reports the application of near infrared (NIR) spectroscopy, in comparison with a human sensory panel, as a tool for estimating Chinese rice wine quality; concretely, to achieve the prediction of the overall sensory scores assigned by the trained sensory panel. A back-propagation artificial neural network (BPANN) combined with the adaptive boosting (AdaBoost) algorithm, called BP-AdaBoost, was proposed as a novel nonlinear modeling algorithm. First, the optimal spectral intervals were selected by synergy interval partial least squares (Si-PLS). Then, a BP-AdaBoost model based on the optimal spectral intervals was established, called the Si-BP-AdaBoost model. These models were optimized by cross-validation, and the performance of each final model was evaluated according to the correlation coefficient (Rp) and root mean square error of prediction (RMSEP) in the prediction set. Si-BP-AdaBoost showed excellent performance in comparison with the other models; the best Si-BP-AdaBoost model achieved Rp = 0.9180 and RMSEP = 2.23 in the prediction set. It was concluded that NIR spectroscopy combined with Si-BP-AdaBoost is an appropriate method for predicting the sensory quality of Chinese rice wine.
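The AdaBoost part of BP-AdaBoost can be sketched with the standard AdaBoost.R2 reweighting loop; here regression stumps stand in for the BP networks, so this is an illustration of the boosting scheme rather than the authors' model, and the data are synthetic:

```python
import numpy as np

def fit_stump(x, y, w):
    """Weighted regression stump on one feature: pick the threshold that
    minimises the weighted squared error of two weighted-mean leaves."""
    best = None
    for thr in np.unique(x)[:-1]:
        left = x <= thr
        lv = np.average(y[left], weights=w[left])
        rv = np.average(y[~left], weights=w[~left])
        err = (w * (y - np.where(left, lv, rv)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda xn: np.where(xn <= thr, lv, rv)

def adaboost_r2(x, y, n_rounds=20):
    """AdaBoost.R2-style boosting: reweight samples by normalised error
    after each round; combine rounds with log(1/beta) weights."""
    w = np.full(len(y), 1.0 / len(y))
    models, alphas = [], []
    for _ in range(n_rounds):
        f = fit_stump(x, y, w)
        err = np.abs(f(x) - y)
        L = err / err.max()                 # normalised loss in [0, 1]
        Lbar = (w * L).sum()
        if Lbar >= 0.5:                     # weak learner no better than chance
            break
        beta = Lbar / (1.0 - Lbar)
        w = w * beta ** (1.0 - L)           # down-weight well-fit samples
        w = w / w.sum()
        models.append(f)
        alphas.append(np.log(1.0 / beta))
    a = np.array(alphas)
    return lambda xn: sum(ai * m(xn) for ai, m in zip(a, models)) / a.sum()

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 120)
# noisy three-level staircase: a single stump cannot fit it, boosting can
y = np.where(x < 3.3, 0.0, 1.0) + np.where(x > 6.6, 1.0, 0.0) \
    + rng.normal(scale=0.1, size=x.size)
model = adaboost_r2(x, y)
rmse = float(np.sqrt(((model(x) - y) ** 2).mean()))
```

Note that the canonical AdaBoost.R2 combines rounds with a weighted median; the weighted mean above is a simplification.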
Oviedo de la Fuente, Manuel; Febrero-Bande, Manuel; Muñoz, María Pilar; Domínguez, Àngela
2018-01-01
This paper proposes a novel approach that uses meteorological information to predict the incidence of influenza in Galicia (Spain). It extends Generalized Least Squares (GLS) methods in the multivariate framework to functional regression models with dependent errors. Such models are useful when the recent history of influenza incidence is not readily available (for instance, because of delays in communication with health informants) and the prediction must be constructed by correcting for the temporal dependence of the residuals and using more accessible variables. A simulation study shows that the GLS estimators yield better estimates of the regression model parameters than the classical models do. They obtain extremely good results from the predictive point of view and are competitive with the classical time series approach for the incidence of influenza. An iterative version of the GLS estimator (called iGLS) is also proposed that can help to model complicated dependence structures. For constructing the model, the distance correlation measure [Formula: see text] was employed to select relevant information to predict the influenza rate, mixing multivariate and functional variables. These kinds of models are extremely useful to health managers in allocating resources in advance to manage influenza epidemics.
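The idea behind an iterative GLS estimator for regression with dependent errors can be illustrated with the classic Cochrane-Orcutt scheme for scalar covariates and AR(1) errors (a conceptual sketch only; the paper's iGLS operates on functional covariates, and all data below are synthetic):

```python
import numpy as np

def cochrane_orcutt(X, y, n_iter=10):
    """Iterative GLS for a linear model with AR(1) errors: alternate
    between (1) estimating the regression coefficients and
    (2) estimating the AR(1) parameter rho from the residuals, then
    quasi-differencing the data to whiten the errors."""
    X = np.column_stack([np.ones(len(y)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS starting values
    rho = 0.0
    for _ in range(n_iter):
        resid = y - X @ beta
        rho = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])
        # quasi-difference transform removes the AR(1) dependence
        Xs = X[1:] - rho * X[:-1]
        ys = y[1:] - rho * y[:-1]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta, rho

# synthetic example: y depends linearly on x, errors follow an AR(1) process
rng = np.random.default_rng(0)
x = rng.normal(size=300)
e = np.zeros(300)
for t in range(1, 300):
    e[t] = 0.7 * e[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e
beta, rho = cochrane_orcutt(x, y)   # beta[1] near 2.0, rho near 0.7
```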
Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy
Micsonai, András; Wien, Frank; Kernya, Linda; Lee, Young-Ho; Goto, Yuji; Réfrégiers, Matthieu; Kardos, József
2015-01-01
Circular dichroism (CD) spectroscopy is a widely used technique for the study of protein structure. Numerous algorithms have been developed for the estimation of the secondary structure composition from the CD spectra. These methods often fail to provide acceptable results on α/β-mixed or β-structure–rich proteins. The problem arises from the spectral diversity of β-structures, which has hitherto been considered as an intrinsic limitation of the technique. The predictions are less reliable for proteins of unusual β-structures such as membrane proteins, protein aggregates, and amyloid fibrils. Here, we show that the parallel/antiparallel orientation and the twisting of the β-sheets account for the observed spectral diversity. We have developed a method called β-structure selection (BeStSel) for the secondary structure estimation that takes into account the twist of β-structures. This method can reliably distinguish parallel and antiparallel β-sheets and accurately estimates the secondary structure for a broad range of proteins. Moreover, the secondary structure components applied by the method are characteristic to the protein fold, and thus the fold can be predicted to the level of topology in the CATH classification from a single CD spectrum. By constructing a web server, we offer a general tool for a quick and reliable structure analysis using conventional CD or synchrotron radiation CD (SRCD) spectroscopy for the protein science research community. The method is especially useful when X-ray or NMR techniques fail. Using BeStSel on data collected by SRCD spectroscopy, we investigated the structure of amyloid fibrils of various disease-related proteins and peptides. PMID:26038575
Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.
Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe
2012-01-01
Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an ongoing failing regimen. Our aim was also to investigate the impact of different cross-validation strategies and different loss functions. Four different splits between training and validation sets were tested with two loss functions. Six statistical methods were compared. We assessed performance by evaluating R(2) values and accuracy by calculating the rates of patients correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. A slight discrepancy arises between the two loss functions investigated, and a slight difference also arises between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model correctly classified around 80% of patients. The difference between the lowest and highest rates is around 10 percentage points. The number of mutations retained by the different learners also varies from 1 to 41. Conclusions. The more recent Super Learner methodology, combining the predictions of many learners, provided good performance on our small dataset.
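The two Super Learner variants compared above can be sketched with numpy: cross-validated risks select a single best learner (discrete Super Learner), while a constrained regression of the outcome on the cross-validated predictions yields the weighted combination (the toy learners and squared-error loss below are our assumptions, not the trial's learners):

```python
import numpy as np

def cv_predictions(learners, X, y, k=5):
    """Out-of-fold predictions for each candidate learner (V-fold CV)."""
    n = len(y)
    folds = np.arange(n) % k
    Z = np.zeros((n, len(learners)))
    for j, fit in enumerate(learners):
        for f in range(k):
            tr, te = folds != f, folds == f
            Z[te, j] = fit(X[tr], y[tr])(X[te])
    return Z

# two toy candidate learners: a grand mean and an OLS line
def mean_learner(X, y):
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def ols_learner(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ b

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=200)
y = 5.0 + 1.5 * X + rng.normal(scale=0.3, size=200)

Z = cv_predictions([mean_learner, ols_learner], X, y)
risks = ((Z - y[:, None]) ** 2).mean(axis=0)   # cross-validated risk per learner
best = int(np.argmin(risks))                   # the "discrete Super Learner" pick
w, *_ = np.linalg.lstsq(Z, y, rcond=None)      # combination weights for the ensemble
w = np.clip(w, 0, None)
w = w / w.sum()                                # non-negative weights summing to 1
```

Here the OLS learner wins the discrete comparison and receives almost all of the ensemble weight, as expected for linear data.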
Discrete sequence prediction and its applications
NASA Technical Reports Server (NTRS)
Laird, Philip
1992-01-01
Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply a simple and practical sequence-prediction algorithm called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
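The abstract does not detail TDAG's internals, but the flavour of discrete sequence prediction can be sketched with a fixed-depth context-counting predictor (a simplification; the real TDAG grows a tree of contexts adaptively):

```python
from collections import defaultdict

class ContextPredictor:
    """Minimal fixed-depth context predictor: count which symbol follows
    each recent context and predict the most frequent continuation of
    the longest context seen so far."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # record the symbol under every context suffix up to max_depth
        for d in range(min(self.max_depth, len(history)) + 1):
            ctx = tuple(history[len(history) - d:]) if d else ()
            self.counts[ctx][symbol] += 1

    def predict(self, history):
        # back off from the longest context that has been observed
        for d in range(min(self.max_depth, len(history)), -1, -1):
            ctx = tuple(history[len(history) - d:]) if d else ()
            if ctx in self.counts:
                dist = self.counts[ctx]
                return max(dist, key=dist.get)
        return None

p = ContextPredictor()
hist = []
for s in "abcabcabcabc":
    p.update(hist, s)
    hist.append(s)
nxt = p.predict(hist)   # the repeating sequence continues with "a"
```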
Zhang, Hua; Zhang, Tuo; Gao, Jianzhao; Ruan, Jishou; Shen, Shiyi; Kurgan, Lukasz
2012-01-01
Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding type by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features which hybridize information concerning amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, an important advantage given the limited number of annotated chains. We demonstrate that inclusion of solvent accessibility helps in discrimination of the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure/burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies.
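FOKIT's headline metric, the Matthews correlation coefficient, is straightforward to compute from confusion-matrix counts; the counts in this sketch are invented for illustration (e.g. TS folders as the positive class):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) through 0 (chance) to +1."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# e.g. a predictor that gets 45/50 TS folders and 40/50 MS folders right
score = mcc(tp=45, tn=40, fp=10, fn=5)
```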
Kim, Byoungjip; Kang, Seungwoo; Ha, Jin-Young; Song, Junehwa
2015-01-01
In this paper, we introduce a novel smartphone framework called VisitSense that automatically detects and predicts a smartphone user’s place visits from ambient radio to enable behavioral targeting for mobile ads in large shopping malls. VisitSense enables mobile app developers to adopt visit-pattern-aware mobile advertising for shopping mall visitors in their apps. It also benefits mobile users by allowing them to receive highly relevant mobile ads that are aware of their place visit patterns in shopping malls. To achieve this goal, VisitSense employs accurate visit detection and prediction methods. For accurate visit detection, we develop a change-based detection method that takes into consideration the stability change of ambient radio and the mobility change of users. It performs well in large shopping malls, where ambient radio is quite noisy and causes existing algorithms to easily fail. In addition, we propose a causality-based visit prediction model to capture the causality in sequential visit patterns for effective prediction. We have developed a VisitSense prototype system and a visit-pattern-aware mobile advertising application based on it. Furthermore, we deploy the system in the COEX Mall, one of the largest shopping malls in Korea, and conduct diverse experiments to show the effectiveness of VisitSense. PMID:26193275
Zhou, Ronggang; Feng, Caihong
2017-01-01
There is a rapidly growing body of literature on mobile video calling, which is a promising communication technology; however, little research has focused on user acceptance of mobile video calling, especially in different use contexts. This study explored factors (especially perceived enjoyment) influencing the intention of users to employ video calling in different contexts (a work and a leisure context) by applying the technology acceptance model (TAM) combined with the theory of planned behavior. The revised research model differentiated external factors (subjective norms and personal innovativeness) from internal factors (perceived usefulness, perceived ease of use (PEU), perceived enjoyment, and intention to use mobile video calling). In addition, the current study investigated predictors of perceived enjoyment across these two contexts. With the use of a structured questionnaire, participants were divided into two groups and completed self-report measures related to one context; a total of 386 student respondents’ responses were analyzed. The results indicated that users’ intentions were directly predicted by their perceived enjoyment of video calling (β ≥ 0.35) and the call’s perceived usefulness (β ≥ 0.27) and PEU (β = 0.13, only for the leisure context), which jointly explained at least 55.6% of the variance in use intention. In addition to the effects of these predictors on mobile video calling use acceptance, an assessment of the moderating effects of different contexts indicated that perceived enjoyment played a more important role in influencing intention for the leisure context, while perceived usefulness appeared to be more important for the work context. This study’s findings are important in that they provide strong support for the necessity of distinguishing among different types of contexts when predicting users’ intentions to use video calling.
Furthermore, the results showed that perceived enjoyment was most significantly influenced by perceived usefulness (β ≥ 0.61), followed by PEU (β ≥ 0.13). In summary, the roles of core TAM variables (especially perceived enjoyment and perceived usefulness) and of external factors (subjective norms and personal innovativeness) differed between the leisure and work contexts. The implications of these findings are discussed. PMID:28337166
Two Faces of Shame: Understanding Shame and Guilt in the Prediction of Jail Inmates’ Recidivism
Tangney, June P.; Stuewig, Jeffrey; Martinez, Andres G.
2014-01-01
Psychological research using mostly cross-sectional methods calls into question the presumed function of shame as inhibitor of immoral or illegal behavior. In a longitudinal study of 476 jail inmates, we assessed shame-proneness, guilt-proneness, and externalization of blame shortly upon incarceration. Participants (n = 332) were interviewed one year following release into the community and official arrest records were accessed (n = 446). Guilt-proneness negatively, and directly, predicted re-offense in the first year post-release; shame-proneness did not. Further mediational modeling showed that shame-proneness positively predicted recidivism via its robust link to externalization of blame. There remained a direct effect of shame on recidivism, however, such that shame – unimpeded by defensive externalization of blame – inhibited recidivism. Items assessing a motivation to hide were primarily responsible for this pattern. Overall, results suggest that the pain of shame may have two faces – one with destructive and the other with constructive potential. PMID:24395738
Virus Database and Online Inquiry System Based on Natural Vectors.
Dong, Rui; Zheng, Hui; Tian, Kun; Yau, Shek-Chung; Mao, Weiguang; Yu, Wenping; Yin, Changchuan; Yu, Chenglong; He, Rong Lucy; Yang, Jie; Yau, Stephen St
2017-01-01
We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from the National Center for Biotechnology Information. The online inquiry system serves the purpose of computing natural vectors and their distances based on submitted genomes, providing an online interface for accessing and using the database for viral classification and prediction, and running back-end processes for automatic and manual updating of database content to synchronize with GenBank. Submitted genome data in FASTA format are processed, and the prediction results, with the five closest neighbors and their classifications, are returned by email. Considering the one-to-one correspondence between sequence and natural vector, its time efficiency, and its high accuracy, the natural vector method is a significant advance over alignment methods, which makes VirusDB a useful database for further research.
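The natural-vector construction behind the database can be sketched directly: for each nucleotide, the 12-dimensional vector records its count, mean position, and normalized second central moment of position (a minimal sketch of the standard construction; VirusDB's exact implementation may differ):

```python
import numpy as np

def natural_vector(seq):
    """12-dimensional natural vector of a DNA sequence: for each of
    A, C, G, T, its count n_k, mean position mu_k, and normalized
    second central moment D2_k = sum((pos - mu_k)^2) / (n_k * N)."""
    seq = seq.upper()
    N = len(seq)
    vec = []
    for base in "ACGT":
        pos = np.array([i + 1 for i, b in enumerate(seq) if b == base],
                       dtype=float)
        n = len(pos)
        if n == 0:
            vec += [0.0, 0.0, 0.0]
            continue
        mu = pos.mean()
        d2 = ((pos - mu) ** 2).sum() / (n * N)
        vec += [float(n), float(mu), float(d2)]
    return np.array(vec)

v = natural_vector("ACGTACGT")
```

Classification then reduces to nearest-neighbor search under Euclidean distance between these alignment-free vectors.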
NASA Technical Reports Server (NTRS)
Sharp, Dave; Sobel, Larry
1997-01-01
A simple and rapid analysis method, consisting of a number of modular, 'strength-of-materials-type' models, is presented for predicting the nonlinear response and stiffener separation of postbuckled, flat, composite shear panels. The analysis determines the maximum principal tensile stress in the skin surface layer under the stiffener toe. Failure is said to occur when this stress reaches the mean transverse tensile strength of the layer. The analysis methodology consists of a number of closed-form equations that can easily be used in a 'hand analysis.' For expediency, they have been programmed into a preliminary design code called SNAPPS (Speedy Nonlinear Analysis of Postbuckled Panels in Shear), which rapidly predicts the postbuckling response of the panel for each value of the applied shear load. SNAPPS response and failure predictions were found to agree well with test results for three panels with widely different geometries, laminates, and stiffnesses. Design guidelines are given for increasing the load-carrying capacity of stiffened, composite shear panels.
Online prediction of organoleptic data for snack food using color images
NASA Astrophysics Data System (ADS)
Yu, Honglu; MacGregor, John F.
2004-11-01
In this paper, a study of the real-time prediction of organoleptic properties of snack food using RGB color images is presented. The so-called organoleptic properties, which are properties based on texture, taste, and sight, are generally measured either by human sensory response or by mechanical devices. Neither of these two methods can be used for on-line feedback control in high-speed production. In this situation, a vision-based soft sensor is very attractive: by taking images of the products, the samples remain untouched and the product properties can be predicted in real time from image data. Four types of organoleptic properties are considered in this study: blister level, toast points, taste, and peak break force. Wavelet transforms are applied to the color images, and the averaged absolute value of each filtered image is used as a texture feature variable. In order to handle the high correlation among the feature variables, Partial Least Squares (PLS) is used to regress the extracted feature variables against the four response variables.
ERIC Educational Resources Information Center
Carter, Angela
This study involved observing a second-grade classroom to investigate how the teacher called on students, noting whether the teacher gave enough attention to students who raised their hands frequently by calling on them and examining students' responses when called on. Researchers implemented a new method of calling on students using name cards,…
Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method
NASA Astrophysics Data System (ADS)
Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.
2008-06-01
An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near-continuum range. A post-processing procedure called the DSMC rapid ensemble averaging method (DREAM) is developed to reduce the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over a small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near-equilibrium flows (DREAM-I) or the instantaneous particle data output by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single-processor code, and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges, and the results are compared to experimental data and simulations from the literature where these are available. In general, it was found that ensembling 10 runs with DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.
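The 2.5-3.3x scatter reduction reported for 10 ensembled runs is consistent with the roughly sqrt(R) behaviour of averaging R independent realisations, which a toy numpy experiment illustrates (the "flow field" and scatter level here are synthetic stand-ins, not DSMC output):

```python
import numpy as np

rng = np.random.default_rng(42)
n_cells, n_runs = 2000, 10
true_field = np.sin(np.linspace(0.0, np.pi, n_cells))  # stand-in flow field

# each repeated "run" = true field + independent statistical scatter,
# mimicking DSMC realisations of the same unsteady flow at one instant
runs = true_field + rng.normal(scale=0.1, size=(n_runs, n_cells))

def rms(e):
    return float(np.sqrt((e ** 2).mean()))

single_scatter = rms(runs[0] - true_field)
ensemble_scatter = rms(runs.mean(axis=0) - true_field)
reduction = single_scatter / ensemble_scatter   # roughly sqrt(10) ≈ 3.2
```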
Muver, a computational framework for accurately calling accumulated mutations.
Burkholder, Adam B; Lujan, Scott A; Lavender, Christopher A; Grimm, Sara A; Kunkel, Thomas A; Fargo, David C
2018-05-09
Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), which can impart significant phenotypic consequences on cells but are harder to call than substitution mutations in whole-genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold-standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR)-deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.
PREDICTION OF GEOMAGNETIC STORM STRENGTH FROM INNER HELIOSPHERIC IN SITU OBSERVATIONS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kubicka, M.; Möstl, C.; Amerstorfer, T.
2016-12-20
Prediction of the effects of coronal mass ejections (CMEs) on Earth strongly depends on knowledge of the interplanetary magnetic field southward component, B_z. Predicting the strength and duration of B_z inside a CME with sufficient accuracy is currently impossible, forming the so-called B_z problem. Here, we provide a proof-of-concept of a new method for predicting the CME arrival time, speed, B_z, and resulting disturbance storm time (Dst) index on Earth based only on magnetic field data measured in situ in the inner heliosphere (<1 au). On 2012 June 12-16, three approximately Earthward-directed and interacting CMEs were observed by the Solar Terrestrial Relations Observatory imagers and in situ by Venus Express (VEX) at 0.72 au, 6° away from the Sun-Earth line. The CME kinematics are calculated using the drag-based and WSA-Enlil models, constrained by the arrival time at VEX, resulting in the CME arrival time and speed at Earth. The CME magnetic field strength is scaled with a power law from VEX to Wind. Our investigation shows promising results for the Dst forecast (predicted: -96 and -114 nT from two Dst models; observed: -71 nT), for the arrival speed (predicted: 531 ± 23 km s^-1; observed: 488 ± 30 km s^-1), and for the timing (6 ± 1 hr after the actual arrival time). The prediction lead time is 21 hr. The method may be applied to vector magnetic field data from a spacecraft at an artificial Lagrange point between the Sun and Earth or to data taken by any spacecraft temporarily crossing the Sun-Earth line.
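The drag-based model used for the kinematics can be sketched in a few lines: the CME relative speed decays as dv/dt = -gamma (v - w) |v - w| while it propagates from the in-situ observer to 1 au. The drag parameter, ambient wind speed, and initial speed below are illustrative assumptions, not values from the paper:

```python
AU = 1.496e8  # astronomical unit in km

def drag_based_model(r0, v0, w=400.0, gamma=0.2e-7, dt=300.0):
    """Integrate the drag-based CME propagation model
    dv/dt = -gamma * (v - w) * |v - w| from heliocentric distance
    r0 [km] to 1 au with a simple Euler stepper.
    Returns (travel time [hr], arrival speed [km/s]).
    gamma [1/km] and solar-wind speed w [km/s] are assumed values."""
    r, v, t = r0, v0, 0.0
    while r < AU:
        v += -gamma * (v - w) * abs(v - w) * dt   # drag deceleration
        r += v * dt
        t += dt
    return t / 3600.0, v

# a CME observed at Venus Express distance (0.72 au) moving at 600 km/s
t_arr, v_arr = drag_based_model(r0=0.72 * AU, v0=600.0)
```

With these assumed parameters the remaining 0.28 au takes on the order of 20 hr, comparable to the 21 hr lead time quoted above.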
A Conserving Discretization for the Free Boundary in a Two-Dimensional Stefan Problem
NASA Astrophysics Data System (ADS)
Segal, Guus; Vuik, Kees; Vermolen, Fred
1998-03-01
The dissolution of a disk-like Al2Cu particle is considered. A characteristic property is that initially the particle has a nonsmooth boundary. The mathematical model of this dissolution process contains a description of the particle interface, the position of which varies in time. Such a model is called a Stefan problem. It is impossible to obtain an analytical solution for a general two-dimensional Stefan problem, so we use the finite element method to solve this problem numerically. First, we apply a classical moving mesh method. Computations show that after some time steps the predicted particle interface becomes very unrealistic. Therefore, we derive a new method for the displacement of the free boundary based on the balance of atoms. This method leads to good results, also for nonsmooth boundaries. Some numerical experiments are given for the dissolution of an Al2Cu particle in an Al-Cu alloy.
Malina, Robert M; Coelho E Silva, Manuel J; Figueiredo, António J; Carling, Christopher; Beunen, Gaston P
2012-01-01
The relationships among indicators of biological maturation were evaluated, and the concordance between classifications of maturity status was examined in two age groups of youth soccer players (11-12 years, n = 87; 13-14 years, n = 93). Data included chronological age (CA), skeletal age (SA, Fels method), stage of pubic hair, predicted age at peak height velocity, and percent of predicted adult height. Players were classified as on time, late, or early in maturation using the SA-CA difference, predicted age at peak height velocity, and percent of predicted mature height. Factor analyses indicated two factors in players aged 11-12 years (maturity status: percent of predicted mature height, stage of pubic hair, 59% of variance; maturity timing: SA/CA ratio, predicted age at peak height velocity, 26% of variance), and one factor in players aged 13-14 years (68% of variance). Kappa coefficients were low (0.02-0.23) and indicated poor agreement between maturity classifications. Spearman rank-order correlations between categories were low to moderate (0.16-0.50). Although the indicators were related, concordance between maturity classifications based on skeletal age, predicted age at peak height velocity, and percent of predicted mature height was poor. Talent development programmes call for the classification of youth as early, average, or late maturing for the purpose of designing training and competition programmes. Non-invasive indicators of maturity status have limitations for this purpose.
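Cohen's kappa, the agreement statistic used above, can be computed directly from two classification lists; the example classifications below are invented for illustration:

```python
import numpy as np

def cohens_kappa(a, b, categories=("late", "on time", "early")):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance from the two raters' marginal frequencies."""
    idx = {c: i for i, c in enumerate(categories)}
    M = np.zeros((len(categories), len(categories)))
    for x, y in zip(a, b):
        M[idx[x], idx[y]] += 1
    n = M.sum()
    po = np.trace(M) / n                  # observed agreement
    pe = (M.sum(1) @ M.sum(0)) / n ** 2   # chance agreement
    return (po - pe) / (1 - pe)

# hypothetical maturity classifications of 10 players by two indicators
sa_class   = ["on time"] * 6 + ["early"] * 2 + ["late"] * 2
aphv_class = ["on time"] * 4 + ["early"] * 4 + ["late"] * 2
k = cohens_kappa(sa_class, aphv_class)
```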
NASA Technical Reports Server (NTRS)
Laird, Philip
1992-01-01
We distinguish static and dynamic optimization of programs: whereas static optimization modifies a program before runtime and is based only on its syntactical structure, dynamic optimization is based on the statistical properties of the input source and examples of program execution. Explanation-based generalization is a commonly used dynamic optimization method, but its effectiveness as a speedup-learning method is limited, in part because it fails to separate the learning process from the program transformation process. This paper describes a dynamic optimization technique called a learn-optimize cycle that first uses a learning element to uncover predictable patterns in the program execution and then uses an optimization algorithm to map these patterns into beneficial transformations. The technique has been used successfully for dynamic optimization of pure Prolog.
A spectral-spatial-dynamic hierarchical Bayesian (SSD-HB) model for estimating soybean yield
NASA Astrophysics Data System (ADS)
Kazama, Yoriko; Kujirai, Toshihiro
2014-10-01
A method called a "spectral-spatial-dynamic hierarchical-Bayesian (SSD-HB) model," which can deal with many parameters (such as spectral and weather information all together) by reducing the occurrence of multicollinearity, is proposed. Experiments conducted on soybean yields in Brazil fields with a RapidEye satellite image indicate that the proposed SSD-HB model can predict soybean yield with a higher degree of accuracy than other estimation methods commonly used in remote-sensing applications. In the case of the SSD-HB model, the mean absolute error between estimated yield of the target area and actual yield is 0.28 t/ha, compared to 0.34 t/ha when conventional PLS regression was applied, showing the potential effectiveness of the proposed model.
Nilsson, Johanna; Axelsson, Östen
2015-08-01
Aesthetic quality is central to textile conservators when evaluating a conservation method. However, the literature on textile conservation chiefly focuses on physical properties, and little is known about what factors determine aesthetic quality according to textile conservators. The latter was explored through two experiments. Experiment 1 explored the underlying attributes of aesthetic quality of textile conservation interventions. Experiment 2 explored the relationships between these attributes and how well they predicted aesthetic quality. Rank-order correlation analyses revealed two latent factors called Coherence and Completeness. Ordinal regression analysis revealed that Coherence was the most important predictor of aesthetic quality. This means that a successful conservation intervention is visually well-integrated with the textile item in terms of the material and method.
Phylogenetic tree and community structure from a Tangled Nature model.
Canko, Osman; Taşkın, Ferhat; Argın, Kamil
2015-10-07
In evolutionary biology, the taxonomy and origination of species are widely studied subjects. An estimation of the evolutionary tree can be made from available DNA sequence data. The calculation of the tree is made by well-known and frequently used methods such as maximum likelihood and neighbor-joining. In order to examine the results of these methods, an evolutionary tree is pursued computationally with a mathematical model called Tangled Nature. A relatively small genome space is investigated due to the computational burden, and it is found that the actual and predicted trees are in reasonably good agreement in terms of shape. Moreover, the speciation and the resulting community structure of the food web are investigated by modularity. Copyright © 2015 Elsevier Ltd. All rights reserved.
A test of the acoustic adaptation hypothesis in four species of marmots.
Daniel; Blumstein
1998-12-01
Acoustic signals must be transmitted from a signaller to a receiver, during which time they become modified. The acoustic adaptation hypothesis suggests that selection should shape the structure of long-distance signals to maximize transmission through different habitats. A specific prediction of the acoustic adaptation hypothesis is that the long-distance signals of animals in their native habitat are expected to change less during transmission than non-native signals within that habitat. This prediction was tested using the alarm calls of four species of marmots that live in acoustically different habitats and produce species-specific, long-distance alarm vocalizations: yellow-bellied marmot, Marmota flaviventris; Olympic marmot, M. olympus; hoary marmot, M. caligata; and woodchuck, M. monax. By doing so, we evaluated the relative importance of the acoustic environment in selecting for divergent marmot alarm calls. Representative alarm calls of the four species were broadcast and rerecorded in each species' habitat at four distances from a source. Rerecorded, and therefore degraded, alarm calls were compared to undegraded calls using spectrogram correlation. If each species' alarm call was transmitted with less overall degradation in its own environment, a significant interaction between species' habitat and species' call type would be expected. Transmission fidelity at each of four distances was treated as a multivariate response, and differences among habitats and call types were tested in a two-way MANOVA. Although significant overall differences in the transmission properties of the habitats were found, and significant overall differences in the transmission properties of the call types were found, there was no significant interaction between habitat and call type. Thus, the evidence did not support the acoustic adaptation hypothesis for these marmot species. 
Factors other than maximizing long-distance transmission through the environment may be important in the evolution of species-specific marmot alarm calls. (c) 1998 The Association for the Study of Animal Behaviour.
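The degradation comparison can be sketched in miniature. This is not the study's spectrogram-correlation pipeline: it correlates 1-D synthetic waveforms rather than spectrograms, and the "habitat" model (high-frequency smearing plus ambient noise) and every parameter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic "alarm call": a 1 kHz tone with a Gaussian envelope.
fs = 8000
t = np.arange(0, 0.5, 1 / fs)
clean = np.sin(2 * np.pi * 1000 * t) * np.exp(-((t - 0.25) ** 2) / 0.005)

def degrade(sig, noise_sd, smooth=5):
    """Toy habitat model: moving-average smearing plus additive noise."""
    kernel = np.ones(smooth) / smooth
    return np.convolve(sig, kernel, mode="same") + rng.normal(0, noise_sd, sig.size)

def transmission_fidelity(original, rerecorded):
    """Correlation between original and re-recorded calls (1 = no degradation)."""
    return np.corrcoef(original, rerecorded)[0, 1]

near = transmission_fidelity(clean, degrade(clean, noise_sd=0.01))
far = transmission_fidelity(clean, degrade(clean, noise_sd=0.2, smooth=25))
```

As in the study, a lower fidelity score for the "far" (heavily degraded) rerecording indicates more modification during transmission.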
Dissimilarity based Partial Least Squares (DPLS) for genomic prediction from SNPs.
Singh, Priyanka; Engel, Jasper; Jansen, Jeroen; de Haan, Jorn; Buydens, Lutgarde Maria Celina
2016-05-04
Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We propose to use Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict bacterial wilt (BW) in tomatoes using SNPs as predictors. The DPLS approach was compared with Genomic Best Linear Unbiased Prediction (GBLUP) and single-SNP regression with SNP as a fixed effect to assess the performance of DPLS. Eight genomic distance measures were used to quantify relationships between the tomato accessions from the SNPs. Subsequently, each of these distance measures was used to predict BW using the DPLS prediction model. The DPLS model was found to be robust to the choice of distance measure; similar prediction performances were obtained for each distance measure. DPLS greatly outperformed the single-SNP regression approach, showing that BW is a complex trait dependent on several loci. Next, the performance of the DPLS model was compared to that of GBLUP. Although GBLUP and DPLS are conceptually very different, the prediction quality (PQ) of the DPLS models was similar to the prediction statistics obtained from GBLUP. A considerable advantage of DPLS is that the genotype-phenotype relationship can easily be visualized in a 2-D scatter plot. This so-called score plot gives breeders insight for selecting candidates for their future breeding programs. DPLS is a highly appropriate method for GP. The model's prediction performance was similar to that of GBLUP and far better than the single-SNP approach. The proposed method can be used in combination with a wide range of genomic dissimilarity measures and genotype representations such as allele counts, haplotypes, or allele-intensity values. 
Additionally, the data can be insightfully visualized by the DPLS model, allowing for selection of desirable candidates from the breeding experiments. In this study, we have assessed the DPLS performance on a single trait.
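The core DPLS idea, regressing a trait on the rows of a genotype dissimilarity matrix, can be sketched as follows. The SNP data and trait are synthetic, the Euclidean distance is just one of many possible measures (the paper evaluates eight), and the minimal PLS1 routine here is a stand-in for a full DPLS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 accessions x 50 SNPs (allele counts 0/1/2), one trait
# driven by the first 5 loci plus noise.
X_snp = rng.integers(0, 3, size=(30, 50)).astype(float)
y = X_snp[:, :5].sum(axis=1) + rng.normal(0, 0.5, size=30)

def dissimilarity_matrix(X):
    """Pairwise Euclidean distances between accessions (one of many choices)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.clip(d2, 0, None))

def pls1_fit(D, y, n_comp=3):
    """Minimal PLS1 (NIPALS) on centred predictors; returns coefficients."""
    X = D - D.mean(axis=0)
    yc = y - y.mean()
    W, P, q = [], [], []
    Xr, yr = X.copy(), yc.copy()
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        qk = yr @ t / tt
        Xr = Xr - np.outer(t, p)       # deflate predictors
        yr = yr - qk * t               # deflate response
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in predictor space
    return B, D.mean(axis=0), y.mean()

D = dissimilarity_matrix(X_snp)
B, x_mean, y_mean = pls1_fit(D, y)
y_hat = (D - x_mean) @ B + y_mean        # in-sample predictions
corr = np.corrcoef(y, y_hat)[0, 1]
```

Swapping `dissimilarity_matrix` for another genomic distance is all that is needed to reproduce the paper's robustness comparison across distance measures.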
The Use of Factorial Forecasting to Predict Public Response
ERIC Educational Resources Information Center
Weiss, David J.
2012-01-01
Policies that call for members of the public to change their behavior fail if people don't change; predictions of whether the requisite changes will take place are needed prior to implementation. I propose to solve the prediction problem with Factorial Forecasting, a version of functional measurement methodology that employs group designs. Aspects…
Zhang, Jian; Zhao, Xiaowei; Sun, Pingping; Gao, Bo; Ma, Zhiqiang
2014-01-01
B-cell epitopes are regions of the antigen surface that can be recognized by certain antibodies and elicit an immune response. Identification of epitopes for a given antigen chain has vital applications in vaccine and drug research. Experimental identification of B-cell epitopes is time-consuming and resource-intensive, so the task may benefit from computational approaches. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting the antigenic determinant residues, and a spatial clustering algorithm is then adopted to identify the potential epitopes. Firstly, we explore various discriminative features from primary sequences. Secondly, a cost-sensitive ensemble scheme is introduced to deal with the imbalanced learning problem. Thirdly, we adopt a spatial clustering algorithm to determine which residues may potentially form epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitopes prediction), is proposed in this study. CBEP achieves good prediction performance, with mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using leave-one-out cross-validation (LOOCV). When compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP which implements the proposed method is available for academic use.
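One common way to realize the imbalanced-learning idea is an ensemble in which every member sees all minority-class (epitope) examples but only a balanced subsample of the majority class. The sketch below uses that generic scheme with synthetic features and a toy nearest-centroid base learner; it is not CBEP's actual algorithm or feature set.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic stand-ins for residue feature vectors: 40 epitope residues
# (minority class) vs 400 non-epitope residues (majority class).
n_pos, n_neg, d = 40, 400, 10
X_pos = rng.normal(1.0, 1.0, size=(n_pos, d))
X_neg = rng.normal(0.0, 1.0, size=(n_neg, d))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * n_pos + [0] * n_neg)

def centroid_classifier(Xp, Xn):
    """Tiny base learner: label a point by its nearest class centroid."""
    cp, cn = Xp.mean(axis=0), Xn.mean(axis=0)
    return lambda Z: (np.linalg.norm(Z - cp, axis=1)
                      < np.linalg.norm(Z - cn, axis=1)).astype(int)

# Each ensemble member is trained on all positives plus a balanced
# subsample of negatives, so the minority class is never swamped.
members = []
for _ in range(15):
    idx = rng.choice(n_neg, size=n_pos, replace=False)
    members.append(centroid_classifier(X_pos, X_neg[idx]))

votes = np.mean([m(X) for m in members], axis=0)  # majority vote
pred = (votes > 0.5).astype(int)
sensitivity = pred[y == 1].mean()
specificity = 1 - pred[y == 0].mean()
```

The balanced subsampling keeps sensitivity high on the rare class, mirroring the sensitivity advantage the abstract reports for CBEP.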
Predicting Student Success using Analytics in Course Learning Management Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Olama, Mohammed M; Thakur, Gautam; McNair, Wade
Educational data analytics is an emerging discipline concerned with developing methods for exploring the unique types of data that come from the educational context. For example, predicting college student performance is crucial for both the student and educational institutions. It can support timely intervention to prevent students from failing a course, increase the efficacy of advising functions, and improve course completion rates. In this paper, we present the efforts carried out at Oak Ridge National Laboratory (ORNL) toward conducting predictive analytics on academic data collected from 2009 through 2013 and available in one of the most commonly used learning management systems, called Moodle. First, we identified the data features useful for predicting student outcomes, such as students' scores in homework assignments, quizzes, and exams, in addition to their activities in discussion forums and their total GPA in the term they enrolled in the course. Then, Logistic Regression and Neural Network predictive models are used to identify, as early as possible, students who are in danger of failing the course they are currently enrolled in. These models compute the likelihood of any given student failing (or passing) the current course. Numerical results are presented to evaluate and compare the performance of the developed models and their predictive accuracy.
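The failure-risk scoring step can be sketched with a plain logistic regression. The features and pass/fail labels below are synthetic stand-ins for the Moodle-derived features the paper lists (homework, quiz, and exam scores, forum activity, term GPA); this is a sketch, not ORNL's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 synthetic students, 5 features scaled to [0, 1]:
# homework avg, quiz avg, exam avg, forum activity, term GPA.
n = 200
X = rng.uniform(0, 1, size=(n, 5))
# Assumed ground truth: passing driven mainly by exam score and GPA.
logits_true = 4 * X[:, 2] + 3 * X[:, 4] - 3.5
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits_true))).astype(float)

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of log-loss
    return w

def predict_proba(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1 / (1 + np.exp(-Xb @ w))

w = fit_logistic(X, y)
p_fail = 1 - predict_proba(w, X)   # likelihood of failing, as in the abstract
accuracy = ((predict_proba(w, X) > 0.5) == y).mean()
```

In practice the same model would be refit as the term progresses, so at-risk students are flagged as early as possible.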
Jiang, Xiaoyu; Fuchs, Mathias
2017-01-01
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility. PMID:28546826
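The modality-specific penalty factors at the heart of IPF-LASSO are equivalent to rescaling each data block before an ordinary LASSO fit. That equivalence makes the idea easy to sketch; the data and penalty factors below are illustrative assumptions, and the minimal coordinate-descent LASSO is a stand-in for the ipflasso package.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical omics modalities for 60 patients: 20 expression features
# (carrying the signal) and 30 methylation features (noise here).
n, p1, p2 = 60, 20, 30
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
beta1 = np.zeros(p1); beta1[:4] = 2.0           # 4 informative features
y = X1 @ beta1 + rng.normal(0, 0.5, size=n)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                 # remove feature j from fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
            r -= X[:, j] * b[j]                 # add it back
    return b

# IPF-LASSO idea: modality-specific penalty factors are equivalent to
# dividing each block by its factor before a standard LASSO fit.
factors = np.array([1.0] * p1 + [4.0] * p2)     # penalize modality 2 harder
X = np.hstack([X1, X2]) / factors
b_scaled = lasso_cd(X, y, lam=0.1)
beta_hat = b_scaled / factors                   # back to the original scale

n_kept_mod1 = (beta_hat[:p1] != 0).sum()
n_kept_mod2 = (beta_hat[p1:] != 0).sum()
```

As in the paper, the factors can be chosen by cross-validation or set from prior knowledge about which modality is expected to be more informative.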
NASA Astrophysics Data System (ADS)
Guo, Kun; Sun, Yi; Qian, Xin
2017-03-01
With the development of social networks, interaction between investors in the stock market has become faster and more convenient. Thus, investor sentiment, which can influence investment decisions, may be quickly spread and magnified through the network, and the stock market can be affected to a certain extent. This paper collected user comment data from Xueqiu, a popular professional social networking site for the Chinese stock market; investor sentiment data were then obtained through semantic analysis. A dynamic analysis of the relationship between investor sentiment and the stock market is proposed based on the Thermal Optimal Path (TOP) method. The results show that the sentiment data did not always lead the stock market price, and they can be used to predict the stock price only when the stock has high investor attention.
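A lead-lag relationship of this kind can be illustrated with simple lagged correlation rather than the TOP method itself, which tracks a time-varying lag. Both series below are synthetic, with a 2-day sentiment lead built in by construction.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic daily sentiment, and a "price" series that echoes sentiment
# with a 2-day delay plus noise.
n = 300
sentiment = rng.normal(size=n)
price = np.zeros(n)
price[2:] = sentiment[:-2]
price += rng.normal(0, 0.5, size=n)

def lagged_corr(x, y, lag):
    """corr(x[t], y[t + lag]); a positive peak lag means x leads y."""
    if lag > 0:
        return np.corrcoef(x[:-lag], y[lag:])[0, 1]
    if lag < 0:
        return np.corrcoef(x[-lag:], y[:lag])[0, 1]
    return np.corrcoef(x, y)[0, 1]

corrs = {lag: lagged_corr(sentiment, price, lag) for lag in range(-5, 6)}
lead_lag = max(corrs, key=corrs.get)   # lag with the strongest correlation
```

A fixed peak lag is the simplest case; the TOP analysis in the paper instead lets the lead-lag structure drift over time, which is how it detects that sentiment leads price only during high-attention periods.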
Distributed Damage Estimation for Prognostics based on Structural Model Decomposition
NASA Technical Reports Server (NTRS)
Daigle, Matthew; Bregon, Anibal; Roychoudhury, Indranil
2011-01-01
Model-based prognostics approaches capture system knowledge in the form of physics-based models of components, and how they fail. These methods consist of a damage estimation phase, in which the health state of a component is estimated, and a prediction phase, in which the health state is projected forward in time to determine end of life. However, the damage estimation problem is often multi-dimensional and computationally intensive. We propose a model decomposition approach adapted from the diagnosis community, called possible conflicts, in order to both improve the computational efficiency of damage estimation, and formulate a damage estimation approach that is inherently distributed. Local state estimates are combined into a global state estimate from which prediction is performed. Using a centrifugal pump as a case study, we perform a number of simulation-based experiments to demonstrate the approach.
Sheffler, Will; Baker, David
2009-01-01
We present a novel method called RosettaHoles for visual and quantitative assessment of underpacking in the protein core. RosettaHoles generates a set of spherical cavity balls that fill the empty volume between atoms in the protein interior. For visualization, the cavity balls are aggregated into contiguous overlapping clusters and small cavities are discarded, leaving an uncluttered representation of the unfilled regions of space in a structure. For quantitative analysis, the cavity ball data are used to estimate the probability of observing a given cavity in a high-resolution crystal structure. RosettaHoles provides excellent discrimination between real and computationally generated structures, is predictive of incorrect regions in models, identifies problematic structures in the Protein Data Bank, and promises to be a useful validation tool for newly solved experimental structures.
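The cavity-ball construction can be sketched geometrically: sample probe points and keep those with enough clearance from every atom. The "atoms" below are random points with a cavity carved out by hand; real input would be protein coordinates, and RosettaHoles' clustering and statistical scoring are omitted.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic "protein": random atoms in a 10x10x10 box, with a deliberate
# cavity carved out near the centre.
atoms = rng.uniform(0, 10, size=(600, 3))
atoms = atoms[np.linalg.norm(atoms - 5.0, axis=1) > 2.0]

# Probe points; keep those farther than a clearance radius from every atom.
# The survivors are crude stand-ins for RosettaHoles' cavity balls.
probes = rng.uniform(2, 8, size=(2000, 3))
d_min = np.linalg.norm(probes[:, None, :] - atoms[None, :, :], axis=2).min(axis=1)
cavity_balls = probes[d_min > 1.2]

# Most detected cavity balls should fall inside the carved-out sphere.
in_cavity = np.linalg.norm(cavity_balls - 5.0, axis=1) < 2.0
frac_in_cavity = in_cavity.mean()
```

RosettaHoles then aggregates such balls into contiguous clusters for visualization and scores cavity sizes against what high-resolution crystal structures exhibit.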
An Extended Kalman Filter-Based Attitude Tracking Algorithm for Star Sensors
Li, Jian; Wei, Xinguo; Zhang, Guangjun
2017-01-01
Efficiency and reliability are key issues when a star sensor operates in tracking mode. In the case of high attitude dynamics, the performance of existing attitude tracking algorithms degenerates rapidly. In this paper an extended Kalman filter-based attitude tracking algorithm is presented. The star sensor is modeled as a nonlinear stochastic system, with the state estimate providing the three-degree-of-freedom attitude quaternion and angular velocity. The star positions in the star image are predicted and measured to estimate the optimal attitude. Furthermore, all the cataloged stars observed in the sensor field of view according to the predicted image motion are accessed using a catalog partition table to speed up the tracking, a process called star mapping. Software simulation and a night-sky experiment are performed to validate the efficiency and reliability of the proposed method. PMID:28825684
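The predict/update cycle of such a tracker can be sketched for a single axis. The real algorithm estimates the full attitude quaternion and 3-D angular velocity; here the state is just a boresight angle and rate, and the star catalog, noise levels, and motion profile are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])    # constant angular-rate motion model
Q = np.diag([1e-8, 1e-8])                # small process noise (motion here is truly constant-rate)
R_var = 1e-4                             # star-centroid measurement noise variance

star_dirs = np.array([0.2, -0.1, 0.35])  # known catalog star angles (rad)

def h(x):
    """Predicted star positions for boresight angle x[0] (small-angle model)."""
    return star_dirs - x[0]

H = np.column_stack([-np.ones(3), np.zeros(3)])  # Jacobian of h w.r.t. [angle, rate]

x = np.array([0.0, 0.0])                 # state estimate [angle, rate]
P = np.eye(2) * 1e-2

true_angle, true_rate = 0.05, 0.01
for _ in range(100):
    true_angle += true_rate * dt
    z = star_dirs - true_angle + rng.normal(0, np.sqrt(R_var), 3)
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update, using all visible stars in one measurement vector
    S = H @ P @ H.T + R_var * np.eye(3)
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - h(x))
    P = (np.eye(2) - K @ H) @ P

angle_err = abs(x[0] - true_angle)
rate_err = abs(x[1] - true_rate)
```

The paper's star-mapping step corresponds to how `star_dirs` would be fetched: the predicted image motion selects which catalog partition to search, so only plausible stars are matched each frame.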
The Prediction of the Motion of Atens, Apollos and Amors over Long Intervals of Time
NASA Astrophysics Data System (ADS)
Wlodarczyk, I.
2002-01-01
The equations of motion of 930 Atens, Apollos and Amors (AAA) were integrated 300,000 years forward using the RA15 Everhart method (Everhart, 1974). The Osterwinter model of the Solar System was used (Osterwinter and Cohen, 1972). The differences in mean anomaly between the unchanged and changed orbits were calculated. The changed orbits were constructed by adding to, or subtracting from, the starting orbital elements, one at a time, the errors in the determination of those elements. Computations were stopped when the difference in mean anomaly exceeded 360 degrees. In almost all cases, after about 1,000 years of forward or backward integration, the differences in mean anomaly between neighboring orbits grew rapidly. This means that it is impossible to predict the behavior of these asteroids beyond this interval, which I have called the time of stability.
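The order of magnitude of such a stability time can be illustrated with a two-orbit Keplerian toy model: orbits whose semi-major axes differ by the orbit-determination error drift apart in mean anomaly at a constant rate, and the "time of stability" is when that difference reaches 360 degrees. The paper, of course, integrates full N-body motion, where the divergence is much faster and chaotic; the values below are purely illustrative.

```python
import math

MU = 4 * math.pi ** 2            # AU^3 / yr^2, so a = 1 AU gives a 1-year period

def mean_motion(a):
    """Keplerian mean motion n = sqrt(mu / a^3), in rad / yr."""
    return math.sqrt(MU / a ** 3)

def stability_time(a, da):
    """Years until the mean-anomaly difference between orbits with
    semi-major axes a and a + da accumulates to a full 2*pi."""
    dn = abs(mean_motion(a) - mean_motion(a + da))
    return 2 * math.pi / dn

# Example: a near-Earth orbit at 1.2 AU with a 1e-4 AU determination error.
t_stab = stability_time(1.2, 1e-4)
```

As expected, a larger orbital-element error shortens the stability time, which is why the paper perturbs each starting element by its determination error.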
Sound imaging of nocturnal animal calls in their natural habitat.
Mizumoto, Takeshi; Aihara, Ikkyu; Otsuka, Takuma; Takeda, Ryu; Aihara, Kazuyuki; Okuno, Hiroshi G
2011-09-01
We present a novel method for imaging acoustic communication between nocturnal animals. Investigating the spatio-temporal calling behavior of nocturnal animals, e.g., frogs and crickets, has been difficult because of the need to distinguish many animals' calls in noisy environments without being able to see them. Our method visualizes the spatial and temporal dynamics using dozens of sound-to-light conversion devices (called "Fireflies") and an off-the-shelf video camera. The Firefly, which consists of a microphone and a light-emitting diode, emits light when it captures nearby sound. Deploying dozens of Fireflies in a target area, we record the calls of multiple individuals through the video camera. We conducted two experiments, one indoors and the other in the field, using Japanese tree frogs (Hyla japonica). The indoor experiment demonstrates that our method correctly visualizes the Japanese tree frogs' calling behavior, confirming the known behavior that two frogs call synchronously or in anti-phase synchronization. The field experiment (in a rice paddy where Japanese tree frogs live) also visualizes the same calling behavior, confirming anti-phase synchronization in the field. The experimental results confirm that our method can visualize the calling behavior of nocturnal animals in their natural habitat.
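The Firefly's sound-to-light conversion can be sketched as a short-term-energy threshold driving one LED state per analysis window. The synthesized "calls" and all parameters below are illustrative assumptions, not the device's actual circuitry.

```python
import math

SAMPLE_RATE = 8000
WINDOW = 80                      # 10 ms energy window

def make_signal(call_times, duration=1.0, freq=1000.0):
    """Synthesize silence with short 'frog call' bursts at the given times (s)."""
    n = int(duration * SAMPLE_RATE)
    sig = [0.0] * n
    for t0 in call_times:
        start = int(t0 * SAMPLE_RATE)
        for i in range(start, min(start + 400, n)):   # 50 ms tone burst
            sig[i] = math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    return sig

def led_states(sig, threshold=0.1):
    """One on/off LED state per window: what the video camera would record."""
    states = []
    for w in range(0, len(sig) - WINDOW + 1, WINDOW):
        energy = sum(s * s for s in sig[w:w + WINDOW]) / WINDOW
        states.append(energy > threshold)
    return states

sig = make_signal([0.2, 0.6])    # two calls, at 0.2 s and 0.6 s
states = led_states(sig)
n_flashes = sum(1 for a, b in zip(states, states[1:]) if b and not a)
```

Reading each Firefly's on/off trace out of the video frames then gives the spatio-temporal call pattern across the whole deployment area.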
Exchange inlet optimization by genetic algorithm for improved RBCC performance
NASA Astrophysics Data System (ADS)
Chorkawy, G.; Etele, J.
2017-09-01
A genetic algorithm based on a real-parameter representation with a variable selection pressure and variable probability of mutation is used to optimize an annular air-breathing rocket inlet called the Exchange Inlet. A rapid and accurate design method, which provides estimates of air-breathing, mixing, and isentropic flow performance, is used as the engine of the optimization routine. Comparison to detailed numerical simulations shows that the design method yields desired exit Mach numbers to within approximately 1% over 75% of the annular exit area and predicts entrained air mass flows to within 1% to 9% of numerically simulated values, depending on the flight condition. Optimum designs are shown to be obtained within approximately 8000 fitness function evaluations in a search space on the order of 10^6. The method is also shown to be able to identify beneficial values for particular alleles when they exist, while handling cases where physical and aphysical designs co-exist at particular values of a subset of alleles within a gene. For an air-breathing engine based on a hydrogen-fuelled rocket, an exchange inlet is designed which yields a predicted air entrainment ratio within 95% of the theoretical maximum.
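A real-parameter GA with variable selection pressure (here via tournament size) and a decaying mutation probability can be sketched on a toy fitness function. The quadratic "design score" below merely stands in for the inlet performance model, and the schedule constants are illustrative assumptions.

```python
import random

random.seed(4)

N_GENES = 4    # real-valued "alleles" in [0, 1], toy stand-ins for design variables
POP = 40

def fitness(genes):
    """Illustrative smooth fitness with a known optimum (not an inlet model)."""
    target = (0.7, 0.2, 0.5, 0.9)
    return -sum((g - t) ** 2 for g, t in zip(genes, target))

def tournament(pop, k):
    """Selection pressure grows with tournament size k."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    """Blend (arithmetic) crossover, natural for real-parameter representation."""
    w = random.random()
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

def mutate(genes, p_mut, sigma=0.1):
    return [min(1.0, max(0.0, g + random.gauss(0, sigma)))
            if random.random() < p_mut else g for g in genes]

pop = [[random.random() for _ in range(N_GENES)] for _ in range(POP)]
for gen in range(60):
    # Variable schedules: selection pressure ramps up, mutation ramps down
    # as the search converges.
    k = 2 + gen // 20
    p_mut = max(0.05, 0.3 - 0.004 * gen)
    pop = [mutate(crossover(tournament(pop, k), tournament(pop, k)), p_mut)
           for _ in range(POP)]

best = max(pop, key=fitness)
best_fit = fitness(best)
```

In the paper's setting, `fitness` would call the rapid design method to score each candidate inlet, which is what makes a few thousand evaluations affordable.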