We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an ...
PubMed
PubMed Central
Streptomyces spp. produce a variety of valuable secondary metabolites, which are regulated in a spatio-temporal manner by a complex network of inter-connected gene products. Using a compilation of genome-scale temporal transcriptome data for the model organism, Streptomyces coelicolor, under different environmental and genetic perturbations, we have developed a supervised ...
Motivation: At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences. Frequently the most accurate classifiers are obtained by training support vector machines (SVMs) with complex sequence kernels. However, a cumbersome shortcoming of SVMs is that their learned ...
, it is not so easy for noncoding RNA (ncRNA) genes like miRNA. Usually only weakly-conserved promoter designed to detect signals like splice sites, start and stop codons, branch points, promoters
E-print Network
An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer the mechanism of automatic internal relevant feature detection. The appropriate setting of their control parameters is often treated as another independent ...
Background: Different isoforms of Cytochrome P450 (CYP) metabolize different types of substrates (or drug molecules) and make them soluble during biotransformation. Therefore, the fate of any drug molecule depends on how it is treated or metabolized by a CYP isoform. There is a need to develop models for predicting substrate specificity of major isoforms of P450, in order to understand whether a given ...
A contact map is a key factor representing a specific protein structure. To simplify the protein contact map prediction, we predict the inter-residue contact clusters centered at the groups of their surrounding inter-residue contacts. In this paper, we adopt a Support Vector Machine (SVM)-based approach to predict the inter-residue contact cluster centers. The input of the SVM ...
Background: Guanosine triphosphate (GTP)-binding proteins play an important role in regulation of G-protein. Thus prediction of GTP interacting residues in a protein is one of the major challenges in the field of computational biology. In this study, an attempt has been made to develop a computational method for predicting GTP interacting residues in a protein with high accuracy (Acc), precision ...
Background: Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the ...
RNA-binding proteins (RBPs) play a crucial role in transcription and gene regulation. This paper describes a support vector machine (SVM) based method for discriminating and classifying RNA-binding and non-binding proteins using sequence features. With the threshold of 30% interacting residues, RNA-binding amino acid prediction method ...
The rapid advances in proteomic analyses coupled with the completion of multiple genomes have led to an increased demand for determining protein functions. The first step is classification or prediction into families. A method was developed for the prediction of protein family based only on protein sequence using support vector machine (SVM) models. In these models, the amino acids were classified ...
Diffuse reflectance spectroscopy (DRS) has been extensively applied for the characterization of biological tissue, especially for dysplasia and cancer detection, by determination of the tissue optical properties. A major challenge in performing routine clinical diagnosis lies in the extraction of the relevant parameters, especially at high absorption levels typically observed in cancerous tissue. ...
Prognostic and diagnostic biomarker discovery is one of the key issues for a successful stratification of patients according to clinical risk factors. For this purpose, statistical classification methods, such as support vector machines (SVM), are frequently used tools. Different groups have recently shown that the usage of prior biological knowledge significantly improves the classification ...
This paper presents a novel computer-aided diagnosis (CAD) technique for the early diagnosis of Alzheimer's disease (AD) based on non-negative matrix factorization (NMF) and Support Vector Machines (SVM) with bounds of confidence. The CAD tool is designed for the study and classification of functional brain images. For this purpose, two different brain image databases are selected: a single ...
Control chart patterns are important statistical process control tools for determining whether a process is run in its intended mode or in the presence of unnatural patterns. Accurate recognition of control chart patterns is essential for efficient system monitoring to maintain high-quality products. This paper introduces a novel hybrid intelligent system that includes three ...
In this paper, an integrated framework comprising computer vision algorithms, a database system, and batch processing techniques has been developed to facilitate effective automatic threat recognition and detection for security applications. The proposed approach is used for automatic threat detection. The novel features of this structure include utilizing the Human Visual System model for ...
NASA Astrophysics Data System (ADS)
High-throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps (the identity and location of regulatory binding sites within genomes). Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using ...
The Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. Based on our experiment ...
Syntactic methods in pattern recognition have been used extensively in bioinformatics, and in particular, in the analysis of gene and protein expressions, and in the recognition and classification of bio-sequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to achieve ...
With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used for classification of tissue samples into benign and malignant or their ...
The accuracy of land cover maps derived via supervised classification is often insufficient for operational applications. One of the important reasons for this is associated with the inputs to supervised classification analyses, especially the training data. The aim of this poster paper is to highlight the effect of variation in training set size on classification accuracy with respect to a series ...